Scholarly Contribution Binary Data

Data Science and Analytics

Tags and Keywords

Text

Intermediate

NLP

Binary

Classification

Research

Scholarly Contribution Binary Data Dataset on Opendatabay data marketplace

Free

About

This dataset is designed for binary classification tasks in Natural Language Processing (NLP) research. It comprises an excerpt of data from scholarly NLP articles whose contributions have been structured for integration into Knowledge Graph infrastructures such as the Open Research Knowledge Graph. The source annotations include contribution sentences, scientific terms and the relations extracted from those sentences, and semantic triples. These triples are organised under information units such as Research Problem, Approach, Model, Code, and Dataset. The dataset's primary purpose is to support training models that classify statements in research papers as either contributing or non-contributing to the overall research.

Columns

contents: This column contains the textual statements extracted from scholarly articles.
label: This column holds a binary value, '0' or '1', indicating the classification of the corresponding statement. In this dataset, '0' represents a contributing statement in a research paper, while '1' represents a non-contributing statement.
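
The two-column layout described above can be read and sanity-checked with the Python standard library. The following is a minimal sketch, not an official loader: the sample rows are invented placeholders rather than real dataset rows, and the column names `contents` and `label` are taken from this listing and should be confirmed against the actual file.

```python
import csv
import io

# Hypothetical sample rows in the layout described above (not real dataset rows).
# Labels follow the mapping stated in this listing: 0 = contributing, 1 = non-contributing.
SAMPLE_CSV = """contents,label
"We propose a new attention mechanism for parsing.",0
"The rest of the paper is organised as follows.",1
"""

def read_statements(handle):
    """Yield (text, label) pairs, validating the expected two-column schema."""
    reader = csv.DictReader(handle)
    missing = {"contents", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        label = row["label"].strip()
        if label not in {"0", "1"}:
            raise ValueError(f"unexpected label: {label!r}")
        yield row["contents"], int(label)

rows = list(read_statements(io.StringIO(SAMPLE_CSV)))
print(rows)
```

To read a real file instead, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the compiled CSV was saved) in place of the `io.StringIO` object.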

Distribution

The dataset is typically provided as a CSV file. It is derived from NLP scholarly articles and is structured to enable binary classification. It contains a total of 55,201 unique statements: 50,137 classified under one label (presumably '0') and 5,064 under the other (presumably '1'), a split of roughly 91% to 9%. The listing gives no row or record details beyond these counts and does not confirm which label each count belongs to, so the label distribution should be checked in the file itself. A script is needed to compile the data for use.
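
The label counts above imply a strong class imbalance, which affects how a classifier should be trained and scored. The sketch below works through the arithmetic under one assumption: that the 50,137 count belongs to label '0' and the 5,064 count to label '1', which the listing only presumes. The weighting formula is the common "balanced" heuristic (total / (number of classes × class count)), the same one scikit-learn uses for `class_weight="balanced"`.

```python
# Counts as stated in this listing. Which count belongs to which label is only
# presumed there, so this assignment is an assumption.
COUNTS = {0: 50_137, 1: 5_064}

total = sum(COUNTS.values())

# Share of each class in the full dataset.
shares = {label: n / total for label, n in COUNTS.items()}

# "Balanced" class weights: total / (number_of_classes * class_count).
weights = {label: total / (len(COUNTS) * n) for label, n in COUNTS.items()}

# Accuracy of a trivial model that always predicts the majority class.
majority_accuracy = max(shares.values())

print(f"total statements: {total}")
for label in COUNTS:
    print(f"label {label}: share={shares[label]:.4f}, weight={weights[label]:.4f}")
print(f"majority-class baseline accuracy: {majority_accuracy:.4f}")
```

Because a model that always predicts the majority label already reaches about 91% accuracy, per-class precision, recall, and F1 on the minority class are usually more informative than accuracy for this dataset.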

Usage

This dataset suits a range of applications in machine learning and NLP. It can be used for:

Training and evaluating binary classification models to distinguish between contributing and non-contributing statements in academic texts.
Developing information extraction systems focused on scholarly content.
Populating and enhancing Knowledge Graph infrastructures with structured research contributions.
Conducting NLP research related to argument mining, discourse analysis, or summarisation of scientific articles.
Creating tools for automated literature review or scientific knowledge organisation.
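
As an illustration of the first use case, here is a minimal baseline sketch in pure Python. It uses a multinomial naive Bayes classifier with add-one smoothing as a stand-in technique; this is not a method attributed to the dataset's authors. The training sentences are invented placeholders, the helper names (`tokenize`, `train_nb`, `predict_nb`, `f1_for`) are hypothetical, and the label mapping (0 = contributing, 1 = non-contributing) follows this listing.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenisation; deliberately simplistic."""
    return text.lower().split()

def train_nb(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def f1_for(label, gold, pred):
    """F1 score for a single class, given gold and predicted labels."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy examples (not real dataset rows).
train = [
    ("we propose a novel model for parsing", 0),
    ("our approach achieves state of the art results", 0),
    ("the paper is organised as follows", 1),
    ("section two reviews related work", 1),
]
test_texts = ["we propose a new approach", "related work is in section four"]
test_gold = [0, 1]

model = train_nb(train)
test_pred = [predict_nb(model, t) for t in test_texts]
print(test_pred, f1_for(0, test_gold, test_pred))
```

On the real data, a held-out split and per-class F1 would give a more meaningful picture than this toy run, and a stronger model would normally replace the naive Bayes baseline.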

Coverage

The dataset's coverage is global, as it is not restricted by any specific geographical region. It is derived from Natural Language Processing scholarly articles, focusing on the structuring of their contributions. There is no specific time range or demographic scope noted for the data. The dataset was listed on 27 June 2025.

License

CC-BY

Who Can Use It

This dataset is highly valuable for:

Data scientists and machine learning engineers looking to build and train text classification models.
NLP researchers and academics interested in automated knowledge extraction from scientific literature or the construction of knowledge graphs.
Organisations and developers aiming to create applications that analyse and summarise research papers.
Students and educators studying text classification, information retrieval, or knowledge representation in NLP.

Dataset Name Suggestions

NLP Contribution Classifier
Research Paper Contribution Classification
Scholarly Contribution Binary Data
Article Contribution Identifier
SemEval 2021 Contribution Dataset

Attributes

Original Data Source: Contribution Graph (Binary Classification)

Listing Stats

Listed: 27/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
