Opendatabay APP

Scholarly Abstract Classifier

Education & Learning Analytics

Tags and Keywords

  • Education
  • NLP
  • Multilabel
  • Abstracts
  • Classification
  • arXiv


Free

About

This dataset supports the development and evaluation of text classification systems, particularly for academic paper submission platforms. It can be used to build baseline models that suggest relevant subject areas from a paper's title and abstract. Data analysts can also use it to explore how papers relate to one another and how their abstracts correlate with the assigned categories. The collection is intended as a benchmark for building practical text classifiers.

Columns

  • titles: Represents the arXiv paper title.
  • summaries: Contains the arXiv paper abstract.
  • terms: Lists the associated arXiv paper categories.
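A minimal sketch of reading these three columns with the Python standard library. It assumes the `terms` column stores each paper's categories as a Python-style list literal (for example `['cs.LG', 'stat.ML']`); the listing does not confirm this encoding, and the two sample rows below are hypothetical, so check the real file before relying on it.

```python
import ast
import csv
import io

# Hypothetical two-row sample; the real file's contents may differ.
SAMPLE_CSV = '''titles,summaries,terms
"A Survey of Graph Networks","We review graph neural networks.","['cs.LG', 'stat.ML']"
"Stereo Matching Revisited","We study stereo depth estimation.","['cs.CV']"
'''

def load_rows(handle):
    """Read rows and turn the 'terms' string into a list of category labels."""
    rows = []
    for record in csv.DictReader(handle):
        # Assumption: 'terms' holds a Python-style list literal.
        # ast.literal_eval parses literals only and does not execute code.
        record["terms"] = ast.literal_eval(record["terms"])
        rows.append(record)
    return rows

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["titles"], rows[0]["terms"])
```

To read the downloaded file instead of the sample, pass an opened file handle (for example `open(path, newline="", encoding="utf-8")`) to `load_rows`.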

Distribution

The dataset is provided in tabular form, typically as a CSV file. The listing does not state the exact number of rows, but it reports over 38,900 unique values for both the 'titles' and 'terms' (categories) columns, which suggests a collection of at least that size.
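Since the row count is not stated, it may be worth checking the counts directly after download. The sketch below uses hypothetical in-memory rows shaped like the three columns described above, and assumes that a "unique value" of `terms` means a distinct combination of categories; that reading is an interpretation, not something the listing specifies.

```python
from collections import Counter

# Hypothetical rows; replace with rows parsed from the real CSV file.
rows = [
    {"titles": "Paper A", "summaries": "...", "terms": ["cs.LG", "stat.ML"]},
    {"titles": "Paper B", "summaries": "...", "terms": ["stat.ML", "cs.LG"]},
    {"titles": "Paper C", "summaries": "...", "terms": ["cs.CV"]},
]

def summarize(rows):
    """Count rows, unique titles, unique category combinations, and labels."""
    titles = {row["titles"] for row in rows}
    # Category order is ignored, so each combination is a sorted tuple.
    combos = {tuple(sorted(row["terms"])) for row in rows}
    labels = Counter(label for row in rows for label in row["terms"])
    return {
        "rows": len(rows),
        "unique_titles": len(titles),
        "unique_term_combinations": len(combos),
        "unique_labels": len(labels),
    }

summary = summarize(rows)
print(summary)
```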

Usage

This dataset is ideal for:
  • Developing text classifier models capable of predicting subject areas for academic papers.
  • Building systems that can offer viable subject area suggestions within paper submission platforms like CMT or OpenReview.
  • Conducting analyses on the correlation between paper abstracts and their designated categories.
  • Serving as a benchmark for training and evaluating various text classification models.
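As a rough illustration of the benchmarking use case, the sketch below scores a trivial most-frequent-label baseline with micro-averaged F1, a common metric for multi-label classification. The function names, the choice of baseline, and the tiny label sets are all hypothetical and not part of the dataset; a real benchmark would compare a trained model's predictions against a held-out split in the same way.

```python
from collections import Counter

def fit_top_labels(train_label_sets, k=1):
    """Return the k most frequent labels in the training data."""
    counts = Counter(label for labels in train_label_sets for label in labels)
    return [label for label, _ in counts.most_common(k)]

def micro_f1(true_sets, predicted_sets):
    """Micro-averaged F1 over all (paper, label) decisions."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, predicted_sets):
        true, pred = set(true), set(pred)
        tp += len(true & pred)   # labels correctly predicted
        fp += len(pred - true)   # labels predicted but not assigned
        fn += len(true - pred)   # labels assigned but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical label sets standing in for the 'terms' column.
train = [["cs.LG", "stat.ML"], ["cs.LG"], ["cs.CV"]]
test = [["cs.LG"], ["cs.CV", "cs.LG"]]

top = fit_top_labels(train, k=1)
# The baseline predicts the same labels for every paper.
predictions = [top for _ in test]
score = micro_f1(test, predictions)
print(top, score)
```

Any model trained on the titles and abstracts should be expected to beat this baseline; if it does not, the features or the label parsing likely need attention.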

Coverage

The data is collected from the arXiv portal and covers academic papers from around the world. The dataset does not specify a time range or provide demographic details.

License

CC0

Who Can Use It

  • Developers: To construct automated subject area suggestion tools and text classification systems.
  • Data Analysts: For exploring academic paper content and understanding categorisation patterns.
  • Researchers: To benchmark and improve machine learning models for text classification tasks.

Dataset Name Suggestions

  • arXiv Paper Abstracts Dataset
  • Academic Paper Classification
  • arXiv Subject Tagging Data
  • Scholarly Abstract Classifier
  • Text Categorisation for Research Papers

Attributes

Original Data Source: arXiv Paper Abstracts

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 08/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
