Headline News Categorisation

Entertainment & Media Consumption

Tags and Keywords

Model, Text, Multiclass, NLP, News, Classification, Headlines

Headline News Categorisation Dataset on Opendatabay data marketplace

Free

About

This dataset is designed for news topic classification and consists of news article headlines. It is a text classification benchmark derived from AG's corpus of news articles, which comprises more than 1 million articles gathered from more than 2,000 news sources by ComeToMyHead, an academic news search engine, over more than one year of activity starting in July 2004. Each headline is labelled with one of four topics: 'World', 'Sports', 'Business', or 'Sci/Tech', which makes the dataset suitable for training and evaluating machine learning models for news categorisation.

Columns

  • text: The headline of a news article.
  • label: The article's topic as an integer: 0 for 'World', 1 for 'Sports', 2 for 'Business', and 3 for 'Sci/Tech'.
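The label encoding above can be applied while loading the file. The sketch below assumes the CSV has a header row with exactly the column names `text` and `label`; that layout, the function name `read_headlines`, the file name in the comment, and the sample headlines are all illustrative assumptions, not details confirmed by the listing.

```python
import csv
import io

# Mapping from numeric label to topic name, as listed in the Columns section.
LABEL_NAMES = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}


def read_headlines(file_obj):
    """Return a list of (headline, topic_name) pairs from a CSV file object.

    Assumes a header row with columns named 'text' and 'label'
    (an assumption about the file layout, not confirmed by the listing).
    """
    pairs = []
    for record in csv.DictReader(file_obj):
        pairs.append((record["text"], LABEL_NAMES[int(record["label"])]))
    return pairs


# With the real file (hypothetical file name):
# with open("headlines.csv", newline="", encoding="utf-8") as f:
#     rows = read_headlines(f)

# Invented sample rows, for illustration only.
sample = io.StringIO('text,label\n"Stocks rally as oil prices fall",2\nHome side wins cup final,1\n')
print(read_headlines(sample))
# → [('Stocks rally as oil prices fall', 'Business'), ('Home side wins cup final', 'Sports')]
```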

Distribution

The dataset is provided as a CSV file with a size of 28.92 MB. It contains 120,000 rows, evenly balanced across the four news topics: 30,000 instances each for labels 0, 1, 2, and 3.
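The balance stated above can be checked once the labels are loaded. A minimal sketch using only the standard library; the function names are hypothetical, and the synthetic label list simply reproduces the stated 30,000-per-class split rather than being read from the file.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share)} for an iterable of integer labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


def is_balanced(labels, expected_classes=(0, 1, 2, 3)):
    """True if exactly the expected classes appear and all have equal counts."""
    counts = Counter(labels)
    if set(counts) != set(expected_classes):
        return False
    return len(set(counts.values())) == 1


# Synthetic labels matching the stated split: 30,000 per class, 120,000 total.
labels = [label for label in range(4) for _ in range(30_000)]
print(class_distribution(labels))
# → {0: (30000, 0.25), 1: (30000, 0.25), 2: (30000, 0.25), 3: (30000, 0.25)}
print(is_balanced(labels))
# → True
```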

Usage

This dataset is well suited to applications such as:
  • Developing and evaluating text classification models.
  • Conducting Natural Language Processing (NLP) tasks, particularly for news content.
  • Benchmarking the performance of machine learning algorithms in categorising textual data.
  • Building systems for automated news categorisation and content filtering.
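For developing and benchmarking classifiers, a simple baseline gives a reference point for more complex models. The sketch below uses multinomial naive Bayes with add-one (Laplace) smoothing; this is a swapped-in illustrative baseline, not a method the listing specifies. The class name `NaiveBayesBaseline`, the tokenizer, and the four training headlines are invented for illustration; in practice you would fit on the labelled rows and measure accuracy on a held-out portion.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase tokens of letters, digits and apostrophes (deliberately simple)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesBaseline:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented training headlines, one per label (0 World, 1 Sports, 2 Business, 3 Sci/Tech).
train_texts = [
    "Election results announced in capital",
    "Striker scores twice in league match",
    "Shares climb after quarterly earnings",
    "New chip speeds up laptop processors",
]
train_labels = [0, 1, 2, 3]

model = NaiveBayesBaseline().fit(train_texts, train_labels)
predictions = model.predict(["Striker injured before match", "Earnings lift shares"])
print(predictions)
# → [1, 2]
print(accuracy([1, 2], predictions))
# → 1.0
```

Plain accuracy is a reasonable headline metric here because the four classes are equally represented; on an imbalanced subset, per-class metrics would be more informative.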

Coverage

The dataset's coverage is global, drawing on news articles from more than 2,000 sources. The underlying AG corpus, collected over more than one year starting in July 2004, contains over 1 million news articles, from which this dataset's 120,000 headlines are derived. No further breakdown of coverage by group, region, or year is provided beyond this general collection period.

License

CC BY-SA

Who Can Use It

This dataset is beneficial for:
  • Machine learning engineers and data scientists working on text analytics and classification problems.
  • Researchers in the fields of NLP, AI, and information retrieval.
  • Developers creating applications that require automated news sorting or content recommendation.
  • Academic institutions for educational purposes and research projects involving textual data.

Dataset Name Suggestions

  • AG News Topic Classification Dataset
  • Headline News Categorisation
  • Four-Topic News Dataset
  • NLP News Classifier Data
  • Global News Headlines Dataset

Attributes

Original Data Source: News Topic Classification

Listing Stats

Views: 2
Downloads: 0
Listed: 11/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0