Opendatabay APP

News Headlines Dataset

Entertainment & Media Consumption

Tags and Keywords

News

Beginner

Text

NLP

Free

About

This dataset is a follow-up to the News Category Dataset, designed to give beginners an easy-to-use resource for natural language processing tasks. It comprises approximately 45,500 news headlines collected from HuffPost, spanning the years 2012 to 2018. The data has been cleaned and filtered, and its target feature has been balanced, making it more accessible and manageable than the original. It aims to help those new to NLP get started with real-world data.

Columns

  • category: Indicates the category to which a news article belongs. This serves as the target column.
  • headline: Contains the main headline of the news article.
  • short_description: Provides a brief summary or description of the news article.
  • links: Lists the URL links for the respective news articles.
  • keywords: Features the primary keywords extracted from the URLs present in the original dataset. Please note that this column may contain null values.
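As a minimal sketch of how these columns might be read and checked, the snippet below uses only the Python standard library. It assumes the file is a CSV whose header row uses exactly the column names listed above; the sample rows, the example.com URLs, and the helper names load_rows and count_missing_keywords are made up for illustration and are not part of the dataset.

```python
import csv
import io

# Column names as listed above (assumed to match the CSV header exactly).
EXPECTED_COLUMNS = ["category", "headline", "short_description", "links", "keywords"]


def load_rows(source):
    """Read CSV text from an open file-like object and return a list of dict rows."""
    reader = csv.DictReader(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_missing_keywords(rows):
    """Count rows whose keywords field is empty or absent."""
    return sum(1 for row in rows if not (row.get("keywords") or "").strip())


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "category,headline,short_description,links,keywords\n"
    "Sports,Example headline one,Example description,https://example.com/a,example-one\n"
    "Travel,Example headline two,Example description,https://example.com/b,\n"
)
rows = load_rows(sample)
missing_keywords = count_missing_keywords(rows)
```

With the real file, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"), where path is wherever the download was saved.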

Distribution

The dataset contains approximately 45,500 records, organised into 5 columns. It is typically provided as a data file, commonly in CSV format. Each target category contains roughly 4,500 rows, giving a balanced distribution across news topics.
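The balance claim is easy to verify once the rows are loaded. The sketch below counts rows per category and checks that the largest and smallest classes differ by no more than a chosen tolerance. The helper names, the tolerance parameter, and the sample rows are hypothetical; real rows would come from the CSV file.

```python
from collections import Counter


def category_counts(rows):
    """Return the number of rows per category."""
    return Counter(row["category"] for row in rows)


def is_balanced(counts, tolerance=0):
    """True if the largest and smallest class sizes differ by at most tolerance."""
    if not counts:
        return True
    return max(counts.values()) - min(counts.values()) <= tolerance


# Made-up rows for illustration only.
sample_rows = [
    {"category": "Sports"},
    {"category": "Travel"},
    {"category": "Sports"},
    {"category": "Travel"},
]
counts = category_counts(sample_rows)
balanced = is_balanced(counts)
```

A small non-zero tolerance may be appropriate if the per-category counts turn out to be close but not identical.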

Usage

This dataset is ideal for beginners embarking on natural language processing projects. It is well-suited for tasks such as text classification, news categorisation, and general machine learning applications involving text data. It can be used for training models to predict news categories or to analyse trends in news headlines over time.
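To illustrate the text classification use case, here is a minimal multinomial Naive Bayes baseline with add-one smoothing, written with the standard library only. It is a sketch rather than a recommended pipeline: the tokenizer, the function names, and the four training headlines are assumptions made up for this example, and a real experiment would train on the dataset's headline and category columns with a held-out test split.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(examples):
    """Build count tables from an iterable of (headline, category) pairs."""
    doc_counts = Counter()              # number of headlines per category
    word_counts = defaultdict(Counter)  # token counts per category
    vocab = set()
    for headline, category in examples:
        doc_counts[category] += 1
        tokens = tokenize(headline)
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def predict(model, headline):
    """Return the category with the highest log-probability score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    # Tokens never seen in training carry no information, so skip them.
    tokens = [t for t in tokenize(headline) if t in vocab]
    best_category, best_score = None, -math.inf
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)
        for token in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[category][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Made-up training headlines for illustration only.
training = [
    ("Team wins championship final", "Sports"),
    ("Striker scores late goal", "Sports"),
    ("Best beaches to visit this summer", "Travel"),
    ("Cheap flights and hotel tips", "Travel"),
]
model = train(training)
guess = predict(model, "Late goal wins the final")
```

Because the categories are balanced, plain accuracy on a held-out split is a reasonable first metric for a baseline like this.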

Coverage

The dataset covers news articles published between 2012 and 2018. The content is sourced from HuffPost and is global in regional scope. It includes diverse news categories such as Business, Politics, Food & Drink, Travel, Parenting, Style & Beauty, Wellness, World News, Sports, and Entertainment.

License

CC0

Who Can Use It

This dataset is primarily intended for:
  • Beginners in NLP: Provides a clean and balanced starting point for learning text-based machine learning.
  • Students and Academics: Useful for educational purposes, assignments, and research in natural language processing and data science.
  • Data Scientists and Developers: Can be used for prototyping and developing text classification models.
  • Researchers: Suitable for analysing news trends and category distribution over specific time periods.

Dataset Name Suggestions

  • News Headlines Dataset 2012-2018
  • HuffPost News Articles for NLP
  • Cleaned News Category Dataset
  • Beginner NLP News Data
  • Multi-Category News Headlines

Attributes

Original Data Source: News Category Dataset

Listing Stats

LISTED

11/06/2025

REGION

GLOBAL

UDQS QUALITY

5 / 5

VERSION

1.0
