Indian Subcontinent Headline Events

News & Media Articles

Tags and Keywords

India, News, Headlines, Events, Society

Free

About

This news dataset is a persistent historical archive of notable events in the Indian subcontinent, spanning early 2001 to mid-2023. The events were recorded in real time by journalists in India, and the archive contains approximately 3.9 million headlines published by the Times of India. The data focuses primarily on Indian local news across national, city-level, and entertainment categories. With a daily volume of around 600 articles, the dataset shows what Indian society prioritised, which events and issues dominated the news, and how these changed over time.

Columns

  • publish_date: This column indicates the date when the article was published online, presented in the yyyyMMdd format.
  • headline_category: This column specifies the category of the headline, using ASCII characters and a dot-delimited, lowercase format.
  • headline_text: This column contains the actual text of the headline in English, exclusively using ASCII characters.
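
The column formats above can be checked with a short loading sketch. This is a minimal example, assuming pandas is available and that the CSV header uses exactly the three column names listed; the inline sample rows are made up for illustration and stand in for the real file.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real CSV (assumption: the file
# has a header row with exactly these three column names).
SAMPLE_CSV = """publish_date,headline_category,headline_text
20010102,unknown,Sample headline one
20060711,city.mumbai,Sample headline two
20140516,india,Sample headline three
"""


def load_headlines(source):
    """Read the three-column CSV and parse publish_date from yyyyMMdd."""
    df = pd.read_csv(
        source,
        dtype={
            "publish_date": str,       # keep as text so the format is parsed explicitly
            "headline_category": str,  # dot-delimited, lowercase
            "headline_text": str,
        },
    )
    df["publish_date"] = pd.to_datetime(df["publish_date"], format="%Y%m%d")
    return df


df = load_headlines(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

To run this against the downloaded file, pass its path to `load_headlines` in place of the `io.StringIO` sample.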

Distribution

The dataset is provided as a single CSV file with 3,876,557 rows and 3 columns. The total file size is approximately 285.1 MB. The dataset is updated annually.
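
Because the file runs to several million rows, it may be convenient to process it in chunks rather than load it at once. The sketch below counts rows per year this way; it assumes pandas, the function name `rows_per_year` is hypothetical, and the inline sample is made up for illustration.

```python
import io
from collections import Counter

import pandas as pd


def rows_per_year(source, chunksize=100_000):
    """Count headlines per year, reading only publish_date in chunks."""
    counts = Counter()
    reader = pd.read_csv(
        source, usecols=["publish_date"], dtype=str, chunksize=chunksize
    )
    for chunk in reader:
        # The first four characters of a yyyyMMdd string are the year.
        years = chunk["publish_date"].str[:4]
        counts.update(years.value_counts().to_dict())
    return dict(counts)


# Made-up sample standing in for the real file; chunksize=2 forces two chunks.
sample = io.StringIO(
    "publish_date,headline_category,headline_text\n"
    "20010102,unknown,Sample headline one\n"
    "20010103,unknown,Sample headline two\n"
    "20020101,india,Sample headline three\n"
)
yearly = rows_per_year(sample, chunksize=2)
print(yearly)
```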

Usage

This dataset is ideal for various applications, including:
  • Gaining deep insight into Indian society, its priorities, and how events and issues have unfolded over time.
  • Performing text analysis and Natural Language Processing (NLP) tasks.
  • Analysing specific time ranges, such as headlines during the 2006 Mumbai bombings, the 2014 election, or ongoing health crises.
  • Filtering and analysing content based on specific categories like Citywise, Bollywood, ICC updates, Magazine, or Middle East.
  • Extracting insights using keywords related to crime, ecology, political parties, celebrities, or corporations.
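
The time-range, category, and keyword analyses listed above can be combined in one filter. The following is a minimal sketch, assuming pandas and a `publish_date` column already parsed to datetimes; the function name `filter_headlines`, the category values, and the sample headlines are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd


def filter_headlines(df, start=None, end=None, category_prefix=None, keyword=None):
    """Return rows matching an optional date range, category prefix, and keyword."""
    mask = pd.Series(True, index=df.index)
    if start is not None:
        mask &= df["publish_date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["publish_date"] <= pd.Timestamp(end)
    if category_prefix is not None:
        cat = df["headline_category"]
        # Match the category itself or any dot-delimited sub-category of it.
        mask &= (cat == category_prefix) | cat.str.startswith(
            category_prefix + ".", na=False
        )
    if keyword is not None:
        # Plain, case-insensitive substring match (not a regular expression).
        mask &= df["headline_text"].str.contains(
            keyword, case=False, regex=False, na=False
        )
    return df[mask]


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "publish_date": pd.to_datetime(
            ["2006-07-10", "2006-07-12", "2006-07-12", "2014-05-16"]
        ),
        "headline_category": [
            "city.mumbai",
            "city.mumbai",
            "entertainment.hindi.bollywood",
            "india",
        ],
        "headline_text": [
            "Sample: rain delays trains",
            "Sample: commuters return to trains",
            "Sample: new film released",
            "Sample: election results declared",
        ],
    }
)

july_city = filter_headlines(
    sample, start="2006-07-11", end="2006-07-31", category_prefix="city"
)
trains = filter_headlines(sample, keyword="TRAINS")
print(len(july_city), len(trains))
```

Matching on a prefix plus a dot keeps `city` from also matching an unrelated category that merely begins with the same letters.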

Coverage

The dataset covers the Indian subcontinent, with a primary focus on India. It spans 1st January 2001 to 30th June 2023, capturing more than twenty-two years of headlines. Most of the content is Indian local news, including national, city-level, and entertainment topics.

License

CC0: Public Domain

Who Can Use It

This dataset is suitable for:
  • Researchers and academics studying socio-political and cultural trends in India.
  • Journalists and media analysts seeking historical perspectives on Indian news.
  • Data scientists and NLP practitioners looking for a large, real-world text corpus for analysis and model training.
  • Anyone interested in understanding the evolution of Indian society's priorities and key events over more than two decades.

Dataset Name Suggestions

  • India News Headlines Archive
  • Times of India: 2001-2023 News Dataset
  • Indian Subcontinent Headline Events
  • Daily India News Text Data
  • Indian Society Insights through Headlines

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 14/07/2025
  • Region: Asia
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
