Indian Subcontinent Headline Events
News & Media Articles
Free
About
This news dataset is a persistent historical archive of notable events in the Indian subcontinent, spanning early 2001 to mid-2023. It was recorded in real time by journalists in India and contains approximately 3.9 million headlines published by the Times of India. The data focuses primarily on Indian local news across national, city-level, and entertainment categories. With a daily volume of around 600 articles, the dataset offers insight into Indian society, its priorities, ongoing events, and pressing issues, and into how these have evolved over time.
Columns
- publish_date: This column indicates the date when the article was published online, presented in the yyyyMMdd format.
- headline_category: This column specifies the category of the headline, using ASCII characters and a dot-delimited, lowercase format.
- headline_text: This column contains the actual text of the headline in English, exclusively using ASCII characters.
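As a minimal sketch of how these three columns might be read into typed records, the example below parses the yyyyMMdd date and splits the dot-delimited category. It assumes the CSV has a header row using the column names above. The sample rows and category values are made up for illustration, and `Headline`, `parse_row`, and `read_headlines` are hypothetical helper names.

```python
import csv
import io
from datetime import date, datetime
from typing import NamedTuple


class Headline(NamedTuple):
    publish_date: date
    headline_category: str
    headline_text: str


def parse_row(row: dict) -> Headline:
    """Convert one raw CSV row (all strings) into typed fields."""
    return Headline(
        # yyyyMMdd in the column description corresponds to %Y%m%d here.
        publish_date=datetime.strptime(row["publish_date"], "%Y%m%d").date(),
        headline_category=row["headline_category"],
        headline_text=row["headline_text"],
    )


def read_headlines(handle):
    """Yield Headline records lazily from an open CSV file handle."""
    for row in csv.DictReader(handle):
        yield parse_row(row)


# Made-up sample rows standing in for the real file (assumed header row).
SAMPLE_CSV = """publish_date,headline_category,headline_text
20010102,city.bengaluru,Sample headline about a city council meeting
20140516,india,Sample headline about election results
"""

records = list(read_headlines(io.StringIO(SAMPLE_CSV)))
print(records[0].publish_date.isoformat())
# Dot-delimited categories split into a hierarchy of levels.
print(records[0].headline_category.split("."))
```

For the real file, passing a handle from `open(path, newline="", encoding="utf-8")` to `read_headlines` would stream the rows one at a time instead of loading everything into memory.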
Distribution
The dataset is provided in CSV format. It contains 3,876,557 rows and 3 columns. The total file size is approximately 285.1 MB. The dataset is updated annually.
Usage
This dataset is ideal for various applications, including:
- Gaining deep insight into Indian society, its priorities, and how events and issues have unfolded over time.
- Performing text analysis and Natural Language Processing (NLP) tasks.
- Analysing specific time ranges, such as headlines during the 2006 Mumbai bombings, the 2014 election, or ongoing health crises.
- Filtering and analysing content based on specific categories like Citywise, Bollywood, ICC updates, Magazine, or Middle East.
- Extracting insights using keywords related to crime, ecology, political parties, celebrities, or corporations.
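The time-range, category, and keyword analyses listed above could be sketched as three small filters. This is an illustrative outline rather than a prescribed workflow: the records and category values are made up, and `in_date_range`, `in_category`, and `mentions` are hypothetical helper names.

```python
from datetime import date

# Made-up records: (publish_date, headline_category, headline_text)
SAMPLE = [
    (date(2006, 7, 12), "city.mumbai", "Sample headline on train services resuming"),
    (date(2014, 5, 16), "india", "Sample headline on election results"),
    (date(2014, 5, 17), "entertainment.bollywood", "Sample headline on a film release"),
]


def in_date_range(record, start, end):
    """True if the publish date falls within [start, end], inclusive."""
    return start <= record[0] <= end


def in_category(record, prefix):
    """Match a category exactly or any of its dot-delimited subcategories."""
    category = record[1]
    return category == prefix or category.startswith(prefix + ".")


def mentions(record, keyword):
    """Case-insensitive substring match on the headline text."""
    return keyword.lower() in record[2].lower()


may_2014 = [r for r in SAMPLE if in_date_range(r, date(2014, 5, 1), date(2014, 5, 31))]
city_news = [r for r in SAMPLE if in_category(r, "city")]
election_news = [r for r in SAMPLE if mentions(r, "election")]

print(len(may_2014), len(city_news), len(election_news))
```

Matching on `prefix + "."` rather than a bare `startswith(prefix)` avoids treating an unrelated category that merely shares leading characters as a subcategory.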
Coverage
The dataset focuses primarily on India and the wider Indian subcontinent. It spans 1st January 2001 to 30th June 2023, capturing roughly twenty-two and a half years of headlines. Most of the content is Indian local news, including national, city-level, and entertainment topics.
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Researchers and academics studying socio-political and cultural trends in India.
- Journalists and media analysts seeking historical perspectives on Indian news.
- Data scientists and NLP practitioners looking for a large, real-world text corpus for analysis and model training.
- Anyone interested in understanding the evolution of Indian society's priorities and key events over more than two decades.
Dataset Name Suggestions
- India News Headlines Archive
- Times of India: 2001-2023 News Dataset
- Indian Subcontinent Headline Events
- Daily India News Text Data
- Indian Society Insights through Headlines
Attributes
Original Data Source: Indian Subcontinent Headline Events