Daily Indian Express News Articles Dataset
About
This dataset contains 20,000 news records from the Indian Express, each with a headline, a short description, and the full article text. The articles were published between August 11, 2019, and June 8, 2020, so the collection supports analysis of media content and trends over that period. It is well suited to natural language processing and machine learning applications.
Columns
- index: A unique numerical identifier for each entry.
- article_id: A generated identifier for each news article. The dataset reports 19,870 distinct values across its 20,000 records, so some identifiers repeat.
- headline: The main title of the news piece.
- desc: A short description or summary of the article's content.
- date: The publication date and time of the article.
- url: The web address linking to the original article.
- articles: The entire body text of the news article.
- article_type: The length category of the article: short, mid, or long. About 51% of records are long and 37% are mid; the remaining 12% fall under other values.
- article_length: A numerical value representing the length of the article (the unit, such as words or characters, is not stated).
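As a rough illustration of how the columns above might be read, the sketch below parses CSV text into typed records using only the Python standard library. It assumes the file's header row uses exactly the column names listed above and that article_length is stored as an integer; neither is confirmed by this listing. The sample row, the `Article` class, and the `read_articles` helper are made up for the example.

```python
import csv
import io
from dataclasses import dataclass

# Column names as listed in this dataset description (assumed to match the CSV header).
COLUMNS = [
    "index", "article_id", "headline", "desc", "date",
    "url", "articles", "article_type", "article_length",
]


@dataclass
class Article:
    index: int
    article_id: str
    headline: str
    desc: str
    date: str  # kept as raw text; the date format is not documented here
    url: str
    articles: str  # full body text
    article_type: str
    article_length: int  # assumption: stored as an integer


def read_articles(handle):
    """Read rows from an open CSV file object into Article records."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for row in reader:
        records.append(Article(
            index=int(row["index"]),
            article_id=row["article_id"],
            headline=row["headline"],
            desc=row["desc"],
            date=row["date"],
            url=row["url"],
            articles=row["articles"],
            article_type=row["article_type"],
            article_length=int(row["article_length"]),
        ))
    return records


# Made-up sample row, for illustration only (not taken from the dataset).
SAMPLE_CSV = (
    "index,article_id,headline,desc,date,url,articles,article_type,article_length\n"
    '0,a1,"Sample headline","Sample summary",2019-08-11,'
    'https://example.com/a1,"Body text, with a comma.",short,24\n'
)

records = read_articles(io.StringIO(SAMPLE_CSV))
```

With a real download, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.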
Distribution
The dataset is a collection of 20,000 news records, typically provided as CSV files. Across those records there are 19,870 unique article IDs, 19,924 unique headlines, 19,977 unique dates, and 19,959 unique URLs, so each of these columns contains a small number of repeated values. Article counts vary across the fortnightly intervals of the covered period.
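The uniqueness figures and fortnightly counts above could be checked with something like the following sketch. It assumes the rows are already loaded as dictionaries keyed by column name and that dates have been parsed into `datetime.date` values (the date format in the file is not documented here). The helper names and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import date

# First day of the covered period, as stated in this description.
PERIOD_START = date(2019, 8, 11)


def unique_counts(rows, columns):
    """Number of distinct values in each of the given columns."""
    return {col: len({row[col] for row in rows}) for col in columns}


def duplicated_values(rows, column):
    """Values that occur more than once in a column, with their counts."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


def fortnight_index(day, start=PERIOD_START):
    """Zero-based index of the 14-day interval that contains `day`."""
    return (day - start).days // 14


# Made-up rows, for illustration only.
rows = [
    {"article_id": "a1", "headline": "First headline", "day": date(2019, 8, 11)},
    {"article_id": "a1", "headline": "First headline, updated", "day": date(2019, 8, 24)},
    {"article_id": "a2", "headline": "Second headline", "day": date(2019, 8, 25)},
]

summary = unique_counts(rows, ["article_id", "headline"])
repeats = duplicated_values(rows, "article_id")
per_fortnight = Counter(fortnight_index(row["day"]) for row in rows)
```

Counting intervals from the first day of coverage is one possible choice; calendar-aligned fortnights would give different bucket boundaries.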
Usage
This dataset is well suited to:
- Natural Language Processing (NLP) tasks, including text classification, sentiment analysis, and topic modelling.
- Building and evaluating Deep Learning models that require large text corpora.
- Developing Recommender Systems for news content based on user preferences or article similarities.
- Conducting linguistics research on contemporary Indian news language.
- Analysing news trends, media bias, or public discourse within the specified time frame.
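As one small illustration of the article-similarity idea mentioned for recommender systems, the sketch below ranks headlines by Jaccard overlap of their word sets. This is a deliberately simple baseline chosen for the example, not a method associated with the dataset; the function names and sample headlines are made up.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0  # both strings have no tokens
    return len(ta & tb) / len(union)


def most_similar(query, headlines, top_n=3):
    """Return up to top_n (score, headline) pairs, best match first."""
    scored = [(jaccard(query, h), h) for h in headlines if h != query]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Made-up headlines, for illustration only.
headlines = [
    "Monsoon rains arrive early",
    "Early monsoon rains expected",
    "Cricket team wins series",
]

matches = most_similar("Monsoon rains arrive early", headlines)
```

In practice the same ranking could be applied to the desc or articles columns, and a weighted scheme such as TF-IDF would likely separate articles better than raw word overlap.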
Coverage
The dataset spans news articles published by the Indian Express from August 11, 2019, to June 8, 2020. As it originates from a prominent Indian news source, the content primarily reflects Indian perspectives and news stories, including national and international events as reported by the Indian Express.
License
CC0
Who Can Use It
This dataset is suitable for:
- Data scientists and machine learning engineers seeking text data for model training and development.
- Researchers and academics in fields such as media studies, journalism, linguistics, and computer science.
- Students undertaking projects related to data analysis, AI, or natural language understanding.
- Anyone interested in analysing news content and understanding media landscapes.
Dataset Name Suggestions
- Indian Express News Archive 2019-2020
- Daily Indian Express News Articles Dataset
- 20K Indian News Headlines and Articles
- Indian Express News Text Corpus
- Historical Indian Express News Data
Attributes
Original Data Source: News Articles Dataset from Indian Express