Opendatabay APP

Daily Indian Express News Articles Dataset

Entertainment & Media Consumption

Tags and Keywords

Business

Arts

Classification

News

NLP

Deep

Linguistics

Recommender


Free

About

This dataset contains 20,000 records from the Indian Express, each with a headline, a description, and the full article text. It covers August 11, 2019, to June 8, 2020, and supports analysis of media content and trends over that period, as well as a range of natural language processing and machine learning applications.

Columns

  • index: A unique numerical identifier for each entry.
  • article_id: Generated unique identifiers for each news article.
  • headline: The main title of the news piece.
  • desc: A short description or summary of the article's content.
  • date: The publication date and time of the article.
  • url: The web address linking to the original article.
  • articles: The entire body text of the news article.
  • article_type: A length category for the article: short, mid, or long. About 51% of articles are long, 37% are mid, and the remaining 12% fall into other categories.
  • article_length: A numerical value representing the length of the article.
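The column list above can be checked when the file is read. The following is a minimal sketch using only the Python standard library. It assumes the CSV header matches the column names listed here exactly and that article_length parses as a number; neither is confirmed by the listing. The sample rows are made up and stand in for the real file.

```python
import csv
import io

# Column names as listed above; the real file's header may differ.
EXPECTED_COLUMNS = [
    "index", "article_id", "headline", "desc", "date",
    "url", "articles", "article_type", "article_length",
]


def load_articles(fileobj):
    """Read rows from an open CSV file and check the expected columns exist."""
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # Assumption: article_length is numeric; go through float so that
        # values such as "13.0" are also accepted.
        row["article_length"] = int(float(row["article_length"]))
        rows.append(row)
    return rows


# Made-up sample rows, not taken from the dataset.
SAMPLE_CSV = (
    "index,article_id,headline,desc,date,url,articles,article_type,article_length\n"
    "0,a1,Sample headline,Sample summary,2019-08-11 10:00:00,"
    "https://example.com/1,Body text one,short,13\n"
    "1,a2,Another headline,Another summary,2019-08-12 09:30:00,"
    "https://example.com/2,Body text two,short,13\n"
)

rows = load_articles(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["headline"])
```

For the real data, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV was saved.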

Distribution

The dataset contains 20,000 news records, typically provided as CSV files. It includes 19,870 unique article IDs, 19,924 unique headlines, 19,977 unique dates, and 19,959 unique URLs; because each count is below 20,000, some values repeat across records. Article counts vary across fortnightly intervals within the covered period.
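The uniqueness counts and the fortnightly spread described above can be reproduced with a few helper functions. This is a sketch, not the method used to produce the listing's figures. It assumes rows are dictionaries keyed by column name and that dates have already been parsed into `datetime.date` objects, since the listing does not state the date format. The function names and sample values are illustrative.

```python
from collections import Counter
from datetime import date


def unique_counts(rows, columns):
    """Number of distinct values in each of the given columns."""
    return {col: len({row[col] for row in rows}) for col in columns}


def duplicated_values(rows, column):
    """Values that occur more than once in a column, with their counts."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


def fortnight_counts(dates, start=date(2019, 8, 11)):
    """Count records per 14-day bucket, numbered from the start of coverage."""
    return Counter((d - start).days // 14 for d in dates)


# Made-up rows: two records share an article_id but have different URLs.
rows = [
    {"article_id": "a1", "url": "https://example.com/1"},
    {"article_id": "a1", "url": "https://example.com/2"},
    {"article_id": "a2", "url": "https://example.com/3"},
]

print(unique_counts(rows, ["article_id", "url"]))
print(duplicated_values(rows, "article_id"))
print(fortnight_counts([date(2019, 8, 11), date(2019, 8, 24), date(2019, 8, 25)]))
```

Running `duplicated_values` on article_id, headline, and url is a quick way to decide whether repeated records should be dropped before training.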

Usage

This dataset is suited to:
  • Natural Language Processing (NLP) tasks, including text classification, sentiment analysis, and topic modelling.
  • Building and evaluating Deep Learning models that require large text corpora.
  • Developing Recommender Systems for news content based on user preferences or article similarities.
  • Conducting linguistics research on contemporary Indian news language.
  • Analysing news trends, media bias, or public discourse within the specified time frame.
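As an illustration of the article-similarity use case above, the following sketch ranks headlines by bag-of-words cosine similarity using only the standard library. It is a deliberately simple baseline, not a recommended production approach; the tokenizer, the function names, and the sample headlines are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_idx, texts, top_n=3):
    """Indices of the top_n texts most similar to texts[query_idx]."""
    vectors = [Counter(tokenize(t)) for t in texts]
    scored = [
        (cosine(vectors[query_idx], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_idx
    ]
    # Highest similarity first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_n]]


# Made-up headlines, not taken from the dataset.
headlines = [
    "Monsoon rains hit Mumbai",
    "Heavy monsoon rains flood Mumbai streets",
    "Cricket team wins series",
]

print(most_similar(0, headlines, top_n=1))
```

On the real data the same functions could be applied to the headline, desc, or articles column; weighting tokens by TF-IDF would likely be a better choice for full article text.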

Coverage

The dataset spans news articles published by the Indian Express from August 11, 2019, to June 8, 2020. As it originates from a prominent Indian news source, the content primarily reflects Indian perspectives and news stories, including national and international events as reported by the Indian Express.

License

CC0

Who Can Use It

This dataset is suitable for:
  • Data scientists and machine learning engineers seeking text data for model training and development.
  • Researchers and academics in fields such as media studies, journalism, linguistics, and computer science.
  • Students undertaking projects related to data analysis, AI, or natural language understanding.
  • Anyone interested in analysing news content and understanding media landscapes.

Dataset Name Suggestions

  • Indian Express News Archive 2019-2020
  • Daily Indian Express News Articles Dataset
  • 20K Indian News Headlines and Articles
  • Indian Express News Text Corpus
  • Historical Indian Express News Data

Attributes

Listing Stats

  • Views: 0
  • Downloads: 1
  • Listed: 16/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
