Opendatabay APP

Real vs Fake Indian News Corpus

News & Media Articles

Tags and Keywords

News, Text, Classification, India, Fake

Free

About

A binary classification dataset of real and fake news sourced from India. It is designed for text classification tasks: training machine learning models to distinguish authentic news articles from fabricated ones. The dataset is suitable for intermediate machine learning practitioners.

Columns

  • label: The veracity of the news item. Values are strictly binary, FAKE (50%) or REAL (50%), and every valid record carries exactly one of the two.
  • text: The news article or text snippet to classify. The column holds over 2,200 unique snippets, which provides diversity in the training material. Eight records are missing this field.
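
The column description above can be checked with a short loading routine. The following is a minimal sketch using only the Python standard library. It assumes the CSV header names are exactly `label` and `text` and that the file is UTF-8 encoded, neither of which the listing confirms. The function name `load_news` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

VALID_LABELS = {"FAKE", "REAL"}

def load_news(handle):
    """Read (text, label) pairs from an open CSV file object.

    Returns the usable rows, the per-label counts, and the number of
    rows dropped because the text field was empty.
    Assumes header columns named 'label' and 'text' (not confirmed by the listing).
    """
    rows, n_missing = [], 0
    for record in csv.DictReader(handle):
        label = (record.get("label") or "").strip().upper()
        text = (record.get("text") or "").strip()
        if label not in VALID_LABELS:
            continue  # skip rows without a valid label
        if not text:
            n_missing += 1  # the listing reports 8 such rows in the full file
            continue
        rows.append((text, label))
    counts = Counter(label for _, label in rows)
    return rows, counts, n_missing

# Tiny in-memory stand-in for news_dataset.csv (invented rows, illustration only).
sample = io.StringIO(
    "label,text\n"
    "REAL,Parliament passes the annual budget\n"
    "FAKE,Miracle herb cures all diseases overnight\n"
    "FAKE,\n"
    "REAL,Monsoon arrives in Kerala on schedule\n"
)
rows, counts, n_missing = load_news(sample)
print(len(rows), dict(counts), n_missing)
```

With the real file, the same function could be called as `load_news(open("news_dataset.csv", newline="", encoding="utf-8"))`. Comparing `counts` and `n_missing` against the figures in the listing is a quick sanity check on the download.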

Distribution

The data is provided as a single CSV file, news_dataset.csv, of approximately 10.14 MB, containing two columns (label and text). The listing recommends an 80/20 split: 80% of the records for model training and the remaining 20% held out as a test set for evaluation. The structure has been validated across nearly 3,730 records.
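
One way to produce the 80/20 partition is sketched below with the standard library only. Stratifying by label (so both sides keep the FAKE/REAL balance) and fixing a random seed are assumptions added here, not something the listing specifies. The function name `stratified_split` and the placeholder rows are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split (text, label) pairs into train and test lists.

    Each label is shuffled and split separately so that the label
    proportions are roughly the same in both lists.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Placeholder rows standing in for the real dataset (50 FAKE, 50 REAL).
rows = [(f"article {i}", "FAKE" if i % 2 else "REAL") for i in range(100)]
train, test = stratified_split(rows)
print(len(train), len(test))
```

Because the listing reports roughly 2,200 unique snippets across nearly 3,730 records, identical texts may land on both sides of the split. Deduplicating on the text column before splitting may be worth considering to avoid inflated test scores.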

Usage

Ideal applications include developing natural language processing (NLP) models for fake news detection. The dataset can also be used for text classification research, benchmarking machine learning algorithms, and studying the linguistic patterns associated with the spread of misinformation in an Indian context.
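
For benchmarking, it helps to have a simple baseline to compare stronger models against. The sketch below is a bag-of-words multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. Naive Bayes is a choice made here for illustration, not a method the listing prescribes, and the class name, tokenizer, and training rows are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, rows):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of documents
        for text, label in rows:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus smoothed log likelihood of each known token
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training rows, for illustration only.
train = [
    ("miracle cure shocks doctors", "FAKE"),
    ("secret miracle trick banned", "FAKE"),
    ("parliament passes budget bill", "REAL"),
    ("court hears budget petition", "REAL"),
]
model = NaiveBayes().fit(train)
print(model.predict("miracle trick"))
print(model.predict("budget bill"))
```

Accuracy on a held-out test set can then be computed as the fraction of test rows where `model.predict(text)` equals the stored label, giving a reference point for more capable models.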

Coverage

The data is scoped to Indian fake news. The resource is expected to be updated annually. Details about demographics or the time range of the source material are not currently available.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For training and evaluating binary classification models tailored for news verification.
  • Academics/Researchers: To conduct social science studies on the spread and characteristics of online misinformation.
  • Students: For intermediate-level machine learning projects involving text analysis and applied NLP tasks.

Dataset Name Suggestions

  1. Indian News Veracity Classifier
  2. Real vs Fake Indian News Corpus
  3. India Text Misinformation Dataset
  4. News Dataset for Binary Classification

Attributes

Original Data Source: Real vs Fake Indian News Corpus

Listing Stats

Views: 9
Downloads: 1
Listed: 21/10/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
