Real vs Fake Indian News Corpus

Price: £0 (free)

News & Media Articles

Tags and Keywords

News, Text, Classification, India, Fake

About

A binary text classification dataset of real and fake news items sourced from India. It is designed for training machine learning models to distinguish authentic news articles from fabricated ones, and is pitched at intermediate machine learning practitioners.

Columns

label: The veracity of the news item. The column is binary, with two values, FAKE (50%) and REAL (50%); every valid record carries one of the two.
text: The news article or text snippet to classify. The column holds over 2,200 unique values, which gives the training material some variety. Eight records are missing the text.
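
A quick check of the two columns described above can be sketched with the Python standard library. This is a minimal sketch, not an official loader: the function names `summarize` and `summarize_csv` are hypothetical, and it assumes the CSV header uses exactly the column names `label` and `text` and that the file is UTF-8 encoded.

```python
import csv
from collections import Counter


def summarize(rows):
    """Count labels, missing text, and unique text values in an iterable of dict rows."""
    labels = Counter()
    missing_text = 0
    unique_texts = set()
    for row in rows:
        # Normalise None (short rows) and surrounding whitespace.
        labels[(row.get("label") or "").strip()] += 1
        text = (row.get("text") or "").strip()
        if text:
            unique_texts.add(text)
        else:
            missing_text += 1
    return {
        "labels": dict(labels),
        "missing_text": missing_text,
        "unique_texts": len(unique_texts),
    }


def summarize_csv(path="news_dataset.csv"):
    """Summarise the CSV at `path` (assumed header: label,text; assumed UTF-8)."""
    # Full articles can be long, so raise the csv module's per-field size limit.
    csv.field_size_limit(10**7)
    with open(path, newline="", encoding="utf-8") as f:
        return summarize(csv.DictReader(f))
```

Calling `summarize_csv()` on the downloaded file should report the FAKE/REAL counts, the number of rows with empty text, and the number of distinct snippets, which can then be compared against the figures quoted in this listing.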

Distribution

The data is provided as a single CSV file, news_dataset.csv, of approximately 10.14 MB, with two columns (label and text). The structure has been validated across nearly 3,730 records. The recommended partition is 80% of the records for model training and the remaining 20% as a held-out test set for evaluation.
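
The 80/20 partition can be sketched as follows. This is one possible approach under stated assumptions: the function name `stratified_split` is hypothetical, rows are dicts with `label` and `text` keys, rows with empty text are dropped before splitting, and the split is stratified by label so both parts keep the same FAKE/REAL balance. The listing itself only asks for an 80/20 split; the stratification and the filtering are additions here.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split dict rows into (train, test), keeping the label balance in both parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for row in rows:
        # Assumption: rows without text are not usable for classification.
        if (row.get("text") or "").strip():
            by_label[row["label"]].append(row)

    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Shuffle again so the labels are interleaved rather than grouped.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

Rows read with `csv.DictReader` can be passed in directly, for example `train, test = stratified_split(list(reader))`.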

Usage

Ideal applications include developing natural language processing (NLP) models that detect fake news. The dataset can also be used for text classification research, for benchmarking machine learning algorithms, and for studying the linguistic patterns associated with the spread of misinformation in an Indian context.
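
As a starting point for benchmarking, a simple baseline is useful to compare other algorithms against. The sketch below is a small multinomial Naive Bayes classifier over word counts with add-one smoothing, written with the standard library only. It is an illustrative choice, not a method prescribed by this listing; the names `tokenize`, `NaiveBayes`, and `accuracy` are hypothetical, and the tokenizer is a plain lowercase word split.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {
            label: sum(counts.values()) for label, counts in self.word_counts.items()
        }
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / n_docs)
            denom = self.total_words[label] + vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)
```

A typical use is to call `fit` on the training texts and labels, then report `accuracy` on the held-out test records, so that any more elaborate model has a reference score to beat.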

Coverage

The data covers Indian fake news. The resource is expected to be updated annually. Details on demographics or the historical time range of the source material are not currently available.

License

CC0: Public Domain

Who Can Use It

Data Scientists: For training and evaluating binary classification models tailored for news verification.
Academics/Researchers: To conduct social science studies on the spread and characteristics of online misinformation.
Students: For intermediate-level machine learning projects involving text analysis and applied NLP tasks.

Dataset Name Suggestions

Indian News Veracity Classifier
Real vs Fake Indian News Corpus
India Text Misinformation Dataset
News Dataset for Binary Classification

Attributes

Original Data Source: Real vs Fake Indian News Corpus

Listing Stats

Listed: 21/10/2025
Region: Global
Quality: 5 / 5
Version: 1.0
