Real vs Fake Indian News Corpus
News & Media Articles
Free
About
A classification dataset of real and fake news sourced from India. It is designed for text classification tasks, letting users train machine learning models to distinguish authentic news articles from fabricated ones. The dataset is suited to intermediate machine learning practitioners.
Columns
- label: Indicates the veracity of the news item. Values are binary: FAKE (50%) or REAL (50%). Every valid record carries exactly one of these two labels.
- text: Contains the news article or text snippet used for classification. The column holds over 2,200 unique snippets, which gives the training material some diversity. Eight records are missing this text.
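The column description above can be checked directly against the file. The following is a minimal sketch, assuming the CSV has a header row with the column names label and text exactly as listed; the function names summarize and summarize_csv are hypothetical and not part of the dataset.

```python
import csv
from collections import Counter
from typing import Iterable, TextIO


def summarize(rows: Iterable[dict]) -> dict:
    """Count label values and rows whose text is missing or blank."""
    labels = Counter()
    missing_text = 0
    for row in rows:
        # Assumed column names: "label" and "text", as in the listing.
        labels[(row.get("label") or "").strip()] += 1
        if not (row.get("text") or "").strip():
            missing_text += 1
    return {"labels": dict(labels), "missing_text": missing_text}


def summarize_csv(handle: TextIO) -> dict:
    """Read an open CSV file object and summarize its rows."""
    return summarize(csv.DictReader(handle))


# Example usage (assumes news_dataset.csv is in the working directory):
# with open("news_dataset.csv", newline="", encoding="utf-8") as f:
#     print(summarize_csv(f))
```

If the listing is accurate, the label counts should be roughly equal for FAKE and REAL, and missing_text should be small. Rows with blank text are usually dropped before training.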
Distribution
The data is provided in a single CSV file, named news_dataset.csv, with a file size of approximately 10.14 MB. The dataset contains two key columns. It is strongly recommended to partition the data, using 80% of the available records for model training purposes and reserving the remaining 20% to serve as a test dataset for evaluation. The structure is currently validated across nearly 3,730 records.
Usage
Ideal applications include the development of natural language processing (NLP) models specifically focused on detecting fake news. It can be employed for text classification research, benchmarking different machine learning algorithms, and studying the linguistic patterns associated with the propagation of misinformation within an Indian context.
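Before any model is trained, the recommended 80/20 partition has to be produced. The sketch below is one way to do that with the standard library only. Splitting per label (a stratified split) and the fixed seed are assumptions made here for reproducibility and balance; the listing only asks for an 80/20 split. The function name stratified_split is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.2, seed=42):
    """Split (label, text) pairs into train and test lists.

    Each label is split separately so both parts keep roughly the
    same FAKE/REAL balance as the full dataset.
    """
    by_label = defaultdict(list)
    for label, text in records:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed: the same split on every run
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Shuffle again so the labels are interleaved within each part.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

Since the listing reports fewer unique snippets than records, some texts may be repeated. It may be worth removing duplicates before splitting so the same article does not appear in both the training and test parts.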
Coverage
The data scope is focused on the domain of Indian fake news. The resource is expected to be updated annually. Precise demographic or historical time-range details about the source material are not currently available.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For training and evaluating binary classification models tailored for news verification.
- Academics/Researchers: To conduct social science studies on the spread and characteristics of online misinformation.
- Students: For intermediate-level machine learning projects involving text analysis and applied NLP tasks.
Dataset Name Suggestions
- Indian News Veracity Classifier
- Real vs Fake Indian News Corpus
- India Text Misinformation Dataset
- News Dataset for Binary Classification
Attributes
Original Data Source: Real vs Fake Indian News Corpus
