Fact-Check News Corpus
Entertainment & Media Consumption
Free
About
This dataset is designed for classifying news articles as real or fake. It comprises text-based news articles, each labelled to indicate its authenticity. Its primary purpose is to support research and development in natural language processing (NLP) and machine learning, particularly classification tasks related to media veracity. It can be used to build and evaluate models that distinguish factual reporting from misinformation.
Columns
- number: An identifier for each record.
- title: The headline or title of the news article.
- text: The full body text of the news article.
- label: The classification of the news article, indicating whether it is "REAL" or "FAKE".
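As a minimal sketch of how the four columns above might be read and validated, the following uses only the Python standard library. The sample rows are invented for illustration, and the helper name `read_records` is hypothetical; nothing here is taken from the original data source.

```python
import csv
import io

# Column names and label values as documented in the Columns section.
EXPECTED_COLUMNS = ["number", "title", "text", "label"]
VALID_LABELS = {"REAL", "FAKE"}


def read_records(handle):
    """Parse CSV rows into dicts and check them against the documented schema."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        if row["label"] not in VALID_LABELS:
            raise ValueError(f"unexpected label: {row['label']!r}")
        records.append(row)
    return records


# Tiny made-up sample standing in for the real file (assumption: the real
# CSV has a header row using the column names listed above).
SAMPLE_CSV = '''number,title,text,label
1,"Council approves budget","The city council voted on Tuesday, approving the plan.",REAL
2,"Aliens endorse candidate","Sources say visitors from space picked a side.",FAKE
'''

records = read_records(io.StringIO(SAMPLE_CSV))
print(len(records), records[0]["label"])
```

With a real file, the `io.StringIO` object would be replaced by something like `open(path, newline="", encoding="utf-8")`, where the path and encoding are assumptions to check against the downloaded file. If the header uses different column names, the schema check will fail with a message listing what is missing.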
Distribution
The dataset is typically provided in CSV format. It contains approximately 10.6 thousand records, split evenly between the "REAL" and "FAKE" labels (50% each). This balance means a classifier cannot score well simply by favouring a majority class. The number column has approximately 10.6 thousand values, the title column contains about 6,256 unique values, and the text column has around 6,060 unique values.
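The figures above can be re-derived from a loaded copy of the data. The sketch below assumes the records are available as a list of dictionaries keyed by the documented column names; the `summarize` helper and the four sample rows are hypothetical and only illustrate the calculation.

```python
from collections import Counter


def summarize(records):
    """Return row count, per-label share, and unique title/text counts."""
    total = len(records)
    label_counts = Counter(r["label"] for r in records)
    return {
        "rows": total,
        "label_share": {label: n / total for label, n in label_counts.items()},
        "unique_titles": len({r["title"] for r in records}),
        "unique_texts": len({r["text"] for r in records}),
    }


# Made-up records; rows 1 and 3 share a title and text on purpose so that
# the unique counts come out lower than the row count.
sample = [
    {"number": "1", "title": "A", "text": "alpha", "label": "REAL"},
    {"number": "2", "title": "B", "text": "beta", "label": "FAKE"},
    {"number": "3", "title": "A", "text": "alpha", "label": "REAL"},
    {"number": "4", "title": "C", "text": "gamma", "label": "FAKE"},
]

summary = summarize(sample)
print(summary)
```

Running the same function on the full dataset is a quick way to confirm the 50/50 label split and to see how many titles and texts are repeated.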
Usage
This dataset is ideally suited for:
- Developing and testing fake news detection algorithms.
- Training natural language processing (NLP) models for text classification.
- Research in areas such as deep learning for misinformation analysis.
- Building applications that verify news authenticity.
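To make the first two use cases concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier over word counts with add-one smoothing, written with the standard library only. This is an illustrative technique chosen for brevity, not a method prescribed by the dataset, and the training sentences are invented. In practice a library implementation and a proper train/test split would be the more usual choice.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples using the dataset's two label values.
train_texts = [
    "the council approved the budget after a public vote",
    "officials confirmed the report in a press briefing",
    "shocking secret cure doctors do not want you to know",
    "you will not believe this shocking miracle trick",
]
train_labels = ["REAL", "REAL", "FAKE", "FAKE"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("shocking miracle cure"))
print(model.predict("council approved the budget"))
```

With the real data, `train_texts` would typically come from the text column (optionally concatenated with title) and `train_labels` from the label column, with a held-out portion reserved for evaluation.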
Coverage
The source material does not specify the geographic scope or time range of the news articles. The dataset classifies articles as "REAL" or "FAKE" only; it provides no demographic breakdown and no publication dates.
License
CC0
Who Can Use It
This dataset is particularly useful for:
- Data Scientists and Machine Learning Engineers working on text classification and NLP projects.
- Researchers in artificial intelligence and deep learning focusing on misinformation.
- Academics and students studying media literacy, computational linguistics, or social media analysis.
- Organisations or individuals aiming to develop tools for news verification.
Dataset Name Suggestions
- Real or Fake News Text Dataset
- News Authenticity Classification Dataset
- Misinformation Detection Dataset
- Fact-Check News Corpus
- Textual News Verification Dataset
Attributes
Original Data Source: Fake or Real News