Fact-Check News Corpus

Free Dataset Library
Verified Data Provider
Price: £0

Category: Entertainment & Media Consumption

Tags and Keywords

News
Intermediate
NLP
Deep
Advanced

Fact-Check News Corpus Dataset on Opendatabay data marketplace

About

This dataset is designed for classifying news articles as real or fake. Each record contains the text of a news article together with a label indicating its authenticity. Its primary purpose is to support research and development in natural language processing (NLP) and machine learning, particularly classification tasks related to media veracity. It can be used to build and evaluate models that distinguish factual reporting from misinformation.

Columns

number: An identifier for each record.
title: The headline or title of the news article.
text: The full body text of the news article.
label: The classification of the news article, indicating whether it is "REAL" or "FAKE".
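The column layout above can be checked at load time. The following is a minimal sketch using Python's standard csv module. It assumes the header row of the download uses exactly the column names listed here (number, title, text, label); the real file may name them differently. The function name load_records and the inline sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Column names as documented in this listing; the header of the actual
# download is ASSUMED to match (adjust if it does not).
EXPECTED_COLUMNS = ["number", "title", "text", "label"]
VALID_LABELS = {"REAL", "FAKE"}


def load_records(handle):
    """Parse the CSV, check the header, and normalise the label column."""
    # DictReader copes with quoted fields that span several lines,
    # which is common in full article bodies.
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    records = []
    for row in reader:
        # Normalise case and whitespace so "fake" and "FAKE " both map to "FAKE".
        label = row["label"].strip().upper()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label {row['label']!r} in record {row['number']}")
        row["label"] = label
        records.append(row)
    return records


# Hypothetical inline sample standing in for the real file.
sample = io.StringIO(
    "number,title,text,label\n"
    '1,Headline A,"Body text\nwith a line break",REAL\n'
    "2,Headline B,Another body,fake\n"
)
rows = load_records(sample)
print(len(rows), [r["label"] for r in rows])  # 2 ['REAL', 'FAKE']
```

To load the real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`. The `newline=""` argument lets the csv module handle article bodies that contain line breaks. The UTF-8 encoding is an assumption.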

Distribution

The dataset is provided in CSV format. It contains approximately 10,600 records, split evenly between the "REAL" and "FAKE" labels (50% each). Because the classes are balanced, a classifier does not need class reweighting or resampling to avoid favouring one label, and accuracy is a meaningful headline metric. Every record has a value in the number column. By contrast, the title column contains only about 6,256 unique values and the text column about 6,060. The gap between the record count and these unique counts suggests that some titles and article bodies are repeated. This is worth checking before splitting the data into training and test sets.
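The balance and duplication points above can be checked directly once the records are loaded. The sketch below is written under the assumption that each record is a dict with the documented title, text and label keys. It computes the label shares, keeps the first record for each distinct (title, text) pair, and lists article bodies that appear under both labels. The function names and toy records are hypothetical, for illustration only.

```python
from collections import Counter


def label_distribution(records):
    """Return the share of each label, e.g. {'REAL': 0.5, 'FAKE': 0.5}."""
    counts = Counter(r["label"] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def deduplicate(records, key=("title", "text")):
    """Keep only the first record for each distinct (title, text) pair."""
    seen = set()
    unique = []
    for r in records:
        k = tuple(r[field].strip() for field in key)
        if k not in seen:
            seen.add(k)
            unique.append(r)
    return unique


def conflicting_texts(records):
    """Article bodies that appear with more than one label (likely label noise)."""
    labels_by_text = {}
    for r in records:
        labels_by_text.setdefault(r["text"].strip(), set()).add(r["label"])
    return [t for t, labels in labels_by_text.items() if len(labels) > 1]


# Hypothetical toy records, not taken from the dataset.
records = [
    {"number": "1", "title": "A", "text": "same body", "label": "REAL"},
    {"number": "2", "title": "A", "text": "same body", "label": "REAL"},
    {"number": "3", "title": "B", "text": "other body", "label": "FAKE"},
    {"number": "4", "title": "C", "text": "third body", "label": "FAKE"},
]
print(label_distribution(records))     # {'REAL': 0.5, 'FAKE': 0.5}
print(len(deduplicate(records)))       # 3
print(label_distribution(deduplicate(records)))
```

In the toy example, removing the duplicate shifts the balance from 50/50 to one REAL and two FAKE records. For that reason it may be worth recomputing the label shares after deduplicating the real data rather than relying on the figures quoted above.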

Usage

This dataset is ideally suited for:

Developing and testing fake news detection algorithms.
Training natural language processing (NLP) models for text classification.
Research in areas such as deep learning for misinformation analysis.
Building applications that verify news authenticity.
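For the detection and classification tasks above, a simple baseline helps calibrate more complex models. The sketch below is a multinomial naive Bayes classifier with add-one (Laplace) smoothing, using only the standard library. It is a generic technique chosen for illustration, not a method prescribed by this listing. The class and function names, the tokenizer and the toy training sentences are all hypothetical and are not drawn from the dataset.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lower-case word tokens; a deliberately simple, hypothetical tokenizer."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            # Log prior plus smoothed log likelihood of each known token;
            # tokens never seen in training are ignored.
            score = math.log(n / n_docs)
            counts = self.word_counts[label]
            denom = self.totals[label] + v
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the gold label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy training data, not taken from the dataset.
train_texts = [
    "officials confirmed the report in a statement",
    "the ministry published the official figures",
    "shocking secret they do not want you to know",
    "miracle cure doctors hate this shocking trick",
]
train_labels = ["REAL", "REAL", "FAKE", "FAKE"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("a shocking secret trick"))  # FAKE
```

With the real data, the model would typically be fitted on a training split of the (deduplicated) records and scored with `accuracy` on a held-out split. Richer features such as word n-grams or TF-IDF, or neural models, are common next steps.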

Coverage

The listing gives the region as global, but the source material does not specify the geographic origin or publication period of the articles. The dataset focuses solely on classifying articles as "REAL" or "FAKE" and does not identify particular demographic groups or the years the data covers.

License

CC0

Who Can Use It

This dataset is particularly useful for:

Data Scientists and Machine Learning Engineers working on text classification and NLP projects.
Researchers in artificial intelligence and deep learning focusing on misinformation.
Academics and students studying media literacy, computational linguistics, or social media analysis.
Organisations or individuals aiming to develop tools for news verification.

Dataset Name Suggestions

Real or Fake News Text Dataset
News Authenticity Classification Dataset
Misinformation Detection Dataset
Fact-Check News Corpus
Textual News Verification Dataset

Attributes

Original Data Source: Fake or Real News

Listing Stats

Listed: 16/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
