Fact-Check News Corpus

Entertainment & Media Consumption

Tags and Keywords

  • News
  • Intermediate
  • NLP
  • Deep
  • Advanced

Free

About

This dataset is designed for classifying news articles as real or fake. It comprises text-based news content, with each article labelled to indicate its authenticity. Its primary purpose is to support research and development in natural language processing (NLP) and machine learning, particularly classification tasks related to media veracity, and it can be used to build and evaluate models that distinguish factual reporting from misinformation.

Columns

  • number: An identifier for each record.
  • title: The headline or title of the news article.
  • text: The full body text of the news article.
  • label: The classification of the news article, indicating whether it is "REAL" or "FAKE".
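As an illustration of the schema above, here is a minimal sketch that reads the four columns with Python's standard csv module and checks that every label is "REAL" or "FAKE". The function name load_records and the two sample rows are hypothetical, made up for the example; the actual file name and its exact quoting are not given in the listing.

```python
import csv
import io

EXPECTED_COLUMNS = ["number", "title", "text", "label"]
VALID_LABELS = {"REAL", "FAKE"}


def load_records(fileobj):
    """Read rows from an open CSV file object and validate the schema."""
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    records = []
    for row in reader:
        # Normalise case and whitespace before checking the label.
        label = row["label"].strip().upper()
        if label not in VALID_LABELS:
            raise ValueError(
                f"unexpected label {row['label']!r} in record {row['number']}"
            )
        records.append(
            {
                "number": row["number"],
                "title": row["title"],
                "text": row["text"],
                "label": label,
            }
        )
    return records


# Made-up sample rows standing in for the real file (an assumption).
SAMPLE_CSV = """number,title,text,label
1,"Budget vote, delayed","The council postponed the vote.",REAL
2,Aliens run city hall,"Sources say, without evidence.",FAKE
"""

records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), records[0]["title"], records[1]["label"])
```

For the real file, pass an object opened with open(path, newline="", encoding="utf-8") instead of the StringIO sample. If article bodies are very long, the csv module's default field size limit may need raising with csv.field_size_limit.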

Distribution

The dataset is typically provided in CSV format and contains approximately 10,600 records. The "REAL" and "FAKE" labels are balanced, each making up 50% of the records, so a model trained on the data has no majority class to favour. The number column has approximately 10,600 values, while the title column contains about 6,256 unique values and the text column about 6,060. These counts suggest that a large share of titles and texts appear more than once, so it is worth checking for duplicates before splitting the data into training and test sets.
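The figures above can be checked on a local copy with a few lines of standard-library Python. The sketch below computes the label shares and the number of unique titles and texts, then drops repeated texts. The helper names summarize and drop_duplicate_texts and the four sample records are hypothetical; records are assumed to be dictionaries keyed by the column names.

```python
from collections import Counter


def summarize(records):
    """Return record count, label shares, and unique title/text counts."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")
    label_counts = Counter(r["label"] for r in records)
    return {
        "total": total,
        "label_share": {label: n / total for label, n in label_counts.items()},
        "unique_titles": len({r["title"] for r in records}),
        "unique_texts": len({r["text"] for r in records}),
    }


def drop_duplicate_texts(records):
    """Keep only the first record for each distinct article text."""
    seen = set()
    kept = []
    for r in records:
        if r["text"] not in seen:
            seen.add(r["text"])
            kept.append(r)
    return kept


# Made-up records (an assumption); records 1 and 3 share the same text.
sample = [
    {"number": "1", "title": "Budget passes", "text": "Council approved it.", "label": "REAL"},
    {"number": "2", "title": "Moon is cheese", "text": "Experts stunned.", "label": "FAKE"},
    {"number": "3", "title": "Budget passes", "text": "Council approved it.", "label": "REAL"},
    {"number": "4", "title": "Secret cure", "text": "They hide the truth.", "label": "FAKE"},
]

summary = summarize(sample)
deduped = drop_duplicate_texts(sample)
print(summary)
print(len(deduped))
```

Deduplicating before the train/test split keeps the same article from landing on both sides, which would otherwise inflate evaluation scores.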

Usage

This dataset is ideally suited for:
  • Developing and testing fake news detection algorithms.
  • Training natural language processing (NLP) models for text classification.
  • Research in areas such as deep learning for misinformation analysis.
  • Building applications that verify news authenticity.
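As a rough illustration of the first two uses, the following is a minimal sketch of a bag-of-words multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. It is a baseline under stated assumptions, not a method recommended by the listing: the function names, the tokenizer, and the four training records are all made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(records):
    """Count documents and words per label."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for r in records:
        label = r["label"]
        doc_counts[label] += 1
        tokens = tokenize(r["title"] + " " + r["text"])
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, title, text):
    """Return the label with the highest log-probability score."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    tokens = [t for t in tokenize(title + " " + text) if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training records (an assumption), not taken from the dataset.
training = [
    {"title": "Council approves budget", "label": "REAL",
     "text": "The city council approved the annual budget after a public hearing."},
    {"title": "Court upholds ruling", "label": "REAL",
     "text": "The appeals court upheld the earlier ruling, officials said."},
    {"title": "Shocking miracle cure", "label": "FAKE",
     "text": "Doctors hate this shocking miracle cure that they refuse to reveal."},
    {"title": "Secret plot exposed", "label": "FAKE",
     "text": "A shocking secret plot has been exposed, and you won't believe it."},
]

model = train(training)
fake_guess = predict(model, "Miracle cure exposed", "A shocking secret they refuse to reveal.")
real_guess = predict(model, "Council ruling", "The court approved the budget.")
print(fake_guess, real_guess)
```

A word-count baseline like this is mainly useful as a reference point: a deep learning model trained on the same split should be expected to beat it before its extra cost is justified.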

Coverage

The listing gives the region as global, but the source material does not state the geographic origin or time range of the news articles. The data covers only the classification of articles as "REAL" or "FAKE" and does not specify demographic groups or the years in which the articles were published.

License

CC0

Who Can Use It

This dataset is particularly useful for:
  • Data Scientists and Machine Learning Engineers working on text classification and NLP projects.
  • Researchers in artificial intelligence and deep learning focusing on misinformation.
  • Academics and students studying media literacy, computational linguistics, or social media analysis.
  • Organisations or individuals aiming to develop tools for news verification.

Dataset Name Suggestions

  • Real or Fake News Text Dataset
  • News Authenticity Classification Dataset
  • Misinformation Detection Dataset
  • Fact-Check News Corpus
  • Textual News Verification Dataset

Attributes

Original Data Source: Fake or Real News

Listing Stats

Views: 1
Downloads: 1
Listed: 16/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
