Opendatabay APP

Fake News Authenticity Dataset

News & Media Articles

Tags and Keywords

News

Fake

Misinformation

Social

Classification

Free

About

This dataset supports the classification of news by type and label, with a focus on distinguishing real from fake news based on source and article characteristics. Social media makes news easy to access at scale, but widespread misinformation makes its authenticity a major concern [1]. Classifying news manually is tedious, time-consuming, and potentially biased [1]. The dataset is a resource for mitigating the spread of misinformation and for informing readers about the nature of the news they consume [2]. It is particularly useful because it includes source information, author names, publication dates, and labels, all of which are essential for evaluating news trustworthiness [3].

Columns

  • author: The author of the news article [4].
  • published: The date on which the article was published [4].
  • title: The title of the news article [5].
  • text: The full text content of the article [5].
  • language: The language in which the article is written [6].
  • site_url: The URL of the website where the article was published [6].
  • main_img_url: The URL of the main image associated with the article [7].
  • type: The identified type of article, such as 'bs' (clickbait/misleading) or 'bias' [7].
  • label: A categorical label indicating whether the news is 'Fake' or 'Real' [8].
  • title_without_stopwords: The article title with common stop words removed for analysis [8].
  • text_without_stopwords: The article text with common stop words removed for analysis [9].
  • hasImage: A binary indicator showing whether the article includes an image or not [9].
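A minimal sketch of loading the file and checking that the columns listed above are present. It uses only the Python standard library. The helper name `load_articles`, the file path, and the UTF-8 encoding are assumptions for illustration, as is the expectation that the CSV header matches the column names exactly as written above.

```python
import csv

# Column names as listed above; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "author", "published", "title", "text", "language", "site_url",
    "main_img_url", "type", "label", "title_without_stopwords",
    "text_without_stopwords", "hasImage",
]


def load_articles(handle):
    """Read all rows from an open CSV file after validating its header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Usage (path and encoding are assumptions):
# with open("news_articles.csv", newline="", encoding="utf-8") as f:
#     articles = load_articles(f)
```

Each row comes back as a dictionary keyed by column name, with every value kept as a string, so fields such as hasImage would still need converting before numeric use.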

Distribution

The dataset is provided as a single CSV file, news_articles.csv (10.97 MB), with 12 columns [4]. Most columns contain 2,095 or 2,096 valid values; the text column contains slightly fewer (2,050) [4-9].
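The per-column counts above could be checked with a short sketch like the following. It assumes the rows are dictionaries keyed by column name (for example, as produced by `csv.DictReader`) and treats a value as valid when it is a non-blank string. That definition of "valid" is an assumption, and the listing may count validity differently.

```python
from collections import Counter


def count_valid(rows):
    """Count non-blank string values per column across dict rows."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            # Skip missing values and whitespace-only strings.
            if isinstance(value, str) and value.strip():
                counts[column] += 1
    return counts
```

Comparing the result for the text column against the other columns shows which rows have no article body.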

Usage

This dataset is ideal for:
  • Developing practical applications that allow users to gain insight from articles they consume, such as fact-checking websites [2].
  • Creating built-in plugins and article parsers to automate the detection and flagging of misinformation [2].
  • Refining existing tools and making them easier to access, to raise public awareness of news authenticity [2].
  • Conducting research on source-based fake news classification and the characteristics that define untrustworthy news [1, 3].
  • Training machine learning models to label data as fake or untrustworthy based on various features [3].
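As one illustration of source-based classification, the sketch below builds a simple baseline that predicts the majority label seen for each site_url. This is not the dataset authors' method, only a hypothetical starting point against which text-based models could be compared. The function names and the behaviour for unseen sources (returning a caller-supplied default) are assumptions.

```python
from collections import Counter, defaultdict


def fit_source_baseline(rows):
    """Map each site_url to the most frequent label observed for it."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[row["site_url"]][row["label"]] += 1
    return {site: tally.most_common(1)[0][0] for site, tally in votes.items()}


def predict(model, site_url, default=None):
    """Return the stored label for a source, or `default` if unseen."""
    return model.get(site_url, default)


def accuracy(model, rows, default=None):
    """Share of rows whose label matches the baseline prediction."""
    rows = list(rows)
    if not rows:
        return 0.0
    hits = sum(predict(model, r["site_url"], default) == r["label"] for r in rows)
    return hits / len(rows)
```

For a meaningful estimate, the baseline should be fitted on one subset of the articles and scored on a held-out subset, since scoring on the training rows only measures how consistently each source is labelled.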

Coverage

The dataset primarily features articles in English (96%) and German (3%), with a small percentage of other languages [6]. The publication dates of the articles span at least from 2006 to 2016 [5]. The scope is global in its relevance, addressing the universal challenge of misinformation on social media, though specific geographic or demographic data on the origin of news content is not detailed [1].
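The language split described above could be reproduced with a sketch along these lines. It assumes rows are dictionaries with a language key and normalises values by trimming whitespace and lower-casing. How the language column actually spells its values is not stated in the listing, so the normalisation is an assumption.

```python
from collections import Counter


def language_shares(rows):
    """Return each language's share of rows that have a language value."""
    counts = Counter()
    for row in rows:
        value = (row.get("language") or "").strip().lower()
        if value:
            counts[value] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {language: n / total for language, n in counts.items()}
```

Rows with a blank or missing language are left out of the denominator, so the shares sum to 1 over the rows that do declare a language.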

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For building and training models for fake news detection, news classification, and natural language processing tasks.
  • Researchers: Studying misinformation, media bias, and the impact of social media content.
  • Developers: Creating fact-checking platforms, browser extensions, or news analysis tools.
  • Journalists and Media Organisations: For verifying news authenticity and understanding content biases.
  • Anyone interested in combating misinformation: To gain insights into how news can be classified and verified.

Dataset Name Suggestions

  • Fake News Authenticity Dataset
  • Social Media News Classification
  • Misinformation Detection Corpus
  • News Source Trustworthiness Data

Attributes

Original Data Source: Fake News Authenticity Dataset

Listing Stats

Views: 1
Downloads: 0
Listed: 14/07/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
