Opendatabay APP

Fact-Checked Web Content Repository

Data Science and Analytics

Tags and Keywords

Text

Politics

NLP

Categorical

Free

About

This dataset provides details on online content and its assessed credibility. It includes posts, their truthfulness status (real, fake, or in between), links to the original posts, and the start date when associated propaganda was first identified. It is ideal for researchers and analysts interested in evaluating the prevalence of online misinformation, tracking the spread of propaganda, and investigating the accuracy of various content types. The dataset offers valuable insights for identifying trends in digital content and assessing the reliability of information found on social media platforms and other online forums.

Columns

  • Title: The title or heading of the online post.
  • StartDate: The date when propaganda related to the post was initially identified.
  • Link: A direct URL to the original online post.
  • Post: The actual text or description of the post itself.
  • Status: The credibility assessment of the post, categorised as 'Real', 'Fake', or 'Between' (meaning partially true or misleading).
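As a minimal sketch of how the columns above might be read, the following uses Python's standard `csv` module to load rows and check that the five listed columns are present. The exact header spelling in the CSV file, the helper name `read_posts`, and the two sample rows are assumptions made for illustration; the sample rows are synthetic and not taken from the dataset.

```python
import csv
import io

# Column names as listed above; their exact spelling in the CSV is an assumption.
EXPECTED_COLUMNS = ["Title", "StartDate", "Link", "Post", "Status"]

def read_posts(handle):
    """Read rows from an open CSV file object and check the expected columns exist."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Synthetic two-row sample, invented for illustration only (not real dataset rows).
SAMPLE = io.StringIO(
    "Title,StartDate,Link,Post,Status\n"
    "Example A,2021-01-15,https://example.com/a,Placeholder text,Fake\n"
    "Example B,2021-03-02,https://example.com/b,Placeholder text,Real\n"
)
rows = read_posts(SAMPLE)
print(len(rows), rows[0]["Status"])
```

With the downloaded file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.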

Distribution

The dataset is typically provided in CSV format and contains approximately 22,020 records. Of these, 2,646 entries are dated between 5th December 2020 and 4th October 2021. By content type, 10% of records are Facebook posts, 5% are viral images, and 85% are other types. By credibility status, 28% are rated false, 16% half-true, and 56% fall under other statuses.
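The percentages above could be checked against a downloaded copy with a short tally. This is a sketch under stated assumptions: the helper name `status_shares` is hypothetical, the sample labels are synthetic, and the labels actually stored in the Status column may be spelled differently from those shown here.

```python
from collections import Counter

def status_shares(statuses):
    """Return each label's share of the total as a percentage, rounded to one decimal."""
    counts = Counter(s.strip() for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Synthetic labels, invented for illustration; real values may be spelled differently.
sample = ["Fake", "Fake", "Real", "Between"]
shares = status_shares(sample)

# Share of the 22,020 records that fall in the 2,646-entry period quoted above.
period_share = round(100 * 2646 / 22020, 1)
print(shares, period_share)
```

The last calculation uses only the two counts quoted in the listing and shows that the December 2020 to October 2021 period accounts for roughly 12% of all records.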

Usage

This dataset can be used for several applications, including:
  • Evaluating the prevalence and impact of misinformation online.
  • Tracking the dissemination and evolution of propaganda.
  • Investigating the accuracy and reliability of various types of online content.
  • Identifying emerging trends in online content and assessing information reliability on social media platforms and other digital forums.

Coverage

The dataset's geographic scope is global. Content dates range from 1st April 1995 to 26th February 2023, with 2008 to 2022 also cited as the more specific period of coverage. Content types include Facebook posts (10%), viral images (5%), and other content (85%). Credibility assessments are categorised as false (28%), half-true (16%), and other statuses (56%).
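To work with a specific slice of the time range, rows can be filtered on StartDate. The sketch below assumes dates are stored as `YYYY-MM-DD` strings, which the listing does not confirm; the `fmt` parameter can be changed if the file uses another format. The helper name `in_range` and the sample rows are hypothetical.

```python
from datetime import date, datetime

def in_range(rows, start, end, fmt="%Y-%m-%d"):
    """Keep rows whose StartDate parses with fmt and lies within [start, end]."""
    kept = []
    for row in rows:
        try:
            d = datetime.strptime(row["StartDate"], fmt).date()
        except (KeyError, ValueError):
            continue  # skip rows with a missing or unparseable date
        if start <= d <= end:
            kept.append(row)
    return kept

# Synthetic rows, invented for illustration only (not real dataset rows).
sample_rows = [
    {"Title": "A", "StartDate": "2020-12-04"},
    {"Title": "B", "StartDate": "2020-12-05"},
    {"Title": "C", "StartDate": "2021-10-04"},
    {"Title": "D", "StartDate": "2021-10-05"},
    {"Title": "E", "StartDate": "n/a"},
]
selected = in_range(sample_rows, date(2020, 12, 5), date(2021, 10, 4))
print([r["Title"] for r in selected])
```

Both endpoints are inclusive, which matches the way the 5th December 2020 to 4th October 2021 period is described in the listing.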

License

CC-BY-SA

Who Can Use It

This dataset is primarily intended for:
  • Researchers: For academic studies on misinformation, propaganda, and digital content analysis.
  • Analysts: For insights into online content trends, media literacy initiatives, and risk assessment related to information spread.
  • Journalists and Fact-Checkers: To aid in verifying online claims and understanding the landscape of false information.
  • Policy Makers: To inform strategies for combating online disinformation.

Dataset Name Suggestions

  • Verified Posts: Fact-Checking Online Content
  • Social Media Post Credibility Analysis
  • Online Misinformation Tracking Dataset
  • Digital Propaganda Assessment
  • Fact-Checked Web Content Repository

Attributes

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

27/06/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
