Opendatabay APP

Political Claim Verification Dataset

Government & Civic Records

Tags and Keywords

Text

Politics

NLP

Government


Free

About

This dataset is a collection of fact-checked claims scraped from Politifact.com. Each record pairs a claim made by an individual with the assessment written by a Politifact curator. Its primary purpose is to support Natural Language Processing (NLP) work on assessing the integrity of information and the validity of claims. It is a resource for research into misinformation, public discourse analysis, and the development of automated fact-checking systems.

Columns

  • sources: A string representing the individual associated with the quote or claim.
  • sources_dates: The date on which the information or quote was originally furnished by the source.
  • sources_post_location: The specific location or medium through which the source provided the information, such as a Facebook post.
  • sources_quote: The exact quote or statement made by the source under scrutiny.
  • curator_name: The name of the person from Politifact who curated, analysed, and assessed the source's quote.
  • curated_date: The date when the Politifact curator analysed and assessed the source's claim.
  • fact: The truthfulness rating that Politifact assigned to the source's quote.
  • sources_url: The URL linking to the Politifact curator's article that discusses the source's quote.
  • curators_article_title: The title of the article written by the curator, which either supports or rejects the source's claim.
  • curator_complete_article: The full blog post or article written by the curator providing detailed reasoning for supporting or rejecting the source's claim.
  • curator_tags: Keywords or tags assigned by the curator to their blog post.
  • index: An identifier for the entry.
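As a rough illustration of working with these columns, the sketch below reads rows with Python's standard csv module and checks that the header contains the column names listed above. The in-memory sample and all of its values are made up for the example, and the real file's header order and quoting may differ.

```python
import csv
import io

# Column names as listed above. The header order is an assumption.
EXPECTED_COLUMNS = [
    "sources", "sources_dates", "sources_post_location", "sources_quote",
    "curator_name", "curated_date", "fact", "sources_url",
    "curators_article_title", "curator_complete_article", "curator_tags",
    "index",
]


def read_claims(handle):
    """Read claim rows from an open CSV handle and verify the header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Made-up sample standing in for the real CSV file (hypothetical values).
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    'Example Speaker,2020-01-01,a Facebook post,"An example claim, with a comma.",'
    "Example Curator,2020-01-02,false,https://example.com/article,"
    'Example title,Example article body.,"tag1, tag2",0\n'
)

rows = read_claims(sample)
print(len(rows), rows[0]["sources_quote"])
```

To read an actual download, pass `open(path, newline="", encoding="utf-8")` in place of the sample; the path and the encoding are assumptions to verify against the file.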

Distribution

The dataset is typically provided as a CSV file. Row counts for individual files are updated separately, but the dataset contains approximately 19,400 unique records. Its columns cover source information, claim content, and curatorial analysis.
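One way to sanity-check the record count after loading is sketched below. The listing does not say what makes a record unique, so the key used here (source, date, and quote) is an assumption, and the rows are made up.

```python
def count_unique(rows, key_fields=("sources", "sources_dates", "sources_quote")):
    """Count distinct records, treating key_fields as the identity of a record.

    The choice of key_fields is an assumption, not something the listing defines.
    """
    seen = set()
    for row in rows:
        seen.add(tuple(row[field].strip() for field in key_fields))
    return len(seen)


# Made-up rows: the second is a duplicate of the first apart from whitespace.
example_rows = [
    {"sources": "Speaker A", "sources_dates": "2020-01-01", "sources_quote": "Claim one"},
    {"sources": "Speaker A", "sources_dates": "2020-01-01", "sources_quote": "Claim one "},
    {"sources": "Speaker B", "sources_dates": "2020-01-02", "sources_quote": "Claim two"},
]

unique_count = count_unique(example_rows)
print(unique_count)
```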

Usage

This dataset is ideally suited for researchers and developers working on:
  • Developing and testing NLP algorithms for fact-checking and truth detection.
  • Analysing patterns in misinformation and disinformation.
  • Studying the discourse around political claims and public statements.
  • Building models to predict the veracity of claims.
  • Training machine learning models for natural language understanding and text classification in the context of media integrity.
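For the veracity-prediction use case, a minimal baseline could look like the sketch below: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The training quotes and the "true"/"false" labels are invented for the example; in practice the text would come from sources_quote and the labels from fact, whose actual values should be inspected first.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data (hypothetical quotes and labels).
train_texts = [
    "taxes went up last year",
    "taxes went down last year",
    "the moon is made of cheese",
    "aliens built the bridge",
]
train_labels = ["true", "true", "false", "false"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("taxes last year"))
print(model.predict("aliens made the moon"))
```

A baseline like this is mainly useful as a reference point: any heavier model trained on the full articles or quotes should beat it on a held-out split.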

Coverage

The dataset covers claims and fact-checks globally. The collected information spans 2 May 2007 to 20 April 2021. The sources of the claims vary; for example, a notable share of claims is attributed to Donald Trump, and a notable share originated in Facebook posts.
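The coverage figures can be checked with a short script along these lines. It assumes the stated range applies to sources_dates and that dates are stored as YYYY-MM-DD; neither is confirmed by the listing, so the format string is a parameter. The rows are made up.

```python
from collections import Counter
from datetime import date, datetime

# Date range stated in the listing.
START, END = date(2007, 5, 2), date(2021, 4, 20)


def parse_date(text, fmt="%Y-%m-%d"):
    """Parse a date string; the default format is an assumption."""
    return datetime.strptime(text.strip(), fmt).date()


def in_coverage(row, fmt="%Y-%m-%d"):
    """True if the row's sources_dates value falls inside the stated range."""
    return START <= parse_date(row["sources_dates"], fmt) <= END


def share_by(rows, column):
    """Return each distinct value's share of rows, most common first."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}


# Made-up rows (hypothetical speakers and dates).
sample_rows = [
    {"sources": "Speaker A", "sources_dates": "2007-05-02"},
    {"sources": "Speaker A", "sources_dates": "2015-06-30"},
    {"sources": "Speaker B", "sources_dates": "2021-04-20"},
    {"sources": "Speaker B", "sources_dates": "2021-04-21"},
]

kept = [row for row in sample_rows if in_coverage(row)]
shares = share_by(kept, "sources")
print(len(kept), shares)
```

The same `share_by` helper applied to sources_post_location would show how much of the data comes from a given medium, such as Facebook posts.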

License

CC0

Who Can Use It

This dataset is particularly beneficial for:
  • Data Scientists and NLP Engineers: For training and evaluating models related to text classification, sentiment analysis, and claim verification.
  • Academics and Researchers: Studying political science, media studies, communication, and computational social science.
  • Journalists and Fact-Checkers: As a reference or for building tools to assist in verifying information.
  • Public Policy Analysts: To understand the spread of information and its impact.

Dataset Name Suggestions

  • Politifact Fact-Check Data
  • Political Claim Verification Dataset
  • Public Fact-Checking Corpus
  • Media Truthfulness Data
  • NLP Fact-Checking Dataset

Attributes

Original Data Source: Politifact Factcheck Data

Listing Stats

Views: 0
Downloads: 0
Listed: 26/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
