Political Claim Verification Dataset
Government & Civic Records
Free
About
This dataset provides a detailed collection of fact-checked claims scraped from Politifact.com. It includes claims made by various individuals and the corresponding assessments by Politifact curators. The primary purpose of this dataset is to facilitate the application of various Natural Language Processing (NLP) algorithms to analyse the integrity of information and determine the validity of claims. It serves as a valuable resource for research into misinformation, public discourse analysis, and the development of automated fact-checking systems.
Columns
- sources: A string representing the individual associated with the quote or claim.
- sources_dates: The date on which the information or quote was originally furnished by the source.
- sources_post_location: The specific location or medium through which the source provided the information, such as a Facebook post.
- sources_quote: The exact quote or statement made by the source under scrutiny.
- curator_name: The name of the person from Politifact who curated, analysed, and assessed the source's quote.
- curated_date: The date when the Politifact curator analysed and assessed the source's claim.
- fact: The fact score or rating assigned to the source's quote by Politifact.
- sources_url: The URL linking to the Politifact curator's article that discusses the source's quote.
- curators_article_title: The title of the article written by the curator, which either supports or rejects the source's claim.
- curator_complete_article: The full blog post or article written by the curator providing detailed reasoning for supporting or rejecting the source's claim.
- curator_tags: Keywords or tags assigned by the curator to their blog post.
- index: An identifier for the entry.
Distribution
The dataset is typically provided as a CSV file. Row counts for individual files are updated separately, but the dataset contains approximately 19,400 unique records. Its columns cover source information, claim content, and curatorial analysis, so it can be used directly in most data processing workflows.
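As a minimal sketch of reading the CSV, the snippet below uses Python's standard `csv` module and checks that the header contains the columns listed under Columns. The helper name `load_claims` and the two sample rows are hypothetical and only stand in for the real file; the date format shown is an assumption, not something the listing specifies.

```python
import csv
import io

# Column names as listed in the Columns section.
EXPECTED_COLUMNS = [
    "sources", "sources_dates", "sources_post_location", "sources_quote",
    "curator_name", "curated_date", "fact", "sources_url",
    "curators_article_title", "curator_complete_article", "curator_tags",
    "index",
]


def load_claims(fileobj):
    """Read claim records from an open CSV file and verify its header."""
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical sample rows (not taken from the dataset), written to an
# in-memory buffer so the example runs without the real file.
sample_rows = [
    ["Speaker A", "2020-01-01", "a Facebook post", "Quote one", "Curator X",
     "2020-01-02", "false", "https://example.org/a", "Title A", "Article A",
     "tag1", "0"],
    ["Speaker B", "2020-02-01", "a speech", "Quote two", "Curator Y",
     "2020-02-03", "true", "https://example.org/b", "Title B", "Article B",
     "tag2", "1"],
]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(EXPECTED_COLUMNS)
writer.writerows(sample_rows)
buffer.seek(0)

rows = load_claims(buffer)
print(len(rows), rows[0]["fact"])
```

With the real data, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. The encoding is an assumption and may need adjusting.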
Usage
This dataset is ideally suited for researchers and developers working on:
- Developing and testing NLP algorithms for fact-checking and truth detection.
- Analysing patterns in misinformation and disinformation.
- Studying the discourse around political claims and public statements.
- Building models to predict the veracity of claims.
- Training machine learning models for natural language understanding and text classification in the context of media integrity.
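Before training a veracity model on the `fact` column, it can help to know how well a trivial predictor does. The sketch below computes a majority-class baseline with the standard library only. The function name `majority_baseline` is hypothetical, and the rating strings are illustrative; the actual values stored in `fact` may be spelled differently.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Illustrative ratings only; the real `fact` values may differ.
labels = ["false", "false", "half-true", "true", "false", "mostly-true"]
label, accuracy = majority_baseline(labels)
print(label, accuracy)
```

Any classifier trained on the claim text should beat this accuracy on held-out data to be considered useful.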
Coverage
The dataset covers claims and fact-checks globally. The collected records span 2 May 2007 to 20 April 2021. Sources vary, but a notable share of claims are attributed to Donald Trump or originate from Facebook posts.
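One way to check how concentrated the claims are is to compute the share of records per value in `sources` or `sources_post_location`. The snippet below is a sketch under that assumption: the helper `top_shares` is hypothetical, and the rows are made-up placeholders rather than actual dataset entries.

```python
from collections import Counter


def top_shares(rows, column, n=3):
    """Return the n most frequent non-empty values in `column` and their shares."""
    values = [row[column].strip() for row in rows if row.get(column)]
    total = len(values)
    if total == 0:
        return []
    return [(value, count / total)
            for value, count in Counter(values).most_common(n)]


# Placeholder rows; real rows would come from the CSV file.
rows = [
    {"sources": "Speaker A", "sources_post_location": "a Facebook post"},
    {"sources": "Speaker A", "sources_post_location": "a speech"},
    {"sources": "Speaker B", "sources_post_location": "a Facebook post"},
    {"sources": "Speaker C", "sources_post_location": "a Facebook post"},
]
print(top_shares(rows, "sources", n=1))
print(top_shares(rows, "sources_post_location", n=1))
```

A heavily skewed distribution is worth accounting for when splitting data for training and evaluation, since a model may otherwise learn speaker identity rather than claim content.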
License
CC0
Who Can Use It
This dataset is particularly beneficial for:
- Data Scientists and NLP Engineers: For training and evaluating models related to text classification, sentiment analysis, and claim verification.
- Academics and Researchers: Studying political science, media studies, communication, and computational social science.
- Journalists and Fact-Checkers: As a reference or for building tools to assist in verifying information.
- Public Policy Analysts: To understand the spread of information and its impact.
Dataset Name Suggestions
- Politifact Fact-Check Data
- Political Claim Verification Dataset
- Public Fact-Checking Corpus
- Media Truthfulness Data
- NLP Fact-Checking Dataset
Attributes
Original Data Source: Politifact Factcheck Data