Opendatabay APP

Wikipedia-Based Climate Change Fact-Checking Dataset

News & Media Articles

Tags and Keywords

  • Climate
  • FEVER
  • Verification
  • NLP
  • Wikipedia

Free

About

Verifying the accuracy of environmental narratives has become increasingly critical as climate change discourse grows in global importance. This resource adopts the FEVER methodology to provide a structured framework for verifying real-world claims collected from the internet. It consists of 1,535 claims, each paired with five manually annotated evidence sentences retrieved from English Wikipedia. Each claim-evidence pair is labelled according to whether the evidence supports the claim, refutes it, or gives too little information to decide, so the collection captures complex and disputed cases in which conflicting evidence coexists. This makes it a useful resource for Natural Language Processing (NLP) work on fact-checking and information retrieval in the context of climate science.

Columns

  • claim_id: A unique numeric identifier assigned to each specific climate-related claim.
  • claim: The actual text of the real-world claim being investigated.
  • claim_label: The final verdict assigned to the claim (e.g., SUPPORTS, REFUTES, or NOT_ENOUGH_INFO) based on a majority vote of the evidence.
  • evidences: A collection containing the top five evidence sentences associated with the claim.
  • evidence_id: A unique identifier for each individual piece of evidence.
  • evidence_label: The micro-verdict assigned to a specific sentence regarding its relationship to the claim.
  • article: The title of the specific Wikipedia page from which the evidence was extracted.
  • evidence: The specific sentence used to validate or invalidate the claim.
  • entropy: A metric reflecting the level of uncertainty or disagreement among the votes for a label.
  • votes: An array documenting the individual votes cast during the manual annotation process.
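
The column descriptions above imply a nested layout: claim-level fields plus a collection of five evidence entries, each with its own label, article, entropy, and votes. The sketch below shows one way such a record could be expanded into one row per claim-evidence pair. The record shape, the `flatten_claim` helper, and every value in `example_claim` are assumptions made for illustration; they are not taken from climate-fever.csv, whose actual column layout may differ.

```python
from collections import Counter

# Hypothetical record shaped after the column descriptions above.
# All values are invented for illustration, not copied from the dataset.
evidence_labels = [
    "SUPPORTS", "SUPPORTS", "NOT_ENOUGH_INFO", "SUPPORTS", "NOT_ENOUGH_INFO",
]
example_claim = {
    "claim_id": 0,
    "claim": "Example claim text.",
    "claim_label": "SUPPORTS",
    "evidences": [
        {
            "evidence_id": f"Example article:{i}",
            "evidence_label": label,
            "article": "Example article",
            "evidence": f"Example evidence sentence {i}.",
            "entropy": 0.0,
            "votes": [label, label],
        }
        for i, label in enumerate(evidence_labels)
    ],
}


def flatten_claim(record):
    """Expand one claim record into one row per claim-evidence pair."""
    rows = []
    for ev in record["evidences"]:
        # Copy the claim-level fields onto every evidence row.
        row = {key: record[key] for key in ("claim_id", "claim", "claim_label")}
        row.update(ev)
        rows.append(row)
    return rows


pairs = flatten_claim(example_claim)
label_counts = Counter(row["evidence_label"] for row in pairs)
print(len(pairs), dict(label_counts))
```

Applied to every claim, the same expansion yields five rows per claim, which is how a claim-level table grows into a claim-evidence table.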

Distribution

The data is delivered as a CSV file named climate-fever.csv, approximately 2.31 MB in size. It contains 1,535 unique claims, which expand into 7,675 individual claim-evidence pairs (five evidence sentences per claim). The records demonstrate high integrity, with 100% validity across core fields such as the claim text and labels. This is a static archive with a usability score of 10.00, and no future updates are expected.

Usage

This collection is ideally suited for training and evaluating automated fact-checking systems and natural language inference models. Researchers can use it to develop algorithms capable of navigating "challenging" claims that involve multiple facets of climate science. It also provides a robust foundation for sentiment analysis and studies on the spread of information regarding global warming. By examining the entropy and voting patterns, data scientists can further investigate human uncertainty in the labelling of controversial scientific topics.
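
As a rough illustration of how voting patterns relate to uncertainty, the sketch below computes the Shannon entropy of a list of annotator votes. It assumes votes are label strings and that missing votes appear as `None`; the `vote_entropy` helper is hypothetical, and the dataset's own `entropy` column may use a different logarithm base or normalisation.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy, in bits, of the label distribution in a list of votes."""
    # Ignore missing votes (assumed here to be represented as None).
    counts = Counter(v for v in votes if v is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical vote lists, invented for illustration.
unanimous = ["SUPPORTS", "SUPPORTS", "SUPPORTS"]
split = ["SUPPORTS", "SUPPORTS", "REFUTES", "REFUTES"]

print(vote_entropy(unanimous))  # full agreement gives zero entropy
print(vote_entropy(split))      # an even two-way split gives one bit
```

Under this definition, entropy is zero when all annotators agree and grows as the votes spread across more labels, so sorting evidence by entropy surfaces the sentences annotators found hardest to judge.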

Coverage

The scope encompasses real-world claims gathered from across the internet, providing a broad view of contemporary climate discourse. The evidence is sourced exclusively from English Wikipedia, ensuring a standardised baseline for verification. While the claims are global in scope, they are drawn only from publicly available online content. The records represent a fixed point in time, around the dataset's publication in 2020, capturing the state of climate knowledge and internet claims up to that period.

License

CC0: Public Domain

Who Can Use It

NLP researchers can use these records to refine models for claim verification and evidence retrieval. Fact-checkers and journalists can use the annotated pairs to understand common climate myths and the evidence used to debunk them. Additionally, data science students can use the high-validity labels and voting data to practise classification and uncertainty modelling within a socially relevant context.

Dataset Name Suggestions

  • CLIMATE-FEVER: Real-World Claim Verification Archive
  • Wikipedia-Based Climate Change Fact-Checking Dataset
  • Annotated Climate Claims and Evidence Pairs for NLP
  • Global Warming Narrative Verification Registry
  • FEVER Methodology Climate Claim and Evidence Collection

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 22/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0