Opendatabay

Expert Scientific Claim Verification Dataset

Data Science and Analytics

Tags and Keywords

NLP

Science

Technology

Diseases

Expert Scientific Claim Verification Dataset, listed on the Opendatabay data marketplace

Free

About

This dataset supports research into the sentiment, fact-checking, and trustworthiness of scientific claims. It contains approximately 1,400 expert-written scientific claims, each linked to research-paper abstracts that can contain evidence for or against it. Human annotators labeled the evidence and marked the rationale sentences that justify each label, which makes the dataset useful for studying the nuances of science communication. It is designed to help assess the accuracy of scientific arguments and the language used to express them.

Columns

The dataset is split across several CSV files: corpus_train.csv holds the abstracts, and claims_train.csv, claims_validation.csv, and claims_test.csv hold the claims for each split. The columns are:

Corpus file (corpus_train.csv):
  • title: The title of the paper.
  • abstract: The paper's abstract, split into sentences.
  • structured: Whether the abstract is a structured abstract, i.e. one divided into labeled sections such as Background and Methods.

Claims files (claims_train.csv, claims_validation.csv, claims_test.csv):
  • id: A unique identifier for the claim.
  • claim: The scientific claim, written by an expert annotator.
  • evidence_doc_id: The identifier of a corpus document that contains evidence for the claim.
  • evidence_label: Whether that evidence supports (SUPPORT) or contradicts (CONTRADICT) the claim.
  • evidence_sentences: The positions of the rationale sentences within the evidence abstract, i.e. the sentences that justify the label.
  • cited_doc_ids: Identifiers of the corpus documents cited by the source from which the claim was written.
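A minimal sketch of joining a claims file to the corpus with Python's standard library. It assumes, without confirmation from this listing, that list-valued cells (`abstract`, `evidence_sentences`, `cited_doc_ids`) are stored as Python-style list literals such as `[0, 3]`, that `evidence_sentences` holds sentence positions within the abstract, and that the corpus identifier column is named `doc_id`. The helpers `parse_list_cell`, `load_corpus`, and `rationale_text` are hypothetical names, and the demo rows are made-up toy data.

```python
import ast
import csv
import io
from typing import Dict, Iterable, List


def parse_list_cell(cell: str) -> list:
    """Parse a list stored as a Python literal; empty cells become []."""
    # Assumption: list columns are serialized like "[0, 3]" or "['a', 'b']".
    cell = (cell or "").strip()
    return list(ast.literal_eval(cell)) if cell else []


def load_corpus(lines: Iterable[str], id_column: str = "doc_id") -> Dict[str, dict]:
    """Index corpus rows by document id (kept as strings, as csv returns them)."""
    corpus = {}
    for row in csv.DictReader(lines):
        row["abstract"] = parse_list_cell(row["abstract"])
        corpus[row[id_column]] = row
    return corpus


def rationale_text(claim_row: dict, corpus: Dict[str, dict]) -> List[str]:
    """Return the abstract sentences cited as evidence for one claim row."""
    doc_id = claim_row.get("evidence_doc_id", "")
    if not doc_id or doc_id not in corpus:
        return []  # e.g. claims without evidence annotations
    sentences = corpus[doc_id]["abstract"]
    positions = parse_list_cell(claim_row.get("evidence_sentences", ""))
    return [sentences[i] for i in positions]


if __name__ == "__main__":
    # Toy rows, not taken from the dataset.
    toy_corpus = io.StringIO(
        "doc_id,title,abstract,structured\n"
        "7,Toy paper,\"['Sentence zero.', 'Sentence one.']\",False\n"
    )
    toy_claims = io.StringIO(
        "id,claim,evidence_doc_id,evidence_label,evidence_sentences,cited_doc_ids\n"
        "1,A toy claim.,7,SUPPORT,[1],[7]\n"
    )
    corpus = load_corpus(toy_corpus)
    for row in csv.DictReader(toy_claims):
        print(row["id"], row["evidence_label"], rationale_text(row, corpus))
```

On the toy rows this prints `1 SUPPORT ['Sentence one.']`. When reading the real files, open them with `newline=""` as the `csv` module recommends, and check how the download actually serializes list columns before relying on `ast.literal_eval`.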

Distribution

The data is provided in CSV format and split across multiple files, such as claims_train.csv and corpus_train.csv. Together they contain approximately 1,400 expert-written scientific claims, each linked to relevant evidence and human-annotated labels and rationales. Row counts for the individual files are not stated.
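Because per-file row counts are not stated, they have to be computed after download. Depending on how the CSVs were flattened, a claim with several pieces of evidence may occupy more than one row, so row counts and claim counts can differ. The sketch below counts both per split; the file names come from the listing, while the directory name `scifact_csv` and the assumption that `id` identifies the same claim across rows are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Tuple

# File names as given in the listing; the directory below is an assumption.
SPLIT_FILES = ["claims_train.csv", "claims_validation.csv", "claims_test.csv"]


def count_rows_and_claims(rows: Iterable[dict]) -> Tuple[int, int]:
    """Return (number of rows, number of distinct claim ids)."""
    ids = Counter(row["id"] for row in rows)
    return sum(ids.values()), len(ids)


def summarize_splits(data_dir: str) -> Dict[str, Tuple[int, int]]:
    """Count rows and distinct claims in each claims file found in data_dir."""
    summary = {}
    for name in SPLIT_FILES:
        path = Path(data_dir) / name
        if not path.exists():
            continue  # skip splits that are not present locally
        with path.open(newline="", encoding="utf-8") as f:
            summary[name] = count_rows_and_claims(csv.DictReader(f))
    return summary


if __name__ == "__main__":
    # Hypothetical location of the unzipped download.
    for name, (n_rows, n_claims) in summarize_splits("scifact_csv").items():
        print(f"{name}: {n_rows} rows, {n_claims} distinct claims")
```

Summing the distinct-claim counts across the three files gives a figure to compare against the approximately 1,400 claims described above.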

Usage

This dataset is ideal for various applications and research endeavours:
  • Fact-checking: Developing and testing algorithms to verify the accuracy and truthfulness of scientific claims against provided evidence.
  • Sentiment Analysis: Understanding the underlying sentiment of scientific claims and measuring the trustworthiness or accuracy of supporting evidence.
  • Natural Language Processing (NLP): Training models to find the sentences in an abstract that support or contradict a claim and to generate the corresponding labels and rationales automatically.
  • Science Communication Research: Exploring how scientists express ideas through precise language and persuasive arguments.
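As a starting point for the fact-checking use case, a deliberately naive baseline ranks the sentences of an abstract by word overlap (Jaccard similarity) with the claim. This is a hypothetical illustration, not the method used by the dataset's creators; `rank_sentences`, the regex tokenizer, and the toy claim and abstract are our own.

```python
import re
from typing import List, Set, Tuple

# Lowercase alphanumeric runs; a crude stand-in for a real tokenizer.
TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> Set[str]:
    """Return the set of lowercase word tokens in text."""
    return set(TOKEN.findall(text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(claim: str, abstract: List[str], k: int = 2) -> List[Tuple[int, float]]:
    """Return up to k (sentence position, score) pairs, highest overlap first."""
    claim_tokens = tokens(claim)
    scored = [(i, jaccard(claim_tokens, tokens(s))) for i, s in enumerate(abstract)]
    # Descending score; ties are broken by sentence position.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    # Toy, made-up example; not taken from the dataset.
    claim = "Drug X lowers blood pressure."
    abstract = [
        "The study enrolled adult mice.",
        "Drug X did not lower blood pressure in mice.",
        "No adverse events were observed.",
    ]
    print(rank_sentences(claim, abstract, k=1))
```

The returned positions can be compared with the gold `evidence_sentences`. On the toy input the top sentence is the one about drug X even though it contradicts the claim: overlap finds topically related sentences but cannot tell support from contradiction, so a separate label classifier is still needed.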

Coverage

Geographic coverage is listed as global. The available information specifies no time range or demographic scope.

License

CC0

Who Can Use It

This dataset is primarily intended for researchers, data scientists, and developers working in areas such as:
  • Academic Research: For studies on scientific communication, fact-checking methodologies, and the trustworthiness of information.
  • Natural Language Processing (NLP) Development: To train and evaluate models for text classification, claim verification, and information extraction.
  • AI and Machine Learning Engineers: To build predictive models for automated annotation and sentiment analysis of scientific text.
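For evaluating claim-verification or information-extraction models, one simple measure is precision, recall, and F1 of predicted rationale sentences against the gold `evidence_sentences`. This is a simplified, hypothetical metric and not the official SciFact scorer; keying predictions by (claim id, evidence doc id) is our own assumption.

```python
from typing import Dict, Set, Tuple

# Hypothetical key: (claim id, evidence doc id), both as strings from the CSV.
Key = Tuple[str, str]


def rationale_prf(
    gold: Dict[Key, Set[int]], pred: Dict[Key, Set[int]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (claim, doc, sentence) triples."""
    tp = fp = fn = 0
    for key in gold.keys() | pred.keys():
        g, p = gold.get(key, set()), pred.get(key, set())
        tp += len(g & p)  # predicted sentences that are gold rationales
        fp += len(p - g)  # predicted sentences that are not
        fn += len(g - p)  # gold rationales that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("1", "7"): {1, 2}}  # toy gold rationales
    pred = {("1", "7"): {1, 3}}  # toy predictions
    print(rationale_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Sentences predicted for a document that has no gold rationales count as false positives, so a model is penalized for citing the wrong abstract as well as the wrong sentences.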

Dataset Name Suggestions

  • SciFact Scientific Claims
  • Expert Scientific Claim Verification Dataset
  • Research Claims Fact-Checking
  • Annotated Scientific Statements

Attributes

Original Data Source: SciFact (Scientific Claims)

Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 26/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
