Opendatabay APP

Textual Entailment Explanations Dataset

Education & Learning Analytics

Tags and Keywords

Computer

Education

NLP

Text

Linguistics

Data


Free

About

This dataset is intended for researchers and practitioners in natural language processing. It extends the Stanford Natural Language Inference (SNLI) dataset with annotated explanations for entailment relations. Each example pairs a premise with a hypothesis and classifies the pair as entailment, contradiction, or neutral. Each relation is also accompanied by three annotated explanations that clarify why the label holds between the premise and the hypothesis. These annotations support finer-grained analysis of linguistic entailment relations across various domains and contexts.

Columns

  • premise: The sentence or text that serves as the context or background information for the entailment relation. (Text)
  • hypothesis: The sentence or text being evaluated or inferred based on the premise. (Text)
  • label: The classification of the entailment relation, indicating whether the relationship is an entailment, contradiction, or neutral. (Categorical)
  • explanation_1: An annotated explanation offering additional insights and understanding for the entailment relation. (Text)
  • explanation_2: An additional annotated explanation offering further insights and understanding for the entailment relation. (Text)
  • explanation_3: Another annotated explanation offering additional insights and understanding for the entailment relation. (Text)
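As a minimal sketch of how the six columns above might be read into a typed record, the following uses only the Python standard library. The label strings ("entailment", "contradiction", "neutral") are an assumption, since the listing does not say how labels are encoded in the files, and the sample row is made up for illustration rather than taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Assumption: labels are stored as these strings. The listing does not state
# the encoding, so adjust this set if the files use integers or other names.
VALID_LABELS = {"entailment", "contradiction", "neutral"}


@dataclass
class EntailmentExample:
    premise: str
    hypothesis: str
    label: str
    explanations: list


def parse_row(row):
    """Convert one CSV row (a dict keyed by column name) into a record."""
    label = row["label"].strip().lower()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {row['label']!r}")
    # Keep only the explanation columns that are actually filled in.
    explanations = []
    for i in (1, 2, 3):
        text = (row.get(f"explanation_{i}") or "").strip()
        if text:
            explanations.append(text)
    return EntailmentExample(
        premise=row["premise"].strip(),
        hypothesis=row["hypothesis"].strip(),
        label=label,
        explanations=explanations,
    )


# Made-up sample row for illustration only; not taken from the dataset.
SAMPLE_CSV = (
    "premise,hypothesis,label,explanation_1,explanation_2,explanation_3\n"
    '"A dog runs in a park.","An animal is outside.",entailment,'
    '"A dog is an animal.","A park is outside.",'
    '"Running in a park implies being outdoors."\n'
)

examples = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Rejecting unknown labels early makes an encoding mismatch visible at load time instead of surfacing later as a silent modelling error.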

Distribution

The dataset is provided in CSV format as three files: train.csv, validation.csv, and test.csv. Each file contains premises, hypotheses, labels, and three annotated explanations per example. The listing reports approximately 9,824 records, split across the three labels as roughly 3,368, 3,219, and 3,237; it does not state which label corresponds to which count. Row counts for the individual files are not given.
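Because per-file row counts are not given, one way to recover them is to count rows and labels directly from the three CSV files. The sketch below assumes the files sit in a single directory, are UTF-8 encoded, and carry a header row with a `label` column; the function names are hypothetical helpers, not part of any published tooling for this dataset.

```python
import csv
from collections import Counter
from pathlib import Path

# File names as given in the listing.
SPLIT_FILES = ["train.csv", "validation.csv", "test.csv"]


def count_labels(csv_path):
    """Return a Counter mapping each label value to its number of rows."""
    # Assumption: UTF-8 encoding and a header row that includes "label".
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return Counter(row["label"] for row in csv.DictReader(handle))


def summarise_splits(data_dir):
    """Count labels for every split file that exists in data_dir."""
    summary = {}
    for name in SPLIT_FILES:
        path = Path(data_dir) / name
        if path.exists():
            summary[name] = count_labels(path)
    return summary
```

Calling `summarise_splits("path/to/data")` returns one `Counter` per file found; summing a counter's values gives that file's row count, which can then be compared against the totals reported in the listing.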

Usage

This dataset supports a range of natural language processing tasks. It can be utilised for:
  • Natural Language Inference (NLI): Developing models to determine if a hypothesis is entailed by a premise.
  • Natural Language Understanding (NLU): Training and evaluating models for tasks such as textual entailment, contradiction detection, and neutral classification. The annotated explanations allow models to learn the reasoning behind entailment decisions.
  • Model Explainability: Investigating and analysing the provided explanations to understand which linguistic patterns or features are important in determining premise-hypothesis relationships.
  • Data Augmentation: Incorporating the additional annotated explanations into existing datasets to provide more diverse examples, potentially improving model performance on related tasks like question answering or machine translation.
  • Exploratory Analysis: Exploring premises and hypotheses to understand their relationships, examining label distributions for potential biases, and general model training and evaluation.
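To illustrate how the annotated explanations could feed a model that learns the reasoning behind a label, the sketch below turns one row into text-to-text training pairs, one per available explanation. The prompt layout ("premise: ... hypothesis: ..." as input, "label explanation: ..." as target) is a hypothetical format chosen for this example, not one prescribed by the dataset, and the sample row is made up.

```python
def to_text_pairs(row):
    """Build (source, target) strings from one row, one pair per explanation.

    The string templates are an illustrative assumption; any consistent
    format would serve the same purpose.
    """
    source = f"premise: {row['premise']} hypothesis: {row['hypothesis']}"
    pairs = []
    for i in (1, 2, 3):
        explanation = (row.get(f"explanation_{i}") or "").strip()
        if explanation:  # skip empty or missing explanation columns
            target = f"{row['label']} explanation: {explanation}"
            pairs.append((source, target))
    return pairs


# Made-up row for illustration only; not taken from the dataset.
sample_row = {
    "premise": "A man plays a guitar on stage.",
    "hypothesis": "A man is performing music.",
    "label": "entailment",
    "explanation_1": "Playing a guitar on stage is performing music.",
    "explanation_2": "A guitar is a musical instrument.",
    "explanation_3": "",
}

pairs = to_text_pairs(sample_row)
```

Emitting one pair per explanation treats each annotation as a separate training target for the same input, which is one simple option; sampling a single explanation per example is an equally reasonable alternative.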

Coverage

The dataset's coverage is global, drawing premises from real-world textual data sources. It addresses entailment relations emerging from various domains and contexts. Specific time ranges or demographic scopes are not detailed.

License

CC0

Who Can Use It

This dataset is intended for:
  • Researchers and practitioners in the field of natural language processing.
  • Individuals and teams working on artificial intelligence and machine learning applications.
  • Those involved in education and learning analytics.
  • Developers and scientists focusing on natural language understanding, textual entailment, contradiction detection, model explainability, and data augmentation.

Dataset Name Suggestions

  • Extended Stanford Natural Language Inference Dataset
  • e-SNLI with Explanations
  • Natural Language Inference Explanations
  • Textual Entailment Explanations Dataset

Attributes

Listing Stats

VIEWS

4

DOWNLOADS

0

LISTED

17/06/2025

REGION

GLOBAL

UDQS QUALITY

5 / 5

VERSION

1.0
