Natural Language Inference Bias Dataset

Data Science and Analytics

Tags and Keywords

Earth

Nature

Computer

Science

NLP

Languages

Free

About

This dataset, known as HANS (Heuristic Analysis for NLI Systems), is an evaluation set for Natural Language Inference (NLI) models. It tests specific hypotheses about invalid heuristics that NLI models are prone to learning, that is, shortcuts that work on many training examples but do not hold in general. Exposing these heuristic biases helps make NLI models more robust.

Columns

  • premise: The initial statement or context for the example. (string)
  • hypothesis: The statement being evaluated in relation to the premise. (string)
  • label: The relationship that holds between the premise and hypothesis, i.e. the target label. (string)
  • parse_premise: The full parse tree of the premise. (string)
  • parse_hypothesis: The full parse tree of the hypothesis. (string)
  • binary_parse_premise: The binary parse tree of the premise. (string)
  • binary_parse_hypothesis: The binary parse tree of the hypothesis. (string)
  • heuristic: The specific invalid heuristic that the example is designed to test. (string)
  • subcase: A more granular classification within the identified heuristic. (string)
  • template: The underlying template used to generate the example. (string)
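
As a quick sanity check before using a file, the column list above can be compared against a CSV header. The following is a minimal sketch using only the Python standard library; the function name `missing_columns` and the in-memory sample are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# Column names as listed in the Columns section above.
EXPECTED_COLUMNS = [
    "premise", "hypothesis", "label",
    "parse_premise", "parse_hypothesis",
    "binary_parse_premise", "binary_parse_hypothesis",
    "heuristic", "subcase", "template",
]

def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in EXPECTED_COLUMNS if name not in header]

# Tiny in-memory stand-in for a real file (header only, no real data).
sample = io.StringIO(",".join(EXPECTED_COLUMNS) + "\n")
print(missing_columns(sample))  # → []
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`.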

Distribution

The dataset is provided in CSV format, in files such as validation.csv and train.csv. Row counts are not stated in the available information.
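
Since row counts are not documented, they can be computed directly from the files. The sketch below counts records and tallies the values of the label column; the function name `summarize`, the sample sentences, and the label strings shown are invented for illustration and may not match the values in the actual files.

```python
import csv
import io
from collections import Counter

def summarize(csv_file):
    """Return (row_count, Counter of label values) for an open CSV file."""
    labels = Counter()
    rows = 0
    for row in csv.DictReader(csv_file):
        rows += 1
        labels[row["label"]] += 1
    return rows, labels

# Invented rows standing in for a real file such as validation.csv.
sample = io.StringIO(
    "premise,hypothesis,label\n"
    "The doctor saw the actor.,The actor saw the doctor.,non-entailment\n"
    "The doctor saw the actor.,The doctor saw the actor.,entailment\n"
    "The judge near the actor ran.,The actor ran.,non-entailment\n"
)
rows, labels = summarize(sample)
print(rows)                      # → 3
print(labels["non-entailment"])  # → 2
```

To run it on a real file, pass a handle opened with `open("validation.csv", newline="", encoding="utf-8")` instead of the in-memory sample.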

Usage

This dataset is best suited to evaluating Natural Language Inference models. Because each example is tagged with the heuristic it targets, results can be broken down by heuristic to show which invalid shortcuts a model has learned. Researchers and developers can use these findings to improve model generalisation and robustness.
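
One way to use the heuristic column during evaluation is to report accuracy per heuristic rather than a single overall score. This is a minimal sketch, not a prescribed evaluation protocol: the function name `accuracy_by_heuristic`, the example rows, and the label and heuristic strings are assumptions for illustration, and `predictions` stands in for the output of whatever model is being tested.

```python
from collections import defaultdict

def accuracy_by_heuristic(rows, predictions):
    """Map each heuristic name to the fraction of its examples predicted correctly."""
    if len(rows) != len(predictions):
        raise ValueError("rows and predictions must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row, predicted in zip(rows, predictions):
        totals[row["heuristic"]] += 1
        if predicted == row["label"]:
            correct[row["heuristic"]] += 1
    return {name: correct[name] / totals[name] for name in totals}

# Illustrative rows; only the two columns needed here are shown.
rows = [
    {"heuristic": "lexical_overlap", "label": "non-entailment"},
    {"heuristic": "lexical_overlap", "label": "entailment"},
    {"heuristic": "subsequence", "label": "non-entailment"},
]
# Hypothetical model outputs, one per row.
predictions = ["entailment", "entailment", "non-entailment"]

print(accuracy_by_heuristic(rows, predictions))
# → {'lexical_overlap': 0.5, 'subsequence': 1.0}
```

A low score on one heuristic alongside a high overall score suggests the model relies on that shortcut. The same grouping can be applied to the subcase column for a finer breakdown.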

Coverage

The dataset has global coverage. No specific time range or demographic scope is indicated.

License

CC0

Who Can Use It

This dataset is particularly useful for machine learning engineers, data scientists, and researchers working in Natural Language Processing (NLP) and Artificial Intelligence. It helps those who build, evaluate, or improve NLI models, especially when diagnosing and mitigating model biases.

Dataset Name Suggestions

  • NLI Heuristics Benchmark
  • Natural Language Inference Bias Dataset
  • HANS NLI Evaluation Set
  • Heuristic Analysis for NLI Models

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 21/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
