Opendatabay

Adversarial ANLI Benchmark Dataset

Data Science and Analytics

Tags and Keywords

Text

Nlp

Earth



Free

About

The ANLI (Adversarial Natural Language Inference) dataset is a large-scale benchmark for Natural Language Inference (NLI). It differs from earlier datasets such as SNLI and MNLI in how it was collected: through an iterative, adversarial human-and-model-in-the-loop procedure in which annotators wrote examples intended to fool a model. This collection method makes ANLI considerably more challenging, so it serves as a robust test of natural language understanding and as a benchmark for tracking progress in NLI systems. The data are organised into three rounds, each with its own training, development and test splits, and all splits share the same fields.
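The three-round, three-split layout gives nine splits in total. The sketch below enumerates them. It assumes, hypothetically, that every split follows the `{split}_r{round}.csv` naming seen in dev_r1.csv; only dev_r1.csv is named in this listing, so check the actual download before relying on the other names.

```python
from itertools import product

ROUNDS = (1, 2, 3)
SPLITS = ("train", "dev", "test")


def split_filenames():
    """Return hypothetical CSV file names for every (round, split) pair.

    Assumes the {split}_r{round}.csv pattern of dev_r1.csv applies to all
    nine splits; this is an extrapolation, not a documented convention.
    """
    return [f"{split}_r{rnd}.csv" for rnd, split in product(ROUNDS, SPLITS)]


print(split_filenames())
```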

Columns

The dataset, as exemplified by the dev_r1.csv file, includes the following columns:
  • premise: The context statement against which the hypothesis is judged. (String)
  • hypothesis: A statement whose relationship to the premise is to be classified. (String)
  • label: The relationship between the premise and the hypothesis, stored as a numeric class code (0, 1 or 2). (Integer)
  • reason: An annotator's justification for the assigned label. (String)
  • uid: A unique identifier for each example.
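A minimal sketch of loading one split with Python's standard `csv` module and checking that the columns listed above are present. It assumes the label is stored as a number (as the label counts in the distribution notes suggest); the helper name `load_split` and the one-row sample are made up for illustration.

```python
import csv
import io

EXPECTED_COLUMNS = {"premise", "hypothesis", "label", "reason", "uid"}


def load_split(fileobj):
    """Read one ANLI split from a CSV file object into a list of dicts.

    Hypothetical helper: raises ValueError if any expected column is missing.
    """
    reader = csv.DictReader(fileobj)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        # Assumption: labels are numeric codes such as "0" or "0.0".
        row["label"] = int(float(row["label"]))
        rows.append(row)
    return rows


# Made-up one-row sample in the listed column order.
sample = io.StringIO(
    "premise,hypothesis,label,reason,uid\n"
    "The sky is blue.,The sky has a colour.,0,Blue is a colour.,ex-1\n"
)
rows = load_split(sample)
print(rows[0]["label"], rows[0]["hypothesis"])
```

With the real file, the same helper would be called as `with open("dev_r1.csv", newline="", encoding="utf-8") as f: rows = load_split(f)`.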

Distribution

This dataset is typically provided in CSV format. It is organised into three rounds, each divided into training, development and test sets, and all splits share the same fields. For example, dev_r1.csv holds the development set for round 1. In that file, the uid column has 1000 unique values, indicating 1000 records. The premise column has 845 unique values (so some premises are shared by several hypotheses), and the hypothesis and reason columns each have 1000 unique values. The label column takes three values: 334 records have label 0, 333 have label 1 and 333 have label 2.
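The per-column figures above can be reproduced with a short summary function. This is a sketch on made-up rows; the mapping of 0, 1 and 2 to entailment, neutral and contradiction is an assumption taken from the Hugging Face `anli` release and is not stated in this listing, so verify it against the data.

```python
from collections import Counter

# Assumed label mapping (Hugging Face `anli` convention); verify before use.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}
COLUMNS = ("uid", "premise", "hypothesis", "label", "reason")


def summarize(rows):
    """Return (distinct values per column, example count per label name)."""
    distinct = {col: len({row[col] for row in rows}) for col in COLUMNS}
    labels = Counter(LABEL_NAMES.get(row["label"], row["label"]) for row in rows)
    return distinct, labels


# Made-up toy rows: two hypotheses share the premise "p1".
toy = [
    {"uid": "a", "premise": "p1", "hypothesis": "h1", "label": 0, "reason": "r1"},
    {"uid": "b", "premise": "p1", "hypothesis": "h2", "label": 2, "reason": "r2"},
    {"uid": "c", "premise": "p2", "hypothesis": "h3", "label": 1, "reason": "r3"},
]
distinct, labels = summarize(toy)
print(distinct)
print(labels)
```

Run on dev_r1.csv, the listing's figures suggest 1000 distinct uids, 845 distinct premises and 334, 333 and 333 examples for labels 0, 1 and 2.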

Usage

The ANLI dataset supports several natural language processing applications. It can be used to:
  • Train models for natural language understanding, specifically the NLI task of classifying the relationship between a premise and a hypothesis.
  • Develop models that are more robust to adversarial examples, i.e. inputs deliberately written to mislead a model.
  • Measure and improve the accuracy of Natural Language Inference (NLI) systems.
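Benchmarking an NLI system on ANLI comes down to comparing predicted labels with gold labels on each split. The sketch below computes accuracy per round for any `predict(premise, hypothesis)` function; the `always_zero` baseline and the toy rows are hypothetical placeholders for a real model and real data.

```python
from typing import Callable, Dict, List

Example = Dict[str, object]


def accuracy(rows: List[Example], predict: Callable[[str, str], int]) -> float:
    """Fraction of rows where predict(premise, hypothesis) equals the gold label."""
    if not rows:
        raise ValueError("empty split")
    correct = sum(predict(r["premise"], r["hypothesis"]) == r["label"] for r in rows)
    return correct / len(rows)


def always_zero(premise: str, hypothesis: str) -> int:
    """Hypothetical trivial baseline that always predicts label 0."""
    return 0


# Made-up toy rows standing in for the development splits of two rounds.
rounds = {
    "r1": [
        {"premise": "p", "hypothesis": "h1", "label": 0},
        {"premise": "p", "hypothesis": "h2", "label": 1},
    ],
    "r2": [{"premise": "q", "hypothesis": "h3", "label": 2}],
}
for name, rows in rounds.items():
    print(name, accuracy(rows, always_zero))
```

Reporting each round separately, rather than pooling all splits, keeps results aligned with the dataset's three-round structure.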

Coverage

The dataset's region of coverage is Global. Specific time ranges or demographic scopes are not detailed in the available information.

License

CC0

Original Data Source: ANLI (Adversarial NLI Benchmark)

Who Can Use It

This dataset is particularly valuable for:
  • Data scientists and machine learning engineers working on Natural Language Processing (NLP) tasks.
  • Researchers in the field of Artificial Intelligence aiming to advance natural language understanding and NLI models.
  • Developers seeking to build more resilient and accurate AI systems capable of handling adversarial inputs.
  • Anyone interested in benchmarking the progress and capabilities of current NLI models.

Dataset Name Suggestions

  • Adversarial NLI Benchmark Dataset
  • ANLI Natural Language Inference
  • Challenging NLI Dataset
  • Human-in-the-Loop NLI

Attributes

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

17/06/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
