Natural Language Inference Evaluation Dataset
Data Science and Analytics
About
The HellaSwag dataset is a valuable resource for assessing a machine's sentence completion abilities based on commonsense natural language inference (NLI). It was introduced in a paper published at ACL 2019. The dataset enables researchers and machine learning practitioners to train, validate, and evaluate models designed to understand and predict plausible sentence completions using common sense knowledge. It is useful both for understanding the limitations of current NLI systems and for developing algorithms that reason with common sense.
Columns
The dataset includes several key columns (a short loading sketch follows the list):
- ind: The index of the data point. (Integer)
- activity_label: The label indicating the activity or event described in the sentence. (String)
- ctx_a: The first context sentence, providing background information. (String)
- ctx_b: The second part of the context, typically the start of the sentence that the endings complete. (String)
- endings: A list of possible sentence completions for the given context. (List of Strings)
- split: The dataset split, such as 'train', 'dev', or 'test'. (String)
- split_type: The type of split used for dividing the dataset, such as 'indomain' or 'zeroshot'. (String)
- source_id: An identifier for the data point's source. (String)
- label: The label identifying the correct ending for the data point. (Integer)
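For orientation, here is a minimal sketch of loading one split and inspecting these columns with pandas. The file name train.csv mirrors the Distribution section below, and the assumption that endings is serialised as a list-style string in the CSV is not confirmed by this listing.
```python
# Minimal loading/inspection sketch. Assumptions (not confirmed by the listing):
# the file is named train.csv and the `endings` column holds a list serialised
# as a Python/JSON-style string.
import ast

import pandas as pd

df = pd.read_csv("train.csv")

# Check that the documented columns are present.
expected = [
    "ind", "activity_label", "ctx_a", "ctx_b",
    "endings", "split", "split_type", "source_id", "label",
]
print("missing columns:", [c for c in expected if c not in df.columns])

# Look at one example: context plus its candidate endings.
row = df.iloc[0]
endings = ast.literal_eval(row["endings"])  # assumed serialisation
print(row["activity_label"])
print(row["ctx_a"], row["ctx_b"])
for i, ending in enumerate(endings):
    print(f"  ({i}) {ending}")
```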
Distribution
The dataset is typically provided in CSV format and consists of three primary files: train.csv, validation.csv, and test.csv. The train.csv file facilitates the learning process for machine learning models, validation.csv is used to validate model performance, and test.csv enables thorough evaluation of models in completing sentences with common sense. While exact total row counts for the entire dataset are not specified in the provided information, insights into unique values are available for fields such as activity_label (9965 unique values), source_id (8173 unique values), and split_type ('indomain' and 'zeroshot' each accounting for 50%); the sketch below shows how these figures might be recomputed.
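A minimal sketch for recomputing those summary figures, under the same assumption about CSV file names; the comments simply restate the values quoted in this listing.
```python
# Sketch for recomputing the summary statistics quoted above.
# Assumes the three CSV files described in this section are present locally.
import pandas as pd

frames = [pd.read_csv(name) for name in ("train.csv", "validation.csv", "test.csv")]
df = pd.concat(frames, ignore_index=True)

print("total rows:", len(df))
print("unique activity_label:", df["activity_label"].nunique())  # listing: 9965
print("unique source_id:", df["source_id"].nunique())            # listing: 8173
print(df["split_type"].value_counts(normalize=True))             # ~50% indomain / zeroshot
```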
Usage
This dataset is ideal for a variety of applications and use cases:
- Language Modelling: Training language models to better understand common sense knowledge and to improve performance on sentence completion tasks.
- Common Sense Reasoning: Developing and studying algorithms that can reason and make inferences based on common sense.
- Machine Performance Evaluation: Assessing the effectiveness of machine learning models in generating appropriate sentence endings given specific contexts and activity labels (a minimal scoring sketch follows this list).
- Natural Language Inference (NLI): Benchmarking and improving NLI systems by evaluating their ability to predict plausible sentence completions.
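One common recipe for the evaluation use cases above is to score each candidate ending by its log-likelihood under a causal language model and pick the highest-scoring one. The sketch below illustrates that recipe only; the gpt2 checkpoint, the whitespace joining of ctx_a, ctx_b, and each ending, and the lack of length normalisation are illustrative choices rather than anything prescribed by the dataset.
```python
# Hedged sketch: multiple-choice scoring with a causal language model.
# The model choice and scoring details are illustrative, not prescribed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()


@torch.no_grad()
def ending_logprob(context: str, ending: str) -> float:
    """Sum of token log-probabilities of `ending` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + " " + ending, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # positions predicting tokens 1..T-1
    targets = full_ids[0, 1:]
    # Approximate boundary: assumes the context tokenisation is a prefix of
    # the full tokenisation, which typically holds for BPE tokenisers here.
    start = ctx_ids.shape[1] - 1
    return log_probs[start:].gather(1, targets[start:, None]).sum().item()


def predict(ctx_a: str, ctx_b: str, endings: list[str]) -> int:
    """Index of the ending the model considers most likely."""
    context = f"{ctx_a} {ctx_b}".strip()
    scores = [ending_logprob(context, e) for e in endings]
    return max(range(len(endings)), key=scores.__getitem__)
```
Where the label column is populated, accuracy can then be estimated by comparing the predicted index against it, assuming (as is usual for this benchmark) that label encodes the index of the correct ending.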
Coverage
The dataset has a global region scope. It was listed on 17/06/2025. Neither specific time ranges for the data collection itself nor detailed demographic scopes are provided. The dataset includes various splits (train, dev, test) and split types (indomain, zeroshot) to ensure diversity for generalisation testing and fairness evaluation during model development.
License
CC0
Who Can Use It
The HellaSwag dataset is intended for researchers and machine learning practitioners. They can utilise it to:
- Train, validate, and evaluate machine learning models for tasks requiring common sense knowledge.
- Develop and refine algorithms for common sense reasoning.
- Benchmark and assess the performance and limitations of current natural language inference systems.
Dataset Name Suggestions
- HellaSwag: Commonsense NLI
- Commonsense Sentence Completion Data
- Natural Language Inference Evaluation Dataset
- AI Common Sense Benchmark
Attributes
Original Data Source: HellaSwag: Commonsense NLI