FREE DATASET LIBRARY

Verified Data Provider

£0

Contextual Language Comprehension Dataset

Data Science and Analytics

Tags and Keywords

Text

Nlp

Free

About

This dataset, known as HellaSwag (Commonsense NLI), evaluates a machine's ability to complete sentences in a logically coherent way. It provides over 10,000 sentence completion examples, each consisting of an initial sentence segment followed by four candidate endings. The task for an AI system is to select the ending that best completes the sentence. Doing so requires more than word recognition: the system must grasp meaning and contextual nuance. Humans typically find the task straightforward because of their grasp of language and common sense, whereas machines find it difficult. HellaSwag therefore serves as a benchmark for assessing current machine capabilities in language comprehension and generation, and for highlighting areas that need further work.

Columns

The dataset typically includes the following columns:

ind: An integer representing the index of the sentence.
activity_label: A string indicating the label of the activity.
ctx_a: A string containing the first context sentence.
ctx_b: A string containing the second context sentence.
endings: A string holding the four candidate endings for the sentence.
split: A string denoting the division of the dataset (e.g., training or test set).
split_type: A string specifying the type of split, such as 'indomain' or 'zeroshot'.
label: The label indicating which of the possible endings is the correct one for the sentence completion.
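As a rough illustration of how these columns fit together, the sketch below parses a small in-memory CSV into typed records. The two sample rows are invented for the example. It also assumes that `endings` is serialized as a Python-style list literal and that `label` is a zero-based integer index into that list. Neither detail is confirmed by the listing, so check the actual file before relying on this. The helper names `parse_row` and `load_examples` are hypothetical.

```python
import ast
import csv
import io

# Hypothetical two-row sample. The real file's contents and the exact
# serialization of `endings` are assumptions, not taken from the listing.
SAMPLE_CSV = '''ind,activity_label,ctx_a,ctx_b,endings,split,split_type,label
0,Baking cookies,A woman mixes dough in a bowl.,She,"['rolls it into balls.', 'paints the wall.', 'dives into a pool.', 'starts the car.']",train,indomain,0
1,Washing a car,A man sprays a car with a hose.,He,"['reads a book.', 'scrubs it with a sponge.', 'plays the violin.', 'closes the window.']",train,zeroshot,1
'''


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed fields."""
    # Assumption: `endings` is a list literal such as "['a', 'b', 'c', 'd']".
    endings = ast.literal_eval(row["endings"])
    return {
        "ind": int(row["ind"]),
        "activity_label": row["activity_label"],
        # Join the two context columns into a single prompt string.
        "context": f'{row["ctx_a"]} {row["ctx_b"]}'.strip(),
        "endings": endings,
        "split": row["split"],
        "split_type": row["split_type"],
        # Assumption: `label` is a zero-based index into `endings`.
        "label": int(row["label"]),
    }


def load_examples(text):
    """Parse CSV text into a list of typed example dicts."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


examples = load_examples(SAMPLE_CSV)
first = examples[0]
print(first["context"], "->", first["endings"][first["label"]])
```

To read the downloaded file instead, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv.DictReader` in place of the `io.StringIO` wrapper.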

Distribution

The dataset is distributed as a data file, commonly in CSV format. It comprises over 10,000 sentence completion examples; an exact row count is not provided. Each record pairs context sentences with multiple-choice endings. The data can be split into training and test sets for model development, for instance using an 80/20 ratio. The 'split' column denotes the dataset division, while the 'split_type' column distinguishes 'indomain' from 'zeroshot' examples, each accounting for 50% of the records.
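The 80/20 split and the split type proportions mentioned above could be checked along these lines. This is a minimal sketch using only the standard library. The function names `train_test_split` and `split_type_shares` are hypothetical, and the ten stand-in rows are invented so the example runs without the real file.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def split_type_shares(rows):
    """Return the fraction of rows carrying each split_type value."""
    counts = Counter(r["split_type"] for r in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Hypothetical stand-in rows; real rows would come from the CSV file.
rows = [
    {"ind": i, "split_type": "indomain" if i % 2 == 0 else "zeroshot"}
    for i in range(10)
]

train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2
print(split_type_shares(rows))
```

Fixing the seed keeps the split reproducible across runs. If the evaluation should measure generalisation to unseen activities, a plain random split may not be the right choice, and splitting on `split_type` could be considered instead.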

Usage

This dataset is ideally suited for various machine learning and natural language processing applications, including:

Training models to generate novel sentence endings that mimic human-like creativity and coherence.
Developing models that better understand sentence context and can select the most appropriate ending for it.
Building models capable of evaluating two sentences with different endings and determining which one is more probable, drawing upon common-sense knowledge.
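The ending selection use cases above share one evaluation loop: score each candidate ending against the context, pick the highest-scoring one, and compare it with the label. The sketch below shows that loop with a deliberately naive word overlap scorer standing in for a real model. The scorer, the function names, and the single example are all hypothetical. In practice the `score` argument would wrap a language model's likelihood for the ending given the context.

```python
def pick_ending(context, endings, score):
    """Return the index of the ending with the highest score."""
    scores = [score(context, ending) for ending in endings]
    # On ties, max() keeps the first (lowest) index.
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples, score):
    """Fraction of examples where the top-scoring ending matches the label."""
    correct = sum(
        pick_ending(ex["context"], ex["endings"], score) == ex["label"]
        for ex in examples
    )
    return correct / len(examples)


def overlap_score(context, ending):
    """Placeholder scorer: count ending words that also occur in the context."""
    context_words = set(context.lower().split())
    return sum(word in context_words for word in ending.lower().split())


# Hypothetical example in the shape described by the column list.
toy_examples = [
    {
        "context": "A man sprays a car with a hose. He",
        "endings": [
            "reads a book.",
            "washes the car with a sponge.",
            "plays the violin.",
            "closes the window.",
        ],
        "label": 1,
    }
]

predicted = pick_ending(
    toy_examples[0]["context"], toy_examples[0]["endings"], overlap_score
)
print(predicted, accuracy(toy_examples, overlap_score))
```

Keeping the scorer as a plain function argument means the same loop can compare a baseline such as word overlap against a trained model without changing the evaluation code.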

Coverage

The dataset is listed with a GLOBAL region scope. No specific geographical, temporal, or demographic coverage details regarding the content of the data itself are provided in the available information. The listing date for the dataset is noted as 17/06/2025.

License

CC0

Who Can Use It

This dataset is intended for:

Data scientists and machine learning engineers working on natural language understanding and generation tasks.
AI researchers focused on advancing the ability of artificial intelligence systems to interact and communicate in a more human-like way.
Anyone involved in building models for sentence completion, contextual reasoning, and common-sense knowledge integration in AI.

Dataset Name Suggestions

HellaSwag (Commonsense NLI)
AI Sentence Completion Challenge
Contextual Language Comprehension Dataset
Commonsense Language Understanding Benchmark

Attributes

Original Data Source: HellaSwag (Commonsense NLI)

Listing Stats

Listed: 17/06/2025
Region: GLOBAL
Quality: 5 / 5
Version: 1.0
