Question Answering Text Spans Dataset

Education & Learning Analytics

Tags and Keywords

Education

Text

NLP

Question Answering Text Spans Dataset on Opendatabay data marketplace

Free

About

This is a reading comprehension dataset designed to assess a machine's ability to answer questions by extracting information from text [1, 2]. It comprises questions posed by crowdworkers on a collection of Wikipedia articles [1, 2]. Each answer is a segment of text, or span, taken directly from the corresponding reading passage [1, 2]. The dataset is a resource for developing and testing natural language processing models that understand and respond to text-based queries [2].
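To make the idea of a span concrete, the sketch below locates an answer inside a passage by character offsets. The passage, the answer, and the helper name `find_span` are made up for illustration and are not taken from the dataset.

```python
def find_span(context: str, answer: str):
    """Return (start, end) character offsets of answer in context, or None."""
    start = context.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)


# Illustrative passage, not a row from the dataset.
context = "The Nile flows north through Egypt into the Mediterranean Sea."
answer = "the Mediterranean Sea"

span = find_span(context, answer)
if span is not None:
    start, end = span
    # Slicing the context with the offsets reproduces the answer exactly.
    assert context[start:end] == answer
```

Note that `str.find` returns only the first occurrence, so an answer string that appears more than once in a passage would need extra handling.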

Columns

title: The title of the Wikipedia article from which the context is drawn. (String) [1, 3]
context: The passage from the Wikipedia article that the question is asked about and from which the answer is extracted. (String) [1, 3]
question: The query formulated by a crowdworker, requiring an answer from the context. (String) [1, 3]
answers: The answer to the question, presented as a list of text spans extracted from the context. (List of strings) [1, 3]
id: A unique identifier for each individual question-answer pair within the dataset. (Integer) [3]
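As a rough sketch of how these columns might be read, the example below parses a small in-memory CSV using only the Python standard library. The sample rows are hypothetical, and the way the answers column is serialized (here assumed to be a Python-style list literal) is an assumption that should be checked against the actual files.

```python
import ast
import csv
import io

# Hypothetical two-row sample that mimics the documented columns.
# Assumption: `answers` is serialized as a Python-style list literal.
SAMPLE_CSV = '''title,context,question,answers,id
Nile,"The Nile flows north through Egypt.","Which country does the Nile flow through?","['Egypt']",1
Nile,"The Nile flows north through Egypt.","In which direction does the Nile flow?","['north']",2
'''


def read_rows(csv_file):
    """Yield one dict per question-answer pair with typed fields."""
    for row in csv.DictReader(csv_file):
        yield {
            "title": row["title"],
            "context": row["context"],
            "question": row["question"],
            "answers": ast.literal_eval(row["answers"]),  # str -> list of str
            "id": int(row["id"]),
        }


rows = list(read_rows(io.StringIO(SAMPLE_CSV)))
for row in rows:
    # Every answer should appear verbatim in its context.
    assert all(a in row["context"] for a in row["answers"])
```

If the real files store answers differently (for example as JSON), only the `ast.literal_eval` line would need to change.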

Distribution

The dataset is distributed as CSV files, with train.csv and validation.csv splits available [1, 2, 4]. Each row represents a single question-answer pair [2]. Total row counts for the entire dataset are not provided, but the train.csv split alone contains 87,355 unique IDs [3].
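A count like the one quoted for train.csv can be checked with a few lines of standard-library Python. The sketch below runs on a tiny in-memory stand-in; the helper name `count_ids` is hypothetical, and the file encoding mentioned in the comment is an assumption.

```python
import csv
import io


def count_ids(csv_file, id_column="id"):
    """Return (row_count, unique_id_count) for an open CSV file."""
    total = 0
    seen = set()
    for row in csv.DictReader(csv_file):
        total += 1
        seen.add(row[id_column])
    return total, len(seen)


# Tiny in-memory stand-in for a split; with the real file one would pass
# open("train.csv", newline="", encoding="utf-8") instead (encoding assumed).
sample = io.StringIO("id,question\n1,q one\n2,q two\n2,q two again\n")
total, unique = count_ids(sample)
print(total, unique)  # 3 rows, 2 unique ids
```

Comparing the two numbers also reveals whether any ID is repeated within a split.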

Usage

This dataset is ideal for a variety of applications and use cases, including:

Developing Reading Comprehension models capable of answering open-ended questions based on provided text passages [2].
Training extractive question answering models that return a specific text span from the source passage as the answer [2].
Building systems that generate large training datasets for reading comprehension by creating synthetic questions from existing passages [2].
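Models built for these use cases are often scored by exact match against the reference answers. The sketch below shows one way to do that, with a normalization step in the style commonly used for SQuAD-like evaluation; it is an illustration with hypothetical function names, not the official scoring script.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list) -> bool:
    """True if the prediction matches any gold answer after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)
```

For example, `exact_match("The Mediterranean Sea.", ["the Mediterranean Sea"])` is true because case, the trailing period, and the article are all ignored, while `exact_match("Egypt", ["north"])` is false.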

Coverage

The dataset's coverage is global, drawing contexts from a wide array of Wikipedia articles [2, 5]. It does not specify a particular time range or demographic scope, as its content is derived from the broad and diverse nature of Wikipedia.

License

CC0

Who Can Use It

This dataset is intended for:

Researchers and developers in the fields of Artificial Intelligence and Machine Learning, particularly those focusing on Natural Language Processing (NLP) [2].
Educators and analysts interested in education and learning analytics, as it supports the development of tools for understanding textual information [2].
Individuals and teams working on text mining and text pre-processing tasks, where the ability to extract precise information from text is crucial [2].

Dataset Name Suggestions

Stanford Question Answering Dataset (SQuAD) [2]
Reading Comprehension Wikipedia Dataset
Question Answering Text Spans
Crowdworker QA Dataset

Attributes

Original Data Source: Stanford Question Answering Dataset (SQuAD)

Listing Stats

Listed: 17/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
