Opendatabay APP

BoolQ: Question Answering Dataset

Data Science and Analytics

Tags and Keywords

Earth, NLP, Text, Question, Answering, AI


Free

About

The BoolQ dataset is a resource for question answering over text. It is organised into two splits, training and validation, and is intended to support research in natural language processing (NLP) and machine learning (ML), particularly on answering questions from a provided passage. Each record pairs a user-posed question with its answer and the passage that supports that answer, so researchers can develop and evaluate models for real-world settings where information must be retrieved or understood from text.

Columns

  • question: The question posed by a user. It states what must be determined from the accompanying passage.
  • answer: The correct answer to each question, stored as a Boolean value. Models are trained to predict this value. In the reported data, true appears 5,874 times (62%) and false appears 3,553 times (38%).
  • passage: The context from which the question is to be answered. A model must read the passage to decide whether the answer is true or false.
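As a rough illustration of working with these three columns, the sketch below reads rows from a CSV source and counts the answers. It assumes the file has a header row with the column names listed above and that answers are written as text such as "True" or "False"; both are assumptions, not confirmed by the listing. The helper names parse_bool and load_examples are hypothetical, and the two sample rows are invented stand-ins rather than records from the dataset.

```python
import csv
import io
from collections import Counter


def parse_bool(value):
    """Map a CSV cell such as 'True' or 'false' to a Python bool (assumed encoding)."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise ValueError(f"unexpected answer value: {value!r}")


def load_examples(handle):
    """Read rows with question, answer and passage columns from an open CSV source."""
    reader = csv.DictReader(handle)
    return [
        {
            "question": row["question"],
            "answer": parse_bool(row["answer"]),
            "passage": row["passage"],
        }
        for row in reader
    ]


# Tiny in-memory stand-in for train.csv (illustrative rows, not taken from the dataset).
sample = io.StringIO(
    "question,answer,passage\n"
    "is the sky blue,True,The sky appears blue during the day.\n"
    "do fish fly,False,Most fish live and swim in water.\n"
)
examples = load_examples(sample)
counts = Counter(example["answer"] for example in examples)
print(counts)
```

To try this on the real file, the same function can be given a file opened with open("train.csv", newline="", encoding="utf-8"); if the answers turn out to be encoded differently, parse_bool is the only part that should need changing.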

Distribution

The BoolQ dataset consists of two parts: a training split and a validation split. Both use the same fields: question, answer, and passage. The training data is provided in files such as train.csv. Record counts are not stated for the dataset as a whole, but the 'answer' column statistics reported in this listing account for 9,427 Boolean values.
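The reported figures can be checked with a few lines of arithmetic. The sketch below uses only the two counts given for the 'answer' column and derives the total, the class shares, and the accuracy a model would reach by always predicting the more common answer. The variable names are illustrative.

```python
# Counts as reported for the 'answer' column in this listing.
true_count = 5874
false_count = 3553

total = true_count + false_count
share_true = true_count / total
share_false = false_count / total

# A model that always gives the more common answer is correct for that share of rows.
majority_baseline = max(share_true, share_false)

print(total)  # 9427
print(f"{share_true:.1%} true, {share_false:.1%} false")  # 62.3% true, 37.7% false
```

Because the classes are imbalanced, any model trained on this data should be compared against the roughly 62% majority baseline rather than against 50%.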

Usage

This dataset is ideally suited for:
  • Question Answering Systems: Training models to predict the correct true/false answer, given a question and a passage.
  • Machine Reading Comprehension: Developing models that can understand and interpret written text effectively.
  • Information Retrieval: Enabling models to retrieve relevant passages or documents that contain answers to a given query or question.
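For the question answering and reading comprehension uses above, evaluation usually comes down to comparing predicted Boolean answers with the stored ones. The following is a minimal sketch of such a harness, assuming each example is a mapping with question, passage, and answer keys. The names accuracy and always_true are hypothetical, and the toy examples are invented for illustration.

```python
from typing import Callable, Iterable, Mapping


def accuracy(predict: Callable[[str, str], bool], examples: Iterable[Mapping]) -> float:
    """Fraction of examples where predict(question, passage) equals the stored answer."""
    total = 0
    correct = 0
    for example in examples:
        total += 1
        if predict(example["question"], example["passage"]) == example["answer"]:
            correct += 1
    if total == 0:
        raise ValueError("no examples to evaluate")
    return correct / total


def always_true(question: str, passage: str) -> bool:
    """Trivial baseline that ignores its inputs and always answers True."""
    return True


# Invented examples for illustration only; not records from the dataset.
toy_examples = [
    {"question": "is water wet", "passage": "Water is a liquid.", "answer": True},
    {"question": "do fish fly", "passage": "Most fish swim in water.", "answer": False},
    {"question": "is ice cold", "passage": "Ice is frozen water.", "answer": True},
]

print(accuracy(always_true, toy_examples))
```

Any real model can be plugged in by wrapping it in a function with the same (question, passage) signature, which keeps the evaluation code independent of the modelling library.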

Coverage

The listing does not specify the geographic, temporal, or demographic scope of the data.

License

CC0

Who Can Use It

The BoolQ dataset is primarily intended for researchers and developers working in artificial intelligence fields such as Natural Language Processing (NLP) and Machine Learning (ML). It is particularly useful for those building or evaluating:
  • Question answering algorithms
  • Information retrieval systems
  • Machine reading comprehension models

Dataset Name Suggestions

  • BoolQ: Question Answering Dataset
  • Text-Based Question Answering Corpus
  • NLP Question-Answer-Passage Data
  • Machine Reading Comprehension BoolQ
  • Boolean Question Answering Data

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 17/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
