Opendatabay APP

Woodchuck Science Quiz Dataset

Education & Learning Analytics

Tags and Keywords

Classification

NLP

Primary

Free

About

This dataset lets NLP researchers develop models that answer multiple-choice questions based on a given context paragraph. It is particularly well-suited to developing and testing question-answering systems that must handle real-world, noisy data. Because it originates from grade school science content, it can also be utilised to build interactive tools such as a question-answering chatbot, a multiple-choice quiz game, or a system that generates multiple-choice questions for students.

Columns

The dataset is primarily composed of three files: validation.csv, train.csv, and test.csv. Each file contains the following columns:
  • id: A unique identifier for each question record.
  • question: The text of the question (String).
  • choices: A list of multiple-choice answers for the question (List of Strings).
  • answerKey: The integer index corresponding to the correct answer within the choices list.
  • fact1: The first piece of supporting information (String).
  • fact2: The second piece of supporting information (String).
  • combinedfact: A combined piece of supporting information (String).
  • formatted_question: The question text with the multiple-choice answers inserted into it (String).
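
As a minimal sketch of how a record with these columns might be read, the following assumes that the choices column is serialised as a Python-style list literal and that answerKey is a zero-based index into it. Neither detail is confirmed by the listing, and the sample row, parse_record, and load_records are invented for illustration.

```python
import ast
import csv
import io

# Hypothetical sample in the documented column layout; the real files may
# serialise `choices` differently (assumption: a Python-style list literal).
SAMPLE_CSV = '''id,question,choices,answerKey,fact1,fact2,combinedfact,formatted_question
q1,What do plants need to make food?,"['sunlight', 'sand', 'plastic']",0,,,,What do plants need to make food? (A) sunlight (B) sand (C) plastic
'''


def parse_record(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    choices = ast.literal_eval(row["choices"])  # assumed list-literal encoding
    answer_index = int(row["answerKey"])        # assumed zero-based index
    return {
        "id": row["id"],
        "question": row["question"],
        "choices": choices,
        "answer_index": answer_index,
        "answer_text": choices[answer_index],
    }


def load_records(handle):
    """Read every row from an open CSV file handle."""
    return [parse_record(row) for row in csv.DictReader(handle)]


records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0]["answer_text"])
```

To try this on a downloaded file, pass an open handle instead of the in-memory sample, for example open("train.csv", newline="", encoding="utf-8"). If the real choices column uses another encoding, only parse_record would need to change.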

Distribution

The data files are typically provided in CSV format. For test.csv, 920 unique records are reported for each of the id, question, choices, answerKey, and formatted_question columns. The fact1, fact2, and combinedfact columns are noted as having 100% null values in some distributions. The dataset is free, available globally, and listed on a data marketplace with a quality rating of 5 out of 5. The current version is 1.0.
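
Because some columns are reported as entirely null, it may be worth checking each file before relying on the fact columns. The sketch below computes the share of empty cells per column; it assumes nulls appear as empty cells in the CSV, and both null_fraction_by_column and the two-row sample are hypothetical.

```python
import csv
import io


def null_fraction_by_column(handle):
    """Return {column: fraction of rows whose cell is empty} for a CSV handle."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    empty_counts = {name: 0 for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            # Assumption: a null is stored as an empty (or whitespace-only) cell.
            if (row[name] or "").strip() == "":
                empty_counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in columns}
    return {name: empty_counts[name] / total for name in columns}


# Invented two-row sample; not taken from the dataset.
sample = io.StringIO(
    "id,question,fact1\n"
    "q1,Why is the sky blue?,\n"
    "q2,What is ice made of?,\n"
)
fractions = null_fraction_by_column(sample)
print(fractions)
```

A column whose fraction is 1.0 carries no information in that file and can be dropped before training.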

Usage

This dataset is ideal for:
  • Developing and evaluating Natural Language Processing (NLP) models for question answering.
  • Creating question-answering chatbots that can respond to science-based queries.
  • Designing multiple-choice quiz games for educational purposes.
  • Generating multiple-choice questions to aid student learning and assessment.
  • Research into handling noisy, real-world data in Q&A systems.
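
For the model-evaluation use case, a simple accuracy measure against answerKey is one possible starting point. This sketch assumes each record has already been parsed into a dict with question, choices, and a zero-based answer_index; the field names, the baseline, and the example records are all invented for illustration.

```python
def accuracy(records, predict):
    """Share of records where predict(question, choices) returns the correct index."""
    if not records:
        return 0.0
    correct = sum(
        1
        for record in records
        if predict(record["question"], record["choices"]) == record["answer_index"]
    )
    return correct / len(records)


def first_choice_baseline(question, choices):
    """Trivial baseline that always picks the first option."""
    return 0


# Invented examples; not taken from the dataset.
examples = [
    {"question": "What do plants need to make food?",
     "choices": ["sunlight", "sand", "plastic"],
     "answer_index": 0},
    {"question": "What is ice made of?",
     "choices": ["rock", "air", "water"],
     "answer_index": 2},
]

score = accuracy(examples, first_choice_baseline)
print(score)
```

Any model can be plugged in by wrapping it in a function with the same (question, choices) signature, which keeps the baseline and the model directly comparable.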

Coverage

The dataset is available globally. Its content focuses on grade school science, making it relevant to primary and secondary education contexts. No specific time range for data collection is provided; the dataset was listed on 16/06/2025.

License

CC0

Who Can Use It

  • NLP Researchers and Data Scientists focusing on question answering, text classification, and natural language understanding.
  • Educators and Content Developers looking to create educational tools, quizzes, or automated question generation systems.
  • Game Developers interested in building educational quiz games.
  • Anyone working on AI and Machine Learning models that require structured question-answer pairs for training and testing.

Dataset Name Suggestions

  • Grade School Science Q&A
  • Educational NLP Challenge Data
  • Multi-Choice Science Questions
  • Woodchuck Science Quiz Dataset
  • Primary/Secondary Science QA

Attributes

Listing Stats

  • Views: 3
  • Downloads: 0
  • Listed: 16/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
