Scientific Knowledge Evaluation Dataset

Education & Learning Analytics

Tags and Keywords

  • Earth
  • Education
  • NLP
  • Research
  • Text

Scientific Knowledge Evaluation Dataset on the Opendatabay data marketplace

Free

About

This dataset contains 13,679 crowdsourced science exam questions, mainly in Physics, Chemistry, and Biology. Each question is multiple-choice with four answer options: one correct answer and three distractors. Most questions also include a supporting paragraph that gives evidence for the correct answer. The dataset is designed to evaluate knowledge of science and can be used both in research and in applications such as question answering and educational tools.

Columns

  • question: The main text of the scientific question. (String)
  • distractor3: An incorrect answer option (distractor) intended to mislead the test taker. (String)
  • distractor1: Another incorrect answer option. (String)
  • distractor2: Another incorrect answer option. (String)
  • correct_answer: The correct answer to the question. (String)
  • support: A supporting paragraph that gives evidence or context for the correct answer. It may be empty, since not every question has one. (String)
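Because each row stores the correct answer and the three distractors in separate columns, a multiple-choice item has to be assembled by combining them and shuffling the order; otherwise the correct answer would always appear in the same position. The following is a minimal Python sketch, assuming one record has already been read into the hypothetical `SciQRow` dataclass. Both `SciQRow` and `to_multiple_choice` are illustrative names, not part of the dataset.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class SciQRow:
    """One record, using the column names listed above (hypothetical class)."""
    question: str
    distractor1: str
    distractor2: str
    distractor3: str
    correct_answer: str
    support: str = ""  # may be empty for questions without a supporting paragraph


def to_multiple_choice(row: SciQRow, seed: int | None = None) -> tuple[list[str], int]:
    """Combine the correct answer and the three distractors into four shuffled options.

    Returns the options and the index of the correct one. Passing a seed makes
    the option order reproducible across runs.
    """
    options = [row.correct_answer, row.distractor1, row.distractor2, row.distractor3]
    random.Random(seed).shuffle(options)
    return options, options.index(row.correct_answer)
```

If a row has been read with `csv.DictReader`, `SciQRow(**row)` builds the record directly, but only when the CSV header contains exactly these six columns. A fixed seed per question keeps the option order stable between evaluation runs.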

Distribution

The dataset is primarily distributed as a single CSV file, test.csv, intended for evaluation. It contains 13,679 records, with one science exam question per row and the columns listed above. The exact file size is not specified.
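A quick check can confirm that the file matches this description (the six listed columns, one question per row) and count how many questions carry a support paragraph. The following is a minimal sketch using Python's standard `csv` module, assuming the file has a header row. `load_rows` and `summarize` are hypothetical helper names.

```python
from __future__ import annotations

import csv
from typing import Iterable

# Column names as described in the listing.
EXPECTED_COLUMNS = {"question", "distractor3", "distractor1", "distractor2",
                    "correct_answer", "support"}


def load_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into one dict per question."""
    reader = csv.DictReader(lines)
    # fieldnames is None only when the input is empty.
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def summarize(rows: list[dict[str, str]]) -> dict[str, int]:
    """Count questions and how many have a non-empty support paragraph."""
    with_support = sum(1 for r in rows if (r.get("support") or "").strip())
    return {"rows": len(rows), "with_support": with_support}
```

For typical use, open the file with `open("test.csv", newline="", encoding="utf-8")` and pass the handle to `load_rows`. The UTF-8 encoding is an assumption. `newline=""` is what the `csv` module documentation recommends, so quoted fields containing line breaks (plausible in the support text) parse correctly.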

Usage

This dataset is ideally suited for evaluating scientific knowledge and for research in natural language processing (NLP). It can be particularly useful for:
  • Developing and training models to answer scientific questions.
  • Creating AI-powered educational tools for science learning.
  • Assessing human or AI performance on science examinations.
  • Generating insights into common distractors and improving question design.
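For the evaluation and distractor-analysis uses above, the following is a minimal scoring sketch in Python. It assumes that predictions are free-text answer strings aligned one-to-one with the rows, and that a prediction counts as correct when it matches `correct_answer` after lowercasing and collapsing whitespace. That matching rule is an assumption; scoring by option index is an equally valid design. Rows are dicts keyed by the column names above, as produced by `csv.DictReader`. `evaluate` and `normalize` are hypothetical names.

```python
from collections import Counter

DISTRACTOR_COLUMNS = ("distractor1", "distractor2", "distractor3")


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def evaluate(rows, predictions):
    """Return accuracy, how often each distractor text was chosen, and unmatched answers."""
    if len(rows) != len(predictions):
        raise ValueError("expected exactly one prediction per question")
    n_correct = 0
    n_unmatched = 0
    chosen_distractors = Counter()
    for row, pred in zip(rows, predictions):
        p = normalize(pred)
        if p == normalize(row["correct_answer"]):
            n_correct += 1
            continue
        for col in DISTRACTOR_COLUMNS:
            if p == normalize(row[col]):
                chosen_distractors[row[col]] += 1
                break
        else:
            # The prediction matched none of the four options.
            n_unmatched += 1
    return {
        "accuracy": n_correct / len(rows) if rows else 0.0,
        "chosen_distractors": chosen_distractors,
        "unmatched": n_unmatched,
    }
```

This sketch counts chosen distractors by their text rather than by column name, because the column name alone says little about why a wrong option was attractive. The most common entries in `chosen_distractors` point to the misconceptions a question design might target.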

Coverage

The questions are not tied to any geographic region, so the dataset is relevant globally. It covers core science subjects, including Physics, Chemistry, and Biology. No time range is given for when the questions were written; they address general science concepts rather than time-specific content. The listing gives no notes on coverage of specific demographic groups, since the focus is on subject-matter knowledge.

License

CC0

Who Can Use It

The dataset is intended for a variety of users, including:
  • Researchers in AI, machine learning, and natural language processing to develop and test question-answering systems.
  • Educators and educational technology developers to create assessment tools or learning platforms.
  • Data scientists and analysts interested in text data analysis and knowledge representation.
  • Students undertaking projects related to scientific reasoning and AI.

Dataset Name Suggestions

  • Scientific Knowledge Evaluation Dataset
  • Science Exam Questions Collection
  • Multi-Choice Science Questions
  • SciQ Science Questions and Answers
  • AI Science Question-Answering Corpus

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 11/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
