Scientific Knowledge Evaluation Dataset
Education & Learning Analytics
Free
About
This dataset contains a collection of 13,679 crowdsourced science exam questions, primarily focusing on Physics, Chemistry, and Biology. The questions are presented in a multiple-choice format, each with four answer options. For the majority of the questions, an additional paragraph providing supporting evidence for the correct answer is also included. The dataset is designed to evaluate a person's knowledge of science and can be used for various research and application purposes.
Columns
- question: The main text of the scientific question. (String)
- distractor3: One of the incorrect answer options designed to distract the test taker. (String)
- distractor1: Another incorrect answer option. (String)
- distractor2: A third incorrect answer option. (String)
- correct_answer: The accurate answer to the question. (String)
- support: Supplementary text that provides evidence or further context for the correct answer, helping users understand the question. (String)
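As a rough illustration of how these columns fit together, the sketch below reads rows with Python's standard csv module and assembles the four answer options for a question. The sample row is made up for illustration and is not taken from the dataset, and the function names read_questions and build_options are hypothetical helpers, not part of any published tooling.

```python
import csv
import io
import random

# Column names as listed above.
EXPECTED_COLUMNS = [
    "question", "distractor3", "distractor1",
    "distractor2", "correct_answer", "support",
]


def read_questions(handle):
    """Read question rows from an open CSV handle into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def build_options(row, rng):
    """Return the four answer options in shuffled order and the index of the correct one."""
    options = [
        row["distractor1"], row["distractor2"],
        row["distractor3"], row["correct_answer"],
    ]
    rng.shuffle(options)
    return options, options.index(row["correct_answer"])


# Made-up sample row, for illustration only (not from the dataset).
SAMPLE_CSV = (
    "question,distractor3,distractor1,distractor2,correct_answer,support\n"
    "What gas do plants absorb during photosynthesis?,oxygen,nitrogen,helium,"
    "carbon dioxide,Plants take in carbon dioxide and release oxygen.\n"
)

rows = read_questions(io.StringIO(SAMPLE_CSV))
options, answer_index = build_options(rows[0], random.Random(0))
print(options, "correct:", options[answer_index])
```

To run this against the real file, the in-memory sample could be replaced with something like open("test.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption, since the listing does not state one.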
Distribution
The dataset is primarily available as a CSV file, test.csv, which is used for evaluation. It comprises 13,679 records, each an individual science exam question. The exact file size is not specified, but the structure is tabular: each row represents one question and its associated data.
Usage
This dataset is ideally suited for evaluating scientific knowledge and for research in natural language processing (NLP). It can be particularly useful for:
- Developing and training models to answer scientific questions.
- Creating AI-powered educational tools for science learning.
- Assessing human or AI performance on science examinations.
- Generating insights into common distractors and improving question design.
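The assessment and distractor-analysis uses above can be sketched in a few lines. This is a minimal example under stated assumptions: predictions are given as answer strings in the same order as the rows, the sample rows are made up for illustration, and score_predictions is a hypothetical helper name.

```python
from collections import Counter


def score_predictions(rows, predictions):
    """Return (accuracy, Counter of the wrong answers that were chosen)."""
    if len(rows) != len(predictions):
        raise ValueError("rows and predictions must have the same length")
    correct = 0
    chosen_wrong = Counter()
    for row, predicted in zip(rows, predictions):
        if predicted == row["correct_answer"]:
            correct += 1
        else:
            # Tally which wrong answer was picked, to see which distractors attract errors.
            chosen_wrong[predicted] += 1
    accuracy = correct / len(rows) if rows else 0.0
    return accuracy, chosen_wrong


# Made-up rows and predictions, for illustration only (not from the dataset).
sample_rows = [
    {"question": "What gas do plants absorb during photosynthesis?",
     "distractor1": "nitrogen", "distractor2": "helium", "distractor3": "oxygen",
     "correct_answer": "carbon dioxide"},
    {"question": "What is the chemical formula for water?",
     "distractor1": "CO2", "distractor2": "O2", "distractor3": "NaCl",
     "correct_answer": "H2O"},
]
sample_predictions = ["carbon dioxide", "O2"]

accuracy, chosen_wrong = score_predictions(sample_rows, sample_predictions)
print(f"accuracy: {accuracy:.2f}")
print("most chosen wrong answers:", chosen_wrong.most_common(3))
```

Exact string matching is the simplest choice here; a real evaluation might normalise case or whitespace before comparing answers.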
Coverage
The dataset has global relevance because the questions are not tied to a specific geographical region. It covers core science subjects, including Physics, Chemistry, and Biology. No time range is indicated for the origin of the questions, which cover general science concepts. No information is given about specific demographic groups, as the focus is on subject matter knowledge.
License
CC0
Who Can Use It
The dataset is intended for a variety of users, including:
- Researchers in AI, machine learning, and natural language processing to develop and test question-answering systems.
- Educators and educational technology developers to create assessment tools or learning platforms.
- Data scientists and analysts interested in text data analysis and knowledge representation.
- Students undertaking projects related to scientific reasoning and AI.
Dataset Name Suggestions
- Scientific Knowledge Evaluation Dataset
- Science Exam Questions Collection
- Multi-Choice Science Questions
- SciQ Science Questions and Answers
- AI Science Question-Answering Corpus
Attributes
Original Data Source: SciQ (Scientific Question Answering)