Coronavirus Question-Answer Model Data
Health Information Systems & Technology
Free
About
This dataset supports the creation of question-answer models built on the CORD-19 dataset. It provides a focused collection of high-quality articles, emphasising those with detected study designs. This makes it a useful resource for natural language processing (NLP) research and applications related to the coronavirus, including the development of health information systems.
Columns
The dataset primarily includes files structured for question, context, and answer combinations. For instance, the CSV file contains the following key columns:
- category: Represents the category of the question.
- question: The actual text of the question.
- context: The contextual passage from which the answer can be extracted.
- answer: The specific answer text relevant to the question and context.
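As a minimal sketch of how these columns might be read, the snippet below loads rows with Python's standard csv module and checks that the four columns listed above are present. The helper name load_qa_rows and the in-memory sample row are hypothetical placeholders, not taken from the dataset; only the column names come from the description above.

```python
import csv
import io

# Column names as described in the listing above.
EXPECTED_COLUMNS = ("category", "question", "context", "answer")


def load_qa_rows(handle):
    """Read QA rows from an open CSV file object and verify the expected columns.

    Hypothetical helper for illustration; not part of the dataset itself.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Keep only the four documented columns, ignoring any extras.
    return [{name: row[name] for name in EXPECTED_COLUMNS} for row in reader]


# Placeholder row for illustration only (NOT real dataset content).
SAMPLE = io.StringIO(
    "category,question,context,answer\n"
    "placeholder category,Placeholder question?,Placeholder context sentence.,context\n"
)

rows = load_qa_rows(SAMPLE)
print(rows[0]["question"], "->", rows[0]["answer"])
```

With the real file, the same function could be called on a handle opened with open("cord19-qa.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption, since the listing does not state it.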
Distribution
The dataset is provided in multiple formats: a line-by-line export of CORD-19 data in cord19.txt, a CSV file (cord19-qa.csv) containing question, context, and answer combinations, and a JSON file (cord19-qa.json) formatted for SQuAD 2.0. Specific numbers for rows or records are not detailed in the available information. The current version of this dataset is 1.0 and it is listed as globally available.
Usage
This dataset is suited to building and fine-tuning transformer models for language modelling and SQuAD 2.0 question-answering tasks. It can serve as a base for question-answering systems in the medical and healthcare domain, particularly those focused on coronavirus-related information, and is relevant to researchers and developers working on AI and large language model (LLM) applications.
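For readers preparing the JSON file for fine-tuning, the following sketch flattens SQuAD 2.0 style data into (question, context, answer text, answer start) tuples. It assumes cord19-qa.json follows the standard SQuAD 2.0 layout (data, paragraphs, qas, answers, is_impossible), which the listing implies but does not spell out. The function name iter_squad_examples and the sample record are hypothetical placeholders.

```python
import json


def iter_squad_examples(squad):
    """Yield (question, context, answer_text, answer_start) from SQuAD 2.0 style data.

    Unanswerable questions, which SQuAD 2.0 permits, yield None for the answer fields.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False) or not qa.get("answers"):
                    yield qa["question"], context, None, None
                    continue
                first = qa["answers"][0]  # use the first listed answer span
                yield qa["question"], context, first["text"], first["answer_start"]


# Placeholder record for illustration only (NOT real dataset content).
SAMPLE_JSON = """
{
  "version": "v2.0",
  "data": [{
    "title": "placeholder",
    "paragraphs": [{
      "context": "Placeholder context sentence.",
      "qas": [
        {"id": "q1", "question": "Placeholder question?", "is_impossible": false,
         "answers": [{"text": "context", "answer_start": 12}]},
        {"id": "q2", "question": "Unanswerable placeholder?", "is_impossible": true,
         "answers": []}
      ]
    }]
  }]
}
"""

examples = list(iter_squad_examples(json.loads(SAMPLE_JSON)))
for question, context, text, start in examples:
    print(question, "->", text)
```

The resulting tuples can then be passed to whichever tokenizer and training loop is in use; checking that context[start:start + len(text)] equals the answer text is a cheap sanity test before training.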
Coverage
The dataset is global in scope and concentrates on high-quality CORD-19 articles with detected study designs. No specific time range for the included articles is provided; the dataset itself was listed on 17th June 2025. The primary scope is medical and healthcare information, specifically concerning the coronavirus.
License
CC BY-SA
Who Can Use It
This dataset is intended for a broad range of users, including:
- AI/ML Engineers and Data Scientists: For training and evaluating question-answering models and other NLP tasks.
- Healthcare Researchers: To develop tools for quickly extracting information from a vast corpus of medical literature.
- Academic Institutions: For research and educational purposes in the fields of AI, NLP, and medical informatics.
- Start-ups and Enterprises: Developing innovative health information systems or AI-powered medical assistants.
Dataset Name Suggestions
- CORD-19 QA Dataset
- Coronavirus Question-Answer Model Data
- Medical NLP QA Resource
- AI Health Q&A Dataset
Attributes
Original Data Source: CORD-19 QA