Opendatabay APP

Coronavirus Question-Answer Model Data

Health Information Systems & Technology

Tags and Keywords

Coronavirus

NLP

Healthcare

Medical

AI

Free

About

This dataset supports building question-answer models on the CORD-19 dataset. It provides a focused collection of high-quality articles, emphasising those with detected study designs. This makes it a useful resource for natural language processing (NLP) research and applications related to the coronavirus, including the development of health information systems.

Columns

The dataset primarily includes files structured for question, context, and answer combinations. For instance, the CSV file contains the following key columns:
  • category: Represents the category of the question.
  • question: The actual text of the question.
  • context: The contextual passage from which the answer can be extracted.
  • answer: The specific answer text relevant to the question and context.
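As a minimal sketch of how these columns might be read, the following uses Python's standard csv module. The four column names come from the list above; the function name read_qa_rows and the sample row are hypothetical and only illustrate the shape of the data, not actual dataset content.

```python
import csv
import io

# Column names as described in the listing; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = ("category", "question", "context", "answer")


def read_qa_rows(handle):
    """Read question/context/answer rows from an open CSV file handle."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Keep only the four documented columns, ignoring any extras.
    return [{c: row[c] for c in EXPECTED_COLUMNS} for row in reader]


# Hypothetical sample row for illustration only; not taken from the dataset.
sample = io.StringIO(
    "category,question,context,answer\n"
    "design,What is the study design?,This was a cohort study.,cohort study\n"
)
rows = read_qa_rows(sample)
```

With the downloaded file, the same function could be called on open("cord19-qa.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption, since the listing does not state it.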

Distribution

The dataset is provided in multiple formats, including a line-by-line export of CORD-19 data in cord19.txt, a CSV file (cord19-qa.csv) containing question, context, answer combinations, and a JSON file (cord19-qa.json) formatted for SQuAD 2.0. Specific numbers for rows or records are not detailed in the available information. The current version of this dataset is 1.0 and it is listed as globally available.
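For readers unfamiliar with the SQuAD 2.0 layout, the sketch below flattens a SQuAD 2.0 style document into question, context, and answer records. The field names (data, paragraphs, qas, answers, answer_start, is_impossible) follow the public SQuAD 2.0 format; whether cord19-qa.json matches it field for field is an assumption, and the example document and the function name flatten_squad are hypothetical.

```python
import json


def flatten_squad(squad):
    """Flatten a SQuAD 2.0 style dict into question/context/answer records."""
    records = []
    for article in squad.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                answers = qa.get("answers", [])
                # SQuAD 2.0 marks unanswerable questions with is_impossible.
                unanswerable = qa.get("is_impossible", False) or not answers
                first = None if unanswerable else answers[0]
                records.append({
                    "id": qa.get("id"),
                    "question": qa["question"],
                    "context": context,
                    "answer": first["text"] if first else None,
                    "answer_start": first["answer_start"] if first else None,
                })
    return records


# Hypothetical example document; not taken from the dataset.
EXAMPLE_JSON = """
{
  "version": "v2.0",
  "data": [
    {
      "title": "example",
      "paragraphs": [
        {
          "context": "This was a cohort study.",
          "qas": [
            {"id": "q1", "question": "What is the study design?",
             "is_impossible": false,
             "answers": [{"text": "cohort study", "answer_start": 11}]},
            {"id": "q2", "question": "What was the sample size?",
             "is_impossible": true, "answers": []}
          ]
        }
      ]
    }
  ]
}
"""
records = flatten_squad(json.loads(EXAMPLE_JSON))
```

Each record keeps the character offset answer_start, so the answer span can be checked against its context before training.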

Usage

This dataset is suited to building and fine-tuning transformer models for language modelling and SQuAD 2.0 tasks. It can serve as a starting point for question-answering systems in the medical and healthcare domain, particularly those focused on coronavirus-related information, and is aimed at researchers and developers working on AI and large language model (LLM) applications.

Coverage

The dataset is listed as globally available and concentrates on high-quality CORD-19 articles with detected study designs. No time range is given for the included articles; the dataset itself was listed on 17th June 2025. The primary scope is medical and healthcare information, specifically concerning the coronavirus.

License

CC BY-SA

Who Can Use It

This dataset is intended for a broad range of users, including:
  • AI/ML Engineers and Data Scientists: For training and evaluating question-answering models and other NLP tasks.
  • Healthcare Researchers: To develop tools for quickly extracting information from a vast corpus of medical literature.
  • Academic Institutions: For research and educational purposes in the fields of AI, NLP, and medical informatics.
  • Start-ups and Enterprises: Developing innovative health information systems or AI-powered medical assistants.

Dataset Name Suggestions

  • CORD-19 QA Dataset
  • Coronavirus Question-Answer Model Data
  • Medical NLP QA Resource
  • AI Health Q&A Dataset

Attributes

Original Data Source: CORD-19 QA

Listing Stats

Views: 0

Downloads: 0

Listed: 17/06/2025

Region: Global

Universal Data Quality Score (UDQS): 5 / 5

Version: 1.0
