Structured COVID-19 Trial Eligibility Data
Patient Health Records & Digital Health
Free
About
This resource contains structured data derived from the eligibility criteria of COVID-19 clinical trials. It gives researchers access to knowledge extracted from the usually unstructured text of trial descriptions, focusing on key medical and procedural entities. Each entry identifies one entity within a trial's criteria, maps it to a concept and domain (such as Condition, Drug, or Measurement) in a standardised vocabulary, and normalises any associated temporal and numerical attributes. Each entry also records whether the criterion is an inclusion or exclusion requirement for participation.
Columns
The dataset comprises 13 columns detailing the extracted information:
- nct_id: The unique identifier for the clinical trial as recorded on clinicaltrials.gov.
- entity_source_text: The precise segment of the original eligibility criteria text containing the medical or procedural entity (e.g., "pregnant").
- concept_id: The identifier used within the standardised vocabulary for the mapped entity.
- concept_name: The standardised name corresponding to the concept ID (e.g., "Disease caused by severe acute respiratory syndrome coronavirus 2").
- domain: The type or category of the concept, such as 'Condition', 'Drug', or 'Measurement'.
- start_index/end_index: The character positions indicating where the entity begins and ends within the full criteria source text.
- temporal_source_text: The text snippet from the criteria that indicates a time frame or duration (e.g., "history of"). This is frequently null.
- days: The temporal attribute normalised into a number of days.
- numeric_source_text: The text snippet describing a numerical attribute associated with the entity (e.g., "positive"). This is often null.
- numeric_att_min/numeric_att_max: The lower and upper bounds of the normalised numerical attribute.
- is_exclusion: A binary flag where '1' signifies an exclusion criterion and '0' signifies an inclusion criterion.
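As a minimal sketch of how the columns above might be read and checked, the following uses only the Python standard library. It assumes that the slash-separated entries (start_index/end_index and numeric_att_min/numeric_att_max) are separate columns, that null values appear as empty cells, and that is_exclusion is stored as the text "1" or "0". The helper name load_entities and the two sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Column names as listed above; the slash-separated entries are assumed
# to be two separate columns each, which gives the stated total of 13.
EXPECTED_COLUMNS = [
    "nct_id", "entity_source_text", "concept_id", "concept_name", "domain",
    "start_index", "end_index", "temporal_source_text", "days",
    "numeric_source_text", "numeric_att_min", "numeric_att_max", "is_exclusion",
]


def load_entities(fileobj):
    """Read rows from a CSV file object, mapping empty cells to None."""
    reader = csv.DictReader(fileobj)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        row = {key: (value if value != "" else None) for key, value in raw.items()}
        # Assumption: is_exclusion is stored as the text "1" or "0".
        row["is_exclusion"] = row["is_exclusion"] == "1"
        rows.append(row)
    return rows


# Hypothetical two-row sample; the identifiers and values are made up.
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "NCT00000000,pregnant,1,Pregnancy,Condition,10,18,,,,,,1\n"
    "NCT00000000,COVID-19,2,Example concept,Condition,30,38,,,positive,,,0\n"
)

sample_rows = load_entities(io.StringIO(SAMPLE_CSV))
print(len(sample_rows), sample_rows[0]["is_exclusion"], sample_rows[0]["days"])
```

For a real file, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"); the actual file name and encoding are not stated in this listing.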
Distribution
The data is provided in tabular format, typically as a CSV file. It contains approximately 10,200 valid records of extracted entities and attributes, and the provided file is 1.19 MB. 'Condition' is the most frequent domain, accounting for over half of all entries.
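The domain breakdown can be recomputed from the domain column. The sketch below assumes each row is a dictionary keyed by column name; the function name domain_shares and the example rows are hypothetical and do not reproduce the real distribution.

```python
from collections import Counter


def domain_shares(rows):
    """Return {domain: fraction of rows}, ordered from most to least frequent."""
    counts = Counter(row["domain"] for row in rows)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.most_common()}


# Hypothetical rows; only the 'domain' field is needed here.
example_rows = [
    {"domain": "Condition"},
    {"domain": "Condition"},
    {"domain": "Condition"},
    {"domain": "Drug"},
]
shares = domain_shares(example_rows)
print(shares)
```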
Usage
This data supports several research and development applications, including:
- Developing and evaluating Natural Language Processing (NLP) models designed to automatically structure and extract complex clinical information from text.
- Analysing patterns in eligibility requirements across global COVID-19 clinical trials.
- Studying the ratio and types of inclusion versus exclusion criteria employed in pandemic-related research, noting that exclusion criteria significantly outnumber inclusion criteria.
- Research in medical informatics focusing on standardisation and interoperability of trial data.
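For the inclusion versus exclusion analysis, one possible starting point is to tally entity rows per trial by the is_exclusion flag. This is a sketch under assumptions: the flag is compared as the text "1", and the function name criteria_counts_by_trial and the sample rows are hypothetical. The tallies count extracted entities rather than whole criteria, since a single criterion may yield more than one entity row.

```python
from collections import defaultdict


def criteria_counts_by_trial(rows):
    """Return {nct_id: {"inclusion": n, "exclusion": m}} from is_exclusion flags."""
    counts = defaultdict(lambda: {"inclusion": 0, "exclusion": 0})
    for row in rows:
        # Assumption: '1' marks an exclusion row and anything else an inclusion row.
        kind = "exclusion" if str(row["is_exclusion"]) == "1" else "inclusion"
        counts[row["nct_id"]][kind] += 1
    return dict(counts)


# Hypothetical rows with made-up trial identifiers.
flag_rows = [
    {"nct_id": "NCT00000001", "is_exclusion": "1"},
    {"nct_id": "NCT00000001", "is_exclusion": "1"},
    {"nct_id": "NCT00000001", "is_exclusion": "0"},
    {"nct_id": "NCT00000002", "is_exclusion": "0"},
]
per_trial = criteria_counts_by_trial(flag_rows)
print(per_trial)
```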
Coverage
The data covers clinical trials identified by IDs sourced from clinicaltrials.gov. The concepts covered pertain to COVID-19, including the disease itself and common related conditions, procedures, measurements, and demographic factors such as pregnancy status. The structured semantic tags span a range of medical domains relevant to trial enrolment.
License
CC0: Public Domain
Who Can Use It
- Biomedical Researchers: To gain quick insights into population characteristics and restrictions in COVID-19 trials.
- Data Scientists and Machine Learning Engineers: To train models for entity recognition, semantic tagging, and attribute normalisation in clinical documents.
- Health Informatics Specialists: To explore methods for standardising clinical trial metadata and criteria.
Dataset Name Suggestions
- COVID-19 Clinical Trial Semantic Eligibility Criteria
- Structured COVID-19 Trial Eligibility Data
- Normalized Clinical Trial Inclusion and Exclusion Attributes
Attributes
Original Data Source: Structured COVID-19 Trial Eligibility Data
