Opendatabay APP

COVID-19 ML Drug Development Data

Patient Health Records & Digital Health

Tags and Keywords

Covid-19

Drug

Discovery

Machine

Deep

Free

About

This dataset supports the discovery of drugs targeting the coronavirus (COVID-19) using Machine Learning and Deep Learning approaches. Released as open source by the Wuhan Institute of Virology during the pandemic, it aims to help researchers and experts worldwide identify potential medicines. Future updates are anticipated, including details on the probability of various compounds reacting to COVID-19.

Columns

  • id: Unique identification values for each record. All 1,177 values are unique and valid.
  • gen: Contains generated values.
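  • smile: The structure of each compound, described by the source as carbon and hydrogen compounds and, judging by the column name, presumably written as a SMILES string. There are 1,164 unique structures across 1,177 entries, all valid.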
  • source: Indicates whether data points are 'generated' (94% of entries), from a 'training' dataset (4%), or 'other' (2%).
  • score: The generated or training score for each record. All 1,177 entries show a score of 99.9.
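
The column descriptions above can be checked against a downloaded copy with a short script. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column names listed above; the sample rows are hypothetical and stand in for the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only; real values will differ.
SAMPLE_CSV = """id,gen,smile,source,score
1,0,CCO,generated,99.9
2,0,c1ccccc1,generated,99.9
3,1,CC(=O)O,training,99.9
4,1,CCO,other,99.9
"""


def summarize(rows):
    """Return the per-column statistics quoted in the listing."""
    total = len(rows)
    ids = [row["id"] for row in rows]
    structures = [row["smile"] for row in rows]
    sources = Counter(row["source"] for row in rows)
    return {
        "records": total,
        "ids_unique": len(set(ids)) == total,
        "unique_smiles": len(set(structures)),
        # Share of records per source label, as a fraction of all records.
        "source_share": {name: count / total for name, count in sources.items()},
        # Distinct score values found in the data.
        "scores": sorted({float(row["score"]) for row in rows}),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
print(summary)
```

Run against the full file, the same function should report 1,177 records, unique ids, 1,164 unique structures, and a single score value if the statistics above hold.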

Distribution

The dataset is distributed in CSV format. The sample file, master_results_table.csv, is 76.38 kB and contains 1,177 individual records across 5 columns. Updates to this dataset are expected daily.
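
Because the file is expected to change over time, it can help to verify its shape on load. The sketch below is one possible approach, not part of the dataset's own tooling. The function name read_records is hypothetical, and the header names are assumed to match the column list in this listing.

```python
import csv
import io

# Assumed header names, taken from the column list in this listing.
EXPECTED_COLUMNS = {"id", "gen", "smile", "source", "score"}


def read_records(stream, expected_rows=None):
    """Read CSV records from a text stream and verify the expected shape."""
    reader = csv.DictReader(stream)
    header = set(reader.fieldnames or [])
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {sorted(header)}")
    rows = list(reader)
    if expected_rows is not None and len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} records, found {len(rows)}")
    return rows


# Hypothetical two-row stand-in for master_results_table.csv.
demo = io.StringIO(
    "id,gen,smile,source,score\n"
    "1,0,CCO,generated,99.9\n"
    "2,0,CC(=O)O,training,99.9\n"
)
records = read_records(demo, expected_rows=2)
print(len(records), "records with columns", sorted(records[0]))
```

For the real file, the stream would come from open("master_results_table.csv", newline="", encoding="utf-8") with expected_rows=1177. The UTF-8 encoding is an assumption.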

Usage

This dataset is ideal for applications in Machine Learning and Deep Learning focused on drug discovery for deadly viruses. It can be utilised to understand the properties of various compounds, identify drug candidates, and model their potential interactions with COVID-19. The resource LSTM_chem is suggested as a tool for deeper exploration.
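
Sequence models such as LSTMs take integer token sequences as input, so compound strings need to be tokenized first. The sketch below shows a simple SMILES tokenizer and padded integer encoding. It is an illustration under stated assumptions, not the tokenizer used by LSTM_chem: it assumes the smile column holds SMILES strings, and it treats only bracket atoms, Cl, and Br as multi-character tokens.

```python
import re

# Bracket atoms and two-letter halogens are tried before single characters.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: index + 1 for index, tok in enumerate(tokens)}


def encode(smiles, vocab, length):
    """Encode a SMILES string as token ids, right-padded with 0 to `length`."""
    ids = [vocab[tok] for tok in tokenize(smiles)]
    if len(ids) > length:
        raise ValueError(f"{smiles!r} has more than {length} tokens")
    return ids + [0] * (length - len(ids))


# Hypothetical example compounds, not taken from the dataset.
examples = ["CCO", "ClCCBr", "C[NH3+]"]
vocab = build_vocab(examples)
encoded = [encode(s, vocab, 6) for s in examples]
print(vocab)
print(encoded)
```

Fixed-length padding keeps every encoded compound the same size, which is what batched training of a recurrent model typically requires.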

Coverage

The dataset originates from the Wuhan Institute of Virology and is open source, making it globally accessible for research. Its focus is on the discovery of COVID-19 drugs, a topic relevant during the ongoing pandemic. There are no specific geographic, time range, or demographic limitations mentioned for the data's applicability.

License

CC0: Public Domain

Who Can Use It

The dataset is intended for a broad range of users, including machine learning and deep learning researchers, scientists in pharmaceutical and biomedical fields, data scientists, and public health innovators. It provides a foundation for those working on drug discovery, chemical compound analysis, and the application of artificial intelligence in addressing global health crises.

Dataset Name Suggestions

  • AI-Driven COVID-19 Drug Discovery Data
  • Pandemic Drug Research Dataset
  • COVID-19 ML Drug Development Data
  • Wuhan Virology Institute Drug Discovery

Attributes

Original Data Source: COVID-19 ML Drug Development Data

Listing Stats

VIEWS

0

DOWNLOADS

0

LISTED

08/09/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0