Opendatabay APP

COVID-DeepPredictor Viral Sequence Data

Patient Health Records & Digital Health

Tags and Keywords

Covid

Dna

Sequence

Virus

Predictor

Free

About

This dataset contains viral DNA sequence data used for training and evaluating models related to the COVID-19 virus. It originates from the COVID-DeepPredictor research, which used a Recurrent Neural Network approach to predict SARS-CoV-2 and other pathogenic viruses. Its key features are the DNA sequences themselves and their associated virus classes.

Columns

The dataset, typically found in a file such as Trainingdata.csv, includes the following fields:
  • PID: The unique identifier assigned to each viral DNA sequence.
  • CLASS: The class number assigned to that type of virus.
  • CLASSNAME: The name of the virus, or of the class it belongs to.
  • SEQ: The viral DNA sequence itself.
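
As a minimal sketch of how a file with these four fields could be read and checked, the snippet below uses only Python's standard csv module. The sample rows, the load_records helper, and the assumption that CLASS holds an integer are illustrative; they are not taken from the dataset itself.

```python
import csv
import io

# Column names as given in the listing.
EXPECTED_COLUMNS = ["PID", "CLASS", "CLASSNAME", "SEQ"]

# Hypothetical sample rows for illustration only; not real dataset content.
SAMPLE_CSV = """PID,CLASS,CLASSNAME,SEQ
1,0,VirusA,ATGCGTAC
2,1,VirusB,ggcattag
"""


def load_records(handle):
    """Read rows from an open CSV handle and verify the listed columns exist."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        records.append({
            "PID": row["PID"],
            "CLASS": int(row["CLASS"]),  # assumption: CLASS is an integer label
            "CLASSNAME": row["CLASSNAME"],
            "SEQ": row["SEQ"].strip().upper(),  # normalise case and whitespace
        })
    return records


records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), records[0]["CLASSNAME"])
```

For a downloaded file, the same helper could be called as `with open("Trainingdata.csv", newline="") as f: records = load_records(f)`, assuming the file carries a header row with these names.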

Distribution

The dataset comprises separate training and evaluation sequence sets for COVID-19 virus DNA. The test data is divided into 5 folds, which are unevenly distributed, and one of these folds is used for validation. The data is generally expected to be available in CSV format. Row or record counts are not stated in the available materials.
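
The listing does not say how the folds are stored or how large each one is, so the following is only a sketch of the general idea of holding one of five folds out for validation. The fold sizes, the record IDs, and both helper names are hypothetical, and "unevenly distributed" is read here as "unequal in size", which is an assumption.

```python
def split_into_folds(items, fold_sizes):
    """Partition items, in order, into consecutive folds of the given sizes."""
    if sum(fold_sizes) != len(items):
        raise ValueError("fold sizes must sum to the number of items")
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(items[start:start + size])
        start += size
    return folds


def hold_out(folds, validation_index):
    """Return (remaining items, validation items) for one held-out fold."""
    validation = folds[validation_index]
    remaining = [
        item
        for index, fold in enumerate(folds)
        if index != validation_index
        for item in fold
    ]
    return remaining, validation


# Hypothetical record IDs and fold sizes; the real counts are not documented.
ids = list(range(12))
folds = split_into_folds(ids, [3, 2, 3, 1, 3])
rest, validation = hold_out(folds, validation_index=4)
print(len(rest), len(validation))
```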

Usage

This data is well suited to training and evaluating classifier models, where the primary task is classifying a virus from its DNA sequence. Suitable models include standard statistical and machine learning approaches such as Random Forests, k-Nearest Neighbours (KNN), Logistic Regression, and Naive Bayes. The dataset also supports more advanced models such as Transformers (specifically DNA-BERT), as demonstrated in related sample code.
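
Classical classifiers need numeric features rather than raw strings, and one common way to get them from DNA is to count overlapping k-mers; DNA-BERT-style models also tokenise sequences into k-mers. The sketch below shows that idea with a one-nearest-neighbour lookup over k-mer counts. It is not the method used in the related sample code; the toy sequences, class names, and the choice of k = 3 are assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of seq, moving one base at a time."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq, k=3):
    """Count how often each k-mer occurs in seq."""
    return Counter(kmers(seq, k))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(query, training, k=3):
    """Return the label of the training sequence most similar to query."""
    query_vec = kmer_counts(query, k)
    best = max(training, key=lambda pair: cosine(query_vec, kmer_counts(pair[0], k)))
    return best[1]


# Toy sequences and labels, invented for illustration only.
training = [("ATGATGATGATG", "VirusA"), ("GGCCGGCCGGCC", "VirusB")]
prediction = predict("ATGATGCATG", training)
print(prediction)
```

With real data, the SEQ column would supply the sequences and CLASS or CLASSNAME the labels; a library classifier could then replace the nearest-neighbour lookup.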

Coverage

The data content focuses exclusively on viral DNA sequence data, particularly covering SARS-CoV-2 and other pathogenic viruses, acquired from the COVID-DeepPredictor project. Specific geographic coverage, time ranges, or demographic details are not applicable as the content is strictly genomic.

License

CC0: Public Domain

Who Can Use It

Intended users include data scientists, researchers, and bioinformaticians. They can utilise this data for developing and refining machine learning models for virus prediction, sequence analysis, and comparative genomics studies involving pathogenic viruses.

Dataset Name Suggestions

  • COVID-DeepPredictor Viral Sequences
  • SARS-CoV-2 DNA Classification Data
  • Pathogenic Virus Genetic Predictor Set
  • Viral DNA Sequence Classifier Training Data

Attributes

Listing Stats

  • VIEWS: 3
  • DOWNLOADS: 1
  • LISTED: 05/11/2025
  • REGION: GLOBAL
  • UDQS QUALITY (Universal Data Quality Score): 5 / 5
  • VERSION: 1.0
