COVID-DeepPredictor Viral Sequence Data
Patient Health Records & Digital Health
Free
About
This dataset contains viral DNA sequence data used for training and evaluating models related to the COVID-19 virus. It originated from the COVID-DeepPredictor research, which used a Recurrent Neural Network approach to predict SARS-CoV-2 and other pathogenic viruses. Its key features are the DNA sequences themselves and their associated virus classes.
Columns
The dataset, typically found in a file such as Trainingdata.csv, includes the following fields:
- PID: The unique ID assigned to the viral DNA sequence.
- CLASS: The class number assigned to that type of virus.
- CLASSNAME: The name of the virus or the class it belongs to.
- SEQ: The viral DNA sequence itself.
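The column layout above can be checked when the file is read. The following is a minimal sketch using only the Python standard library; the function name `load_records`, the assumption that CLASS holds an integer, and the sample rows are illustrative and not taken from the dataset.

```python
import csv
import io

# Column names as listed in this section.
EXPECTED_COLUMNS = ["PID", "CLASS", "CLASSNAME", "SEQ"]


def load_records(fileobj):
    """Read rows from a CSV file object and check the expected columns exist."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        records.append({
            "PID": row["PID"],
            "CLASS": int(row["CLASS"]),          # assumption: integer class label
            "CLASSNAME": row["CLASSNAME"],
            "SEQ": row["SEQ"].strip().upper(),   # normalise case and whitespace
        })
    return records


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """PID,CLASS,CLASSNAME,SEQ
P001,0,VirusA,ATGCGTAC
P002,1,VirusB,ttgacctg
"""

records = load_records(io.StringIO(SAMPLE_CSV))
print(records[1]["SEQ"])
```

For the real file, the same function could be called on `open("Trainingdata.csv", newline="")`, assuming the file uses a comma delimiter and a header row.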
Distribution
The data is organised into separate training and evaluation sets of COVID-19 virus DNA sequences. The test data is divided into 5 folds, which are unevenly distributed, and one fold is used for validation. The dataset is expected to be available in CSV format. Row or record counts are not detailed in the available materials.
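Holding one fold out for validation can be sketched as below. How the folds are stored in the files is not described here, so the sketch assumes they have already been read into five lists; `split_folds` and the PIDs are hypothetical.

```python
def split_folds(folds, validation_index):
    """Hold out one fold for validation and merge the remaining folds."""
    if not 0 <= validation_index < len(folds):
        raise IndexError("validation_index out of range")
    validation = list(folds[validation_index])
    remainder = [
        record
        for i, fold in enumerate(folds)
        if i != validation_index
        for record in fold
    ]
    return remainder, validation


# Hypothetical PIDs in five folds of unequal size.
folds = [
    ["P1", "P2", "P3"],
    ["P4"],
    ["P5", "P6"],
    ["P7", "P8", "P9", "P10"],
    ["P11"],
]

rest, val = split_folds(folds, validation_index=2)
print(len(rest), len(val))  # 9 2
```

Because the folds differ in size, the size of the validation set depends on which fold is held out.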
Usage
This data is well suited to training and evaluating classifier models. Suitable models include standard statistical and machine learning approaches such as Random Forest, KNN, Logistic Regression, and Naive Bayes. The primary application is classification based on the viral DNA sequence. The dataset also supports the development and use of Transformer-based models (specifically DNABERT), as demonstrated in related sample code.
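As a rough illustration of sequence-based classification, the sketch below turns each sequence into k-mer counts and applies a nearest-neighbour vote using cosine similarity. It is a toy, standard-library version of the KNN idea, not the method used in the COVID-DeepPredictor research; the choice of k-mer features, the function names, and the training sequences are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, query_seq, n_neighbors=1, k=3):
    """Return the majority label among the most similar training sequences."""
    query = kmer_counts(query_seq, k)
    ranked = sorted(
        train,
        key=lambda item: cosine(kmer_counts(item[0], k), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Made-up (SEQ, CLASSNAME) pairs for illustration only.
train = [
    ("ATATATATAT", "VirusA"),
    ("GCGCGCGCGC", "VirusB"),
    ("ATATATATTA", "VirusA"),
]

print(knn_predict(train, "TATATATA"))  # VirusA
```

In practice the k-mer vectors would more likely be passed to a library implementation of the models listed above; the hand-written version here only shows the shape of the pipeline.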
Coverage
The data content focuses exclusively on viral DNA sequence data, particularly covering SARS-CoV-2 and other pathogenic viruses, acquired from the COVID-DeepPredictor project. Specific geographic coverage, time ranges, or demographic details are not applicable as the content is strictly genomic.
License
CC0: Public Domain
Who Can Use It
Intended users include data scientists, researchers, and bioinformaticians. They can utilise this data for developing and refining machine learning models for virus prediction, sequence analysis, and comparative genomics studies involving pathogenic viruses.
Dataset Name Suggestions
- COVID-DeepPredictor Viral Sequences
- SARS-CoV-2 DNA Classification Data
- Pathogenic Virus Genetic Predictor Set
- Viral DNA Sequence Classifier Training Data
Attributes
Original Data Source: COVID-DeepPredictor Viral Sequence Data
