COVID-19 Mu Variant Genomic and Demographic Registry
Patient Health Records & Digital Health
About
Tracking the genetic profile and spread of the Mu variant (B.1.621) is essential for developing effective machine learning models for COVID-19 classification and prediction. This collection provides genomic data, collection dates, and patient demographics gathered primarily during the early stages of the variant's emergence in 2021. By documenting specific mutations such as S:N501Y and S:E484K alongside regional data from Colombia and the USA, the resource supports the identification of viral patterns and the progression of infection across different populations. It serves as a foundation for researchers examining the evolutionary trajectory of the virus and its impact on public health.
Columns
- ID: A unique identifier for each sample entry, often containing the specific strain name and date.
- Accession: The international accession number for the genomic sequence (e.g., EPI_ISL_1820959).
- Collection_date: The specific calendar date the sample was collected.
- Lineage: The assigned viral lineage, which for these records is predominantly B.1.621.
- Clade: The evolutionary clade designation, such as GH or G.
- LocationCountry: The country of origin for the sample, with the majority sourced from Colombia.
- Gender: The biological sex of the patient, categorised as Male, Female, or Other.
- Age: The age of the individual at the time of sample collection.
- Status: The patient's health status at the time of recording, such as Live or unknown.
- YEAR: The year the sample was collected (2021).
- MONTH: The month of the year the sample was collected (ranging from 1 to 4).
- DAY: The specific day of the month the sample was collected.
- S:N501Y: A binary indicator (0 or 1) representing the absence or presence of the N501Y mutation in the spike protein.
- S:E484K: A binary indicator (0 or 1) representing the absence or presence of the E484K mutation in the spike protein.
- S:R346K: A binary indicator (0 or 1) representing the absence or presence of the R346K mutation in the spike protein.
- S:P681H: A binary indicator (0 or 1) representing the absence or presence of the P681H mutation in the spike protein.
- Coverage: The genomic sequence coverage percentage, representing the quality of the sequencing.
- Department: The specific administrative region or department within a country, such as Magdalena or Bolivar.
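The column list above can be checked against a downloaded copy of the file. The following is a minimal sketch, assuming the CSV header uses exactly the column names listed here and that the mutation and date fields are stored as plain integers. The `parse_rows` helper and the one-row sample are hypothetical and synthetic; they are not taken from the dataset.

```python
import csv
import io

# Column names as listed above; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "ID", "Accession", "Collection_date", "Lineage", "Clade",
    "LocationCountry", "Gender", "Age", "Status", "YEAR", "MONTH", "DAY",
    "S:N501Y", "S:E484K", "S:R346K", "S:P681H", "Coverage", "Department",
]
MUTATION_COLUMNS = ["S:N501Y", "S:E484K", "S:R346K", "S:P681H"]


def parse_rows(handle):
    """Read CSV rows, verify the header, and convert integer fields."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # Mutation columns are documented as binary indicators.
        for col in MUTATION_COLUMNS:
            flag = int(row[col])
            if flag not in (0, 1):
                raise ValueError(f"{col} must be 0 or 1, got {flag}")
            row[col] = flag
        for col in ("YEAR", "MONTH", "DAY"):
            row[col] = int(row[col])
        rows.append(row)
    return rows


# Synthetic one-row sample for illustration only (not a real record).
SAMPLE = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "sample-1,EPI_ISL_0000000,2021-03-15,B.1.621,GH,Colombia,Female,40,"
    "Live,2021,3,15,1,1,0,0,99.5,Magdalena\n"
)

rows = parse_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["Lineage"], rows[0]["S:N501Y"])
```

To run the same check on the real file, pass an open file handle instead of the in-memory sample, for example `open("mu-variant-data.csv", newline="")`. Age and Coverage are left as text here because their exact formats are not documented.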
Distribution
The data is delivered in a single CSV file titled mu-variant-data.csv with a file size of approximately 13.94 kB. It consists of 89 unique records structured across 18 distinct columns. The records maintain a 100% validity rate with no missing or mismatched entries, ensuring high usability for analytical tasks.
Usage
This resource is ideal for training artificial intelligence models to classify viral variants based on genomic markers. It is well-suited for epidemiological studies to track the geographic spread of the Mu variant in South and North America. Additionally, bioinformaticians can use the mutation data to study the correlation between specific spike protein changes and patient outcomes, while public health officials can utilise the regional data to monitor infection trends in specific departments.
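As a starting point for the mutation-versus-outcome analysis described above, the four spike-mutation indicators can be combined into a profile and tallied against the Status column. This is a rough sketch rather than a prescribed method: the `mutation_profile_counts` helper is hypothetical, and the rows shown are synthetic placeholders, not records from the dataset.

```python
from collections import Counter

MUTATION_COLUMNS = ["S:N501Y", "S:E484K", "S:R346K", "S:P681H"]


def mutation_profile_counts(rows, group_by="Status"):
    """Count records per (group value, mutation profile) pair."""
    counts = Counter()
    for row in rows:
        # Profile is a tuple of 0/1 flags in MUTATION_COLUMNS order.
        profile = tuple(int(row[col]) for col in MUTATION_COLUMNS)
        counts[(row[group_by], profile)] += 1
    return counts


# Synthetic rows for illustration only (not real records).
rows = [
    {"Status": "Live", "S:N501Y": 1, "S:E484K": 1, "S:R346K": 0, "S:P681H": 0},
    {"Status": "Live", "S:N501Y": 1, "S:E484K": 1, "S:R346K": 0, "S:P681H": 0},
    {"Status": "unknown", "S:N501Y": 1, "S:E484K": 1, "S:R346K": 1, "S:P681H": 0},
]

counts = mutation_profile_counts(rows)
for (status, profile), n in sorted(counts.items()):
    print(status, profile, n)
```

Passing `group_by="Department"` or `group_by="LocationCountry"` gives the same tally by region. With only 89 records, such counts are better treated as descriptive summaries than as evidence of a correlation.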
Coverage
The geographic scope is primarily Colombia (85%) and the United States (9%), with the remaining records from other regions. Temporally, the records are concentrated in the first four months of 2021, providing a snapshot of the variant's early propagation. The demographic scope includes individuals of both sexes and various ages, predominantly within the 34 to 58 age range.
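The country shares quoted above can be recomputed from the LocationCountry column. The sketch below assumes the percentages are simple record counts divided by the total; the `country_shares` helper is hypothetical and the rows are synthetic, so the printed values do not reproduce the dataset's figures.

```python
from collections import Counter


def country_shares(rows):
    """Return each country's share of records as a rounded percentage."""
    counts = Counter(row["LocationCountry"] for row in rows)
    total = sum(counts.values())
    return {country: round(100 * n / total) for country, n in counts.items()}


# Synthetic rows for illustration only (not real records).
rows = [
    {"LocationCountry": "Colombia"},
    {"LocationCountry": "Colombia"},
    {"LocationCountry": "Colombia"},
    {"LocationCountry": "USA"},
]

shares = country_shares(rows)
print(shares)
```

The same pattern applies to the MONTH column to confirm that records fall within the first four months of 2021.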
License
CC0: Public Domain
Who Can Use It
Data scientists can leverage these records to develop predictive algorithms for viral classification. Epidemiologists may utilise the location and date metadata to model the spread of B.1.621 across different regions. Furthermore, medical researchers and virologists can use the mutation-specific columns to investigate the genetic characteristics of the Mu variant and its potential resistance to immunisation efforts.
Dataset Name Suggestions
- Mu Variant [B.1.621] Machine Learning Classification Dataset
- COVID-19 Mu Variant Genomic and Demographic Registry
- B.1.621 Lineage: Colombian and USA Patient Metadata
- Spike Protein Mutation Analysis: Mu Variant [B.1.621]
- Genomic Predictors for COVID-19 Mu Variant Classification
Attributes
Original Data Source: COVID-19 Mu Variant Genomic and Demographic Registry
