Global SARS-CoV-2 Variant Sequencing Data

Patient Health Records & Digital Health

Tags and Keywords

  • Covid
  • Variants
  • Sequencing
  • Epidemiology
  • GISAID

Free

About

This dataset records SARS-CoV-2 variants observed worldwide, including named variants such as Alpha, Beta, and Gamma. The underlying sequencing data was originally sourced from GISAID via CoVariants. It supports detailed monitoring of viral lineages and their geographic spread over time.

Columns

  • location: Country or sub-national region where the variant was observed. The field contains 98 unique locations.
  • date: Date of the recorded observation. Values run from May 2020 through September 2021.
  • variant: Variant name. Variants of Concern (VoC) and Variants of Interest (VoI) use their established WHO label; other variants use their Pango lineage. Two special values also appear: 'others' (all variants not specified in the list) and 'non_who' (all variants without a WHO label, including 'others').
  • num_sequences: Number of sequenced samples assigned to the specified variant. Note that 38% of records are missing this value.
  • perc_sequences: Percentage of the total sequenced samples assigned to the specified variant. Note that 38% of records are missing this value.
  • num_sequences_total: Total number of samples sequenced during the preceding two weeks, which gives context for the raw counts and percentages.
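The relationship between the three numeric columns can be sketched in a few lines of Python. This is a minimal sketch, not the publisher's method: it assumes perc_sequences is on a 0 to 100 scale and equals num_sequences divided by num_sequences_total, and that missing values appear as empty cells. The helper names parse_count and percent_share are hypothetical.

```python
from typing import Optional


def parse_count(raw: str) -> Optional[int]:
    """Convert a CSV cell to an int, returning None for a missing value."""
    text = raw.strip()
    if text == "":
        return None
    # Counts may be serialised as "12" or "12.0"; accept both.
    return int(float(text))


def percent_share(num_sequences: Optional[int],
                  num_sequences_total: Optional[int]) -> Optional[float]:
    """Recompute a percentage share from the two count columns.

    Assumption: perc_sequences = 100 * num_sequences / num_sequences_total.
    Returns None when either count is missing or the total is zero.
    """
    if num_sequences is None or not num_sequences_total:
        return None
    return round(100.0 * num_sequences / num_sequences_total, 2)
```

Because 'non_who' already includes 'others', summing num_sequences over every variant label would count some samples twice; any such total should leave the 'non_who' rows out.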

Distribution

The data is delivered as a single CSV file, typically named covid-variants.csv, of approximately 1.57 MB. It has six columns and about 41,800 validated records, and can be opened directly in standard data analysis tools.
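A minimal sketch of reading the file with Python's standard csv module follows. It assumes the header row uses the six column names listed under Columns and that missing values are empty cells; the function name read_variant_rows and the sample rows are hypothetical and the sample values are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO

EXPECTED_COLUMNS = [
    "location", "date", "variant",
    "num_sequences", "perc_sequences", "num_sequences_total",
]


def _to_float(raw: Optional[str]) -> Optional[float]:
    """Return a float, or None when the cell is empty or absent."""
    if raw is None or raw.strip() == "":
        return None
    return float(raw)


def read_variant_rows(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per CSV row with the numeric columns converted."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        yield {
            "location": row["location"],
            # Kept as text: the listing does not state the date format.
            "date": row["date"],
            "variant": row["variant"],
            "num_sequences": _to_float(row["num_sequences"]),
            "perc_sequences": _to_float(row["perc_sequences"]),
            "num_sequences_total": _to_float(row["num_sequences_total"]),
        }


# Illustrative rows only; these values are made up, not taken from the dataset.
SAMPLE = io.StringIO(
    "location,date,variant,num_sequences,perc_sequences,num_sequences_total\n"
    "Exampleland,2021-01-04,Alpha,30,60.0,50\n"
    "Exampleland,2021-01-04,others,,,50\n"
)
rows = list(read_variant_rows(SAMPLE))
```

For the real file, the same function could be called on a handle opened with open("covid-variants.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption.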

Usage

This dataset suits detailed analysis of variant prevalence and spread patterns. Potential use cases include:
  • Epidemiological modelling to forecast the impact and spread dynamics of emerging variants.
  • Informing public health policy decisions related to travel restrictions, sequencing targets, and resource allocation.
  • Academic research focused on tracking mutation rates and global lineage shifts over time.
  • Visualisation projects aimed at mapping variant dominance across different geographic regions.
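As one illustration of the variant dominance use case, the sketch below picks the variant with the largest perc_sequences for each location and date. It is a sketch under stated assumptions: rows are dictionaries keyed by the column names, missing shares are None, and the aggregate labels 'others' and 'non_who' are excluded from the ranking. The function name dominant_variants and the example rows are hypothetical, and the example values are made up.

```python
from typing import Dict, Iterable, Mapping, Tuple

# Aggregate labels overlap with named variants, so this sketch skips them
# when ranking. Excluding 'others' as well as 'non_who' is a design choice.
AGGREGATE_LABELS = {"others", "non_who"}


def dominant_variants(
    rows: Iterable[Mapping[str, object]],
) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Map (location, date) to the (variant, perc_sequences) with the top share."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for row in rows:
        share = row.get("perc_sequences")
        # Skip rows with a missing share and rows that are aggregates.
        if share is None or row["variant"] in AGGREGATE_LABELS:
            continue
        key = (row["location"], row["date"])
        if key not in best or share > best[key][1]:
            best[key] = (row["variant"], share)
    return best


# Made-up rows for illustration; not taken from the dataset.
example = [
    {"location": "Exampleland", "date": "2021-06-07",
     "variant": "Alpha", "perc_sequences": 35.0},
    {"location": "Exampleland", "date": "2021-06-07",
     "variant": "Delta", "perc_sequences": 55.0},
    {"location": "Exampleland", "date": "2021-06-07",
     "variant": "non_who", "perc_sequences": 10.0},
    {"location": "Exampleland", "date": "2021-06-07",
     "variant": "Beta", "perc_sequences": None},
]
result = dominant_variants(example)
```

The resulting mapping can feed a map or time-series plot directly, with one dominant variant per location and observation date.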

Coverage

The data is global in scope, covering 98 distinct countries or regions. The time period runs from May 11, 2020, to September 28, 2021. The dataset is expected to be updated weekly.

License

CC0: Public Domain

Who Can Use It

  • Governmental Health Agencies: For real-time surveillance and risk assessment of new Variants of Concern and Variants of Interest.
  • Researchers and Academics: For performing detailed studies on viral mutation and comparing variant dominance worldwide.
  • Data Scientists and Modellers: For developing and testing predictive models related to infectious disease transmission and control.

Dataset Name Suggestions

  • Global SARS-CoV-2 Variant Sequencing Data
  • Worldwide Covid Variant Prevalence Tracker
  • International Viral Mutation Lineage Summary

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 09/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
