National COVID-19 Epidemiology Dataset
Patient Health Records & Digital Health
Free
About
Public health data on positive COVID-19 cases across Colombia, updated through December 2023. The dataset gives a case-level view of the pandemic: reporting dates, geographical location down to municipality level, demographic factors such as age and sex, and clinical outcomes including recovery and fatality status. The underlying data has been analysed and feature-engineered to address initial quality issues, such as typographical errors in categorical fields, and to consolidate categories for machine learning model development. A derived version is also available, containing over 30 engineered features designed to enhance predictive performance.
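The kind of category consolidation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the dataset: the helper names (`normalise_label`, `consolidate`) and the `CANONICAL` mapping are hypothetical, and the variant spellings shown are invented examples rather than values confirmed to occur in the file.

```python
import unicodedata


def normalise_label(raw: str) -> str:
    """Collapse case, accent, and whitespace variants of a categorical label."""
    # Decompose accented characters, then drop the combining marks.
    # Note: this also turns "ñ" into "n", which may or may not be desirable.
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Collapse runs of whitespace and unify case.
    return " ".join(without_accents.split()).upper()


# Hypothetical mapping from normalised variants to one canonical category.
CANONICAL = {"F": "F", "FEMENINO": "F", "M": "M", "MASCULINO": "M"}


def consolidate(raw: str, mapping: dict[str, str], default: str = "UNKNOWN") -> str:
    """Map a raw label to its canonical category, or to `default` if unmapped."""
    return mapping.get(normalise_label(raw), default)
```

Labels that remain unmapped fall into `UNKNOWN`, which makes it easy to list them and extend the mapping by hand.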
Columns
The dataset contains 23 fields covering demographic, geographical, and temporal attributes of positive COVID-19 cases:
- ID de caso: Unique identifier for each recorded case.
- Fecha de notificación: Date when the case was officially reported.
- Fecha de reporte web: Date when the case was published on the web.
- Fecha de diagnóstico: Date when the positive diagnosis was confirmed.
- Fecha de inicio de síntomas: Date when symptoms began.
- Fecha de recuperación: Date of recovery.
- Fecha de muerte: Date of death (Note: This field has a high percentage of missing values, approximately 97%).
- Código DIVIPOLA departamento: Geographical code for the department.
- Nombre departamento: Name of the department (e.g., BOGOTA, ANTIOQUIA).
- Código DIVIPOLA municipio: Geographical code for the municipality.
- Nombre municipio: Name of the municipality (e.g., BOGOTA, MEDELLIN).
- Edad: Age of the individual.
- Unidad de medida de edad: Unit used to measure age (e.g., years, months).
- Sexo: Sex of the individual (Male or Female).
- Tipo de contagio: Classification of contagion (e.g., Comunitaria (Community), Relacionado (Related)).
- Ubicación del caso: Location status of the case (e.g., Casa (Home), Fallecido (Deceased)).
- Estado: Clinical condition of the patient (e.g., Leve (Mild), Fallecido (Deceased)).
- Recuperado: Final recovery status (e.g., Recuperado, Fallecido).
- Tipo de recuperación: Method used to determine recovery (e.g., Tiempo (Time elapsed), PCR).
- Pertenencia étnica: Code indicating ethnic belonging.
- Nombre del grupo étnico: Name of the ethnic group (Note: This field has a very high percentage of missing values, nearly 99%).
- Código ISO del país: ISO code for the country (Highly missing).
- Nombre del país: Name of the country (Highly missing).
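Several fields above are noted as largely empty. A minimal sketch for checking the share of missing values per column, streaming the CSV row by row so the file never has to fit in memory, might look like this. It assumes that missing values appear as empty strings, which this listing does not confirm; the function name `missing_share` and the in-memory sample are illustrative only.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def missing_share(handle: TextIO) -> dict[str, float]:
    """Return the fraction of empty values per column of a CSV stream."""
    reader = csv.DictReader(handle)
    missing = Counter()
    total = 0
    for row in reader:
        total += 1
        for column, value in row.items():
            if column is None:
                continue  # surplus fields beyond the header are ignored
            # Assumption: a missing value is an empty (or whitespace-only) cell.
            if value is None or value.strip() == "":
                missing[column] += 1
    if total == 0:
        return {}
    return {column: missing[column] / total for column in reader.fieldnames}


# Tiny in-memory sample with invented rows, standing in for the real file.
sample = io.StringIO(
    "ID de caso,Fecha de muerte,Sexo\n"
    "1,,F\n"
    "2,2021-01-05,M\n"
    "3,,F\n"
    "4,,M\n"
)
shares = missing_share(sample)
```

For the real file, the same function can be passed a handle from `open(path, newline="", encoding=...)`; the correct encoding is not stated in this listing and would need to be checked.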
Distribution
The dataset is provided as a Full Dataset file (COVID19 COLOMBIA - Complete Dataset DEC-2023.csv) with a size of 1.18 GB. It contains approximately 6.39 million records across 23 columns. The data was last updated on 27 December 2023.
Usage
This data product is ideal for applications focused on epidemiological modelling, public health research, and geographical impact analysis. Specific use cases include:
- Developing predictive models for case trends or mortality rates.
- Analysing the effectiveness of public health interventions over time.
- Investigating geographical disparities in case reporting and outcomes across Colombian departments and municipalities.
- Studying demographic vulnerability related to age, sex, and ethnicity.
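As a starting point for the geographical comparison mentioned above, the share of recorded cases marked as deceased can be tallied per department. This is a rough sketch: it assumes the outcome is recorded as the literal value Fallecido in the Recuperado column, as in the column examples, and it yields a crude proportion of reported cases, not an adjusted mortality rate. The function name and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO


def fatality_share_by_department(handle: TextIO) -> dict[str, float]:
    """Share of rows per department whose Recuperado value is Fallecido."""
    counts = defaultdict(lambda: [0, 0])  # department -> [deaths, cases]
    for row in csv.DictReader(handle):
        department = (row.get("Nombre departamento") or "").strip().upper()
        outcome = (row.get("Recuperado") or "").strip().casefold()
        counts[department][1] += 1
        if outcome == "fallecido":
            counts[department][0] += 1
    return {dept: deaths / cases for dept, (deaths, cases) in counts.items()}


# Invented sample rows, standing in for the real file.
sample = io.StringIO(
    "Nombre departamento,Recuperado\n"
    "BOGOTA,Recuperado\n"
    "BOGOTA,Fallecido\n"
    "BOGOTA,Recuperado\n"
    "BOGOTA,Recuperado\n"
    "ANTIOQUIA,Recuperado\n"
    "ANTIOQUIA,Recuperado\n"
)
by_department = fatality_share_by_department(sample)
```

Department names are upper-cased and stripped here so that trivial spelling variants do not split a department into several groups.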
Coverage
Geographical Scope: National coverage for Colombia, with geographical codes and names included for 41 departments and 1057 municipalities. By department, BOGOTA accounts for 30% of entries and ANTIOQUIA for 15%.
Time Range: Case records span from March 2020, early in the pandemic, to the last update in December 2023. The web report date (Fecha de reporte web) ranges from 6 March 2020 to 26 December 2023.
Demographic Scope: Contains records for individuals ranging in age from 1 to 114 years. Data is categorised by sex (53% Female, 47% Male) and includes ethnicity details, although specific ethnic group names are largely unavailable.
License
CC BY-SA 4.0
Who Can Use It
- Data Scientists and Machine Learning Engineers: For training predictive models using the raw data and the more than 30 engineered features of the derived version.
- Epidemiologists: For tracking disease spread, notification lags, and mortality trends.
- Government Analysts: For informing regional public health strategy and resource allocation based on detailed departmental and municipal statistics.
- Academics: For conducting research into the socio-demographic factors influencing COVID-19 outcomes in Colombia.
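For the notification-lag analysis mentioned for epidemiologists, a lag in days can be derived from two of the date columns. The sketch below assumes that dates are stored in a form beginning with YYYY-MM-DD (optionally followed by a time), which this listing does not state; rows with missing or unparsable dates are skipped. The helper names and sample rows are illustrative.

```python
from datetime import date, datetime
from statistics import median
from typing import Optional


def parse_day(text: Optional[str]) -> Optional[date]:
    """Parse the leading YYYY-MM-DD of a cell; return None if absent or invalid."""
    if not text or not text.strip():
        return None
    try:
        # Assumption: the first 10 characters hold an ISO-style date.
        return datetime.strptime(text.strip()[:10], "%Y-%m-%d").date()
    except ValueError:
        return None


def notification_lag_days(row: dict) -> Optional[int]:
    """Days from symptom onset to official notification, if both dates parse."""
    onset = parse_day(row.get("Fecha de inicio de síntomas"))
    notified = parse_day(row.get("Fecha de notificación"))
    if onset is None or notified is None:
        return None
    return (notified - onset).days


# Invented sample rows, standing in for rows read with csv.DictReader.
rows = [
    {"Fecha de inicio de síntomas": "2020-03-02", "Fecha de notificación": "2020-03-06"},
    {"Fecha de inicio de síntomas": "", "Fecha de notificación": "2020-04-01"},
    {"Fecha de inicio de síntomas": "2021-01-10 00:00:00",
     "Fecha de notificación": "2021-01-12 00:00:00"},
]
lags = [lag for lag in map(notification_lag_days, rows) if lag is not None]
median_lag = median(lags)
```

Negative lags are possible if notification precedes the recorded onset date; whether to keep or filter such rows is an analysis decision left open here.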
Dataset Name Suggestions
- Colombia COVID-19 Case Data (Dec 2023)
- Colombian Positive COVID-19 Cases with Feature Engineering
- National COVID-19 Epidemiology Dataset (Colombia)
- In-Depth Colombian COVID-19 Public Health Records
Attributes
Original Data Source: National COVID-19 Epidemiology Dataset
