COVID-19 Patient Outcome Prediction Dataset
Public Health & Epidemiology
Free
About
This dataset provides anonymised patient-level information, including pre-existing conditions, to support the prediction of COVID-19 patient risk. It was created to help healthcare providers anticipate the medical resources a patient will need at the time they test positive for COVID-19, or even earlier, so that those resources can be procured and arranged more efficiently. While most people experience mild to moderate respiratory illness from COVID-19, older individuals and those with underlying medical conditions such as cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. The primary objective is to facilitate the development of machine learning models that predict whether a patient is at high risk based on their current symptoms, status, and medical history.
Columns
The dataset contains 21 unique features. For Boolean features, '1' indicates "yes" and '2' indicates "no". Values '97' and '99' represent missing data.
- sex: Indicates the patient's sex ('1' for female, '2' for male).
- age: Represents the patient's age.
- classification: Describes the COVID test findings. Values '1-3' signify that the patient was diagnosed with COVID-19 across different degrees of severity, while '4' or higher indicates the patient is not a carrier or the test result is inconclusive.
- patient type: Denotes the type of care received by the patient in the unit ('1' for returned home, '2' for hospitalisation).
- pneumonia: Indicates whether the patient has inflammation of the air sacs (pneumonia).
- pregnancy: States whether the patient is pregnant.
- diabetes: Specifies whether the patient has diabetes.
- copd: Indicates whether the patient has Chronic Obstructive Pulmonary Disease.
- asthma: Indicates whether the patient has asthma.
- inmsupr: Indicates whether the patient is immunosuppressed.
- hypertension: Indicates whether the patient has hypertension.
- cardiovascular: Specifies whether the patient has a disease of the heart or blood vessels.
- renal chronic: Indicates whether the patient has chronic renal disease.
- other disease: States whether the patient has other diseases.
- obesity: Indicates whether the patient is obese.
- tobacco: Indicates whether the patient is a tobacco user.
- usmr: Shows whether the patient was treated in medical units of the first, second, or third level.
- medical unit: Identifies the type of institution within the National Health System that provided care.
- intubed: Indicates whether the patient was connected to a ventilator.
- icu: Specifies whether the patient was admitted to an Intensive Care Unit.
- date died: If the patient died, this field provides the date of death; otherwise, it is '9999-99-99'.
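The coding scheme described above can be sketched in a few helper functions. This is a minimal illustration, not part of the dataset: the function names are hypothetical, and treating any code other than '1' or '2' as missing is an assumption that goes slightly beyond the listing, which names only '97' and '99'.

```python
from typing import Optional

# Codes the listing names as missing data.
MISSING_CODES = {97, 99}
# Sentinel the listing gives for patients who did not die.
NOT_DEAD_SENTINEL = "9999-99-99"


def decode_boolean(value: int) -> Optional[bool]:
    """Map 1 -> True ("yes"), 2 -> False ("no"), anything else -> None."""
    if value == 1:
        return True
    if value == 2:
        return False
    # 97 and 99 are documented as missing; any other unexpected code is
    # also treated as missing here (an assumption, not stated in the listing).
    return None


def is_covid_positive(classification: int) -> bool:
    """Values 1 to 3 mean a COVID-19 diagnosis; 4 or higher does not."""
    return 1 <= classification <= 3


def died(date_died: str) -> bool:
    """A patient died if the field holds anything but the sentinel."""
    return date_died.strip() != NOT_DEAD_SENTINEL
```

Returning None for missing values keeps "no" and "unknown" distinct, which matters when the missing codes would otherwise be read as large numeric values.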
Distribution
The dataset is a single CSV file named "Covid Data.csv", approximately 58.45 MB in size. It comprises 21 columns and 1,048,576 patient records. The dataset's expected update frequency is 'Never'.
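A quick way to confirm that a download matches the stated shape is to count rows and columns without loading the whole file into memory. The sketch below uses only the Python standard library; the function name is hypothetical, and the file path and encoding in the usage comment are assumptions.

```python
import csv
from typing import TextIO, Tuple


def summarize_csv(handle: TextIO) -> Tuple[int, int]:
    """Return (data_row_count, column_count) for an open CSV file."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return 0, 0  # empty file
    # Count non-blank data rows one at a time, so memory use stays small.
    n_rows = sum(1 for row in reader if row)
    return n_rows, len(header)


# Usage (path and encoding are assumptions):
# with open("Covid Data.csv", newline="", encoding="utf-8") as f:
#     rows, cols = summarize_csv(f)
#     print(rows, cols)  # compare against the figures stated above
```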
Usage
This dataset is ideally suited for:
- Developing machine learning models to predict the risk level of COVID-19 patients.
- Assisting healthcare providers in forecasting resource demands for incoming patients.
- Supporting authorities in planning and arranging essential medical resources efficiently to save lives.
- Analysing correlations between pre-existing conditions, symptoms, and patient outcomes in COVID-19 cases.
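As a starting point for the correlation analysis mentioned above, the following sketch compares the death rate of patients with and without a given pre-existing condition. It is a rough descriptive comparison, not a predictive model. The column keys ('diabetes', 'date died') follow the names used in this listing and may differ from the headers in the actual file, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional

NOT_DEAD_SENTINEL = "9999-99-99"  # value given for patients who did not die


def death_rate_by_condition(
    rows: Iterable[Mapping[str, str]],
    condition: str,
    date_died_key: str = "date died",  # assumed column name
) -> Dict[bool, Optional[float]]:
    """Return {has_condition: death_rate}, skipping rows with missing codes."""
    counts = {True: [0, 0], False: [0, 0]}  # has_condition -> [deaths, total]
    for row in rows:
        code = row[condition].strip()
        if code not in ("1", "2"):
            continue  # e.g. '97' or '99', documented as missing data
        has_condition = code == "1"
        counts[has_condition][1] += 1
        if row[date_died_key].strip() != NOT_DEAD_SENTINEL:
            counts[has_condition][0] += 1
    # None signals that no rows were available for that group.
    return {k: (d / t if t else None) for k, (d, t) in counts.items()}
```

The rows can come from csv.DictReader over the downloaded file. A difference in rates between the two groups does not by itself establish causation, since age and other conditions are not controlled for here.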
Coverage
The data originates from the Mexican government and includes anonymised patient information. It covers a diverse demographic range by including features such as age, sex, and pregnancy status. The 'date died' column contains dates from the pandemic period, but no overall time range is stated for the dataset as a whole.
License
CC0: Public Domain
Who Can Use It
- Healthcare Administrators and Planners: For strategic resource allocation and emergency preparedness.
- Data Scientists and Machine Learning Engineers: For building and validating predictive models for patient risk and resource needs.
- Public Health Researchers: For epidemiological studies, understanding risk factors, and disease progression related to COVID-19.
- Government Agencies: For informing public health policies and healthcare infrastructure planning during pandemics.
Dataset Name Suggestions
- COVID-19 Patient Risk & Resource Prediction Dataset
- Mexican COVID-19 Patient Clinical Data
- COVID-19 Patient Outcome Prediction Dataset
- Healthcare Resource Planning COVID-19 Dataset
Attributes
Original Data Source: COVID-19 Patient Outcome Prediction Dataset