ML-Ready Heart Condition Risk Dataset
Patient Health Records & Digital Health
Free
About
A cleaned and preprocessed dataset for predicting heart attack risk. It combines demographic, clinical, and lifestyle variables, which makes it suitable for machine learning, deep learning, and statistical analysis in healthcare. Each record pairs these input features with a binary heart attack risk outcome, supporting the development of preventive risk models.
Columns
The dataset contains 27 columns (25 input features and 2 target columns):
- Demographic & Clinical Inputs: Age, Gender (coded as 0 for Female, 1 for Male, along with an 'Other' category), Cholesterol (normalized), Heart rate (normalized resting heart rate), Diabetes (binary), Smoking (binary), Obesity (binary), Previous Heart Problems (binary), Medication Use (binary), Family History (heart disease in the family), Stress Level (normalized), BMI (normalized), Triglycerides (normalized), Income (normalized), Systolic blood pressure (normalized), and Diastolic blood pressure (normalized).
- Lifestyle Indicators: Alcohol Consumption (frequency), Exercise Hours Per Week, Diet (categorical dietary habits), Sedentary Hours Per Day, Sleep Hours Per Day, and Physical Activity Days Per Week.
- Biomarkers & Results: Blood sugar (normalized), CK-MB (Creatine Kinase-MB enzyme level), and Troponin (Troponin enzyme level).
- Target Variables: Heart Attack Risk (Binary: 0 = Low Risk, 1 = High Risk) and Heart Attack Risk (Text), a text label for the same outcome.
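The two target columns describe the same outcome, so only one should serve as the label, and neither should remain among the model inputs, otherwise the model can read the answer directly. The Python sketch below assumes the CSV header spells the targets as `Heart Attack Risk (Binary)` and `Heart Attack Risk (Text)`, that the binary condition columns use the hypothetical header names listed in `BINARY_COLS`, and that missing values are empty cells; check all of these against the actual file.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical header names; confirm them against the real CSV header.
TARGET_COL = "Heart Attack Risk (Binary)"
TEXT_TARGET_COL = "Heart Attack Risk (Text)"
BINARY_COLS = ("Diabetes", "Smoking", "Obesity",
               "Previous Heart Problems", "Medication Use")

Value = Optional[Union[float, str]]


def parse_value(raw: str) -> Value:
    """Empty cell -> None, numeric text -> float, anything else kept as text."""
    text = raw.strip()
    if text == "":
        return None
    try:
        return float(text)
    except ValueError:
        return text  # e.g. a categorical label such as a Diet category


def split_row(row: Dict[str, str]) -> Tuple[Dict[str, Value], int]:
    """Separate one CSV row into input features and the 0/1 target.

    Both target columns are excluded from the features so the label
    cannot leak into the model inputs.
    """
    raw_label = float(row[TARGET_COL])
    if raw_label not in (0.0, 1.0):
        raise ValueError(f"unexpected target value: {row[TARGET_COL]!r}")
    label = int(raw_label)

    features: Dict[str, Value] = {}
    for name, raw in row.items():
        if name in (TARGET_COL, TEXT_TARGET_COL):
            continue  # never feed either target column back in as input
        value = parse_value(raw)
        # Columns documented as binary may only hold 0, 1, or a missing value.
        if name in BINARY_COLS and value not in (None, 0.0, 1.0):
            raise ValueError(f"{name} should be binary, got {raw!r}")
        features[name] = value
    return features, label
```

Non-numeric values are kept as strings rather than rejected, because the listing does not say how Diet or the 'Other' Gender category are stored in the file.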
Distribution
The dataset is provided as a CSV file with 9,651 records and 27 columns. Most columns have no missing values; however, some condition and lifestyle indicators, such as Diabetes, Smoking, Obesity, and Stress Level, have about 3% missing values (274 of 9,651 records), as reported in the source's distribution statistics.
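A quick way to confirm the record count and the missing-value figures after downloading is to profile the file column by column. This standard-library sketch assumes missing values are stored as empty cells (the listing does not say which marker is used); adjust the check if the file uses something like "NA".

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def profile_missing(handle: TextIO) -> Tuple[int, int, Dict[str, int]]:
    """Return (row count, column count, missing-value count per column).

    A cell counts as missing if it is empty/whitespace, or absent because
    the row is shorter than the header (DictReader fills it with None).
    """
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in columns:
            value = row.get(name)
            if value is None or value.strip() == "":
                missing[name] += 1
    return n_rows, len(columns), missing


if __name__ == "__main__":
    # Tiny in-memory sample; for the real data, pass
    # open("<downloaded file>.csv", newline="") instead.
    sample = "Age,Diabetes,Smoking\n0.5,1,\n0.3,,0\n"
    rows, cols, missing = profile_missing(io.StringIO(sample))
    print(f"{rows} rows, {cols} columns")
    for name, count in missing.items():
        if count:
            print(f"{name}: {count}/{rows} missing ({count / rows:.1%})")
```

On the full file, 274 missing entries out of 9,651 rows works out to about 2.8%, consistent with the roughly 3% figure above.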
Usage
The dataset is suited to several applications:
- Training machine learning models to predict an individual's likelihood of experiencing a heart attack.
- Conducting detailed data analysis to isolate and quantify the influence of various key risk factors on cardiovascular health outcomes.
- Building AI-driven systems for proactive health monitoring and patient risk stratification.
- Serving as a foundational resource for medical research, epidemiological studies, and academic projects focused on preventive cardiology.
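Because Heart Attack Risk is a binary outcome and the listing does not state how balanced the two classes are, a stratified hold-out split is a sensible first step before training a classifier: it keeps the share of high-risk records similar in the training and test sets. The following is a standard-library sketch of that general technique, not a procedure prescribed by the dataset provider; scikit-learn's `train_test_split(..., stratify=y)` covers the same need if that dependency is acceptable.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_split(
    items: Sequence[T],
    labels: Sequence[int],
    test_fraction: float = 0.2,
    seed: int = 42,
) -> Tuple[List[T], List[T]]:
    """Split items into (train, test), preserving each label's proportion.

    Each class is shuffled and split separately, so a 0/1 target keeps
    a similar positive rate in both sets. Input sequences are not modified.
    """
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")

    # Group items by their class label.
    by_label: Dict[int, List[T]] = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[T] = []
    test: List[T] = []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

In practice, `items` would be the feature rows and `labels` the values of the binary target column; the `test_fraction` and `seed` defaults are arbitrary choices.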
Coverage
The dataset covers health and lifestyle factors relevant to cardiovascular risk, including demographic inputs such as age and gender. The metadata does not specify the geographic region or the time period the records come from; the variables are typical of those collected in clinical and lifestyle health assessments.
License
CC BY-SA 4.0
Who Can Use It
- Machine Learning Engineers: For creating predictive models in the healthcare sector.
- Public Health Analysts: To study population-level risk patterns associated with lifestyle and medical metrics.
- Clinical Data Scientists: To identify patients at elevated risk for cardiac events based on their profiles.
- Researchers and Educators: For teaching and research on medical diagnostics and risk prediction.
Dataset Name Suggestions
- Cardiovascular Risk Factor Prediction Data
- Clinical Features for Heart Attack Modelling
- Patient Lifestyle and Health Metrics
- ML-Ready Heart Condition Risk Dataset
Attributes
Original Data Source: ML-Ready Heart Condition Risk Dataset
