Opendatabay APP

ML-Ready Heart Condition Risk Dataset

Patient Health Records & Digital Health

Tags and Keywords

Prediction

Cardiac

Healthcare

Risk

Machine


Free

About

This is a cleaned and preprocessed dataset for predicting heart attack risk. It combines demographic, medical, and lifestyle variables, which makes it suitable for machine learning, deep learning, and statistical analysis in healthcare. The data is intended to support preventative models by relating the input features to a binary heart attack risk outcome.

Columns

The dataset contains 27 columns, grouped as follows:
  • Demographic & Clinical Inputs: Age, Gender (coded as 0 for Female, 1 for Male, along with an 'Other' category), Cholesterol (normalized), Heart rate (normalized resting heart rate), Diabetes (binary), Smoking (binary), Obesity (binary), Previous Heart Problems (binary), Medication Use (binary), Family History (heart disease in the family), Stress Level (normalized), BMI (normalized), Triglycerides (normalized), Income (normalized), Systolic blood pressure (normalized), and Diastolic blood pressure (normalized).
  • Lifestyle Indicators: Alcohol Consumption (frequency), Exercise Hours Per Week, Diet (categorised habits), Sedentary Hours Per Day, Sleep Hours Per Day, and Physical Activity Days Per Week.
  • Biomarkers & Results: Blood sugar (normalized), CK-MB (Creatine Kinase-MB enzyme level), and Troponin (troponin protein level).
  • Target Variables: Heart Attack Risk (Binary: 0 = Low Risk, 1 = High Risk) and Heart Attack Risk (Text).
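Because the two target columns describe the same outcome, leaving the text version among the model inputs would leak the label. The sketch below shows one way to separate inputs from the label. The header names TARGET_BINARY and TARGET_TEXT, and the example row, are assumptions made for illustration; the actual CSV header and value formats may differ.

```python
# Hypothetical column names; check the real CSV header before relying on them.
TARGET_BINARY = "Heart Attack Risk (Binary)"
TARGET_TEXT = "Heart Attack Risk (Text)"


def split_features_and_label(row):
    """Return (features, label) for one record given as a dict.

    Both target columns are removed from the features so that the text
    label cannot leak the outcome into the model inputs.
    """
    features = {
        name: value
        for name, value in row.items()
        if name not in (TARGET_BINARY, TARGET_TEXT)
    }
    # Labels may be stored as "0"/"1" or "0.0"/"1.0"; go through float first.
    label = int(float(row[TARGET_BINARY]))
    return features, label


# Made-up row for illustration only; not taken from the dataset.
example_row = {
    "Age": "0.59",
    "Smoking": "1.0",
    TARGET_BINARY: "1.0",
    TARGET_TEXT: "1",
}
features, label = split_features_and_label(example_row)
```

The same function can be applied to each row produced by csv.DictReader once the constants match the file's header.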

Distribution

The dataset is provided as a CSV file with 9651 records and 27 columns. Most features have no missing values. Some lifestyle and condition indicators, such as Diabetes, Smoking, Obesity, and Stress Level, are missing in approximately 3% of records (274 instances), as noted in the original sample distribution statistics.
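The missing-value figures can be checked after download with a short audit. This is a minimal sketch using only the Python standard library; the three-column sample is made up and stands in for the real file, and the set of strings treated as missing ("", "na", "nan") is an assumption about how gaps are encoded.

```python
import csv
import io


def missing_counts(csv_file):
    """Count missing cells per column in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    counts = {name: 0 for name in reader.fieldnames}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in counts:
            value = row[name]
            # Treat short rows, empty strings, and common NA spellings as missing.
            if value is None or value.strip().lower() in ("", "na", "nan"):
                counts[name] += 1
    return counts, total_rows


# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "Age,Diabetes,Smoking\n"
    "0.59,1,\n"
    "0.20,,0\n"
    "0.71,0,1\n"
)
counts, total_rows = missing_counts(sample)
share = {name: n / total_rows for name, n in counts.items()}

# Figures quoted in the listing: 274 missing values out of 9651 records.
listed_share = 274 / 9651  # about 0.028
```

For the real file, pass open(path, newline="") instead of the in-memory sample, where path is wherever the CSV was saved.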

Usage

This dataset can support several applications:
  • Training machine learning models to predict an individual's likelihood of experiencing a heart attack.
  • Analyzing the data to isolate and quantify the influence of key risk factors on cardiovascular outcomes.
  • Developing AI-driven systems for proactive health monitoring and patient risk stratification.
  • Serving as a resource for medical research, epidemiological studies, and academic projects on preventive cardiology.
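For the model-training use case, a common first step is a train/test split that keeps the ratio of low-risk to high-risk records the same in both parts. The listing does not state the class balance, so the sketch below is a precaution rather than a requirement. The function name stratified_split and the example labels are made up for illustration.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving the label ratio in each part."""
    rng = random.Random(seed)

    # Group record indices by their class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then carve off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up labels: 6 low-risk (0) and 4 high-risk (1) records.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
train_idx, test_idx = stratified_split(labels, test_fraction=0.5)
```

Fixing the seed makes the split reproducible, which helps when comparing models trained on the same data.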

Coverage

The dataset focuses on health and lifestyle factors relevant to cardiovascular risk, along with demographic inputs such as age and gender. The metadata does not specify geographic regions or time spans, but the variables are typical of those collected in clinical and lifestyle health assessments.

License

CC BY-SA 4.0

Who Can Use It

  • Machine Learning Engineers: For creating predictive models in the healthcare sector.
  • Public Health Analysts: To study population-level risk patterns associated with lifestyle and medical metrics.
  • Clinical Data Scientists: To identify patients at elevated risk for cardiac events based on their profiles.
  • Researchers and Educators: For teaching or performing non-commercial research on medical diagnostics and risk prediction.

Dataset Name Suggestions

  1. Cardiovascular Risk Factor Prediction Data
  2. Clinical Features for Heart Attack Modelling
  3. Patient Lifestyle and Health Metrics
  4. ML-Ready Heart Condition Risk Dataset

Attributes

Listing Stats

  • Views: 6
  • Downloads: 1
  • Listed: 14/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
