Patient Heart Condition Dataset
Patient Health Records & Digital Health
Free
About
This dataset provides refined data for predicting heart disease in individuals. It originates from the Cleveland Heart Disease dataset in the UCI repository and is intended for a classification task: determining whether an individual has heart disease (represented by 1) or is considered normal (represented by 0). This makes it a useful resource for developing predictive models in cardiovascular health.
Columns
This dataset includes 13 attributes and a target variable; 8 of the attributes are nominal and 5 are numeric. A detailed description of each feature is provided below:
- Age: Patient's age in years (Numeric). The mean age is 54.4 years, with a standard deviation of 9.02, and values range from 29 to 77 years.
- Sex: Gender, where 1 denotes Male and 0 denotes Female (Nominal). The dataset contains 97 females and 206 males.
- cp: Type of chest pain experienced by the patient, categorised into 4 types: 0 (typical angina), 1 (atypical angina), 2 (non-anginal pain), and 3 (asymptomatic) (Nominal). Asymptomatic pain is the most frequent type, with 144 counts.
- trestbps: Patient's resting blood pressure in mm Hg (Numeric). The mean resting blood pressure is 132 mm Hg, with a standard deviation of 17.6, ranging from 94 to 200 mm Hg.
- chol: Serum cholesterol in mg/dl (Numeric). The mean serum cholesterol is 247 mg/dl, with a standard deviation of 51.7, ranging from 126 to 564 mg/dl.
- fbs: Whether fasting blood sugar exceeds 120 mg/dl, represented as 1 (true) or 0 (false) (Nominal). There are 45 instances where fasting blood sugar is greater than 120 mg/dl.
- restecg: Result of electrocardiogram while at rest, with 3 distinct values: 0 (Normal), 1 (having ST-T wave abnormality), and 2 (showing probable or definite left ventricular hypertrophy by Estes' criteria) (Nominal). Both normal and left ventricular hypertrophy results are well represented.
- thalach: Maximum heart rate achieved (Numeric). The mean maximum heart rate is 150, with a standard deviation of 22.8, ranging from 71 to 202.
- exang: Angina induced by exercise, where 0 denotes No and 1 denotes Yes (Nominal). 99 individuals experienced exercise-induced angina.
- oldpeak: ST depression induced by exercise, relative to rest (Numeric). The mean oldpeak is 1.04, with a standard deviation of 1.16, ranging from 0 to 6.2.
- slope: Slope of the ST segment at peak exercise, with 3 values: 0 (upsloping), 1 (flat), and 2 (downsloping) (Nominal). Upsloping and flat slopes are the most common.
- ca: The number of major vessels (0–3) colored by fluoroscopy (Nominal). The most frequent value is 0, with 180 counts.
- thal: A blood disorder called thalassemia, with 4 values: 0 (NULL), 1 (normal blood flow), 2 (fixed defect), and 3 (reversible defect) (Nominal). Normal blood flow is the most frequent category.
- target: The variable to be predicted, where 1 means the patient is suffering from heart disease and 0 means the patient is normal. There are 139 instances of heart disease presence and 164 of absence.
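The integer codes listed above can be turned back into readable labels when exploring the data. The following is a minimal sketch in Python, using only the code-to-label pairs given in the column descriptions. The lowercase column names (for example `sex` rather than `Sex`) and the helper name `decode_record` are assumptions made for illustration; the actual CSV header may be spelled differently.

```python
# Code-to-label maps transcribed from the column descriptions above.
# Column names are assumed to be lowercase; adjust them to the real CSV header.
CODE_LABELS = {
    "sex": {0: "female", 1: "male"},
    "cp": {0: "typical angina", 1: "atypical angina",
           2: "non-anginal pain", 3: "asymptomatic"},
    "fbs": {0: "not > 120 mg/dl", 1: "> 120 mg/dl"},
    "restecg": {0: "normal", 1: "ST-T wave abnormality",
                2: "left ventricular hypertrophy"},
    "exang": {0: "no", 1: "yes"},
    "slope": {0: "upsloping", 1: "flat", 2: "downsloping"},
    "thal": {0: "null", 1: "normal blood flow",
             2: "fixed defect", 3: "reversible defect"},
    "target": {0: "normal", 1: "heart disease"},
}


def decode_record(record):
    """Return a copy of `record` with nominal integer codes replaced by labels.

    Columns without a mapping (the numeric ones, and `ca`, which is a count)
    pass through unchanged. An unknown code raises ValueError.
    """
    decoded = {}
    for column, value in record.items():
        labels = CODE_LABELS.get(column)
        if labels is None:
            decoded[column] = value
            continue
        code = int(value)
        if code not in labels:
            raise ValueError(f"unexpected code {value!r} in column {column!r}")
        decoded[column] = labels[code]
    return decoded
```

Because values read from a CSV file arrive as strings, the helper converts each code with `int()` before the lookup, so both `"3"` and `3` decode to the same label.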
Distribution
The dataset is typically provided in CSV format (Heart_disease_cleveland_new.csv). It consists of 303 individuals' data and 14 columns. There are no missing values in any attribute, and each column has 303 valid observations. The total file size is 11.33 kB.
Usage
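As a first step with the file, the structure described under Distribution (14 columns, 303 rows, no missing values) can be checked with a short script. The sketch below uses only Python's standard `csv` module; the function name `check_structure` and its parameters are illustrative assumptions, not something shipped with the dataset.

```python
import csv

EXPECTED_COLUMNS = 14  # 13 attributes plus the target, per the description
EXPECTED_ROWS = 303    # one row per individual, per the description


def check_structure(lines, expected_columns=EXPECTED_COLUMNS,
                    expected_rows=EXPECTED_ROWS):
    """Return a list of problems found; an empty list means the file matches.

    `lines` is any iterable of text lines, such as an open file object.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]

    problems = []
    if len(header) != expected_columns:
        problems.append(
            f"expected {expected_columns} columns, found {len(header)}")

    n_rows = 0
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore completely blank lines
        n_rows += 1
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: {len(row)} fields, header has {len(header)}")
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: empty value")

    if n_rows != expected_rows:
        problems.append(f"expected {expected_rows} data rows, found {n_rows}")
    return problems
```

To run it against the real file, open it with `open("Heart_disease_cleveland_new.csv", newline="")` and pass the file object to `check_structure`; an empty result means the file agrees with the figures quoted above.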
This dataset is ideal for a variety of applications, including:
- Developing and evaluating machine learning models for heart disease prediction.
- Medical research into factors influencing cardiovascular health.
- Creating patient risk assessment tools for early diagnosis.
- Educational purposes in data science and medical informatics, offering a practical classification problem.
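When evaluating a model on this data, it helps to know the accuracy of the simplest possible predictor: always guessing the more common class. The sketch below works this out from the class counts quoted in the target description (164 normal, 139 with heart disease). The helper name `majority_baseline` is an assumption for illustration, and the label list is rebuilt from those counts rather than read from the file.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (majority_label, accuracy) for always predicting the most common label."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    label, count = counts.most_common(1)[0]
    return label, count / sum(counts.values())


# Class counts as stated in the target description:
# 164 normal (0) and 139 heart disease (1).
labels = [0] * 164 + [1] * 139
label, accuracy = majority_baseline(labels)
print(label, round(accuracy, 3))  # prints: 0 0.541
```

A classifier trained on the 13 attributes should therefore be expected to exceed roughly 54% accuracy before it is considered informative.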
Coverage
The dataset focuses on patient data collected in Cleveland and includes information from 303 individuals. The data was contributed by creators from several institutions: the Hungarian Institute of Cardiology, University Hospital Zurich, University Hospital Basel, V.A. Medical Center, Long Beach, and the Cleveland Clinic Foundation. The sources give no specific time range for data collection.
License
CC BY-SA 4.0
Who Can Use It
This dataset is suitable for:
- Data Scientists and Machine Learning Engineers: To build and benchmark predictive models for health outcomes.
- Medical Researchers: To study correlations between various health indicators and heart disease.
- Healthcare Analysts: To inform strategies for early diagnosis and preventative care.
- Students: For academic projects and learning about classification problems in healthcare.
Dataset Name Suggestions
- Cleveland Heart Disease Prediction
- Cardiovascular Health Indicators (Cleveland)
- Patient Heart Condition Dataset
- Heart Disease Diagnosis Data
- Cleveland Clinic Heart Data
Attribution
Original Data Source: Patient Heart Condition Dataset