Opendatabay APP

Cholesterol and Heart Disease Prediction

Patient Health Records & Digital Health

Tags and Keywords

Heart

Disease

Cholesterol

Diagnosis

Cardiology

Free

About

This dataset provides clinical information for heart disease diagnosis, drawn from data collected at the Cleveland Clinic Foundation. Its primary purpose is to enable prediction of the presence of heart disease from patient attributes. The dataset is notable for treating the serum cholesterol level as a key class attribute for analysis. It has been widely used in machine learning experiments on numeric prediction and conceptual clustering, and on distinguishing between the presence and absence of heart disease in patients.

Columns

The dataset contains 14 key attributes:
  • age: Patient's age in years.
  • sex: Patient's sex (1 for male, 0 for female).
  • cp: Chest pain type, categorised into typical angina, atypical angina, non-anginal pain, or asymptomatic.
  • trestbps: Resting blood pressure of the patient (in mm Hg upon hospital admission).
  • chol: Serum cholesterol level (in mg/dl), often treated as a class attribute.
  • fbs: Fasting blood sugar level (1 if > 120 mg/dl, 0 otherwise).
  • restecg: Resting electrocardiographic results, indicating normal, ST-T wave abnormality, or probable/definite left ventricular hypertrophy.
  • thalach: Maximum heart rate achieved during exercise.
  • exang: Exercise-induced angina (1 for yes, 0 for no).
  • oldpeak: ST depression induced by exercise relative to rest.
  • slope: The slope of the peak exercise ST segment, classified as upsloping, flat, or downsloping.
  • ca: Number of major vessels (0-3) coloured by fluoroscopy.
  • thal: Thallium scan results, indicating normal, a fixed defect, or a reversible defect.
  • num: Diagnosis of heart disease (angiographic disease status), with 0 indicating < 50% diameter narrowing and 1 indicating > 50% diameter narrowing in any major vessel. This is the predicted attribute for the diagnosis task.
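Before any analysis, it can help to confirm that a file's header matches the 14 attributes listed above. The following is a minimal Python sketch of such a check. The names EXPECTED_COLUMNS and check_header are illustrative, and the sketch assumes the CSV header uses exactly the attribute names shown in this listing, which may not hold for every copy of the file.

```python
# Attribute names as given in the Columns section of this listing.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def check_header(header):
    """Compare a CSV header against the expected attribute names.

    Returns a (missing, unexpected) pair of lists. Both are empty when
    the header contains exactly the expected names (in any order).
    """
    names = [name.strip() for name in header]
    missing = [col for col in EXPECTED_COLUMNS if col not in names]
    unexpected = [name for name in names if name not in EXPECTED_COLUMNS]
    return missing, unexpected
```

Calling check_header on the first row returned by csv.reader gives two lists; if either is non-empty, the file differs from the schema described here and column names should be mapped by hand.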

Distribution

The dataset is provided as a CSV file; the sample file is named dataset_2190_cholesterol.csv. It contains 303 records, each representing one patient. All attributes are numeric-valued. Missing attribute values are indicated by the value -9.0.
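Because missing values are encoded as -9.0 rather than left blank, they should be converted before computing any statistics; otherwise the sentinel skews means and model fits. The sketch below shows one way to do this with the Python standard library. The function name read_rows is hypothetical, and the code assumes every cell parses as a number, as the listing states; a file that marks missing values differently would need extra handling.

```python
import csv
import io

# Sentinel used for missing values, as stated in this listing.
MISSING_SENTINEL = -9.0


def read_rows(csv_text):
    """Parse CSV text into a list of dicts, mapping the sentinel to None.

    Assumes a header row and purely numeric cells (an assumption taken
    from the listing, not verified against the file itself).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        row = {}
        for name, raw in record.items():
            value = float(raw)
            row[name] = None if value == MISSING_SENTINEL else value
        rows.append(row)
    return rows
```

To use it on the downloaded file, read the file contents into a string and pass that string to read_rows; rows with None in a column of interest can then be dropped or imputed.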

Usage

This dataset is ideal for:
  • Developing and testing machine learning models for heart disease diagnosis.
  • Researching numeric prediction techniques, particularly with cholesterol as a target variable.
  • Exploring classification tasks to distinguish between heart disease presence and absence.
  • Conducting conceptual clustering experiments in medical informatics.
  • Educational purposes in statistics, data science, and artificial intelligence, showcasing real-world medical data.
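As an illustration of numeric prediction with cholesterol as the target, the sketch below fits a one-feature least-squares line, for example chol as a function of age. This is a deliberately simple baseline chosen for illustration, not a method taken from the dataset's documentation; the function names fit_line and predict are hypothetical, and the inputs are assumed to be lists of numbers with missing values already removed.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    xs and ys are equal-length sequences of numbers, e.g. age and chol.
    Returns a (slope, intercept) pair.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Predict the target for a single x using a fitted (slope, intercept)."""
    slope, intercept = model
    return intercept + slope * x
```

A baseline like this gives a reference error level against which richer models that use all 13 remaining attributes can be compared.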

Coverage

The data was collected from the Cleveland Clinic Foundation in July 1988. Although it originates from patient records, sensitive information such as names and social security numbers has been replaced with dummy values. The dataset comprises the 14 cardiac-related attributes that previous research has consistently used, selected from a larger set of 76 raw attributes.

License

CC0: Public Domain

Who Can Use It

  • Machine learning researchers and practitioners.
  • Medical data analysts and bioinformaticians.
  • Students and academics in fields such as health informatics, statistics, and artificial intelligence.
  • Public health researchers interested in cardiovascular disease patterns.
  • Developers building diagnostic support systems or predictive health tools.

Dataset Name Suggestions

  • Cleveland Heart Disease Predictor
  • Cardiovascular Diagnosis Dataset (Cleveland)
  • Heart Disease Patients (Cleveland)
  • Cleveland Clinic Cardiology Data
  • Cholesterol and Heart Disease Prediction

Listing Stats

VIEWS

2

DOWNLOADS

0

LISTED

13/08/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
