Opendatabay APP

Diabetes Risk Prediction Clinical Dataset

Patient Health Records & Digital Health

Tags and Keywords

Health

Diabetes

Cardiovascular

Kidney

Prediction

Free

About

This dataset contains clinical records from patients, with measurements used to assess cardiovascular health and kidney function. It supports evaluating the risk of heart disease, diabetes, and associated kidney impairment, and was created for the research and development of risk prediction models for these conditions. With relevant patient characteristics and a clear diagnosis label, it is suitable for building and testing predictive models.

Columns

  • Age: The patient's age in years. Age is a recognised risk factor for diabetes, with risk generally increasing as a person gets older.
  • Gender: The patient's gender (Male/Female), which can be a useful predictor of diabetes risk.
  • BMI: Body Mass Index, a measurement that uses a person's height and weight to categorise them as normal weight, overweight, or obese.
  • Chol: The total cholesterol level in the blood, measured in mg/dL.
  • TG: Triglycerides, a type of fat found in the blood, measured in mg/dL.
  • HDL: High-density lipoprotein, known as "good" cholesterol, which aids in transporting excess cholesterol from body tissues back to the liver for processing or excretion (mg/dL).
  • LDL: Low-density lipoprotein, known as "bad" cholesterol, which can contribute to plaque buildup in arteries, thereby increasing the risk of heart disease and stroke (mg/dL).
  • Cr: Creatinine, a waste product of muscle metabolism that is filtered and excreted from the body via the kidneys (mg/dL).
  • BUN: Blood Urea Nitrogen, a blood test that measures the amount of urea nitrogen present in the blood (mg/dL).
  • Diagnosis: An indicator of whether a patient has diabetes (1: Yes/ 0: No).
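A minimal sketch of how the columns above might be read into typed records, using only the Python standard library. The header spellings, the Male/Female text encoding, and the two sample rows are assumptions made for illustration; the sample rows are invented and are not taken from the dataset.

```python
import csv
import io

# Column names follow the listing; their exact spelling in the CSV is an assumption.
NUMERIC_COLUMNS = ["Age", "BMI", "Chol", "TG", "HDL", "LDL", "Cr", "BUN"]


def load_records(stream):
    """Parse CSV text into a list of dicts with numeric fields converted."""
    records = []
    for row in csv.DictReader(stream):
        record = {name: float(row[name]) for name in NUMERIC_COLUMNS}
        record["Gender"] = row["Gender"].strip()
        record["Diagnosis"] = int(row["Diagnosis"])  # 1: diabetes, 0: no diabetes
        records.append(record)
    return records


# Two made-up rows, for illustration only; these are not taken from the dataset.
SAMPLE_CSV = """Age,Gender,BMI,Chol,TG,HDL,LDL,Cr,BUN,Diagnosis
50,Male,24.5,4.9,1.7,1.2,2.9,70.0,4.9,0
62,Female,29.1,5.6,2.3,1.0,3.4,82.5,6.1,1
"""

records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), records[1]["Diagnosis"])
```

With the downloaded file, the same function could be called on an open file handle, for example `open("diabetes.csv", newline="")`, where the file name is hypothetical.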

Distribution

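The dataset contains 5132 records across 11 columns, with all values reported as valid and no mismatched or missing data. The data file is provided in CSV format. The listing gives the lipid, creatinine, and BUN measurements in mg/dL, but the reported magnitudes (for example, a mean Chol of 4.87 and a mean Cr of 71.1) are more typical of mmol/L and µmol/L respectively, so the units should be confirmed before use.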
  • Age: Valid 5132 entries, Mean 49 years, Standard Deviation 14, Minimum 20, Maximum 93.
  • Gender: Valid 5132 entries, 63% Male, 37% Female.
  • BMI: Valid 5132 entries, Mean 24.6, Standard Deviation 4.28, Minimum 15, Maximum 47.
  • Chol: Valid 5132 entries, Mean 4.87 mg/dL, Standard Deviation 1, Minimum 0, Maximum 11.7.
  • TG: Valid 5132 entries, Mean 1.72 mg/dL, Standard Deviation 1.33, Minimum 0, Maximum 32.6.
  • HDL: Valid 5132 entries, Mean 1.59 mg/dL, Standard Deviation 1.04, Minimum 0, Maximum 9.9.
  • LDL: Valid 5132 entries, Mean 2.91 mg/dL, Standard Deviation 0.95, Minimum 0.3, Maximum 9.9.
  • Cr: Valid 5132 entries, Mean 71.1 mg/dL, Standard Deviation 28.5, Minimum 4.86, Maximum 800.
  • BUN: Valid 5132 entries, Mean 4.9 mg/dL, Standard Deviation 1.69, Minimum 0.5, Maximum 38.9.
  • Diagnosis: Valid 5132 entries, Mean 0.39, Standard Deviation 0.49, Minimum 0, Maximum 1. There are 3139 cases without diabetes (0) and 1993 cases with diabetes (1).
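The Diagnosis summary can be cross-checked from the two class counts alone. The sketch below assumes only the counts reported above and the standard formula for the standard deviation of a 0/1 variable; the variable names are illustrative.

```python
import math

# Counts as reported in the listing.
n_negative = 3139  # Diagnosis = 0
n_positive = 1993  # Diagnosis = 1
n_total = n_negative + n_positive

# The mean of a 0/1 column is the share of positive cases.
prevalence = n_positive / n_total
# Population standard deviation of a 0/1 variable with mean p is sqrt(p * (1 - p)).
std_dev = math.sqrt(prevalence * (1 - prevalence))
# Accuracy of always predicting the majority class (0).
majority_baseline = n_negative / n_total

print(n_total, round(prevalence, 2), round(std_dev, 2), round(majority_baseline, 2))
# prints: 5132 0.39 0.49 0.61
```

The counts sum to the stated 5132 records and reproduce the reported mean and standard deviation. The last figure is a useful reference point: a model that always predicts "no diabetes" would already be right about 61% of the time, so accuracy alone can flatter a weak classifier.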

Usage

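This dataset is suited to building and testing prediction models for heart disease, diabetes, and impaired kidney function, and to evaluating the risk of these conditions in patient populations. The only label column is Diagnosis (diabetes), so supervised models can be trained directly for diabetes; the lipid markers (Chol, TG, HDL, LDL) and kidney markers (Cr, BUN) serve as features relevant to cardiovascular and kidney risk.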
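Before fitting any model, it usually helps to fix a held-out split and a trivial baseline to compare against. The following is a sketch under stated assumptions, not a prescribed workflow: the helper names `stratified_split` and `majority_baseline_accuracy`, the 20% test fraction, and the seed are all illustrative choices, and the labels are rebuilt from the class counts in the listing rather than read from the file.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions similar in both."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(label == majority for label in test_labels) / len(test_labels)


# Labels rebuilt from the class counts in the listing; row order is arbitrary.
labels = [0] * 3139 + [1] * 1993
train_idx, test_idx = stratified_split(labels)
accuracy = majority_baseline_accuracy(
    [labels[i] for i in train_idx], [labels[i] for i in test_idx]
)
print(len(train_idx), len(test_idx), round(accuracy, 3))
```

Stratifying by Diagnosis keeps the roughly 61/39 class ratio in both partitions, so any classifier trained on the real features can be judged against the same baseline on the same test indices.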

Coverage

The dataset includes two demographic fields: age, ranging from 20 to 93 years, and gender, with 63% male and 37% female patients. The geographic location and time range of data collection are not specified.

License

CC0: Public Domain

Who Can Use It

This dataset is intended for researchers and developers focused on creating and testing predictive models for health conditions. It is also suitable for data scientists engaged in diabetes classification and for anyone interested in assessing health risks related to cardiovascular and kidney function.

Dataset Name Suggestions

  • Cardiovascular and Kidney Health Patient Data
  • Diabetes Risk Prediction Clinical Dataset
  • Patient Health Metrics for Predictive Modelling
  • Clinical Health Risk Assessment Data

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 30/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
