Diabetes Risk Prediction Clinical Dataset
Patient Health Records & Digital Health
Free
About
This dataset provides clinical measurements from patients, including lipid-panel and kidney-function markers used to assess cardiovascular and kidney health. These markers are relevant when evaluating the risk of heart disease, diabetes, and associated kidney impairment. The dataset was created to support the research and development of risk prediction models for these conditions. With relevant patient characteristics and a binary diabetes diagnosis label, it is suited to building and testing predictive models.
Columns
- Age: The patient's age in years. Age is a recognised risk factor for diabetes, with risk generally increasing as a person gets older.
- Gender: Indicates the patient's gender (Male/Female), which can influence diabetes prediction.
- BMI: Body Mass Index, a measurement that uses a person's height and weight to categorise them as normal weight, overweight, or obese.
- Chol: The total cholesterol level in the blood, measured in mg/dL.
- TG: Triglycerides, a type of fat found in the blood, measured in mg/dL.
- HDL: High-density lipoprotein, known as "good" cholesterol, which aids in transporting excess cholesterol from body tissues back to the liver for processing or excretion (mg/dL).
- LDL: Low-density lipoprotein, known as "bad" cholesterol, which can contribute to plaque buildup in arteries, thereby increasing the risk of heart disease and stroke (mg/dL).
- Cr: Creatinine, a waste product of muscle metabolism that is filtered and excreted from the body via the kidneys (mg/dL).
- BUN: Blood Urea Nitrogen, a blood test that measures the amount of urea nitrogen present in the blood (mg/dL).
- Diagnosis: An indicator of whether a patient has diabetes (1: Yes/ 0: No).
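The column list above can be turned into a simple schema check before any modelling. The following is a minimal sketch that assumes the CSV header uses exactly the column names shown above and that Gender is stored as text; neither is confirmed by this listing. The helper names (`parse_row`, `load_records`) and the two sample rows are hypothetical and exist only for illustration.

```python
import csv
import io

# Column names as listed above; the exact header spelling in the real file is an assumption.
NUMERIC_COLUMNS = ["Age", "BMI", "Chol", "TG", "HDL", "LDL", "Cr", "BUN"]
EXPECTED_COLUMNS = ["Age", "Gender", "BMI", "Chol", "TG", "HDL", "LDL", "Cr", "BUN", "Diagnosis"]


def parse_row(row):
    """Convert one csv.DictReader row to typed values; raise ValueError on bad data."""
    missing = [c for c in EXPECTED_COLUMNS if row.get(c) in (None, "")]
    if missing:
        raise ValueError(f"missing or empty columns: {missing}")
    parsed = {c: float(row[c]) for c in NUMERIC_COLUMNS}
    parsed["Gender"] = row["Gender"].strip()
    diagnosis = float(row["Diagnosis"])
    if diagnosis not in (0.0, 1.0):
        raise ValueError(f"Diagnosis must be 0 or 1, got {row['Diagnosis']!r}")
    parsed["Diagnosis"] = int(diagnosis)
    return parsed


def load_records(text):
    """Parse CSV text into a list of typed record dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    return [parse_row(row) for row in reader]


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """Age,Gender,BMI,Chol,TG,HDL,LDL,Cr,BUN,Diagnosis
50,Female,24.0,4.2,0.9,2.4,1.4,46.0,4.7,0
26,Male,23.0,3.7,1.4,1.1,2.1,62.0,4.5,0
"""

records = load_records(SAMPLE_CSV)
print(len(records), "records parsed")
```

To run this against the real file, read its contents with `open(path, newline="").read()` and pass the text to `load_records`; any extra column in the file is ignored by this sketch.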
Distribution
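The dataset contains 5132 records across 11 columns, ten of which are described above. Every column has 5132 valid entries, with no mismatched or missing values. The data file is typically in CSV format. Note that Chol, TG, HDL, LDL, Cr, and BUN are labelled mg/dL, but their magnitudes (for example, a mean total cholesterol of 4.87 and a mean creatinine of 71.1) are more typical of mmol/L and µmol/L respectively, so the units should be confirmed against the original source before use.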
- Age: Valid 5132 entries, Mean 49 years, Standard Deviation 14, Minimum 20, Maximum 93.
- Gender: Valid 5132 entries, 63% Male, 37% Female.
- BMI: Valid 5132 entries, Mean 24.6, Standard Deviation 4.28, Minimum 15, Maximum 47.
- Chol: Valid 5132 entries, Mean 4.87 mg/dL, Standard Deviation 1, Minimum 0, Maximum 11.7.
- TG: Valid 5132 entries, Mean 1.72 mg/dL, Standard Deviation 1.33, Minimum 0, Maximum 32.6.
- HDL: Valid 5132 entries, Mean 1.59 mg/dL, Standard Deviation 1.04, Minimum 0, Maximum 9.9.
- LDL: Valid 5132 entries, Mean 2.91 mg/dL, Standard Deviation 0.95, Minimum 0.3, Maximum 9.9.
- Cr: Valid 5132 entries, Mean 71.1 mg/dL, Standard Deviation 28.5, Minimum 4.86, Maximum 800.
- BUN: Valid 5132 entries, Mean 4.9 mg/dL, Standard Deviation 1.69, Minimum 0.5, Maximum 38.9.
- Diagnosis: Valid 5132 entries, Mean 0.39, Standard Deviation 0.49, Minimum 0, Maximum 1. There are 3139 cases without diabetes (0) and 1993 cases with diabetes (1).
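The Diagnosis summary can be cross-checked from the class counts alone: for a 0/1 column, the mean is the share of positive cases p, and the standard deviation is sqrt(p(1 - p)). The sketch below uses only the counts quoted above; the function name `binary_summary` is hypothetical.

```python
import math


def binary_summary(negatives, positives):
    """Return (total, mean, std) for a 0/1 column given its class counts."""
    total = negatives + positives
    mean = positives / total              # share of positive cases
    std = math.sqrt(mean * (1 - mean))    # population standard deviation of a 0/1 variable
    return total, mean, std


# Counts taken from the Diagnosis line above.
total, mean, std = binary_summary(negatives=3139, positives=1993)
print(total, round(mean, 2), round(std, 2))  # prints: 5132 0.39 0.49
```

The result agrees with the reported total, mean, and standard deviation, and it shows that the classes are moderately imbalanced (about 39% positive).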
Usage
This dataset is suited to building and testing diabetes prediction models, since Diagnosis is its only label. The lipid and kidney-function markers can serve as predictive features, and they also support exploratory assessment of heart disease and impaired kidney function risk in patient populations.
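When testing a model on this data, the class imbalance in Diagnosis is worth preserving in both the training and test portions, and any model should be compared against a trivial baseline. The sketch below is one possible approach, not a prescribed workflow: it performs a stratified split using only the standard library and computes the accuracy of always predicting the majority class. The labels are synthetic, built from the class counts reported in the Distribution section; the function name `stratified_split`, the 20% test fraction, and the seed are illustrative assumptions.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping each class's share similar in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels matching the reported class counts (3139 negative, 1993 positive).
labels = [0] * 3139 + [1] * 1993
train_idx, test_idx = stratified_split(labels)

# Majority-class baseline: always predict the most common training label.
train_labels = [labels[i] for i in train_idx]
majority = max(set(train_labels), key=train_labels.count)
baseline_accuracy = sum(labels[i] == majority for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(baseline_accuracy, 3))
```

A useful classifier should beat this baseline accuracy by a clear margin; for imbalanced labels, recall and precision on the positive class are usually more informative than accuracy alone.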
Coverage
The dataset includes patient demographic information such as age, ranging from 20 to 93 years, and gender distribution with 63% male and 37% female patients. No specific geographic location or time range for the data collection is specified.
License
CC0: Public Domain
Who Can Use It
This dataset is intended for researchers and developers focused on creating and testing predictive models for health conditions. It is also suitable for data scientists engaged in diabetes classification and for anyone interested in assessing health risks related to cardiovascular and kidney function.
Dataset Name Suggestions
- Cardiovascular and Kidney Health Patient Data
- Diabetes Risk Prediction Clinical Dataset
- Patient Health Metrics for Predictive Modelling
- Clinical Health Risk Assessment Data
Attributes
Original Data Source: Diabetes Risk Prediction Clinical Dataset