Diabetes and Digestive Health Patient Data
Patient Health Records & Digital Health
Free
About
This collection of clinical data is designed for use in machine learning and predictive analytics, focusing on the diagnosis of diabetes. The primary objective is to predict whether a patient has diabetes based on a set of eight diagnostic measurements, with a ninth column recording the outcome. The information originates from the National Institute of Diabetes and Digestive and Kidney Diseases, providing a focused view on health metrics for a specific cohort.
Columns
The dataset includes nine columns: eight health indicators used as predictors and one outcome variable.
- Pregnancies: Records the total number of times the patient has been pregnant.
- Glucose: Plasma glucose concentration measured at 2 hours during an oral glucose tolerance test.
- BloodPressure: Diastolic blood pressure, recorded in mm Hg.
- SkinThickness: Triceps skin fold thickness, measured in mm.
- Insulin: Serum insulin levels measured at 2 hours, recorded in μU/ml.
- BMI: Body mass index, calculated as weight in kg divided by height in metres squared.
- DiabetesPedigreeFunction: A function that scores the risk of diabetes based on family history.
- Age: The patient’s age in years.
- Outcome: The class variable, coded as 0 (no diabetes) or 1 (diabetes).
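A minimal sketch of how the column list above could be checked when reading the file, using only the Python standard library. The function name `read_records` and the sample row are hypothetical and purely illustrative; the sample values are made up and are not taken from the dataset.

```python
import csv
import io

# Column names as listed above; their order in the file is an assumption.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def read_records(handle):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for row in reader:
        record = {name: float(row[name]) for name in EXPECTED_COLUMNS}
        # Outcome is documented as 0 (no diabetes) or 1 (diabetes).
        if record["Outcome"] not in (0.0, 1.0):
            raise ValueError(f"unexpected Outcome value: {row['Outcome']}")
        records.append(record)
    return records


# Synthetic one-row example (made-up values, for illustration only).
sample = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,80,28.5,0.4,30,0\n"
)
print(read_records(sample)[0]["Age"])
```

For the real file, the same function could be called on an open file handle, for example `open("diabetes.csv", newline="")`, assuming the file sits in the working directory.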
Distribution
The data file, named diabetes.csv, is provided in CSV format. It contains 9 columns and 768 valid records. Data quality is excellent, with zero missing or mismatched values across all observed variables. The file size is 26.68 kB. The dataset is static, as its expected update frequency is listed as 'Never'.
Usage
This resource is ideally suited for various analytical tasks, including:
- Developing and testing machine learning classification models to predict diabetes onset.
- Conducting detailed exploratory data analysis and data cleaning exercises.
- Model comparison, particularly for methods like Support Vector Machines (SVM).
- Educational use in demonstrating binary classification problems in health informatics.
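As one possible starting point for the classification and SVM tasks listed above, the sketch below scores a support vector classifier with cross-validation. It assumes scikit-learn is installed; the helper name `evaluate_svm`, the RBF kernel, `C=1.0`, and the five folds are illustrative choices, not settings recommended by the data provider.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_svm(features, labels, folds=5):
    """Return per-fold accuracy of a standardised RBF-kernel SVM."""
    # Scaling sits inside the pipeline so it is fitted on each training
    # fold only, which avoids leaking test-fold statistics.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    return cross_val_score(model, features, labels, cv=folds, scoring="accuracy")


# Usage with the real file (pandas and the file path are assumptions):
# import pandas as pd
# df = pd.read_csv("diabetes.csv")
# scores = evaluate_svm(df.drop(columns="Outcome"), df["Outcome"])
# print(scores.mean())
```

Standardising the inputs matters here because the columns use very different units (years, mm Hg, μU/ml), and SVMs are sensitive to feature scale.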
Coverage
The data were selected under specific constraints. All subjects are females of Pima Indian heritage and are at least 21 years old, with ages ranging up to 81 years. The dataset has no temporal coverage and contains no time-series data.
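The figures quoted in the Distribution and Coverage sections can be checked against a loaded copy of the file. The sketch below assumes pandas is installed; the function name `check_stated_properties` is hypothetical. It also counts zeros in a few measurement columns, since a zero glucose or BMI reading is not physiologically plausible. Treating such zeros as placeholders for missing readings is an assumption and is not stated in this listing.

```python
import pandas as pd

# Columns where a value of zero is implausible for a living patient.
# Treating these zeros as missing readings is an assumption.
ZERO_SUSPECT_COLUMNS = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def check_stated_properties(df):
    """Compare a loaded table against the figures quoted in this listing."""
    return {
        "rows_match": len(df) == 768,
        "columns_match": df.shape[1] == 9,
        "no_missing": not df.isna().any().any(),
        "age_in_range": bool(df["Age"].between(21, 81).all()),
        "outcome_binary": bool(df["Outcome"].isin([0, 1]).all()),
        "zero_counts": {
            name: int((df[name] == 0).sum()) for name in ZERO_SUSPECT_COLUMNS
        },
    }


# Usage with the real file (the path is an assumption):
# report = check_stated_properties(pd.read_csv("diabetes.csv"))
# print(report)
```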
License
CC0: Public Domain
Who Can Use It
Intended users include data scientists building predictive health models, students learning classification techniques, and medical researchers investigating specific demographic risk factors associated with diabetes. Academics can utilise this as a well-defined case study for classification algorithm testing.
Dataset Name Suggestions
- Pima Indian Diabetes Prediction Data
- NIDDK Diabetes Diagnostic Measurements
- Diabetes and Digestive Health Patient Data
Attributes
Original Data Source: Diabetes and Digestive Health Patient Data
