National Institute Diabetes Forecasting Set
Patient Health Records & Digital Health
Free
About
Medical records and diagnostic measurements collected by the National Institute of Diabetes and Digestive and Kidney Diseases from female patients of Pima Indian heritage. The primary objective of this preprocessed data is to predict whether a patient has diabetes based on diagnostic features. It is well suited to building machine learning models, specifically classification algorithms; the listing reports an accuracy of 92.86% achieved with a Random Forest Classifier. The data allows for the analysis of health indicators such as glucose levels, insulin, and BMI to forecast the onset of diabetes.
Columns
- Pregnancies: The number of times the patient has been pregnant (Range: 0 to 17).
- Glucose: Plasma glucose concentration measured in a 2-hour oral glucose tolerance test (OGTT) in mg/dl. (Values < 140 are considered Normal; > 200 are Diabetic).
- BloodPressure: Diastolic blood pressure in mm Hg. (Values 60-80 are Normal; > 90 indicates Hypertension).
- SkinThickness: Triceps skin fold thickness in mm (Mean: 29.1 mm).
- Insulin: 2-hour serum insulin in µU/ml.
- BMI: Body Mass Index calculated as weight in kg/(height in m)^2. (Values > 30 are classified as Obese).
- DiabetesPedigreeFunction: A scoring function indicating diabetes history in relatives and the genetic relationship of those relatives to the patient.
- Age: Age of the individual in years (Range: 21 to 81).
- Outcome: The target variable indicating the presence of diabetes (0 for no, 1 for yes).
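The thresholds quoted in the column descriptions can be turned into simple labelling helpers. The following is a minimal sketch, not part of the dataset itself: the function names are hypothetical, the "Pre-Diabetic" label for glucose values between 140 and 200 is an assumption borrowed from the Coverage section, and blood pressure values that fall outside the two stated ranges are left as "Unclassified" because the listing gives no label for them.

```python
def glucose_category(mg_dl: float) -> str:
    """Label a 2-hour OGTT plasma glucose value given in mg/dl."""
    if mg_dl < 140:
        return "Normal"
    if mg_dl > 200:
        return "Diabetic"
    # Assumption: the listing gives no label for 140-200 in the column
    # description; "Pre-Diabetic" is taken from the Coverage section.
    return "Pre-Diabetic"


def blood_pressure_category(mm_hg: float) -> str:
    """Label a diastolic blood pressure value given in mm Hg."""
    if 60 <= mm_hg <= 80:
        return "Normal"
    if mm_hg > 90:
        return "Hypertension"
    # Below 60 or between 80 and 90: no label is stated in the listing.
    return "Unclassified"


def is_obese(bmi: float) -> bool:
    """Return True when BMI exceeds 30, the cut-off quoted for the BMI column."""
    return bmi > 30


def categorise(record: dict) -> dict:
    """Apply the three helpers to one row, e.g. a dict from csv.DictReader."""
    return {
        "glucose": glucose_category(float(record["Glucose"])),
        "blood_pressure": blood_pressure_category(float(record["BloodPressure"])),
        "obese": is_obese(float(record["BMI"])),
    }


# Hypothetical row for illustration only, not taken from the dataset.
example = {"Glucose": "150", "BloodPressure": "72", "BMI": "31.5"}
print(categorise(example))
```

Values are converted with float() because rows read from a CSV file arrive as strings.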
Distribution
The dataset is provided in CSV format (diabetes.csv) with a file size of approximately 25.71 kB. It contains 768 records (rows) and 9 columns. The listing reports the data as 100% valid, with no mismatched or missing values, so it can be analysed without further cleaning. The target variable (Outcome) is imbalanced: 268 cases are positive for diabetes (1) and 500 are negative (0).
Usage
- Machine Learning Classification: Training models to predict the binary outcome of diabetes presence.
- Healthcare Analytics: Analysing correlations between BMI, glucose levels, and age regarding diabetes risk.
- Web-Service Development: Creating backend prediction systems for medical applications.
- Academic Research: Studying the impact of genetic pedigree on disease onset within specific demographics.
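Before training a classifier, it can help to confirm the row and class counts stated under Distribution and to compute the accuracy that a trivial majority-class predictor would reach. The sketch below uses only the Python standard library. The helper names load_rows and summarise are hypothetical, and the file name diabetes.csv and the column names are assumed to match the listing.

```python
import csv
from collections import Counter
from pathlib import Path

# Column names as given in the Columns section of the listing.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_rows(path):
    """Read the CSV into a list of dicts and check that all columns exist."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)


def summarise(rows):
    """Count rows per Outcome value and derive the majority-class baseline."""
    rows = list(rows)
    # int(float(...)) accepts both "1" and "1.0" as the positive label.
    counts = Counter(int(float(row["Outcome"])) for row in rows)
    total = len(rows)
    majority = max(counts.values()) if counts else 0
    return {
        "rows": total,
        "positive": counts[1],
        "negative": counts[0],
        "baseline_accuracy": majority / total if total else 0.0,
    }


if __name__ == "__main__":
    csv_path = Path("diabetes.csv")  # assumed file name from the listing
    if csv_path.exists():
        print(summarise(load_rows(csv_path)))
```

With the stated counts of 500 negative and 268 positive cases, always predicting 0 gives 500/768, or about 65.1% accuracy, so any model accuracy is best read against that baseline rather than against 50%.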
Coverage
The data encompasses health metrics for females of Pima Indian heritage, with ages ranging from 21 to 81 years. The dataset originates from the National Institute of Diabetes and Digestive and Kidney Diseases. It includes specific medical ranges for categorising patients, such as Pre-Diabetic glucose levels and various stages of Hypertension.
License
CC0: Public Domain
Who Can Use It
- Data Scientists seeking clean datasets for benchmarking classification algorithms.
- Medical Researchers investigating the onset of diabetes in high-risk populations.
- Students requiring preprocessed data for statistics or machine learning projects.
- Healthcare Developers building diagnostic support tools.
Dataset Name Suggestions
- Pima Indians Diabetes Prediction Data
- Preprocessed Diabetes Diagnostic Indicators
- National Institute Diabetes Forecasting Set
- Pima Female Health and Diabetes Records
Attributes
Original Data Source: National Institute Diabetes Forecasting Set
