Synthetic Diabetes Patient Records Dataset
Patient Health Records & Digital Health
£79.99
About
The Diabetes Dataset supports researchers, data scientists, and healthcare professionals working on diabetes risk assessment and prediction. Each record combines a patient identifier with eight health attributes (pregnancy history, glucose, blood pressure, skinfold thickness, insulin, BMI, family history, and age) and a binary diabetes outcome, which makes the dataset suitable for developing predictive models and exploring factors associated with diabetes risk. It is published to encourage collaboration and innovation in data science and healthcare, which could contribute to more accurate early diagnosis and more personalized diabetes treatment strategies.
Dataset Features:
- Id: Unique identifier for each data entry.
- Pregnancies: Number of times the patient has been pregnant.
- Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
- Blood_Pressure: Diastolic blood pressure in mm Hg.
- Skin_Thickness: Thickness of the triceps skinfold, measured in mm.
- Insulin: 2-hour serum insulin level (µU/mL).
- BMI: Body mass index, calculated as weight in kg divided by the square of height in m (kg/m²).
- Diabetes_Pedigree: Diabetes pedigree function, a score that summarizes the patient's family history of diabetes.
- Age: Age of the patient in years.
- Outcome: A binary variable indicating diabetes status; 1 indicates diabetes presence, while 0 indicates its absence.
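The feature list above can be turned into a simple load-and-validate step. The sketch below is a minimal example that assumes the dataset ships as a CSV file whose header uses exactly the column names listed; the listing does not state the file format, so treat that as an assumption. The function names `load_records` and `bmi` are hypothetical, and the two sample rows are made up for illustration, not taken from the dataset.

```python
import csv
import io

# Column names taken from the feature list above.
EXPECTED_COLUMNS = [
    "Id", "Pregnancies", "Glucose", "Blood_Pressure", "Skin_Thickness",
    "Insulin", "BMI", "Diabetes_Pedigree", "Age", "Outcome",
]


def load_records(handle):
    """Read CSV rows, check the header, and convert every field to a number."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    records = []
    for row in reader:
        record = {name: float(row[name]) for name in EXPECTED_COLUMNS}
        record["Id"] = int(record["Id"])
        record["Outcome"] = int(record["Outcome"])
        # Outcome is documented as binary: 1 = diabetes present, 0 = absent.
        if record["Outcome"] not in (0, 1):
            raise ValueError(f"Outcome must be 0 or 1, got {record['Outcome']}")
        records.append(record)
    return records


def bmi(weight_kg, height_m):
    """BMI = weight in kg divided by the square of height in m."""
    return weight_kg / height_m ** 2


# Made-up rows for illustration only (assumption: CSV layout with this header).
SAMPLE_CSV = """Id,Pregnancies,Glucose,Blood_Pressure,Skin_Thickness,Insulin,BMI,Diabetes_Pedigree,Age,Outcome
1,2,120,70,25,80,28.5,0.45,35,0
2,0,165,82,33,140,34.1,0.80,52,1
"""

records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), "records loaded")
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open(path, newline="")` with whatever path the download uses.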
Data Distribution and Outliers:
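One common way to inspect a numeric column for outliers is Tukey's interquartile-range rule. The sketch below is a standard-library example of that rule, not necessarily the method used to produce this listing; the helper name `iqr_outliers` is hypothetical and the insulin readings are made up for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up insulin readings (µU/mL) for illustration only.
insulin = [80, 85, 90, 95, 100, 105, 110, 600]

print("median:", statistics.median(insulin))
print("outliers:", iqr_outliers(insulin))
```

The multiplier `k=1.5` is the conventional default; a larger value flags only more extreme points.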


Correlations and Relationships:
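Relationships between a feature and the Outcome column can be summarized with a Pearson correlation coefficient. The sketch below implements it with the standard library only; the helper names `pearson` and `correlation_with_outcome` are hypothetical, and the four records are made up for illustration. With a binary outcome this is the point-biserial correlation, which is a rough screening tool rather than evidence of a causal link.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def correlation_with_outcome(records, features):
    """Correlate each named feature with the binary Outcome column."""
    outcome = [r["Outcome"] for r in records]
    return {f: pearson([r[f] for r in records], outcome) for f in features}


# Made-up records for illustration only.
sample = [
    {"Glucose": 90, "Age": 30, "Outcome": 0},
    {"Glucose": 100, "Age": 50, "Outcome": 0},
    {"Glucose": 150, "Age": 35, "Outcome": 1},
    {"Glucose": 170, "Age": 55, "Outcome": 1},
]

for name, r in correlation_with_outcome(sample, ["Glucose", "Age"]).items():
    print(f"{name}: r = {r:.2f}")
```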



Usage:
This dataset can be used for:
- Diabetes research: To analyze and uncover patterns in diabetes risk factors and demographics.
- Educational purposes: Teaching data science skills such as cleaning, transformation, visualization, and model development within a healthcare context.
- Predictive modelling: Building models that assess diabetes risk, support feature selection, and enable insights into the health indicators of diabetes.
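As an illustration of the predictive modelling use case, the sketch below fits a small logistic regression baseline with batch gradient descent using only the standard library. It is a teaching example under stated assumptions, not a recommended pipeline: the two features (Glucose, BMI), the learning rate, the epoch count, and the eight training rows are all made up for illustration, and a real analysis would hold out test data and would typically use an established machine learning library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def predict_proba(w, b, x):
    """Predicted probability that Outcome = 1 for one scaled feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up [Glucose, BMI] rows and outcomes for illustration only.
X_raw = [
    [85, 22.0], [95, 24.5], [100, 27.0], [110, 26.0],
    [150, 31.0], [160, 35.5], [170, 33.0], [180, 38.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X = standardize(X_raw)
w, b = fit_logistic(X, y)
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("training accuracy:", accuracy)
```

Standardizing the features first keeps a single learning rate workable across columns with very different scales, such as Glucose and Diabetes_Pedigree.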
Coverage:
The dataset is synthetic and anonymized, so it can be used for experimentation and learning without compromising individual privacy.
License:
CC0 (Public Domain)
Who can use it:
- Researchers and educators: Ideal for studies and teaching diabetes analytics and healthcare data science.
- Data science enthusiasts and professionals: For practising data manipulation, feature engineering, and machine learning modelling focused on diabetes prediction.