Simulated Health Data for Diabetes Risk Analysis
Patient Health Records & Digital Health
Free
About
This synthetic health dataset simulates individual medical and lifestyle profiles for predicting the risk of developing diabetes. It was created to train and test classification models, build explainable AI tools, and simulate health screening applications in a secure environment. The target variable, 'at_risk_diabetes', is calculated by a rule-based model that weighs factors such as BMI, glucose level, age, family history, and smoking, with random noise added to make prediction tasks more realistic.
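The exact rule and weights behind 'at_risk_diabetes' are not published here, so the following is only a minimal sketch of how a noisy rule-based label of this kind could be produced. The weights, the 0.5 cut-off, the noise level, the uniform sampling, and the function name `simulate_row` are all assumptions made for illustration, not the generator actually used for this dataset.

```python
import random


def simulate_row(rng):
    """Draw one synthetic profile and attach a noisy rule-based label."""
    age = rng.randint(20, 80)
    bmi = round(rng.uniform(15, 45), 1)
    glucose_level = rng.randint(60, 200)
    physical_activity_level = rng.choice(["low", "moderate", "high"])
    family_history = rng.randint(0, 1)
    smoker = rng.randint(0, 1)

    # Hypothetical weights: each numeric factor is scaled to [0, 1] first.
    # Only the factors named in the description are weighed here.
    score = (
        0.30 * (bmi - 15) / 30
        + 0.35 * (glucose_level - 60) / 140
        + 0.15 * (age - 20) / 60
        + 0.12 * family_history
        + 0.08 * smoker
    )
    # Gaussian noise keeps the label from being an exact function of the inputs.
    score += rng.gauss(0, 0.05)

    return {
        "age": age,
        "bmi": bmi,
        "glucose_level": glucose_level,
        "physical_activity_level": physical_activity_level,
        "family_history": family_history,
        "smoker": smoker,
        "at_risk_diabetes": 1 if score > 0.5 else 0,  # assumed cut-off
    }


rng = random.Random(42)  # fixed seed so the sketch is reproducible
rows = [simulate_row(rng) for _ in range(1000)]
```

Because of the noise term, profiles with scores near the cut-off can receive either label, which is what keeps a classifier from reaching perfect accuracy on data generated this way.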
Columns
- age: The person's age, ranging from 20 to 80.
- bmi: Body Mass Index, with values between 15 and 45.
- glucose_level: The blood glucose level, ranging from 60 to 200.
- physical_activity_level: The individual's activity level, categorised as 'low', 'moderate', or 'high'.
- family_history: A binary indicator (1 for yes, 0 for no) showing if diabetes runs in the family.
- smoker: A binary indicator (1 for yes, 0 for no) showing if the person is a smoker.
- at_risk_diabetes: The target binary label, where 1 indicates a high risk of diabetes and 0 indicates a low risk.
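The column descriptions above can be turned into a simple sanity check. This is a sketch using only the Python standard library; the helper name `validate_row`, the column order in the sample header, and the assumption that binary flags are stored as the strings "0" and "1" are illustrative and should be checked against the actual file.

```python
import csv
import io

ACTIVITY_LEVELS = {"low", "moderate", "high"}
BINARY_COLUMNS = ("family_history", "smoker", "at_risk_diabetes")
NUMERIC_RANGES = {"age": (20, 80), "bmi": (15, 45), "glucose_level": (60, 200)}


def validate_row(row):
    """Return a list of problems found in one CSV row (empty if the row is valid)."""
    problems = []
    for column, (low, high) in NUMERIC_RANGES.items():
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{column}: missing or not numeric")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    if row.get("physical_activity_level") not in ACTIVITY_LEVELS:
        problems.append("physical_activity_level: unexpected category")
    for column in BINARY_COLUMNS:
        if row.get(column) not in {"0", "1"}:
            problems.append(f"{column}: expected 0 or 1")
    return problems


# Two made-up rows: the first is valid, the second breaks three rules.
sample = io.StringIO(
    "age,bmi,glucose_level,physical_activity_level,family_history,smoker,at_risk_diabetes\n"
    "45,27.3,110,moderate,1,0,0\n"
    "19,27.3,110,none,1,0,2\n"
)
reports = [validate_row(row) for row in csv.DictReader(sample)]
```

To check the real file, replace the in-memory sample with `open("diabetes_risk_dataset.csv", newline="")`.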
Distribution
The dataset is provided in a tabular CSV format (diabetes_risk_dataset.csv) and contains 12,000 rows. Each row represents a unique individual.
Usage
Ideal applications for this dataset include:
- Training classification models to predict diabetes risk.
- Developing and testing explainable AI tools for healthcare applications.
- Practising data analysis and visualisation techniques on health-related data.
- Simulating health screening tools within a safe, synthetic data environment.
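Before training a classifier, it can help to have a trivial baseline to compare against. The sketch below flags a profile as at risk whenever its glucose level crosses a single threshold, then counts the confusion-matrix cells. The threshold of 140, the function names, and the five toy rows are assumptions chosen for illustration; they are not taken from the dataset.

```python
def glucose_baseline(rows, threshold=140.0):
    """Predict 1 (at risk) when glucose_level is at or above the threshold."""
    return [1 if float(r["glucose_level"]) >= threshold else 0 for r in rows]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of predictions that match the label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Made-up rows, only to exercise the functions.
toy_rows = [
    {"glucose_level": 180, "at_risk_diabetes": 1},
    {"glucose_level": 95, "at_risk_diabetes": 0},
    {"glucose_level": 150, "at_risk_diabetes": 0},
    {"glucose_level": 120, "at_risk_diabetes": 1},
    {"glucose_level": 165, "at_risk_diabetes": 1},
]
y_true = [int(r["at_risk_diabetes"]) for r in toy_rows]
y_pred = glucose_baseline(toy_rows)
counts = confusion_counts(y_true, y_pred)
baseline_accuracy = accuracy(counts)
```

A trained model is only worth its added complexity if it clearly beats a one-feature rule like this on held-out rows.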
Coverage
This is a synthetic dataset, so it does not have real-world geographic or demographic scope. The features are simulated to represent a diverse range of individual profiles, with ages spanning from 20 to 80.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For training and evaluating predictive models.
- AI/ML Engineers: For building and validating explainable AI systems in the healthcare domain.
- Students and Researchers: For practising data analysis, visualisation, and exploring health data simulations.
- Healthcare Innovators: For prototyping and simulating digital health screening tools.
Dataset Name Suggestions
- Synthetic Diabetes Risk Prediction Profiles
- Lifestyle and Biometric Data for Diabetes Prediction
- Simulated Health Data for Diabetes Risk Analysis
Attributes
Original Data Source: Simulated Health Data for Diabetes Risk Analysis