Simulated Health Data for Diabetes Risk Analysis

Patient Health Records & Digital Health

Tags and Keywords

Diabetes, Health, Synthetic, Prediction, Risk

Free

About

This synthetic health dataset simulates individual medical and lifestyle profiles for predicting the risk of developing diabetes. It was created to train and test classification models, build explainable AI tools, and simulate health screening applications in a safe, synthetic data environment. The target variable, 'at_risk_diabetes', is calculated by a rule-based model that weighs factors such as BMI, glucose level, age, family history, and smoking status, with random noise added to make prediction tasks more realistic.
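
The listing does not publish the actual labelling rule, so the following is only a minimal sketch of how a rule-based score with added noise could produce a binary target. Every weight, cut-off, the noise level, and the function names (risk_score, label, simulate_row) are hypothetical choices made for illustration; only the column names and value ranges come from this page.

```python
import random

# NOTE: all weights, cut-offs, and the noise level are illustrative
# assumptions. The dataset's real rule is not documented in the listing.
def risk_score(row):
    """Sum hand-picked weights for the factors named in the description."""
    score = 0.0
    score += 1.0 if row["bmi"] >= 30 else 0.0
    score += 1.5 if row["glucose_level"] >= 126 else 0.0
    score += 0.5 if row["age"] >= 45 else 0.0
    score += 1.0 if row["family_history"] == 1 else 0.0
    score += 0.5 if row["smoker"] == 1 else 0.0
    return score

def label(row, rng, noise_sd=0.5, threshold=2.0):
    """Add Gaussian noise to the score, then threshold it to 0 or 1."""
    noisy = risk_score(row) + rng.gauss(0.0, noise_sd)
    return 1 if noisy >= threshold else 0

def simulate_row(rng):
    """Draw one profile uniformly within the documented column ranges."""
    row = {
        "age": rng.randint(20, 80),
        "bmi": round(rng.uniform(15, 45), 1),
        "glucose_level": rng.randint(60, 200),
        "physical_activity_level": rng.choice(["low", "moderate", "high"]),
        "family_history": rng.randint(0, 1),
        "smoker": rng.randint(0, 1),
    }
    row["at_risk_diabetes"] = label(row, rng)
    return row

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = [simulate_row(rng) for _ in range(5)]
```

Because of the noise term, profiles whose score sits near the threshold can receive either label, which is what keeps a model from recovering the rule perfectly.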

Columns

  • age: The person's age, ranging from 20 to 80.
  • bmi: Body Mass Index, with values between 15 and 45.
  • glucose_level: The blood glucose level, ranging from 60 to 200.
  • physical_activity_level: The individual's activity level, categorised as 'low', 'moderate', or 'high'.
  • family_history: A binary indicator (1 for yes, 0 for no) showing if diabetes runs in the family.
  • smoker: A binary indicator (1 for yes, 0 for no) showing if the person is a smoker.
  • at_risk_diabetes: The target binary label, where 1 indicates a high risk of diabetes and 0 indicates a low risk.
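
The documented ranges and categories above can be turned into a simple row check. The sketch below uses only the Python standard library; the two sample rows are made up for illustration, and it assumes the binary columns are stored as the strings "0" and "1", which the listing does not confirm.

```python
import csv
import io

ACTIVITY_LEVELS = {"low", "moderate", "high"}
NUMERIC_RANGES = {"age": (20, 80), "bmi": (15, 45), "glucose_level": (60, 200)}
BINARY_COLUMNS = ("family_history", "smoker", "at_risk_diabetes")

def validate_row(row):
    """Return a list of problems found in one CSV row (empty if none)."""
    problems = []
    for col, (lo, hi) in NUMERIC_RANGES.items():
        try:
            if not lo <= float(row[col]) <= hi:
                problems.append(f"{col} out of range")
        except (KeyError, TypeError, ValueError):
            problems.append(f"{col} missing or non-numeric")
    if row.get("physical_activity_level") not in ACTIVITY_LEVELS:
        problems.append("physical_activity_level not low/moderate/high")
    for col in BINARY_COLUMNS:
        # Assumption: binary flags are written as "0" or "1" in the file.
        if row.get(col) not in {"0", "1"}:
            problems.append(f"{col} not 0/1")
    return problems

# Made-up sample rows: the first is valid, the second breaks three rules.
SAMPLE = """age,bmi,glucose_level,physical_activity_level,family_history,smoker,at_risk_diabetes
45,27.3,110,moderate,1,0,0
19,27.3,110,extreme,1,0,2
"""

results = [validate_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

To check the real file, the same function can be applied to rows read with csv.DictReader from diabetes_risk_dataset.csv.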

Distribution

The dataset is provided in a tabular CSV format (diabetes_risk_dataset.csv) and contains 12,000 rows. Each row represents a unique individual.

Usage

Ideal applications for this dataset include:
  • Training classification models to predict diabetes risk.
  • Developing and testing explainable AI tools for healthcare applications.
  • Practising data analysis and visualisation techniques on health-related data.
  • Simulating health screening tools within a safe, synthetic data environment.
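
As one small instance of the first two applications, the sketch below fits a single-feature threshold rule (a decision stump) that is trivially explainable. It is a baseline for illustration, not a recommended model; the function name best_threshold and the toy glucose values and labels are hypothetical and are not taken from the dataset.

```python
def best_threshold(values, labels):
    """Find the cut-off t that maximises accuracy of `value >= t -> 1`.

    Ties are resolved in favour of the smallest threshold.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        correct = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy values standing in for the glucose_level and at_risk_diabetes columns.
glucose = [80, 95, 100, 130, 150, 170]
labels = [0, 0, 0, 1, 1, 1]

threshold, accuracy = best_threshold(glucose, labels)
```

On the real data the noise in the target means no single threshold will classify every row correctly, so the stump's accuracy gives a floor against which richer models can be compared.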

Coverage

This is a synthetic dataset, so it does not have real-world geographic or demographic scope. The features are simulated to represent a diverse range of individual profiles, with ages spanning from 20 to 80.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For training and evaluating predictive models.
  • AI/ML Engineers: For building and validating explainable AI systems in the healthcare domain.
  • Students and Researchers: For practising data analysis, visualisation, and exploring health data simulations.
  • Healthcare Innovators: For prototyping and simulating digital health screening tools.

Dataset Name Suggestions

  • Synthetic Diabetes Risk Prediction Profiles
  • Lifestyle and Biometric Data for Diabetes Prediction
  • Simulated Health Data for Diabetes Risk Analysis

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 16/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
