Synthetic BRFSS Health Behaviour and Condition Dataset
£19.99
About
The Synthetic BRFSS Health Behaviour and Condition Dataset is a large-scale, anonymised synthetic dataset for education and research on behavioural risk factors and chronic health conditions. It simulates the kind of data collected by the Behavioral Risk Factor Surveillance System (BRFSS), so that users can investigate how lifestyle, demographic, and health status indicators relate to cardiovascular and related diseases.
Dataset Features
- HeartDisease: Presence of heart disease (Yes/No).
- Smoking: Current smoking status (Yes/No).
- AlcoholDrinking: Regular alcohol consumption status (Yes/No).
- Stroke: History of stroke (Yes/No).
- PhysicalHealth: Number of days physical health was not good in the past month (integer).
- DiffWalking: Difficulty walking or climbing stairs (Yes/No).
- Sex: Biological sex of the respondent (Male/Female).
- AgeCategory: Age group category (e.g., 18-24, 40-44).
- Race: Self-reported race category.
- Diabetic: Diabetes status (Yes/No/Borderline/During pregnancy).
- PhysicalActivity: Engagement in physical activity (Yes/No).
- GenHealth: Self-rated general health (e.g., Excellent, Very good, Fair).
- SleepTime: Average hours of sleep per day (integer).
- Asthma: Presence of asthma (Yes/No).
- KidneyDisease: Presence of kidney disease (Yes/No).
- SkinCancer: Presence of skin cancer (Yes/No).
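As a minimal sketch of how these columns might be read into typed values, the example below uses only the Python standard library. It assumes the dataset is delivered as a CSV file whose header matches the feature names above, which the listing does not confirm. The helper names (parse_row, load_records) are hypothetical, and the two sample rows, including their Race values, are made up for illustration.

```python
import csv
import io

# Column groups taken from the feature list above.
YES_NO_COLUMNS = [
    "HeartDisease", "Smoking", "AlcoholDrinking", "Stroke", "DiffWalking",
    "PhysicalActivity", "Asthma", "KidneyDisease", "SkinCancer",
]
INTEGER_COLUMNS = ["PhysicalHealth", "SleepTime"]


def parse_row(raw):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    row = dict(raw)
    for name in YES_NO_COLUMNS:
        value = raw[name].strip()
        if value not in ("Yes", "No"):
            raise ValueError(f"{name}: expected Yes/No, got {value!r}")
        row[name] = value == "Yes"
    for name in INTEGER_COLUMNS:
        # float() first so that values written as "3.0" are also accepted.
        row[name] = int(float(raw[name]))
    # Diabetic has four possible values, so it is left as a string.
    return row


def load_records(handle):
    """Read typed records from any open text file or file-like object."""
    return [parse_row(raw) for raw in csv.DictReader(handle)]


# Two made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """\
HeartDisease,Smoking,AlcoholDrinking,Stroke,PhysicalHealth,DiffWalking,Sex,AgeCategory,Race,Diabetic,PhysicalActivity,GenHealth,SleepTime,Asthma,KidneyDisease,SkinCancer
No,Yes,No,No,3,No,Female,40-44,White,No,Yes,Very good,7,No,No,No
Yes,No,No,Yes,20,Yes,Male,18-24,Black,Borderline,No,Fair,5,Yes,No,No
"""

records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0]["Smoking"], records[1]["PhysicalHealth"])
```

To read a real file instead, pass an open file handle to load_records, for example `open(path, newline="")`, where path is wherever the downloaded file was saved.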
Usage
This dataset can be used for the following applications:
- Public Health Research: Analyse associations between lifestyle factors and chronic disease outcomes.
- Predictive Modelling: Develop classifiers to predict heart disease or stroke risk based on behavioural and demographic indicators.
- Health Informatics: Study patterns in self-reported health status and risk factor prevalence across age and race groups.
- Educational Purposes: Offer students and researchers a realistic dataset to practice data cleaning, visualisation, and modelling in epidemiology.
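The association analysis mentioned under Public Health Research could start with something like the sketch below, which computes outcome prevalence per group and an unadjusted odds ratio. It assumes each record is a dict keyed by the feature names, with Yes/No values stored as strings. The function names are hypothetical and the six sample records are made up, so the numbers they produce say nothing about the dataset itself.

```python
from collections import Counter


def prevalence_by_group(records, outcome, group):
    """Return {group value: share of records where `outcome` is 'Yes'}."""
    totals = Counter()
    positives = Counter()
    for rec in records:
        totals[rec[group]] += 1
        if rec[outcome] == "Yes":
            positives[rec[group]] += 1
    return {key: positives[key] / totals[key] for key in totals}


def odds_ratio(records, outcome, exposure):
    """Unadjusted odds ratio of `outcome` for exposed vs unexposed records.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so that an empty
    cell does not cause a division by zero.
    """
    cells = Counter((rec[exposure], rec[outcome]) for rec in records)
    a = cells[("Yes", "Yes")] + 0.5  # exposed, outcome present
    b = cells[("Yes", "No")] + 0.5   # exposed, outcome absent
    c = cells[("No", "Yes")] + 0.5   # unexposed, outcome present
    d = cells[("No", "No")] + 0.5    # unexposed, outcome absent
    return (a * d) / (b * c)


# Made-up records for illustration only; they are not taken from the dataset.
SAMPLE = [
    {"Smoking": "Yes", "HeartDisease": "Yes"},
    {"Smoking": "Yes", "HeartDisease": "No"},
    {"Smoking": "Yes", "HeartDisease": "Yes"},
    {"Smoking": "No", "HeartDisease": "No"},
    {"Smoking": "No", "HeartDisease": "No"},
    {"Smoking": "No", "HeartDisease": "Yes"},
]

prevalence = prevalence_by_group(SAMPLE, "HeartDisease", "Smoking")
ratio = odds_ratio(SAMPLE, "HeartDisease", "Smoking")
print(prevalence, ratio)
```

An odds ratio computed this way is unadjusted: it ignores age, sex, and every other column, so it is a starting point for exploration and not an estimate of effect.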
Coverage
The data is fully anonymised and synthetically generated to protect privacy while maintaining realistic feature distributions, allowing safe use in academic and data science projects related to health risk assessment.
License
CC0 (Public Domain)
Who Can Use It
- Public Health Researchers and Epidemiologists: To explore risk factors and disease correlations.
- Data Scientists and Machine Learning Practitioners: To build and evaluate health risk prediction models.
- Healthcare Educators and Students: As a comprehensive dataset for training and education in health data analysis.