US Adult Diabetes Risk Factors Survey
Patient Health Records & Digital Health
Free
About
This dataset provides cleaned health indicator survey responses from the 2021 Behavioral Risk Factor Surveillance System (BRFSS), specifically tailored for analysing and predicting diabetes. Diabetes is a prevalent chronic condition impacting how the body converts food into energy, with types including type 1, type 2, and gestational diabetes. The BRFSS, conducted annually by the Centers for Disease Control and Prevention (CDC), is a state-based telephone survey collecting data on health-related risk behaviours, chronic conditions, and preventive service usage among adults in the United States. This refined dataset offers valuable insights into the health status and behaviours of the U.S. adult population related to diabetes.
Columns
The dataset features 21 health indicator variables and one target variable, Diabetes_012, which classifies respondents into three categories: no diabetes, prediabetes, or diabetes. The columns included are:
- Diabetes_012: Target variable indicating diabetes status (0 = no diabetes or only during pregnancy, 1 = prediabetes, 2 = diabetes).
- HighBP: Indicates high blood pressure (0 = no high BP, 1 = high BP).
- HighChol: Indicates high cholesterol (0 = no high cholesterol, 1 = high cholesterol).
- CholCheck: Denotes whether cholesterol was checked in the past five years (0 = not checked, 1 = checked).
- BMI: Body Mass Index, a measure of body fat based on height and weight.
- Smoker: Identifies individuals who have smoked at least 100 cigarettes in their lifetime (0 = no, 1 = yes).
- Stroke: Indicates if the individual has ever been told they had a stroke (0 = no, 1 = yes).
- HeartDiseaseorAttack: Indicates a history of Coronary Heart Disease (CHD) or Myocardial Infarction (MI) (0 = no, 1 = yes).
- PhysActivity: Records physical activity in the past 30 days, excluding job-related activity (0 = no, 1 = yes).
- Fruits: Denotes if an individual consumes fruit one or more times per day (0 = no, 1 = yes).
- Veggies: Denotes if an individual consumes vegetables one or more times per day (0 = no, 1 = yes).
- HvyAlcoholConsump: Identifies heavy drinkers (adult men with more than 14 drinks/week, adult women with more than 7 drinks/week) (0 = no, 1 = yes).
- AnyHealthcare: Indicates if the individual has any kind of health care coverage (0 = no, 1 = yes).
- NoDocbcCost: Indicates whether there was a time in the past 12 months when the respondent needed to see a doctor but could not because of cost (0 = no, 1 = yes).
- GenHlth: Self-reported general health status on a scale of 1-5 (1 = excellent, 5 = poor).
- MentHlth: Number of days in the past 30 days when mental health was not good (scale 0-30).
- PhysHlth: Number of days in the past 30 days when physical health was not good (scale 0-30).
- DiffWalk: Indicates serious difficulty walking or climbing stairs (0 = no, 1 = yes).
- Sex: Gender of the respondent (0 = female, 1 = male).
- Age: 13-level age category (1 = 18-24, 13 = 80 or older).
- Education: Education level on a scale of 1-6 (1 = never attended school or only kindergarten, 6 = college graduate).
- Income: Income category on a scale of 1-11 (1 = less than $10,000, 11 = $200,000 or more).
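The coded ranges listed above can be turned into a simple sanity check before any analysis. The following is a minimal sketch, assuming each row is read as a dictionary of strings (as `csv.DictReader` would produce) and that coded values may be stored as floats such as `1.0`. The helper name `invalid_columns` is hypothetical, and the check that BMI is positive is an assumption, since no range for BMI is documented here.

```python
# Allowed ranges taken from the column descriptions in this listing.
BINARY_COLUMNS = [
    "HighBP", "HighChol", "CholCheck", "Smoker", "Stroke",
    "HeartDiseaseorAttack", "PhysActivity", "Fruits", "Veggies",
    "HvyAlcoholConsump", "AnyHealthcare", "NoDocbcCost", "DiffWalk", "Sex",
]

RANGES = {name: (0, 1) for name in BINARY_COLUMNS}
RANGES.update({
    "Diabetes_012": (0, 2),
    "GenHlth": (1, 5),
    "MentHlth": (0, 30),
    "PhysHlth": (0, 30),
    "Age": (1, 13),
    "Education": (1, 6),
    "Income": (1, 11),
})


def invalid_columns(row):
    """Return the names of columns that are missing, non-numeric, or out of range."""
    bad = []
    for name, (low, high) in RANGES.items():
        try:
            value = float(row[name])
        except (KeyError, TypeError, ValueError):
            bad.append(name)
            continue
        # Coded variables must be whole numbers inside the documented range.
        if not (low <= value <= high and value == int(value)):
            bad.append(name)
    # Assumption: BMI has no documented range, so only require a positive number.
    try:
        if not float(row["BMI"]) > 0:
            bad.append("BMI")
    except (KeyError, TypeError, ValueError):
        bad.append("BMI")
    return bad
```

An empty list means the row is consistent with the codebook above. For the binary files, the `"Diabetes_012"` entry would need to be swapped for `"Diabetes_binary"` with a range of `(0, 1)`.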
Distribution
The dataset is provided in CSV format and includes three files, each with 21 feature variables:
- diabetes_012_health_indicators_BRFSS2021.csv: Contains 236,378 survey responses. The Diabetes_012 target variable has three classes (no diabetes/pregnancy, prediabetes, diabetes) and exhibits class imbalance.
- diabetes_binary_5050split_health_indicators_BRFSS2021.csv: Contains 67,136 survey responses. The Diabetes_binary target variable has two classes (no diabetes, prediabetes or diabetes) and is balanced with a 50-50 split.
- diabetes_binary_health_indicators_BRFSS2021.csv: Contains 236,378 survey responses. The Diabetes_binary target variable has two classes (no diabetes, prediabetes or diabetes) and is not balanced.
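Because two of the three files are imbalanced, it can help to look at the class counts before modelling. The sketch below uses only the Python standard library; the function names `class_counts` and `to_binary` are hypothetical, and the sample rows are synthetic values made up for illustration, not real survey responses. The mapping in `to_binary` follows the class descriptions above (prediabetes and diabetes grouped together); whether the binary files were produced exactly this way is an assumption.

```python
import csv
import io
from collections import Counter


def class_counts(lines, target="Diabetes_012"):
    """Count rows per target class in CSV text given as an iterable of lines."""
    reader = csv.DictReader(lines)
    return Counter(int(float(row[target])) for row in reader)


def to_binary(diabetes_012):
    """Collapse the three-class label: 0 stays 0, prediabetes (1) and diabetes (2) become 1."""
    return 0 if int(diabetes_012) == 0 else 1


# Synthetic rows for illustration only; these are not real survey responses.
sample = io.StringIO(
    "Diabetes_012,HighBP,BMI\n"
    "0.0,0.0,24.0\n"
    "0.0,1.0,31.0\n"
    "2.0,1.0,35.0\n"
    "1.0,0.0,29.0\n"
)
counts = class_counts(sample)
total = sum(counts.values())
for label in sorted(counts):
    print(label, counts[label], f"{counts[label] / total:.1%}")
```

To run the same count on a real file, pass an open file handle instead of the in-memory sample, for example `open("diabetes_012_health_indicators_BRFSS2021.csv", newline="")`, assuming the file sits in the working directory.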
Usage
This dataset is ideal for:
- Machine Learning Model Development: Building predictive models to identify individuals at risk of diabetes or to classify diabetes status based on health indicators.
- Public Health Research: Analysing the prevalence and contributing factors of diabetes in the U.S. adult population.
- Statistical Analysis: Investigating correlations between various health behaviours and indicators and diabetes outcomes.
- Data Visualisation: Creating visual representations of health trends and risk factors.
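As one small example of the statistical analysis use case, the share of respondents with a positive target can be compared across the two levels of a binary indicator. This is a minimal sketch using the standard library; `prevalence_by_group` is a hypothetical helper, and the rows shown are synthetic placeholders rather than values from the dataset. The result is an unweighted, descriptive rate and does not account for the survey's sampling design.

```python
from collections import defaultdict


def prevalence_by_group(rows, indicator, target="Diabetes_binary"):
    """Return {indicator value: share of rows in that group with target == 1}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = int(float(row[indicator]))
        totals[group] += 1
        positives[group] += int(float(row[target]) == 1)
    return {group: positives[group] / totals[group] for group in totals}


# Synthetic rows for illustration only; these are not real survey responses.
rows = [
    {"HighBP": "1.0", "Diabetes_binary": "1.0"},
    {"HighBP": "1.0", "Diabetes_binary": "0.0"},
    {"HighBP": "0.0", "Diabetes_binary": "0.0"},
    {"HighBP": "0.0", "Diabetes_binary": "0.0"},
]
rates = prevalence_by_group(rows, "HighBP")
print(rates)
```

The same function accepts the rows yielded by `csv.DictReader` over one of the binary files, and any of the 0/1 indicator columns can be passed as `indicator`.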
Coverage
The dataset covers adults aged 18 years and older residing in the United States. The data was collected as part of the BRFSS 2021 survey.
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Data Scientists and Machine Learning Engineers for building and testing predictive models.
- Public Health Researchers and Epidemiologists for studying chronic disease prevalence and risk factors.
- Healthcare Analysts for understanding population health trends.
- Students and Academics for educational projects and research studies on health data.
Dataset Name Suggestions
- Diabetes Health Indicators BRFSS 2021
- US Adult Diabetes Risk Factors Survey
- BRFSS 2021 Health & Diabetes Data
- Chronic Disease Health Indicators (US)
Attributes
Original Data Source: US Adult Diabetes Risk Factors Survey