Opendatabay APP

Diabetes Hypertension Stroke Predictor

Clinical Trials & Research

Tags and Keywords

Health

Disease

Prediction

Survey

Risk



Free

About

This dataset provides 70,692 cleaned, augmented, and class-balanced survey responses from the 2015 Behavioral Risk Factor Surveillance System (BRFSS), tailored for predicting diabetes, hypertension, and stroke. Each record combines demographic details, health indicators, and lifestyle factors, which supports predictive modelling and public health analysis.

Columns

  • Age: A 13-level age category, ranging from 1 (18-24 years old) to 13 (80 years or older).
  • Sex: The respondent's sex, coded as 0 for female and 1 for male.
  • HighChol: Indicates whether the individual has high cholesterol, with 0 for no and 1 for yes.
  • CholCheck: Denotes if a cholesterol check was performed within the last 5 years, with 0 for no and 1 for yes.
  • BMI: Body Mass Index, a body-fat proxy calculated as weight in kilograms divided by the square of height in metres (kg/m²).
  • Smoker: Records if the individual has smoked at least 100 cigarettes in their entire life (equivalent to 5 packs), with 0 for no and 1 for yes.
  • HeartDiseaseorAttack: Indicates a history of coronary heart disease (CHD) or myocardial infarction (MI), with 0 for no and 1 for yes.
  • PhysActivity: Shows whether the individual engaged in physical activity (excluding job-related activities) in the past 30 days, with 0 for no and 1 for yes.
  • Fruits: Records if the individual consumes fruit one or more times per day, with 0 for no and 1 for yes.
  • Veggies: Records if the individual consumes vegetables one or more times per day, with 0 for no and 1 for yes.
  • HvyAlcoholConsump: Indicates heavy alcohol consumption (defined as >=14 drinks per week for adult men and >=7 drinks per week for adult women), with 0 for no and 1 for yes.
  • GenHlth: An individual's self-reported general health status on a scale of 1 to 5, where 1 is excellent and 5 is poor.
  • MentHlth: The number of days in the past 30 days on which the individual experienced poor mental health, ranging from 0 to 30.
  • PhysHlth: The number of days in the past 30 days on which the individual experienced physical illness or injury, ranging from 0 to 30.
  • DiffWalk: Indicates if the individual has serious difficulty walking or climbing stairs, with 0 for no and 1 for yes.
  • Stroke: Records if the individual has ever had a stroke, with 0 for no and 1 for yes.
  • HighBP: Indicates if the individual has high blood pressure, with 0 for no and 1 for yes.
  • Diabetes: The primary target variable, indicating whether the individual has diabetes, with 0 for no and 1 for yes.
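The column descriptions above translate directly into a set of value rules. The following is a minimal sketch of a per-record validator; the `SCHEMA` dictionary and `validate_row` function are hypothetical names built only from the descriptions in this listing. Two points are assumptions: BMI is only checked for being non-negative, and 0 is accepted for MentHlth and PhysHlth to represent "no days".

```python
import math

# Hypothetical schema derived from the column descriptions in this listing.
# Each rule is either a set of allowed codes or an inclusive (min, max) range.
BINARY = {0, 1}
SCHEMA = {
    "Age": set(range(1, 14)),       # 13 age categories, 1 (18-24) to 13 (80+)
    "Sex": BINARY,                  # 0 = female, 1 = male
    "HighChol": BINARY,
    "CholCheck": BINARY,
    "BMI": (0.0, math.inf),         # assumption: any non-negative value accepted
    "Smoker": BINARY,
    "HeartDiseaseorAttack": BINARY,
    "PhysActivity": BINARY,
    "Fruits": BINARY,
    "Veggies": BINARY,
    "HvyAlcoholConsump": BINARY,
    "GenHlth": set(range(1, 6)),    # 1 = excellent ... 5 = poor
    "MentHlth": (0, 30),            # assumption: 0 means no poor-health days
    "PhysHlth": (0, 30),
    "DiffWalk": BINARY,
    "Stroke": BINARY,
    "HighBP": BINARY,
    "Diabetes": BINARY,             # primary target
}


def validate_row(row: dict) -> list:
    """Return a list of problems found in one record (empty if it passes)."""
    problems = []
    for col, rule in SCHEMA.items():
        if col not in row:
            problems.append(f"{col}: missing")
            continue
        # CSV values may arrive as strings such as "4.0", so parse as float.
        value = float(row[col])
        if isinstance(rule, set):
            # 1.0 compares equal to 1, so float codes match the integer set.
            if value not in rule:
                problems.append(f"{col}: {value} not an allowed code")
        else:
            low, high = rule
            if not low <= value <= high:
                problems.append(f"{col}: {value} outside [{low}, {high}]")
    return problems
```

Returning a list of problems instead of raising on the first failure makes it easier to tally which columns cause rejections across all rows.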

Distribution

The dataset is distributed as a single CSV file, diabetes_data.csv (5.29 MB), containing 70,692 individual survey responses across 18 columns. The data has been cleaned, augmented, and class-balanced to prepare it for analysis and model training.
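After downloading, a quick sanity check is to confirm the row count, the column count, and the class balance of the target. The following is a minimal sketch using only the Python standard library. The function name `summarize_csv` is hypothetical, and it assumes that the file has a header row and that the `Diabetes` column holds 0/1 values, possibly written as floats such as `1.0`.

```python
import csv
from collections import Counter


def summarize_csv(path: str, target: str = "Diabetes"):
    """Return (row count, column count, Counter of target class labels)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        counts = Counter()
        n_rows = 0
        for row in reader:
            n_rows += 1
            # Labels may be stored as "0.0"/"1.0", so go through float first.
            counts[int(float(row[target]))] += 1
    return n_rows, len(columns), counts
```

Calling `summarize_csv("diabetes_data.csv")` should report 70,692 rows and 18 columns if the file matches this description. For a class-balanced target, the two counts in the returned `Counter` should be close. The listing does not state the exact split.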

Usage

This dataset is well suited to developing and evaluating machine learning models that predict the likelihood of diabetes, hypertension, or stroke from demographic, lifestyle, and health indicators. It can also support epidemiological studies, public health research, and risk factor identification for these chronic conditions.
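As a starting point for the modelling use case, the following is a minimal sketch of a logistic-regression baseline trained with stochastic gradient descent, written with only the standard library so that no particular ML framework is assumed. The function names, learning rate, and epoch count are illustrative choices, not part of the listing. The sketch also assumes that features are standardized first, because BMI sits on a much larger scale than the 0/1 indicator columns.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Scale each column to zero mean and unit variance (constant columns keep scale 1)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def train_logistic(X, y, lr=0.1, epochs=200, seed=0):
    """Fit weights and bias by minimizing log-loss with plain SGD."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i])))
            err = p - y[i]  # derivative of log-loss with respect to the logit
            b -= lr * err
            for j in range(n_features):
                w[j] -= lr * err * X[i][j]
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class (e.g. Diabetes = 1)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

On the real data, each row's feature columns would become one entry of `X`. The chosen target (`Diabetes`, or `HighBP` / `Stroke` for the other two conditions) would become `y`. A held-out split should be kept aside for evaluation.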

Coverage

The data originates from the BRFSS 2015 survey and provides a snapshot of health behaviours and conditions in that year. Demographic details include age (adults from 18 to 80 years and older) and sex (male/female). The health attributes cover cholesterol status and screening, BMI, smoking, heart disease history, physical activity, dietary habits, alcohol consumption, self-rated general health, mental health, physical health, and mobility.
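Age is stored as a category code rather than in years, so reports usually need the code mapped back to a readable range. The following is a minimal sketch. The function name `age_band` is hypothetical. The listing only documents the endpoints (1 = 18-24, 13 = 80+), so the sketch assumes that codes 2 to 12 are consecutive five-year bands (25-29 through 75-79), as in the BRFSS five-year age grouping.

```python
def age_band(code: int) -> str:
    """Map the 13-level Age code to an age-range label.

    Assumption: codes 2-12 are consecutive five-year bands (25-29 ... 75-79);
    only codes 1 and 13 are stated explicitly in the dataset description.
    """
    if not 1 <= code <= 13:
        raise ValueError(f"Age code must be between 1 and 13, got {code}")
    if code == 1:
        return "18-24"
    if code == 13:
        return "80+"
    lower = 25 + 5 * (code - 2)
    return f"{lower}-{lower + 4}"
```

Values read from the CSV may be floats such as `"9.0"`, so a call would look like `age_band(int(float(row["Age"])))`.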

License

CC0: Public Domain

Who Can Use It

This dataset is particularly valuable for:
  • Data Scientists and Machine Learning Engineers: For training and testing predictive models for chronic disease risk.
  • Public Health Researchers and Epidemiologists: For analysing population health trends, identifying key risk factors, and informing public health interventions.
  • Healthcare Analysts: For understanding patient demographics and lifestyle factors contributing to major health conditions.

Dataset Name Suggestions

  • BRFSS 2015 Health Prediction Dataset
  • Chronic Disease Risk Factors Data
  • Diabetes Hypertension Stroke Predictor
  • US Public Health Survey 2015
  • Healthcare Lifestyle Dataset

Attributes

Listing Stats

VIEWS

2

DOWNLOADS

1

LISTED

14/07/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
