Health Risk Behaviour Surveillance Data
Health Information Systems & Technology
About
This dataset is designed for cardiovascular disease (CVD) risk prediction using personal lifestyle factors [1]. It is derived from the CDC's 2021 Behavioral Risk Factor Surveillance System (BRFSS) dataset [1]. The BRFSS is the United States' leading system of health-related telephone surveys. It gathers state-level data on U.S. residents about their health-related risk behaviours, chronic health conditions, and use of preventive services [1]. The provided dataset has been preprocessed and cleaned, and focuses on 19 variables directly related to lifestyle factors that may contribute to an individual's risk of developing cardiovascular disease [2].
Columns
The dataset contains 19 columns, all of which have 309,000 valid records with no mismatched or missing entries [3-21]. Here is a description of each:
- General_Health: Describes the respondent's general health status, with categories such as "Very Good" (36%) and "Good" (31%) [3].
- Checkup: Indicates how long it has been since the respondent's last routine doctor checkup, with 78% reporting "Within the past year" [3, 4].
- Exercise: A boolean variable indicating whether the respondent participated in any physical activities or exercises (e.g., running, gardening) during the past month, with 78% reporting "true" [4].
- Heart_Disease: A boolean variable indicating if respondents reported having coronary heart disease or myocardial infarction, with 8% reporting "true" [5].
- Skin_Cancer: A boolean variable indicating if respondents reported having skin cancer, with 10% reporting "true" [5].
- Other_Cancer: A boolean variable indicating if respondents reported having any other types of cancer, with 10% reporting "true" [5, 6].
- Depression: A boolean variable indicating if respondents reported having a depressive disorder (including major, dysthymia, or minor depression), with 20% reporting "true" [6].
- Diabetes: Indicates whether respondents reported having diabetes and, if so, the type, with 84% reporting "No" [7].
- Arthritis: A boolean variable indicating if respondents reported having arthritis, with 33% reporting "true" [7].
- Sex: The respondent's sex, with 52% recorded as "Female" and 48% as "Male" [7].
- Age_Category: The age group of the respondent, with "65-69" being the most common at 11% [7, 8].
- Height_(cm): The respondent's height in centimetres. The mean height is 171 cm, with a standard deviation of 10.7 cm [8-10].
- Weight_(kg): The respondent's weight in kilograms. The mean weight is 83.6 kg, with a standard deviation of 21.3 kg [10-12].
- BMI: The respondent's Body Mass Index. The mean BMI is 28.6, with a standard deviation of 6.52 [12-14].
- Smoking_History: A boolean variable indicating if the respondent has a history of smoking, with 41% reporting "true" [14, 15].
- Alcohol_Consumption: A numerical variable representing alcohol consumption. The mean is 5.1, with a standard deviation of 8.2 [15, 16].
- Fruit_Consumption: A numerical variable representing fruit consumption. The mean is 29.8, with a standard deviation of 24.9 [16-18].
- Green_Vegetables_Consumption: A numerical variable representing green vegetables consumption. The mean is 15.1, with a standard deviation of 14.9 [18-20].
- FriedPotato_Consumption: A numerical variable representing fried potato consumption. The mean is 6.3, with a standard deviation of 8.58 [20, 21].
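The Height_(cm), Weight_(kg) and BMI columns are linked by the standard definition BMI = weight (kg) / height (m)². The listing does not say how the BMI column was derived. The sketch below is therefore only a sanity check that assumes the column follows this formula. The helper names `bmi` and `bmi_consistent` and the 0.5 tolerance are hypothetical choices, not part of the dataset.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index: weight in kg divided by the square of height in metres."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    height_m = height_cm / 100.0  # Height_(cm) is stored in centimetres
    return weight_kg / (height_m * height_m)


def bmi_consistent(row: dict[str, str], tolerance: float = 0.5) -> bool:
    """Hypothetical check: is the stored BMI within `tolerance` of the recomputed value?

    Assumes the BMI column follows the standard formula; the tolerance
    (an arbitrary choice) absorbs rounding of the stored values.
    """
    recomputed = bmi(float(row["Weight_(kg)"]), float(row["Height_(cm)"]))
    return abs(recomputed - float(row["BMI"])) <= tolerance


if __name__ == "__main__":
    print(round(bmi(70, 175), 2))  # 22.86
```

The check uses a tolerance rather than exact equality. Stored heights, weights and BMI values may have been rounded independently, so small differences are expected even when the formula holds.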
Distribution
The dataset is provided in CSV format (CVD_cleaned.csv) [3]. It has a size of 32.45 MB and contains 19 columns [3]. There are 309,000 records in total, with no missing or mismatched values across any of the variables [3-21].
Usage
This dataset is well suited to:
- Predicting cardiovascular disease risk from personal lifestyle factors [1].
- Exploratory data analysis (EDA) to understand health trends and correlations [3].
- Developing and testing binary classification machine learning models in healthcare [3].
- Creating data visualisations related to health conditions and lifestyle [2].
- Supporting healthcare research and public health initiatives [3].
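For exploratory analysis or a binary classifier, it helps to check the file and the class balance first. The listing reports only 8% "true" for Heart_Disease. A model that always predicts "false" would therefore already reach about 92% accuracy, so any classifier should be judged against that baseline or with metrics such as recall. The following is a minimal standard-library sketch. It assumes the CSV header uses exactly the column names listed in the Columns section and that boolean columns are stored as text such as "Yes"/"No" or "true"/"false". All function names are hypothetical.

```python
import csv
import os
from collections import Counter
from typing import Iterable

# Column names as listed in the Columns section; assumed to match the
# header of CVD_cleaned.csv exactly.
EXPECTED_COLUMNS = [
    "General_Health", "Checkup", "Exercise", "Heart_Disease", "Skin_Cancer",
    "Other_Cancer", "Depression", "Diabetes", "Arthritis", "Sex",
    "Age_Category", "Height_(cm)", "Weight_(kg)", "BMI", "Smoking_History",
    "Alcohol_Consumption", "Fruit_Consumption",
    "Green_Vegetables_Consumption", "FriedPotato_Consumption",
]

# Assumption: text values that count as a positive ("true") label.
POSITIVE_VALUES = {"true", "yes", "1"}


def is_positive(value: str) -> bool:
    """Interpret a boolean-like cell such as "Yes" or "true"."""
    return value.strip().lower() in POSITIVE_VALUES


def load_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV lines, checking the header and rejecting empty cells."""
    reader = csv.DictReader(lines)
    absent = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if absent:
        raise ValueError(f"missing columns: {sorted(absent)}")
    rows = list(reader)
    for n, row in enumerate(rows, start=1):
        # Short rows get None for absent fields, so treat None as empty too.
        empty = [c for c in EXPECTED_COLUMNS if not (row.get(c) or "").strip()]
        if empty:
            raise ValueError(f"data row {n}: empty values in {empty}")
    return rows


def label_prevalence(rows: list[dict[str, str]], target: str = "Heart_Disease") -> float:
    """Fraction of rows whose target column is positive."""
    if not rows:
        raise ValueError("no rows")
    return sum(is_positive(r[target]) for r in rows) / len(rows)


def majority_baseline_accuracy(rows: list[dict[str, str]], target: str = "Heart_Disease") -> float:
    """Accuracy of always predicting the more common class."""
    p = label_prevalence(rows, target)
    return max(p, 1.0 - p)


def positive_rate_by_group(
    rows: list[dict[str, str]], group_col: str, target: str = "Heart_Disease"
) -> dict[str, float]:
    """Share of positive target values within each category of group_col."""
    totals: Counter = Counter()
    positives: Counter = Counter()
    for r in rows:
        totals[r[group_col]] += 1
        positives[r[group_col]] += int(is_positive(r[target]))
    return {k: positives[k] / totals[k] for k in totals}


if __name__ == "__main__":
    path = "CVD_cleaned.csv"  # file name from the Distribution section
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            data = load_rows(f)
        print(f"rows: {len(data)}")
        print(f"Heart_Disease prevalence: {label_prevalence(data):.3f}")
        print(f"majority-class baseline accuracy: {majority_baseline_accuracy(data):.3f}")
        print(positive_rate_by_group(data, "Smoking_History"))
```

Run next to CVD_cleaned.csv, this prints the row count, the Heart_Disease prevalence, the majority-class baseline and the positive rate per Smoking_History category. If the file is absent, it prints nothing. For actual model training, libraries such as pandas and scikit-learn are the more usual choice. This sketch sticks to the standard library so it runs without extra dependencies.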
Coverage
The dataset covers U.S. residents, collecting state-level data [1]. The data specifically pertains to the 2021 BRFSS Dataset [1]. It encompasses a wide demographic scope, including information on various age categories and gender, detailing health-related risk behaviours, chronic health conditions, and preventive services usage [1, 7, 8]. The dataset is expected to be updated annually [2].
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Machine Learning Engineers: To build and refine models for cardiovascular disease risk prediction [1, 3].
- Healthcare Researchers: For studying lifestyle factors and their impact on health outcomes [3].
- Public Health Analysts: To gain insights into population health behaviours and conditions [1].
- Application Developers: To create health-related web applications or tools [1].
Dataset Name Suggestions
- BRFSS Cardiovascular Risk Factors
- Lifestyle Heart Health Predictor
- U.S. Health Behaviour Dataset
- CVD Lifestyle Data 2021
- Health Risk Behaviour Surveillance Data
Attributes
Original Data Source: Health Risk Behaviour Surveillance Data