Obesity Risk Factors Dataset

NLP / Natural Language Processing

Tags and Keywords

Obesity

Health

Diet

Exercise

CVD

Free

About

This dataset provides a detailed collection of health factors and lifestyle attributes for individuals, primarily aimed at estimating obesity levels and exploring cardiovascular disease (CVD) risks. It captures data from a survey platform, encompassing a diverse range of eating habits and physical conditions. The dataset is suitable for machine learning applications like classification, prediction, segmentation, and association analysis, enabling a deeper understanding of the determinants of obesity and related health conditions.

Columns

The dataset contains 17 attributes and 2111 records, including:
  • Gender: Categorical variable indicating the gender of the participant (e.g., Male, Female).
  • Age: Numerical variable representing the participant's age, ranging from 14 to 61 years.
  • Height: Numerical variable for the participant's height, from 1.45 to 1.98 metres.
  • Weight: Numerical variable for the participant's weight, from 39 to 173 kilograms.
  • family_history_with_overweight: Boolean indicating whether a family member is or has been overweight (true/false).
  • FAVC (Frequent consumption of high caloric food): Boolean indicating whether the participant frequently consumes high-calorie food.
  • FCVC (Frequency of consumption of vegetables): Numerical variable (e.g., 1 to 3).
  • NCP (Number of main meals): Numerical variable (e.g., 1 to 4).
  • CAEC (Consumption of food between meals): Categorical variable (e.g., Sometimes, Frequently).
  • SMOKE (Smoker or not): Boolean indicating if the participant is a smoker (true/false).
  • CH2O (Consumption of water daily): Numerical variable (e.g., 1 to 3 litres).
  • SCC (Calories consumption monitoring): Boolean indicating whether the participant monitors their calorie intake.
  • FAF (Physical activity frequency): Numerical variable (e.g., 0 to 3 days per week).
  • TUE (Time using technology devices): Numerical variable for screen time (e.g., 0 to 2 hours).
  • CALC (Consumption of alcohol): Categorical variable (e.g., Sometimes, no).
  • MTRANS (Transportation used): Categorical variable (e.g., Public_Transportation, Automobile).
  • NObeyesdad (Obesity level deduced): Categorical variable classifying obesity levels (e.g., Underweight, Normal, Overweight, Obesity I, Obesity II, Obesity III).
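
As a rough illustration of how the columns above might be read and typed, the following Python sketch parses a CSV with the standard library and coerces each column to the kind of value the list describes. The column names come from the list; the grouping into numeric, Boolean, and categorical columns, the accepted truthy spellings ("yes", "true", "1"), the load_records helper, and the two sample rows are all assumptions made for illustration, and the sample rows are invented rather than taken from the dataset.

```python
import csv
import io

# Column groups follow the listing's descriptions. How values are actually
# spelled in the real file (e.g. "yes"/"no" vs "true"/"false") is an assumption.
NUMERIC = ["Age", "Height", "Weight", "FCVC", "NCP", "CH2O", "FAF", "TUE"]
BOOLEAN = ["family_history_with_overweight", "FAVC", "SMOKE", "SCC"]
CATEGORICAL = ["Gender", "CAEC", "CALC", "MTRANS", "NObeyesdad"]
EXPECTED = NUMERIC + BOOLEAN + CATEGORICAL  # 17 attributes in total

TRUTHY = {"yes", "true", "1"}  # assumed spellings of a positive Boolean


def load_records(fileobj):
    """Read CSV rows and coerce each column to the type the listing describes."""
    reader = csv.DictReader(fileobj)
    missing = set(EXPECTED) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for row in reader:
        rec = {}
        for name in NUMERIC:
            rec[name] = float(row[name])
        for name in BOOLEAN:
            rec[name] = row[name].strip().lower() in TRUTHY
        for name in CATEGORICAL:
            rec[name] = row[name].strip()
        records.append(rec)
    return records


# Two made-up rows for illustration only; they are NOT taken from the dataset.
SAMPLE = """Gender,Age,Height,Weight,family_history_with_overweight,FAVC,FCVC,NCP,CAEC,SMOKE,CH2O,SCC,FAF,TUE,CALC,MTRANS,NObeyesdad
Female,21,1.62,64,yes,no,2,3,Sometimes,no,2,no,0,1,no,Public_Transportation,Normal
Male,30,1.80,110,yes,yes,2,3,Sometimes,no,2,no,1,0,Sometimes,Automobile,Obesity I
"""

records = load_records(io.StringIO(SAMPLE))
print(len(records), "records with", len(records[0]), "attributes each")
```

To run this against the downloaded file, pass an open file handle to load_records instead of the in-memory sample. If the real file spells its Boolean values differently, TRUTHY would need adjusting.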

Distribution

This dataset is provided in CSV format and consists of 2111 records and 17 distinct attributes. The file size is 263.65 KB. It includes numerical, categorical, and Boolean data types.

Usage

This dataset is ideal for:
  • Classifying individuals into different obesity levels or risk categories for cardiovascular diseases.
  • Predicting obesity status based on eating habits, physical condition, and demographic information.
  • Segmenting populations to identify groups with similar health behaviours and obesity risks.
  • Performing association analysis to uncover relationships between various lifestyle factors and obesity.
  • Exploring the underlying factors contributing to cardiovascular diseases.
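
To make the classification use case more concrete, here is a minimal Python sketch of a k-nearest-neighbours classifier that predicts the NObeyesdad label from a few numeric columns. It is only one of many possible approaches and is not a method prescribed by the dataset. The choice of features (Age, Height, Weight, FAF), the min-max scaling, k = 3, the helper names, and the six training rows are all assumptions; the rows are invented for illustration and do not come from the dataset.

```python
import math
from collections import Counter

FEATURES = ["Age", "Height", "Weight", "FAF"]  # assumed feature subset
TARGET = "NObeyesdad"


def fit_scaler(rows):
    """Return per-feature (min, max) so all features share a 0..1 scale."""
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in FEATURES}


def scale(row, bounds):
    """Min-max scale one row's features; constant features map to 0.0."""
    scaled = []
    for f in FEATURES:
        lo, hi = bounds[f]
        scaled.append((row[f] - lo) / (hi - lo) if hi > lo else 0.0)
    return scaled


def predict(train, bounds, query, k=3):
    """Majority vote among the k training rows closest to the query."""
    q = scale(query, bounds)
    nearest = sorted(train, key=lambda r: math.dist(scale(r, bounds), q))[:k]
    votes = Counter(r[TARGET] for r in nearest)
    return votes.most_common(1)[0][0]


# Made-up training rows for illustration only; NOT taken from the dataset.
train = [
    {"Age": 22, "Height": 1.70, "Weight": 62, "FAF": 2, TARGET: "Normal"},
    {"Age": 25, "Height": 1.65, "Weight": 58, "FAF": 3, TARGET: "Normal"},
    {"Age": 31, "Height": 1.80, "Weight": 75, "FAF": 2, TARGET: "Normal"},
    {"Age": 35, "Height": 1.68, "Weight": 95, "FAF": 0, TARGET: "Obesity I"},
    {"Age": 40, "Height": 1.75, "Weight": 102, "FAF": 1, TARGET: "Obesity I"},
    {"Age": 28, "Height": 1.60, "Weight": 88, "FAF": 0, TARGET: "Obesity I"},
]
bounds = fit_scaler(train)

lighter = {"Age": 24, "Height": 1.72, "Weight": 64, "FAF": 2}
heavier = {"Age": 38, "Height": 1.70, "Weight": 98, "FAF": 0}
print(predict(train, bounds, lighter))
print(predict(train, bounds, heavier))
```

Scaling matters here because Weight spans a much wider numeric range than Height, so unscaled Euclidean distance would be driven almost entirely by weight. With the full 2111 records, a held-out test split would be needed before drawing any conclusions about accuracy.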

Coverage

The data originates from individuals in Mexico, Peru, and Colombia. Participants' ages range from 14 to 61 years, with a near-even split between genders (approximately 51% Male, 49% Female). The dataset covers diverse eating habits and physical conditions, providing insight into a broad demographic within these countries.

License

CC BY-SA 4.0

Who Can Use It

This dataset is particularly useful for:
  • Data Scientists and Machine Learning Engineers for developing and testing classification, regression, and clustering models related to health.
  • Researchers in public health, nutrition, and sports science to study obesity prevalence, risk factors, and intervention strategies.
  • Health Professionals and Policy Makers for understanding demographic patterns of obesity and informing public health campaigns.
  • Academics and Students for educational purposes in data analysis, statistics, and health informatics.

Dataset Name Suggestions

  • Obesity Risk Factors Dataset
  • Cardiovascular Health & Obesity Survey Data
  • Lifestyle and Obesity Levels Data
  • Mexico Peru Colombia Obesity Study
  • Health Habits Obesity Dataset

Attributes

Original Data Source: Obesity Risk Factors Dataset

Listing Stats

VIEWS

1

DOWNLOADS

1

LISTED

14/07/2025

REGION

GLOBAL

UDQS QUALITY

5 / 5

VERSION

1.0
