Anonymised Nutrition P-Hacking Data
Patient Health Records & Digital Health
Free
About
This collection of data captures anonymised Food Frequency Questionnaire (FFQ) responses and supplementary lifestyle survey answers from participants. The resource was originally compiled to demonstrate how easily misleading or spurious statistical associations can be produced when many variables are tested against one another and only the "significant" results are reported, a practice often described as p-hacking. The file offers highly granular detail on the consumption frequency and quantity of hundreds of distinct food and beverage items, complemented by calculated dietary intake values, physical activity measures, and various demographic factors. Analysts should approach this material with the understanding that it is experimental data intended primarily for statistical education and for demonstrating potentially "evil (statistical) work."
Columns
The dataset contains a large number of variables, spanning several hundred columns. These fall into five main domains:
- Health and Demographics: Includes identifiers, self-reported health conditions (such as cancer, diabetes, and heart disease), handedness, smoking status, and variables detailing political and religious affiliations.
- Food Consumption: This is the most detailed section, composed of pairs of columns specifying the frequency (FREQ) and quantity (QUAN) of intake for hundreds of items, ranging from basic commodities like eggs and rice to specific items like breakfast sandwiches, pastries, various vegetables, and alcoholic drinks.
- Nutritional Metrics: Provides calculated dietary and nutritional data, including total calories (DT_KCAL), protein (DT_PROT), fats (DT_TFAT, SFAT, MFAT, PFAT), vitamins (e.g., VITD, VITC), and minerals (e.g., CALC, IRON).
- Consumption Groups: Aggregated totals for specific consumption categories, such as grams consumed or total kilocalories derived from solid foods, sugary beverages, or alcohol.
- Lifestyle and Activity: Variables capturing physical activity levels (e.g., LOWMINS, MODMINS, VIGMINS) and unique survey responses concerning media consumption and pet ownership (cat, dog).
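The column domains above can be separated programmatically once the header is known. The following is a minimal sketch, assuming that frequency and quantity columns end in FREQ and QUAN, that nutritional metrics start with DT_, and that activity columns end in MINS. The exact naming convention is not confirmed by this description, and names such as EGGSFREQ or RICEQUAN are hypothetical.

```python
from collections import defaultdict


def group_columns(columns):
    """Sort column names into rough domains based on assumed naming patterns."""
    groups = defaultdict(list)
    for name in columns:
        upper = name.upper()
        if upper.endswith("FREQ"):
            groups["frequency"].append(name)
        elif upper.endswith("QUAN"):
            groups["quantity"].append(name)
        elif upper.startswith("DT_"):
            groups["nutrition"].append(name)
        elif upper.endswith("MINS"):
            groups["activity"].append(name)
        else:
            groups["other"].append(name)
    return dict(groups)


# Hypothetical header: only DT_KCAL, DT_PROT and VIGMINS are names quoted
# in the description; the food item names are invented for illustration.
example_columns = [
    "ID", "EGGSFREQ", "EGGSQUAN", "RICEFREQ", "RICEQUAN",
    "DT_KCAL", "DT_PROT", "VIGMINS", "cat", "dog",
]
groups = group_columns(example_columns)
print({domain: len(names) for domain, names in groups.items()})
```

Anything that does not match a pattern lands in "other", which makes it easy to inspect the demographic and survey columns by hand before analysis.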
Distribution
The data is presented in a tabular structure, typically delivered as a CSV file. It contains 54 complete, distinct responses from participants who filled in the food frequency questionnaire and supplementary survey. No separate total row count or file size in bytes is provided.
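Since the file dimensions are not documented in full, it is worth checking them after download. The sketch below counts data rows and columns with the standard library; the inline sample and its column names are hypothetical stand-ins for the real file.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_row_count, column_count) for a CSV with one header row."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip completely blank lines
    return n_rows, len(header)


# Hypothetical two-row sample standing in for the real download.
sample = io.StringIO(
    "ID,EGGSFREQ,EGGSQUAN,DT_KCAL\n"
    "1,3,2,1850.5\n"
    "2,1,1,2210.0\n"
)
n_rows, n_cols = csv_shape(sample)
print(n_rows, n_cols)  # prints: 2 4
```

For the actual file, open it with `open(path, newline="")` and pass the handle to `csv_shape`; if the description holds, the result should be 54 rows and several hundred columns.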
Usage
This data product is highly suitable for:
- Educational Demonstrations: Use in statistics or data science curricula to illustrate the pitfalls of multiple comparisons and the potential for spurious correlations in large multivariate datasets.
- Research Methodology: Studying the effectiveness and inherent limitations of Food Frequency Questionnaire design.
- Advanced Data Analysis Training: Providing a rich, real-world scenario for practising data cleaning, feature engineering, and robust statistical modelling techniques.
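The multiple-comparisons pitfall mentioned above can be illustrated without the real file. The sketch below is a simulation under stated assumptions, not the original authors' analysis: it draws 54 rows of pure noise (matching the number of respondents), tests a hypothetical 500 unrelated predictors against one outcome, and counts how many pass p < 0.05 with and without a Bonferroni correction. The p-values use the Fisher z-transform, which is a normal approximation rather than an exact test.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_spurious_hits(n_rows=54, n_predictors=500, alpha=0.05, seed=0):
    """Count 'significant' correlations between pure-noise columns."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    p_values = []
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_rows)]
        p_values.append(approx_p_value(pearson_r(predictor, outcome), n_rows))
    raw_hits = sum(p < alpha for p in p_values)
    # Bonferroni: divide the threshold by the number of tests performed.
    bonferroni_hits = sum(p < alpha / n_predictors for p in p_values)
    return raw_hits, bonferroni_hits


raw_hits, bonferroni_hits = count_spurious_hits()
print(f"uncorrected hits: {raw_hits}, Bonferroni hits: {bonferroni_hits}")
```

Because every column here is random noise, any uncorrected "hit" is a false positive; on average about 5% of the tests should clear the uncorrected threshold, while very few, if any, should survive the correction. The same loop applied to the real columns shows why an isolated small p-value in this dataset is weak evidence.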
Coverage
The data was collected from 54 participants who answered a nutrition survey.
- Demographic Scope: Contains self-reported categories for race/ethnicity (White, Black, Asian, Latino, NativeAmer, Hawaiian), political leaning, and religious affiliation (Jewish, atheist).
- Geographic Coverage: Specific geographic location is not detailed.
- Time Range: The period during which the survey responses were collected is not detailed.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Statisticians and Academics: For replicating and exploring the concept of statistical p-hacking and teaching multivariate analysis.
- Data Analysts: Individuals needing complex survey data to test and refine analysis pipelines and visualisation tools.
- Public Health Researchers: Professionals examining the relationship between self-reported diet, lifestyle factors, and self-reported health outcomes.
Dataset Name Suggestions
- Anonymised Nutrition P-Hacking Data
- 538 Food Frequency and Survey Results
- Multivariate Health and Dietary Habits Sample
Attributes
Original Data Source: Anonymised Nutrition P-Hacking Data
