CrossFit Performance Analysis Dataset
Data Science and Analytics
About
This dataset offers a detailed collection of information about CrossFit athletes. CrossFit is a high-intensity fitness programme combining weightlifting, gymnastics, and cardio exercises, and its athletes take part in various competitions to demonstrate their abilities. The dataset aims to provide insights into athlete characteristics and performance. It is a resource for fitness researchers and practitioners interested in the physiological and psychological factors that contribute to successful CrossFit outcomes. Analysing the data can reveal trends and patterns linked to higher performance levels, for example by age, gender, training volume, and exercise selection. These findings can then inform evidence-based training regimes that help CrossFit athletes optimise their physical capabilities and minimise the risk of injury.
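The following Python sketch illustrates the kind of trend analysis described above. It groups athletes into five-year age bands and reports the median of one performance metric per band, using only the standard library. Several details are illustrative assumptions: the band width, the `max_age` plausibility cutoff of 100, the helper names, the UTF-8 encoding, and the treatment of blank cells as missing values.

```python
import csv
import math
from collections import defaultdict
from statistics import median


def age_band(age, width=5):
    """Return the (start, end) age band containing `age`, e.g. 28 -> (25, 29)."""
    start = (age // width) * width
    return (start, start + width - 1)


def median_by_age_band(rows, metric, width=5, max_age=100):
    """Median of `metric` per age band, ordered by band start.

    Rows with a blank or non-numeric age or metric are skipped, as are ages
    above `max_age` (an assumed plausibility cutoff, not part of the dataset).
    """
    groups = defaultdict(list)
    for row in rows:
        try:
            age = int(float(row.get("age") or ""))
            value = float(row.get(metric) or "")
        except ValueError:
            continue  # missing or unparseable cell
        if age > max_age or math.isnan(value):
            continue
        groups[age_band(age, width)].append(value)
    return {band: median(vals) for band, vals in sorted(groups.items())}


def load_rows(path="athletes.csv"):
    """Stream rows from the CSV as dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)
```

For example, `median_by_age_band(load_rows(), "fran")` would summarise Fran times by age band. Fran is a for-time workout, so a lower median means a faster band. Streaming rows through `csv.DictReader` avoids holding the whole file in memory.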
Columns
- athlete_id: A unique identifier for each athlete.
- name: The name of the athlete.
- region: The geographical region the athlete is associated with.
- team: The team the athlete belongs to.
- affiliate: The CrossFit affiliate (gym) the athlete is associated with.
- gender: The declared gender of the athlete (Male, Female, or Other).
- age: The age of the athlete, a numerical value. Recorded ages range from 13 to 125 (values at the upper end are likely data-entry errors), with the majority between 27 and 37.
- height: The height of the athlete, recorded as a numerical value (units are not stated).
- weight: The weight of the athlete, recorded as a numerical value (units are not stated), with most athletes weighing between 145 and 192.
- fran: The time taken to complete the 'Fran' workout, a numerical performance metric.
- helen: The time taken to complete the 'Helen' workout, a numerical performance metric.
- grace: The time taken to complete the 'Grace' workout, a numerical performance metric.
- filthy50: The time taken to complete the 'Filthy 50' workout, a numerical performance metric.
- fgonebad: The score recorded for the 'Fight Gone Bad' workout, a numerical performance metric. Fight Gone Bad is conventionally scored as total repetitions rather than time.
- run400: The time taken to complete a 400-metre run, a numerical performance metric.
- run5k: The time taken to complete a 5-kilometre run, a numerical performance metric.
- candj: The weight lifted in the Clean and Jerk exercise, a numerical performance metric.
- snatch: The weight lifted in the Snatch exercise, a numerical performance metric.
- deadlift: The weight lifted in the Deadlift exercise, a numerical performance metric.
- backsq: The weight lifted in the Back Squat exercise, a numerical performance metric.
- pullups: The maximum number of pull-ups completed in a single set, a numerical performance metric.
- eat: Information regarding the athlete's diet (e.g., "I eat quality foods but don't measure the amount|").
- train: Information about the athlete's training habits (e.g., "I workout mostly at a CrossFit Affiliate|").
- background: Details about the athlete's athletic history (e.g., "I played youth or high school level sports|").
- experience: Information on how the athlete began their CrossFit journey (e.g., "I began CrossFit with a coach (e.g. at an affiliate)|").
- schedule: The athlete's typical training schedule (e.g., "I usually only do 1 workout a day|").
- howlong: The duration of the athlete's training experience (e.g., "1-2 years|").
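The survey-style columns (eat, train, background, experience, schedule, howlong) hold text answers that end with a "|" character in the examples above. This suggests that "|" separates answers and that one cell may contain several. The sketch below works under that assumption: it splits such a cell into individual answers and counts how often each answer occurs. The function names are hypothetical.

```python
from collections import Counter


def split_survey_answers(value):
    """Split a pipe-delimited survey cell into individual answers.

    Assumes '|' separates and terminates answers, as the example values
    suggest. Blank or missing cells yield an empty list.
    """
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


def answer_counts(rows, column):
    """Count how often each answer appears in `column` across all rows."""
    counts = Counter()
    for row in rows:
        counts.update(split_survey_answers(row.get(column)))
    return counts
```

Counting individual answers instead of whole cells keeps multi-answer responses from appearing as separate, rarely repeated categories.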
Distribution
The dataset is provided as a CSV file (athletes.csv) of approximately 71.55 MB, containing 27 columns and an estimated 423,000 records. Many columns have a substantial share of missing values. For instance, height is 62% missing, fran 87%, and filthy50 95%, so data availability varies considerably across metrics and athlete demographics.
Usage
This dataset is ideal for:
- Predictive modelling of CrossFit athlete performance.
- Analysing athlete characteristics in relation to competition outcomes.
- Understanding the physiological and psychological factors driving performance in high-intensity fitness.
- Identifying performance trends based on athlete demographics, training methods, and exercise choices.
- Developing data-driven training programmes designed to enhance athlete abilities and minimise injury.
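As a starting point for predictive modelling, the sketch below fits a one-variable ordinary least-squares line predicting one lift from another. It uses deadlift to predict back squat, a column pairing chosen purely for illustration. Only rows where both columns are filled are used, so given the missing-data rates described above, the usable sample may be much smaller than the full file. The helper names are hypothetical, and this is a baseline rather than a recommended model.

```python
from statistics import fmean


def paired_values(rows, x_col, y_col):
    """Collect (x, y) pairs from rows where both columns are non-blank."""
    xs, ys = [], []
    for row in rows:
        x, y = row.get(x_col), row.get(y_col)
        if x and y:
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) < 2:
        raise ValueError("need at least two complete rows")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, x):
    """Apply a (slope, intercept) model to a single x value."""
    slope, intercept = model
    return slope * x + intercept
```

A typical call would be `fit_line(*paired_values(rows, "deadlift", "backsq"))`. Richer models would add further columns, but the complete-row filtering step matters just as much there.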
Coverage
The data has been collected from CrossFit competitions and events worldwide and includes demographic information such as age, gender, and training experience. Europe is the most frequently represented region. The gender breakdown is 46% Male, 32% Female, and 22% Other. Ages are concentrated between 27 and 37, although recorded values range from 13 to 125. The period over which the data was collected is not specified. Data availability varies significantly across metrics, and many performance-related columns have a high percentage of missing values.
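Figures such as the missing-value rates and the gender breakdown can be checked against the file with a single streaming pass. The sketch below assumes that missing values appear as blank cells. If the file uses other placeholders, add them to the hypothetical `MISSING` set. The function names are illustrative.

```python
from collections import Counter

MISSING = {""}  # assumption: blank cells mark missing values


def missing_share(rows, columns):
    """Fraction of rows with a missing value in each of `columns` (one pass)."""
    total = 0
    missing = Counter()
    for row in rows:
        total += 1
        for col in columns:
            if (row.get(col) or "").strip() in MISSING:
                missing[col] += 1
    return {col: (missing[col] / total if total else 0.0) for col in columns}


def value_shares(rows, column):
    """Share of each distinct value in `column`, with blanks as '<missing>'."""
    counts = Counter(
        (row.get(column) or "").strip() or "<missing>" for row in rows
    )
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}
```

Rows can be streamed with `csv.DictReader(open("athletes.csv", newline=""))`. Each function consumes the iterator once, so open a fresh reader for each call.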
License
Attribution 4.0 International (CC BY 4.0).
Who Can Use It
- Fitness Researchers: To study the factors influencing CrossFit performance and athlete physiology.
- Sports Scientists: For in-depth analysis of training methodologies and their impact on athletic outcomes.
- CrossFit Coaches and Trainers: To inform the development of individualised training plans and performance optimisation strategies.
- Data Scientists and Analysts: For building predictive models and extracting valuable patterns from athletic performance data.
- CrossFit Community Members: To gain insights into athlete characteristics and training approaches within the sport.
Dataset Name Suggestions
- CrossFit Athlete Performance Metrics
- Global CrossFit Athlete Data
- CrossFit Performance Analysis Dataset
- Athlete Fitness & Competition Records
Attributes
Original Data Source: CrossFit Performance Analysis Dataset