Historical TDF and TDFF Rider Statistics
Data Science and Analytics
Free
About
This historical cycling database captures the history of Le Tour de France and includes dedicated files for Le Tour de France Femmes avec Zwift. It contains detailed performance records for every cyclist and historical stage information. The database is highly suitable for statistical analysis of athletic performance, long-term trends, and race dynamics in professional road cycling.
Columns
The data is split across multiple files, including TDF_Riders_History.csv and TDF_Stages_History.csv, with corresponding TDFF files for the women's tour. The rider history files, such as TDFF_Riders_History.csv, are reported to contain 15 fields; the 14 listed below are described using the TDFF sample.
- Rank: The final standing of the rider in the general classification (maximum value is 123 in the sample data).
- Rider: The name of the cyclist; the women’s data example contains 179 unique values.
- Rider No.: The unique identification number assigned to the rider (maximum value is 236 in the sample).
- Team: The professional cycling team affiliation (e.g., MOVISTAR TEAM WOMEN).
- Times: The recorded total time for the race.
- Gap: The time difference relative to the race leader.
- B: Bonus seconds awarded (mostly missing; 86% of values are null in the sample).
- P: Penalty seconds applied (almost entirely missing; 98% of values are null in the sample).
- Year: The year the specific race took place (e.g., 2022, 2023).
- Distance (km): The total distance of the race in kilometres (mean distance of 990 km in the sample).
- Number of stages: The total number of stages in that race (consistently 8 in the TDFF data sample).
- TotalSeconds: The total race time expressed in seconds (mean value is approximately 97,200).
- GapSeconds: The time difference to the leader expressed in seconds (mean value is approximately 3,390 seconds).
- ResultType: The category of the recorded result, which is uniformly 'time'.
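The relationship between the fields above can be checked with a short pandas sketch. It measures the share of missing values per column and recomputes GapSeconds from TotalSeconds. The rows are made up to mimic the documented column names, and the sketch assumes that the leader in each year is the rider with the smallest TotalSeconds; neither the rows nor that assumption comes from the dataset itself.

```python
import io

import pandas as pd

# Made-up rows that mimic the documented column names; not real race results.
SAMPLE_CSV = """Rank,Rider,Rider No.,Team,Times,Gap,B,P,Year,Distance (km),Number of stages,TotalSeconds,GapSeconds,ResultType
1,Rider A,1,TEAM X,,,10,,2022,1000,8,96000,0,time
2,Rider B,2,TEAM Y,,,,,2022,1000,8,96120,120,time
1,Rider C,3,TEAM X,,,,,2023,980,8,90000,0,time
2,Rider D,4,TEAM Z,,,,,2023,980,8,90045,45,time
"""


def null_rates(df):
    """Return the fraction of missing values in each column."""
    return df.isna().mean()


def recompute_gap_seconds(df):
    """Recompute each rider's gap to the fastest total time of the same year.

    Assumption: the race leader is the rider with the minimum TotalSeconds.
    """
    leader_time = df.groupby("Year")["TotalSeconds"].transform("min")
    return df["TotalSeconds"] - leader_time


riders = pd.read_csv(io.StringIO(SAMPLE_CSV))
rates = null_rates(riders)
gap_check = recompute_gap_seconds(riders)
```

On the real files, replacing `io.StringIO(SAMPLE_CSV)` with the path to TDFF_Riders_History.csv should allow the same checks, provided the column names match those listed above.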
Distribution
The data is provided in CSV format. Separate files are maintained for the men's and women's tours, using the TDF and TDFF prefixes respectively, to ensure backward compatibility. For example, the TDFF rider history file includes 15 columns and 232 valid records across all primary fields. The dataset has a usability score of 10.00.
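Because the men's and women's results live in separate files, a common first step is to stack them into one table with a label for the race. The sketch below is one way to do that. The `Race` column and the `combine_rider_histories` helper are hypothetical names introduced here, and the in-memory CSV text stands in for TDF_Riders_History.csv and TDFF_Riders_History.csv with made-up rows and a reduced set of columns.

```python
import io

import pandas as pd


def combine_rider_histories(sources):
    """Read one CSV per race and stack them, tagging each row with its race.

    `sources` maps a race label (e.g. "TDF") to a file path or file-like object.
    """
    frames = []
    for race, source in sources.items():
        frame = pd.read_csv(source)
        frame["Race"] = race  # hypothetical label column, not part of the files
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Made-up stand-ins for the two rider history files.
men_csv = io.StringIO("Rank,Rider,Year,TotalSeconds\n1,Rider A,1903,340000\n")
women_csv = io.StringIO(
    "Rank,Rider,Year,TotalSeconds\n1,Rider B,2022,96000\n2,Rider C,2022,96100\n"
)

combined = combine_rider_histories({"TDF": men_csv, "TDFF": women_csv})
records_per_race = combined.groupby("Race").size()
```

With the real files, the dictionary values would be the two file paths instead of the `io.StringIO` objects.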
Usage
This database is highly valuable for Data Analytics, Data Visualization, and Exploratory Data Analysis. Ideal applications include modelling and predicting race outcomes, tracking longitudinal performance of teams and individual riders, analysing changes in the structure of the tour over the decades, and serving as a robust source for data cleaning exercises using tools like pandas.
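As an illustration of the longitudinal analysis mentioned above, the following sketch estimates the winner's average speed for each year from the Distance (km) and TotalSeconds fields. It assumes that the winner is the rider with the smallest TotalSeconds in a given year; the `winner_speed_by_year` helper and the sample rows are invented for the example and are not real results.

```python
import pandas as pd


def winner_speed_by_year(riders):
    """Return the fastest rider's average speed (km/h) for each year.

    Assumption: the winner is the row with the minimum TotalSeconds per year.
    """
    winners = riders.loc[riders.groupby("Year")["TotalSeconds"].idxmin()]
    hours = winners["TotalSeconds"] / 3600
    speed = winners["Distance (km)"] / hours
    return pd.Series(
        speed.to_numpy(), index=winners["Year"].to_numpy(), name="WinnerSpeedKmh"
    )


# Made-up rows using the documented column names.
sample = pd.DataFrame(
    {
        "Rider": ["Rider A", "Rider B", "Rider C", "Rider D"],
        "Year": [2022, 2022, 2023, 2023],
        "Distance (km)": [1000, 1000, 900, 900],
        "TotalSeconds": [100000, 100500, 81000, 81200],
    }
)

speeds = winner_speed_by_year(sample)
```

Plotting such a series over many editions is one way to see how race pace has changed, although differences in route length and terrain between years mean that the comparison is only approximate.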
Coverage
Time Range: The data covers the history of Le Tour de France from 1903 up to 2023.
Subject Scope: Data includes details on every cyclist who participated in the Tour de France.
Availability Notes: Data for the women's event, Le Tour de France Femmes avec Zwift, is specifically included starting in 2023, following its availability on the official tour website.
License
CC0: Public Domain
Who Can Use It
Data scientists focusing on time-series and sports analytics, academic researchers studying athletic endurance and race dynamics, students requiring realistic datasets for data cleaning and visualisation tutorials, and cycling enthusiasts seeking detailed, historical performance metrics.
Dataset Name Suggestions
- Le Tour De France DataBase
- Historical TDF and TDFF Rider Statistics
- Century of Cycling Data
Attributes
Original Data Source: Historical TDF and TDFF Rider Statistics
