Synthetic Recruitment Data for Fairness Metric Evaluation
Synthetic Data Generation
About
Identifying gender and age bias in hiring processes is a critical challenge in modern recruitment. These synthetic records offer a structured environment for educators and researchers to explore various fairness definitions, such as statistical parity, equalised odds, and treatment equality. By providing pre-calculated decisions from five different methods, the collection allows users to bypass model training and focus entirely on evaluating the ethics and social implications of algorithmic decision-making.
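The listing does not name the five decision columns, so the minimal sketch below assumes a hypothetical binary column decision1; substitute one of the actual pre-calculated decision columns from the file. It checks statistical parity, which compares positive-decision rates across gender groups.

```python
import pandas as pd

df = pd.read_csv("2025-fairness-recruitment-dataset.csv")

# Statistical parity holds when P(decision = 1 | Gender) is equal
# across gender groups. "decision1" is an assumed column name.
rates = df.groupby("Gender")["decision1"].mean()
print(rates)
print("Statistical parity difference:", rates.max() - rates.min())
```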
Columns
- mainid: A hidden identification number not intended for use in predictive models.
- mod15: A hidden index derived from the ID modulo 15, used to generate features (see the sketch after this list).
- div15: A hidden index derived from the ID modulo 225, integer-divided by 15, also used to generate features.
- Age: The age of the candidate, recorded as a sensitive feature for fairness analysis.
- Speed: The reaction speed of the candidate, which affects suitability but is a hidden variable not available as a standard feature.
- Gender: The gender of the candidate, classified as a sensitive feature.
- Strength: The physical strength of the candidate, which affects suitability but is a hidden variable.
- Speedtest: A hidden metric representing the results of a speed-based assessment.
- Lifttest: A hidden metric representing the results of a lifting-based assessment.
- testresult: The observable result of a candidate's test, used as a primary feature for recruitment decisions.
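The two derived index columns can be reproduced directly from the record ID. A minimal sketch of the derivation as described above (the exact formula is an assumption based on the column descriptions):

```python
# How mod15 and div15 appear to be derived from the record ID.
def index_features(mainid: int) -> tuple[int, int]:
    mod15 = mainid % 15           # ID modulo 15
    div15 = (mainid % 225) // 15  # (ID modulo 225) integer-divided by 15
    return mod15, div15

# Across 225 records, (mod15, div15) enumerates a 15 x 15 grid of cells.
print(index_features(37))  # (7, 2)
```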
Distribution
The information is delivered in a single CSV file titled 2025-fairness-recruitment-dataset.csv, with a file size of approximately 15.53 kB. The collection contains 225 records across 21 columns. Data integrity is high, with a 100% validity rate across the primary fields and no missing or mismatched entries reported. The resource holds a usability score of 10.00 and is updated on an annual basis.
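A quick way to confirm the published shape and completeness figures after download:

```python
import pandas as pd

# Verify the published figures: 225 records, 21 columns, no missing entries.
df = pd.read_csv("2025-fairness-recruitment-dataset.csv")
assert df.shape == (225, 21)
assert df.isna().sum().sum() == 0
```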
Usage
This resource is ideal for academic settings where students need to learn how to detect and mitigate bias in machine learning. It is well-suited for researchers comparing different fairness metrics, such as predictive parity and group fairness. Additionally, the data can be used to illustrate the concept of fairness through unawareness by testing how models perform when sensitive features like age or gender are removed.
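A minimal sketch of fairness through unawareness, again assuming a hypothetical binary label column decision1: the sensitive features are dropped before training, yet bias can persist if the remaining columns correlate with them.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("2025-fairness-recruitment-dataset.csv")

# Fairness through unawareness: train without the sensitive features.
# "decision1" is an assumed label column; substitute one of the five
# pre-calculated decision columns in the file.
sensitive = ["Age", "Gender"]
X = df.drop(columns=sensitive + ["decision1"]).select_dtypes("number")
y = df["decision1"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Bias can persist: remaining features such as testresult may still
# correlate with the dropped sensitive attributes.
print(model.score(X, y))
```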
Coverage
The geographic and demographic scope is based on a synthetic model designed for educational purposes, rather than real-world individuals. The candidate pool features an age range from 20 to 49 years and a gender distribution consisting of 67% male and 33% female participants. Because the data is synthetic, it provides a controlled environment for testing specific fairness scenarios without the noise of real-world recruitment variables.
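The documented coverage can be checked directly. A sketch, assuming Gender is stored as categorical labels:

```python
import pandas as pd

df = pd.read_csv("2025-fairness-recruitment-dataset.csv")

# Expected per the listing: ages 20-49, roughly 67% male / 33% female.
print(df["Age"].min(), df["Age"].max())
print(df["Gender"].value_counts(normalize=True))
```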
License
CC BY-SA 4.0
Who Can Use It
AI ethics researchers can leverage these records to benchmark new fairness-aware algorithms against established decision methods. Educators in data science may utilise the dataset to provide hands-on examples of how bias manifests in automated systems. Furthermore, students can use the hidden features to understand the underlying drivers of suitability that are often omitted from official recruitment models.
Dataset Name Suggestions
- Utrecht Fairness Recruitment: Bias Detection and Ethics Archive
- Synthetic Recruitment Data for Fairness Metric Evaluation
- Gender and Age Bias in Hiring: An Educational Dataset
- Algorithmic Fairness and Recruitment Decision Registry
- Utrecht Recruitment Bias and Fairness Definitions Collection
Attributes
Original Data Source: Synthetic Recruitment Data for Fairness Metric Evaluation