Synthetic Employee Risk Assessment Dataset
Synthetic Data Generation
Free
About
This dataset simulates detailed employee-level metrics for machine learning projects that predict and classify workplace burnout. It is designed to support binary classification model training, exploratory data analysis, and feature importance investigation. The 'Burnout' column is the binary target variable, flagging employees who show signs of high stress, low satisfaction, and extended working hours.
Columns
- Name: A synthetic identifier for the employee, included purely for realism, not intended for machine learning use.
- Age: The current age of the employee, with a mean of 40.7.
- Gender: Categorical variable detailing if the employee is Male or Female.
- JobRole: Specifies the type of employment, such as Engineer, HR, or Manager.
- Experience: The total number of years of professional work experience, with a mean of 10.1 years.
- WorkHoursPerWeek: The calculated average number of hours worked weekly, ranging from 30 to 70 hours.
- RemoteRatio: The percentage of time the employee spends working remotely, scaled from 0 to 100.
- SatisfactionLevel: The employee's self-reported level of satisfaction, measured on a scale of 1.0 to 5.0.
- StressLevel: The employee's self-reported stress magnitude, quantified from 1 (low) to 10 (high).
- Burnout: The crucial target variable. A value of 1 indicates signs of burnout (derived from high stress, low satisfaction, and long hours), while 0 indicates no burnout.
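The documented ranges can be turned into a simple per-row sanity check. The sketch below is only an illustration: it assumes the CSV headers match the column names listed here exactly, that each row arrives as a dictionary of strings (as produced by Python's csv.DictReader), and that Burnout is stored as the text "0" or "1". The helper name validate_row is hypothetical.

```python
# Bounds and categories copied from the column descriptions above.
# Assumption: headers match these names and values arrive as strings.
NUMERIC_BOUNDS = {
    "WorkHoursPerWeek": (30, 70),
    "RemoteRatio": (0, 100),
    "SatisfactionLevel": (1.0, 5.0),
    "StressLevel": (1, 10),
}
ALLOWED_VALUES = {
    "Gender": {"Male", "Female"},
    "Burnout": {"0", "1"},  # assumed string encoding of the binary target
}
EXPECTED_COLUMNS = [
    "Name", "Age", "Gender", "JobRole", "Experience", "WorkHoursPerWeek",
    "RemoteRatio", "SatisfactionLevel", "StressLevel", "Burnout",
]


def validate_row(row):
    """Return a list of human-readable problems found in one row (dict of strings)."""
    problems = []
    # Every documented column should be present and non-empty.
    for column in EXPECTED_COLUMNS:
        if row.get(column) in (None, ""):
            problems.append(f"{column}: missing")
    # Numeric columns should parse and fall inside the documented range.
    for column, (low, high) in NUMERIC_BOUNDS.items():
        raw = row.get(column)
        if raw in (None, ""):
            continue  # already reported as missing
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{column}: not numeric ({raw!r})")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    # Categorical columns should only contain the documented values.
    for column, allowed in ALLOWED_VALUES.items():
        raw = row.get(column)
        if raw not in (None, "") and raw not in allowed:
            problems.append(f"{column}: unexpected value {raw!r}")
    return problems
```

An empty list means the row is consistent with the descriptions. Age, Experience, and JobRole are checked only for presence, because the listing gives no hard bounds or full category list for them.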
Distribution
The data is delivered as a single CSV file, synthetic_employee_burnout.csv, approximately 87.51 kB in size. It contains 10 columns and 2000 valid records. All columns have 100% validity, with zero missing or mismatched entries. The data is static and is not expected to be updated.
Usage
- Training and evaluating binary classification models focused on predicting employee burnout or related health risks.
- Conducting exploratory data analysis to understand correlations between workload, stress, and satisfaction.
- Identifying key features that drive burnout using feature importance techniques.
- Developing HR analytics dashboards and risk assessment tools.
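As a starting point for the exploratory uses above, the following sketch loads the rows, reports the share of burnout cases, and ranks the numeric columns by their Pearson correlation with Burnout. It uses only the Python standard library. Plain correlation is a simple stand-in chosen for illustration, not a model-based feature importance technique, and the function names (load_rows, pearson, rank_features, burnout_rate) are hypothetical. It assumes the CSV headers match the column names listed under Columns and that Burnout is stored as 0 or 1.

```python
import csv
import math

# Numeric columns as named in the listing; assumed to match the CSV headers.
NUMERIC_FEATURES = [
    "Age", "Experience", "WorkHoursPerWeek",
    "RemoteRatio", "SatisfactionLevel", "StressLevel",
]


def load_rows(stream):
    """Parse CSV from an open file (or any iterable of CSV lines) into a list of dicts."""
    return list(csv.DictReader(stream))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, target="Burnout"):
    """Rank numeric features by absolute correlation with the binary target."""
    labels = [float(row[target]) for row in rows]
    scores = {
        name: pearson([float(row[name]) for row in rows], labels)
        for name in NUMERIC_FEATURES
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


def burnout_rate(rows, target="Burnout"):
    """Share of rows labelled 1, useful for spotting class imbalance."""
    return sum(float(row[target]) for row in rows) / len(rows)
```

To try it, open synthetic_employee_burnout.csv with newline="" and pass the handle to load_rows, then pass the resulting list to burnout_rate and rank_features. Because the label is described as derived from stress, satisfaction, and working hours, those columns would be expected to rank near the top, but that should be confirmed on the actual file rather than assumed.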
Coverage
This is a synthetic dataset: it does not represent real-world individuals, time periods, or geographies, although it is intended to be applicable to modelling work in any region. It simulates employee characteristics across a range of age brackets, experience levels, and job roles.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For building, validating, and benchmarking binary classification models.
- HR Analysts: To simulate how various employee factors (work hours, remote work) contribute to burnout risk.
- Students/Researchers: For educational projects requiring clean, labelled data for predictive modelling in the domain of mental health and labour analytics.
Dataset Name Suggestions
- HR Workforce Burnout Predictor Data
- Synthetic Employee Risk Assessment Dataset
- Workplace Stress and Burnout Classification Data
- Global HR Burnout Simulation
Attributes
Original Data Source: Synthetic Employee Risk Assessment Dataset
