HR Employee Retention Study
Education & Learning Analytics
About
This dataset helps organisations understand and predict employee attrition by identifying the factors that contribute to staff turnover. It supports detailed analysis of employee demographics, job satisfaction, performance, and work-life balance to uncover the patterns and relationships that influence attrition. The dataset is fictional, created by IBM data scientists, and suits exploratory questions about employee retention, such as how distance from home varies with job role and attrition, or how monthly income relates to education and attrition.
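A question such as "how does monthly income relate to education and attrition" comes down to a grouped average. The following is a minimal sketch using only the standard library. It assumes each record is a dictionary of strings, as `csv.DictReader` would produce; the helper name `mean_by` and the sample records are illustrative and are not part of the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_by(rows, value_column, *group_columns):
    """Average a numeric column within each combination of the grouping columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_columns)
        groups[key].append(float(row[value_column]))
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical records for illustration only, not taken from the real file.
sample = [
    {"Education": "3", "Attrition": "false", "MonthlyIncome": "5000"},
    {"Education": "3", "Attrition": "false", "MonthlyIncome": "7000"},
    {"Education": "3", "Attrition": "true", "MonthlyIncome": "3000"},
]
for key, income in mean_by(sample, "MonthlyIncome", "Education", "Attrition").items():
    print(key, income)
```

The same helper can be pointed at `DistanceFromHome` grouped by `JobRole` and `Attrition` for the other question mentioned above.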
Columns
The dataset contains 35 columns, providing a rich set of attributes for each employee record:
- Age: Numerical, representing the employee's age.
- Attrition: Categorical (Boolean). Indicates if an employee has left the company, with values 'true' (237 instances) or 'false' (1,233 instances).
- BusinessTravel: Categorical. Describes the frequency of business travel: 'Travel_Rarely' (71%), 'Travel_Frequently' (19%), and 'Non-Travel' (the remaining 10%).
- DailyRate: Numerical, representing the daily rate of pay.
- Department: Categorical. Specifies the employee's department, such as 'Research & Development' (65%) and 'Sales' (30%).
- DistanceFromHome: Numerical, indicating the distance of the employee's home from work.
- Education: Categorical. Represents the level of education, with values: 1 'Below College', 2 'College', 3 'Bachelor', 4 'Master', 5 'Doctor'.
- EducationField: Categorical. The field of study, including 'Life Sciences' (41%) and 'Medical' (32%).
- EmployeeCount: Numerical (constant value of 1 for all records).
- EmployeeNumber: Numerical, a unique identifier for each employee.
- EnvironmentSatisfaction: Categorical. Measures satisfaction with the work environment, with values: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'.
- Gender: Categorical. 'Male' (60%) or 'Female' (40%).
- HourlyRate: Numerical, the employee's hourly pay rate.
- JobInvolvement: Categorical. Describes job involvement, with values: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'.
- JobLevel: Categorical. The job level within the organisation.
- JobRole: Categorical. The employee's specific job role, e.g., 'Sales Executive' (22%) or 'Research Scientist' (20%).
- JobSatisfaction: Categorical. Measures job satisfaction, with values: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'.
- MaritalStatus: Categorical. Marital status, including 'Married' (46%) and 'Single' (32%).
- MonthlyIncome: Numerical, the employee's monthly income.
- MonthlyRate: Numerical, the monthly rate of pay.
- NumCompaniesWorked: Numerical, the number of companies the employee has worked for previously.
- Over18: Categorical (constant 'true' for all records).
- OverTime: Categorical (Boolean). Indicates if the employee works overtime, with 'true' (28%) or 'false' (72%).
- PercentSalaryHike: Numerical, the percentage increase in salary.
- PerformanceRating: Categorical. Employee performance rating, with values: 1 'Low', 2 'Good', 3 'Excellent', 4 'Outstanding'.
- RelationshipSatisfaction: Categorical. Measures relationship satisfaction at work, with values: 1 'Low', 2 'Medium', 3 'High', 4 'Very High'.
- StandardHours: Numerical (constant value of 80 for all records).
- StockOptionLevel: Categorical, the stock option level granted to the employee.
- TotalWorkingYears: Numerical, the total number of years the employee has worked.
- TrainingTimesLastYear: Numerical, the number of training sessions attended in the last year.
- WorkLifeBalance: Categorical. Measures work-life balance, with values: 1 'Bad', 2 'Good', 3 'Better', 4 'Best'.
- YearsAtCompany: Numerical, the number of years the employee has been with the current company.
- YearsInCurrentRole: Numerical, the number of years in the current job role.
- YearsSinceLastPromotion: Numerical, the number of years since the last promotion.
- YearsWithCurrManager: Numerical, the number of years with the current manager.
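Two properties of the columns above usually matter before analysis: several columns store integer codes for ordered labels, and three columns (EmployeeCount, Over18, StandardHours) are constant. The following is a minimal sketch of handling both. It assumes records are dictionaries of strings, as `csv.DictReader` yields them. The helper names `decode_ordinals` and `constant_columns` are illustrative, and the label maps are copied from the descriptions above.

```python
# Label maps taken from the column descriptions above.
SATISFACTION = {1: "Low", 2: "Medium", 3: "High", 4: "Very High"}
ORDINAL_LABELS = {
    "Education": {1: "Below College", 2: "College", 3: "Bachelor", 4: "Master", 5: "Doctor"},
    "EnvironmentSatisfaction": SATISFACTION,
    "JobInvolvement": SATISFACTION,
    "JobSatisfaction": SATISFACTION,
    "RelationshipSatisfaction": SATISFACTION,
    "PerformanceRating": {1: "Low", 2: "Good", 3: "Excellent", 4: "Outstanding"},
    "WorkLifeBalance": {1: "Bad", 2: "Good", 3: "Better", 4: "Best"},
}


def decode_ordinals(row):
    """Return a copy of one record with integer codes replaced by their labels."""
    decoded = dict(row)
    for column, labels in ORDINAL_LABELS.items():
        if column in decoded:
            decoded[column] = labels[int(decoded[column])]
    return decoded


def constant_columns(rows):
    """List columns holding a single distinct value, which carry no signal."""
    if not rows:
        return []
    return [c for c in rows[0] if len({r[c] for r in rows}) == 1]


# Hypothetical sample records, not taken from the real file.
sample = [
    {"Education": "2", "JobSatisfaction": "4", "StandardHours": "80", "EmployeeCount": "1"},
    {"Education": "5", "JobSatisfaction": "1", "StandardHours": "80", "EmployeeCount": "1"},
]
print(decode_ordinals(sample[0])["Education"])  # College
print(constant_columns(sample))  # ['StandardHours', 'EmployeeCount']
```

Columns flagged as constant can be dropped before modelling. EmployeeNumber is a unique identifier rather than a feature and is normally excluded as well.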
Distribution
The dataset is provided as a CSV file, named "HR Employee Attrition.csv". It has a file size of 227.97 kB and contains 1,470 records across all 35 columns, with no missing values.
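The figures above (1,470 records, 35 columns, no missing values) are straightforward to check after download. The following is a minimal sketch using only the standard library. `summarise_csv` is an illustrative helper, and treating empty or whitespace-only cells as missing is an assumption rather than a rule stated by the dataset.

```python
import csv
import io


def summarise_csv(stream):
    """Return (row_count, column_count, missing_cell_count) for a CSV stream."""
    reader = csv.DictReader(stream)
    rows = list(reader)
    columns = reader.fieldnames or []
    # Assumption: a cell is "missing" if it is absent, empty or whitespace only.
    missing = sum(
        1 for row in rows for col in columns if not (row.get(col) or "").strip()
    )
    return len(rows), len(columns), missing


# Tiny in-memory stand-in for the real file (hypothetical values).
demo = io.StringIO("Age,Attrition,Department\n41,Yes,Sales\n49,No,\n")
print(summarise_csv(demo))  # (2, 3, 1)

# Against the downloaded file, if the description above holds, this should
# report (1470, 35, 0). The utf-8-sig encoding tolerates a leading byte-order mark.
# with open("HR Employee Attrition.csv", newline="", encoding="utf-8-sig") as f:
#     print(summarise_csv(f))
```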
Usage
This dataset is ideal for:
- Predictive modelling: Building models to forecast employee attrition.
- Data analysis and visualisation: Uncovering underlying factors and trends related to employee turnover.
- Human Resources analytics: Gaining insights into employee behaviour, satisfaction, and retention strategies.
- Hypothesis testing: Investigating specific relationships, such as the effect of distance from home on attrition, or of income and education on attrition rates.
- Machine learning applications: Training classification models to identify employees at risk of leaving.
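Before training a classifier on this data, two quick checks help: the attrition rate within each group of a candidate feature, and the accuracy that a trivial "always predict the majority class" rule would already achieve. The following is a minimal sketch under stated assumptions. The helper names are illustrative, records are assumed to be dictionaries of strings, and the set of values counted as "left" covers both the 'true' spelling used in this listing and a Yes/No spelling the raw file might use.

```python
from collections import defaultdict

# Assumption: any of these spellings in the Attrition column means the employee left.
LEFT_VALUES = {"true", "yes", "1"}


def attrition_rate_by(rows, column):
    """Return {group value: share of employees in that group who left}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        group = row[column]
        totals[group] += 1
        if row["Attrition"].strip().lower() in LEFT_VALUES:
            leavers[group] += 1
    return {group: leavers[group] / totals[group] for group in totals}


def majority_baseline(n_left, n_stayed):
    """Accuracy of always predicting the larger class."""
    return max(n_left, n_stayed) / (n_left + n_stayed)


# Hypothetical records for illustration only.
sample = [
    {"OverTime": "true", "Attrition": "true"},
    {"OverTime": "true", "Attrition": "false"},
    {"OverTime": "false", "Attrition": "false"},
    {"OverTime": "false", "Attrition": "false"},
]
print(attrition_rate_by(sample, "OverTime"))  # {'true': 0.5, 'false': 0.0}
print(round(majority_baseline(237, 1233), 3))  # 0.839
```

With 237 leavers among 1,470 records, always predicting that an employee stays already scores about 83.9% accuracy. Precision and recall on the leaver class are therefore more informative than raw accuracy when comparing models.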
Coverage
This is a fictional dataset created by IBM data scientists, so it does not represent specific real-world geographic locations, time ranges, or demographic groups. Its purpose is to simulate real-world employee data for analytical and predictive exercises.
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Data scientists and machine learning engineers for building predictive models.
- HR analysts and business intelligence professionals seeking to understand and improve employee retention.
- Researchers and students in fields such as human resources, organisational psychology, and data science for academic studies and projects.
- Management consultants interested in workforce analytics and talent management strategies.
Dataset Name Suggestions
- Employee Attrition Prediction Data
- IBM HR Analytics Attrition Dataset
- Workforce Turnover Factors Data
- HR Employee Retention Study
- Organisational Attrition Data
Attributes
Original Data Source: HR Employee Retention Study