Employee Survival Prediction Data
About
This collection of human resources data supports survival analysis and prediction of employee attrition. It is based on the original IBM attrition dataset and extends it with additional features intended for prediction models. It is also suitable for classification tasks related to employment stability and tenure.
Columns
The dataset comprises 32 distinct features related to employee demographics and job metrics:
- Age: The employee's age, ranging from 18 to 60.
- Attrition: A boolean field indicating whether the employee has left the organisation (16% True).
- BusinessTravel: Indicates the frequency of business travel, with 'Travel_Rarely' being the most common category (71%).
- Department: The department of employment, where Research & Development accounts for 65% of records.
- DistanceFromHome: The daily distance travelled by the employee to reach work, ranging from 1 to 29 miles.
- Gender: Gender of the employee, with 60% being Male.
- JobInvolvement: The rating (1 to 4) reflecting employee involvement in their job.
- JobLevel: The level (1 to 5) at which the employee is working.
- JobRole: The specific roles and responsibilities, with 'Sales Executive' being the most frequent.
- JobSatisfaction: The employee's satisfaction rating (1 to 4) with their job.
- MaritalStatus: Marital status of the employee, with 'Married' being the most reported status (46%).
- MonthlyIncome: The employee's monthly income, ranging from £1,009 to £20,000.
- NumCompaniesWorked: The total number of companies the employee has worked for, from 0 to 9.
- OverTime: A boolean indicating if the employee works Overtime (28% True).
- PercentSalaryHike: The percentage salary increase since their appointment, ranging from 11% to 25%.
- PerformanceRating: The performance rating of the employee, generally 3 or 4.
- StockOptionLevel: The employee's stock option level (0 to 3).
- TotalWorkingYears: The total number of years the employee has worked, up to 40 years.
- TrainingTimesLastYear: Number of training sessions the employee attended in the last year, 0 to 6.
- YearsAtCompany: Years spent at the current organisation, up to 40 years.
- YearsSinceLastPromotion: Time elapsed in years since the last promotion, up to 15 years.
- YearsWithCurrManager: Years working under the current manager, up to 17 years.
- Higher_Education: The employee's higher education level.
- Date_of_Hire: Date the employee was hired in the current organisation.
- Date_of_termination: Date of termination (Note: 100% missing data in sample).
- Status_of_leaving: The stated reason for leaving the organisation.
- Mode_of_work: Indicates if the work mode is WFH (52%) or OFFICE (48%).
- Leaves: Total permitted leaves taken by the employee (0 to 5 days).
- Absenteeism: Total days the employee was absent (0 to 3 days).
- Work_accident: A boolean indicating if a work accident occurred (50% True).
- Source_of_Hire: The recruitment source (Recruiter 27%, Job Event 25%).
- Job_mode: Working status (FullTime 35%, Contract 33%).
Distribution
The data is provided in a single file, Attrition.csv (259.29 kB), containing 1470 records. Most attributes are complete, with 100% valid entries and no missing or mismatched values. The exception is Date_of_termination, which is 100% missing in the sample.
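The completeness figures above can be checked with a short script. The following is a minimal sketch using only the Python standard library. The helper name summarize_csv is hypothetical, and treating empty cells as "missing" is an assumption, since the file may encode missing values differently (for example as "NA").

```python
import csv
import io


def summarize_csv(handle):
    """Return (record_count, {column: number_of_empty_cells}).

    Assumption: a missing value appears as an empty cell.
    """
    reader = csv.DictReader(handle)
    missing = {name: 0 for name in (reader.fieldnames or [])}
    count = 0
    for row in reader:
        count += 1
        for name in missing:
            value = row.get(name)
            if value is None or value.strip() == "":
                missing[name] += 1
    return count, missing


# Tiny in-memory stand-in for Attrition.csv (illustrative rows, not real data).
sample = io.StringIO(
    "Age,Attrition,Date_of_termination\n"
    "41,Yes,\n"
    "49,No,\n"
)
count, missing = summarize_csv(sample)
print(count, missing)
```

With the real file, the same function could be called as `with open("Attrition.csv", newline="") as f: summarize_csv(f)`. If the listing is accurate, the count should be 1470 and Date_of_termination should be missing in every row.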
Usage
This data product is ideally suited for HR analytics and predictive modelling initiatives. Primary applications include:
- Performing survival analysis to estimate employee tenure.
- Building classification models to predict employee attrition risk.
- Conducting intermediate-level studies of employment trends and workplace behaviour.
- Examining the influence of various factors (e.g., job satisfaction, income, travel) on retention rates.
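As an illustration of the survival analysis use case, the sketch below computes a Kaplan-Meier survival curve in plain Python, using YearsAtCompany as the duration and Attrition as the event indicator. The function names are hypothetical, and the set of strings accepted as "attrition occurred" is an assumption, because the listing describes Attrition as a boolean without stating how it is encoded in the CSV.

```python
from collections import Counter

# Assumption: any of these strings (case-insensitive) marks an employee who left.
TRUE_VALUES = {"yes", "true", "1"}


def to_survival_inputs(rows):
    """Extract (durations, events) from dict-like rows of the dataset."""
    durations = [float(row["YearsAtCompany"]) for row in rows]
    events = [str(row["Attrition"]).strip().lower() in TRUE_VALUES for row in rows]
    return durations, events


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an event.

    Employees who have not left (event False) are treated as censored:
    they count as at risk up to their duration, then drop out.
    """
    leavers = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = leavers.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Illustrative rows only, not taken from the dataset.
rows = [
    {"YearsAtCompany": "1", "Attrition": "Yes"},
    {"YearsAtCompany": "2", "Attrition": "No"},
    {"YearsAtCompany": "2", "Attrition": "Yes"},
    {"YearsAtCompany": "3", "Attrition": "Yes"},
    {"YearsAtCompany": "4", "Attrition": "No"},
]
curve = kaplan_meier(*to_survival_inputs(rows))
for t, s in curve:
    print(f"after {t:g} years: {s:.2f}")
```

Each point gives the estimated probability that an employee is still with the organisation beyond that many years. For real analyses, a dedicated survival analysis library would add confidence intervals and covariate models; this sketch only shows the core estimator.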
Coverage
The dataset is based on the IBM attrition framework and includes contextual information referencing India. It provides a snapshot of various employment metrics across different demographic groups and job roles, although a specific time range is not defined.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For training machine learning models predicting attrition.
- HR Analysts: To identify key drivers of employee turnover and develop retention strategies.
- Academic Researchers: For studying employment patterns and workforce dynamics.
Dataset Name Suggestions
- Employee Survival Prediction Data
- IBM HR Attrition Analysis Set
- Workforce Retention Predictor
- Global Employee Exit Data
Attributes
Original Data Source: Employee Survival Prediction Data
