IBM Employee Attrition Factors
Data Science and Analytics
Free
About
This dataset supports HR analytics projects by supplying the features needed to build predictive models of employee departure. The central goal is to analyse individual and workplace attributes to determine which factors most influence employees leaving the company, allowing organisations to identify and mitigate major attrition risks.
Columns (13 total)
- Age: The age of the employee (ranging from 18 to 60).
- Attrition: A boolean status indicating whether the employee has left the company (True or False). Approximately 16% of records show true attrition.
- Department: The work department, primarily Research & Development (65%) and Sales (30%).
- DistanceFromHome: The numerical distance (in an unspecified unit, range 1 to 29) between the employee's home and workplace.
- Education: A categorical field using numerical levels: 1-Below College, 2-College, 3-Bachelor, 4-Master, 5-Doctor.
- EducationField: The specific field of education, predominantly Life Sciences (41%) and Medical (32%).
- EnvironmentSatisfaction: Satisfaction level with the work environment, rated 1 (Low) to 4 (Very High).
- JobSatisfaction: Satisfaction level with the current job, rated 1 (Low) to 4 (Very High).
- MaritalStatus: Marital condition, with Married (46%) being the most common status.
- MonthlyIncome: The employee's monthly earnings in US Dollars (ranging from $1,009 to $20,000, mean $6,500).
- NumCompaniesWorked: The count of companies worked at prior to current tenure at IBM (ranging from 0 to 9).
- WorkLifeBalance: Rating of work-life balance, rated 1 (Bad) to 4 (Best).
- YearsAtCompany: The number of years of service the employee has accumulated at IBM (ranging from 0 to 40, mean 7.01 years).
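The ranges quoted in the column list can be checked mechanically before any modelling. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed above and that numeric fields are stored as plain numbers. The helper name check_rows is hypothetical, and the bounds are the listing's stated ranges rather than guarantees about the file.

```python
import csv

# Ranges as stated in the column list (assumed inclusive on both ends).
NUMERIC_RANGES = {
    "Age": (18, 60),
    "DistanceFromHome": (1, 29),
    "Education": (1, 5),
    "EnvironmentSatisfaction": (1, 4),
    "JobSatisfaction": (1, 4),
    "MonthlyIncome": (1009, 20000),
    "NumCompaniesWorked": (0, 9),
    "WorkLifeBalance": (1, 4),
    "YearsAtCompany": (0, 40),
}
# Non-numeric columns are only checked for being non-empty.
OTHER_COLUMNS = ["Attrition", "Department", "EducationField", "MaritalStatus"]
EXPECTED_COLUMNS = set(NUMERIC_RANGES) | set(OTHER_COLUMNS)


def check_rows(lines):
    """Return a list of human-readable problems found in CSV text lines.

    `lines` can be an open text file or any iterable of strings.
    """
    reader = csv.DictReader(lines)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, (low, high) in NUMERIC_RANGES.items():
            raw = (row[column] or "").strip()
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"row {row_number}: {column}={raw!r} is not numeric")
                continue
            if not low <= value <= high:
                problems.append(
                    f"row {row_number}: {column}={value:g} outside [{low}, {high}]"
                )
        for column in OTHER_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"row {row_number}: {column} is empty")
    return problems
```

To run it against the real file, open the CSV in text mode with newline="" and pass the file object to check_rows. An empty list means every row matched the documented ranges.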
Distribution
The data is typically structured as a single file, IBM.csv, with a size of approximately 94.18 kB. It contains 13 columns and 1470 valid records, with no missing or mismatched values reported across the columns listed. The data is tabular.
Usage
This dataset is ideally suited for building and evaluating various classification models aimed at predicting employee attrition. Users can apply techniques to fine-tune model hyperparameters and compare performance across different classification algorithms to determine the most effective predictive strategy for employee turnover. It serves as a foundational resource for risk mitigation in human capital management.
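The comparison described above could be sketched with pandas and scikit-learn as follows. This is an illustration under stated assumptions, not a recommended configuration. The two candidate models, their parameter grids, the choice of which columns to one-hot encode, and the function name compare_models are all hypothetical. ROC AUC is used for scoring because only about 16% of records are positive, which makes plain accuracy misleading.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names follow the listing; treating these three as categorical
# and everything else as numeric is an assumption.
CATEGORICAL = ["Department", "EducationField", "MaritalStatus"]
TARGET = "Attrition"

# Hypothetical candidates and grids, chosen only for illustration.
CANDIDATES = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000, class_weight="balanced"),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0, class_weight="balanced"),
        {"model__n_estimators": [100, 200], "model__max_depth": [4, 8]},
    ),
}


def compare_models(df, candidates=CANDIDATES, n_splits=5):
    """Grid-search each candidate; return best params and mean CV ROC AUC."""
    X = df.drop(columns=[TARGET])
    # Accept booleans as well as "True"/"Yes"/"1" style strings for the label.
    y = df[TARGET].astype(str).str.lower().isin(["true", "yes", "1"]).astype(int)

    categorical = [c for c in CATEGORICAL if c in X.columns]
    numeric = [c for c in X.columns if c not in categorical]
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ])
    # Stratified folds keep the minority class present in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)

    results = {}
    for name, (estimator, grid) in candidates.items():
        pipeline = Pipeline([("prep", preprocess), ("model", estimator)])
        search = GridSearchCV(pipeline, grid, scoring="roc_auc", cv=cv)
        search.fit(X, y)
        results[name] = {
            "best_params": search.best_params_,
            "roc_auc": search.best_score_,
        }
    return results
```

Given a DataFrame loaded from the CSV (for example with pd.read_csv), compare_models returns one entry per candidate, so the scores can be compared directly. Putting preprocessing inside the pipeline means scaling and encoding are refit on each training fold, which avoids leaking information from the validation fold.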
Coverage
The scope covers employee characteristics and employment factors within the context of IBM. The organisation itself operates in approximately 170 countries. Although specific geolocation details are absent, the dataset captures essential demographic, satisfaction, financial, and tenure information. The expected update frequency for this type of dataset is annual.
License
CC0: Public Domain
Who Can Use It
- Machine Learning Engineers and Data Scientists: For developing, training, and testing binary classification models.
- HR Analysts and Consultants: To identify significant drivers of turnover and inform corporate retention strategies.
- Academic Researchers: For studying organisational behaviour, job satisfaction metrics, and factors affecting employee stability.
Dataset Name Suggestions
- IBM Employee Attrition Factors
- Corporate Turnover Prediction Set
- HR Employee Risk Analysis
- IBM Staff Retention Data
Attributes
Original Data Source: IBM Employee Attrition Factors
