Employee Compensation Factors Data
Data Science and Analytics
Free
About
This dataset captures employee salaries for use in predictive modelling, framed as a regression problem. Each record describes an employee by gender, age, and possession of a PhD degree. Although the feature set and sample size are limited, the dataset offers a challenging yet suitable foundation for building regression models and assessing how well they generalise, which makes it a good fit for introductory machine learning projects.
Columns
The data structure consists of four core attributes:
- Salary: Represents the employee's salary in thousands of US dollars (k$). Values range from 0.25 to 190.
- Gender: A binary attribute indicating gender (0 = Female, 1 = Male).
- Age: The employee's age, recorded in years. The range spans from 20 to 77 years.
- PhD: A binary flag indicating whether the employee is a PhD graduate (0 = No, 1 = Yes).
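The column descriptions above can be turned into a small validation check. The sketch below is illustrative only: it assumes the CSV header uses the names Salary, Gender, Age, and PhD exactly as listed here, and the sample rows are made up rather than taken from the dataset. The helper name `invalid_fields` is hypothetical.

```python
import csv
import io

# Ranges and codes come from the column descriptions above.
# The header names are an assumption about how Salary.csv labels its columns.
SCHEMA = {
    "Salary": lambda v: 0.25 <= float(v) <= 190,  # thousands of US dollars
    "Gender": lambda v: int(v) in (0, 1),         # 0 = Female, 1 = Male
    "Age": lambda v: 20 <= float(v) <= 77,        # years
    "PhD": lambda v: int(v) in (0, 1),            # 0 = No, 1 = Yes
}


def invalid_fields(row):
    """Return the names of the columns in `row` that fail their check."""
    bad = []
    for name, check in SCHEMA.items():
        try:
            ok = check(row[name])
        except (KeyError, ValueError, TypeError):
            # Missing column, empty cell, or non-numeric text.
            ok = False
        if not ok:
            bad.append(name)
    return bad


# Made-up rows for illustration only; they are not records from the dataset.
SAMPLE = """Salary,Gender,Age,PhD
52.5,0,34,1
140,1,61,0
300,1,19,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = [invalid_fields(r) for r in rows]
print(report)
```

The third sample row is deliberately out of range on three columns, so its entry in `report` lists those column names while the first two entries are empty.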
Distribution
The dataset is provided as a single CSV file, Salary.csv, with a size of approximately 1.16 kB. It contains exactly 100 instances (records) and 4 attributes (columns, including the target attribute, Salary). The data quality is high, with 100% validity across all attributes and no missing or mismatched values.
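One way to confirm these figures after downloading the file is to count records, columns, and empty cells. This is a minimal sketch using only the standard library; the function name `summarize` is hypothetical, and the inline sample is made up (with one cell left blank on purpose) so the example runs without the real file.

```python
import csv
import io


def summarize(csv_text):
    """Count records, columns, and empty cells in CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    missing = sum(
        1
        for row in rows
        for name in columns
        if row.get(name) is None or row[name].strip() == ""
    )
    total_cells = len(rows) * len(columns)
    valid_pct = 100.0 * (total_cells - missing) / total_cells if total_cells else 0.0
    return {
        "records": len(rows),
        "columns": len(columns),
        "missing": missing,
        "valid_pct": valid_pct,
    }


# Made-up sample with one blank Age cell; not data from Salary.csv.
SAMPLE = "Salary,Gender,Age,PhD\n52.5,0,34,1\n140,1,,0\n"
summary = summarize(SAMPLE)
print(summary)
```

For the real file, passing the contents of Salary.csv (for example via `open("Salary.csv", newline="").read()`) should, if the description above holds, report 100 records, 4 columns, and no missing cells.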
Usage
This data is well suited to machine learning applications, particularly:
- Regression Modelling: Building and training various regression models to predict employee salaries based on demographic and educational features.
- Model Evaluation: Comparing model performance metrics, such as R² scores and Root Mean Square Error (RMSE).
- Data Cleanup Exercises: The dataset as provided is clean, so it can serve as a reference point for understanding data quality requirements.
- Beginner ML Projects: Serving as an accessible starting point for users learning predictive analytics techniques.
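The regression and evaluation uses listed above can be sketched without any third-party library. The example below fits a one-feature least-squares line (salary against age) and computes RMSE and R². It is a simplified illustration, not a recommended modelling approach for this dataset: the function names are hypothetical, the ages and salaries are made up, and a fuller model would also use the Gender and PhD columns.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up ages (years) and salaries (k$); not records from the dataset.
ages = [25, 35, 45, 55, 65]
salaries = [40.0, 55.0, 70.0, 95.0, 90.0]

intercept, slope = fit_line(ages, salaries)
predictions = [intercept + slope * a for a in ages]
print(f"slope={slope:.2f} rmse={rmse(salaries, predictions):.2f} "
      f"r2={r2(salaries, predictions):.3f}")
```

The metrics here are computed on the same rows used for fitting. To assess generalisability, the same two functions would be applied to predictions on rows held out from the fit.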
Coverage
The dataset's scope focuses on demographic variables (Gender, Age, PhD status). The gender attribute exhibits a balanced representation, with 50 records for males and 50 for females. Age coverage ranges from 20 to 77 years. Updates to the dataset are expected to occur annually.
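The balance and range claims above are easy to check once the Gender and Age columns are loaded. A minimal sketch, assuming the values have already been read into plain lists; the function names are hypothetical and the inputs are made up.

```python
from collections import Counter

# Coding taken from the column descriptions: 0 = Female, 1 = Male.
GENDER_LABELS = {0: "Female", 1: "Male"}


def gender_counts(gender_codes):
    """Count records per gender label from a sequence of 0/1 codes."""
    return dict(Counter(GENDER_LABELS[int(code)] for code in gender_codes))


def age_span(ages):
    """Return the (youngest, oldest) ages in the sequence."""
    return min(ages), max(ages)


# Made-up values for illustration; not records from the dataset.
counts = gender_counts([0, 1, 1, 0])
span = age_span([34, 61, 20, 77])
print(counts, span)
```

On the full dataset, the description above implies 50 records per gender and an age span of 20 to 77.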
License
CC0: Public Domain
Who Can Use It
The primary audience includes:
- Beginner Data Scientists: Those seeking straightforward regression problems with well-defined features.
- Students and Academics: For educational purposes, especially in courses covering statistical modelling and evaluation.
- Data Analysts: Individuals interested in exploring feature impact on salary prediction using a simple model architecture.
Dataset Name Suggestions
- Employee Compensation Factors Data
- Simplified Employee Regression Input
- Salary Prediction Features (Age, Gender, PhD)
- Small Sample Employee Data
Attributes
Original Data Source: Employee Compensation Factors Data
