Opendatabay APP

Employee Compensation Factors Data

Data Science and Analytics

Tags and Keywords

Salary, Regression, Employee, Age, Gender



Free

About

This dataset captures employee salaries for use in predictive modelling, specifically a regression problem. The features are gender, age, and whether the employee holds a PhD. Although the feature set and sample size are limited, the dataset offers a challenging yet manageable foundation for building regression models and assessing how well they generalise, which makes it a good fit for introductory machine learning projects.

Columns

The data structure consists of four core attributes:
  • Salary: Represents the employee's salary in thousands of US dollars (k$). Values range from 0.25 to 190.
  • Gender: A binary attribute indicating gender (0 = Female, 1 = Male).
  • Age: The employee's age in years, ranging from 20 to 77.
  • PhD: A binary flag indicating whether the employee is a PhD graduate (0 = No, 1 = Yes).
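The column descriptions above amount to a small schema. The sketch below checks a single record against the documented ranges and codes. It assumes the CSV header uses exactly the names Salary, Gender, Age, and PhD (the listing does not confirm the header), and `validate_record` is a hypothetical helper, not part of any published tooling.

```python
from __future__ import annotations

from typing import Mapping

# Documented ranges from the column descriptions (inclusive bounds).
SALARY_RANGE = (0.25, 190.0)  # thousands of US dollars (k$)
AGE_RANGE = (20.0, 77.0)      # years
BINARY_VALUES = {0.0, 1.0}    # Gender: 0 = Female, 1 = Male; PhD: 0 = No, 1 = Yes


def validate_record(record: Mapping[str, str]) -> list[str]:
    """Return the problems found in one record; an empty list means it matches the spec."""
    try:
        # Header names Salary, Gender, Age, PhD are assumed, not confirmed by the listing.
        salary = float(record["Salary"])
        gender = float(record["Gender"])
        age = float(record["Age"])
        phd = float(record["PhD"])
    except (KeyError, TypeError, ValueError) as exc:
        return [f"unparseable record: {exc!r}"]

    problems = []
    if not SALARY_RANGE[0] <= salary <= SALARY_RANGE[1]:
        problems.append(f"Salary {salary} outside {SALARY_RANGE}")
    if gender not in BINARY_VALUES:
        problems.append(f"Gender {gender} is not 0 or 1")
    if not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append(f"Age {age} outside {AGE_RANGE}")
    if phd not in BINARY_VALUES:
        problems.append(f"PhD {phd} is not 0 or 1")
    return problems
```

Values are parsed as floats so that binary codes written as either `1` or `1.0` are accepted; anything outside {0, 1}, including NaN, is reported.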

Distribution

The dataset is provided as a single CSV file, Salary.csv, approximately 1.16 kB in size. It contains exactly 100 instances (records) and 4 attributes (columns), including the target attribute, Salary. Data quality is high: all attributes are 100% valid, with no missing or mismatched values.
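Before modelling, the stated shape (100 records, 4 columns, no missing values) can be confirmed on load. This is a minimal sketch using only the standard library `csv` module; the header names are assumed from the column list on this page, and `load_and_check` is a hypothetical helper name.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Assumed header names, taken from the column list on this page (order not enforced).
EXPECTED_COLUMNS = ["Salary", "Gender", "Age", "PhD"]
EXPECTED_ROWS = 100


def load_and_check(path: str | Path) -> list[dict[str, str]]:
    """Load the CSV and confirm the documented shape: 100 rows, 4 columns, no empty cells."""
    # utf-8-sig strips a leading byte-order mark if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        header = list(reader.fieldnames or [])
        rows = list(reader)

    if sorted(header) != sorted(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected columns: {header}")
    if len(rows) != EXPECTED_ROWS:
        raise ValueError(f"expected {EXPECTED_ROWS} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        # DictReader fills absent trailing fields with None; treat blank strings as missing too.
        missing = [c for c in EXPECTED_COLUMNS if row.get(c) is None or row[c].strip() == ""]
        if missing:
            raise ValueError(f"data row {i}: missing values in {missing}")
    return rows
```

Calling `load_and_check("Salary.csv")` either returns the 100 records as dictionaries of strings or raises `ValueError` describing the first mismatch.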

Usage

This dataset is well suited to machine learning applications, particularly:
  • Regression Modelling: Building and training various regression models to predict employee salaries based on demographic and educational features.
  • Model Evaluation: Comparing model performance using metrics such as the R² score and Root Mean Square Error (RMSE).
  • Data Cleanup Exercises: Although the dataset is already clean, it can be used to illustrate data quality requirements and validation checks.
  • Beginner ML Projects: Serving as an accessible starting point for users learning predictive analytics techniques.
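The regression and evaluation uses above can be sketched with the standard library alone. This is an illustrative sketch, not a prescribed method: it fits a one-feature ordinary least squares line (the choice of Age as the single feature is an assumption for illustration), holds out a test set to gauge generalisation, and computes RMSE and R². The function names, the 80/20 split, and the seed are hypothetical choices.

```python
import math
import random
from statistics import fmean


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_ols(x, y):
    """Closed-form least squares for y ~ intercept + slope * x with one feature."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt(mean((y - y_hat)^2))."""
    return math.sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy numbers for illustration only (not taken from Salary.csv).
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    b0, b1 = fit_simple_ols(xs, ys)
    preds = [b0 + b1 * xi for xi in xs]
    print(b0, b1, rmse(ys, preds), r2_score(ys, preds))  # 0.0 2.0 0.0 1.0
```

With the real file, one would split the loaded records, fit on `float(r["Age"])` and `float(r["Salary"])` from the training part, and report RMSE (in k$) and R² on the held-out part. Because the sample has only 100 records, a single split gives a noisy estimate; repeating it with several seeds is a cheap way to see how stable the scores are.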

Coverage

The dataset covers demographic and educational variables (Gender, Age, PhD status). Gender is balanced, with 50 records for males and 50 for females, and ages range from 20 to 77 years. The dataset is expected to be updated annually.
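The coverage figures (a 50/50 gender split and ages from 20 to 77) are easy to recompute after loading. A minimal sketch, assuming records are dictionaries of strings with Gender and Age keys (for example, as produced by `csv.DictReader`); `coverage_summary` is a hypothetical helper.

```python
from collections import Counter


def coverage_summary(rows):
    """Summarise gender balance and age span for records given as dicts of strings."""
    genders = Counter(int(float(r["Gender"])) for r in rows)
    ages = [float(r["Age"]) for r in rows]
    return {
        "female": genders.get(0, 0),  # 0 = Female per the column description
        "male": genders.get(1, 0),    # 1 = Male
        "age_min": min(ages),
        "age_max": max(ages),
    }
```

On the full file, the documented coverage corresponds to 50 female and 50 male records with `age_min` 20 and `age_max` 77.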

License

CC0: Public Domain

Who Can Use It

The primary audience includes:
  • Beginner Data Scientists: Those seeking straightforward regression problems with well-defined features.
  • Students and Academics: For educational purposes, especially in courses covering statistical modelling and evaluation.
  • Data Analysts: Individuals interested in exploring feature impact on salary prediction using a simple model architecture.

Dataset Name Suggestions

  • Employee Compensation Factors Data
  • Simplified Employee Regression Input
  • Salary Prediction Features (Age, Gender, PhD)
  • Small Sample Employee Data


Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 13/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
