Synthetic Organisational Workforce Data
Synthetic Data Generation
Free
About
This collection features dummy data detailing 5,000 employees within a single organisation. The dataset is engineered for data analytics and data science practice. It contains derived financial figures, such as gross salary, calculated using specific formulae for House Rent Allowance (HRA), Dearness Allowance (DA), and Provident Fund (PF), alongside key organisational attributes such as position, department, and years of experience. Personal identifiers, including names and addresses, are generated using random string formats. Strict development criteria ensured that the data maintains parity across demographics, avoiding gender bias or other discriminatory characteristics.
Columns
- Serial Number: Defines the unique sequence number of the employee.
- Name: Defines the name of the employee (all 5,000 values are unique).
- Address: Defines the employee’s address (all 5,000 values are unique).
- Salary: Defines the employee’s calculated salary (mean approximately 93.9k).
- DOJ: Defines the employee's Date of Joining (ranging from 1985 to 2024).
- DOB: Defines the employee's Date of Birth (ranging from 1964 to 2002).
- Age: Defines the employee’s age (ranging from 21 to 60 years old, mean 40.4).
- Sex: Defines the gender of the employee (three unique categories, equally distributed).
- Dependents: Defines the number of dependents the employee has (approximately 2% missing values observed).
- HRA: Defines the House Rent Allowance calculation (mean approximately 7.99k).
- DA: Defines the Dearness Allowance calculation (mean approximately 22.2k).
- PF: Defines the Employee Provident Fund deduction (mean approximately 13.9k).
- Gross Salary: Defines the total gross salary the employee receives (mean approximately 110k, maximum 173k).
- Insurance: Defines the type of insurance held, including "None" if uninsured (four unique types, "Both" being the most common).
- Marital Status: Defines the marital status of the employee (59% are listed as Married).
- In Company Years: Years of experience in the current organisation (mean approximately 9.79 years).
- Year of Experience: Total years of professional experience (mean approximately 19.4 years).
- Department: Defines the employee’s working department (5 unique values, including Human Resources and IT).
- Position: Defines the employee’s designation (40 unique values).
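The listing does not state the gross salary formula, but the column means above are consistent with one plausible reading: salary plus HRA and DA, minus PF. The sketch below checks that arithmetic on the reported means only. The formula and the name assumed_gross are assumptions for illustration, not the dataset author's documented method.

```python
# Column means as reported in the listing, in thousands (rounded figures).
REPORTED_MEANS_K = {
    "Salary": 93.9,
    "HRA": 7.99,
    "DA": 22.2,
    "PF": 13.9,
    "Gross Salary": 110.0,
}


def assumed_gross(salary, hra, da, pf):
    """Hypothetical formula: allowances added, provident fund deducted."""
    return salary + hra + da - pf


estimate_k = assumed_gross(
    REPORTED_MEANS_K["Salary"],
    REPORTED_MEANS_K["HRA"],
    REPORTED_MEANS_K["DA"],
    REPORTED_MEANS_K["PF"],
)
gap_k = estimate_k - REPORTED_MEANS_K["Gross Salary"]
print(f"estimated mean gross: {estimate_k:.2f}k, gap vs reported: {gap_k:+.2f}k")
```

Because the mean of a sum equals the sum of the means, a formula that holds for every row would also hold for the column means. The estimate comes to about 110.19k against a reported mean of roughly 110k, so the assumed formula fits the rounded figures; confirming it would require checking individual rows in the file.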
Distribution
The data structure contains 5,000 records across 19 columns. The sample file, Dummy_5000_Employee_Details_Dataset.csv, is approximately 1.05 MB in size. It is typically available in CSV format. All salary and employment-related information is derived using formulae, ensuring the data is structured yet entirely synthetic.
Usage
This data suits general data analytics and data science projects. It is a useful resource for simulating Human Resources management systems and internal payroll structures. Users can employ the dataset to train machine learning models focused on salary prediction or employee attrition. It is also suitable for educational purposes, such as practising SQL queries or Python data manipulation techniques.
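As a starting point for the Python practice mentioned above, here is a minimal sketch that reads CSV rows with the standard library and summarises one numeric column, counting missing entries separately. It assumes the file's header names match the column names in this listing (for example "Dependents") and that missing values appear as empty cells; neither is confirmed by the listing. The helper names load_rows and summarise_numeric are hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io
from statistics import mean


def load_rows(handle):
    """Read every CSV record into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))


def summarise_numeric(rows, column):
    """Return the count, number of empty cells, and mean for one column."""
    values = []
    missing = 0
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw == "":
            missing += 1  # assumption: missing values are empty cells
        else:
            values.append(float(raw))
    return {
        "count": len(values),
        "missing": missing,
        "mean": mean(values) if values else None,
    }


# Tiny in-memory sample with made-up values, used instead of the real file.
sample = io.StringIO("Name,Dependents,Salary\nA,2,50000\nB,,60000\nC,1,70000\n")
rows = load_rows(sample)
dependents_summary = summarise_numeric(rows, "Dependents")
print(dependents_summary)
```

To run this against the actual download, replace the sample with open("Dummy_5000_Employee_Details_Dataset.csv", newline="") and adjust the column names to whatever the header row contains.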
Coverage
Time Range: The employment history spans from 1985 to the end of 2024. Employee Dates of Birth range from January 1964 to December 2002.
Demographic Scope: Employee ages range strictly from 21 to 60. The dataset is balanced across three gender categories (Sex). It includes details on marital status and the number of dependents.
Geographic Scope: Addresses are synthetic strings generated using a random format.
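The coverage claims above can be verified once the rows are loaded. The sketch below flags any record whose age falls outside 21 to 60 and tallies the Sex categories so their balance can be inspected. The header names "Age" and "Sex" are assumed to match this listing, the category labels in the sample are placeholders (the listing does not name the three categories), and check_coverage is a hypothetical helper.

```python
from collections import Counter


def check_coverage(rows, min_age=21, max_age=60):
    """Return rows with an out-of-range age and a tally of Sex values."""
    out_of_range = [
        row for row in rows if not (min_age <= int(row["Age"]) <= max_age)
    ]
    sex_counts = Counter(row["Sex"] for row in rows)
    return out_of_range, sex_counts


# Made-up records; the last one deliberately violates the stated age range.
sample_rows = [
    {"Age": "21", "Sex": "Female"},
    {"Age": "60", "Sex": "Male"},
    {"Age": "45", "Sex": "Other"},
    {"Age": "61", "Sex": "Male"},
]
bad_rows, sex_counts = check_coverage(sample_rows)
print(len(bad_rows), dict(sex_counts))
```

On the real file, an empty list of out-of-range rows and three near-equal counts would agree with the description given here.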
License
CC0: Public Domain
Who Can Use It
- Data Analysts: For performing descriptive statistical analysis on employee retention, salary distribution, and demographic representation.
- Students and Educators: For practical exercises in data cleaning, database design, and exploratory data analysis using Python or other tools.
- Data Scientists: For developing predictive models concerning employee tenure or factors influencing compensation.
Dataset Name Suggestions
- Dummy Employee HR and Payroll Details
- Synthetic Organisational Workforce Data
- Corporate Staffing Simulation Dataset
- 5000 Employee Details for Data Practice
Attributes
Original Data Source: Synthetic Organisational Workforce Data