18-Month Employee Behaviour and Efficacy Data
Data Science and Analytics
Free
About
This synthetic dataset tracks factory workers’ daily performance, behaviours, and ultimate attrition over an 18-month period. It contains rich causal relationships ideal for testing advanced machine-learning approaches. The data simulates the interactions and outcomes within a factory workforce, comprising 508 workers actively employed at any moment, though a total of 687 unique individuals are tracked due to employee turnover. Observations reflect both regular daily measures, such as attendance ("Presence") and demonstrated productivity ("Efficacy"), and unique events like resignations, terminations, accidents ("Slip"), or innovative acts ("Idea"). A unique feature is the separation between an employee's "hidden" psychological attributes (e.g., health, commitment, sociality) and the external records made by their supervisors, who often estimate key metrics like Efficacy. Researchers can investigate factors such as how high performance might lead to being hired away by competitors, how mental lapses or physical accidents indicate potential future illness-related absences, or how age differences between a worker and their supervisor influence daily Efficacy levels. The data was prepared using Synaptans WorkforceSim version 0.3.15.
Columns
The dataset has 42 fields describing individual events. The fields fall into three broad groups: Subject-Related fields, Supervisor-Related fields, and Event-Related fields.
- Subject Fields relate to the worker performing the behaviour, including demographics (sub_age, sub_sex), organisational structure details (sub_shift, sub_role), and ‘hidden’ psychological stats (ending in _h), such as sub_commitment_h, sub_perceptiveness_h, and sub_goodness_h.
- Supervisor Fields relate to the person observing and recording the event, including their ID, name, age, and role, along with the calculated difference in age between the supervisor and the subject (sup_sub_age_diff).
- Event Fields detail the occurrence itself, including time markers (event_date, event_weekday_name), the actual behaviour performed (behav_comptype_h), and the corresponding record made by the supervisor in the HRM/ERP system (record_comptype). For Efficacy events, both the actual performance (actual_efficacy_h) and the supervisor's estimated score (recorded_efficacy) are present.
Distribution
This time-series dataset contains 411,948 observations, with each row reflecting a single event related to a worker on a particular day. Workers can generate multiple events (rows) on the same day (e.g., "Presence", "Efficacy", and "Teamwork"). The data is distributed across 42 columns and is typically available as a CSV file, with the source file size noted as 184.37 MB.
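The one-row-per-event layout can be illustrated with a minimal sketch using only the Python standard library. The inline sample rows are invented for illustration, and `sub_id` is a hypothetical name for the worker identifier column (the listing does not state it). Storing event names such as "Presence" as literal strings in record_comptype is also an assumption.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for the real CSV; all values are invented for illustration.
# `sub_id` is a hypothetical name for the worker identifier column.
sample_csv = """sub_id,event_date,behav_comptype_h,record_comptype
101,2021-01-04,Presence,Presence
101,2021-01-04,Efficacy,Efficacy
101,2021-01-04,Teamwork,Teamwork
102,2021-01-04,Presence,Presence
102,2021-01-04,Efficacy,Efficacy
"""


def events_per_worker_day(handle):
    """Stream rows and collect the recorded event types per (worker, day)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["sub_id"], row["event_date"])
        grouped[key].append(row["record_comptype"])
    return dict(grouped)


grouped = events_per_worker_day(io.StringIO(sample_csv))
for (worker, day), events in sorted(grouped.items()):
    print(worker, day, events)
```

Because the function reads one row at a time, the same approach should work on the full file by passing an open file handle instead of the in-memory sample, without loading all rows at once.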
Usage
Ideal applications for this dataset include developing and validating predictive models for human resources outcomes, specifically employee turnover (resignation and termination). The data also suits causal inference work that aims to estimate the true impact of specific variables on productivity and engagement. It can be used to investigate relationships between internal psychological states (the hidden stats) and observed workplace behaviours (like "Idea," "Feat," "Slip," "Disruption," "Sacrifice," and "Sabotage"). Researchers may also classify workers based on the stability and variability of their daily Efficacy performance.
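As one possible starting point for the Efficacy analyses mentioned above, the sketch below summarises each worker's actual Efficacy (actual_efficacy_h), its day-to-day spread, and the gap between it and the supervisor's estimate (recorded_efficacy). The rows, the 0 to 1 value scale, the worker identifiers, and the `STABLE_MAX_STDEV` cut-off are all assumptions made for illustration. They are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Invented rows: (worker id, actual_efficacy_h, recorded_efficacy).
rows = [
    ("101", 0.80, 0.75), ("101", 0.82, 0.85), ("101", 0.78, 0.80),
    ("102", 0.40, 0.60), ("102", 0.90, 0.70), ("102", 0.65, 0.65),
]

STABLE_MAX_STDEV = 0.05  # hypothetical cut-off between "stable" and "variable"


def summarise(rows):
    """Per worker: mean and spread of actual Efficacy, plus supervisor error."""
    by_worker = defaultdict(list)
    for worker, actual, recorded in rows:
        by_worker[worker].append((actual, recorded))

    summary = {}
    for worker, pairs in by_worker.items():
        actuals = [a for a, _ in pairs]
        spread = pstdev(actuals)  # population standard deviation
        summary[worker] = {
            "mean_actual": mean(actuals),
            "stdev_actual": spread,
            "mean_abs_error": mean([abs(r - a) for a, r in pairs]),
            "label": "stable" if spread <= STABLE_MAX_STDEV else "variable",
        }
    return summary


summary = summarise(rows)
for worker, stats in sorted(summary.items()):
    print(worker, stats["label"], round(stats["mean_abs_error"], 3))
```

A fixed threshold is the simplest possible rule. A clustering or quantile-based split may be more appropriate once the real distribution of Efficacy values is known.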
Coverage
The dataset spans a period of 18 months, running from 2021-01-01 to 2022-06-30, covering 546 simulated days. The scope is confined to a single factory organizational structure, which includes four distinct roles: Laborers (480 positions), Team Leaders (24 positions), Shift Managers (3 positions), and 1 Production Director. The records cover both male and female subjects and supervisors, across three different work shifts.
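The coverage figures can be cross-checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted above. The dictionary keys are descriptive labels and are not necessarily the values stored in sub_role.

```python
from datetime import date

# Date range as stated in the listing; both endpoints are counted.
start, end = date(2021, 1, 1), date(2022, 6, 30)
simulated_days = (end - start).days + 1

# Positions per role as stated in the listing.
positions = {
    "Laborer": 480,
    "Team Leader": 24,
    "Shift Manager": 3,
    "Production Director": 1,
}
total_positions = sum(positions.values())

print(simulated_days)   # 546
print(total_positions)  # 508
```

Both results agree with the listing: 546 simulated days, and 508 positions matching the number of workers employed at any moment.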
License
CC BY-SA 4.0
Who Can Use It
Intended users include data scientists focused on time series analysis and behavioural modelling, researchers in organizational psychology and management science, and machine learning practitioners building simulation-based predictive tools.
Dataset Name Suggestions
- Synthetic Factory Workforce Attrition and Performance Log
- 18-Month Employee Behaviour and Efficacy Data
- Workforce Event Log for Causal Modelling
Attributes
Original Data Source: 18-Month Employee Behaviour and Efficacy Data
