Hospital Patient Length of Stay Prediction
Public Health & Epidemiology
About
This dataset focuses on predicting the length of stay for patients in hospitals, a critical parameter for enhancing healthcare management efficiency. Prompted by insights from the recent Covid-19 pandemic, the data aims to help identify patients at high risk of extended stays at the point of admission. Early identification allows for the optimisation of treatment plans, minimisation of patient length of stay, and a reduction in the chance of staff or visitor infection. Furthermore, foreknowledge of patient stay duration can significantly aid in logistical planning, such as the allocation of rooms and beds. The dataset provides case-by-case patient information, with the target variable "Stay" representing the patient's length of stay, categorised into 11 distinct classes ranging from 0-10 days to over 100 days. The primary goal is to accurately predict these stay durations to support optimal resource allocation and improved hospital functioning.
Columns
- case_id: A unique identifier for each patient's admission case.
- Stay: The length of time a patient remains in the hospital. This is the target variable and is presented in 11 different classes.
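Because the 11 Stay classes are ordered duration bands, it can help to map them to integer ranks before modelling. The sketch below is only an illustration: it assumes the labels are ten-day bands written like "0-10", "11-20", up to "91-100", followed by one open-ended label. The exact label strings, including the hypothetical "More than 100 Days", are assumptions and should be checked against the data dictionary.

```python
# Assumed label strings: the listing only states that there are 11 classes
# running from 0-10 days to over 100 days.
def build_stay_labels(last_label="More than 100 Days"):
    """Build the assumed ordered list of Stay class labels."""
    labels = ["0-10"]
    # Ten-day bands 11-20, 21-30, ..., 91-100 (nine bands).
    labels += [f"{lo}-{lo + 9}" for lo in range(11, 100, 10)]
    labels.append(last_label)  # open-ended final class (hypothetical wording)
    return labels

STAY_LABELS = build_stay_labels()
STAY_TO_RANK = {label: rank for rank, label in enumerate(STAY_LABELS)}

def encode_stay(values):
    """Map Stay label strings to integer ranks, ignoring surrounding spaces."""
    return [STAY_TO_RANK[v.strip()] for v in values]
```

An unknown label raises a KeyError, which is deliberate here: it surfaces any mismatch between the assumed label strings and the real ones instead of silently mis-encoding them.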
Distribution
The dataset is primarily available in CSV format, contained within a Train.zip archive that includes one CSV file and an associated data dictionary. A sample_submission.csv file, 1.64 MB in size, is also provided. The 'Stay' column, representing length of stay, is divided into 11 distinct classes. Summary statistics report approximately 137,000 valid records for both label counts and patient stay durations. For the label count, the minimum value is around 318,000, the maximum around 455,000, the mean approximately 387,000, and the standard deviation about 39,600; values of this magnitude appear more consistent with a numeric identifier such as case_id than with stay durations. The most common category for 'Stay' is 0-10 days.
Usage
This dataset is ideal for:
- Developing predictive models to identify patients at high risk of prolonged hospital stays.
- Optimising patient treatment plans to reduce the overall length of stay.
- Improving hospital logistics, including bed and room allocation.
- Enhancing overall hospital management efficiency and resource utilisation.
- Modelling multi-class classification problems in a healthcare context.
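As a starting point for the multi-class use case, the sketch below reads the Stay column from a zip archive using only the Python standard library and reports the accuracy of always predicting the most common class. The helper names are hypothetical, and the code assumes the archive holds a UTF-8 CSV whose header contains a column named Stay. It skips any CSV without that column, since the data dictionary may itself be stored as a CSV.

```python
import csv
import io
import zipfile
from collections import Counter

def read_stay_column(zip_path, target="Stay"):
    """Return the target column from the first CSV in the archive that has it."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        for name in csv_names:
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text)
                # Skip files (such as a data dictionary) lacking the target column.
                if reader.fieldnames and target in reader.fieldnames:
                    return [row[target] for row in reader]
    raise ValueError(f"no CSV with a '{target}' column found in archive")

def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must not be empty")
    top_label, top_count = Counter(labels).most_common(1)[0]
    return top_label, top_count / len(labels)
```

A call such as majority_baseline(read_stay_column("Train.zip")) gives a floor that any trained classifier should exceed; with 11 imbalanced classes, plain accuracy alone can be misleading, so per-class metrics are worth checking as well.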
Coverage
The sources do not provide explicit details regarding the geographic region, specific time range, or demographic scope of the patient data.
License
CC0: Public Domain
Who Can Use It
- Hospitals: To improve resource allocation and operational efficiency.
- Healthcare Management Organisations: Such as HealthMan, for professional and optimal management of hospital functions.
- Data Scientists and Analysts: For developing and deploying machine learning models to predict patient length of stay.
- Researchers: Studying healthcare analytics, patient flow, and operational efficiency within healthcare systems.
- Policy Makers: To inform strategies for improving public health and hospital resilience.
Dataset Name Suggestions
- Hospital Patient Length of Stay Prediction
- Healthcare Analytics II: Patient Stay Duration
- Patient LOS Prediction Dataset
- Hospital Resource Optimisation
- Medical Stay Duration Predictor
Attributes
Original Data Source: Hospital Patient Length of Stay Prediction