Hospital Patient Length of Stay Prediction

FREE DATASET LIBRARY

Verified Data Provider

£0

Public Health & Epidemiology

Tags and Keywords

Health

Hospital

Patient

Stay

Prediction

About

This dataset focuses on predicting the length of stay for patients in hospitals, a critical parameter for enhancing healthcare management efficiency. Prompted by insights from the recent Covid-19 pandemic, the data aims to help identify patients at high risk of extended stays at the point of admission. Early identification allows for the optimisation of treatment plans, minimisation of patient length of stay, and a reduction in the chance of staff or visitor infection. Furthermore, foreknowledge of patient stay duration can significantly aid in logistical planning, such as the allocation of rooms and beds.

The dataset provides case-by-case patient information, with the target variable "Stay" representing the patient's length of stay, categorised into 11 distinct classes ranging from 0-10 days to over 100 days. The primary goal is to accurately predict these stay durations to support optimal resource allocation and improved hospital functioning.

Columns

case_id: A unique identifier for each patient's admission case.
Stay: The length of time a patient remains in the hospital. This is the target variable and is presented in 11 different classes.
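
Because the Stay classes are ordered ranges, it can help to map them to integer indices before modelling. The following is a minimal sketch, assuming the labels are ten 10-day bins plus one open-ended class. The exact label strings (including "More than 100 Days") and the helper names encode_stay and decode_stay are assumptions for illustration, and should be checked against the data dictionary.

```python
# Assumed label strings: ten 10-day bins plus one open-ended class (11 in total).
# The spellings in the actual CSV may differ; adjust STAY_CLASSES to match.
STAY_CLASSES = (
    ["0-10"]
    + [f"{lo}-{lo + 9}" for lo in range(11, 92, 10)]  # "11-20" ... "91-100"
    + ["More than 100 Days"]
)

# Lookup from label string to its position in the ordered class list.
STAY_TO_INDEX = {label: index for index, label in enumerate(STAY_CLASSES)}


def encode_stay(label: str) -> int:
    """Map a Stay label to its ordinal index (0 = shortest stay)."""
    key = label.strip()
    if key not in STAY_TO_INDEX:
        raise ValueError(f"Unknown Stay class: {label!r}")
    return STAY_TO_INDEX[key]


def decode_stay(index: int) -> str:
    """Map an ordinal index back to its Stay label."""
    return STAY_CLASSES[index]
```

Keeping the classes in order preserves the option of treating the task as ordinal rather than purely nominal classification.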

Distribution

The dataset is distributed primarily as CSV. A Train.zip archive contains one CSV file and an associated data dictionary, and a separate sample_submission.csv (1.64 MB) is also provided. The 'Stay' column, representing length of stay, is divided into 11 distinct classes. The listing reports approximately 137,000 valid records for each summarised field. It also reports a minimum of around 318,000, a maximum of around 455,000, a mean of approximately 387,000, and a standard deviation of about 39,600 under the heading "label count"; since 'Stay' is categorical, these figures most plausibly summarise a numeric identifier such as case_id rather than stay durations. The most common category for 'Stay' is 0-10 days.
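
As a rough illustration of how the archive might be inspected, the sketch below reads one CSV member from a zip file using only the Python standard library and summarises the case_id and Stay columns. The member name train.csv and the demo rows are made up for the example, since the listing does not name the CSV inside Train.zip. The functions summarise_archive and build_demo_archive are hypothetical helpers.

```python
import csv
import io
import statistics
import zipfile
from collections import Counter


def summarise_archive(zip_bytes: bytes, member: str) -> dict:
    """Read one CSV member from a zip archive and summarise case_id and Stay."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = list(csv.DictReader(text))
    case_ids = [int(row["case_id"]) for row in rows]
    stay_counts = Counter(row["Stay"] for row in rows)
    return {
        "rows": len(rows),
        "case_id_min": min(case_ids),
        "case_id_max": max(case_ids),
        "case_id_mean": statistics.fmean(case_ids),
        "stay_counts": stay_counts,
        "most_common_stay": stay_counts.most_common(1)[0][0],
    }


def build_demo_archive() -> bytes:
    """Create a tiny in-memory archive with made-up rows, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "train.csv",  # hypothetical member name
            "case_id,Stay\n1,0-10\n2,21-30\n3,0-10\n4,11-20\n",
        )
    return buffer.getvalue()


summary = summarise_archive(build_demo_archive(), "train.csv")
print(summary["rows"], summary["most_common_stay"])
```

For the real download, the bytes would come from reading Train.zip from disk, and zipfile.ZipFile.namelist() can be used to find the actual CSV member name.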

Usage

This dataset is ideal for:

Developing predictive models to identify patients at high risk of prolonged hospital stays.
Optimising patient treatment plans to reduce the overall length of stay.
Improving hospital logistics, including bed and room allocation.
Enhancing overall hospital management efficiency and resource utilisation.
Modelling multi-class classification problems in a healthcare context.
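
For the multi-class modelling use case, a common first step is a majority-class baseline that any trained model should beat. The sketch below is one possible version, not a method prescribed by the listing. The label lists are made-up examples, and the function names majority_class and accuracy are hypothetical.

```python
from collections import Counter


def majority_class(labels):
    """Return the most frequent label among the training targets."""
    if not labels:
        raise ValueError("labels must not be empty")
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up Stay labels, for illustration only.
train_stay = ["0-10", "21-30", "21-30", "11-20", "21-30"]
valid_stay = ["21-30", "0-10", "21-30", "11-20"]

# Predict the most frequent training class for every validation case.
baseline = majority_class(train_stay)
predictions = [baseline] * len(valid_stay)
score = accuracy(valid_stay, predictions)
```

With 11 imbalanced classes, plain accuracy can flatter a weak model, so comparing against this baseline gives a more honest reference point.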

Coverage

The listing does not specify the geographic region, time range, or demographic scope of the patient data.

License

CC0: Public Domain

Who Can Use It

Hospitals: To improve resource allocation and operational efficiency.
Healthcare Management Organisations: Such as HealthMan, for professional and optimal management of hospital functions.
Data Scientists and Analysts: For developing and deploying machine learning models to predict patient length of stay.
Researchers: Studying healthcare analytics, patient flow, and operational efficiency within healthcare systems.
Policy Makers: To inform strategies for improving public health and hospital resilience.

Dataset Name Suggestions

Hospital Patient Length of Stay Prediction
Healthcare Analytics II: Patient Stay Duration
Patient LOS Prediction Dataset
Hospital Resource Optimisation
Medical Stay Duration Predictor

Attributes

Original Data Source: Hospital Patient Length of Stay Prediction

Listing Stats

Listed: 22/08/2025
Region: Global
Quality: 5 / 5
Version: 1.0
