
Hospital Patient Length of Stay Prediction

Public Health & Epidemiology

Tags and Keywords

Health, Hospital, Patient, Stay, Prediction

Free

About

This dataset focuses on predicting the length of stay for patients in hospitals, a critical parameter for enhancing healthcare management efficiency. Prompted by insights from the recent Covid-19 pandemic, the data aims to help identify patients at high risk of extended stays at the point of admission. Early identification allows for the optimisation of treatment plans, minimisation of patient length of stay, and a reduction in the chance of staff or visitor infection. Furthermore, foreknowledge of patient stay duration can significantly aid in logistical planning, such as the allocation of rooms and beds. The dataset provides case-by-case patient information, with the target variable "Stay" representing the patient's length of stay, categorised into 11 distinct classes ranging from 0-10 days to over 100 days. The primary goal is to accurately predict these stay durations to support optimal resource allocation and improved hospital functioning.

Columns

  • case_id: A unique identifier for each patient's admission case.
  • Stay: The length of time a patient remains in the hospital. This is the target variable and is presented in 11 different classes.
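Because the 11 "Stay" classes are ordered duration bands, it can help to map them to integer codes before modelling. The following is a minimal sketch, not part of the dataset itself. It assumes each class label contains its lower bound as the first number in the string (for example a label shaped like "0-10"); the exact label spellings are not given in this listing, and the helper name ordinal_codes is hypothetical.

```python
import re


def ordinal_codes(labels):
    """Map duration-band labels to integer codes ordered by their lower bound.

    Assumption: the first integer found in each label is its lower bound.
    """
    def lower_bound(label):
        match = re.search(r"\d+", label)
        if match is None:
            raise ValueError(f"no number found in label {label!r}")
        return int(match.group())

    # Sort unique labels by lower bound; the label itself breaks any ties.
    ordered = sorted(set(labels), key=lambda label: (lower_bound(label), label))
    return {label: code for code, label in enumerate(ordered)}


if __name__ == "__main__":
    # Illustrative labels only; check the data dictionary for the real ones.
    print(ordinal_codes(["11-20", "More than 100 Days", "0-10", "91-100"]))
```

Integer codes preserve the ordering of the bands, which ordinal-aware models and error metrics can use; a plain one-hot encoding would discard it.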

Distribution

The dataset is distributed in CSV format inside a Train.zip archive, which contains one CSV file and an associated data dictionary. A sample_submission.csv (1.64 MB) is also provided. The 'Stay' column is divided into 11 distinct classes, and the listing reports 0-10 days as the most common category. The listing's summary statistics show approximately 137,000 valid records. They also report a minimum of around 318,000, a maximum of around 455,000, a mean of approximately 387,000, and a standard deviation of about 39,600 under a "label count" heading. Because 'Stay' is categorical, these figures cannot be stay durations; they most plausibly summarise a numeric identifier column such as case_id, although the listing does not say so.
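To check the class distribution yourself, the archive can be read directly without unpacking it. The sketch below uses only the Python standard library. The file names inside Train.zip are not stated in this listing, so the code searches for the first CSV member whose header contains a "Stay" column; the function name stay_class_counts and the UTF-8 encoding are assumptions.

```python
import csv
import io
import zipfile
from collections import Counter


def stay_class_counts(zip_source, target="Stay"):
    """Return (member_name, Counter of target values) for the first CSV
    member of the archive whose header includes the target column."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Assumption: the CSV is UTF-8 (utf-8-sig also tolerates a BOM).
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.DictReader(text)
                if reader.fieldnames and target in reader.fieldnames:
                    return name, Counter(row[target] for row in reader)
    raise ValueError(f"no CSV member with a {target!r} column found")


if __name__ == "__main__":
    # Assumes Train.zip has been downloaded to the working directory.
    member, counts = stay_class_counts("Train.zip")
    for label, count in counts.most_common():
        print(member, label, count)
```

Matching on the column header rather than a file name also skips the data dictionary if it happens to be stored as a CSV.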

Usage

This dataset is ideal for:
  • Developing predictive models to identify patients at high risk of prolonged hospital stays.
  • Optimising patient treatment plans to reduce the overall length of stay.
  • Improving hospital logistics, including bed and room allocation.
  • Enhancing overall hospital management efficiency and resource utilisation.
  • Modelling multi-class classification problems in a healthcare context.
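For the multi-class classification use case, a majority-class baseline gives a floor that any trained model should beat. The sketch below is one illustrative way to compute it with the standard library; the function names and the toy labels are hypothetical and are not drawn from the dataset.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent class label in the training data."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only.
    train = ["0-10", "11-20", "11-20", "21-30"]
    holdout = ["11-20", "0-10", "11-20", "21-30"]
    guess = majority_baseline(train)
    print(guess, accuracy(holdout, [guess] * len(holdout)))
```

With 11 imbalanced classes, plain accuracy can flatter a model that favours the common bands, so it is worth reporting per-class metrics alongside this baseline.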

Coverage

The listing does not specify the geographic region, time range, or demographic scope of the patient data.

License

CC0: Public Domain

Who Can Use It

  • Hospitals: To improve resource allocation and operational efficiency.
  • Healthcare Management Organisations: Such as HealthMan, for professional and optimal management of hospital functions.
  • Data Scientists and Analysts: For developing and deploying machine learning models to predict patient length of stay.
  • Researchers: Studying healthcare analytics, patient flow, and operational efficiency within healthcare systems.
  • Policy Makers: To inform strategies for improving public health and hospital resilience.

Dataset Name Suggestions

  • Hospital Patient Length of Stay Prediction
  • Healthcare Analytics II: Patient Stay Duration
  • Patient LOS Prediction Dataset
  • Hospital Resource Optimisation
  • Medical Stay Duration Predictor

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 22/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
