Patient LOS Risk Modelling Dataset
Patient Health Records & Digital Health
Free
About
Clinical data collected during the COVID-19 pandemic shows the strain placed on health systems worldwide. Projecting future demand for healthcare resources, including beds, staff, and essential equipment, is a priority for hospital management. This dataset supports that planning by enabling prediction of individual patient Length of Stay (LOS). The task is to classify each patient's stay into one of eleven duration classes, ranging from 0-10 days to more than 100 days.
Columns
The dataset contains 18 features capturing details about the patient, the hospital, and the final outcome:
- case_id: A serial identifier for each case.
- Hospital: Unique identifier for the hospital facility.
- Hospital_type: Classification category for the hospital.
- Hospital_city: Geographic code indicating the hospital's location.
- Hospital_region: The broader region where the hospital is situated.
- Available-Extra-Rooms-in-Hospital: The count of extra rooms currently available in the hospital.
- Department: The medical department overseeing the case (e.g., gynecology, anesthesia, surgery).
- Ward_Type: Categorical classification of the ward.
- Ward_Facility: Categorical designation of the ward facility.
- Bed_Grade: Indicator of the condition or quality of the bed in the ward (1 to 4).
- patientid: Unique identifier assigned to the patient.
- City_Code_Patient: Geographic code identifying the patient's city of origin.
- Type of Admission: The method of admission registered by the hospital (Emergency, Trauma, Urgent).
- Illness_Severity: The recorded severity of the illness at the time of admission (Extreme, Moderate, Minor).
- Patient_Visitors: The number of visitors associated with the patient.
- Age: The patient’s age group (e.g., 41-50, 61-70).
- Admission_Deposit: The deposit amount paid at the time of admission, ranging between approximately 1,800 and 11,000.
- Stay_Days (Target): The duration of the patient’s stay, categorized into 11 time windows (e.g., 21-30 days, More than 100 Days).
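The column list above can be turned into a quick schema check before any modelling. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed here and that Stay_Days labels look like "0-10 days", "21-30 days" or "More than 100 Days"; both are assumptions, and the helper names are hypothetical.

```python
import csv
import re

# Column names as listed above; the exact header spelling in the CSV is an assumption.
EXPECTED_COLUMNS = [
    "case_id", "Hospital", "Hospital_type", "Hospital_city", "Hospital_region",
    "Available-Extra-Rooms-in-Hospital", "Department", "Ward_Type", "Ward_Facility",
    "Bed_Grade", "patientid", "City_Code_Patient", "Type of Admission",
    "Illness_Severity", "Patient_Visitors", "Age", "Admission_Deposit", "Stay_Days",
]


def missing_columns(header):
    """Return the expected columns that are absent from a CSV header."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def stay_class_index(label):
    """Map a Stay_Days label to an ordinal index from 0 to 10.

    Assumes labels such as '0-10 days', '21-30 days' or 'More than 100 Days'.
    """
    text = label.strip().lower()
    if text.startswith("more than"):
        return 10
    match = re.match(r"(\d+)\s*-\s*(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised Stay_Days label: {label!r}")
    lower = int(match.group(1))
    # Lower bounds 0, 11, 21, ..., 91 map to indices 0, 1, 2, ..., 9.
    return max(lower - 1, 0) // 10


def read_rows(lines):
    """Parse CSV lines into dicts, failing early if expected columns are missing."""
    reader = csv.DictReader(lines)
    missing = missing_columns(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)
```

With the file on disk, `read_rows` can be given an open file handle, for example `with open("host_train.csv", newline="") as f: rows = read_rows(f)`.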
Distribution
The data is provided as a CSV file named host_train.csv, approximately 26.92 MB in size. It contains around 318,000 individual patient records. The hospital ID takes 32 unique values, and patient age is distributed across 10 categories. The primary prediction task is a multiclass classification problem with 11 possible outcomes for the target variable, Stay_Days.
Usage
This dataset is intended to support more efficient healthcare management. Typical applications include:
- Risk Identification: Identifying patients likely to experience a prolonged Length of Stay (high LOS risk) immediately upon admission.
- Treatment Optimisation: Tailoring treatment plans for high-risk patients to minimize their required hospital duration.
- Infection Control: Lowering the risk of infection spread among staff and visitors by better managing patient flow.
- Logistical Planning: Assisting hospital management with crucial logistics, such as advanced planning for room and bed allocation.
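As a first step toward the risk identification use case above, one descriptive check is the share of prolonged stays within each group of an admission-time attribute such as Illness_Severity. The sketch below is illustrative only: the 41-day cut-off is a hypothetical threshold and not a clinical definition, the function names are made up for this example, and the label format of Stay_Days is an assumption.

```python
import re
from collections import defaultdict


def stay_lower_bound(label):
    """Smallest number of days covered by a Stay_Days label.

    Assumes the first integer in the label is the lower bound, as in
    '21-30 days' (21) or 'More than 100 Days' (100).
    """
    match = re.search(r"\d+", label)
    if match is None:
        raise ValueError(f"unrecognised Stay_Days label: {label!r}")
    return int(match.group())


def prolonged_rate_by(rows, group_column, min_days=41):
    """Share of cases per group whose stay class starts at min_days or later.

    rows is an iterable of dicts keyed by column name (e.g. from csv.DictReader).
    min_days is a hypothetical cut-off chosen for illustration.
    """
    totals = defaultdict(int)
    prolonged = defaultdict(int)
    for row in rows:
        group = row[group_column]
        totals[group] += 1
        if stay_lower_bound(row["Stay_Days"]) >= min_days:
            prolonged[group] += 1
    return {group: prolonged[group] / totals[group] for group in totals}
```

A call such as `prolonged_rate_by(rows, "Illness_Severity")` returns one rate per severity level, which gives a rough sense of how much signal an attribute carries before a classifier is trained.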
Coverage
The data covers patients receiving hospital treatment in the context of COVID-19 treatment plans. The variables describe patient demographics (age categories, city codes) and hospital infrastructure (32 hospitals across 13 cities and 3 regions). The records span the admission types and illness severities encountered during the pandemic.
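The counts stated in this description (32 hospitals, 13 cities, 3 regions, 10 age categories, 11 stay classes) can be checked against a downloaded copy of the file. The sketch below assumes that the city and region counts refer to the Hospital_city and Hospital_region columns; that mapping, and the function names, are assumptions made for this example.

```python
# Distinct-value counts stated in the description; the column mapping is an assumption.
STATED_COUNTS = {
    "Hospital": 32,
    "Hospital_city": 13,
    "Hospital_region": 3,
    "Age": 10,
    "Stay_Days": 11,
}


def distinct_counts(rows, columns):
    """Count distinct non-empty values per column.

    rows is an iterable of dicts keyed by column name (e.g. from csv.DictReader).
    """
    seen = {column: set() for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value:  # skip missing or empty cells
                seen[column].add(value)
    return {column: len(values) for column, values in seen.items()}


def count_mismatches(rows):
    """Return {column: (stated, actual)} where the file disagrees with the description."""
    actual = distinct_counts(rows, list(STATED_COUNTS))
    return {
        column: (stated, actual[column])
        for column, stated in STATED_COUNTS.items()
        if actual[column] != stated
    }
```

An empty result from `count_mismatches(rows)` would mean the file matches the stated coverage; any entries point to columns worth inspecting by hand.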
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Machine Learning Engineers: For building and benchmarking predictive models aimed at multiclass classification of patient LOS.
- Hospital Administrators and Operations Managers: For gaining insights into factors that influence hospital resource utilization and efficiency.
- Public Health Researchers: For analysing patterns in patient recovery and identifying influential factors related to long-term hospitalisation during a pandemic.
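For benchmarking, it helps to have a floor that any trained model should beat. One common choice is a majority-class baseline, sketched below with the standard library only. The function name is hypothetical, and the example makes no claim about which class is most frequent in this dataset.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training class.

    Returns a (majority_label, accuracy) tuple.
    """
    if not train_labels or not test_labels:
        raise ValueError("both label lists must be non-empty")
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority)
    return majority, hits / len(test_labels)
```

Because the target has 11 classes that are unlikely to be evenly distributed, this baseline is usually a more honest reference point than the 1-in-11 accuracy of uniform random guessing.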
Dataset Name Suggestions
- COVID-19 Patient Length of Stay Predictor
- Hospital Resource Allocation Optimization Data
- Patient LOS Risk Modelling Dataset
- Healthcare Efficiency Management Data
Attributes
Original Data Source: Patient LOS Risk Modelling Dataset