Opendatabay APP

Cirrhosis Mortality Prediction Dataset

Patient Health Records & Digital Health

Tags and Keywords

Cirrhosis

Liver

Survival

Healthcare

Prediction

Free

About

This dataset supports prediction of the survival state of patients diagnosed with liver cirrhosis [1]. Cirrhosis results from prolonged liver damage and extensive scarring, often caused by conditions such as hepatitis or chronic alcohol consumption [1]. The data was originally collected as part of a Mayo Clinic study on primary biliary cirrhosis (PBC) carried out between 1974 and 1984 [1]. It includes 17 clinical features for predicting patient outcomes, where the survival state is classified as 'D' (death), 'C' (censored), or 'CL' (censored due to liver transplantation) [1].
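The three survival states can be reduced to a binary event indicator for mortality modelling. The snippet below is a minimal sketch of one way to do that; the helper name `death_observed` is hypothetical, and treating both 'C' and 'CL' as censored (non-death) records is an assumption rather than something the listing prescribes.

```python
# Status codes as described in this listing.
STATUS_LABELS = {
    "D": "death",
    "C": "censored",
    "CL": "censored due to liver transplantation",
}


def death_observed(status: str) -> int:
    """Return 1 if the patient died, 0 if the record is censored.

    Assumption: only 'D' counts as the event; 'C' and 'CL' are censored.
    """
    code = status.strip().upper()
    if code not in STATUS_LABELS:
        raise ValueError(f"unexpected Status value: {status!r}")
    return 1 if code == "D" else 0


statuses = ["D", "C", "CL", "D"]  # made-up values for illustration
events = [death_observed(s) for s in statuses]
print(events)  # [1, 0, 0, 1]
```

If transplantation should be modelled as its own outcome, the same mapping could return three classes instead of two.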

Columns

The dataset contains 20 columns, listed below: a patient identifier, two outcome fields (N_Days and Status), and 17 clinical features.
  • ID: A unique identifier for each patient [2].
  • N_Days: The number of days from baseline to endpoint for the patient [3].
  • Status: The patient's survival status, which can be Death ('D'), Censored ('C'), or Censored due to Liver Transplantation ('CL') [1, 3].
  • Drug: Indicates whether the patient was administered D-penicillamine or a Placebo; non-trial patients have no drug assignment recorded [4].
  • Age: The patient's age in days [4].
  • Sex: The patient's gender, either Female ('F') or Male ('M') [5].
  • Ascites: Indicates the presence of ascites ('Y' for Yes, 'N' for No, or 'NA' for Not Applicable/Not Recorded) [5].
  • Hepatomegaly: Indicates the presence of hepatomegaly ('Y' for Yes, 'N' for No, or 'NA' for Not Applicable/Not Recorded) [5].
  • Spiders: Indicates the presence of spider angiomata ('Y' for Yes, 'N' for No, or 'NA' for Not Applicable/Not Recorded) [6].
  • Edema: Indicates the presence of edema ('N' for No, 'S' for Some, or 'Y' for Yes) [6].
  • Bilirubin: Serum Bilirubin levels, a measure of liver function [6].
  • Cholesterol: Serum Cholesterol levels [7].
  • Albumin: Serum Albumin levels, another indicator of liver function [7].
  • Copper: Urine Copper levels [7].
  • Alk_Phos: Alkaline Phosphatase levels, an enzyme found in liver cells [8].
  • SGOT: Serum Glutamic Oxaloacetic Transaminase levels, an enzyme indicating liver damage [8].
  • Tryglicerides: Serum Triglyceride levels (the column name is spelled this way in the dataset) [8].
  • Platelets: Platelet count in the blood [8].
  • Prothrombin: Prothrombin time, a measure of blood clotting ability [9].
  • Stage: The histological stage of the disease, typically ranging from 1 to 4 [9].
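The column descriptions above translate into a small loading routine. The following is a sketch using only the Python standard library; the inline sample rows are made up for illustration and use just a subset of the columns, the helper names `parse_float` and `load_rows` are hypothetical, and the assumption that missing values appear as the literal string 'NA' should be checked against the actual file.

```python
import csv
import io

# Made-up rows for illustration only; they use a subset of the column names
# listed above and are NOT taken from cirrhosis.csv.
SAMPLE = """ID,N_Days,Status,Drug,Age,Sex,Bilirubin,Stage
1,400,D,D-penicillamine,21464,F,14.5,4
2,4500,C,Placebo,20617,F,1.1,3
3,1200,CL,NA,18250,M,NA,NA
"""

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def parse_float(value):
    """Convert a CSV field to float, mapping empty or 'NA' fields to None."""
    value = value.strip()
    return None if value in ("", "NA") else float(value)


def load_rows(handle):
    """Read patient rows from an open CSV handle into typed dictionaries."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "id": int(raw["ID"]),
            "n_days": int(raw["N_Days"]),
            "status": raw["Status"],
            "drug": None if raw["Drug"] == "NA" else raw["Drug"],
            "age_years": int(raw["Age"]) / DAYS_PER_YEAR,  # Age is stored in days
            "sex": raw["Sex"],
            "bilirubin": parse_float(raw["Bilirubin"]),
            "stage": parse_float(raw["Stage"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["id"], row["status"], round(row["age_years"], 1), row["bilirubin"])
```

For the real file, the same function could be called with `open("cirrhosis.csv", newline="")` in place of the in-memory sample, provided the header matches the column names listed here.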

Distribution

The dataset is provided in CSV format (cirrhosis.csv) and is approximately 31.86 kB in size [2]. It comprises 418 instances (rows), each representing an individual patient, across 20 columns [2]. Before release, the dataset was preprocessed by dropping rows with missing values in the 'Drug' column, imputing other missing values with the mean, and one-hot encoding all categorical attributes [10].
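The three preprocessing steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure actually used to prepare the file: the records are made up, the function name `preprocess` is hypothetical, the column subsets are chosen for brevity, and the order (drop rows first, then compute means on the remaining rows) is an assumption.

```python
from statistics import mean

# Made-up records for illustration; None marks a missing value.
records = [
    {"Drug": "Placebo", "Sex": "F", "Bilirubin": 1.0, "Albumin": 3.5},
    {"Drug": "D-penicillamine", "Sex": "M", "Bilirubin": None, "Albumin": 4.0},
    {"Drug": None, "Sex": "F", "Bilirubin": 2.0, "Albumin": 3.0},
    {"Drug": "Placebo", "Sex": "F", "Bilirubin": 3.0, "Albumin": None},
]
NUMERIC = ["Bilirubin", "Albumin"]
CATEGORICAL = ["Drug", "Sex"]


def preprocess(rows):
    # Step 1: drop rows where 'Drug' is missing (copy so the input is untouched).
    kept = [dict(r) for r in rows if r["Drug"] is not None]

    # Step 2: fill missing numeric values with the mean of the observed values.
    for col in NUMERIC:
        fill = mean(r[col] for r in kept if r[col] is not None)
        for r in kept:
            if r[col] is None:
                r[col] = fill

    # Step 3: one-hot encode each categorical column, one 0/1 field per level.
    levels = {col: sorted({r[col] for r in kept}) for col in CATEGORICAL}
    encoded = []
    for r in kept:
        out = {col: r[col] for col in NUMERIC}
        for col in CATEGORICAL:
            for level in levels[col]:
                out[f"{col}_{level}"] = 1 if r[col] == level else 0
        encoded.append(out)
    return encoded


processed = preprocess(records)
for row in processed:
    print(row)
```

When the data is later split for model evaluation, computing the imputation means on the training portion only avoids leaking information from the test portion.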

Usage

This dataset is ideally suited for:
  • Developing and evaluating machine learning models for patient survival prediction in liver cirrhosis [1].
  • Researching factors influencing mortality and patient outcomes in primary biliary cirrhosis [2].
  • Studying the progression of liver disease and the potential impact of various clinical features on patient prognosis [1].
  • Classification tasks related to health outcomes and mortality prediction in a clinical context [2].

Coverage

The data covers patients referred to the Mayo Clinic between 1974 and 1984 [1, 10]. Demographic information includes gender (89% female, 11% male) and age, with an average age of approximately 18,500 days (around 50.7 years) [4, 5, 10]. The original study involved 424 PBC patients: 312 who participated in a randomised trial and 112 who consented only to basic measurements and survival tracking. Of those 112, 106 are included here alongside the trial patients [10]. Each instance in this dataset represents an individual person [10].
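The figures quoted above are consistent with one another, as a quick arithmetic check shows. The divisor of 365.25 days per year is an assumption; the listing does not state which conversion it used.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years

trial_patients = 312
non_trial_records = 106
total_rows = trial_patients + non_trial_records

original_study_patients = 424
not_included = original_study_patients - total_rows

mean_age_days = 18_500
mean_age_years = mean_age_days / DAYS_PER_YEAR

print(total_rows)                # 418
print(not_included)              # 6
print(round(mean_age_years, 1))  # 50.7
```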

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is valuable for:
  • Medical researchers and epidemiologists studying liver disease and patient prognosis.
  • Data scientists and machine learning engineers working on healthcare analytics and predictive modelling.
  • Healthcare professionals seeking insights into patient survival characteristics for educational or research purposes.
  • Students undertaking projects in bioinformatics, biostatistics, or health informatics.

Dataset Name Suggestions

  • Liver Cirrhosis Patient Survival Data
  • Mayo Clinic PBC Patient Outcomes
  • Cirrhosis Mortality Prediction Dataset
  • PBC Patient Clinical Features
  • Liver Disease Survival Study

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 20/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
