Opendatabay APP

Breast Cancer Recurrence and Survival Dataset

Patient Health Records & Digital Health

Tags and Keywords

Health, Cancer, Prognosis, Breast, Medical


Free

About

This dataset contains patient records from a trial conducted by the German Breast Cancer Study Group (GBSG) between 1984 and 1989. It covers patients with node-positive breast cancer and includes variables relevant to prognostic modelling. Royston and Altman (2013) used it for the external validation of a Cox prognostic model that had been developed on a separate dataset (the Rotterdam data). It records tumour characteristics, hormone receptor levels, hormonal therapy and recurrence-free survival, which makes it useful for studying factors associated with breast cancer recurrence and survival.
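External validation of a Cox model usually includes checking how well the model's risk scores rank patients in the new data. One widely used discrimination measure for this is Harrell's concordance index (C-index). The sketch below computes it in plain Python from the rfstime and status columns described below, together with a vector of predicted risks. It is a minimal illustration under stated assumptions, not the exact procedure of Royston and Altman (2013). The risk scores are assumed to come from a model fitted elsewhere, for example the linear predictor of a Cox model developed on the Rotterdam data. Pairs with tied times are simply skipped, and `harrell_c_index` is a hypothetical name.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when subject i had an event (events[i] == 1)
    strictly before subject j's time. The pair is concordant when subject i
    also has the higher predicted risk; tied risks count as half.
    Pairs with tied times are skipped in this simplified version.
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("times, events and risk must have the same length")
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: need an event before some other time")
    return concordant / usable
```

A value of 0.5 means the scores rank patients no better than chance, and 1.0 means a perfect ranking. Suppose `rfstime`, `status` and `risk_score` are hypothetical lists built from the CSV. The call would then be `harrell_c_index(rfstime, status, risk_score)`.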

Columns

  • pid: A unique patient identifier.
  • age: The patient's age in years.
  • meno: Menopausal status, where 0 indicates premenopausal and 1 indicates postmenopausal.
  • size: The size of the tumour in millimetres (mm).
  • grade: The tumour grade.
  • nodes: The number of positive lymph nodes.
  • pgr: Progesterone receptor levels, measured in fmol/l.
  • er: Oestrogen receptor levels, measured in fmol/l.
  • hormon: Whether the patient received hormonal therapy, where 0 means no and 1 means yes.
  • rfstime: Recurrence-free survival time in days, measured to the first of recurrence, death or last follow-up.
  • status: The patient's outcome status, where 0 indicates alive without recurrence and 1 indicates recurrence or death.
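The coding scheme above can be captured in a typed record, so that invalid values are caught when the file is read. This sketch makes two assumptions. First, the CSV header uses exactly these column names. Second, every value is stored as a whole number; if that does not hold, the parser raises an error rather than silently truncating. `GbsgRecord` and `parse_record` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class GbsgRecord:
    pid: int      # unique patient identifier
    age: int      # years
    meno: int     # 0 = premenopausal, 1 = postmenopausal
    size: int     # tumour size in mm
    grade: int    # tumour grade
    nodes: int    # number of positive lymph nodes
    pgr: int      # progesterone receptor, fmol/l
    er: int       # oestrogen receptor, fmol/l
    hormon: int   # 0 = no hormonal therapy, 1 = hormonal therapy
    rfstime: int  # days to recurrence, death or last follow-up
    status: int   # 0 = alive without recurrence, 1 = recurrence or death


BINARY_FIELDS = ("meno", "hormon", "status")


def _to_int(name: str, text: str) -> int:
    """Parse a whole number, accepting forms such as '12' or '12.0'."""
    value = float(text)  # raises ValueError for non-numeric text such as 'NA'
    if not value.is_integer():
        raise ValueError(f"{name}: expected a whole number, got {text!r}")
    return int(value)


def parse_record(row: Mapping[str, str]) -> GbsgRecord:
    """Convert one CSV row (column name -> text) into a typed record."""
    values = {f.name: _to_int(f.name, row[f.name]) for f in fields(GbsgRecord)}
    for name in BINARY_FIELDS:
        if values[name] not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {values[name]}")
    return GbsgRecord(**values)
```

Each row yielded by `csv.DictReader` can be passed straight to `parse_record`.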

Distribution

The dataset is a single data file, available for download in CSV format. It contains 686 observations (patient records) and 11 variables (columns). All variables are complete, with no missing or mismatched entries reported. The dataset is not expected to be updated: it is a static collection of historical trial data.
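The listing reports 686 rows, 11 columns and no missing values, so it is worth confirming this after download. The sketch below uses only Python's `csv` module. `check_file` is a hypothetical helper, and treating empty strings, `NA` and `NaN` as missing is an assumption about how gaps would be encoded.

```python
import csv
from typing import Iterable

EXPECTED_COLUMNS = (
    "pid", "age", "meno", "size", "grade", "nodes",
    "pgr", "er", "hormon", "rfstime", "status",
)
MISSING_TOKENS = {"", "NA", "NaN"}  # assumed spellings of a missing value


def check_file(lines: Iterable[str], expected_rows: int = 686) -> int:
    """Check the header, row count and completeness; return the row count."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    if sorted(header) != sorted(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected columns: {header}")
    n_rows = 0
    for record_no, row in enumerate(reader, start=1):
        if None in row:  # DictReader stores surplus fields under the key None
            raise ValueError(f"record {record_no}: more fields than columns")
        missing = [
            name for name in EXPECTED_COLUMNS
            if row[name] is None or row[name].strip() in MISSING_TOKENS
        ]
        if missing:
            raise ValueError(f"record {record_no}: missing values in {missing}")
        n_rows += 1
    if n_rows != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {n_rows}")
    return n_rows
```

For example, `with open("gbsg.csv", newline="") as f: check_file(f)`, where the file name is a placeholder.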

Usage

This dataset is ideally suited for:
  • External validation of breast cancer prognostic models.
  • Medical research into factors influencing breast cancer recurrence and survival.
  • Statistical analysis of patient outcomes in oncology.
  • Developing and testing predictive models for breast cancer prognosis.
  • Educational purposes in biostatistics and clinical research.
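For the survival analyses listed above, the usual starting point is the Kaplan-Meier (product-limit) estimate of recurrence-free survival. It uses rfstime as the time and status as the event indicator. The function below is a minimal standard-library sketch for illustration; in practice a dedicated survival-analysis package would normally be used. The name `kaplan_meier` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    Returns (time, survival probability) at each distinct event time.
    Subjects censored at an event time are treated as still at risk at
    that time, the usual convention.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)  # events and censorings at each time
    n_at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving_at[t]  # remove everyone whose follow-up ends at t
    return curve
```

To get one curve per therapy group, run it separately on the rows with hormon = 0 and on the rows with hormon = 1.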

Coverage

  • Geographic Scope: Germany; the data originate from a clinical trial conducted by the German Breast Cancer Study Group.
  • Time Range: The patient records are from a trial conducted between 1984 and 1989.
  • Demographic Scope: The dataset includes 686 patients diagnosed with node-positive breast cancer. Patient ages range from 21 to 80 years.

License

CC0: Public Domain

Who Can Use It

  • Oncology Researchers: To validate prognostic models or study long-term outcomes of breast cancer patients.
  • Biostatisticians and Data Scientists: For applying and testing survival analysis techniques and predictive algorithms in a real-world medical context.
  • Medical Professionals: To gain insights into historical patient cohorts and factors influencing breast cancer prognosis.
  • Academic Institutions: For teaching and research purposes in health data analytics.
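A two-group log-rank test is one example of the survival-analysis techniques mentioned above. It can compare recurrence-free survival between patients with and without hormonal therapy (the hormon column). The block below is a plain-Python sketch of the standard log-rank statistic with its 1-degree-of-freedom chi-squared p-value. `logrank_test` is a hypothetical name, and groups are assumed to be coded 0/1 as in the hormon column.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times: Sequence[float], events: Sequence[int], groups: Sequence[int]
) -> Tuple[float, float]:
    """Two-group log-rank test; groups must be coded 0/1.

    Returns the chi-squared statistic (1 degree of freedom) and its p-value.
    """
    if not (len(times) == len(events) == len(groups)):
        raise ValueError("times, events and groups must have the same length")
    data = list(zip(times, events, groups))
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        n = sum(1 for tt, _, _ in data if tt >= t)              # at risk overall
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk in group 1
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)   # events overall
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # hypergeometric variance of d1 given the margins at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("zero variance: the groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-squared variable with 1 df equals P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With lists built from the CSV, `logrank_test(rfstime, status, hormon)` returns the statistic and its p-value. This is an unadjusted comparison; it does not account for the other prognostic variables.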

Dataset Name Suggestions

  • GBSG Breast Cancer Prognostic Validation Data
  • Node-Positive Breast Cancer Outcomes (1984-1989)
  • German Breast Cancer Study Group Clinical Trial Data
  • Breast Cancer Recurrence and Survival Dataset

Listing Stats

  • Listed: 12/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
