Breast Cancer Recurrence and Survival Dataset
Patient Health Records & Digital Health
Free
About
This dataset contains patient records from a trial conducted by the German Breast Cancer Study Group (GBSG) between 1984 and 1989. It covers patients with node-positive breast cancer and includes variables relevant to prognostic modelling. Royston and Altman (2013) used it for the external validation of a Cox prognostic model that had been developed on a separate dataset (the Rotterdam data). The variables describe patient and tumour characteristics, hormonal treatment, and recurrence-free survival.
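External validation usually means applying a model with frozen coefficients to new patients and checking how well its risk ranking matches the observed outcomes. The sketch below computes Harrell's concordance index, one common discrimination measure, in plain Python. It is an illustration only, not necessarily the measure used in the cited paper; the function name `harrell_c` and the toy values are assumptions made up for this example.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times       -- follow-up time per patient (e.g. rfstime in days)
    events      -- 1 if the event was observed, 0 if censored (e.g. status)
    risk_scores -- higher score means higher predicted risk, for example
                   the prognostic index of an externally developed model
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an event strictly
            # before patient j's observed time. Tied times are skipped,
            # which is a simplification.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up toy values, NOT taken from the dataset.
toy_times = [100, 200, 300, 400]
toy_events = [1, 0, 1, 0]
toy_scores = [2.0, 0.5, 1.0, 0.1]
print(harrell_c(toy_times, toy_events, toy_scores))
```

A value near 0.5 indicates a ranking no better than chance, and a value near 1 indicates that patients with higher scores tend to have events earlier.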
Columns
- pid: A unique patient identifier.
- age: The patient's age in years.
- meno: Menopausal status, where 0 indicates premenopausal and 1 indicates postmenopausal.
- size: The size of the tumour in millimetres (mm).
- grade: The tumour grade.
- nodes: The number of positive lymph nodes.
- pgr: Progesterone receptor levels, measured in fmol/l.
- er: Oestrogen receptor levels, measured in fmol/l.
- hormon: Indication of hormonal therapy, where 0 means no and 1 means yes.
- rfstime: Recurrence-free survival time: the number of days to recurrence, death, or last follow-up, whichever came first.
- status: The patient's outcome status, where 0 indicates alive without recurrence (a censored observation) and 1 indicates recurrence or death.
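The column list above can be turned into a typed record with basic range checks. This is a minimal sketch: the class name `GbsgRecord` and the function `parse_record` are hypothetical, and it assumes every column is stored as an integer-valued string, which the listing does not confirm.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GbsgRecord:
    pid: int      # unique patient identifier
    age: int      # years
    meno: int     # 0 = premenopausal, 1 = postmenopausal
    size: int     # tumour size, mm
    grade: int    # tumour grade
    nodes: int    # number of positive lymph nodes
    pgr: int      # progesterone receptors, fmol/l
    er: int       # oestrogen receptors, fmol/l
    hormon: int   # hormonal therapy, 0 = no, 1 = yes
    rfstime: int  # days to recurrence, death, or last follow-up
    status: int   # 0 = alive without recurrence, 1 = recurrence or death


COLUMNS = [f.name for f in fields(GbsgRecord)]


def parse_record(row):
    """Convert one row (a dict of column name -> string) to a GbsgRecord.

    Assumes integer-valued strings; int() raises ValueError otherwise.
    """
    values = {name: int(row[name]) for name in COLUMNS}
    for flag in ("meno", "hormon", "status"):
        if values[flag] not in (0, 1):
            raise ValueError(f"{flag} must be 0 or 1, got {values[flag]}")
    if values["rfstime"] < 0:
        raise ValueError("rfstime must not be negative")
    return GbsgRecord(**values)
```

Rows produced by `csv.DictReader` can be passed to `parse_record` directly, since each row is a dict keyed by column name.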
Distribution
The dataset is a single data file with 686 observations (patient records) and 11 variables (columns). All variables are complete, with no missing or mismatched entries reported. The dataset is static: it is a historical trial collection, and no updates are expected.
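The stated shape and completeness can be checked after download. The following is a rough sketch using only the standard library; it assumes the file is comma-separated with a header row and that missing cells appear as empty strings or `NA`, neither of which is confirmed by the listing. The function name `summarize` is hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "pid", "age", "meno", "size", "grade", "nodes",
    "pgr", "er", "hormon", "rfstime", "status",
]


def summarize(csv_text):
    """Count rows, columns, and missing cells in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = list(reader.fieldnames or [])
    n_rows = 0
    n_missing = 0
    for row in reader:
        n_rows += 1
        for name in header:
            # Short rows yield None for absent cells; treat as missing.
            cell = (row[name] or "").strip()
            if cell in ("", "NA"):
                n_missing += 1
    return {
        "rows": n_rows,
        "columns": len(header),
        "missing": n_missing,
        "header_matches": header == EXPECTED_COLUMNS,
    }
```

If the file matches the description, the summary of its text should report 686 rows, 11 columns, and 0 missing cells.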
Usage
This dataset is ideally suited for:
- External validation of breast cancer prognostic models.
- Medical research into factors influencing breast cancer recurrence and survival.
- Statistical analysis of patient outcomes in oncology.
- Developing and testing predictive models for breast cancer prognosis.
- Educational purposes in biostatistics and clinical research.
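As a starting point for the survival analyses listed above, the Kaplan-Meier estimator can be written in a few lines of plain Python. This is a teaching sketch, not a replacement for a statistics package; the function name `kaplan_meier` and the toy values are made up for illustration. With `rfstime` as the time and `status` as the event indicator, the resulting curve would estimate recurrence-free survival.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if censored
    """
    deaths = {}   # events observed at each time
    leaving = {}  # patients leaving the risk set at each time
    for t, e in zip(times, events):
        leaving[t] = leaving.get(t, 0) + 1
        if e == 1:
            deaths[t] = deaths.get(t, 0) + 1

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= leaving[t]
    return curve


# Made-up toy values, NOT taken from the dataset.
toy_times = [100, 200, 200, 300, 400]
toy_events = [1, 1, 0, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Censored patients (status 0) never lower the curve themselves; they only shrink the risk set for later event times.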
Coverage
- Geographic Scope: Data originates from a clinical trial conducted by the German Breast Cancer Study Group.
- Time Range: The patient records are from a trial conducted between 1984 and 1989.
- Demographic Scope: The dataset includes 686 patients diagnosed with node-positive breast cancer. Patient ages range from 21 to 80 years.
License
CC0: Public Domain
Who Can Use It
- Oncology Researchers: To validate prognostic models or study long-term outcomes of breast cancer patients.
- Biostatisticians and Data Scientists: For applying and testing survival analysis techniques and predictive algorithms in a real-world medical context.
- Medical Professionals: To gain insights into historical patient cohorts and factors influencing breast cancer prognosis.
- Academic Institutions: For teaching and research purposes in health data analytics.
Dataset Name Suggestions
- GBSG Breast Cancer Prognostic Validation Data
- Node-Positive Breast Cancer Outcomes (1984-1989)
- German Breast Cancer Study Group Clinical Trial Data
- Breast Cancer Recurrence and Survival Dataset
Attributes
Original Data Source: Breast Cancer Recurrence and Survival Dataset