NKI Cancer Prognosis Data
Patient Health Records & Digital Health
Free
About
This dataset contains NKI Breast Cancer Data: gene expression measurements and patient metadata for 272 breast cancer patients. Its core purpose is to support the study of factors that influence patient survival. The data includes a binary 'Event Death' column (eventdeath, coded 0 or 1), which can serve as the outcome label in supervised analyses or as a reference for interpreting the groups found by unsupervised analyses when looking for genetic indicators associated with survival. Of particular interest are groups such as "good survivors" who survive despite atypical genetic indicators, such as low ESR1 levels (typically a prognostic indicator for poor outcomes in breast cancer); insights from such groups may help improve mortality rates for this disease. The dataset is also well suited to students who want to practise data analysis.
Columns
The dataset is structured with 272 rows (patients) and 1570 columns primarily dedicated to gene expression data. It also incorporates patient information, treatment details, and survival metadata. Key columns included are:
- Patient: A unique identifier for each patient.
- ID: An additional identifier, which may be excluded in some analytical subsets.
- eventdeath: A binary outcome variable (0 for 'alive', 1 for 'dead'), used as a "Data Lens" for analytical approaches.
- ESR1 levels: The expression level of the ESR1 (estrogen receptor 1) gene, noted as a prognostic indicator in breast cancer.
- Numerous other unnamed gene expression features.
- Other unnamed columns containing patient demographic and treatment information.
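As a rough illustration of the layout described above, the following sketch separates a CSV header into identifier, outcome, and remaining columns. It is not taken from the dataset's own notebook. The column names Patient, ID, and eventdeath come from the list above; the names esr1 and geneA and the sample row are made up for the example, and the real header may differ.

```python
import csv
import io

# Column names taken from the listing; adjust if the real header differs.
ID_COLUMNS = {"Patient", "ID"}
OUTCOME_COLUMN = "eventdeath"


def read_header(handle):
    """Return the first row of an open CSV file as a list of column names."""
    return next(csv.reader(handle))


def group_columns(header):
    """Split a header into identifier, outcome, and all remaining columns."""
    ids = [c for c in header if c in ID_COLUMNS]
    outcome = [c for c in header if c == OUTCOME_COLUMN]
    other = [c for c in header if c not in ID_COLUMNS and c != OUTCOME_COLUMN]
    return {"ids": ids, "outcome": outcome, "other": other}


# Made-up one-row sample standing in for the real file ("esr1" and "geneA"
# are hypothetical column names).
sample = io.StringIO("Patient,ID,eventdeath,esr1,geneA\ns1,18,0,-0.4,0.1\n")
groups = group_columns(read_header(sample))
print(groups)
```

The "other" group will still mix gene expression features with demographic and treatment columns, so separating those would need the actual column names.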
Distribution
The dataset is typically distributed in CSV format. It contains 272 records (rows), each representing one breast cancer patient, and 1570 columns dedicated to gene expression, complemented by additional metadata columns. The file size of the raw dataset is not specified; an associated example notebook is approximately 146.06 kB. A sample file of the data is expected to be provided separately.
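One quick way to compare a downloaded file against the figures quoted above is to count its rows and columns. The sketch below assumes a comma-separated file with a single header row; the in-memory sample is made up and only stands in for the real file.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for an open CSV file with one header row."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # ignore blank lines
    return n_rows, len(header)


# Made-up sample; with a real file, pass open(path, newline="") instead.
sample = io.StringIO("Patient,eventdeath,esr1\ns1,0,-0.4\ns2,1,0.3\n")
shape = csv_shape(sample)
print(shape)
```

For the full dataset, the first number should be 272 if the file matches the description.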
Usage
This dataset is particularly useful for a variety of applications, including:
- Conducting exploratory data analysis.
- Developing and training machine learning models, such as logistic regression and decision trees, for predictive tasks.
- Performing survival analysis within the context of breast cancer.
- Identifying and investigating genetic prognostic indicators related to patient outcomes.
- Analysing sub-populations of patients with distinct survival patterns, for example, groups with 100% survival or varying estrogen expression levels.
- Serving as a practical resource for students to hone their data analysis skills in a real-world context.
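The sub-population analysis mentioned above can be sketched as a comparison of death rates between patients with low and high ESR1 expression. This is a minimal illustration, not the method used in the dataset's notebook: the median split is an arbitrary choice, the column name esr1 is an assumption, and the four rows are made up.

```python
from statistics import median


def death_rate_by_esr1(rows, esr1_col="esr1", event_col="eventdeath"):
    """Split patients at the median ESR1 value and return the death rate per group.

    rows is a list of dicts, e.g. as produced by csv.DictReader.
    """
    cutoff = median(float(r[esr1_col]) for r in rows)
    groups = {"low": [], "high": []}
    for r in rows:
        key = "low" if float(r[esr1_col]) < cutoff else "high"
        groups[key].append(int(float(r[event_col])))  # 1 = dead, 0 = alive
    # An empty group yields None rather than a division error.
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Made-up rows for illustration only.
rows = [
    {"esr1": "-1.2", "eventdeath": "1"},
    {"esr1": "-0.8", "eventdeath": "0"},
    {"esr1": "0.5", "eventdeath": "0"},
    {"esr1": "0.9", "eventdeath": "0"},
]
rates = death_rate_by_esr1(rows)
print(rates)
```

Patients in the low group with eventdeath equal to 0 would correspond to the "good survivors" with low ESR1 described in the About section.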
Coverage
The dataset covers NKI breast cancer patients. Geographic scope and the time range of data collection are not specified, and the dataset may not be entirely up to date. Collection years, where specified by the original uploader, would provide more context.
License
CC BY 4.0
Who Can Use It
This dataset is intended for a range of users, including:
- Students: to practise and refine their data analysis methods.
- Researchers and Data Scientists: particularly those interested in breast cancer research, genomics, survival prediction, and the application of machine learning in healthcare.
- Medical Professionals and Biostatisticians: those who wish to gain a deeper understanding of prognostic indicators and patient outcomes in breast cancer.
Dataset Name Suggestions
- NKI Breast Cancer Patient Outcomes
- Breast Cancer Gene Expression and Survival
- NKI Cancer Prognosis Data
- Genomic Indicators for Breast Cancer Survival
- Clinical Breast Cancer Patient Data
Attributes
Original Data Source: NKI Cancer Prognosis Data