Female Breast Carcinoma Dataset
Patient Health Records & Digital Health
Free
About
This dataset covers female breast cancer patients with infiltrating duct and lobular carcinoma [5]. It was derived from the November 2017 update of the SEER Program of the National Cancer Institute, which provides population-based cancer statistics [5]. It includes patients diagnosed between 2006 and 2010. Patients with missing information on tumour size, the number of regional lymph nodes examined, or the number of positive regional lymph nodes were excluded, as were patients with fewer than one month of survival, leaving 4024 patients [5].
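The exclusion criteria can be expressed as a small filter. The following is a minimal sketch, not the procedure used to build the dataset: the record layout (a mapping with None for missing values), the function name is_eligible, and the sample records are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Field names follow this dataset's column list; representing a missing
# value as None is an assumption made for this example.
REQUIRED_FIELDS = ("Tumor Size", "Regional Node Examined", "Reginol Node Positive")


def is_eligible(record: Mapping[str, Optional[float]]) -> bool:
    """Return True if a record passes the exclusion criteria (hypothetical helper)."""
    # Exclude records missing any of the three required measurements.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    # Exclude records with unknown survival or fewer than one month of survival.
    survival = record.get("Survival Months")
    return survival is not None and survival >= 1


# Synthetic records invented for this example (not taken from the dataset).
records = [
    {"Tumor Size": 20, "Regional Node Examined": 10, "Reginol Node Positive": 1, "Survival Months": 60},
    {"Tumor Size": None, "Regional Node Examined": 10, "Reginol Node Positive": 1, "Survival Months": 60},
    {"Tumor Size": 15, "Regional Node Examined": 5, "Reginol Node Positive": 0, "Survival Months": 0},
]
eligible = [r for r in records if is_eligible(r)]
print(len(eligible))
```

Only the first synthetic record passes: the second lacks a tumour size and the third has less than one month of survival. Since the published file already reflects these exclusions, a check like this is mainly useful as a sanity test after loading.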
Columns
- Age: The patient's age [6].
- Race: Categorisation of patient race, including White, Other, American Indian/AK Native, and Asian/Pacific Islander [7].
- Marital Status: The patient's marital status, categorised as Married, Single, or Other [7].
- T Stage: The Adjusted AJCC 6th T stage of the tumour, indicating its size and extent [7].
- N Stage: The Adjusted AJCC 6th N stage, indicating the extent of cancer in regional lymph nodes [8].
- 6th Stage: The Breast Adjusted AJCC 6th Stage, a staging system for breast cancer [8].
- differentiate: The differentiation grade of the tumour, such as Moderately differentiated or Poorly differentiated [8].
- Grade: The tumour grade, typically a numerical value indicating how abnormal the cancer cells look under a microscope [9].
- A Stage: Specifies whether the neoplasm has extended regionally (spread to adjacent tissues or regional lymph nodes) or distantly (spread to remote parts of the body) [9].
- Tumor Size: The exact size of the tumour in millimetres [9, 10].
- Estrogen Status: Indicates whether the tumour is Estrogen Receptor positive or negative [10].
- Progesterone Status: Indicates whether the tumour is Progesterone Receptor positive or negative [11].
- Regional Node Examined: The number of regional lymph nodes examined [11].
- Reginol Node Positive: The number of regional lymph nodes found to be positive for cancer [11, 12].
- Survival Months: The number of months the patient survived [12, 13].
- Status: The patient's vital status, either Alive or Dead [13].
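The column list above can serve as a schema check when reading the file. The sketch below uses only the Python standard library. The helper name read_records, the choice of which columns to parse as integers, and the single sample row are assumptions for illustration; the sample row is synthetic and not taken from the dataset.

```python
import csv
import io

# Column names as given in the list above (spelling kept as listed).
EXPECTED_COLUMNS = [
    "Age", "Race", "Marital Status", "T Stage", "N Stage", "6th Stage",
    "differentiate", "Grade", "A Stage", "Tumor Size", "Estrogen Status",
    "Progesterone Status", "Regional Node Examined", "Reginol Node Positive",
    "Survival Months", "Status",
]
# Assumption: these columns hold whole numbers. Grade is left as text.
NUMERIC_COLUMNS = [
    "Age", "Tumor Size", "Regional Node Examined",
    "Reginol Node Positive", "Survival Months",
]


def read_records(handle):
    """Read CSV rows into dicts, checking the header (hypothetical helper)."""
    reader = csv.DictReader(handle)
    # Strip whitespace so headers with stray spaces still match.
    header = [name.strip() for name in (reader.fieldnames or [])]
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        record = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for column in NUMERIC_COLUMNS:
            record[column] = int(record[column])
        records.append(record)
    return records


# One synthetic row, invented for this example.
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "50,White,Married,T1,N1,IIA,Moderately differentiated,2,Regional,"
    "20,Positive,Positive,10,1,60,Alive\n"
)
sample_records = read_records(io.StringIO(SAMPLE_CSV))
print(sample_records[0]["Tumor Size"], sample_records[0]["Status"])
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("Breast_Cancer.csv", newline="")`.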
Distribution
The dataset is provided as a CSV file (Breast_Cancer.csv) with a size of 396.12 kB [6]. It contains data for 4024 patients across 16 columns [5, 6]. The number of rows corresponds to the patient count.
Usage
This dataset is ideal for:
- Cancer Research: Investigating factors influencing breast cancer prognosis and patient outcomes [5].
- Predictive Modelling: Developing classification models to predict patient survival or disease progression [6].
- Data Analytics: Performing statistical analysis to uncover correlations between patient attributes and cancer characteristics [6].
- Machine Learning Applications: Training decision tree and ensemble algorithms for medical diagnostics [6].
- Epidemiological Studies: Studying population-based cancer statistics and trends [5].
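As a starting point for the analytics use case, the share of patients recorded as Alive can be compared across a categorical column. This is a minimal sketch using only the standard library. The function name alive_share_by and the toy records are assumptions for illustration, and the records are synthetic. A raw share ignores follow-up time, so a proper survival analysis would use Survival Months as well.

```python
from collections import defaultdict


def alive_share_by(records, column):
    """Return {category: fraction of records whose Status is 'Alive'}."""
    totals = defaultdict(int)
    alive = defaultdict(int)
    for record in records:
        key = record[column]
        totals[key] += 1
        if record["Status"] == "Alive":
            alive[key] += 1
    # Divide per category; every key in totals has a count of at least 1.
    return {key: alive[key] / totals[key] for key in totals}


# Synthetic records invented for this example (not taken from the dataset).
toy_records = [
    {"Estrogen Status": "Positive", "Status": "Alive"},
    {"Estrogen Status": "Positive", "Status": "Alive"},
    {"Estrogen Status": "Negative", "Status": "Dead"},
    {"Estrogen Status": "Negative", "Status": "Alive"},
]
shares = alive_share_by(toy_records, "Estrogen Status")
print(shares)
```

The same function works for any categorical column in the list, such as Race, Marital Status, or A Stage.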
Coverage
The dataset covers female patients diagnosed with specific types of breast cancer (infiltrating duct and lobular carcinoma) in 2006-2010 [5]. It is population-based, originating from the SEER Program of the NCI [5]. There are no specific notes on data availability for certain groups or years beyond the initial selection criteria.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Medical Researchers: To analyse patient demographics and tumour characteristics related to survival.
- Data Scientists: To build and test machine learning models for cancer prognostication.
- Public Health Analysts: To understand patterns and statistics of breast cancer within a population.
- Students and Academics: For educational purposes in bioinformatics, statistics, and public health courses.
Dataset Name Suggestions
- Breast Cancer Patient Outcomes
- SEER Breast Cancer Survival Data
- Female Breast Carcinoma Dataset
- Oncology Patient Data (Breast)
Attributes
Original Data Source: Female Breast Carcinoma Dataset