UCI Breast Cancer Symptom Data
Patient Health Records & Digital Health
Free
About
This dataset captures characteristics and symptoms related to breast cancer, focusing on patient attributes frequently analysed in the machine learning literature. It is designed for prognostic studies and classification tasks that examine how clinical variables relate to the likelihood of recurrence events. It contains 286 instances in two classes: 201 'no-recurrence-events' and 85 'recurrence-events'.
Columns
- Class: Indicates the outcome, either 'no-recurrence-events' or 'recurrence-events'.
- Age: Categorical ranges spanning from 10-19 up to 90-99.
- Menopause: Status defined as 'lt40', 'ge40', or 'premeno'.
- Tumor-size: Categorical size ranges, such as 0-4, 5-9, up to 55-59.
- Inv-nodes: The number of involved lymph nodes, presented in ranges (e.g., 0-2, 3-5, up to 36-39).
- Node-caps: Specifies whether node capsules are involved ('yes' or 'no').
- Deg-malig: Degree of malignancy, a nominal variable with values 1, 2, or 3.
- Breast: Indicates which breast is affected ('left' or 'right').
- Breast-quad: Specifies the quadrant of the breast affected (e.g., 'left-up', 'right-low', 'central').
- Irradiat: Denotes whether the patient received post-operative irradiation ('yes' or 'no').
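The column descriptions above can be turned into a simple validity check. The following is a minimal sketch, not the publisher's loader: the column keys mirror the display names in the list and may differ from the headers in the actual file, the intermediate range labels (such as '20-29' or '5-9') are assumed to follow the regular step implied by the listed endpoints, and the 'left-low' and 'right-up' quadrant values are assumed by analogy with the examples given.

```python
def ranges(start, stop, width):
    """Build labels like '10-19' for start, start + width, ... below stop."""
    return {f"{lo}-{lo + width - 1}" for lo in range(start, stop, width)}


# Allowed values per column, based on the column descriptions.
# Keys are assumed header names; adjust them to match the real file.
SCHEMA = {
    "Class": {"no-recurrence-events", "recurrence-events"},
    "Age": ranges(10, 100, 10),                  # 10-19 ... 90-99
    "Menopause": {"lt40", "ge40", "premeno"},
    "Tumor-size": ranges(0, 60, 5),              # 0-4 ... 55-59
    "Inv-nodes": ranges(0, 36, 3) | {"36-39"},   # 0-2 ... 33-35, then 36-39
    "Node-caps": {"yes", "no"},
    "Deg-malig": {"1", "2", "3"},                # values arrive as strings from CSV
    "Breast": {"left", "right"},
    # 'left-low' and 'right-up' are assumptions, not listed explicitly.
    "Breast-quad": {"left-up", "left-low", "right-up", "right-low", "central"},
    "Irradiat": {"yes", "no"},
}


def invalid_fields(row):
    """Return the column names whose value is missing or not in SCHEMA."""
    return [col for col, allowed in SCHEMA.items() if row.get(col) not in allowed]


# A made-up row for illustration only; it is not taken from the dataset.
example = {
    "Class": "no-recurrence-events", "Age": "50-59", "Menopause": "ge40",
    "Tumor-size": "20-24", "Inv-nodes": "0-2", "Node-caps": "no",
    "Deg-malig": "2", "Breast": "left", "Breast-quad": "left-low",
    "Irradiat": "no",
}
print(invalid_fields(example))
```

Rows read with `csv.DictReader` can be passed to `invalid_fields` directly, since that reader yields each record as a dictionary of strings.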
Distribution
The dataset is a single data file, typically available in CSV format, with 286 records. The listing reports 11 columns in total, although only the ten attributes above are described, and 0% mismatched or missing values across the included attributes. For the 'Class' variable, the most common outcome is 'no-recurrence-events', accounting for 70% of instances. The most frequent age range is 50-59 (34%), and 78% of instances report 'no' involvement of node capsules.
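The percentages quoted here are simple value shares. As a rough sketch, the helper below (`value_shares` is a hypothetical name) computes them for any column. It is demonstrated on the class counts from the About section, assuming the 201-instance class is 'no-recurrence-events', which is consistent with the 70% figure.

```python
from collections import Counter


def value_shares(values):
    """Map each distinct value to its share of the total, as a percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}


# Class counts as stated in the listing: 201 and 85 out of 286.
labels = ["no-recurrence-events"] * 201 + ["recurrence-events"] * 85
shares = value_shares(labels)
print({label: round(share, 1) for label, share in shares.items()})
# {'no-recurrence-events': 70.3, 'recurrence-events': 29.7}
```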
Usage
This data is suited to developing predictive models and classification algorithms in machine learning. Specific use cases include:
- Predicting the likelihood of breast cancer recurrence.
- Evaluating the prognostic significance of various symptoms and clinical factors.
- Statistical analysis in oncology and public health to understand disease patterns.
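Because roughly 70% of instances fall in one class, any recurrence classifier is best compared against a majority-class baseline. The sketch below is illustrative only: the function names are hypothetical, the labels are rebuilt from the 201/85 class counts rather than read from the file, and the baseline is scored on the same labels it was fitted on.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _row: most_common_label


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label equals the true label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


# Labels rebuilt from the stated class counts; features are ignored here.
labels = ["no-recurrence-events"] * 201 + ["recurrence-events"] * 85
rows = [{} for _ in labels]

predict = majority_baseline(labels)
print(round(accuracy(predict, rows, labels), 3))  # 0.703
```

A model that reports about 70% accuracy on this data has therefore learned little. Since the baseline never predicts 'recurrence-events', recall on that class is a more informative measure than accuracy alone.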
Coverage
The demographic scope covers age groups from 10-19 to 90-99 and three menopause status categories: 'premeno' (pre-menopausal), and 'lt40' and 'ge40', which are usually read as menopause before age 40 and at or after age 40. Clinical variables detail tumour size, malignancy grade, and treatment characteristics (e.g., irradiation). Geographical and temporal scopes are not specified in the available metadata.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Data Scientists and Machine Learning Engineers: For building and testing classification models aimed at health outcomes prediction.
- Health Researchers and Oncologists: To gain insights into symptom relationships, severity metrics, and factors influencing recurrence rates.
- Public Health Professionals: For epidemiological studies related to cancer incidence and attributes.
Dataset Name Suggestions
UCI Breast Cancer Symptom Data
Oncology Recurrence Events Data
Clinical Breast Cancer Prognosis
ML Breast Cancer Attribute Analysis
Attributes
Original Data Source: UCI Breast Cancer Symptom Data