Opendatabay APP

UCI Breast Cancer Symptom Data

Patient Health Records & Digital Health

Tags and Keywords

Health

Cancer

Prognosis

Symptoms

Oncology

Free

About

This dataset captures patient characteristics and symptoms related to breast cancer, focusing on attributes frequently analysed in the machine learning literature. It is designed for prognostic studies and classification tasks that examine how clinical variables relate to the likelihood of recurrence events. It contains 286 instances in two classes: 201 labelled 'no-recurrence-events' and 85 labelled 'recurrence-events'.

Columns

  • Class: Indicates the outcome, either 'no-recurrence-events' or 'recurrence-events'.
  • Age: Categorical ranges spanning from 10-19 up to 90-99.
  • Menopause: Status defined as 'lt40', 'ge40', or 'premeno'.
  • Tumor-size: Categorical size ranges, such as 0-4, 5-9, up to 55-59.
  • Inv-nodes: The number of involved lymph nodes, presented in ranges (e.g., 0-2, 3-5, up to 36-39).
  • Node-caps: Specifies whether node capsules are involved ('yes' or 'no').
  • Deg-malig: Degree of malignancy, a nominal variable with values 1, 2, or 3.
  • Breast: Indicates which breast is affected ('left' or 'right').
  • Breast-quad: Specifies the quadrant of the breast affected (e.g., 'left-up', 'right-low', 'central').
  • Irradiat: Denotes whether the patient received post-operative irradiation ('yes' or 'no').
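
Because Age, Tumor-size, and Inv-nodes are stored as range labels rather than numbers, many models need them converted first. The following is a minimal sketch of one possible encoding (range midpoints), not a method prescribed by the dataset. The header spelling in COLUMNS is an assumption based on the column list above, and the two sample rows are made up for illustration; they are not records from the file.

```python
import csv
import io

# Column names as listed above; the exact header spelling in the CSV is an assumption.
COLUMNS = ["Class", "Age", "Menopause", "Tumor-size", "Inv-nodes",
           "Node-caps", "Deg-malig", "Breast", "Breast-quad", "Irradiat"]

# Columns whose values are range labels such as '30-39'.
RANGE_COLUMNS = ("Age", "Tumor-size", "Inv-nodes")


def range_midpoint(label):
    """Convert a range label such as '30-39' to its midpoint (34.5)."""
    low, high = label.split("-")
    return (int(low) + int(high)) / 2


def encode_row(row):
    """Return a copy of row with range columns as midpoints and Deg-malig as an int."""
    encoded = dict(row)
    for name in RANGE_COLUMNS:
        encoded[name] = range_midpoint(row[name])
    encoded["Deg-malig"] = int(row["Deg-malig"])
    return encoded


# Made-up illustrative rows, NOT actual records from the dataset.
SAMPLE_CSV = ",".join(COLUMNS) + """
no-recurrence-events,30-39,premeno,30-34,0-2,no,3,left,left-low,no
recurrence-events,50-59,ge40,15-19,3-5,yes,2,right,central,yes
"""

rows = [encode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for r in rows:
    print(r["Class"], r["Age"], r["Tumor-size"], r["Inv-nodes"], r["Deg-malig"])
```

To run this against the downloaded file, replace io.StringIO(SAMPLE_CSV) with an open file handle, after checking that the header names match.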

Distribution

The dataset is structured as a single data file, typically available in CSV format, containing 11 columns in total. There are 286 validated records in the file. The data quality is high, with 0% mismatched or missing values across the included attributes. For the 'Class' variable, the most common outcome is 'no-recurrence-events', accounting for 70% of the instances. The most frequent age range is 50-59 (34%), and 78% of instances report 'no' involvement of node capsules.
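
The reported shares can be reproduced with a few lines of code. The sketch below rebuilds the class column from the counts given in the About section, assuming the 201-instance class is 'no-recurrence-events' (which is consistent with the 70% figure); value_shares is a hypothetical helper name, and the same function could be applied to any column read from the CSV.

```python
from collections import Counter


def value_shares(values):
    """Map each distinct value to its share of the total, most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Class labels rebuilt from the stated counts (assumption: 201 = no recurrence).
labels = ["no-recurrence-events"] * 201 + ["recurrence-events"] * 85

shares = value_shares(labels)
for value, share in shares.items():
    print(f"{value}: {share:.1%}")
# no-recurrence-events: 70.3%
# recurrence-events: 29.7%
```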

Usage

This data is ideally suited for developing predictive models and classification algorithms within machine learning. Specific use cases include:
  • Predicting the likelihood of breast cancer recurrence.
  • Evaluating the prognostic significance of various symptoms and clinical factors.
  • Statistical analysis in oncology and public health to understand disease patterns.
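
As a starting point for the recurrence prediction use case, a simple baseline helps put later models in context: with roughly 70% of instances in one class, always predicting 'no-recurrence-events' already scores about 70% accuracy. The sketch below implements a OneR (one-rule) baseline, which picks the single column whose values best predict the class on the training rows. It is one illustrative choice, not a method recommended by the listing; fit_one_rule and predict are hypothetical names, and TOY_ROWS is made-up data, not taken from the file.

```python
from collections import Counter, defaultdict


def fit_one_rule(rows, target):
    """Fit a OneR model: choose the column whose per-value majority class
    gives the most correct predictions on the training rows."""
    default = Counter(r[target] for r in rows).most_common(1)[0][0]
    best = None
    for column in rows[0]:
        if column == target:
            continue
        # Count class labels separately for each value of this column.
        by_value = defaultdict(Counter)
        for r in rows:
            by_value[r[column]][r[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        if best is None or correct > best[2]:
            best = (column, rule, correct)
    column, rule, _ = best
    return column, rule, default


def predict(model, row):
    """Predict with the chosen rule; unseen values fall back to the majority class."""
    column, rule, default = model
    return rule.get(row[column], default)


# Made-up toy rows for illustration only, NOT records from the dataset.
TOY_ROWS = [
    {"Class": "recurrence-events", "Node-caps": "yes", "Breast": "left"},
    {"Class": "recurrence-events", "Node-caps": "yes", "Breast": "right"},
    {"Class": "no-recurrence-events", "Node-caps": "no", "Breast": "left"},
    {"Class": "no-recurrence-events", "Node-caps": "no", "Breast": "right"},
    {"Class": "no-recurrence-events", "Node-caps": "no", "Breast": "left"},
]

model = fit_one_rule(TOY_ROWS, target="Class")
print("Chosen column:", model[0])
print(predict(model, {"Node-caps": "yes", "Breast": "left"}))
```

On real data, the rule should be fitted on a training split and scored on held-out rows, and its accuracy compared against the majority-class rate.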

Coverage

The demographic scope covers age groups from 10-19 to 90-99 and three menopause categories: pre-menopausal ('premeno'), menopause before age 40 ('lt40'), and menopause at age 40 or later ('ge40'). Clinical variables detail tumour size, malignancy grade, and treatment characteristics (e.g., irradiation). Geographical and temporal scopes are not specified within the available metadata.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For building and testing classification models aimed at health outcomes prediction.
  • Health Researchers and Oncologists: To gain insights into symptom relationships, severity metrics, and factors influencing recurrence rates.
  • Public Health Professionals: For epidemiological studies related to cancer incidence and attributes.

Dataset Name Suggestions

  • UCI Breast Cancer Symptom Data
  • Oncology Recurrence Events Data
  • Clinical Breast Cancer Prognosis
  • ML Breast Cancer Attribute Analysis

Attributes

Original Data Source: UCI Breast Cancer Symptom Data

Listing Stats

  • Views: 3
  • Downloads: 1
  • Listed: 14/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
