Breast Tumour Classification Data

Patient Health Records & Digital Health

Tags and Keywords

Breast, Cancer, Diagnosis, Tumour, Prediction

Free

About

This dataset supports breast cancer prediction and experimentation with machine learning techniques for binary classification. It provides characteristics of cell nuclei, computed from digitised images of fine needle aspirates (FNA) of breast masses. The primary goal is to classify each tumour as benign (B) or malignant (M). Breast cancer is a significant global health concern that affects millions of people, predominantly women over 50, but also men and younger individuals. The dataset is well suited to practice and reference work in machine learning for medical diagnostics.

Columns

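The dataset contains 32 attributes: an ID, a diagnosis, and 30 real-valued input features describing the cell nuclei. The features cover ten measurements, each reported three times: as a mean (_mean), a standard error (_se), and a worst value (_worst, which the original UCI documentation defines as the mean of the three largest values).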
  • id: Unique identification number for each patient. (Valid: 569, Missing: 0%)
  • diagnosis: The diagnosis of the tumour, either M (malignant) or B (benign). (Valid: 569, Missing: 0%, Most Common: B (63%), Unique: 2)
  • radius_mean: Mean of distances from the centre to points on the perimeter. (Valid: 569, Missing: 0%, Mean: 14.1, Std. Deviation: 3.52)
  • texture_mean: Mean texture, where texture is the standard deviation of grey-scale values. (Valid: 569, Missing: 0%, Mean: 19.3, Std. Deviation: 4.3)
  • perimeter_mean: Mean perimeter of the cell nuclei. (Valid: 569, Missing: 0%, Mean: 92, Std. Deviation: 24.3)
  • area_mean: Mean area of the cell nuclei. (Valid: 569, Missing: 0%, Mean: 655, Std. Deviation: 352)
  • smoothness_mean: Mean of local variation in radius lengths. (Valid: 569, Missing: 0%, Mean: 0.1, Std. Deviation: 0.01)
  • compactness_mean: Mean of perimeter^2 / area - 1.0. (Valid: 569, Missing: 0%, Mean: 0.1, Std. Deviation: 0.05)
  • concavity_mean: Mean severity of concave portions of the contour. (Valid: 569, Missing: 0%, Mean: 0.09, Std. Deviation: 0.08)
  • concave points_mean: Mean number of concave portions of the contour. (Valid: 569, Missing: 0%, Mean: 0.05, Std. Deviation: 0.04)
  • symmetry_mean: Mean symmetry of the cell nuclei. (Valid: 569, Missing: 0%, Mean: 0.18, Std. Deviation: 0.03)
  • fractal_dimension_mean: Mean fractal dimension ("coastline approximation" - 1). (Valid: 569, Missing: 0%, Mean: 0.06, Std. Deviation: 0.01)
  • radius_se: Standard error of the radius. (Valid: 569, Missing: 0%, Mean: 0.41, Std. Deviation: 0.28)
  • texture_se: Standard error of the texture. (Valid: 569, Missing: 0%, Mean: 1.22, Std. Deviation: 0.55)
  • perimeter_se: Standard error of the perimeter. (Valid: 569, Missing: 0%, Mean: 2.87, Std. Deviation: 2.02)
  • area_se: Standard error of the area. (Valid: 569, Missing: 0%, Mean: 40.3, Std. Deviation: 45.5)
  • smoothness_se: Standard error of the smoothness. (Valid: 569, Missing: 0%, Mean: 0.01, Std. Deviation: 0)
  • compactness_se: Standard error of the compactness. (Valid: 569, Missing: 0%, Mean: 0.03, Std. Deviation: 0.02)
  • concavity_se: Standard error of the concavity. (Valid: 569, Missing: 0%, Mean: 0.03, Std. Deviation: 0.03)
  • concave points_se: Standard error of the number of concave points. (Valid: 569, Missing: 0%, Mean: 0.01, Std. Deviation: 0.01)
  • symmetry_se: Standard error of the symmetry. (Valid: 569, Missing: 0%, Mean: 0.02, Std. Deviation: 0.01)
  • fractal_dimension_se: Standard error of the fractal dimension. (Valid: 569, Missing: 0%, Mean: 0, Std. Deviation: 0)
  • radius_worst: Worst (largest) radius. (Valid: 569, Missing: 0%, Mean: 16.3, Std. Deviation: 4.83)
  • texture_worst: Worst (largest) texture. (Valid: 569, Missing: 0%, Mean: 25.7, Std. Deviation: 6.14)
  • perimeter_worst: Worst (largest) perimeter. (Valid: 569, Missing: 0%, Mean: 107, Std. Deviation: 33.6)
  • area_worst: Worst (largest) area. (Valid: 569, Missing: 0%, Mean: 881, Std. Deviation: 569)
  • smoothness_worst: Worst (largest) smoothness. (Valid: 569, Missing: 0%, Mean: 0.13, Std. Deviation: 0.02)
  • compactness_worst: Worst (largest) compactness. (Valid: 569, Missing: 0%, Mean: 0.25, Std. Deviation: 0.16)
  • concavity_worst: Worst (largest) concavity. (Valid: 569, Missing: 0%, Mean: 0.27, Std. Deviation: 0.21)
  • concave points_worst: Worst (largest) number of concave points. (Valid: 569, Missing: 0%, Mean: 0.11, Std. Deviation: 0.07)
  • symmetry_worst: Worst (largest) symmetry. (Valid: 569, Missing: 0%, Mean: 0.29, Std. Deviation: 0.06)
  • fractal_dimension_worst: Worst (largest) fractal dimension. (Valid: 569, Missing: 0%, Mean: 0.08, Std. Deviation: 0.02)
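
The column names above follow a regular pattern: ten base measurements, each with a _mean, _se, and _worst variant. The Python sketch below rebuilds the expected header from that pattern, which can be handy for validating a downloaded file. It assumes the CSV header matches the names in this listing exactly, including the space in "concave points"; the function name expected_columns is hypothetical.

```python
# Ten base measurements, spelled as they appear in the column list above.
BASE_MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Each measurement appears once per suffix, grouped by suffix in the listing.
SUFFIXES = ["mean", "se", "worst"]


def expected_columns():
    """Return the 32 column names in listing order (assumed, not verified against the file)."""
    features = [
        f"{base}_{suffix}"
        for suffix in SUFFIXES
        for base in BASE_MEASUREMENTS
    ]
    return ["id", "diagnosis"] + features


columns = expected_columns()
print(len(columns))  # 2 non-feature columns + 10 measurements x 3 suffixes
```

Comparing this list against the header of the downloaded file is a quick way to catch renamed or extra columns before modelling.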

Distribution

The dataset is provided as a CSV file, breast-cancer-wisconsin-data.csv, with a size of 125.14 kB. It contains 569 instances (rows/records) and 32 attributes (columns). All data fields are valid with no missing or mismatched values.
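
The row, column, and missing-value claims above can be checked after download. The following is a minimal sketch using only the Python standard library; the function name summarise_csv is hypothetical, and the three-row sample is made up so the example runs without the real file.

```python
import csv
import io
from collections import Counter


def summarise_csv(handle):
    """Count rows, columns, empty cells, and diagnosis labels in an open CSV file."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    # A cell counts as missing if it is absent (None) or blank after stripping.
    missing = sum(
        1
        for row in rows
        for value in row.values()
        if value is None or (isinstance(value, str) and not value.strip())
    )
    return {
        "rows": len(rows),
        "columns": len(reader.fieldnames or []),
        "missing_cells": missing,
        "diagnosis_counts": Counter(row["diagnosis"] for row in rows),
    }


# Made-up sample with the same leading columns as the listing (not real data).
SAMPLE = "id,diagnosis,radius_mean\n1,M,17.5\n2,B,12.1\n3,B,11.8\n"
summary = summarise_csv(io.StringIO(SAMPLE))
print(summary)
```

For the real file, open it with `open("breast-cancer-wisconsin-data.csv", newline="")` and pass the handle to the same function; if the listing is accurate, the summary should report 569 rows, 32 columns, and no missing cells.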

Usage

This dataset is ideal for:
  • Developing and evaluating machine learning models to predict breast cancer diagnosis (benign or malignant).
  • Educational purposes, serving as a practice and reference resource for students and peers in machine learning.
  • Experimenting with various classification algorithms and data analysis techniques.
  • Research in medical diagnostics, image analysis, and early cancer detection.
  • Applications focusing on breast tumour diagnosis and prognosis.
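
When evaluating models on this data, note that the classes are imbalanced: the column summary lists benign (B) as 63% of records, so a model that always predicts benign would reach roughly 63% accuracy while missing every malignant case. The Python sketch below illustrates that point with a majority-class baseline and reports sensitivity and specificity alongside accuracy. The labels are made up for illustration, and the helper names encode and evaluate are hypothetical.

```python
def encode(label):
    """Map M -> 1 (malignant, treated as the positive class) and B -> 0 (benign)."""
    return {"M": 1, "B": 0}[label]


def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of malignant cases that were caught.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of benign cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Made-up labels for illustration only (not taken from the dataset).
labels = ["M", "B", "B", "M", "B", "B", "B", "M"]
y_true = [encode(label) for label in labels]

# Majority-class baseline: always predict benign.
baseline = [0] * len(y_true)
scores = evaluate(y_true, baseline)
print(scores)
```

Any trained classifier should beat this baseline on sensitivity, not only on accuracy, before it is considered useful for diagnosis-related work.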

Coverage

The dataset originates from the UCI Machine Learning Repository and was created by researchers at the University of Wisconsin. The data was collected and uploaded around November 1995, with past usage documented from 1992 to 1995. The data is expected to be updated annually. Breast cancer primarily affects women and people assigned female at birth (AFAB) aged 50 and older, but it can also occur in men and people assigned male at birth (AMAB), as well as younger women.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is suitable for:
  • Machine Learning Practitioners: For building and testing predictive models for medical diagnosis.
  • Students and Educators: As a practical resource for learning and teaching machine learning concepts, especially binary classification.
  • Medical Researchers: To explore patterns in breast cancer diagnostic features and potentially contribute to improved detection methods.
  • Data Scientists and Analysts: For conducting statistical analysis and feature engineering on health-related data.
  • Healthcare Developers: To integrate predictive capabilities into diagnostic tools or systems.

Dataset Name Suggestions

  • Wisconsin Breast Cancer Diagnosis Dataset
  • Breast Tumour Classification Data
  • FNA Breast Cancer Machine Learning Dataset
  • UCI Breast Cancer Predictive Dataset
  • Medical Breast Tumour Features

Attributes

Original Data Source: Breast Tumour Classification Data

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 30/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
