Breast Tumour Classification Data
Patient Health Records & Digital Health
Free
About
This dataset is a valuable resource for predicting breast cancer and exploring various machine learning techniques for binary classification tasks. It provides detailed characteristics of cell nuclei, computed from digitised images of fine needle aspirates (FNA) of breast masses. The primary goal is to classify whether a tumour is benign (B) or malignant (M). Breast cancer, a significant global health concern, affects millions, predominantly women over 50, but also men and younger individuals. This dataset serves as an excellent foundation for practice and reference in machine learning studies and applications related to medical diagnostics.
Columns
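The dataset contains 32 attributes: an ID, a diagnosis, and 30 real-valued input features. The features cover ten characteristics of the cell nuclei, each reported three ways: the mean (_mean), the standard error (_se), and the "worst" value (_worst, the mean of the three largest values).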
- id: Unique identification number for each patient. (Valid: 569, Missing: 0%)
- diagnosis: The diagnosis of the tumour, either M (malignant) or B (benign). (Valid: 569, Missing: 0%, Most Common: B (63%), Unique: 2)
- radius_mean: Mean of distances from the centre to points on the perimeter. (Valid: 569, Missing: 0%, Mean: 14.1, Std. Deviation: 3.52)
- texture_mean: Standard deviation of grey-scale values. (Valid: 569, Missing: 0%, Mean: 19.3, Std. Deviation: 4.3)
- perimeter_mean: Mean perimeter of the cell nuclei. (Valid: 569, Missing: 0%, Mean: 92, Std. Deviation: 24.3)
- area_mean: Mean area enclosed by the nucleus boundary. (Valid: 569, Missing: 0%, Mean: 655, Std. Deviation: 352)
- smoothness_mean: Mean of local variation in radius lengths. (Valid: 569, Missing: 0%, Mean: 0.1, Std. Deviation: 0.01)
- compactness_mean: Mean of perimeter^2 / area - 1.0. (Valid: 569, Missing: 0%, Mean: 0.1, Std. Deviation: 0.05)
- concavity_mean: Mean severity of concave portions of the contour. (Valid: 569, Missing: 0%, Mean: 0.09, Std. Deviation: 0.08)
- concave points_mean: Mean number of concave portions of the contour. (Valid: 569, Missing: 0%, Mean: 0.05, Std. Deviation: 0.04)
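- symmetry_mean: Mean symmetry of the cell nuclei. (Valid: 569, Missing: 0%, Mean: 0.18, Std. Deviation: 0.03)
- fractal_dimension_mean: Mean fractal dimension of the nucleus boundary ("coastline approximation" - 1). (Valid: 569, Missing: 0%, Mean: 0.06, Std. Deviation: 0.01)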
- radius_se: (Valid: 569, Missing: 0%, Mean: 0.41, Std. Deviation: 0.28)
- texture_se: (Valid: 569, Missing: 0%, Mean: 1.22, Std. Deviation: 0.55)
- perimeter_se: (Valid: 569, Missing: 0%, Mean: 2.87, Std. Deviation: 2.02)
- area_se: (Valid: 569, Missing: 0%, Mean: 40.3, Std. Deviation: 45.5)
- smoothness_se: (Valid: 569, Missing: 0%, Mean: 0.01, Std. Deviation: 0)
- compactness_se: (Valid: 569, Missing: 0%, Mean: 0.03, Std. Deviation: 0.02)
- concavity_se: (Valid: 569, Missing: 0%, Mean: 0.03, Std. Deviation: 0.03)
- concave points_se: (Valid: 569, Missing: 0%, Mean: 0.01, Std. Deviation: 0.01)
- symmetry_se: (Valid: 569, Missing: 0%, Mean: 0.02, Std. Deviation: 0.01)
- fractal_dimension_se: (Valid: 569, Missing: 0%, Mean: 0, Std. Deviation: 0)
- radius_worst: (Valid: 569, Missing: 0%, Mean: 16.3, Std. Deviation: 4.83)
- texture_worst: (Valid: 569, Missing: 0%, Mean: 25.7, Std. Deviation: 6.14)
- perimeter_worst: (Valid: 569, Missing: 0%, Mean: 107, Std. Deviation: 33.6)
- area_worst: (Valid: 569, Missing: 0%, Mean: 881, Std. Deviation: 569)
- smoothness_worst: (Valid: 569, Missing: 0%, Mean: 0.13, Std. Deviation: 0.02)
- compactness_worst: (Valid: 569, Missing: 0%, Mean: 0.25, Std. Deviation: 0.16)
- concavity_worst: (Valid: 569, Missing: 0%, Mean: 0.27, Std. Deviation: 0.21)
- concave points_worst: (Valid: 569, Missing: 0%, Mean: 0.11, Std. Deviation: 0.07)
- symmetry_worst: (Valid: 569, Missing: 0%, Mean: 0.29, Std. Deviation: 0.06)
- fractal_dimension_worst: (Valid: 569, Missing: 0%, Mean: 0.08, Std. Deviation: 0.02)
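After downloading the file, a quick check along the following lines can confirm the column layout and the "Missing: 0%" figures listed above. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and the sample rows are made-up values for illustration, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Suffixes used by the feature columns in the list above.
SUFFIXES = ("mean", "se", "worst")

# Made-up rows for illustration only; one cell is left empty on purpose.
SAMPLE_CSV = """id,diagnosis,radius_mean,radius_se,radius_worst
1,M,17.5,1.05,25.1
2,B,12.4,0.31,13.9
3,B,11.8,,12.6
"""


def summarize(csv_text):
    """Count rows, empty cells, and diagnosis labels in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    n_rows = 0
    n_missing = 0
    labels = Counter()
    for row in reader:
        n_rows += 1
        for name in columns:
            value = row.get(name)
            # DictReader yields None for cells absent from a short row.
            if value is None or value.strip() == "":
                n_missing += 1
        labels[row.get("diagnosis")] += 1
    return {
        "columns": columns,
        "rows": n_rows,
        "missing": n_missing,
        "labels": dict(labels),
    }


def group_by_suffix(columns):
    """Group feature columns by their _mean / _se / _worst suffix."""
    groups = {suffix: [] for suffix in SUFFIXES}
    for name in columns:
        _base, sep, suffix = name.rpartition("_")
        if sep and suffix in groups:
            groups[suffix].append(name)
    return groups


summary = summarize(SAMPLE_CSV)
groups = group_by_suffix(summary["columns"])
print(summary["rows"], summary["missing"], summary["labels"])
print(groups)
```

On the full file, the same check should report 569 rows, 32 columns, no empty cells, and ten columns in each suffix group, if the file matches the description given here.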
Distribution
The dataset is provided as a CSV file, breast-cancer-wisconsin-data.csv, with a size of 125.14 kB. It contains 569 instances (rows/records) and 32 attributes (columns). All data fields are valid with no missing or mismatched values.
Usage
This dataset is ideal for:
- Developing and evaluating machine learning models to predict breast cancer diagnosis (benign or malignant).
- Educational purposes, serving as a practice and reference resource for students and peers in machine learning.
- Experimenting with various classification algorithms and data analysis techniques.
- Research in medical diagnostics, image analysis, and early cancer detection.
- Applications focusing on breast tumour diagnosis and prognosis.
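When evaluating a model on this data, it helps to keep the class balance in mind. About 63% of records are benign, so a model that always predicts B scores roughly 63% accuracy while missing every malignant case. The sketch below, in plain Python, compares such a majority-class baseline with a hypothetical model using sensitivity and specificity. The labels and predictions are invented for illustration, and treating M as the positive class is an assumption.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="M"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(y_true, y_pred, positive="M"):
    """Accuracy, sensitivity (recall on M), and specificity (recall on B)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def majority_baseline(y_true):
    """Predict the most common label for every record."""
    most_common = Counter(y_true).most_common(1)[0][0]
    return [most_common] * len(y_true)


# Invented labels: 6 benign, 4 malignant.
y_true = ["B"] * 6 + ["M"] * 4
# Invented predictions from a hypothetical model.
y_model = ["B", "B", "B", "B", "B", "M", "M", "M", "M", "B"]

baseline_rates = rates(y_true, majority_baseline(y_true))
model_rates = rates(y_true, y_model)
print(baseline_rates)
print(model_rates)
```

Here the baseline reaches 0.6 accuracy with 0.0 sensitivity, while the hypothetical model reaches 0.8 accuracy with 0.75 sensitivity. For a diagnostic task, the sensitivity gap is usually the more informative number.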
Coverage
The dataset originates from the UCI Machine Learning Repository and was created by researchers at the University of Wisconsin. The data was collected and uploaded around November 1995, with past usage documented from 1992 to 1995. The data is expected to be updated annually. Breast cancer primarily affects women and people assigned female at birth (AFAB) aged 50 and older, but it can also occur in men and people assigned male at birth (AMAB), as well as younger women.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This dataset is suitable for:
- Machine Learning Practitioners: For building and testing predictive models for medical diagnosis.
- Students and Educators: As a practical resource for learning and teaching machine learning concepts, especially binary classification.
- Medical Researchers: To explore patterns in breast cancer diagnostic features and potentially contribute to improved detection methods.
- Data Scientists and Analysts: For conducting statistical analysis and feature engineering on health-related data.
- Healthcare Developers: To integrate predictive capabilities into diagnostic tools or systems.
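As one example of the feature engineering mentioned above, the features sit on very different scales (area_mean averages 655, smoothness_mean about 0.1), so many algorithms benefit from standardisation. The sketch below applies a z-score, z = (x - mean) / std, using the rounded summary statistics from the Columns section. The function name and input values are hypothetical. In practice the mean and standard deviation would be computed from the training data rather than taken from rounded figures.

```python
# Rounded (mean, standard deviation) pairs copied from the Columns section.
REPORTED_STATS = {
    "radius_mean": (14.1, 3.52),
    "area_mean": (655.0, 352.0),
    "texture_mean": (19.3, 4.3),
}


def standardize(name, value, stats=REPORTED_STATS):
    """Return the z-score of `value` for the feature called `name`."""
    mean, std = stats[name]
    if std == 0:
        # Some reported deviations round to 0 and cannot be used as-is.
        raise ValueError(f"standard deviation for {name!r} is zero")
    return (value - mean) / std


# Hypothetical measurements for one record.
record = {"radius_mean": 14.1, "area_mean": 1007.0, "texture_mean": 15.0}
z_scores = {name: standardize(name, value) for name, value in record.items()}
print(z_scores)
```

A record with area_mean of 1007 lies exactly one reported standard deviation above the mean, so its z-score is 1.0, which puts it on a scale comparable with the other standardised features.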
Dataset Name Suggestions
- Wisconsin Breast Cancer Diagnosis Dataset
- Breast Tumour Classification Data
- FNA Breast Cancer Machine Learning Dataset
- UCI Breast Cancer Predictive Dataset
- Medical Breast Tumour Features
Attributes
Original Data Source: Breast Tumour Classification Data