Breast Cancer Diagnosis Features
Patient Health Records & Digital Health
Free
About
This dataset is a classic resource for training and benchmarking machine learning classifiers on breast mass biopsies. Its features are computed from digital images of fine needle aspirate (FNA) biopsy slides and describe properties of the cell nuclei in each image, such as size, shape, and regularity. The task is to distinguish malignant (cancerous) tumours from benign (non-cancerous) ones.
Columns
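The dataset includes 30 predictive features and one target variable. The features are ten base measurements, each reported as a mean, a standard error (se), and a worst value: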
- x.radius_mean: Mean radius of the tumour cells.
- x.texture_mean: Mean texture of the tumour cells.
- x.perimeter_mean: Mean perimeter of the tumour cells.
- x.area_mean: Mean area of the tumour cells.
- x.smoothness_mean: Mean smoothness of the tumour cells.
- x.compactness_mean: Mean compactness of the tumour cells.
- x.concavity_mean: Mean concavity of the tumour cells.
- x.concave_points_mean: Mean number of concave portions of the contour of the tumour cells.
- x.symmetry_mean: Mean symmetry of the tumour cells.
- x.fractal_dimension_mean: Mean "coastline approximation" of the tumour cells.
- x.radius_se: Standard error of the radius of the tumour cells.
- x.texture_se: Standard error of the texture of the tumour cells.
- x.perimeter_se: Standard error of the perimeter of the tumour cells.
- x.area_se: Standard error of the area of the tumour cells.
- x.smoothness_se: Standard error of the smoothness of the tumour cells.
- x.compactness_se: Standard error of the compactness of the tumour cells.
- x.concavity_se: Standard error of the concavity of the tumour cells.
- x.concave_points_se: Standard error of the number of concave portions of the contour of the tumour cells.
- x.symmetry_se: Standard error of the symmetry of the tumour cells.
- x.fractal_dimension_se: Standard error of the "coastline approximation" of the tumour cells.
- x.radius_worst: Worst (largest) radius of the tumour cells.
- x.texture_worst: Worst (most severe) texture of the tumour cells.
- x.perimeter_worst: Worst (largest) perimeter of the tumour cells.
- x.area_worst: Worst (largest) area of the tumour cells.
- x.smoothness_worst: Worst (most severe) smoothness of the tumour cells.
- x.compactness_worst: Worst (most severe) compactness of the tumour cells.
- x.concavity_worst: Worst (most severe) concavity of the tumour cells.
- x.concave_points_worst: Worst (most severe) number of concave portions of the contour of the tumour cells.
- x.symmetry_worst: Worst (most severe) symmetry of the tumour cells.
- x.fractal_dimension_worst: Worst (most severe) "coastline approximation" of the tumour cells.
- y: The target outcome, indicating whether a mass is malignant ("M") or benign ("B").
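The column names above follow a regular pattern: ten base measurements, each with a `_mean`, `_se`, and `_worst` suffix. As a minimal sketch, the 30 feature names can be generated rather than typed out. The helper name `feature_columns` is an illustrative assumption and not part of the dataset.

```python
# Ten base measurements, each reported as mean, standard error, and worst value.
BASE_MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]


def feature_columns():
    """Return the 30 feature column names in the order of the list above."""
    # Hypothetical helper: groups all means first, then all se, then all worst.
    return [f"x.{m}_{s}" for s in STATISTICS for m in BASE_MEASUREMENTS]


columns = feature_columns()
print(len(columns))              # 30
print(columns[0], columns[-1])   # x.radius_mean x.fractal_dimension_worst
```

A generated list like this is handy for selecting the feature columns from the CSV and leaving out the target column `y`.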
Distribution
The dataset is provided in CSV format and contains 569 records (biopsy instances). The file is reported to have 32 columns, of which 31 are described here: 30 predictive features and 1 target outcome. The remaining column is not documented in this listing. All records are valid, with no mismatched or missing values reported. In the target variable 'y', about 63% of the masses are benign and about 37% are malignant. The file size is 124.89 kB.
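The figures above (record count, column count, missing values, class balance) can be checked after download. The following is a minimal sketch using only the Python standard library. The function name `summarise` and the three-row sample are made up for illustration, and the file name in the final comment is an assumption.

```python
import csv
import io
from collections import Counter


def summarise(csv_file, target="y"):
    """Count records, columns, blank cells, and class shares in a CSV stream."""
    reader = csv.DictReader(csv_file)
    n_rows = 0
    n_blank = 0
    labels = Counter()
    for row in reader:
        n_rows += 1
        # A cell counts as blank if it is missing or contains only whitespace.
        n_blank += sum(
            1 for v in row.values()
            if v is None or (isinstance(v, str) and v.strip() == "")
        )
        labels[row[target]] += 1
    shares = {k: c / n_rows for k, c in labels.items()} if n_rows else {}
    return {
        "rows": n_rows,
        "columns": len(reader.fieldnames or []),
        "blank_cells": n_blank,
        "class_share": shares,
    }


# Tiny made-up sample, NOT real records from the dataset.
sample = io.StringIO(
    "x.radius_mean,x.texture_mean,y\n"
    "12.1,18.0,B\n"
    "20.3,25.1,M\n"
    "11.5,,B\n"
)
report = summarise(sample)
print(report["rows"], report["columns"], report["blank_cells"])  # 3 3 1

# With the real file (file name is an assumption):
# with open("breast_cancer.csv", newline="") as f:
#     print(summarise(f))
```

On the real file, the expectation from the description above would be 569 rows, no blank cells, and a benign share of roughly 0.63.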
Usage
This dataset is ideally suited for:
- Developing and evaluating machine learning algorithms for classification tasks.
- Creating predictive models to assist in breast cancer diagnosis.
- Benchmarking the performance of different classification techniques.
- Educational purposes in data science and medical informatics.
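As one illustration of the classification and benchmarking uses listed above, here is a minimal sketch of a nearest-centroid baseline written with the standard library only. This is a deliberately simple technique chosen for illustration, not a method associated with the dataset. The function names and the six training rows are made up, and a real evaluation would load the CSV and score on a held-out split.

```python
import math


def fit_nearest_centroid(rows, labels):
    """Standardise each feature, then store one centroid per class."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    scaled = [
        [(r[j] - means[j]) / stds[j] for j in range(n_features)] for r in rows
    ]
    centroids = {}
    for label in set(labels):
        members = [s for s, l in zip(scaled, labels) if l == label]
        centroids[label] = [
            sum(m[j] for m in members) / len(members) for j in range(n_features)
        ]
    return means, stds, centroids


def predict(model, row):
    """Return the label whose centroid is closest in standardised space."""
    means, stds, centroids = model
    scaled = [(v - m) / s for v, m, s in zip(row, means, stds)]
    return min(centroids, key=lambda label: math.dist(scaled, centroids[label]))


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the given label."""
    hits = sum(predict(model, r) == l for r, l in zip(rows, labels))
    return hits / len(rows)


# Made-up [radius_mean, texture_mean] values, NOT real records.
train_rows = [
    [11.0, 15.0], [12.0, 17.0], [12.5, 16.0],
    [19.0, 24.0], [20.5, 26.0], [21.0, 23.0],
]
train_labels = ["B", "B", "B", "M", "M", "M"]

model = fit_nearest_centroid(train_rows, train_labels)
print(predict(model, [11.8, 16.5]))  # B
print(predict(model, [20.0, 25.0]))  # M
```

A baseline like this gives a reference score that more elaborate classifiers should beat. Standardising first matters here because the features sit on very different scales (area versus smoothness, for example).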
Coverage
The dataset consists of 569 breast mass biopsies. Information regarding the geographic origin, specific time range, or demographic scope of the data is not available.
License
CC0: Public Domain.
Who Can Use It
- Machine Learning Engineers and Researchers: To train, test, and compare various classification algorithms.
- Data Scientists: For practical application of classification techniques in a real-world medical context.
- Students and Educators: As an accessible and widely recognised dataset for learning and teaching about classification problems and medical data analysis.
- Healthcare Researchers: To explore feature importance and build diagnostic support tools.
Dataset Name Suggestions
- Breast Cancer Diagnosis Features
- Wisconsin Breast Biopsy Classification
- Cell Nuclei Tumour Classification
- Malignant Benign Breast Mass Dataset
Attributes
Original Data Source: Breast Cancer Diagnosis Features