Breast Cancer Diagnosis Features

Patient Health Records & Digital Health

Tags and Keywords

Cancer

Biopsy

Tumour

Diagnosis

Classification

Breast Cancer Diagnosis Features Dataset on Opendatabay data marketplace

Free

About

This dataset is a classic resource for training and benchmarking machine learning algorithms, particularly for classifying breast mass biopsies. It contains features derived from digitised images of fine needle aspirate (FNA) biopsy slides. The task is to distinguish malignant (cancerous) from benign (non-cancerous) tumours. The features describe properties of the cell nuclei visible in each image, such as size, shape, and regularity, providing a rich set of attributes for predictive modelling.

Columns

The dataset includes 30 predictive features and one target variable. Ten measurements of the cell nuclei are each reported as a mean, a standard error (se), and a "worst" value:
  • x.radius_mean: Mean radius of the cell nuclei.
  • x.texture_mean: Mean texture of the cell nuclei.
  • x.perimeter_mean: Mean perimeter of the cell nuclei.
  • x.area_mean: Mean area of the cell nuclei.
  • x.smoothness_mean: Mean smoothness of the cell nuclei.
  • x.compactness_mean: Mean compactness of the cell nuclei.
  • x.concavity_mean: Mean concavity of the cell nuclei.
  • x.concave_points_mean: Mean number of concave portions of the contour of the cell nuclei.
  • x.symmetry_mean: Mean symmetry of the cell nuclei.
  • x.fractal_dimension_mean: Mean "coastline approximation" of the cell nuclei.
  • x.radius_se: Standard error of the radius of the cell nuclei.
  • x.texture_se: Standard error of the texture of the cell nuclei.
  • x.perimeter_se: Standard error of the perimeter of the cell nuclei.
  • x.area_se: Standard error of the area of the cell nuclei.
  • x.smoothness_se: Standard error of the smoothness of the cell nuclei.
  • x.compactness_se: Standard error of the compactness of the cell nuclei.
  • x.concavity_se: Standard error of the concavity of the cell nuclei.
  • x.concave_points_se: Standard error of the number of concave portions of the contour of the cell nuclei.
  • x.symmetry_se: Standard error of the symmetry of the cell nuclei.
  • x.fractal_dimension_se: Standard error of the "coastline approximation" of the cell nuclei.
  • x.radius_worst: Worst (largest) radius of the cell nuclei.
  • x.texture_worst: Worst (most severe) texture of the cell nuclei.
  • x.perimeter_worst: Worst (largest) perimeter of the cell nuclei.
  • x.area_worst: Worst (largest) area of the cell nuclei.
  • x.smoothness_worst: Worst (most severe) smoothness of the cell nuclei.
  • x.compactness_worst: Worst (most severe) compactness of the cell nuclei.
  • x.concavity_worst: Worst (most severe) concavity of the cell nuclei.
  • x.concave_points_worst: Worst (most severe) number of concave portions of the contour of the cell nuclei.
  • x.symmetry_worst: Worst (most severe) symmetry of the cell nuclei.
  • x.fractal_dimension_worst: Worst (most severe) "coastline approximation" of the cell nuclei.
  • y: The target outcome, indicating whether a mass is malignant ("M") or benign ("B").
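
The column names above follow a regular pattern (ten base measurements, each with a mean, se, and worst variant), so the feature list can be generated rather than typed out. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the names listed above and that the target column is named "y"; the function names and the file path passed to load_rows are hypothetical.

```python
import csv

# The ten base measurements, in the order they appear in the column list.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]

# 10 measurements x 3 statistics = 30 feature columns.
FEATURE_COLUMNS = [
    f"x.{base}_{stat}" for stat in STATISTICS for base in BASE_FEATURES
]
TARGET_COLUMN = "y"  # assumption: the target column is named "y"


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def split_features_and_target(rows):
    """Return (X, y): numeric feature vectors and "M"/"B" labels."""
    X = [[float(row[col]) for col in FEATURE_COLUMNS] for row in rows]
    y = [row[TARGET_COLUMN] for row in rows]
    return X, y
```

Any column in the file that is not in FEATURE_COLUMNS or TARGET_COLUMN is simply ignored by split_features_and_target.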

Distribution

The dataset is provided in CSV format and contains 569 records (biopsy instances). The listing reports 32 columns, of which 30 are predictive features and 1 is the target outcome; the remaining column is not described in the column list. All records are valid, with no mismatched or missing values reported. The target variable 'y' shows that 63% of the masses are benign and 37% are malignant. The file size is 124.89 kB.
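
The figures above are easy to verify after download. The sketch below, standard library only, computes the class split and counts empty cells. The function names are hypothetical, and the label counts in the example (357 benign, 212 malignant) are an assumption chosen to be consistent with the 569 records and the rounded 63% / 37% split; they are not stated in the listing.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of each label as a whole-number percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


def count_missing(rows):
    """Count cells that are None or blank across a list of row dicts."""
    return sum(
        1
        for row in rows
        for value in row.values()
        if value is None or str(value).strip() == ""
    )


# Assumed counts, consistent with 569 records and a 63% / 37% split.
labels = ["B"] * 357 + ["M"] * 212
print(len(labels))                 # 569
print(class_distribution(labels))  # {'B': 63, 'M': 37}
```

Because the classes are imbalanced, a model that always predicts benign would already score about 63% accuracy, which is worth keeping in mind when reading benchmark results.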

Usage

This dataset is ideally suited for:
  • Developing and evaluating machine learning algorithms for classification tasks.
  • Creating predictive models to assist in breast cancer diagnosis.
  • Benchmarking the performance of different classification techniques.
  • Educational purposes in data science and medical informatics.
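
As an illustration of the benchmarking use case, the sketch below compares a simple nearest-centroid classifier against a majority-class baseline on a held-out split. It is a dependency-free teaching example, not a recommended diagnostic model; the function names, the 25% test fraction, and the choice of classifier are all assumptions. X is a list of numeric feature vectors and y a list of "M"/"B" labels.

```python
import math
import random
from collections import Counter, defaultdict


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx], [X[i] for i in test_idx],
        [y[i] for i in train_idx], [y[i] for i in test_idx],
    )


def fit_nearest_centroid(X, y):
    """Learn per-feature scaling and one centroid per class."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
        for col, m in zip(zip(*X), means)
    ]
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append([(v - m) / s for v, m, s in zip(row, means, stds)])
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }
    return means, stds, centroids


def predict(model, X):
    """Assign each row to the class whose centroid is closest."""
    means, stds, centroids = model
    predictions = []
    for row in X:
        scaled = [(v - m) / s for v, m, s in zip(row, means, stds)]
        predictions.append(
            min(centroids, key=lambda label: math.dist(scaled, centroids[label]))
        )
    return predictions


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))
```

A typical call sequence would be: split the data, fit on the training part, then report both accuracy(y_test, predict(model, X_test)) and majority_baseline_accuracy(y_train, y_test), so that any classifier is judged against the trivial baseline.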

Coverage

The dataset consists of 569 breast mass biopsies. Information regarding the geographic origin, specific time range, or demographic scope of the data is not available.

License

CC0: Public Domain.

Who Can Use It

  • Machine Learning Engineers and Researchers: To train, test, and compare various classification algorithms.
  • Data Scientists: For practical application of classification techniques in a real-world medical context.
  • Students and Educators: As an accessible and widely recognised dataset for learning and teaching about classification problems and medical data analysis.
  • Healthcare Researchers: To explore feature importance and build diagnostic support tools.

Dataset Name Suggestions

  • Breast Cancer Diagnosis Features
  • Wisconsin Breast Biopsy Classification
  • Cell Nuclei Tumour Classification
  • Malignant Benign Breast Mass Dataset

Attributes

Original Data Source: Breast Cancer Diagnosis Features

Listing Stats

LISTED

08/08/2025

REGION

GLOBAL

QUALITY (UDQS, Universal Data Quality Score)

5 / 5

VERSION

1.0
