Breast Cancer Prediction Dataset

Clinical Trials & Research

Tags and Keywords: Cancer, Tumour, Diagnosis, Prediction, Healthcare

About

This dataset supports binary classification of breast tumours as malignant (cancerous) or benign (non-cancerous). Tumours typically present as lumps or are detectable via X-ray. Breast cancer is a major global health concern: it affected over 2.1 million people in 2015 and accounts for roughly 25% of all cancer cases in women. The dataset is intended for developing machine learning classifiers that predict tumour type, with Support Vector Machines (SVMs) named as the reference approach. A typical workflow is to explore the data, perform any necessary cleanup, build and tune classification models, and compare their evaluation metrics.

Columns

id: A unique identifier for each record.
diagnosis: The target variable, indicating the tumour type: 'M' for Malignant or 'B' for Benign.
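radius_mean: The mean radius of the cell nuclei in the sample image, where radius is the mean distance from the nucleus centre to points on its perimeter (as defined in the original Wisconsin Diagnostic documentation).
texture_mean: The mean texture, measured as the standard deviation of grey-scale values.
perimeter_mean: The mean perimeter of the cell nuclei.
area_mean: The mean area of the cell nuclei.
smoothness_mean: The mean smoothness, measured as the local variation in radius lengths.
compactness_mean: The mean compactness, computed as perimeter^2 / area - 1.0.
concavity_mean: The mean concavity, i.e. the severity of concave portions of the nucleus contour.
concave points_mean: The mean number of concave portions of the nucleus contour.
symmetry_mean: The mean symmetry of the cell nuclei.
fractal_dimension_mean: The mean fractal dimension, computed as the "coastline approximation" minus 1.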
radius_se: The standard error of the radius.
texture_se: The standard error of the texture.
perimeter_se: The standard error of the perimeter.
area_se: The standard error of the area.
smoothness_se: The standard error of smoothness.
compactness_se: The standard error of compactness.
concavity_se: The standard error of concavity.
concave points_se: The standard error of concave points.
symmetry_se: The standard error of symmetry.
fractal_dimension_se: The standard error of fractal dimension.
radius_worst: The "worst" value for radius, i.e. the mean of the three largest radius values.
texture_worst: The "worst" value for texture (mean of the three largest values).
perimeter_worst: The "worst" value for perimeter (mean of the three largest values).
area_worst: The "worst" value for area (mean of the three largest values).
smoothness_worst: The "worst" value for smoothness (mean of the three largest values).
compactness_worst: The "worst" value for compactness (mean of the three largest values).
concavity_worst: The "worst" value for concavity (mean of the three largest values).
concave points_worst: The "worst" value for concave points (mean of the three largest values).
symmetry_worst: The "worst" value for symmetry (mean of the three largest values).
fractal_dimension_worst: The "worst" value for fractal dimension (mean of the three largest values).
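
In the original Wisconsin Diagnostic documentation, ten base measurements are each summarised per image by a mean, a standard error, and a "worst" value (the mean of the three largest measurements), which yields the 30 feature columns listed above. The sketch below rebuilds the expected column names and shows how the three summaries could be computed for one feature. The function names are hypothetical, and the standard error here uses the sample standard deviation (n - 1), which is an assumption: the listing does not say which estimator was used.

```python
import math
import statistics

# The ten base measurements, spelled as they appear in the column names
# (note the space in "concave points").
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
SUFFIXES = ["mean", "se", "worst"]


def expected_columns():
    """Return the 32 column names in the order used by the listing."""
    features = [f"{base}_{suffix}" for suffix in SUFFIXES for base in BASE_FEATURES]
    return ["id", "diagnosis"] + features


def summarise_feature(values):
    """Summarise per-nucleus measurements of one feature (hypothetical input).

    Assumes at least two values and a sample standard deviation (n - 1).
    """
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    worst = statistics.fmean(sorted(values)[-3:])  # mean of the three largest
    return {"mean": mean, "se": se, "worst": worst}
```

Comparing `expected_columns()` with the header of the downloaded CSV is a quick way to catch renamed or missing columns before modelling.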

Distribution

The dataset is provided as a CSV file named breast-cancer.csv, with a size of 124.57 kB. It contains 569 records and 32 columns. All columns are valid, with no mismatched or missing values reported. The diagnosis column, which is the target for classification, shows a distribution of 63% Benign (B) and 37% Malignant (M) tumours.
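
The figures above (row count, column count, class balance, no missing values) can be checked after download. This is a minimal sketch using only the standard library; the function name `summarise` is hypothetical, and it assumes the file has a header row and a `diagnosis` column as described.

```python
import csv
from collections import Counter


def summarise(lines, target="diagnosis"):
    """Count rows, columns, target labels, and empty cells in CSV text.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    """
    reader = csv.DictReader(lines)
    labels = Counter()
    missing = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        labels[row[target]] += 1
        for name, value in row.items():
            if name is None:
                continue  # surplus fields beyond the header are ignored
            if value is None or value.strip() == "":
                missing[name] += 1
    shares = {label: count / n_rows for label, count in labels.items()}
    return {
        "rows": n_rows,
        "columns": len(reader.fieldnames or []),
        "labels": dict(labels),
        "shares": shares,
        "missing": dict(missing),
    }

# Usage (assumes the file sits in the working directory):
# with open("breast-cancer.csv", newline="") as f:
#     print(summarise(f))
```

If the file matches the description, `rows` should be 569, `columns` 32, `missing` empty, and `shares` close to 0.63 for 'B' and 0.37 for 'M'.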

Usage

This dataset is ideal for:

Developing and testing machine learning classification models to predict breast cancer type.
Conducting data exploration and cleanup activities.
Experimenting with various hyperparameter tuning techniques for classification algorithms.
Comparing the performance and evaluation metrics of different classification models, such as SVMs.
Educational purposes in data science and machine learning, particularly in the healthcare domain.
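
The modelling steps listed above (train a classifier, tune hyperparameters, compare metrics) could be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the listing's own method: it loads scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) data instead of breast-cancer.csv so that it runs without a download, and the parameter grid is arbitrary. Note that the bundled copy encodes the target as 0 = malignant and 1 = benign rather than 'M' and 'B'.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled copy stands in for breast-cancer.csv.
data = load_breast_cancer()
X, y = data.data, data.target

# Stratify so both splits keep roughly the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardise inside the pipeline;
# the scaler is then fitted on the training folds only during cross-validation.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Illustrative grid only; these values are not taken from the listing.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=data.target_names)

print(search.best_params_)
print(report)
```

To run the same pipeline on the CSV instead, the id column would need to be dropped and the diagnosis column encoded numerically before fitting. Macro-averaged F1 is used for model selection because the classes are imbalanced; another scorer may suit a different error trade-off.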

Coverage

The dataset is also known as the Breast Cancer Wisconsin (Diagnostic) Dataset, a name that reflects its origin at the University of Wisconsin. The source material provides no specific time range or demographic breakdown for the records.

License

CC0: Public Domain

Who Can Use It

This dataset is suitable for:

Data Scientists and Machine Learning Engineers: For building, training, and evaluating predictive models for breast cancer diagnosis.
Healthcare Researchers: To explore relationships between tumour characteristics and malignancy, potentially aiding in diagnostic research.
Students and Educators: As a practical example for learning about binary classification, data preprocessing, and model evaluation in a real-world medical context.
Developers: Creating diagnostic support systems or applications that require automated tumour classification.

Dataset Name Suggestions

Breast Cancer Wisconsin Diagnostic Dataset
Malignant-Benign Tumour Classification Data
Breast Cancer Prediction Dataset
Oncology Tumour Data for ML

Attributes

Original Data Source: Breast Cancer Prediction Dataset

Listing Stats

Listed: 08/07/2025
Region: Global
Quality: 5 / 5
Version: 1.0
