Breast Cancer Diagnostic Metrics

Patient Health Records & Digital Health

Tags and Keywords

Cancer

Classification

Diagnostic

Benign

Healthcare

Free

About

Detecting and classifying breast cancer tumours is a critical task in medical diagnostics and healthcare informatics. This data product provides diagnostic parameters computed from digitised images of a fine needle aspirate (FNA) of a breast mass. Its primary purpose is to support predicting whether a tumour is Benign (B) or Malignant (M). The file has 32 columns: a patient id, the diagnosis label, and 30 numeric features that describe the cell nuclei in each image (ten measurements, each reported as a mean, a standard error, and a "worst" value). It is suited to Exploratory Data Analysis (EDA), visualisation, and training machine learning classification models, and it serves both beginners and professionals in healthcare analytics who want to improve diagnostic accuracy through predictive modelling.

Columns

  • id: Unique identification number assigned to each patient.
  • diagnosis: The target variable indicating the cancer status, classified as either Malignant (M) or Benign (B).
  • radius_mean: The mean of distances from the centre to points on the perimeter.
  • texture_mean: Mean texture, measured as the standard deviation of grey-scale values.
  • perimeter_mean: Mean perimeter of the cell nuclei.
  • area_mean: Mean area of the cell nuclei.
  • smoothness_mean: Mean of local variation in radius lengths.
  • compactness_mean: Mean of perimeter^2 / area - 1.0.
  • concavity_mean: Mean severity of concave portions of the contour.
  • concave points_mean: Mean number of concave portions of the contour.
  • symmetry_mean: Mean symmetry of the nucleus.
  • fractal_dimension_mean: Mean 'coastline approximation' - 1.
  • radius_se: Standard error for the mean of distances from centre to points on the perimeter.
  • texture_se: Standard error for texture.
  • perimeter_se: Standard error for perimeter size.
  • area_se: Standard error for the area.
  • smoothness_se: Standard error for local variation in radius lengths.
  • compactness_se: Standard error for compactness.
  • concavity_se: Standard error for concavity.
  • concave points_se: Standard error for concave points.
  • symmetry_se: Standard error for symmetry.
  • fractal_dimension_se: Standard error for fractal dimension.
  • radius_worst: "Worst" radius, i.e. the mean of the three largest values.
  • texture_worst: "Worst" texture, i.e. the mean of the three largest values.
  • perimeter_worst: "Worst" perimeter, i.e. the mean of the three largest values.
  • area_worst: "Worst" area, i.e. the mean of the three largest values.
  • smoothness_worst: "Worst" smoothness, i.e. the mean of the three largest values.
  • compactness_worst: "Worst" compactness, i.e. the mean of the three largest values.
  • concavity_worst: "Worst" concavity, i.e. the mean of the three largest values.
  • concave points_worst: "Worst" concave points, i.e. the mean of the three largest values.
  • symmetry_worst: "Worst" symmetry, i.e. the mean of the three largest values.
  • fractal_dimension_worst: "Worst" fractal dimension, i.e. the mean of the three largest values.
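
The 30 feature columns listed above follow a regular pattern: ten base measurements, each with a `_mean`, `_se`, and `_worst` variant. The sketch below rebuilds the names from that pattern and groups them by base measurement, which may be handy when selecting related columns. The helper names `feature_columns` and `group_by_base` are illustrative and not part of the dataset.

```python
from collections import defaultdict

# Base measurements and statistic suffixes, copied from the column list above
# (id and diagnosis are excluded because they are not features).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]


def feature_columns():
    """Return the 30 feature column names in the order they are listed."""
    return [f"{base}_{stat}" for stat in STATISTICS for base in BASE_FEATURES]


def group_by_base(columns):
    """Map each base measurement to its mean/se/worst column names."""
    groups = defaultdict(dict)
    for name in columns:
        # Split on the LAST underscore so "fractal_dimension" stays intact.
        base, stat = name.rsplit("_", 1)
        groups[base][stat] = name
    return dict(groups)


columns = feature_columns()
groups = group_by_base(columns)
print(len(columns), "feature columns in", len(groups), "groups")
print(groups["concave points"])
```

Note that the three `concave points` columns contain a space in their names, so they generally need bracket-style or quoted access rather than attribute-style access in most data tools.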

Distribution

  • Format: CSV (Comma Separated Values)
  • Size: Approximately 125.2 kB
  • Structure: 32 Columns and 569 Valid Rows
  • Data Types:
    • Numeric (Float) for all feature columns (mean, standard error, worst).
    • Categorical for diagnosis (2 unique values: B and M).
    • Integer/Numeric for id.
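
The structure described above can be checked after download with a short script. The following is a minimal sketch using only the Python standard library; the function name `summarize_csv` is hypothetical, and the embedded sample is a made-up four-column excerpt used for demonstration, not real records from the file.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle):
    """Count rows, columns, diagnosis labels, and empty cells in a CSV stream."""
    reader = csv.DictReader(handle)
    labels = Counter()
    rows = 0
    empty_cells = 0
    for row in reader:
        rows += 1
        labels[row["diagnosis"]] += 1
        for value in row.values():
            # DictReader fills short rows with None; blank strings also count.
            if value is None or (isinstance(value, str) and not value.strip()):
                empty_cells += 1
    return {
        "rows": rows,
        "columns": len(reader.fieldnames or []),
        "labels": dict(labels),
        "empty_cells": empty_cells,
    }


# Made-up sample with only 4 of the 32 columns; the values are illustrative.
# The last row has a deliberately empty texture_mean cell.
SAMPLE = """id,diagnosis,radius_mean,texture_mean
1001,M,18.2,10.5
1002,B,12.1,14.3
1003,B,11.4,
"""

summary = summarize_csv(io.StringIO(SAMPLE))
print(summary)
```

For the real file, pass a handle opened with `open(path, newline="")`, where `path` is wherever the download was saved. Based on the figures in this listing, the summary should report 569 rows, 32 columns, two labels (B and M), and no empty cells.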

Usage

  • Binary Classification: Training algorithms to predict if a tumour is Malignant or Benign based on physical features.
  • Exploratory Data Analysis (EDA): Visualising correlations between tumour size, texture, and malignancy.
  • Feature Selection: Identifying which cell nucleus characteristics are most indicative of cancer.
  • Educational Training: Teaching beginners the fundamentals of healthcare data science and logistic regression.
  • Diagnostic Tool Development: Prototyping support systems for medical professionals.
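
To illustrate the binary classification and educational use cases above, here is a from-scratch logistic regression sketch in plain Python. It is trained on synthetic numbers that loosely stand in for two features (`radius_mean` and `texture_mean`); the data, the B -> 0 / M -> 1 encoding, the learning rate, and the epoch count are all assumptions chosen for demonstration, not values taken from the dataset.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 1 (malignant) when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


# Synthetic stand-in for (radius_mean, texture_mean); NOT real patient data.
rng = random.Random(0)
benign = [[rng.gauss(12.0, 1.0), rng.gauss(18.0, 2.0)] for _ in range(50)]
malignant = [[rng.gauss(18.0, 1.0), rng.gauss(22.0, 2.0)] for _ in range(50)]
X = standardize(benign + malignant)
y = [0] * 50 + [1] * 50  # assumed encoding: B -> 0, M -> 1

w, b = fit_logistic(X, y)
accuracy = sum(p == t for p, t in zip(predict(X, w, b), y)) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

With the real file, the same functions could be applied to any subset of the 30 feature columns after mapping the diagnosis column to 0 and 1. Accuracy should then be measured on rows held out from training, since a training-set score tends to overstate how well a model generalises.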

Coverage

  • Domain: Healthcare and Oncology.
  • Scope: The data focuses on the physical attributes of cell nuclei derived from breast mass images.
  • Demographics: Patient IDs are anonymised; data represents a cohort of 569 distinct cases.
  • Completeness: The file reports 0% missing data and 0% mismatched values across all columns.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For building and testing classification algorithms.
  • Medical Researchers: To analyse morphological features of breast masses.
  • Students: As a standard practice set for machine learning coursework.
  • Healthcare Analysts: To model diagnostic trends.

Dataset Name Suggestions

  • Breast Cancer Diagnostic Metrics
  • Benign vs Malignant Tumour Classification Data
  • Cell Nuclei Features for Cancer Detection
  • Oncology Diagnostic Parameter Set

Attributes

Original Data Source: Breast Cancer Diagnostic Metrics

Listing Stats

VIEWS

3

DOWNLOADS

0

LISTED

03/12/2025

REGION

GLOBAL

UDQS QUALITY

5 / 5

VERSION

1.0
