Breast Cancer Diagnostic Metrics
Patient Health Records & Digital Health
Free
About
Detecting and classifying breast cancer tumours is a critical task in medical diagnostics and healthcare informatics. This data product provides diagnostic parameters computed from digitized images of a fine needle aspirate (FNA) of a breast mass. Its primary purpose is to support predicting whether a diagnosis is Benign (B) or Malignant (M). The file has 32 columns: an identifier, the diagnosis, and 30 features describing characteristics of the cell nuclei present in the images. It is suited to Exploratory Data Analysis (EDA), visualisation, and training machine learning classification models, and serves both beginners and professionals in healthcare analytics who aim to improve diagnostic accuracy through predictive modelling.
Columns
- id: Unique identification number assigned to each patient.
- diagnosis: The target variable indicating the cancer status, classified as either Malignant (M) or Benign (B).
- radius_mean: The mean of distances from the centre to points on the perimeter.
- texture_mean: Standard deviation of grey-scale values (mean).
- perimeter_mean: Mean size of the core tumour.
- area_mean: Mean area of the tumour.
- smoothness_mean: Mean of local variation in radius lengths.
- compactness_mean: Mean of perimeter^2 / area - 1.0.
- concavity_mean: Mean severity of concave portions of the contour.
- concave points_mean: Mean number of concave portions of the contour.
- symmetry_mean: Mean symmetry of the nucleus.
- fractal_dimension_mean: Mean 'coastline approximation' - 1.
- radius_se: Standard error for the mean of distances from centre to points on the perimeter.
- texture_se: Standard error for texture.
- perimeter_se: Standard error for perimeter size.
- area_se: Standard error for the area.
- smoothness_se: Standard error for local variation in radius lengths.
- compactness_se: Standard error for compactness.
- concavity_se: Standard error for concavity.
- concave points_se: Standard error for concave points.
- symmetry_se: Standard error for symmetry.
- fractal_dimension_se: Standard error for fractal dimension.
- radius_worst: "Worst" or largest mean value for radius.
- texture_worst: "Worst" or largest mean value for texture.
- perimeter_worst: "Worst" or largest mean value for perimeter.
- area_worst: "Worst" or largest mean value for area.
- smoothness_worst: "Worst" or largest mean value for smoothness.
- compactness_worst: "Worst" or largest mean value for compactness.
- concavity_worst: "Worst" or largest mean value for concavity.
- concave points_worst: "Worst" or largest mean value for concave points.
- symmetry_worst: "Worst" or largest mean value for symmetry.
- fractal_dimension_worst: "Worst" or largest mean value for fractal dimension.
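The column names follow a regular pattern: ten nucleus features, each reported as a mean, a standard error, and a "worst" value, plus id and diagnosis. As a rough sketch, the expected header can be rebuilt from that pattern and compared against the file. The helper name `expected_columns` is illustrative and not part of the data product; the spellings, including the space in `concave points`, are copied from the column descriptions in this section.

```python
# Ten per-nucleus features; note the space in "concave points".
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Each feature is reported once per statistic.
STATISTICS = ["mean", "se", "worst"]


def expected_columns():
    """Return the 32 column names: id, diagnosis, then 10 features x 3 statistics."""
    columns = ["id", "diagnosis"]
    for stat in STATISTICS:
        for feature in FEATURES:
            columns.append(f"{feature}_{stat}")
    return columns


COLUMNS = expected_columns()
print(len(COLUMNS))
print(COLUMNS[2], COLUMNS[-1])
```

Comparing `COLUMNS` with the header row of the downloaded CSV is a quick way to spot renamed or missing fields before any analysis.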
Distribution
- Format: CSV (Comma Separated Values)
- Size: Approximately 125.2 kB
- Structure: 32 Columns and 569 Valid Rows
- Data Types:
  - Numeric (Float) for all feature columns (mean, standard error, worst).
  - Categorical for diagnosis (2 unique values: B and M).
  - Integer/Numeric for id.
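The documented types can be applied while reading the CSV. The following is a minimal sketch using only the Python standard library; the function name `load_records` and the in-memory sample (made-up values, two feature columns only) are assumptions for illustration, not part of the data product.

```python
import csv
import io


def load_records(handle):
    """Parse rows from an open CSV file, converting each column to its documented type."""
    records = []
    for row in csv.DictReader(handle):
        record = {}
        for name, value in row.items():
            if name == "id":
                record[name] = int(value)        # Integer/Numeric identifier
            elif name == "diagnosis":
                if value not in ("B", "M"):      # only two categories are documented
                    raise ValueError(f"unexpected diagnosis: {value!r}")
                record[name] = value
            else:
                record[name] = float(value)      # every feature column is a float
        records.append(record)
    return records


# Tiny in-memory sample with made-up values, so the sketch runs without the real file.
SAMPLE = io.StringIO(
    "id,diagnosis,radius_mean,texture_mean\n"
    "1,M,18.0,10.5\n"
    "2,B,12.5,17.25\n"
)
sample_records = load_records(SAMPLE)
print(sample_records[0])
```

For the real file, pass a handle opened with `open(path, newline="")`, where `path` is wherever the CSV was saved; given the structure described here, the result should contain 569 records with 32 keys each.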
Usage
- Binary Classification: Training algorithms to predict if a tumour is Malignant or Benign based on physical features.
- Exploratory Data Analysis (EDA): Visualising correlations between tumour size, texture, and malignancy.
- Feature Selection: Identifying which cell nucleus characteristics are most indicative of cancer.
- Educational Training: Teaching beginners the fundamentals of healthcare data science and logistic regression.
- Diagnostic Tool Development: Prototyping support systems for medical professionals.
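To make the binary classification and logistic regression use cases concrete, here is a minimal pure-Python sketch of logistic regression trained by batch gradient descent. Everything in it is an assumption for illustration: the function names, the learning rate and epoch count, and above all the toy inputs, which are made-up, pre-scaled values and not rows from the dataset. In practice a library implementation and a held-out test set would be the usual choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return "M" when the estimated probability of malignancy is at least 0.5."""
    p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return "M" if p >= 0.5 else "B"


# Toy, made-up (radius_mean, texture_mean) pairs, already centred and scaled.
# These are NOT rows from the dataset.
X = [[-1.2, -0.8], [-0.9, -1.1], [-0.7, -0.3], [0.8, 0.6], [1.1, 0.9], [1.3, 0.4]]
y = [0, 0, 0, 1, 1, 1]  # 1 = Malignant (M), 0 = Benign (B)

weights, bias = train_logistic(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

With the real data, the feature columns would first be standardised, since raw values such as area_mean and smoothness_mean sit on very different scales and plain gradient descent converges poorly without scaling.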
Coverage
- Domain: Healthcare and Oncology.
- Scope: The data focuses on the physical attributes of cell nuclei derived from breast mass images.
- Demographics: Patient IDs are anonymised; data represents a cohort of 569 distinct cases.
- Completeness: The file reports 0% missing data and 0% mismatched values across all columns.
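The completeness figures can be rechecked after download. The sketch below counts empty cells per column and diagnosis values outside B and M; the function name `completeness_report` and the deliberately flawed in-memory sample are hypothetical and only there to show the check working.

```python
import csv
import io


def completeness_report(handle, allowed_diagnoses=("B", "M")):
    """Count rows, empty cells per column, and diagnosis values outside the allowed set."""
    reader = csv.DictReader(handle)
    missing = {name: 0 for name in reader.fieldnames}
    mismatched_diagnosis = 0
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            value = row[name]
            if value is None or value.strip() == "":
                missing[name] += 1
        # An empty diagnosis counts as missing above, not as a mismatch.
        diagnosis = (row.get("diagnosis") or "").strip()
        if diagnosis and diagnosis not in allowed_diagnoses:
            mismatched_diagnosis += 1
    return {
        "rows": rows,
        "missing": missing,
        "mismatched_diagnosis": mismatched_diagnosis,
    }


# Made-up sample containing one empty cell and one invalid diagnosis on purpose.
FLAWED_SAMPLE = io.StringIO(
    "id,diagnosis,radius_mean\n"
    "1,M,18.0\n"
    "2,B,\n"
    "3,X,12.5\n"
)
report = completeness_report(FLAWED_SAMPLE)
print(report)
```

Run against the actual file, a report consistent with the figures stated here would show 569 rows, zero missing cells in every column, and zero mismatched diagnosis values.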
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For building and testing classification algorithms.
- Medical Researchers: To analyse morphological features of breast masses.
- Students: As a standard practice set for machine learning coursework.
- Healthcare Analysts: To model diagnostic trends.
Dataset Name Suggestions
- Breast Cancer Diagnostic Metrics
- Benign vs Malignant Tumour Classification Data
- Cell Nuclei Features for Cancer Detection
- Oncology Diagnostic Parameter Set
Attributes
Original Data Source: Breast Cancer Diagnostic Metrics
