Breast Cancer Wisconsin Diagnostic Features
About
Digitised images of a fine-needle aspirate (FNA) biopsy of a breast mass serve as the foundation for this dataset. The primary objective is to classify tumours as either malignant (cancerous) or benign (non-cancerous) by analysing the characteristics of cell nuclei present in the images. Derived from detailed image analysis, the data captures specific metrics regarding the shape and texture of the cell nuclei. This collection acts as a standard benchmark for binary classification tasks within machine learning and medical imaging research, facilitating the development of automated diagnostic tools.
Columns
The dataset contains 32 columns. The first is a unique identifier (ID), and the second is the target variable, 'diagnosis', which labels samples as Malignant (M) or Benign (B). The remaining 30 columns consist of numeric features representing ten distinct characteristics of the cell nuclei:
- Radius: Mean of the distances from the centre to points on the perimeter.
- Texture: Standard deviation of grey-scale values, indicating visual roughness.
- Perimeter: The total length of the boundary of the cell nucleus.
- Area: The pixel count inside the nucleus.
- Smoothness: The local variation in radius lengths.
- Compactness: Computed as the perimeter squared divided by the area, minus 1.0.
- Concavity: The severity of indentations on the contour of the nucleus.
- Concave Points: The count of concave portions on the contour.
- Symmetry: A measure of the symmetry of the nucleus shape.
- Fractal Dimension: An approximation of the "coastline" complexity of the cell border.
For each of these ten characteristics, three specific statistics are recorded to capture the distribution of values within the image:
- Mean: The average value.
- Standard Error (se): The standard error of the value, indicating how much the measurement varies across the nuclei in the image.
- Worst: The mean of the three largest values found.
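The column layout described above (one identifier, one target, and ten characteristics times three statistics) can be sketched as a short script that builds the expected header names. The `<characteristic>_<statistic>` pattern follows the names quoted in this listing (`area_mean`, `smoothness_mean`); the exact spelling of the other names (for example `concave points` with a space, or `fractal_dimension` with an underscore), the lowercase `id`, and the column order are assumptions and should be checked against the actual file header.

```python
# Ten nucleus characteristics listed in the Columns section.
# Spellings other than "area" and "smoothness" are assumed, not confirmed.
CHARACTERISTICS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]

# Three statistics recorded for each characteristic.
STATISTICS = ["mean", "se", "worst"]


def feature_columns():
    """Return the 30 feature names, grouped by statistic (assumed order)."""
    return [f"{c}_{s}" for s in STATISTICS for c in CHARACTERISTICS]


def expected_columns():
    """Return all 32 names: identifier, target, then the 30 features."""
    return ["id", "diagnosis"] + feature_columns()


columns = expected_columns()
print(len(columns), "columns expected")
print(columns[:4])
```

Comparing this list with the header row of the CSV is a quick way to catch renamed or extra columns before any modelling starts.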
Distribution
The file, formatted as "Breast Cancer Wisconsin.csv", contains 569 distinct sample records with no missing values. Of these samples, 357 (63%) are identified as Benign, while 212 (37%) are identified as Malignant. The feature values vary significantly in scale; for instance, area_mean ranges from approximately 144 to 2,500, whereas smoothness_mean ranges from 0.05 to 0.16.
Usage
This data is ideally suited for:
- Training and testing machine learning models for binary classification.
- Research in medical imaging and computer-aided diagnosis (CAD).
- Educational purposes to demonstrate feature extraction and classification techniques in oncology.
- Evaluating the importance of different morphological features in predicting malignancy.
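Two checks are often useful before training a classifier on this data, and the sketch below illustrates both using only the figures quoted in this listing. The first computes the accuracy of always predicting the majority class, which gives a floor that any model should beat. The second applies min-max scaling, one common way (chosen here for illustration, not prescribed by the dataset) to bring features with very different ranges onto a shared 0 to 1 scale. The function names are hypothetical, and the ranges are the approximate values stated above rather than values read from the file.

```python
# Class counts as stated in the Distribution section (B = Benign, M = Malignant).
CLASS_COUNTS = {"B": 357, "M": 212}

# Approximate (low, high) ranges quoted in the Distribution section.
APPROX_RANGES = {
    "area_mean": (144.0, 2500.0),
    "smoothness_mean": (0.05, 0.16),
}


def majority_baseline_accuracy(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def min_max_scale(value, low, high):
    """Map a value from the interval [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


baseline = majority_baseline_accuracy(CLASS_COUNTS)
print(f"Majority-class baseline accuracy: {baseline:.3f}")

# The midpoint of each range is used only to show the effect of scaling.
for name, (low, high) in APPROX_RANGES.items():
    midpoint = (low + high) / 2
    scaled = min_max_scale(midpoint, low, high)
    print(f"{name}: raw midpoint {midpoint:.3f}, scaled {scaled:.3f}")
```

In practice the scaling bounds would be estimated from the training split only, so that no information from the test split leaks into the model.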
Coverage
The data represents a static historical record with an expected update frequency of "Never", making it suitable for reproducible research. The samples are specific to breast cancer diagnosis via FNA biopsy. The filename suggests an origin associated with Wisconsin.
License
CC0: Public Domain
Who Can Use It
- Data Scientists
- Medical Researchers
- Machine Learning Engineers
- Students in Bioinformatics or Health Informatics
Dataset Name Suggestions
- Breast Cancer Wisconsin Diagnostic Features
- FNA Biopsy Cell Nuclei Morphology
- Wisconsin Breast Cancer Classification Data
- Malignant vs Benign Tumor Metrics
Attributes
Original Data Source: Breast Cancer Wisconsin Diagnostic Features
