Breast Cancer Classification Data
Patient Health Records & Digital Health
Free
About
This dataset is a cleaned version of the Breast Cancer Wisconsin (Original) dataset, prepared for logistic regression analysis. It contains real patient data in which the dependent variable classifies each record as malignant or benign. It has no missing values, so it is ready for analysis [1].
Columns
The dataset contains 10 columns: 9 independent variables related to cell characteristics and the dependent 'Class' variable.
- Clump Thickness: Describes the thickness of cell clumps.
  - Label Count: lowest bin 1.00 - 1.90 (139 counts); highest bin 9.10 - 10.00 (69 counts) [2].
  - Mean: 4.44, Std. Deviation: 2.82 [2].
  - Quantiles: Min 1, 25% 2, 50% 4, 75% 6, Max 10 [2].
- Uniformity of Cell Size: Measures the uniformity in cell size.
  - Label Count: lowest bin 1.00 - 1.90 (373 counts); highest bin 9.10 - 10.00 (67 counts) [2, 3].
  - Mean: 3.15, Std. Deviation: 3.06 [3].
  - Quantiles: Min 1, 25% 1, 50% 1, 75% 5, Max 10 [3].
- Uniformity of Cell Shape: Indicates the uniformity in cell shape.
  - Label Count: lowest bin 1.00 - 1.90 (346 counts); highest bin 9.10 - 10.00 (58 counts) [3].
  - Mean: 3.22, Std. Deviation: 2.99 [3, 4].
  - Quantiles: Min 1, 25% 1, 50% 1, 75% 5, Max 10 [4].
- Marginal Adhesion: Reflects the degree to which cells adhere to each other.
  - Label Count: lowest bin 1.00 - 1.90 (393 counts); highest bin 9.10 - 10.00 (55 counts) [4].
  - Mean: 2.83, Std. Deviation: 2.86 [4].
  - Quantiles: Min 1, 25% 1, 50% 1, 75% 4, Max 10 [4].
- Single Epithelial Cell Size: Size of a single epithelial cell.
  - Label Count: lowest bin 1.00 - 1.90 (44 counts); highest bin 9.10 - 10.00 (31 counts) [4, 5].
  - Mean: 3.23, Std. Deviation: 2.22 [5].
  - Quantiles: Min 1, 25% 2, 50% 2, 75% 4, Max 10 [5].
- Bare Nuclei: Describes the presence of bare nuclei.
  - Label Count: lowest bin 1.00 - 1.90 (402 counts); highest bin 9.10 - 10.00 (132 counts) [5].
  - Mean: 3.54, Std. Deviation: 3.64 [5].
  - Quantiles: Min 1, 25% 1, 50% 1, 75% 6, Max 10 [5].
- Bland Chromatin: Refers to the chromatin's texture.
  - Label Count: lowest bin 1.00 - 1.90 (150 counts); highest bin 9.10 - 10.00 (20 counts) [6].
  - Mean: 3.45, Std. Deviation: 2.45 [6].
  - Quantiles: Min 1, 25% 2, 50% 3, 75% 5, Max 10 [6].
- Normal Nucleoli: Indicates the normality of nucleoli.
  - Label Count: lowest bin 1.00 - 1.90 (432 counts); highest bin 9.10 - 10.00 (60 counts) [6, 7].
  - Mean: 2.87, Std. Deviation: 3.05 [7].
  - Quantiles: Min 1, 25% 1, 50% 1, 75% 4, Max 10 [7].
- Mitoses: Scores the level of mitotic activity.
  - Label Count: lowest bin 1.00 - 1.90 (563 counts); highest bin 9.10 - 10.00 (14 counts) [7].
  - Mean: 1.6, Std. Deviation: 1.73 [7].
  - Quantiles: Min 1, 25% 1, 50% 1, 75% 1, Max 10 [7].
- Class: The dependent variable, indicating the diagnosis (2 = benign, 4 = malignant in the original Wisconsin coding).
  - Label Count: 2.00 - 2.20 (444 counts) and 3.80 - 4.00 (239 counts) [7].
  - Mean: 2.7, Std. Deviation: 0.95 [8].
  - Quantiles: Min 2, 25% 2, 50% 2, 75% 4, Max 4 [8].
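The per-column summaries above can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: `summarize` is a hypothetical helper, and it assumes the listing reports a sample standard deviation and linearly interpolated quantiles. As a check, it rebuilds the 'Class' column from the two label counts given above (444 records with value 2, 239 with value 4) rather than reading the real file.

```python
import statistics


def summarize(values):
    """Return count, mean, sample std and (min, 25%, 50%, 75%, max) for one column."""
    data = sorted(values)
    # "inclusive" treats the data as the full population range and
    # interpolates linearly between neighbouring values (an assumption
    # about how the listing computed its quantiles).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (assumed)
        "quantiles": (data[0], q1, median, q3, data[-1]),
    }


# 'Class' column reconstructed from the label counts in the listing.
class_column = [2] * 444 + [4] * 239
summary = summarize(class_column)
print(round(summary["mean"], 1), round(summary["std"], 2), summary["quantiles"])
```

After rounding, the result agrees with the listed Mean 2.7, Std. Deviation 0.95 and quantiles 2, 2, 2, 4, 4 for 'Class'. The other nine columns cannot be rebuilt this way because only their first and last bins are given.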
Distribution
The dataset is provided as a CSV file of 15.02 kB with 10 columns [2]. The original dataset contained 699 observations [1]; the cleaned version retains 683 valid observations in every column [2-8], 16 fewer than the original, and has no missing values [1].
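A quick way to confirm these figures after downloading the file is to count rows, columns and missing cells. The following is a minimal sketch, not part of the dataset: `check_table` and `MISSING_MARKERS` are hypothetical names, the set of missing-value markers is an assumption, and the two sample rows are made up purely to exercise the function.

```python
import csv
import io

# Assumed markers for a missing cell; adjust to match the actual file.
MISSING_MARKERS = {"", "?", "NA", "NaN"}


def check_table(csv_text, expected_columns=10):
    """Count columns, data rows and missing cells in CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_rows = 0
    n_missing = 0
    for row in reader:
        n_rows += 1
        n_missing += sum(1 for cell in row if cell.strip() in MISSING_MARKERS)
    return {
        "columns": len(header),
        "columns_ok": len(header) == expected_columns,
        "rows": n_rows,
        "missing": n_missing,
    }


# Made-up sample: header uses the column names from the listing (the real
# file's header may differ); the second row contains one "?" on purpose.
sample = (
    "Clump Thickness,Uniformity of Cell Size,Uniformity of Cell Shape,"
    "Marginal Adhesion,Single Epithelial Cell Size,Bare Nuclei,"
    "Bland Chromatin,Normal Nucleoli,Mitoses,Class\n"
    "5,1,1,1,2,1,3,1,1,2\n"
    "8,10,10,8,7,?,9,7,1,4\n"
)
report = check_table(sample)
print(report)
```

For the actual CSV, pass the file's text to `check_table`; according to the listing, the report should show 10 columns, 683 rows and 0 missing cells.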
Usage
This dataset is ideal for logistic regression analysis and is particularly suitable for classifying breast cancer as malignant or benign [1]. It can be used for developing and testing predictive models in medical diagnosis.
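To make the intended use concrete, here is a small, hand-rolled logistic regression trained by batch gradient descent. It is a sketch under stated assumptions, not a recommended pipeline: the function names are hypothetical, the 'Class' coding (2 = benign, 4 = malignant) follows the original Wisconsin dataset, the learning rate and epoch count are arbitrary, and the eight toy rows are invented for illustration and are not taken from the data. In practice a library implementation would normally be used instead.

```python
import math


def encode_class(value):
    """Map Class codes to 0/1 (assumed coding: 2 = benign, 4 = malignant)."""
    return {2: 0, 4: 1}[int(value)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated probability that row x is malignant."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def predict(weights, bias, x):
    """Classify as malignant (1) when the estimated probability is >= 0.5."""
    return 1 if predict_proba(weights, bias, x) >= 0.5 else 0


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Minimise mean log-loss with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Invented toy rows with two features on the dataset's 1-10 scale.
toy_scores = [(1, 1), (2, 1), (1, 2), (3, 1), (8, 7), (10, 9), (7, 10), (9, 8)]
toy_class = [2, 2, 2, 2, 4, 4, 4, 4]

X = [[s / 10.0 for s in row] for row in toy_scores]  # rescale to 0.1-1.0
y = [encode_class(c) for c in toy_class]

weights, bias = fit_logistic(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

Rescaling the 1-10 scores before fitting keeps the fixed learning rate stable. With the real data, the nine independent columns would replace the two toy features, and a held-out test split would be needed to judge the model fairly.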
Coverage
The dataset is derived from the Breast Cancer Wisconsin (Original) dataset and contains no missing values [1]. The provided sources give no details on geographic scope, time range, or demographics.
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for:
- Data scientists and machine learning practitioners developing classification models.
- Researchers in the field of oncology and medical diagnostics.
- Students learning about logistic regression and binary classification.
Dataset Name Suggestions
- Breast Cancer Classification Data
- Wisconsin Breast Cancer Prediction Dataset (Cleaned)
- Malignant/Benign Breast Cancer Data
- Logistic Regression Breast Cancer Dataset
Attributes
Original Data Source: Breast Cancer Classification Data