Breast Cell Morphological Attribute Set
Patient Health Records & Digital Health
Free
About
The Breast Cancer Dataset contains detailed information concerning individuals diagnosed with breast cancer. This collection focuses on capturing essential demographic details, specific tumor characteristics—including dimensions, structure, and position—and corresponding clinical results. This resource is vital for medical study, supporting the development of successful predictive modelling and therapeutic protocols aimed at advancing breast cancer treatment and improving patient recovery.
Columns
The dataset is structured with 32 distinct columns. Key variables detailing the characteristics of the cell nuclei include:
- id: The unique identification number assigned to each patient record.
- diagnosis: The primary classification, indicating whether the tumor is malignant (M) or benign (B).
- radius_mean: The average measure of the tumor's radius.
- texture_mean: The average measure of the tumor's surface texture.
- perimeter_mean: The average length of the tumor's boundary.
- area_mean: The average calculated surface area of the tumor.
- smoothness_mean: The average measure of the local variation in radius lengths.
- compactness_mean: The average measure of the tumor's compactness.
- concavity_mean: The average severity of concave portions of the contour.
- concave points_mean: The average number of concave portions found on the tumor contour.
- symmetry_mean: The average measure of the tumor's symmetry.
- fractal_dimension_mean: The average calculation of the tumor's fractal dimension.
- Note: Standard error (_se) and "worst" or largest (_worst) values are also provided for these morphological attributes.
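The naming scheme above can be sketched in a few lines of Python. This is a minimal sketch, not the publisher's schema: it assumes the `_se` and `_worst` columns reuse the same base names as the `_mean` columns (for example `radius_se` or `concave points_worst`), which together with `id` and `diagnosis` gives the stated 32 columns. The helper names `expected_columns` and `check_header` are hypothetical, and the grouping order of the feature columns is an assumption, so the check ignores order.

```python
import csv

# The ten base attributes named in the column list above.
BASE_ATTRIBUTES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]

# Assumption: every base attribute appears once per suffix.
SUFFIXES = ["_mean", "_se", "_worst"]


def expected_columns():
    """Return the 32 assumed column names: id, diagnosis, then 30 features."""
    features = [base + suffix for suffix in SUFFIXES for base in BASE_ATTRIBUTES]
    return ["id", "diagnosis"] + features


def check_header(path="data.csv"):
    """Compare the CSV header with the assumed names, ignoring column order."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle))
    expected = set(expected_columns())
    return {
        "missing": sorted(expected - set(header)),
        "unexpected": sorted(set(header) - expected),
    }
```

Calling `check_header()` next to the downloaded file returns two lists. If both are empty, the header matches the assumed naming scheme. Anything listed under "unexpected" is worth inspecting before analysis.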
Distribution
The data resides in a single CSV file named data.csv, which has a file size of 125.2 kB. The collection contains 569 valid records across all statistical fields. Data integrity is high, with zero reported mismatched or missing values for these fields (100% validity). The distribution of diagnoses shows that 63% of records are classified as benign (B), while 37% are classified as malignant (M). The dataset is not expected to be updated.
Usage
This data product is suited for a variety of high-impact analytical and research goals:
- Building and testing machine learning models focused on binary classification for malignancy prediction based on cell morphology.
- Conducting statistical research to understand the relationships between specific tumor metrics (such as mean area or texture) and clinical outcomes.
- Refining existing clinical guidelines and treatment procedures for breast cancer management.
- Serving as a benchmark dataset for validating novel algorithms used in medical diagnostics.
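For the classification use case, the class split reported under Distribution is a useful reference point. A model that always predicts benign would already be correct for roughly 63% of records, so accuracy is only meaningful relative to that figure. The sketch below recomputes the split and the majority-class baseline from the file using the Python standard library. The function names are hypothetical, and it assumes the label column is named `diagnosis` as listed under Columns.

```python
import csv
from collections import Counter


def read_labels(path="data.csv", label_column="diagnosis"):
    """Read the diagnosis label (M or B) from every row of the CSV file."""
    with open(path, newline="") as handle:
        return [row[label_column] for row in csv.DictReader(handle)]


def class_shares(labels):
    """Return a dict mapping each label to its share of all records."""
    if not labels:
        raise ValueError("no labels to summarise")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    return max(class_shares(labels).values())
```

A typical call would be `majority_baseline_accuracy(read_labels())`. Because the classes are imbalanced, it is usually worth reporting recall on the malignant class alongside accuracy when comparing models.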
Coverage
The dataset focuses primarily on the detailed morphological and clinical attributes of breast cancer tumors, alongside fundamental patient identifying data. It includes measurements crucial for detailed pathological analysis. Specific geographical origins or defined time ranges for data collection are not detailed.
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Machine Learning Engineers: Ideal for training classification algorithms, feature selection, and model tuning in a medical context.
- Oncology Researchers: To investigate early markers of aggressive disease or validate diagnostic hypotheses using quantitative data.
- Biostatisticians: For performing statistical modelling and deriving key metrics related to tumor progression.
- Academics and Students: An excellent, well-structured dataset for educational projects in healthcare analytics and artificial intelligence.
Dataset Name Suggestions
- Malignancy Diagnostic Prediction Data
- Clinical Tumor Biometric Records
- Breast Cell Morphological Attribute Set
- Patient Outcome and Tumor Metric Data
Attributes
Original Data Source: Breast Cell Morphological Attribute Set