Cancer Visual Characteristics Dataset
Clinical Trials & Research
Free
About
This dataset provides characteristics of patients diagnosed with cancer, offering a valuable resource for developing and evaluating models aimed at cancer diagnosis [1, 2]. It is designed to assist in understanding and analysing visual features of cancer to improve diagnostic methods [2].
Columns
The dataset includes the following columns:
- id: A unique identifier for each patient [1].
- diagnosis: Indicates the diagnosis, with 'M' for Malignant and 'B' for Benign [1].
- radius_mean: The mean value of the cancer's radius [3, 4].
- texture_mean: The mean value of the cancer's texture [3, 5].
- perimeter_mean: The mean value of the cancer's perimeter [3, 5].
- area_mean: The mean value of the cancer's area [3, 6].
- smoothness_mean: The mean value of the cancer's smoothness [3, 7].
- compactness_mean: The mean value of the cancer's compactness [3, 7].
- concavity_mean: The mean value of the cancer's concavity [3, 8].
- concave points_mean: The mean value of the cancer's concave points [3, 9].
- symmetry_mean: The mean value of the cancer's symmetry [9].
- fractal_dimension_mean: The mean value of the cancer's fractal dimension [10].
- radius_se: The standard error of the cancer's radius [11].
- texture_se: The standard error of the cancer's texture [11].
- perimeter_se: The standard error of the cancer's perimeter [12].
- area_se: The standard error of the cancer's area [13].
- smoothness_se: The standard error of the cancer's smoothness [13].
- compactness_se: The standard error of the cancer's compactness [14].
- concavity_se: The standard error of the cancer's concavity [14].
- concave points_se: The standard error of the cancer's concave points [15].
- symmetry_se: The standard error of the cancer's symmetry [15].
- fractal_dimension_se: The standard error of the cancer's fractal dimension [16].
- radius_worst: The "worst" or largest mean value of the cancer's radius [16].
- texture_worst: The "worst" or largest mean value of the cancer's texture [17].
- perimeter_worst: The "worst" or largest mean value of the cancer's perimeter [18].
- area_worst: The "worst" or largest mean value of the cancer's area [18].
- smoothness_worst: The "worst" or largest mean value of the cancer's smoothness [19].
- compactness_worst: The "worst" or largest mean value of the cancer's compactness [20].
- concavity_worst: The "worst" or largest mean value of the cancer's concavity [20].
- concave points_worst: The "worst" or largest mean value of the cancer's concave points [21].
- symmetry_worst: The "worst" or largest mean value of the cancer's symmetry [21].
- fractal_dimension_worst: The "worst" or largest mean value of the cancer's fractal dimension [22].
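As a rough illustration of how this column layout might be handled in code, the sketch below reads rows with Python's standard csv module, maps the diagnosis column to 1 (Malignant) or 0 (Benign), and groups feature columns by their _mean, _se, and _worst suffixes. The helper names load_rows and group_by_suffix are hypothetical, the sample values are made up for illustration, and the code assumes every column other than id and diagnosis holds a numeric value.

```python
import csv
import io

# Made-up sample rows using a few of the documented column names.
# The values are illustrative only and are NOT taken from Cancer_Data.csv.
SAMPLE = """id,diagnosis,radius_mean,radius_se,radius_worst
1,M,18.0,1.1,25.0
2,B,12.5,0.4,14.0
3,B,11.0,0.3,12.5
"""

LABELS = {"M": 1, "B": 0}  # 'M' = Malignant, 'B' = Benign


def load_rows(handle):
    """Read rows, encode the diagnosis as 1/0, and cast features to float."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"id": raw["id"], "label": LABELS[raw["diagnosis"]]}
        for name, value in raw.items():
            if name not in ("id", "diagnosis"):
                row[name] = float(value)  # assumes numeric feature columns
        rows.append(row)
    return rows


def group_by_suffix(columns):
    """Group feature columns by their _mean / _se / _worst suffix."""
    groups = {"mean": [], "se": [], "worst": []}
    for name in columns:
        _, _, suffix = name.rpartition("_")
        if suffix in groups:
            groups[suffix].append(name)
    return groups


rows = load_rows(io.StringIO(SAMPLE))
groups = group_by_suffix(["radius_mean", "radius_se", "radius_worst"])
```

To try this on the real file, the in-memory sample could be replaced with open("Cancer_Data.csv", newline=""), provided the file matches the column description above.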
Distribution
The dataset is provided as a CSV file named Cancer_Data.csv [23]. It has a size of 125.2 kB and contains 32 columns [23]. There are 569 valid records within the dataset [4].
Usage
This dataset is ideal for training or testing machine learning models and algorithms used to make cancer diagnoses [2]. Specific use cases include:
- Binary classification problems, such as predicting cancer type (Malignant or Benign) using visual features [24].
- Applying K-Nearest Neighbors (KNN) to diagnose cancer by considering neighbourhood relationships among patients with similar characteristics [24].
- Utilising Support Vector Machines (SVM) for classification tasks, particularly in two-class problems, by focusing on the clear separation of cancer types [25].
- Applying Logistic Regression to predict cancer type (Malignant or Benign) [24].
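To make the K-Nearest Neighbors use case more concrete, here is a minimal sketch in plain Python. It standardises features using training statistics and then predicts a diagnosis by majority vote among the k closest training points. The function names standardize and knn_predict are hypothetical, the (radius_mean, area_mean) pairs are made up rather than drawn from the dataset, and a library such as scikit-learn would normally be preferred for real work.

```python
import math
from collections import Counter


def standardize(train, test):
    """Scale features to zero mean and unit variance using training statistics."""
    n_features = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant features

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n_features)] for r in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up (radius_mean, area_mean) pairs; NOT real records from the dataset.
train_x = [[18.0, 1000.0], [20.5, 1300.0], [19.0, 1100.0],
           [12.0, 450.0], [11.5, 400.0], [13.0, 520.0]]
train_y = ["M", "M", "M", "B", "B", "B"]
test_x = [[19.5, 1200.0], [12.2, 460.0]]

train_s, test_s = standardize(train_x, test_x)
predictions = [knn_predict(train_s, train_y, q, k=3) for q in test_s]
print(predictions)
```

Standardising first matters here because columns such as area_mean span a much larger numeric range than radius_mean, and an unscaled Euclidean distance would be dominated by the larger column.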
Coverage
The sources do not provide explicit details regarding the geographic, temporal, or demographic scope of the patient data included in this dataset.
License
CC BY-NC-SA 4.0
Who Can Use It
This dataset is suitable for:
- Machine learning practitioners and researchers who aim to train or test algorithms for medical diagnosis [2].
- Data scientists looking to apply classification techniques like Logistic Regression, KNN, or SVM to real-world medical data [24, 25].
- Students and educators for educational purposes, exploring data analysis and machine learning in a healthcare context [26].
- Anyone interested in analysing cancer-related visual features to improve diagnostic accuracy [2].
Dataset Name Suggestions
- Breast Cancer Diagnostic Features
- Breast Cancer Classification Data
- Cancer Visual Characteristics Dataset
- Medical Image Feature Analysis for Cancer
- Wisconsin Breast Cancer (Diagnostic) Data
Attributes
Original Data Source: Cancer Visual Characteristics Dataset