
Clinical Breast Cancer Dataset

Patient Health Records & Digital Health

Tags and Keywords

Cancer

Breast

Patient

Surgery

Medical


Free

About

This dataset contains real breast cancer patient data intended for healthcare and cancer data analysis. Its tabular structure makes it well suited to hypothesis testing and statistical analysis. It describes a group of breast cancer patients who have undergone tumour removal surgery. With a few hundred rows, it is a manageable starting point for beginners in data analysis.

Columns

  • Patient_ID: A unique identifier for each patient.
  • Age: The patient's age at diagnosis, expressed in years.
  • Gender: Indicates the patient's gender, recorded as Male or Female.
  • Protein1, Protein2, Protein3, Protein4: Expression levels of four distinct proteins. The units are not specified.
  • Tumour_Stage: The stage of the tumour, classified as I, II, or III.
  • Histology: Describes the histological type of the tumour, including Infiltrating Ductal Carcinoma, Infiltrating Lobular Carcinoma, and Mucinous Carcinoma.
  • ER status: The Oestrogen Receptor status of the tumour, recorded as Positive or Negative.
  • PR status: The Progesterone Receptor status of the tumour, recorded as Positive or Negative.
  • HER2 status: The Human Epidermal growth factor Receptor 2 status of the tumour, recorded as Positive or Negative.
  • Surgery_type: The type of surgery performed, such as Lumpectomy, Simple Mastectomy, Modified Radical Mastectomy, or Other.
  • Date_of_Surgery: The specific date on which the surgery was carried out, in DD-MON-YY format.
  • Date_of_Last_Visit: The date of the patient's last visit, in DD-MON-YY format. This field may be null if the patient did not visit again after the surgery.
  • Patient_Status: The patient's current status (Alive/Dead). This field may be null if the patient did not visit again after surgery and no further information is available regarding their status.
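
As a minimal sketch of how the columns above might be read, the following uses only the Python standard library. The two sample rows are synthetic and cover only a subset of the columns; the casing of values and the month abbreviation style (for example "15-Jan-17") are assumptions, since the listing only states the DD-MON-YY format. The helper names `parse_date` and `load_rows` are illustrative.

```python
import csv
import io
from datetime import datetime

# Synthetic sample in the described column layout (subset of columns).
# These rows are invented for illustration and are not real patient data.
SAMPLE_CSV = """Patient_ID,Age,Gender,Tumour_Stage,Surgery_type,Date_of_Surgery,Date_of_Last_Visit,Patient_Status
P001,52,FEMALE,II,Lumpectomy,15-Jan-17,19-Jun-17,Alive
P002,67,FEMALE,III,Other,26-Apr-17,,
"""


def parse_date(text):
    """Parse a DD-MON-YY string such as '15-Jan-17'; return None for blanks.

    Note: %b matches month abbreviations in the current locale, so this
    assumes an English (or C) locale.
    """
    text = (text or "").strip()
    if not text:
        return None
    return datetime.strptime(text, "%d-%b-%y").date()


def load_rows(handle):
    """Read rows from a CSV file handle, converting ages and dates."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "Patient_ID": raw["Patient_ID"],
            "Age": int(raw["Age"]),
            "Gender": raw["Gender"],
            "Tumour_Stage": raw["Tumour_Stage"],
            "Surgery_type": raw["Surgery_type"],
            "Date_of_Surgery": parse_date(raw["Date_of_Surgery"]),
            # Both fields below may be empty when there was no follow-up.
            "Date_of_Last_Visit": parse_date(raw["Date_of_Last_Visit"]),
            "Patient_Status": raw["Patient_Status"].strip() or None,
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["Patient_ID"], row["Date_of_Surgery"], row["Patient_Status"])
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, and the dictionary extended with the remaining columns.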

Distribution

The dataset is provided as a CSV file of tabular data. Patient_ID, Age, Gender, the four protein levels, Tumour_Stage, Histology, ER/PR/HER2 status, Surgery_type, and Date_of_Surgery each have 334 valid entries. Date_of_Last_Visit has 317 valid entries and Patient_Status has 321. Between 2% and 7% of values are missing in each of these columns, which implies a total of roughly 340 rows.
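
The valid and missing counts quoted above can be recomputed for any copy of the file. The sketch below assumes rows have already been read into dictionaries (for example with `csv.DictReader`) and treats empty strings and `None` as missing; the function name `missing_summary` and the four sample rows are made up for illustration.

```python
def missing_summary(rows, columns):
    """Count valid and missing values per column.

    A value counts as missing when it is None or blank after stripping.
    """
    total = len(rows)
    summary = {}
    for col in columns:
        valid = sum(1 for r in rows if (r.get(col) or "").strip())
        missing = total - valid
        pct = round(100 * missing / total, 1) if total else 0.0
        summary[col] = {"valid": valid, "missing": missing, "missing_pct": pct}
    return summary


# Synthetic rows, invented for illustration only.
sample_rows = [
    {"Patient_ID": "P001", "Date_of_Last_Visit": "19-Jun-17", "Patient_Status": "Alive"},
    {"Patient_ID": "P002", "Date_of_Last_Visit": "", "Patient_Status": ""},
    {"Patient_ID": "P003", "Date_of_Last_Visit": "09-Nov-18", "Patient_Status": "Dead"},
    {"Patient_ID": "P004", "Date_of_Last_Visit": "03-Feb-19", "Patient_Status": None},
]

summary = missing_summary(
    sample_rows, ["Patient_ID", "Date_of_Last_Visit", "Patient_Status"]
)
for column, stats in summary.items():
    print(column, stats)
```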

Usage

This dataset is ideally suited for:
  • Conducting hypothesis testing in medical research.
  • Performing statistical analysis related to breast cancer outcomes.
  • Healthcare data analysis and cancer data analysis.
  • Beginner-level data projects and educational purposes due to its accessible size and structure.
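
To illustrate the hypothesis-testing use case, here is a minimal sketch of a Pearson chi-square test of independence on a 2x2 table, such as a receptor status against Patient_Status. The counts are invented for illustration and are not taken from the dataset. The function `chi_square_2x2` is a hypothetical helper; it applies no continuity correction and assumes every row and column total is non-zero. The p-value uses the fact that a chi-square variable with one degree of freedom is a squared standard normal.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    table is ((a, b), (c, d)) of observed counts. Returns the statistic
    and the p-value for one degree of freedom (no Yates correction).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: rows are two status groups, columns are Alive / Dead.
observed = ((20, 10), (10, 20))
chi2, p_value = chi_square_2x2(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

With small expected counts, an exact test would usually be preferred over this approximation.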

Coverage

Surgical dates range from 15th January 2017 to 21st November 2019. Recorded dates of last visit run from 5th April 2017 to 24th September 2026; that upper bound is later than the listing date, so it most likely reflects a data-entry or date-parsing error and should be checked before use. Patients' ages at diagnosis range from 29 to 90 years, with a mean of 58.9 years. The cohort is predominantly female (97% of recorded genders). Date_of_Last_Visit and Patient_Status may be null where a patient had no follow-up visit or their status after surgery is unknown.
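
Date ranges like these are worth sanity-checking before analysis. The sketch below flags last-visit dates that precede the surgery date or fall after a chosen cutoff. The records are synthetic, the cutoff of 22 July 2025 is taken from the listing date shown on this page, and the function names are illustrative. Note that Python's `%y` maps two-digit years 00 to 68 onto 2000 to 2068, so a value such as "26" is read as 2026.

```python
from datetime import date, datetime


def parse_dd_mon_yy(text):
    """Parse 'DD-MON-YY' (e.g. '05-Apr-17'); return None for blank values."""
    text = (text or "").strip()
    if not text:
        return None
    return datetime.strptime(text, "%d-%b-%y").date()


def flag_suspect_visits(records, cutoff):
    """Return (patient_id, reason) pairs for implausible last-visit dates.

    records is an iterable of (patient_id, surgery_text, last_visit_text).
    Rows without a last-visit date are skipped rather than flagged.
    """
    flagged = []
    for patient_id, surgery_text, visit_text in records:
        surgery = parse_dd_mon_yy(surgery_text)
        visit = parse_dd_mon_yy(visit_text)
        if visit is None:
            continue
        if surgery is not None and visit < surgery:
            flagged.append((patient_id, "last visit before surgery"))
        elif visit > cutoff:
            flagged.append((patient_id, "last visit after cutoff"))
    return flagged


# Synthetic records, invented for illustration only.
records = [
    ("P1", "15-Jan-17", "05-Apr-17"),
    ("P2", "21-Nov-19", "24-Sep-26"),
    ("P3", "10-Mar-18", ""),
    ("P4", "10-Mar-18", "01-Jan-18"),
]

flagged = flag_suspect_visits(records, cutoff=date(2025, 7, 22))
for patient_id, reason in flagged:
    print(patient_id, reason)
```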

License

CC0: Public Domain

Who Can Use It

  • Healthcare researchers and medical professionals for studying patient outcomes and characteristics.
  • Data scientists and analysts for developing predictive models or conducting statistical inquiries.
  • Oncology specialists for understanding various aspects of breast cancer and treatment.
  • Students and educators in data science, statistics, or health informatics as a practical learning dataset.

Dataset Name Suggestions

  • Breast Cancer Patient Records
  • Oncology Surgical Outcomes Data
  • Patient Breast Tumour Characteristics
  • Clinical Breast Cancer Dataset

Attributes

Original Data Source: Clinical Breast Cancer Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 22/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
