Opendatabay

Synthetic Colorectal Cancer Global Dataset

Patient Health Records & Digital Health

Tags and Keywords

Colorectal

Cancer

Data

Synthetic

AI

ML

LLM

Dataset



£2.49

About

The Synthetic Colorectal Cancer Global Dataset is a fully anonymized, high-dimensional synthetic dataset designed for global cancer research, predictive modeling, and educational use. It covers demographic, clinical, lifestyle, genetic, and healthcare-access factors relevant to colorectal cancer incidence, outcomes, and survival.

Dataset Features

  • Patient_ID: Unique identifier for each patient.
  • Country: Patient's country of residence.
  • Age: Age at diagnosis (in years).
  • Gender: Biological sex of the patient (Male/Female/Other).
  • Cancer_Stage: Stage of colorectal cancer at diagnosis (e.g., Stage I–IV).
  • Tumor_Size_mm: Size of the tumor in millimeters.
  • Family_History: Presence of colorectal cancer in family history (True/False).
  • Smoking_History: Smoking behavior or history (e.g., Current, Former, Never).
  • Alcohol_Consumption: Level of alcohol consumption (e.g., High, Moderate, None).
  • Obesity_BMI: Obesity classification based on BMI.
  • Diet_Risk: Diet-related cancer risk (e.g., High Fat, Low Fiber).
  • Physical_Activity: Level of physical activity (e.g., Sedentary, Active).
  • Diabetes: Diabetes diagnosis (True/False).
  • Inflammatory_Bowel_Disease: Presence of IBD (True/False).
  • Genetic_Mutation: Genetic mutations relevant to colorectal cancer (e.g., APC, KRAS).
  • Screening_History: History of cancer screenings (True/False).
  • Early_Detection: Whether cancer was detected early (True/False).
  • Treatment_Type: Primary treatment type (e.g., Surgery, Chemotherapy, Radiation).
  • Survival_5_years: 5-year survival status (True/False).
  • Mortality: Mortality outcome (Alive/Deceased).
  • Healthcare_Costs: Estimated treatment costs (in USD).
  • Incidence_Rate_per_100K: Country-level incidence rate per 100,000 people.
  • Mortality_Rate_per_100K: Country-level mortality rate per 100,000 people.
  • Urban_or_Rural: Patient's living area (Urban/Rural).
  • Economic_Classification: Country's economic level (e.g., Low, Middle, High income).
  • Healthcare_Access: Access level to healthcare services (e.g., Good, Limited).
  • Insurance_Status: Insurance coverage status (Insured/Uninsured).
  • Survival_Prediction: Model-derived survival prediction (probability or binary).
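The feature list maps naturally onto a typed loader. The sketch below uses only the Python standard library and rests on assumptions this listing does not confirm: that the data are distributed as a CSV file whose header row uses the column names above, that the boolean columns are written as the strings True/False, and that Age is stored as a whole number. The helper names `parse_bool`, `parse_row`, and `load_records` are hypothetical.

```python
import csv
import io

# Assumption: boolean columns are stored as the strings "True"/"False",
# matching the (True/False) notation in the feature list.
BOOL_COLUMNS = (
    "Family_History", "Diabetes", "Inflammatory_Bowel_Disease",
    "Screening_History", "Early_Detection", "Survival_5_years",
)
INT_COLUMNS = ("Age",)  # assumption: ages are whole numbers
FLOAT_COLUMNS = (
    "Tumor_Size_mm", "Healthcare_Costs",
    "Incidence_Rate_per_100K", "Mortality_Rate_per_100K",
)


def parse_bool(value: str) -> bool:
    """Convert a "True"/"False" string (any case) to a bool; reject anything else."""
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"unexpected boolean value: {value!r}")


def parse_row(row: dict) -> dict:
    """Return a copy of one CSV row with boolean and numeric columns converted.

    Other columns (e.g. Country, Cancer_Stage) stay as strings. Survival_Prediction
    is also left as text because it may be either a probability or a binary label.
    """
    typed = dict(row)
    for name in BOOL_COLUMNS:
        if name in row:
            typed[name] = parse_bool(row[name])
    for name in INT_COLUMNS:
        if name in row:
            typed[name] = int(row[name])
    for name in FLOAT_COLUMNS:
        if name in row:
            typed[name] = float(row[name])
    return typed


def load_records(text: str) -> list:
    """Parse CSV text (header row first) into a list of typed dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [parse_row(row) for row in reader]
```

Failing loudly on an unexpected boolean value, rather than silently treating it as False, surfaces an encoding mismatch (for example Yes/No) the first time a file is loaded. With a hypothetical local copy named `colorectal_cancer.csv`, the records could be read with `load_records(open("colorectal_cancer.csv").read())`.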

Distribution

Figure: Synthetic Colorectal Cancer Global Data Distribution

Usage

This dataset can be used for:
  • Global Cancer Research: Analyze how clinical, lifestyle, and socioeconomic factors affect colorectal cancer outcomes worldwide.
  • Predictive Modeling: Develop models to estimate survival probability or treatment outcomes.
  • Healthcare Policy Analysis: Study disparities in healthcare access and outcomes across countries.
  • Educational Use: Support training in epidemiology, oncology, public health, and machine learning.
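For the research and policy uses above, a common first step is to compare an outcome across groups. The sketch below, in standard-library Python, computes the share of patients whose `Survival_5_years` value is True within each value of a chosen column such as `Healthcare_Access` or `Cancer_Stage`. The function name `survival_rate_by` is hypothetical. It assumes each record is a dict whose outcome column has already been parsed to a bool. The toy records are invented for illustration and are not drawn from the dataset.

```python
from collections import defaultdict


def survival_rate_by(records, factor, outcome="Survival_5_years"):
    """Share of records with a True outcome, grouped by the values of `factor`.

    Assumes each record is a dict whose `outcome` value is already a bool.
    Returns {factor_value: (rate, group_size)}.
    """
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for record in records:
        key = record[factor]
        totals[key] += 1
        if record[outcome]:
            survivors[key] += 1
    return {key: (survivors[key] / totals[key], totals[key]) for key in totals}


if __name__ == "__main__":
    # Hypothetical toy records, for illustration only (not from the dataset).
    toy = [
        {"Healthcare_Access": "Good", "Survival_5_years": True},
        {"Healthcare_Access": "Good", "Survival_5_years": True},
        {"Healthcare_Access": "Good", "Survival_5_years": False},
        {"Healthcare_Access": "Limited", "Survival_5_years": False},
        {"Healthcare_Access": "Limited", "Survival_5_years": True},
    ]
    for level, (rate, n) in sorted(survival_rate_by(toy, "Healthcare_Access").items()):
        print(f"{level}: {rate:.2f} (n={n})")
```

Returning the group size alongside each rate keeps small groups visible. A raw per-group rate is descriptive only. A gap between Good and Limited access may partly reflect other columns such as Cancer_Stage, so a predictive or policy analysis would typically stratify by, or jointly model, those factors.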

Coverage

The dataset consists entirely of synthetic yet clinically plausible records spanning diverse countries and demographic groups. It is anonymized and modeled to reflect real-world variability in risk factors, diagnosis stages, treatment, and survival without compromising patient privacy.

License

CC0 (Public Domain)

Who Can Use It

  • Epidemiologists and Medical Researchers: To explore global patterns in colorectal cancer.
  • Public Health Experts and Policymakers: For assessing equity in healthcare access and cancer outcomes.
  • Data Scientists and Educators: As a rich dataset for teaching data analysis, classification, regression, and health informatics.

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

27/06/2025

REGION

GLOBAL

QUALITY (Universal Data Quality Score, UDQS)

5 / 5

VERSION

1.0
