Synthetic Pancreatic Cancer Patient Records Dataset
£19.99
About
The Synthetic Pancreatic Cancer Patient Records Dataset was developed for educational and research use, to support the analysis of clinical biomarkers, demographic indicators, and diagnostic patterns associated with pancreatic cancer. The dataset is fully synthetic and anonymised. It is modelled on realistic patient cohorts and laboratory measurements, which makes it suitable for exploring early-detection markers and cancer progression.
Dataset Features
- Patient Cohort: Group label identifying the patient subset (e.g., Cohort1).
- Sample Origin: Tissue source of the sample (e.g., LIV = Liver, BPTB = Biopsy via Transbronchial Needle).
- Age: Patient age in years.
- Sex: Biological sex (Male/Female).
- Diagnosis: Coded disease classification (e.g., 2 = Early Cancer, 3 = Advanced Cancer).
- Stage: Cancer stage based on TNM classification (e.g., IIA, IIB, III), available for a subset of patients.
- Benign Sample Diagnosis: Non-malignant diagnosis (e.g., pancreatitis), available for benign cases only.
- Plasma CA19-9: Plasma level of carbohydrate antigen 19-9 (CA19-9), an established pancreatic cancer biomarker.
- Creatinine: Renal function marker measured in plasma.
- LYVE1: Lymphatic vessel endothelial hyaluronan receptor 1 concentration, a candidate biomarker.
- REG1B: Regenerating islet-derived protein 1-beta concentration.
- TFF1: Trefoil factor 1 concentration, implicated in mucosal healing and carcinogenesis.
- REG1A: Regenerating islet-derived protein 1-alpha concentration.
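The feature list above maps naturally onto a typed record. The following is a minimal sketch, assuming the data ships as a CSV file whose header uses snake_case names such as `plasma_CA19_9` and `benign_sample_diagnosis`; these column names, and the choice of integer ages, are hypothetical, since the listing does not specify them. Empty cells are read as missing (`None`), because fields such as Stage and Benign Sample Diagnosis are only populated for some rows.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PatientRecord:
    # Hypothetical field names; adjust them to match the actual CSV header.
    patient_cohort: str
    sample_origin: str
    age: int
    sex: str
    diagnosis: int
    stage: Optional[str]                    # only present for a subset of patients
    benign_sample_diagnosis: Optional[str]  # only present for benign cases
    plasma_CA19_9: Optional[float]
    creatinine: Optional[float]
    LYVE1: Optional[float]
    REG1B: Optional[float]
    TFF1: Optional[float]
    REG1A: Optional[float]


def _opt_str(value: str) -> Optional[str]:
    """Treat empty or whitespace-only cells as missing."""
    value = value.strip()
    return value or None


def _opt_float(value: str) -> Optional[float]:
    """Parse a numeric cell, returning None when the cell is empty."""
    value = value.strip()
    return float(value) if value else None


def parse_row(row: dict) -> PatientRecord:
    """Convert one csv.DictReader row (all strings) into a typed record."""
    return PatientRecord(
        patient_cohort=row["patient_cohort"],
        sample_origin=row["sample_origin"],
        age=int(row["age"]),
        sex=row["sex"],
        diagnosis=int(row["diagnosis"]),
        stage=_opt_str(row["stage"]),
        benign_sample_diagnosis=_opt_str(row["benign_sample_diagnosis"]),
        plasma_CA19_9=_opt_float(row["plasma_CA19_9"]),
        creatinine=_opt_float(row["creatinine"]),
        LYVE1=_opt_float(row["LYVE1"]),
        REG1B=_opt_float(row["REG1B"]),
        TFF1=_opt_float(row["TFF1"]),
        REG1A=_opt_float(row["REG1A"]),
    )


def load_records(lines: Iterable[str]) -> List[PatientRecord]:
    """Read CSV text (an open file or any iterable of lines) into records."""
    return [parse_row(row) for row in csv.DictReader(lines)]
```

With a real file, pass the open file object, for example `load_records(open(path, newline=""))`, where `path` points to wherever the CSV was saved.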
Usage
This dataset can be used for the following applications:
- Cancer Research: Investigate how biomarkers like CA19-9, REG1A/B, and LYVE1 correlate with pancreatic cancer stages and progression.
- Predictive Modelling: Train models to classify disease stage or malignancy, or to predict biomarker levels.
- Clinical Insight: Study the diagnostic overlap between benign and malignant samples and identify early diagnostic signals.
- Educational Purposes: Provide a practical dataset for teaching biomedical data analysis, feature engineering, and model evaluation.
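A simple starting point for the predictive-modelling and clinical-insight uses above is to score a single-marker threshold rule by its sensitivity and specificity. The sketch below assumes binary labels (malignant vs. not malignant) have already been derived from the Diagnosis codes, and that CA19-9 is reported in U/mL; the default cut-off of 37 U/mL is the commonly cited upper reference limit for CA19-9 and is used here only as an illustrative value. The function name `threshold_metrics` is hypothetical. Rows with a missing marker value are skipped rather than imputed.

```python
from typing import Iterable, Optional, Tuple


def threshold_metrics(
    values: Iterable[Optional[float]],
    labels: Iterable[bool],
    cutoff: float = 37.0,  # illustrative CA19-9 cut-off; assumes U/mL units
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of the rule `value > cutoff`.

    `labels` is True for malignant samples. Pairs whose value is None
    (a missing measurement) are ignored.
    """
    tp = fn = tn = fp = 0
    for value, is_malignant in zip(values, labels):
        if value is None:
            continue  # skip missing measurements
        predicted_malignant = value > cutoff
        if is_malignant:
            if predicted_malignant:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_malignant:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one non-malignant sample")
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping `cutoff` over a range of values and plotting the resulting pairs gives a ROC curve, which is a natural next step before moving to multi-marker models that combine CA19-9 with LYVE1, REG1A/B, and TFF1.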
Coverage
This dataset is entirely synthetic and anonymised, modelled to reflect the real-world complexity of pancreatic disease diagnosis. It supports both classification and regression tasks and combines numerical and categorical fields, some of which contain missing values, giving a realistic preprocessing experience.
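Because some fields are only partially populated, a common preprocessing step is to impute numeric gaps while keeping a flag that records where values were missing. The sketch below uses median imputation with a missingness indicator; this is one common choice among several (model-based imputation is another) and is not a method prescribed by the dataset. The function name `impute_median` is hypothetical. It works on a plain list of optional floats, so it can be applied column by column, and it accepts a precomputed fill value so that a median fitted on the training split can be reused on held-out data without leakage.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def impute_median(
    column: Sequence[Optional[float]],
    fill_value: Optional[float] = None,
) -> Tuple[List[float], List[bool], float]:
    """Replace None entries with a fill value and flag which entries were missing.

    If `fill_value` is None, the median of the observed values is used; pass the
    median fitted on a training split when transforming held-out data.
    Returns (imputed_column, was_missing_flags, fill_value_used).
    """
    if fill_value is None:
        observed = [v for v in column if v is not None]
        if not observed:
            raise ValueError("column has no observed values to compute a median")
        fill_value = statistics.median(observed)
    imputed = [fill_value if v is None else v for v in column]
    was_missing = [v is None for v in column]
    return imputed, was_missing, fill_value
```

Keeping the `was_missing` flags as an extra feature lets a model learn whether missingness itself is informative, which can matter when a field is recorded only for certain patient groups.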
License
CC0 (Public Domain)
Who Can Use It
- Medical Researchers and Oncologists: To explore the diagnostic utility of biomarker combinations.
- Data Scientists: To develop and test robust models for early detection and cancer classification.
- Healthcare Educators and Students: As a resource for practical instruction in oncology data science and medical data handling.