ULTIMATE EHR + Genomics + PK/PD + Multi-Omics Supersuite – 1.6M Records
£89
About
This is a comprehensive synthetic clinical data package combining EHR, genomics, epigenetics, pharmacokinetics/pharmacodynamics (PK/PD), and wearable health data in a single integrated suite. With more than 1.6 million records across 6 interconnected tables and 10,000 synthetic patients, it supports pharma R&D, biotech AI development, precision medicine research, and multi-omics modeling without any real patient data or privacy risk.
Dataset Features
patients.csv (10,000 patients):
patient_id: Unique synthetic patient identifier linking all tables.
country: Patient’s country (USA, UK, Germany, Hungary, China, etc.).
birth_year: Year of birth (1940–2005 range for diverse age groups).
sex: Biological sex (Male, Female, Other).
ethnicity: Ethnic background (Caucasian, Asian, African, Hispanic, Mixed).
income_usd: Annual income in USD.
encounters.csv (40,000 clinical encounters):
patient_id: Linked to patients table.
encounter_date: Date of clinical visit (2015–2024).
icd10: Primary diagnosis code (E11, I10, E78, J45, etc.).
severity: Disease severity (mild, moderate, severe).
genomics.csv (1,579,000 SNP records):
patient_id: Linked to patients table.
snp_id: Single nucleotide polymorphism identifier.
genotype: Allele combination (AA, AG, GG, CT, TT).
risk_score: Polygenic risk score for disease prediction.
pkpd.csv (2,306,000 drug records):
patient_id: Linked to patients table.
drug: Drug name or ATC code.
dose_mg: Prescribed dose in milligrams.
clearance_l_h: Drug clearance rate (liters per hour).
half_life_h: Drug elimination half-life in hours.
epigenetics.csv (799,868 epigenetic markers):
patient_id: Linked to patients table.
methylation_site: DNA methylation site identifier.
level: Methylation level (0–1 scale).
histone_modification: Type of histone modification (H3K4me3, H3K27ac, H3K9me3).
wearables.csv (1,647,000 daily records):
patient_id: Linked to patients table.
date: Measurement date.
hr_bpm: Heart rate in beats per minute.
steps: Daily step count.
sleep_hours: Hours of sleep per day.
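Because every table carries patient_id, demographics can be attached to rows of any other table with a plain key lookup. The following is a minimal sketch using only the Python standard library. It uses two tiny inline samples in place of patients.csv and encounters.csv. The sample rows, the patient_id format (P0001), and the ISO date format are assumptions made for illustration; only the column names come from the field list.

```python
import csv
import io

# Made-up sample rows; only the column names follow the listing.
PATIENTS_CSV = """patient_id,country,birth_year,sex,ethnicity,income_usd
P0001,UK,1980,Female,Mixed,42000
P0002,Hungary,1955,Male,Caucasian,31000
"""

ENCOUNTERS_CSV = """patient_id,encounter_date,icd10,severity
P0001,2019-03-14,E11,moderate
P0001,2021-07-02,I10,mild
P0002,2016-11-20,J45,severe
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_patient_id(patients, rows):
    """Attach patient columns to each row that shares its patient_id."""
    by_id = {p["patient_id"]: p for p in patients}
    joined = []
    for row in rows:
        patient = by_id.get(row["patient_id"])
        if patient is not None:  # skip rows with no matching patient
            joined.append({**patient, **row})
    return joined


patients = read_rows(PATIENTS_CSV)
encounters = read_rows(ENCOUNTERS_CSV)
joined = join_on_patient_id(patients, encounters)

for rec in joined:
    print(rec["patient_id"], rec["country"], rec["icd10"], rec["severity"])
```

The same helper should work for genomics, pkpd, epigenetics, or wearables rows, since each of those tables is listed with a patient_id column. Note that csv.DictReader returns every value as a string, so numeric columns such as birth_year need explicit conversion before arithmetic.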
Distribution
• Data format: 6 CSV files packaged in a single ZIP archive.
• Data volume: More than 1.6 million records across 6 tables, comprising:
• patients.csv: 10,000 synthetic patients
• encounters.csv: 40,000 clinical encounters
• genomics.csv: 1,579,000 SNP records
• pkpd.csv: 2,306,000 drug records (pharmacokinetic/pharmacodynamic data)
• epigenetics.csv: 799,868 epigenetic markers
• wearables.csv: 1,647,000 daily wearable health data points
• Structure: Relational data model in which patient_id acts as the primary key linking all tables, enabling complex, multidimensional analysis.
• Data volume details: Number of rows/records in each file, number of columns, and the linkability of the tables via the patient identifier.
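After download, one quick sanity check is to count the rows of each CSV inside the ZIP and compare them with the figures listed here. The sketch below is one way to do that with the standard library. It assumes UTF-8 files with a single header row, which the listing does not state. The archive is built in memory from made-up rows so the example runs without the real download.

```python
import csv
import io
import zipfile


def count_rows(archive, member):
    """Count data rows (excluding the header) of one CSV inside a ZIP."""
    with archive.open(member) as raw:
        reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def table_sizes(zip_source):
    """Return {member name: row count} for every .csv file in the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return {
            name: count_rows(archive, name)
            for name in archive.namelist()
            if name.endswith(".csv")
        }


# Stand-in archive built in memory with made-up rows.
# With the real download, pass the path of the ZIP file instead.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("patients.csv", "patient_id,country\nP0001,UK\nP0002,USA\n")
    zf.writestr("encounters.csv", "patient_id,icd10\nP0001,E11\n")
demo.seek(0)

sizes = table_sizes(demo)
print(sizes)
```

Rows are streamed rather than loaded at once, which matters for the larger tables such as pkpd.csv and wearables.csv.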
Usage
This dataset is suited to a range of applications:
Application: Precision medicine AI training – predicting disease risk, drug response, and treatment outcomes using integrated genomic, clinical, and lifestyle data.
Application: Pharma R&D and clinical trial simulation – modeling patient populations, dosing strategies, PK/PD relationships, and adverse event prediction.
Application: Multi-omics research – studying gene-environment-drug interactions, epigenetic modifications, and personalized health monitoring.
Application: Healthcare AI platform development – building and testing EHR analytics, risk stratification, and clinical decision support systems with realistic synthetic data.
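As an illustration of the PK/PD use case, the pkpd.csv columns dose_mg, clearance_l_h, and half_life_h are enough to sketch a concentration curve under a one-compartment model with first-order elimination after a single IV bolus. That model is an assumption chosen for this example; the listing does not say how the synthetic values were generated or which model they follow. Under it, the elimination rate constant is k = ln(2) / half-life, the apparent volume of distribution is Vd = CL / k, and the concentration is C(t) = (dose / Vd) · exp(−k·t).

```python
import math


def elimination_rate(half_life_h):
    """First-order elimination rate constant k (1/h) from the half-life."""
    return math.log(2) / half_life_h


def volume_of_distribution(clearance_l_h, half_life_h):
    """Apparent volume of distribution (L): Vd = CL / k."""
    return clearance_l_h / elimination_rate(half_life_h)


def concentration_mg_l(dose_mg, clearance_l_h, half_life_h, t_h):
    """Plasma concentration (mg/L) t_h hours after a single IV bolus."""
    k = elimination_rate(half_life_h)
    vd = volume_of_distribution(clearance_l_h, half_life_h)
    return (dose_mg / vd) * math.exp(-k * t_h)


# Made-up record for illustration, not taken from the dataset.
record = {"dose_mg": 500.0, "clearance_l_h": 5.0, "half_life_h": 6.0}

c0 = concentration_mg_l(t_h=0.0, **record)
c_half = concentration_mg_l(t_h=record["half_life_h"], **record)
print(f"C(0) = {c0:.2f} mg/L, C(t1/2) = {c_half:.2f} mg/L")
```

By construction the concentration at one half-life is half the initial value, which gives a simple check that the units of the three columns are being used consistently.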
Coverage
Proprietary enterprise-grade synthetic healthcare data suite. Permitted for internal research, AI/ML development, pharmaceutical R&D, academic studies, and commercial product development. Redistribution or resale of raw data prohibited without license.
Time range: Clinical encounters 2015–2024; wearable data 2024; genomic and epigenetic data are not dated.
Demographics: Age range 20–85 years (birth years 1940–2005), balanced sex distribution, diverse ethnicities, wide income spectrum.
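The stated age range follows from the birth years if ages are taken relative to 2025. That reference year is an inference from the numbers here, not something the listing states. A minimal sketch of the arithmetic:

```python
def age_in_year(birth_year, reference_year):
    """Approximate age in whole years, ignoring month and day."""
    return reference_year - birth_year


REFERENCE_YEAR = 2025  # assumed; chosen because it reproduces the 20–85 range

youngest = age_in_year(2005, REFERENCE_YEAR)
oldest = age_in_year(1940, REFERENCE_YEAR)
print(youngest, oldest)
```

For age at a clinical visit, the same function can be called with the year taken from encounter_date instead of a fixed reference year.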
License
Proprietary
Who Can Use It
Data scientists: Training deep learning models for precision medicine, patient outcome prediction, drug response modeling, and multi-omics integration.
Researchers: Academic and pharmaceutical research on gene-drug interactions, epigenetic mechanisms, longitudinal health patterns, and synthetic data validation studies.
Businesses: Pharma companies, biotech startups, healthtech platforms, clinical AI developers, and genomics firms building products requiring realistic but privacy-safe clinical data at scale.
Additional note: 100% synthetic, GDPR/HIPAA compliant, zero re-identification risk. All 6 tables are relationally linked via patient_id for join operations and complex analytical workflows. Suited to proof-of-concept work, regulatory submissions, and large-scale AI training without PHI concerns.
