Opendatabay APP

ULTIMATE EHR + Genomics + PK/PD + Multi-Omics Supersuite – 1.6M Records

Synthetic Images & Vision Datasets

Tags and Keywords

Synthetic

Data

EHR

Genomics

Pharmacokinetics

Pharmacodynamics

Epigenetics

Wearables

Multi-omics

Precision

Medicine

Biotech

AI

Pharma

Research

Longitudinal

Clinical


£89

About

This synthetic clinical data package combines EHR, genomics, epigenetics, pharmacokinetics/pharmacodynamics (PK/PD), and wearable health data in a single integrated suite. With 1.6 million records across 6 interconnected tables and 10,000 synthetic patients, it is built for pharma R&D, biotech AI development, precision medicine research, and multi-omics modeling, and it contains no real patient data.

Dataset Features

patients.csv (10,000 patients):
patient_id: Unique synthetic patient identifier linking all tables.
country: Patient’s country (USA, UK, Germany, Hungary, China, etc.).
birth_year: Year of birth (1940–2005 range for diverse age groups).
sex: Biological sex (Male, Female, Other).
ethnicity: Ethnic background (Caucasian, Asian, African, Hispanic, Mixed).
income_usd: Annual income in USD.

encounters.csv (40,000 clinical encounters):
patient_id: Linked to patients table.
encounter_date: Date of clinical visit (2015–2024).
icd10: Primary diagnosis code (E11, I10, E78, J45, etc.).
severity: Disease severity (mild, moderate, severe).

genomics.csv (1,579,000 SNP records):
patient_id: Linked to patients table.
snp_id: Single nucleotide polymorphism identifier.
genotype: Allele combination (AA, AG, GG, CT, TT).
risk_score: Polygenic risk score for disease prediction.

pkpd.csv (2,306,000 drug records):
patient_id: Linked to patients table.
drug: Drug name or ATC code.
dose_mg: Prescribed dose in milligrams.
clearance_l_h: Drug clearance rate (liters per hour).
half_life_h: Drug elimination half-life in hours.

epigenetics.csv (799,868 epigenetic markers):
patient_id: Linked to patients table.
methylation_site: DNA methylation site identifier.
level: Methylation level (0–1 scale).
histone_modification: Type of histone modification (H3K4me3, H3K27ac, H3K9me3).

wearables.csv (1,647,000 daily records):
patient_id: Linked to patients table.
date: Measurement date.
hr_bpm: Heart rate in beats per minute.
steps: Daily step count.
sleep_hours: Hours of sleep per day.
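Because every table carries patient_id, a patient-level view can be assembled by grouping child rows under their patient. The following is a minimal sketch using only the Python standard library. The sample rows, the P0001-style identifiers, and the ISO date format are illustrative assumptions and are not taken from the dataset; only the column names come from the listing.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up rows that follow the listed column names;
# they are illustrative only and not taken from the dataset.
PATIENTS_CSV = """patient_id,country,birth_year,sex,ethnicity,income_usd
P0001,UK,1980,Female,Caucasian,52000
P0002,Hungary,1995,Male,Mixed,31000
"""

ENCOUNTERS_CSV = """patient_id,encounter_date,icd10,severity
P0001,2019-03-14,E11,moderate
P0001,2022-11-02,I10,mild
P0002,2021-06-30,J45,severe
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_patient(patients, children):
    """Attach each child row to its patient record via patient_id."""
    by_patient = defaultdict(list)
    for row in children:
        by_patient[row["patient_id"]].append(row)
    return [
        {**p, "encounters": by_patient.get(p["patient_id"], [])}
        for p in patients
    ]


joined = join_on_patient(read_rows(PATIENTS_CSV), read_rows(ENCOUNTERS_CSV))
for record in joined:
    codes = [e["icd10"] for e in record["encounters"]]
    print(record["patient_id"], record["country"], codes)
```

The same grouping pattern would apply to genomics.csv, pkpd.csv, epigenetics.csv, and wearables.csv, although at the listed row counts a dataframe library or a database is likely a better fit than in-memory dicts.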

Distribution

• Data format: 6 CSV files packaged in a single ZIP archive.
• Data volume: More than 1.6 million records across 6 tables, comprising:
  • patients.csv: 10,000 synthetic patients
  • encounters.csv: 40,000 clinical encounters
  • genomics.csv: 1,579,000 SNP records
  • pkpd.csv: 2,306,000 drug records (pharmacokinetic/pharmacodynamic data)
  • epigenetics.csv: 799,868 epigenetic markers
  • wearables.csv: 1,647,000 daily wearable health data points
• Structure: Relational data model in which patient_id is the key that links all tables, enabling complex, multidimensional analysis.
• Data volume details: Row/record counts per file, column counts, and the ability to join tables through the patient identifier.
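Since the six CSV files ship in one ZIP archive, row counts can be checked without unpacking it to disk. The sketch below assumes the files sit at the top level of the archive, are UTF-8 encoded, and each start with a single header row; none of this is confirmed by the listing. It builds a small in-memory stand-in archive so that it runs without the download.

```python
import csv
import io
import zipfile


def count_rows(archive):
    """Return {member name: data-row count} for each CSV in an open ZipFile."""
    counts = {}
    for name in archive.namelist():
        if not name.endswith(".csv"):
            continue
        with archive.open(name) as raw:
            # Wrap the binary member stream so csv can read it as text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            next(reader, None)  # skip the header row
            counts[name] = sum(1 for _ in reader)
    return counts


# Stand-in archive with made-up rows (not taken from the dataset).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("patients.csv", "patient_id,country\nP0001,UK\nP0002,Hungary\n")
    zf.writestr("encounters.csv", "patient_id,icd10\nP0001,E11\n")

with zipfile.ZipFile(buffer) as zf:
    row_counts = count_rows(zf)
print(row_counts)
```

With the real download, passing the path of the ZIP file to `zipfile.ZipFile` in place of `buffer` would let the counts be compared against the per-file figures in the listing.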

Usage

This dataset is suited to a range of applications:
Application: Precision medicine AI training – predicting disease risk, drug response, and treatment outcomes using integrated genomic, clinical, and lifestyle data.
Application: Pharma R&D and clinical trial simulation – modeling patient populations, dosing strategies, PK/PD relationships, and adverse event prediction.
Application: Multi-omics research – studying gene-environment-drug interactions, epigenetic modifications, and personalized health monitoring.
Application: Healthcare AI platform development – building and testing EHR analytics, risk stratification, and clinical decision support systems with realistic synthetic data.
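For the PK/PD use case, the dose_mg, clearance_l_h, and half_life_h columns are enough to sketch a concentration-time curve under a textbook one-compartment model with an intravenous bolus and first-order elimination. That model choice is an assumption made here for illustration; the listing does not say how the synthetic PK values were generated. The example dose, clearance, and half-life are hypothetical.

```python
import math


def elimination_rate(half_life_h):
    """First-order elimination rate constant k (1/h): k = ln(2) / t_half."""
    return math.log(2) / half_life_h


def volume_of_distribution(clearance_l_h, half_life_h):
    """Apparent volume of distribution in liters: V = CL / k."""
    return clearance_l_h / elimination_rate(half_life_h)


def concentration_mg_l(dose_mg, clearance_l_h, half_life_h, t_h):
    """Plasma concentration after an IV bolus: C(t) = (dose / V) * exp(-k * t)."""
    k = elimination_rate(half_life_h)
    v = volume_of_distribution(clearance_l_h, half_life_h)
    return (dose_mg / v) * math.exp(-k * t_h)


# Hypothetical row: 500 mg dose, clearance 5 L/h, half-life 6 h.
c0 = concentration_mg_l(500, 5.0, 6.0, 0.0)
c_half = concentration_mg_l(500, 5.0, 6.0, 6.0)
print(round(c0, 2), round(c_half, 2))
```

By construction, the concentration one half-life after dosing is half the initial value, which gives a quick sanity check when applying the functions to rows from pkpd.csv.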

Coverage

Proprietary enterprise-grade synthetic healthcare data suite. Permitted for internal research, AI/ML development, pharmaceutical R&D, academic studies, and commercial product development. Redistribution or resale of the raw data is prohibited without a license.
Time range: Clinical encounters 2015–2024; wearable data 2024; genomic and epigenetic data are undated.
Demographics: Age range 20–85 years (birth years 1940–2005), balanced sex distribution, diverse ethnicities, wide income spectrum.
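The stated age range follows from the birth years once a reference year is fixed. The sketch below assumes a reference year of 2025, which is a guess based on the listing date rather than something the listing states. The decade-style banding is a hypothetical helper for grouping patients.

```python
REFERENCE_YEAR = 2025  # assumption: year of the listing, not stated in the data


def age_in_years(birth_year, reference_year=REFERENCE_YEAR):
    """Approximate age, ignoring month and day (only birth_year is provided)."""
    return reference_year - birth_year


def age_band(age, width=10):
    """Label a band such as '40-49' for the given age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


print(age_in_years(1940), age_in_years(2005))  # 85 20
print(age_band(age_in_years(1980)))  # 40-49
```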

License

Proprietary

Who Can Use It

Data scientists: Training deep learning models for precision medicine, patient outcome prediction, drug response modeling, and multi-omics integration.
Researchers: Academic and pharmaceutical research on gene-drug interactions, epigenetic mechanisms, longitudinal health patterns, and synthetic data validation studies.
Businesses: Pharma companies, biotech startups, healthtech platforms, clinical AI developers, and genomics firms building products requiring realistic but privacy-safe clinical data at scale.
Additional note: 100% synthetic, GDPR/HIPAA compliant, zero re-identification risk. All 6 tables are relationally linked via patient_id for join operations and complex analytical workflows. Suited to proof-of-concept work, regulatory submissions, and large-scale AI training without PHI concerns.

Listing Stats

VIEWS

3

DOWNLOADS

0

LISTED

02/12/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
