Zalingo Synthetic Healthcare Encounters — 100k Sample (Parquet)
Synthetic Tabular Data
£249
About
This listing provides a 100,000-row, privacy-safe synthetic EMR/encounter sample generated by Zalingo’s data refinery. It emulates realistic clinical activity (encounters, diagnoses, procedures, medications, labs, vitals, payer, and outcomes) with stable schemas. It contains no PHI/PII and is not derived from real patients. Use it to prototype ML features, validate pipelines, benchmark models, and run analytics without the access restrictions that apply to real patient data.
Need larger volumes or a scheduled refresh? After purchasing this sample, message us about enterprise-scale bundles and daily, weekly, or monthly subscriptions.
Dataset Features (representative)
- patient_id — Synthetic patient identifier (non-linkable).
- encounter_id — Synthetic encounter (visit) identifier.
- ts_utc — Encounter timestamp (ISO-8601, UTC).
- age — Age at encounter (integer).
- sex — male | female | other (synthetic distribution).
- encounter_type — inpatient | outpatient | emergency | telemed.
- facility_type — hospital | clinic | gp | urgent care.
- diagnosis_primary — ICD-10 code (synthetic).
- diagnosis_primary_desc — Human-readable label.
- diagnosis_secondary_codes — Pipe-delimited ICD-10 list (optional).
- procedure_code — Procedure code (CPT-like; synthetic).
- med_atc — Medication ATC code (synthetic).
- med_dose — Normalised dose text (optional).
- lab_name — e.g., HbA1c, CRP, WBC.
- lab_value / lab_units / lab_flag — Numeric result, units, normal|abnormal.
- vital_hr / vital_bp_sys / vital_bp_dia / vital_temp_c / vital_spo2 — Vitals snapshot.
- payer_type — public | private | self.
- length_of_stay_days — Length of stay in days (inpatient encounters).
- discharge_disposition — home | transfer | deceased (synthetic).
- readmission_30d — 0/1 synthetic outcome flag.
- country / city — ISO 3166-1 alpha-2 country code + synthetic city.
(Columns can vary slightly by bundle; see included preview CSV for exact schema.)
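The enumerated values in the feature list above lend themselves to a simple per-record check. The following is a minimal sketch, not part of the delivered bundle: it assumes each record has already been loaded into a Python dict, and that the values appear exactly as spelled in the list (lowercase, "urgent care" with a space, 0/1 as integers). The function names and the example record are hypothetical, and the ICD-10 codes in it are illustrative only. Check the preview CSV for the actual encoding.

```python
# Allowed values copied from the feature descriptions above.
# Assumption: the files use exactly these spellings and casing.
ALLOWED = {
    "sex": {"male", "female", "other"},
    "encounter_type": {"inpatient", "outpatient", "emergency", "telemed"},
    "facility_type": {"hospital", "clinic", "gp", "urgent care"},
    "lab_flag": {"normal", "abnormal"},
    "payer_type": {"public", "private", "self"},
    "discharge_disposition": {"home", "transfer", "deceased"},
    "readmission_30d": {0, 1},
}


def split_secondary_codes(value):
    """Split the pipe-delimited diagnosis_secondary_codes field into a list."""
    if not value:  # optional column: None or empty string means no codes
        return []
    return [code.strip() for code in value.split("|") if code.strip()]


def validate_row(row):
    """Return a list of human-readable problems found in one record (a dict)."""
    problems = []
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        # Missing or null values are skipped; only present values are checked.
        if value is not None and value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")
    return problems


# Hypothetical record for illustration only.
example = {
    "encounter_type": "inpatient",
    "sex": "female",
    "payer_type": "insurance",  # not one of public | private | self
    "diagnosis_secondary_codes": "E11.9|I10",
}
print(validate_row(example))
print(split_secondary_codes(example["diagnosis_secondary_codes"]))
```

Because columns can vary by bundle, the check skips columns that are absent rather than treating them as errors.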
Distribution
- Format: ZIP with Parquet shards (Snappy) + README.
- Volume: 100,000 rows, ~15–25 columns, 1–5 parquet parts.
- Size: ~3–5 MB zipped (category-dependent).
- Schema stability: Names/types consistent across healthcare samples; full datasets are date/attribute-partitioned.
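Before loading the data, it can be useful to confirm that the ZIP contains the expected number of Parquet parts. The sketch below uses only the Python standard library. The function names are hypothetical, and the assumption that shard files end in ".parquet" is not confirmed by this listing. Reading the shards themselves would need a Parquet reader, which is not shown here.

```python
import zipfile


def list_parquet_parts(zip_path):
    """Return the sorted names of Parquet files inside the delivery ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            # Assumption: shards carry a .parquet extension.
            if name.lower().endswith(".parquet")
        )


def check_part_count(parts, low=1, high=5):
    """True if the number of shards falls in the documented 1-5 range."""
    return low <= len(parts) <= high
```

For example, `check_part_count(list_parquet_parts("sample.zip"))` would return True for a bundle matching the description above, where "sample.zip" stands in for the downloaded archive's actual file name.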
Usage
- Model prototyping: readmission, triage risk, LOS, utilization.
- Quality & monitoring: pipeline tests, schema & drift checks.
- Analytics & education: reproducible exercises without PHI.
- Time-series experiments on visit, lab, and vitals trajectories.
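As one illustration of the drift checks mentioned above, a categorical column such as encounter_type can be compared between two batches using the total variation distance. This is a minimal sketch of one possible metric, not a method prescribed by this dataset. The function names and the two toy batches are made up for the example.

```python
from collections import Counter


def category_shares(values):
    """Map each category to its share of the batch."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(baseline, current):
    """Half the L1 distance between two categorical distributions.

    0.0 means identical shares; 1.0 means no categories in common.
    """
    p, q = category_shares(baseline), category_shares(current)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Toy batches of encounter_type values (illustrative, not taken from the dataset).
baseline = ["outpatient"] * 6 + ["inpatient"] * 3 + ["emergency"] * 1
current = ["outpatient"] * 4 + ["inpatient"] * 3 + ["emergency"] * 3
drift = total_variation_distance(baseline, current)
print(round(drift, 3))  # 0.2
```

What counts as an alert threshold depends on the pipeline, so the sketch leaves that choice to the caller.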
Coverage
- Geographic: Multi-country synthetic coverage (ISO country codes).
- Time Range: Recent multi-year synthetic window (not tied to real events).
- PII/PHI: None. Fully synthetic; not re-identifiable.
License
Proprietary — internal use rights; redistribution/resale not permitted.
Who Can Use It
- Data Scientists/ML Engineers — feature engineering & baselines.
- Healthcare Analysts/Researchers — exploratory analysis & benchmarks.
- Product & Ops — workflow testing and demo environments.
Important Notes / Disclaimers
- Not real patient data. Not for clinical decision-making.
- Codes (ICD/ATC/CPT-like) follow synthetic distributions calibrated to public statistics; they do not represent actual prevalence at any provider.