Zalingo Synthetic Healthcare — 30-Day Readmission & LOS Risk — 100k Focused Sample
Synthetic Tabular Data
£749
About
Zalingo Synthetic Healthcare — 30-Day Readmission & Length-of-Stay (LOS) Risk — 100k Focused Sample (Parquet)
A 100,000-row, privacy-safe focused sample built for 30-day readmission and length-of-stay (LOS) modelling. It emulates encounter-level EMR data with diagnoses, procedures, medications, labs, vitals, utilization history, and ground-truth labels for readmission_30d and los_days/los_bucket. It is suited to feature engineering, model benchmarking, triage-policy what-if analysis, and pipeline QA, all without handling PHI/PII.
Need larger volumes or a scheduled refresh? After purchasing this focused sample, message us about enterprise bundles and monthly, weekly, or daily subscriptions (readmission/LOS only, or mixed clinical domains).
Dataset Features (representative)
- patient_id — Synthetic, non-linkable identifier.
- encounter_id / encounter_type — Encounter identifier and type: inpatient | outpatient | ED | telemed.
- ts_admit_utc / ts_discharge_utc — ISO-8601 timestamps (UTC).
- los_days / los_bucket — Numeric LOS in days and its bucket (e.g., 0–1, 2–3, 4–7, 8+ days).
- readmission_30d — 0/1 label; readmission_ts_utc is populated when the label is 1.
- age / sex — Demographics at encounter.
- diagnosis_primary (ICD-10) — Primary diagnosis code, with its text label in diagnosis_primary_desc.
- diagnosis_ccsr_group — High-level diagnostic group (synthetic).
- procedure_code (CPT-like) — Primary procedure code.
- cci_score — Charlson-style comorbidity index (synthetic, 0 or higher).
- prior_12m_visits / prior_12m_admits — Utilization history counts.
- payer_type — public | private | self.
- triage_acuity — ED triage level 1–5 (ED encounters only).
- vitals: vital_hr, vital_bp_sys, vital_bp_dia, vital_temp_c, vital_spo2.
- labs: lab_name, lab_value, lab_units, lab_flag (normal/abnormal).
- med_atc — Active medication ATC code(s) during encounter.
- procedure_anesthesia_flag — 0/1.
- icu_admit_flag / icu_hours — ICU utilization markers.
- discharge_disposition — home | rehab | transfer | deceased (synthetic).
- sdoH_index — Synthetic social determinants index (0–1).
- country / city — ISO 3166-1 alpha-2 country code plus a synthetic city name.
- risk_score_0_1 — Calibrated continuous score in [0, 1] for readmission/LOS demonstrations.
(Columns can vary slightly by bundle; see the preview CSV for the exact schema.)
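A quick pipeline-QA pass can confirm that the label and timestamp fields agree with each other. The stdlib-only sketch below rests on assumptions the listing does not state: rows arrive as dicts keyed by the column names above, los_bucket labels are spelled "0-1", "2-3", "4-7", "8+", fractional days are truncated before bucketing, and the 30-day readmission window is measured from discharge. The helpers check_row, los_bucket, and parse_utc are hypothetical names for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical bucket labels mirroring the listing's example edges (0-1, 2-3, 4-7, 8+ days).
# The real los_bucket strings may differ; confirm them against the preview CSV.
BUCKET_EDGES = [(0, 1, "0-1"), (2, 3, "2-3"), (4, 7, "4-7")]


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as '2023-05-01T08:30:00Z'."""
    # Before Python 3.11, fromisoformat rejects a trailing 'Z', so normalise it first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def los_bucket(days: float) -> str:
    """Map a non-negative LOS in days to a bucket label."""
    whole = int(days)  # assumption: fractional days are truncated before bucketing
    for lo, hi, label in BUCKET_EDGES:
        if lo <= whole <= hi:
            return label
    return "8+"


def check_row(row: dict) -> list:
    """Return consistency problems for one encounter row; an empty list means it passed."""
    problems = []
    admit = parse_utc(row["ts_admit_utc"])
    discharge = parse_utc(row["ts_discharge_utc"])
    if discharge < admit:
        problems.append("discharge before admit")
    # Short-circuit: a negative LOS is reported without trying to bucket it.
    if row["los_days"] < 0 or los_bucket(row["los_days"]) != row["los_bucket"]:
        problems.append("los_bucket does not match los_days")
    if row["readmission_30d"] == 1:
        ts = row.get("readmission_ts_utc")
        if not ts:
            problems.append("readmission_30d is 1 but readmission_ts_utc is empty")
        elif not timedelta(0) <= parse_utc(ts) - discharge <= timedelta(days=30):
            # Assumption: the 30-day window is measured from discharge.
            problems.append("readmission_ts_utc falls outside the 30-day window")
    return problems


# Usage with a single illustrative row (values invented for the example).
row = {
    "ts_admit_utc": "2023-05-01T08:00:00Z",
    "ts_discharge_utc": "2023-05-04T10:00:00Z",
    "los_days": 3.08,
    "los_bucket": "2-3",
    "readmission_30d": 1,
    "readmission_ts_utc": "2023-05-20T09:00:00Z",
}
print(check_row(row))  # []
```

Running check_row over every row and counting each problem string gives a simple data-quality summary for a pipeline test.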
Distribution
- Format: ZIP archive of Snappy-compressed Parquet shards plus a README.
- Volume: 100,000 rows, ~22–32 columns, split across 1–5 Parquet parts.
- Size: ~3–6 MB zipped.
- Schema stability: Consistent across readmission/LOS focused bundles; full datasets are partitioned by admit_date, facility, and acuity.
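Because the archive holds between one and five shards alongside a README, it can help to pull out only the Parquet members before loading them. A minimal stdlib sketch, assuming the shards carry a .parquet extension (member names are not documented in this listing); list_parquet_shards and extract_shards are hypothetical helpers.

```python
import zipfile
from pathlib import Path


def list_parquet_shards(zip_path) -> list:
    """Return the Parquet member names in the archive, sorted for a stable load order."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".parquet"))


def extract_shards(zip_path, dest) -> list:
    """Extract only the Parquet shards into dest and return their local paths.

    The README and any other members are left in the archive.
    """
    names = list_parquet_shards(zip_path)
    if not 1 <= len(names) <= 5:
        # The listing states 1-5 parts; a different count is worth investigating.
        raise ValueError(f"expected 1-5 Parquet shards, found {len(names)}")
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        for name in names:
            # ZipFile.extract drops absolute prefixes and '..' components from member names.
            zf.extract(name, dest)
    return [dest / name for name in names]
```

The returned paths can then be loaded with any Parquet reader, for example pandas.concat([pandas.read_parquet(p) for p in paths]) with the pyarrow engine, which reads Snappy-compressed Parquet. The archive and destination names you pass in are up to you; none are prescribed by the listing.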
Usage
- Readmission modelling: baselines, threshold tuning, feature ablations.
- LOS prediction: case-mix adjustment, bed-day planning, discharge planning what-ifs.
- Triage & capacity policy: acuity-based routing experiments.
- Quality & monitoring: drift tests, KPI dashboards, alert simulations.
- Education & enablement: reproducible exercises without PHI.
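Threshold tuning, the first use case above, amounts to choosing a cut-off on a score so that flagged encounters meet an operating target. The sketch below assumes risk_score_0_1 (or your own model's scores on a held-out split) and readmission_30d are already loaded as equal-length Python lists of floats and 0/1 ints; the recall target and the helper name threshold_for_recall are illustrative, not part of the dataset. It returns the highest threshold whose recall reaches the target, which is the one that flags the fewest encounters.

```python
def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold t such that flagging score >= t reaches target_recall.

    Returns (threshold, precision, recall), or None if the target cannot be met.
    """
    positives = sum(labels)
    if positives == 0:
        return None
    # Sort once by descending score, then sweep thresholds from high to low.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Flag every encounter tied at this score before evaluating the threshold.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / positives
        if recall >= target_recall:
            return t, tp / (tp + fp), recall
    return None


# Usage with toy values (not drawn from the dataset).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 1, 0, 0]
print(threshold_for_recall(scores, labels, 0.75))  # (0.6, 0.75, 0.75)
```

Sorting once and sweeping from high to low keeps the search O(n log n), which matters on 100,000 rows; re-scoring every candidate threshold from scratch would be quadratic. Ties are handled by flagging all encounters that share a score before checking recall, which matches a "score >= threshold" rule.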
Coverage
- Geographic: Multi-country synthetic coverage (ISO codes).
- Time Range: Recent multi-year synthetic window with weekly/seasonal patterns.
- PII/PHI: None — fully synthetic; not re-identifiable.
License
Proprietary — internal use rights; redistribution/resale not permitted.
Who Can Use It
- Clinical Analytics & Data Science — feature engineering, baseline models.
- Operations/Bed Management — LOS planning and scenario testing.
- Payers/Providers — readmission risk stratification experiments.
- Vendors/SIs — pipeline QA and demo environments.
Notes / Disclaimers
- Not real patient data. Not for clinical decision-making.
- Codes, rates, and scores follow synthetic calibrated distributions and do not represent any specific provider or population.