Opendatabay

Zalingo Synthetic Healthcare, Premium Evaluation Kit - 1M Rows

Synthetic Tabular Data

Tags and Keywords

Synthetic Data, Healthcare, EMR, Encounters, Readmission, Length of Stay, Risk Scoring, Triage, Diagnosis, ICD-10, CPT, Labs, Vitals, Benchmark, Parquet, Notebooks, 1M Rows, PII-safe, Anonymised, 1million


£1,999

About

Zalingo Synthetic Healthcare — Premium Evaluation Kit (Encounters • Readmission • LOS • Labs/Vitals) — 1M Rows + Notebooks
A premium, end-to-end evaluation kit for clinical analytics and risk modelling. You get ~1,000,000 privacy-safe synthetic encounters with diagnoses, procedures, medications, labs, vitals, and utilization history, plus labels for 30-day readmission and length of stay (LOS). Jupyter notebooks and data dictionaries are included so teams can benchmark pipelines and models quickly without handling PHI/PII.
Need a production-scale feed? After purchase, message us about enterprise bundles (tens of millions of rows) and weekly/daily refresh subscriptions via S3/API.

What’s Inside

  • Data (Parquet, Snappy-compressed): ~1,000,000 rows, partitioned by admit date, facility, and acuity; includes outcome labels and precomputed features.
  • Notebooks: EDA and data-quality checks; clinical feature engineering (ICD/CPT-like codes, labs/vitals, utilization); baseline readmission and LOS models evaluated with ROC/PR curves, calibration, and cost/bed-day curves.
  • Docs & Schema: Data dictionary, label policy, quick-start, JSON schema examples, optional FHIR mapping notes.
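The notebooks are described as scoring the baseline models with ROC/PR curves. You can cross-check the two usual summary numbers, ROC AUC and average precision, outside the notebooks using only the Python standard library. This is a minimal sketch, not the kit's own code. The function names are hypothetical. It assumes 0/1 labels such as readmission_30d paired with scores such as risk_score_0_1.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank formulation; tied scores share ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at i and give each its average rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean precision at the rank of each positive, scanning highest score first.

    Ties keep input order (sorted() is stable), so tied scores are not averaged.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP needs at least one positive label")
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


# Toy example: four encounters, two of them readmitted within 30 days.
readmitted = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
print(round(roc_auc(readmitted, risk), 4))            # 0.75
print(round(average_precision(readmitted, risk), 4))  # 0.8333
```

For full curves or large arrays, an established library such as scikit-learn (roc_auc_score, average_precision_score) is the more practical choice.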

Key Fields (representative)

  • Identifiers/Context: patient_id (synthetic), encounter_id, encounter_type (inpatient|outpatient|ED|telemed), facility_type, country, city.
  • Timing: ts_admit_utc, ts_discharge_utc, los_days, los_bucket.
  • Demographics: age, sex.
  • Diagnoses/Procedures: diagnosis_primary (ICD-10), diagnosis_primary_desc, diagnosis_secondary_codes, procedure_code (CPT-like), diagnosis_ccsr_group.
  • Meds & Vitals: med_atc, med_dose, vital_hr, vital_bp_sys, vital_bp_dia, vital_temp_c, vital_spo2.
  • Labs: lab_name, lab_value, lab_units, lab_flag (normal/abnormal).
  • Utilization & Acuity: prior_12m_visits, prior_12m_admits, triage_acuity, icu_admit_flag, icu_hours.
  • Payer & Discharge: payer_type (public|private|self), discharge_disposition.
  • Outcomes/Labels: readmission_30d (0/1), readmission_ts_utc, risk_score_0_1.
  • Derived Features: cci_score (Charlson-style comorbidity score), rolling lab deltas, vital-sign instability flags, recent ED visit count, etc.

Columns may vary slightly; see the included data dictionary and preview for the exact schema.
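The exact label definitions live in the included label policy, which this listing does not reproduce. The sketch below assumes a common convention:

  • los_days is the discharge-minus-admit interval in days.
  • readmission_30d is 1 when the same patient_id has another inpatient admission starting within 30 days after discharge.

The los_bucket cut-points and all function names are hypothetical. The field names come from the list above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

READMIT_WINDOW = timedelta(days=30)  # assumed 30-day window after discharge


def parse_utc(ts):
    # Assumes ISO-8601 strings such as "2024-01-04T08:00:00Z"; before Python 3.11
    # fromisoformat() rejects the trailing "Z", so map it to an explicit offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def los_days(ts_admit_utc, ts_discharge_utc):
    """Length of stay as fractional days between admit and discharge."""
    delta = parse_utc(ts_discharge_utc) - parse_utc(ts_admit_utc)
    return delta.total_seconds() / 86400


def los_bucket(days):
    # Hypothetical cut-points; the kit's los_bucket column may use others.
    for upper, name in ((1, "<1d"), (3, "1-3d"), (7, "3-7d"), (14, "7-14d")):
        if days < upper:
            return name
    return "14d+"


def label_readmission_30d(encounters):
    """Return {encounter_id: 0/1} for inpatient encounters (assumed policy)."""
    stays_by_patient = defaultdict(list)
    for enc in encounters:
        if enc["encounter_type"] == "inpatient":
            stays_by_patient[enc["patient_id"]].append(enc)
    labels = {}
    for stays in stays_by_patient.values():
        for enc in stays:
            discharge = parse_utc(enc["ts_discharge_utc"])
            # Flag the stay if an inpatient admission for the same patient starts
            # after this discharge and no more than 30 days later (a stay's own
            # admit precedes its discharge, so it never matches itself).
            labels[enc["encounter_id"]] = int(any(
                discharge < parse_utc(other["ts_admit_utc"]) <= discharge + READMIT_WINDOW
                for other in stays
            ))
    return labels


encounters = [
    {"patient_id": "P1", "encounter_id": "E1", "encounter_type": "inpatient",
     "ts_admit_utc": "2024-01-01T08:00:00Z", "ts_discharge_utc": "2024-01-04T08:00:00Z"},
    {"patient_id": "P1", "encounter_id": "E2", "encounter_type": "inpatient",
     "ts_admit_utc": "2024-01-20T10:00:00Z", "ts_discharge_utc": "2024-01-22T10:00:00Z"},
    {"patient_id": "P1", "encounter_id": "E3", "encounter_type": "inpatient",
     "ts_admit_utc": "2024-03-15T00:00:00Z", "ts_discharge_utc": "2024-03-16T12:00:00Z"},
    {"patient_id": "P2", "encounter_id": "E4", "encounter_type": "ED",
     "ts_admit_utc": "2024-01-02T00:00:00Z", "ts_discharge_utc": "2024-01-02T06:00:00Z"},
]
print(label_readmission_30d(encounters))  # {'E1': 1, 'E2': 0, 'E3': 0}
print(los_days("2024-01-01T08:00:00Z", "2024-01-04T08:00:00Z"))  # 3.0
```

The nested scan per patient is quadratic. It stays cheap here because each patient has only a handful of encounters.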

Distribution

  • Format: ZIP with /data (Parquet), /notebooks, /docs, /schema.
  • Volume: ~1,000,000 rows, 25–50 columns, multi-part Parquet.
  • Approx Size: 60–150 MB zipped (category-dependent).
  • Partitioning: by admit_date / facility_id / acuity for efficient reads.
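Partitioning only speeds up reads if the reader skips irrelevant files. The sketch below shows this idea, partition pruning, on a file listing. It assumes the /data folder uses Hive-style key=value directory names such as admit_date=2024-01-02. This listing does not state the actual folder naming, and the helper names and example paths are hypothetical.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Read key=value directory names (Hive-style layout assumed) into a dict."""
    values = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = value
    return values


def prune(paths, predicate):
    """Keep only files whose partition values pass the predicate.

    The filter looks at directory names alone, so files outside the requested
    slice never have to be opened.
    """
    return [p for p in paths if predicate(partition_values(p))]


# Hypothetical file listing under /data.
files = [
    "data/admit_date=2023-12-31/facility_id=F01/acuity=1/part-0.parquet",
    "data/admit_date=2024-01-02/facility_id=F01/acuity=1/part-0.parquet",
    "data/admit_date=2024-01-02/facility_id=F02/acuity=4/part-0.parquet",
]
# Partition values are strings; ISO dates still compare correctly as strings.
wanted = prune(files, lambda v: v["admit_date"] >= "2024-01-01" and v["acuity"] in {"1", "2"})
print(wanted)  # ['data/admit_date=2024-01-02/facility_id=F01/acuity=1/part-0.parquet']
```

In practice a Parquet reader can do this automatically. pyarrow's dataset API discovers Hive-style partitions and applies filters, and pandas.read_parquet accepts a filters argument with the pyarrow engine.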

Usage

  • Readmission & LOS modelling — baselines, calibration, threshold & resource trade-offs.
  • Triage & capacity planning — what-if routing, bed-day optimisation.
  • Quality & monitoring — drift probes, KPI dashboards, alert simulations.
  • Education & enablement — reproducible exercises without PHI/PII.
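Two small helpers illustrate the calibration and threshold trade-offs listed under Usage:

  • A binned reliability table with expected calibration error (ECE).
  • A sweep that picks the flagging threshold with the lowest total cost.

This is a sketch with hypothetical function names. The per-error costs are assumed inputs (for example, an unnecessary follow-up versus a missed readmission), not values from the kit.

```python
def reliability_table(labels, probs, n_bins=10):
    """Equal-width bins over [0, 1] -> list of (mean predicted, observed rate, count)."""
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of probs, positives, count]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        bins[idx][0] += p
        bins[idx][1] += y
        bins[idx][2] += 1
    return [(s / c, pos / c, c) for s, pos, c in bins if c]


def expected_calibration_error(labels, probs, n_bins=10):
    """Count-weighted mean gap between predicted and observed rates per bin."""
    n = len(labels)
    return sum(c / n * abs(mean_p - rate)
               for mean_p, rate, c in reliability_table(labels, probs, n_bins))


def cheapest_threshold(labels, probs, cost_fp, cost_fn):
    """Return (threshold, total cost) minimising cost_fp * FP + cost_fn * FN.

    Encounters with prob >= threshold are flagged; infinity means flag nobody.
    """
    best = None
    for t in sorted(set(probs)) + [float("inf")]:
        fp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(labels, probs) if p < t and y == 1)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


readmitted = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
print(round(expected_calibration_error(readmitted, risk, n_bins=2), 4))  # 0.0875
# A missed readmission assumed to cost 5x an unnecessary intervention:
print(cheapest_threshold(readmitted, risk, cost_fp=1, cost_fn=5))  # (0.35, 1)
```

Reversing the cost ratio moves the cheapest threshold up to 0.8. That is the capacity-versus-missed-cases trade-off in miniature.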

Coverage

  • Geographic: Multi-country synthetic coverage (ISO codes).
  • Time Range: Recent multi-year synthetic window with weekly/seasonal patterns.
  • PHI/PII: None — fully synthetic; not re-identifiable.
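A quick way to inspect the weekly and seasonal patterns mentioned above is to count admissions by weekday and by month from ts_admit_utc. The sketch assumes ISO-8601 UTC strings, and its helper names are hypothetical. The counts use UTC, so weekday boundaries may not match local facility time; this listing does not mention a time-zone column.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def _parse(ts):
    # ISO-8601 with trailing "Z" assumed; normalised for Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def admissions_by_weekday(admit_timestamps):
    """Count admissions per UTC weekday, Monday first."""
    counts = Counter(WEEKDAYS[_parse(ts).weekday()] for ts in admit_timestamps)
    return {day: counts[day] for day in WEEKDAYS}


def admissions_by_month(admit_timestamps):
    """Count admissions per calendar month ("YYYY-MM"), sorted chronologically."""
    counts = Counter(_parse(ts).strftime("%Y-%m") for ts in admit_timestamps)
    return dict(sorted(counts.items()))


sample = ["2024-01-01T10:00:00Z", "2024-01-06T03:30:00Z",
          "2024-01-07T22:15:00Z", "2024-02-05T09:00:00Z"]
print(admissions_by_weekday(sample))
# {'Mon': 2, 'Tue': 0, 'Wed': 0, 'Thu': 0, 'Fri': 0, 'Sat': 1, 'Sun': 1}
print(admissions_by_month(sample))  # {'2024-01': 3, '2024-02': 1}
```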

Who Can Use It

  • Clinical Analytics & Data Science, Operations/Bed Management, Payers/Providers, Vendors/SIs for demos and validation.

Notes / Disclaimers

  • Not real patient data. Not for clinical decision-making.
  • Codes, rates, and scores follow synthetic calibrated distributions and do not represent any specific provider or population.

Evaluation License (Non-Production, Internal Use Only, 90 Days)

Buyer is granted a non-exclusive, non-transferable license to use the data and included assets solely for internal evaluation, prototyping, and testing for 90 days from purchase. No production use, external distribution, resale, sublicensing, or sharing beyond Buyer’s employees and on-site contractors under NDA. Derived models/features may be retained for internal research; production deployment requires a separate enterprise license. All materials are provided “as is” without warranties; liability limited to the amount paid.

Listing Stats

Views: 1
Downloads: 0
Listed: 08/09/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
