Opendatabay APP

Zalingo Synthetic Healthcare Encounters — 100k Sample (Parquet)

Synthetic Tabular Data

Tags and Keywords

Synthetic, Data, EMR, Healthcare, Patient, Records, Encounters, ICD-10, CPT, Medications, Laboratory, Results, Vitals, Time, Series, Parquet, CSV, PII-safe, Anonymised

£249

About

This listing provides a 100,000-row, privacy-safe synthetic EMR/encounter sample generated by Zalingo’s data refinery. It emulates realistic clinical activity (encounters, diagnoses, procedures, medications, labs, vitals, payer and outcomes) with stable schemas. It contains no PHI/PII and is not derived from real patients. Use it to prototype ML features, validate pipelines, benchmark models, and run analytics without the access restrictions that apply to real patient data.
Need larger volumes or a scheduled refresh? After purchasing this sample, message us about enterprise-scale bundles and monthly, weekly, or daily subscriptions.

Dataset Features (representative)

  • patient_id: Synthetic patient identifier (non-linkable).
  • encounter_id: Synthetic encounter (visit) identifier.
  • ts_utc: Encounter timestamp (ISO-8601, UTC).
  • age: Age at encounter, in years (integer).
  • sex: male | female | other (synthetic distribution).
  • encounter_type: inpatient | outpatient | emergency | telemed.
  • facility_type: hospital | clinic | gp | urgent care.
  • diagnosis_primary: ICD-10 code (synthetic).
  • diagnosis_primary_desc: Human-readable label for the primary diagnosis.
  • diagnosis_secondary_codes: Pipe-delimited list of ICD-10 codes (optional).
  • procedure_code: Procedure code (CPT-like; synthetic).
  • med_atc: Medication ATC code (synthetic).
  • med_dose: Normalised dose text (optional).
  • lab_name: Laboratory test name, e.g., HbA1c, CRP, WBC.
  • lab_value / lab_units / lab_flag: Numeric result, units, and a normal | abnormal flag.
  • vital_hr / vital_bp_sys / vital_bp_dia / vital_temp_c / vital_spo2: Vitals snapshot (heart rate, systolic and diastolic blood pressure, temperature in °C, SpO2).
  • payer_type: public | private | self.
  • length_of_stay_days: Length of stay in days (inpatient encounters only).
  • discharge_disposition: home | transfer | deceased (synthetic).
  • readmission_30d: 0/1 synthetic outcome flag for readmission within 30 days.
  • country / city: ISO 3166-1 alpha-2 country code plus a synthetic city name.
(Columns can vary slightly by bundle; see included preview CSV for exact schema.)
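
As a rough illustration of how the documented columns could be checked in a pipeline test, the sketch below validates a single row (held as a plain dict) against the enumerations listed above and splits the pipe-delimited secondary diagnosis field. The helper names (ALLOWED, split_secondary_codes, validate_row) are hypothetical, and the allowed values are copied from this listing rather than from the shipped schema, so treat them as assumptions to confirm against the preview CSV.

```python
from datetime import datetime

# Allowed values copied from the feature list above (an assumption: the
# shipped bundle may use different casing or extra categories).
ALLOWED = {
    "sex": {"male", "female", "other"},
    "encounter_type": {"inpatient", "outpatient", "emergency", "telemed"},
    "facility_type": {"hospital", "clinic", "gp", "urgent care"},
    "payer_type": {"public", "private", "self"},
    "discharge_disposition": {"home", "transfer", "deceased"},
    "lab_flag": {"normal", "abnormal"},
}


def split_secondary_codes(raw):
    """Split the pipe-delimited diagnosis_secondary_codes field into a list."""
    if not raw:
        return []
    return [code.strip() for code in raw.split("|") if code.strip()]


def validate_row(row):
    """Return a list of human-readable problems found in one row (a dict)."""
    problems = []
    # Categorical columns: a missing value is tolerated, an unknown one is not.
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value is not None and value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")
    if row.get("readmission_30d") not in (0, 1):
        problems.append("readmission_30d: expected 0 or 1")
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        datetime.fromisoformat(str(row.get("ts_utc")).replace("Z", "+00:00"))
    except ValueError:
        problems.append("ts_utc: not an ISO-8601 timestamp")
    return problems
```

Calling validate_row on each record and collecting the non-empty results gives a simple schema report; the same checks could be moved into a dataframe-level validator once the exact column set is known.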

Distribution

  • Format: ZIP archive of Snappy-compressed Parquet shards plus a README.
  • Volume: 100,000 rows, ~15–25 columns, split across 1–5 Parquet parts.
  • Size: ~3–5 MB zipped (category-dependent).
  • Schema stability: Names/types consistent across healthcare samples; full datasets are date/attribute-partitioned.
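
One possible way to read the download, assuming the archive layout described above (Parquet parts inside a single ZIP), is sketched below. The function names are hypothetical, the member file names are not documented here and are simply discovered by extension, and load_sample assumes pandas with a Parquet engine such as pyarrow is installed.

```python
import io
import zipfile


def _parquet_members(archive):
    """Sorted names of the members of an open ZipFile that end in .parquet."""
    return sorted(
        name for name in archive.namelist() if name.lower().endswith(".parquet")
    )


def list_parquet_parts(zip_source):
    """Return the Parquet part names inside a ZIP (path or file-like object)."""
    with zipfile.ZipFile(zip_source) as archive:
        return _parquet_members(archive)


def load_sample(zip_source):
    """Concatenate every Parquet part in the ZIP into one pandas DataFrame."""
    # Imported lazily so that listing the parts works without pandas installed.
    import pandas as pd

    with zipfile.ZipFile(zip_source) as archive:
        frames = [
            pd.read_parquet(io.BytesIO(archive.read(name)))
            for name in _parquet_members(archive)
        ]
    # pd.concat raises ValueError if the archive contained no Parquet parts.
    return pd.concat(frames, ignore_index=True)
```

For example, load_sample("zalingo_sample.zip") (a placeholder file name) would return a single frame covering all parts, which is practical at this sample's size; the larger partitioned datasets would more likely be read part by part.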

Usage

  • Model prototyping: readmission, triage risk, length of stay (LOS), utilization.
  • Quality & monitoring: pipeline tests, schema & drift checks.
  • Analytics & education: reproducible exercises without PHI.
  • Time-series experiments on visit, lab, and vitals trajectories.
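
To make the drift-check use case concrete, here is a minimal sketch of a population stability index (PSI) for a categorical column such as encounter_type, comparing a baseline batch with a newer one. PSI is one common choice and is not something this listing prescribes; the function names and the 1e-6 floor are illustrative assumptions.

```python
import math
from collections import Counter


def category_shares(values, categories, floor=1e-6):
    """Share of each category in values, floored to avoid log(0)."""
    counts = Counter(values)
    total = max(len(values), 1)
    return {c: max(counts.get(c, 0) / total, floor) for c in categories}


def population_stability_index(baseline, current):
    """PSI between two lists of categorical values (0.0 for identical shares)."""
    categories = set(baseline) | set(current)
    expected = category_shares(baseline, categories)
    actual = category_shares(current, categories)
    # Each term is non-negative, so the sum grows as the distributions diverge.
    return sum(
        (actual[c] - expected[c]) * math.log(actual[c] / expected[c])
        for c in categories
    )
```

A pipeline test could compute this per categorical column for each refresh and fail when the value exceeds a chosen threshold. A frequently quoted rule of thumb treats values above roughly 0.25 as a substantial shift, but a suitable cut-off depends on the use case.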

Coverage

  • Geographic: Multi-country synthetic coverage (ISO country codes).
  • Time Range: Recent multi-year synthetic window (not tied to real events).
  • PII/PHI: None. Fully synthetic; not re-identifiable.

License

Proprietary — internal use rights; redistribution/resale not permitted.

Who Can Use It

  • Data Scientists/ML Engineers — feature engineering & baselines.
  • Healthcare Analysts/Researchers — exploratory analysis & benchmarks.
  • Product & Ops — workflow testing and demo environments.

Important Notes / Disclaimers

  • Not real patient data. Not for clinical decision-making.
  • Codes (ICD/ATC/CPT-like) follow synthetic distributions calibrated to public statistics; they do not represent actual prevalence at any provider.

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 08/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
