Zalingo Synthetic User Behaviour — Engagement & Conversion Signals
Synthetic Tabular Data
£749
About
Zalingo Synthetic User Behaviour — Engagement & Conversion Signals — 100k Focused Sample (Parquet)
A 100,000-row, privacy-safe focused sample built for engagement scoring, conversion propensity, funnel analytics, and attribution. It combines raw clickstream-style events with precomputed signals (recency/frequency, rolling counts, dwell/scroll metrics, propensity scores, and time-to-convert), so you can prototype features, benchmark models, and validate pipelines without handling real individuals’ data (no PII).
Need larger volumes or scheduled refresh? After purchasing this focused sample, message us about enterprise bundles and monthly/weekly/daily subscriptions (signals-only or mixed behaviour domains).
Dataset Features (representative)
- event_id / ts_utc — Unique event + timestamp (ISO-8601, UTC).
- user_id / session_id — Synthetic, non-linkable identifiers.
- channel — web | mobile | email | ads | in-app | support.
- event_name — page_view | screen_view | click | search | add_to_cart | purchase | signup | unsubscribe, etc.
- page_url / screen_name / referrer — Normalised location context (when applicable).
- utm_source / utm_medium / utm_campaign / campaign_id — Attribution fields (synthetic).
- device_os / app_version / browser — Client context.
- geo_country / geo_city — ISO-2 + synthetic city.
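As a rough illustration of how the event fields above might be sanity-checked after loading, here is a minimal sketch that works on one row represented as a plain dict. The function name `validate_event` and the choice of which columns to treat as required are assumptions made for this example, not part of the dataset; the channel values are copied from the list above, and event_name is left open because the list ends in "etc.".

```python
from datetime import datetime, timedelta

# Channel values as listed above. event_name is not enumerated here because
# the listing marks that set as open-ended ("etc.").
CHANNELS = {"web", "mobile", "email", "ads", "in-app", "support"}

# Assumption for this sketch: these columns are expected on every row.
REQUIRED = ("event_id", "ts_utc", "user_id", "session_id", "channel", "event_name")


def validate_event(row: dict) -> list[str]:
    """Return a list of problems found in one event row (empty if none)."""
    problems = []
    for col in REQUIRED:
        if not row.get(col):
            problems.append(f"missing {col}")

    channel = row.get("channel")
    if channel and channel not in CHANNELS:
        problems.append(f"unknown channel: {channel}")

    ts = row.get("ts_utc")
    if ts:
        try:
            # Accept a trailing "Z" as well as an explicit "+00:00" offset.
            parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
            if parsed.tzinfo is None or parsed.utcoffset() != timedelta(0):
                problems.append("ts_utc is not UTC")
        except ValueError:
            problems.append("ts_utc is not ISO-8601")

    country = row.get("geo_country")
    if country and not (len(country) == 2 and country.isalpha() and country.isupper()):
        problems.append("geo_country is not a two-letter code")
    return problems
```

Running this over every row and counting the non-empty results gives a quick pass/fail figure for a pipeline QA step.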
Precomputed Signals (per user/session/window):
- rfm_recency_days / rfm_frequency_28d / rfm_monetary_90d — RFM-style features.
- roll_events_1d / roll_events_7d / roll_events_28d — Rolling event counts over 1-, 7-, and 28-day windows.
- dwell_time_ms / scroll_depth_pct — Engagement intensity.
- session_duration_s / n_events_session / bounce_flag — Session quality.
- time_to_convert_s — Seconds from first touch to conversion (if any).
- conversion_flag / conversion_type — Conversion indicator (0/1) and type (purchase | signup | other).
- propensity_score_0_1 — Calibrated conversion-likelihood proxy.
- next_best_action — Synthetic hint label (e.g., offer_trial, show_review).
- last_touch_channel / first_touch_channel — Attribution snapshots.
- cohort_week / cohort_source — Acquisition/cohort tags.
- days_since_last_active / repeat_visit_flag — Retention context.
- clv_proxy / ltv_bucket — Synthetic value signals for modelling demos.
- consent_flag / gdpr_region_flag — Synthetic compliance indicators.
(Exact columns may vary slightly by bundle; see the included preview CSV for this sample’s schema.)
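To make the rolling-count and recency signals above concrete, the sketch below recomputes them from a list of event timestamps for one user. The listing does not state how the windows are defined, so trailing windows that end at a reference time `as_of` are an assumption here, and `rolling_counts` and `recency_days` are hypothetical helper names. Recomputed values may therefore differ from the shipped columns.

```python
from datetime import datetime, timedelta


def rolling_counts(event_times, as_of, windows_days=(1, 7, 28)):
    """Count events in each trailing window (as_of - window, as_of]."""
    counts = {}
    for days in windows_days:
        start = as_of - timedelta(days=days)
        counts[f"roll_events_{days}d"] = sum(
            1 for t in event_times if start < t <= as_of
        )
    return counts


def recency_days(event_times, as_of):
    """Whole days between the latest event at or before as_of and as_of."""
    past = [t for t in event_times if t <= as_of]
    if not past:
        return None  # no activity yet, so recency is undefined
    return (as_of - max(past)).days
```

Comparing these recomputed values with the precomputed columns is one way to check that a feature pipeline and the sample agree on window boundaries.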
Distribution
- Format: ZIP with Parquet shards (Snappy) + README.
- Volume: 100,000 rows, ~24–38 columns, 1–5 parts.
- Approx Size: ~2–5 MB zipped.
- Schema stability: Consistent across behaviour-signals focused bundles; full datasets are partitioned by event_date / channel / session_id.
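Before loading, it can help to confirm that the downloaded archive contains the expected number of Parquet parts. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, the archive's internal layout is not documented here, and the 1 to 5 range is taken from the Volume line above.

```python
import zipfile


def list_parquet_parts(archive):
    """Return the sorted names of .parquet members in a ZIP archive.

    `archive` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name for name in zf.namelist() if name.lower().endswith(".parquet")
        )


def part_count_ok(parts, low=1, high=5):
    """Check the part count against the range stated in the listing."""
    return low <= len(parts) <= high
```

Once extracted, each part can be read with any Parquet reader, for example `pandas.read_parquet`, which requires a Parquet engine such as pyarrow to be installed.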
Usage
- Propensity & uplift models — conversion likelihood, treatment targeting (simulation).
- Funnels & cohorts — activation, retention, resurrection.
- Attribution & MMM inputs — last/first-touch snapshots, campaign features.
- Personalisation & ranking — engagement-aware recommendations.
- A/B testing sandboxes — signal shifts under variants.
- Pipeline QA / MLOps — schema checks, drift tests, dashboard demos.
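For the propensity use case, one simple starting point is to compare propensity_score_0_1 with conversion_flag in score bins. The sketch below builds such a reliability table from two plain lists; equal-width binning and the function name `reliability_table` are choices made for this example, and it assumes scores lie in [0, 1] as the column name suggests.

```python
def reliability_table(scores, flags, n_bins=5):
    """Group rows into equal-width score bins and compare the mean score
    in each bin with the observed conversion rate."""
    bins = [[] for _ in range(n_bins)]
    for score, flag in zip(scores, flags):
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        idx = min(max(int(score * n_bins), 0), n_bins - 1)
        bins[idx].append((score, flag))

    table = []
    for i, rows in enumerate(bins):
        if not rows:
            continue  # skip empty bins rather than report a rate of 0
        table.append({
            "bin": i,
            "n": len(rows),
            "mean_score": sum(s for s, _ in rows) / len(rows),
            "observed_rate": sum(f for _, f in rows) / len(rows),
        })
    return table
```

If the mean score and observed rate are close in every bin, the scores behave as calibrated within this sample; large gaps would suggest recalibrating before using the scores as a baseline.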
Coverage
- Geographic: Multi-country synthetic coverage (ISO codes).
- Time Range: Recent multi-year synthetic window with weekly/seasonal patterns.
- PII: None — fully synthetic; not re-identifiable.
License
Proprietary — internal use rights; redistribution/resale not permitted.
Who Can Use It
- Growth/Marketing/CRM — targeting strategy experiments & KPI sandboxes.
- Product & Analytics — funnels, cohorts, activation and retention.
- Data Science/ML — feature engineering & baseline modelling.
- BI/RevOps — LTV and conversion diagnostics with synthetic inputs.
Notes / Disclaimers
- Not real user data. Not for direct targeting of individuals.
- Signals, scores, and rates follow synthetic calibrated distributions and do not represent any specific business.