Zalingo Synthetic SaaS/Product Analytics — Churn & Retention Modelling
Synthetic Tabular Data
£749
About
Zalingo Synthetic SaaS/Product Analytics — Churn & Retention Modelling — 100k Focused Sample (Parquet)
A 100,000-row, privacy-safe focused sample engineered for churn prediction, retention analysis, and activation modelling. It blends event-level usage with account snapshots and ground-truth labels (e.g., churn_30d), so you can build features, benchmark models, and test interventions without handling real user data (no PII).
Need larger volumes or a scheduled refresh? After purchasing this focused sample, message us about enterprise bundles and monthly, weekly, or daily subscriptions (churn/retention only, or mixed).
Dataset Features (representative)
- event_id / ts_utc — Unique event + timestamp (ISO-8601, UTC).
- user_id / session_id / account_id — Synthetic identifiers (non-linkable).
- app_surface — web | mobile | api | backend.
- event_name — login | view | feature_use | api_call | error | upgrade | cancel.
- feature_key — Feature/flag identifier when applicable.
- active_minutes / session_duration_s / pages_viewed — Engagement signals.
- cohort_week / cohort_source — Acquisition/cohort tags.
- activation_flag / aha_milestone_reached — 0/1 activation signals.
- retention_7d / retention_28d / retention_90d — Rolling account retention ratios.
- d1_active / d7_active / d28_active — Binary activity markers.
- plan_tier / license_seats — free | trial | pro | enterprise; seat count.
- mrr_usd / net_mrr_delta_usd — Synthetic revenue + expansion/contraction.
- downgrade_flag / upgrade_flag — Plan movement.
- churn_30d (0/1) / churn_ts_utc — Label and timestamp if churned.
- nps_score / csat_score — Optional survey outcomes (0–10 / 1–5).
- api_endpoint / latency_ms / error_code / status_code — Reliability context (populated for API events).
- geo_country / geo_city — ISO-2 + synthetic city.
- risk_score_0_1 / adoption_score_0_1 — Calibrated modelling signals.
(Exact columns may vary slightly; see the included preview CSV for the sample’s schema.)
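As a starting point for the schema checks listed under Usage, the sketch below validates a single row against the value sets and ranges given in the feature list above. It assumes column names match this listing exactly and that optional survey fields may be missing or null; validate_row and its constants are hypothetical, so compare them with the preview CSV before relying on them.

```python
from __future__ import annotations

from typing import Any

# Hypothetical rules copied from the listing's feature descriptions;
# the shipped schema "may vary slightly", so treat these as assumptions.
ALLOWED = {
    "app_surface": {"web", "mobile", "api", "backend"},
    "event_name": {"login", "view", "feature_use", "api_call", "error", "upgrade", "cancel"},
    "plan_tier": {"free", "trial", "pro", "enterprise"},
}
RANGES = {
    "nps_score": (0, 10),
    "csat_score": (1, 5),
    "risk_score_0_1": (0.0, 1.0),
    "adoption_score_0_1": (0.0, 1.0),
}
BINARY = ("churn_30d", "activation_flag", "aha_milestone_reached",
          "d1_active", "d7_active", "d28_active", "downgrade_flag", "upgrade_flag")


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return human-readable problems for one row; an empty list means it passed."""
    problems: list[str] = []
    # Categorical columns must use one of the documented values.
    for col, allowed in ALLOWED.items():
        if col in row and row[col] not in allowed:
            problems.append(f"{col}={row[col]!r} not in {sorted(allowed)}")
    # Numeric columns must fall inside their documented range.
    for col, (lo, hi) in RANGES.items():
        value = row.get(col)
        # Survey scores are optional, so None counts as missing, not invalid.
        if value is not None and not lo <= value <= hi:
            problems.append(f"{col}={value} outside [{lo}, {hi}]")
    # Flags and labels must be strictly 0/1.
    for col in BINARY:
        if col in row and row[col] not in (0, 1):
            problems.append(f"{col}={row[col]!r} is not 0/1")
    return problems
```

For example, a row with app_surface="desktop", nps_score=11 and churn_30d=2 returns three problems, one per rule type. At 100,000 rows the same rules can be expressed as vectorised column checks, but the row form keeps each rule easy to read.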
Distribution
- Format: ZIP with Parquet shards (Snappy) + README.
- Volume: 100,000 rows, ~22–35 columns, 1–5 parts.
- Approx. size: 3–6 MB zipped.
- Schema stability: Consistent across churn/retention-focused bundles; full datasets are partitioned by date / app_surface / account_id.
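Because the sample ships as a ZIP of Parquet shards, a typical first step is to unpack the shards into one directory. This is a minimal standard-library sketch; it assumes shard files end in ".parquet" (the exact member names inside the ZIP are not specified here), and extract_parquet_shards is a hypothetical helper.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def extract_parquet_shards(zip_path: str | Path, out_dir: str | Path) -> list[Path]:
    """Extract every .parquet member of the delivery ZIP into out_dir.

    Returns the extracted file paths sorted, so shards are read in a stable order.
    The README and any other members are left in the archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted: list[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".parquet"):
                # ZipFile.extract recreates any folder structure stored in the
                # archive and returns the path of the written file.
                extracted.append(Path(zf.extract(name, out)))
    return sorted(extracted)
```

With pyarrow installed, the extracted shards can then usually be loaded together with pandas.read_parquet or pyarrow.parquet.read_table, both of which accept a directory path; pyarrow handles the Snappy decompression.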
Usage
- Churn prediction & retention uplift — baselines, threshold tuning, treatment targeting.
- Activation & “Aha!” analysis — pathing from first use to habit.
- Cohorts & funnels — lifecycle and feature-adoption breakdowns.
- RevOps analytics — net revenue retention (NRR) with synthetic MRR deltas.
- A/B testing sandboxes — feature flags, paywall/pricing what-ifs.
- MLOps QA — schema checks, drift tests, dashboard demos.
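For the threshold-tuning use case, one common baseline is to choose the cut-off on a risk score that maximises F1 against the churn label. The sketch assumes a higher risk_score_0_1 means higher churn risk and that churn_30d is the target; best_f1_threshold is a hypothetical helper, and F1 is only one possible objective (a cost-weighted objective is often a better fit when planning interventions).

```python
from __future__ import annotations


def best_f1_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, f1) for the rule "predict churn if score >= threshold"
    that maximises F1 against the 0/1 labels."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    # Highest risk first, so walking down the list lowers the cut-off step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    best_t, best_f1 = ranked[0][0], -1.0
    i = 0
    while i < len(ranked):
        t = ranked[i][0]
        # Absorb every row tied at this score: they all become "predicted churn" together.
        while i < len(ranked) and ranked[i][0] == t:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fn = total_pos - tp
        # Denominator is at least 1 because at least one row is now predicted positive.
        f1 = 2 * tp / (2 * tp + fp + fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Sorting once and sweeping down the ranked list keeps this O(n log n), which matters at 100,000 rows; grouping tied scores ensures every candidate corresponds to a real "score >= t" rule.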
Coverage
- Geographic: Multi-region synthetic coverage (ISO country codes).
- Time Range: Recent multi-year synthetic window with weekly/seasonal patterns.
- PII: None — fully synthetic; not re-identifiable.
License
Proprietary — internal use rights; redistribution/resale not permitted.
Who Can Use It
- Data Scientists/ML Engineers — churn/retention feature engineering & baselines.
- Product/Growth/CS — activation, adoption, intervention design.
- RevOps/Finance — synthetic NRR/GRR scenarios without sensitive data.
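For synthetic NRR/GRR scenarios, the usual definitions are NRR = (starting MRR + expansion - contraction - churned MRR) / starting MRR and GRR = (starting MRR - contraction - churned MRR) / starting MRR. The sketch below applies them to one period, assuming (hypothetically) that each account contributes its starting mrr_usd and a signed net_mrr_delta_usd for the same period, with a churned account's delta equal to minus its starting MRR; nrr_grr is an illustrative helper, not part of the dataset.

```python
from __future__ import annotations


def nrr_grr(accounts: list[tuple[float, float]]) -> tuple[float, float]:
    """Compute (NRR, GRR) for one period.

    accounts: (start_mrr_usd, net_mrr_delta_usd) pairs. Accounts with zero
    starting MRR are treated as new logos and excluded, because both ratios
    only measure revenue retained from the starting base.
    """
    base = [(mrr, delta) for mrr, delta in accounts if mrr > 0]
    start = sum(mrr for mrr, _ in base)
    if start == 0:
        raise ValueError("no starting MRR in this period")
    # Positive deltas are expansion; negative deltas are contraction or churn.
    expansion = sum(delta for _, delta in base if delta > 0)
    # An account cannot lose more than it was paying, so clamp losses at its MRR.
    losses = sum(min(-delta, mrr) for mrr, delta in base if delta < 0)
    nrr = (start + expansion - losses) / start
    grr = (start - losses) / start
    return nrr, grr
```

For example, accounts (100, +20), (200, -50), (100, -100) and a new logo (0, +30) give a starting base of 400, so NRR = 270 / 400 = 0.675 and GRR = 250 / 400 = 0.625.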
Notes / Disclaimers
- Not real user data. Not for direct targeting of individuals.
- Labels, scores, and revenue fields follow synthetic calibrated distributions and do not represent any specific company.