Zalingo Synthetic Retail — Premium Evaluation Kit 1M Rows
Synthetic Tabular Data
£1,999
About
Zalingo Synthetic Retail — Premium Evaluation Kit (Baskets • Price & Promo • Loyalty • Returns) — 1M Rows + Notebooks
A premium, end-to-end evaluation kit for retail analytics. You get ~1,000,000 rows of privacy-safe synthetic transactions at basket and line-item level across POS and e-commerce channels, with price/promo, loyalty, and returns signals. Jupyter notebooks and data dictionaries are included, so teams can benchmark pipelines and models quickly without handling real consumer data (no PII).
Need a production-scale feed? After purchase, message us about enterprise bundles (tens of millions of rows) and weekly/daily refresh subscriptions via S3/API.
What’s Inside
- Data (Parquet, Snappy): ~1,000,000 rows, partitioned by order_date / channel / store_id (site_id); includes basket-level and line-item records with precomputed features.
- Notebooks: EDA & data-quality checks, price/promo elasticity & uplift, recommenders & basket analysis, and demand-forecasting demos, with metrics (ROC/PR where relevant, uplift curves, MAPE/RMSE).
- Docs & Schema: Data dictionary, column glossary, quick-start, JSON schema examples.
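The forecasting notebooks report MAPE and RMSE. For reference, here is a minimal sketch of the standard definitions in plain Python. It is not taken from the notebooks, and how they treat zero-demand periods (skipped here, by assumption) may differ.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((actual - forecast)^2))."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumption: periods with zero actual demand are skipped, because the
    percentage error is undefined when the actual value is zero.
    """
    if len(actual) != len(forecast):
        raise ValueError("inputs must be of equal length")
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


units = [10, 12, 0, 8]  # illustrative weekly unit sales
pred = [11, 12, 1, 6]   # illustrative forecasts
print(round(rmse(units, pred), 3))  # 1.225
print(round(mape(units, pred), 2))  # 11.67
```

Because MAPE divides by the actual value, low-volume SKUs can dominate it; RMSE is reported alongside as a scale-dependent complement.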
Key Fields (representative)
- Context: order_id, basket_id, ts_utc, channel (pos|ecommerce|click_and_collect|delivery), store_id/site_id, country, city.
- Customer (synthetic): customer_id (non-linkable), loyalty_flag, loyalty_tier.
- Items & Pricing: item_sku, item_name, category, subcategory, brand, quantity, list_price, unit_price, currency, net_amount.
- Promotions: promo_flag, promo_mechanics (bogof|multibuy|%off|price_point|coupon), discount_amount, discount_pct, markdown_flag.
- Elasticity/Uplift Signals: price_index, elasticity_estimate, elasticity_bucket, uplift_pct, uplift_label, baseline_units, promo_units.
- Basket Features: basket_items_count, basket_value, cross_sell_index.
- Ops & External: stock_on_hand, stockout_flag, holiday_flag, weather_index (synthetic).
- Post-purchase: return_flag, return_qty, return_ts.
(Columns may vary slightly; see the included data dictionary and preview for the exact schema.)
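The enumerated channel and promo_mechanics values in the Key Fields list lend themselves to the schema-contract checks mentioned under MLOps QA. The sketch below is stdlib-only and assumes each line-item record arrives as a Python dict keyed by the column names above. The REQUIRED tuple is a hypothetical minimal contract chosen for illustration (basket-level records may legitimately lack item columns), and the included data dictionary remains the authority on the exact schema.

```python
from typing import Any, Mapping

# Allowed values as listed under Key Fields; defer to the shipped data
# dictionary if the delivered files differ.
CHANNELS = {"pos", "ecommerce", "click_and_collect", "delivery"}
PROMO_MECHANICS = {"bogof", "multibuy", "%off", "price_point", "coupon"}

# Hypothetical minimal contract for line-item rows (illustration only).
REQUIRED = ("order_id", "basket_id", "ts_utc", "channel", "item_sku", "quantity", "unit_price")


def contract_violations(row: Mapping[str, Any]) -> list[str]:
    """Return human-readable contract violations for one record (empty list means OK)."""
    problems = [f"missing column: {col}" for col in REQUIRED if row.get(col) is None]
    channel = row.get("channel")
    if channel is not None and channel not in CHANNELS:
        problems.append(f"unexpected channel: {channel!r}")
    # Assumption: promo_mechanics is only checked on rows where promo_flag is truthy.
    if row.get("promo_flag") and row.get("promo_mechanics") not in PROMO_MECHANICS:
        problems.append(f"unexpected promo_mechanics: {row.get('promo_mechanics')!r}")
    return problems


row = {"order_id": "O1", "basket_id": "B1", "ts_utc": "2024-03-01T10:00:00Z",
       "channel": "kiosk", "item_sku": "SKU-1", "quantity": 2, "unit_price": 1.5,
       "promo_flag": True, "promo_mechanics": "bogof"}
print(contract_violations(row))  # ["unexpected channel: 'kiosk'"]
```

Returning a list of messages rather than raising on the first failure makes it easy to count violations per batch, which is the raw signal a drift monitor or dashboard would track.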
Distribution
- Format: ZIP with /data (Parquet), /notebooks, /docs, /schema.
- Volume: ~1,000,000 rows, 25–45 columns, multi-part Parquet.
- Approx Size: 60–150 MB zipped (category-dependent).
- Partitioning: by order_date / channel / store_id (site_id), so reads that filter on these columns can skip non-matching files.
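The listing does not spell out the directory naming. Assuming Hive-style key=value folders (the paths and store IDs below are hypothetical), this sketch shows how partitioning lets a reader select only the files a query needs before opening any Parquet data. In practice a Parquet engine such as pyarrow.dataset with partitioning="hive" can do this pruning itself; the stdlib version just makes the idea visible.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict[str, str]:
    """Extract key=value directory segments from a Parquet file path."""
    values = {}
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep:  # skip plain folders such as "data"
            values[key] = value
    return values


def prune(paths: list[str], **filters: str) -> list[str]:
    """Keep only files whose partition values match every filter exactly."""
    return [p for p in paths
            if all(partition_values(p).get(k) == v for k, v in filters.items())]


# Hypothetical layout, assuming Hive-style partition folders.
files = [
    "data/order_date=2024-03-01/channel=pos/store_id=S001/part-0.parquet",
    "data/order_date=2024-03-01/channel=ecommerce/store_id=W01/part-0.parquet",
    "data/order_date=2024-03-02/channel=pos/store_id=S001/part-0.parquet",
]
print(prune(files, order_date="2024-03-01", channel="pos"))
# ['data/order_date=2024-03-01/channel=pos/store_id=S001/part-0.parquet']
```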
Usage
- Price elasticity & promo uplift — tactic comparison, holdouts, cannibalisation checks.
- Recommenders & cross-sell — basket-aware features and embeddings.
- Demand forecasting — store/category/SKU with price/promo regressors.
- Churn/loyalty & RFM — segments and lifecycle modelling (synthetic).
- Returns/fraud anomaly — post-purchase behaviour diagnostics.
- MLOps QA — schema contracts, drift monitors, dashboards.
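The kit ships precomputed elasticity_estimate and uplift_pct columns, but the listing does not state how they were derived. As an illustrative cross-check only, here is a minimal sketch assuming uplift_pct is the percentage lift of promo_units over baseline_units, and using a constant-elasticity (log-log) regression as one common way to estimate price elasticity. This is not necessarily how the kit computed its values.

```python
import math
from typing import Sequence


def uplift_pct(promo_units: float, baseline_units: float) -> float:
    """Percentage lift of promoted units over baseline (assumed definition)."""
    if baseline_units <= 0:
        raise ValueError("baseline_units must be positive")
    return 100.0 * (promo_units - baseline_units) / baseline_units


def loglog_elasticity(prices: Sequence[float], units: Sequence[float]) -> float:
    """OLS slope of ln(units) on ln(price), i.e. a constant-elasticity estimate.

    Prices and units must be strictly positive (log is undefined at zero),
    so zero-sales rows have to be filtered out beforehand.
    """
    if len(prices) != len(units) or len(prices) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in units]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prices must vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


print(uplift_pct(130, 100))  # 30.0
# Illustrative demand curve: units = 100 * price ** -1.5
print(round(loglog_elasticity([1.0, 2.0, 4.0], [100.0, 100.0 * 2 ** -1.5, 12.5]), 6))  # -1.5
```

The log-log form is chosen because its slope reads directly as "% change in units per 1% change in price"; for promo data it would normally be fitted with holiday, stockout, and seasonality controls rather than on raw pairs as here.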
Coverage
- Geographic: Multi-country synthetic coverage (ISO codes).
- Time Range: Recent multi-year synthetic window with weekly/seasonal patterns.
- PII: None — fully synthetic; not re-identifiable.
Who Can Use It
- Pricing/Merch/Category, Data Science/Analytics, E-commerce/BI, Vendors/SIs for demos and validation.
Notes / Disclaimers
- Not real consumer data. Not for direct targeting of individuals.
- Rates, elasticities, and uplift values are drawn from synthetic, calibrated distributions and do not represent any specific retailer.
Evaluation License (Non-Production, Internal Use Only — 90 Days)
Buyer is granted a non-exclusive, non-transferable license to use the data and included assets solely for internal evaluation, prototyping, and testing for 90 days from purchase. No production use, external distribution, resale, sublicensing, or sharing beyond Buyer’s employees and on-site contractors under NDA. Derived models/features may be retained for internal research; production deployment requires a separate enterprise license. All materials are provided “as is” without warranties; liability limited to the amount paid.