Zalingo Synthetic Retail Basket & Omni-Channel Transactions — 100k
Synthetic Tabular Data
£249
About
Zalingo Synthetic Retail Basket & Omni-Channel Transactions — 100k Sample (Parquet)
A 100,000-row, privacy-safe synthetic retail dataset that combines basket-level and item-level transactions across POS and e-commerce channels. It mimics realistic shopper behaviour (channel mix, promotions, returns, loyalty, categories and pricing) with a stable schema. It contains no PII and is not derived from real consumers. Use it to prototype features, validate pipelines, benchmark models and demo analytics without the access restrictions that apply to real customer data.
Need larger volumes or a scheduled refresh? After purchasing this sample, message us about enterprise-scale bundles and monthly, weekly or daily subscriptions.
Dataset Features (representative)
- order_id — Synthetic order identifier.
- basket_id — Basket/session ID (groups multiple items).
- customer_id — Synthetic non-linkable shopper ID (no PII).
- ts_utc — Order timestamp (ISO-8601, UTC).
- channel — pos | ecommerce | click-and-collect | delivery.
- store_id / site_id — Store or website identifier.
- country / city — ISO 3166-1 alpha-2 country code + synthetic city name.
- item_sku / item_name — Product SKU and label.
- category / subcategory / brand — Product taxonomy.
- quantity — Integer units.
- unit_price — Price before discounts.
- discount_amount / promo_code — Per-line discount amount and the promo code applied (if any).
- net_amount — quantity × unit_price − discount_amount.
- currency — ISO 4217 code (e.g., GBP, ZAR, USD).
- payment_method — card | wallet | cash | BNPL | transfer.
- loyalty_flag / loyalty_tier — Loyalty participation snapshot.
- return_flag / return_ts — Return indicator and return timestamp (if applicable).
- fulfilment_type — delivery | pickup | in-store.
- basket_items_count / basket_value — Derived basket metrics (for convenience).
(Exact columns may vary slightly by bundle; see the included preview CSV for the schema in this sample.)
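The derived columns can be sanity-checked after loading. The sketch below is a minimal stdlib example, not a vendor-supplied tool. It assumes rows have already been loaded as Python dicts (for example with `pyarrow.parquet.read_table(path).to_pylist()`), treats a null `discount_amount` as zero, and assumes `basket_items_count` is the sum of `quantity` per `basket_id` and `basket_value` is the sum of `net_amount`. Those two definitions are not stated above, so confirm them against the README; the function names are hypothetical.

```python
import math
from collections import defaultdict


def check_net_amount(row: dict, tol: float = 0.005) -> bool:
    """True if net_amount == quantity * unit_price - discount_amount (within tol)."""
    # Assumption: a missing/null discount_amount means no discount.
    discount = row.get("discount_amount") or 0.0
    expected = row["quantity"] * row["unit_price"] - discount
    return math.isclose(row["net_amount"], expected, abs_tol=tol)


def basket_metrics(rows: list[dict]) -> dict[str, tuple[int, float]]:
    """Recompute (basket_items_count, basket_value) per basket_id.

    Assumption: items_count = sum of quantity, value = sum of net_amount.
    """
    counts: dict[str, int] = defaultdict(int)
    values: dict[str, float] = defaultdict(float)
    for row in rows:
        counts[row["basket_id"]] += row["quantity"]
        values[row["basket_id"]] += row["net_amount"]
    return {b: (counts[b], round(values[b], 2)) for b in counts}


if __name__ == "__main__":
    rows = [
        {"basket_id": "B1", "quantity": 2, "unit_price": 3.50,
         "discount_amount": 1.00, "net_amount": 6.00},
        {"basket_id": "B1", "quantity": 1, "unit_price": 10.00,
         "discount_amount": None, "net_amount": 10.00},
        {"basket_id": "B2", "quantity": 3, "unit_price": 2.00,
         "discount_amount": 0.0, "net_amount": 6.00},
    ]
    assert all(check_net_amount(r) for r in rows)
    print(basket_metrics(rows))  # {'B1': (3, 16.0), 'B2': (3, 6.0)}
```

Comparing the recomputed pairs with the shipped `basket_items_count` / `basket_value` columns is a quick way to confirm which definition the bundle actually uses.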
Distribution
- Format: ZIP containing Snappy-compressed Parquet shards plus a README.
- Volume: 100,000 rows, ~18–30 columns, 1–5 Parquet parts.
- Approximate size: 3–5 MB zipped (category-dependent).
- Schema stability: Column names and types are consistent across retail samples; full datasets are partitioned by order_date, channel and store_id.
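The full datasets partition by `order_date`, which is not among the columns listed above. A plausible reading is the UTC calendar date of `ts_utc`; the sketch below assumes that, and also assumes a Hive-style `key=value` directory layout. Neither is confirmed by this listing, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def order_date(ts_utc: str) -> str:
    """Return the UTC calendar date (YYYY-MM-DD) of an ISO-8601 timestamp."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
    ts = datetime.fromisoformat(ts_utc.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assume naive values are already UTC
    return ts.astimezone(timezone.utc).date().isoformat()


def partition_path(row: dict) -> str:
    """Assumed Hive-style path, e.g. order_date=2024-03-01/channel=pos/store_id=S01."""
    return "/".join([
        f"order_date={order_date(row['ts_utc'])}",
        f"channel={row['channel']}",
        f"store_id={row['store_id']}",
    ])


def group_by_partition(rows: list[dict]) -> dict[str, list[dict]]:
    """Bucket rows by their assumed partition directory."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[partition_path(row)].append(row)
    return dict(groups)
```

If the full bundles do use that layout, `pyarrow.dataset.dataset(path, format="parquet", partitioning="hive")` can read them and prune partitions when filtering on `order_date`, `channel` or `store_id`.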
Usage
- Demand forecasting & seasonality (store/site and category level).
- Recommenders & cross-sell (basket-aware features).
- Price/promo uplift analysis and elasticity experiments.
- Churn, loyalty & RFM segmentation prototypes.
- Fraud/returns anomaly detection.
- Merchandising & inventory analytics (sell-through signals).
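For the RFM segmentation prototype, a minimal stdlib sketch follows. It assumes recency is the number of whole days from a customer's latest `ts_utc` to a chosen `as_of` time, frequency is the count of distinct `order_id` values, and monetary value is the sum of `net_amount` excluding lines with a truthy `return_flag`. These definitions and the function names are illustrative choices, not part of the dataset. Because `currency` varies by country, filter to a single currency (or convert) before summing `net_amount`.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_ts(ts_utc: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp, accepting a trailing 'Z'."""
    ts = datetime.fromisoformat(ts_utc.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def rfm(rows: list[dict], as_of: datetime) -> dict[str, tuple[int, int, float]]:
    """Per customer_id: (recency_days, frequency, monetary).

    Illustrative definitions (assumptions, not from the dataset docs):
    recency_days = whole days between latest ts_utc and as_of (tz-aware);
    frequency    = number of distinct order_id values;
    monetary     = sum of net_amount, skipping returned lines.
    Rows are assumed to share one currency.
    """
    last_seen: dict[str, datetime] = {}
    orders: dict[str, set] = defaultdict(set)
    spend: dict[str, float] = defaultdict(float)
    for row in rows:
        cid = row["customer_id"]
        ts = parse_ts(row["ts_utc"])
        if cid not in last_seen or ts > last_seen[cid]:
            last_seen[cid] = ts
        orders[cid].add(row["order_id"])
        if not row.get("return_flag"):
            spend[cid] += row["net_amount"]
    return {
        cid: ((as_of - last_seen[cid]).days, len(orders[cid]), round(spend[cid], 2))
        for cid in last_seen
    }
```

The resulting tuples can then be binned into quantiles per dimension to form the usual R/F/M scores.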
Coverage
- Geographic: Multi-country synthetic coverage with ISO country codes.
- Time Range: Recent multi-year synthetic window with realistic weekly/seasonal patterns.
- PII: None. Fully synthetic; not re-identifiable.
License
Proprietary; licensed for internal use only. Redistribution and resale are not permitted.
Who Can Use It
- Data Scientists/ML Engineers — feature engineering, baselines, MLOps tests.
- Merchandising & Pricing — promo experiments, elasticity, assortment.
- Marketing/CRM — segmentation, lifecycle modelling (without PII).
- E-commerce & BI — dashboards, KPI sandboxes, pipeline QA.
Important Notes / Disclaimers
- Not real consumer data. Not for direct targeting of individuals.
- Product categories, prices, and promo effects follow synthetic distributions calibrated to public statistics; they do not represent any specific retailer.