Zalingo Synthetic Finance Transactions — 100k Sample (Parquet)
Synthetic Tabular Data
£249
About
This listing provides a 100,000-row, privacy-safe synthetic transactions sample generated by Zalingo’s data refinery. It mimics realistic card and wallet spend patterns (amounts, currencies, merchant types, channels, geographies) without using any real person’s data. It is suited to prototyping ML features, pipeline testing, benchmarking, training demos, and analytics evaluations.
Looking for the full dataset or a monthly refresh? Message us after purchasing this sample; enterprise-scale bundles and subscriptions are available.
Dataset Features (representative)
- transaction_id — Synthetic unique ID per event.
- account_id — Synthetic payer/account identifier.
- ts_utc — Event timestamp (ISO 8601, UTC).
- amount — Numeric amount (float).
- currency — ISO-4217 code (e.g., USD, ZAR, GBP).
- merchant_category — High-level merchant type (e.g., groceries, fuel).
- mcc — Merchant Category Code (synthetic 4-digit).
- merchant_country — ISO-3166-1 alpha-2.
- city — City name (synthetic).
- channel — pos | ecommerce | atm | transfer.
- payment_method — debit | credit | wallet | bank.
- auth_result — approved | declined (synthetic logic).
(Fields can vary slightly between bundles; see the included preview CSV for the exact columns in this sample.)
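The field descriptions above can be turned into a quick sanity check. The following is a minimal sketch, not the vendor's validation logic: the function names are hypothetical, rows are assumed to arrive as plain dicts keyed by the column names listed above, and the checks only test the shape of each value (for example, three uppercase letters for currency), not whether a code is actually registered.

```python
import re
from datetime import datetime

# Allowed values copied from the field list above.
CHANNELS = {"pos", "ecommerce", "atm", "transfer"}
PAYMENT_METHODS = {"debit", "credit", "wallet", "bank"}
AUTH_RESULTS = {"approved", "declined"}


def _parse_ts(value):
    """Return a datetime, or None if the value cannot be parsed as ISO 8601."""
    if isinstance(value, datetime):
        return value
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        return datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None


def validate_row(row):
    """Return a list of human-readable problems found in one transaction dict."""
    problems = []
    for column, allowed in (
        ("channel", CHANNELS),
        ("payment_method", PAYMENT_METHODS),
        ("auth_result", AUTH_RESULTS),
    ):
        if row.get(column) not in allowed:
            problems.append(f"{column}: unexpected value {row.get(column)!r}")
    # Shape-only checks (assumption: codes are stored as upper-case strings).
    if not re.fullmatch(r"[A-Z]{3}", str(row.get("currency", ""))):
        problems.append("currency: expected a 3-letter upper-case code")
    if not re.fullmatch(r"[A-Z]{2}", str(row.get("merchant_country", ""))):
        problems.append("merchant_country: expected a 2-letter upper-case code")
    if not re.fullmatch(r"\d{4}", str(row.get("mcc", ""))):
        problems.append("mcc: expected 4 digits")
    if _parse_ts(row.get("ts_utc")) is None:
        problems.append("ts_utc: not a parseable ISO 8601 timestamp")
    return problems
```

If the sample is loaded with pandas, rows in this form can be obtained with `DataFrame.to_dict("records")`. Columns that differ in this bundle would need the checks adjusted accordingly.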
Distribution
- Format: ZIP containing Parquet files (Snappy) + README.
- Data Volume: 100,000 rows; ~10–18 columns; 1–5 parquet shards.
- File Size: ~3–5 MB zipped (category-dependent).
- Schema Stability: Column names/types are stable across finance samples; full datasets are partitioned by date and merchant attributes.
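Given the packaging described above (a ZIP holding one or more Snappy-compressed Parquet shards), loading the sample might look like the sketch below. The function names are hypothetical, the shard file names inside the archive are not documented here, so the code simply picks up every member ending in `.parquet`, and `load_sample` assumes pandas with a Parquet engine (pyarrow or fastparquet) is installed.

```python
import io
import zipfile


def list_parquet_shards(zip_path):
    """Return the sorted names of all .parquet members inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".parquet")
        )


def load_sample(zip_path):
    """Concatenate every Parquet shard in the archive into one DataFrame."""
    import pandas as pd  # imported lazily; needs pyarrow or fastparquet as well

    names = list_parquet_shards(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        frames = [
            # Read each shard fully into memory so the Parquet reader can seek.
            pd.read_parquet(io.BytesIO(archive.read(name)))
            for name in names
        ]
    return pd.concat(frames, ignore_index=True)
```

At a few megabytes zipped, reading each shard into memory is a reasonable simplification; for the larger partitioned datasets it would likely be better to extract to disk and read the directory instead.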
Usage
- Feature engineering & model prototyping (fraud/risk, segmentation, CLV).
- Anomaly/quality tests for pipelines and dashboards.
- Time-series forecasting (daily/weekly spend patterns).
- Education & enablement (reproducible ML exercises without PII).
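As one illustration of the time-series use case above, daily spend can be aggregated per currency before any forecasting. This is a minimal sketch with a hypothetical function name: it assumes rows are dicts with the `ts_utc`, `amount`, `currency`, and `auth_result` columns described in this listing, treats `ts_utc` as already being in UTC, and keeps currencies separate because amounts in different currencies are not directly summable.

```python
from collections import defaultdict
from datetime import datetime


def daily_spend(rows):
    """Sum approved amounts per (UTC date, currency) from transaction dicts."""
    totals = defaultdict(float)
    for row in rows:
        if row["auth_result"] != "approved":
            continue  # skip declined authorisations
        ts = row["ts_utc"]
        if not isinstance(ts, datetime):
            # Accept ISO 8601 strings, including a trailing "Z".
            ts = datetime.fromisoformat(str(ts).replace("Z", "+00:00"))
        totals[(ts.date().isoformat(), row["currency"])] += row["amount"]
    return dict(totals)
```

The resulting mapping from `(date, currency)` to total amount can be fed into a forecasting library of choice, or resampled to weekly totals.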
Coverage
- Geographic: Multi-country synthetic coverage with ISO country codes.
- Time Range: Synthetic timestamps spanning a recent multi-year window (not tied to real-world events).
- PII: No PII. 100% synthetic; not re-identifiable.
License
Proprietary — purchase grants internal use rights; redistribution/resale not permitted.
Who Can Use It
- Data Scientists/ML Engineers — rapid model iteration.
- Researchers/Analysts — benchmarking and hypothesis testing.
- Product & Risk Teams — experimentation without live data exposure.