
Zalingo Synthetic Finance — Premium Evaluation Kit 1M Rows

Synthetic Tabular Data

Tags and Keywords

Synthetic Data, Finance, Fraud Detection, Chargebacks, Card-not-present, Authorization, Risk Scoring, Velocity Features, Device Fingerprint, Geo Consistency, Benchmark, Feature Engineering, Parquet, Notebooks, PII-safe, Anonymised, 1M, 1million

Dataset listed on the Opendatabay data marketplace.

£2,499

About

Zalingo Synthetic Finance — Premium Evaluation Kit (Fraud & Chargebacks) — ~1M Rows + Notebooks
A premium, end-to-end evaluation kit for card-not-present (CNP) fraud and chargeback modelling. You get ~1,000,000 privacy-safe synthetic transactions with authorization signals, device/IP/geo features, and multi-window velocity aggregates, plus Jupyter notebooks, a feature dictionary, and a data dictionary, so teams can benchmark pipelines and models quickly without handling real cardholder data (no PII).
Need a production-scale feed? After purchase, message us about enterprise bundles (tens of millions of rows) and weekly/daily refresh subscriptions delivered via S3/API.

What’s Inside (kit contents)

  • Data (Parquet, Snappy): ~1,000,000 rows, partitioned by event_date / merchant_id / mcc_group; labels and precomputed features included.
  • Notebooks (.ipynb):
    1. EDA & Data Quality — schema checks, missingness, drift probes
    2. Feature Engineering — velocity, geo, device, merchant, light graph signals
    3. Baseline Models — logistic/GBM with ROC/PR, cost curves & thresholding
  • Docs: Data dictionary, column glossary, label policy, sampling notes, quick-start.
  • Schema: JSON schema + example queries for Parquet readers (Spark/Pandas/Polars).
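The Baseline Models notebook covers cost curves and thresholding. As a rough, stdlib-only sketch of that idea (not the notebook's actual code), the snippet below scans candidate thresholds on risk_score_0_1 and picks the one with the lowest total cost against fraud_label. The per-error costs (cost_fp for a blocked legitimate transaction, cost_fn for missed fraud) and the helper names total_cost and best_threshold are hypothetical placeholders; substitute your own loss assumptions.

```python
from typing import Sequence, Tuple


def total_cost(scores: Sequence[float], labels: Sequence[int],
               threshold: float, cost_fp: float, cost_fn: float) -> float:
    """Total cost of flagging every transaction whose score is >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 0:
            cost += cost_fp  # legitimate transaction blocked or sent to review
        elif not flagged and label == 1:
            cost += cost_fn  # fraud that slipped through
    return cost


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Return (threshold, cost) minimising total cost over candidate thresholds."""
    # Each distinct score is a candidate; +inf means "flag nothing".
    candidates = sorted(set(scores)) + [float("inf")]
    best = min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))
    return best, total_cost(scores, labels, best, cost_fp, cost_fn)


# Toy example with hypothetical scores, labels, and costs.
print(best_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1],
                     cost_fp=1.0, cost_fn=10.0))
```

On this toy input the result is (0.35, 1.0): one false positive and no missed fraud. The exhaustive scan is quadratic and only meant for illustration; on the full ~1M rows you would sort by score once and sweep cumulative error counts instead.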

Dataset Features (representative)

  • Core: transaction_id, account_id, ts_utc, amount, currency, channel (ecommerce | wallet | mail/phone), mcc, merchant_id, merchant_country, user_agent
  • Auth Signals: three_ds_result, avs_result, cvv_result, auth_result, decline_reason_code
  • Device/IP/Geo: device_fingerprint, ip_country, distance_km_billing_shipping, first_time_merchant_flag, recurring_flag, coupon_used
  • Velocity Windows: txn_ct_15m/1h/24h/7d, amount_sum_1h/24h, unique_merchant_ct_7d
  • Graph-Lite Signals: shared_device_ct_7d, shared_ip_ct_7d, hub_account_flag (synthetic)
  • Labels & Scores: fraud_label (0/1), chargeback_flag (0/1), risk_score_0_1, chargeback_reason_code

Columns may vary slightly; see the included data dictionary and preview for the exact schema.
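The velocity columns ship precomputed, but to sanity-check them or extend them to new windows, a minimal sketch of trailing-window counts and sums per account_id might look like the following. It assumes each window covers (ts_utc - span, ts_utc] and includes the current transaction; the kit's data dictionary is the authority on the exact definition (inclusive vs. exclusive bounds, whether declined authorizations count). The function name velocity_features and the input tuple layout are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Window lengths matching the txn_ct_* / amount_sum_* suffixes in the listing.
WINDOWS = {"15m": timedelta(minutes=15), "1h": timedelta(hours=1),
           "24h": timedelta(hours=24), "7d": timedelta(days=7)}


def velocity_features(txns):
    """txns: iterable of (transaction_id, account_id, ts_utc, amount).

    Returns {transaction_id: {feature_name: value}} using trailing windows
    (ts - span, ts] that include the current transaction (an assumption).
    """
    by_account = defaultdict(list)
    for tid, acct, ts, amt in txns:
        by_account[acct].append((ts, tid, amt))

    features = {}
    for events in by_account.values():
        events.sort()  # chronological order within each account
        # One sliding window (deque of (ts, amount)) plus a running sum per length.
        state = {name: [deque(), 0.0] for name in WINDOWS}
        for ts, tid, amt in events:
            row = {}
            for name, span in WINDOWS.items():
                window = state[name][0]
                window.append((ts, amt))
                state[name][1] += amt
                # Evict events at or before the lower bound ts - span.
                while window[0][0] <= ts - span:
                    _, old_amt = window.popleft()
                    state[name][1] -= old_amt
                row[f"txn_ct_{name}"] = len(window)
                if name in ("1h", "24h"):
                    row[f"amount_sum_{name}"] = state[name][1]
            features[tid] = row
    return features
```

unique_merchant_ct_7d and the graph-lite shared_device_ct_7d / shared_ip_ct_7d counts would likely follow a similar sliding-window pattern, tracking distinct merchant_id or account_id values instead of a running sum.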

Distribution

  • Format: ZIP containing Parquet data, /notebooks, /docs, /schema
  • Volume: ~1,000,000 rows, 25–45 columns, multi-part Parquet
  • Approx Size: 50–120 MB zipped (mix-dependent)
  • Partitioning: by event_date / merchant_id / mcc_group for efficient reads
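Partitioning lets readers skip files that cannot match a query. The sketch below assumes the directories use hive-style key=value names (for example event_date=2024-03-01/merchant_id=m1/mcc_group=retail/part-0.parquet); check the actual layout and the example queries under /schema before relying on it. It shows how date and MCC-group filters can prune file paths before any Parquet file is opened. The paths and the helpers partition_values and prune are illustrative.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_values(path):
    """Parse hive-style 'key=value' directory segments from a file path."""
    values = {}
    for part in PurePosixPath(path).parts[:-1]:  # skip the file name itself
        if "=" in part:
            key, _, value = part.partition("=")
            values[key] = value
    return values


def prune(paths, start, end, mcc_groups=None):
    """Keep files whose event_date is in [start, end] and, if given,
    whose mcc_group is in mcc_groups. Only paths are inspected."""
    kept = []
    for p in paths:
        vals = partition_values(p)
        d = date.fromisoformat(vals["event_date"])
        if not (start <= d <= end):
            continue
        if mcc_groups is not None and vals.get("mcc_group") not in mcc_groups:
            continue
        kept.append(p)
    return kept
```

Spark and pyarrow's dataset API perform this pruning automatically when the layout is hive-style, so in practice you would pass filters to the reader rather than filter paths by hand.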

Usage

  • Fraud & chargeback modelling — baselines, feature ablations, cost curves
  • Authorization optimisation — AVS/3DS policy experiments and threshold tuning
  • Scenario testing — velocity, geo-mismatch, first-use & recurring patterns
  • Pipeline QA & MLOps — schema contracts, drift monitors, dashboards
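For the drift monitors listed above (and the drift probes in the EDA notebook), one common choice is the Population Stability Index (PSI). The sketch below is an assumption about how you might monitor a column such as amount or risk_score_0_1 between a baseline period and a later one; it is not taken from the kit. Bin edges come from baseline quantiles, and eps guards against empty bins; the function name psi and its parameters are hypothetical.

```python
import bisect
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` against a baseline `expected`.

    Bin edges are baseline quantiles, so each baseline bin holds roughly
    1/n_bins of the baseline data.
    """
    ref = sorted(expected)
    edges = [ref[int(len(ref) * k / n_bins)] for k in range(1, n_bins)]

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1  # bin index 0..n_bins-1
        # eps keeps log() finite when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift; calibrate alerting thresholds on your own data.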
  • Education & enablement — hands-on exercises without compliance hurdles

Coverage

  • Geographic: Multi-country synthetic coverage (ISO codes)
  • Time Range: Recent multi-year synthetic window with weekly/seasonal patterns
  • PII: None — fully synthetic; not re-identifiable

Who Can Use It

  • Risk/Data Science — rapid feature engineering & model iteration
  • Payments/FinOps — authorization strategy & loss-rate diagnostics
  • Product/Analytics — KPI sandboxes & experiment design
  • Vendors/SIs — demo environments & connector validation

Notes / Disclaimers

  • Not real cardholder data. Not for production credit decisions.
  • Labels, rates, and distributions are synthetic and calibrated; they do not represent any specific issuer/acquirer/PSP.

Evaluation License (Non-Production, Internal Use Only)

Buyer is granted a non-exclusive, non-transferable license to use the data and included assets solely for internal evaluation, prototyping, and testing for 90 days from purchase. No production use, external distribution, resale, sublicensing, or sharing beyond Buyer’s employees and on-site contractors under NDA. Derived models/features may be retained for internal research; production deployment requires a separate enterprise license. All materials are provided “as is” without warranties; liability limited to the amount paid.

  • Price Justification / Value: The premium kit bundles data, notebooks, and docs for faster time-to-value, avoids compliance hurdles, and includes calibrated labels for realistic benchmarks.
  • Support & SLA: Email support with 1 business-day response; fixes for material schema/data issues within 5 business days; upgrade credits available if you move to an enterprise plan within 60 days.

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 08/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
