Zalingo Synthetic Finance — Credit Risk Premium Evaluation Kit 1M Rows
Synthetic Tabular Data
Tags and Keywords
Trusted By




"No reviews yet"
£2,499
About
Zalingo Synthetic Finance — Credit Risk Premium Evaluation Kit — ~1M Rows + Notebooks
A premium, end-to-end kit for credit risk and scorecard development. You get ~1,000,000 privacy-safe synthetic credit applications with approvals/offers and 6/12-month performance labels, plus Jupyter notebooks, scorecard templates, and a data dictionary—so teams can benchmark pipelines and models quickly without handling real consumer data (no PII).
Need a production-scale feed? After purchase, message us about enterprise bundles (tens of millions of rows) and weekly/daily refresh subscriptions delivered via S3/API.
What’s Inside (kit contents)
-
Data (Parquet, Snappy): ~1,000,000 rows, partitioned by date/product/channel; includes approval decisions and performance labels.
-
Notebooks (.ipynb):
- EDA & Data Quality — schema checks, missingness, drift probes
- Feature Engineering & Binning — WOE/IV, monotonic bins, PSI
- Baseline Models — logistic/GBM with ROC/PR, KS, lift & calibration
-
Docs: Data dictionary, scorecard quick-start, label policy, sampling notes.
-
Schema: JSON schema + example queries for Parquet readers (Spark/Pandas/Polars).
Dataset Features (representative)
- Application:
application_id
,ts_utc
,channel
(online | branch | partner),product
(card | personal_loan | auto | mortgage),amount_requested
,term_months
,purpose
- Demographic/Economic (synthetic):
employment_status
,employment_tenure_m
,income_monthly
,housing_status
,address_age_m
- Affordability & Stability:
dti
,debt_service_ratio
,affordability_flag
,stability_score
- Bureau-like Signals (synthetic):
credit_history_length_m
,prior_defaults_ct
,late_payments_12m
,enquiries_90d
,utilisation_ratio
,limit_total
,balance_total
- Segmentation & Scores:
segment_bucket
(prime | near-prime | subprime),pd_score_0_1
- Decision & Offer:
approval_flag (0/1)
,decision_code
,offer_apr
,offer_limit
- Performance Labels:
outcome_default_6m (0/1)
,outcome_default_12m (0/1)
(Columns may vary slightly; see the included dictionary + preview for exact schema.)
Distribution
- Format: ZIP containing Parquet data, /notebooks, /docs, /schema
- Volume: ~1,000,000 rows, 25–40 columns, multi-part Parquet
- Approx Size: 50–120 MB zipped (mix-dependent)
- Partitioning: by
application_date
/product
/channel
for efficient reads
Usage
- PD / scorecard development — feature engineering, binning, calibration & threshold tuning
- Underwriting policy experiments — approval/offer strategies & sensitivity analyses
- Portfolio analytics — cohorting, roll rates, early warning signals
- Pipeline QA & MLOps — schema contracts, drift monitors, dashboards
- Education & enablement — hands-on exercises without compliance hurdles
Coverage
- Geographic: Multi-country synthetic coverage (ISO codes)
- Time Range: Recent multi-year synthetic window with weekly/seasonal patterns
- PII: None — fully synthetic; not re-identifiable
Who Can Use It
- Risk/Data Science — scorecards, PD modelling, monitoring setup
- Underwriting/FinOps — policy design & KPI diagnostics
- Product/Analytics — approval, offer, and performance sandboxes
- Vendors/SIs — demos, connector validation, pipeline benchmarks
Notes / Disclaimers
- Synthetic data; not real consumer applications or bureau data.
- Not for production credit decisions. Distributions and labels are synthetic and calibrated; they do not represent any specific lender/bureau.
Evaluation License (Non-Production, Internal Use Only)
Buyer is granted a non-exclusive, non-transferable license to use the data and included assets solely for internal evaluation, prototyping, and testing for 90 days from purchase. No production use, external distribution, resale, sublicensing, or sharing beyond Buyer’s employees and on-site contractors under NDA. Derived models/features may be retained for internal research; production deployment requires a separate enterprise license. All materials are provided “as is” without warranties; liability limited to the amount paid.
- Price Justification / Value: Premium kit bundles data + notebooks + docs for faster time-to-value; avoids compliance hurdles; calibrated labels for realistic benchmarks.
- Support & SLA: Email support with 1 business-day response; fixes for material schema/data issues within 5 business days; upgrade credits available if you move to an enterprise plan within 60 days.