Financial Fraud Detection Dataset
Fraud Detection & Risk Management
Free
About
This dataset is designed to support research and model development in the area of fraud detection. It consists of real-world credit card transactions made by European cardholders over a two-day period in September 2013. Out of 284,807 transactions, 492 are labeled as fraudulent (positive class), making this a highly imbalanced classification problem.
Performance Note:
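Because fewer than 0.2% of transactions are fraudulent, plain accuracy is not informative: a model that labels every transaction as legitimate would still be about 99.8% accurate while catching no fraud. We recommend evaluating models with the Area Under the Precision-Recall Curve (AUPRC) or the F1-score instead.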
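To illustrate the point, the following minimal sketch computes accuracy, F1, and average precision (a common step-wise estimate of AUPRC) in plain Python. The function names are illustrative assumptions and only the class counts come from the dataset description; in practice an established metrics library would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the fraud class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no fraud caught: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Ranks transactions by descending score and averages the precision
    observed at each true fraud case. Tied scores get no special handling.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Class counts taken from the dataset description.
N_TOTAL, N_FRAUD = 284_807, 492
labels = [1] * N_FRAUD + [0] * (N_TOTAL - N_FRAUD)
always_legit = [0] * N_TOTAL  # trivial model: never flags anything

baseline_accuracy = accuracy(labels, always_legit)
baseline_f1 = f1_score(labels, always_legit)
```

The trivial model scores very high on accuracy yet has an F1 of zero, which is why ranking-based or fraud-class metrics are the more useful yardstick for this data.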
Features:
- Time Series Data: Each row represents a transaction, with the Time feature indicating the number of seconds elapsed since the first transaction.
- Dimensionality Reduction Applied: Features V1 through V28 are principal components obtained from a PCA transformation. The original features are withheld for confidentiality reasons.
- Raw Transaction Amount: The Amount field holds the transaction value as recorded, without PCA transformation, which makes it useful for cost-sensitive modeling.
- Binary Classification Target: The Class label is 1 for fraud and 0 for legitimate transactions.
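The column layout described above can be read with the standard library alone. This is a minimal sketch that assumes the file is a CSV whose header uses exactly the names Time, V1 to V28, Amount, and Class; the `load_transactions` helper and the in-memory sample values are hypothetical and stand in for the real file.

```python
import csv
import io

PCA_COLUMNS = [f"V{i}" for i in range(1, 29)]  # V1 .. V28
FEATURE_COLUMNS = ["Time", *PCA_COLUMNS, "Amount"]


def load_transactions(handle):
    """Read transactions from an open CSV handle into (features, labels)."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        features.append([float(row[name]) for name in FEATURE_COLUMNS])
        # Go through float first so "1", "1.0", and quoted values all parse.
        labels.append(int(float(row["Class"])))
    return features, labels


# Tiny in-memory sample with made-up values, standing in for the real file.
header = ",".join(FEATURE_COLUMNS + ["Class"])
sample_rows = [
    ",".join(["0"] + ["0.1"] * 28 + ["149.62", "0"]),
    ",".join(["406"] + ["-0.5"] * 28 + ["1.00", "1"]),
]
sample_csv = io.StringIO("\n".join([header, *sample_rows]))

X, y = load_transactions(sample_csv)
```

With the real file, the same function would be called on a handle opened with `open(path, newline="")`, where `path` is wherever the download was saved.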
Usage:
- Machine learning model training for fraud detection.
- Evaluation of anomaly detection and imbalanced classification methods.
- Development of cost-sensitive learning approaches using the Amount variable.
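One way to make the cost-sensitive use case concrete is to measure errors in money rather than counts. The sketch below assumes that a missed fraud loses the full transaction Amount and that every alert costs a fixed review fee; this cost model, the `review_cost` value, and the toy scores are illustrative assumptions, not something the dataset specifies.

```python
def total_cost(y_true, y_pred, amounts, review_cost=5.0):
    """Monetary cost of a set of predictions under the assumed cost model."""
    cost = 0.0
    for label, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            cost += review_cost  # every alert is reviewed, fraud or not
        elif label == 1:
            cost += amount  # missed fraud: the transaction amount is lost
    return cost


def pick_threshold(y_true, scores, amounts, candidates, review_cost=5.0):
    """Return the (threshold, cost) pair with the lowest total cost."""
    best = None
    for threshold in candidates:
        preds = [1 if score >= threshold else 0 for score in scores]
        cost = total_cost(y_true, preds, amounts, review_cost)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Toy example with made-up model scores and amounts.
y_true = [0, 0, 1, 1, 0]
scores = [0.10, 0.40, 0.35, 0.90, 0.05]
amounts = [20.0, 3.0, 250.0, 12.0, 80.0]

threshold, cost = pick_threshold(y_true, scores, amounts, [0.3, 0.5, 0.8])
```

In this toy data the lowest threshold wins because it catches the large 250.0 fraud at the price of a few extra reviews, a trade-off that count-based metrics alone would not reveal.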
Data Summary:
- Total Records: 284,807
- Fraud Cases: 492
- Imbalance Ratio: Fraudulent transactions account for just 0.172% of the dataset.
- Columns: 31 total (28 PCA features, plus Time, Amount, and Class)
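The arithmetic behind these figures can be checked directly. The sketch below recomputes the fraud rate from the stated counts and derives per-class weights with the common "balanced" heuristic, n_samples / (n_classes * class_count); choosing that heuristic is an assumption made here for illustration, not a recommendation from the dataset authors.

```python
N_TOTAL = 284_807
N_FRAUD = 492
N_LEGIT = N_TOTAL - N_FRAUD  # 284,315 legitimate transactions

fraud_rate = N_FRAUD / N_TOTAL        # fraction of fraudulent transactions
legit_per_fraud = N_LEGIT / N_FRAUD   # legitimate transactions per fraud case

# "Balanced" weights: each class contributes equally to a weighted loss.
class_weight = {
    0: N_TOTAL / (2 * N_LEGIT),
    1: N_TOTAL / (2 * N_FRAUD),
}

print(f"fraud rate: {fraud_rate:.4%}")
print(f"legitimate per fraud: {legit_per_fraud:.1f}")
print(f"class weights: {class_weight}")
```

Under this weighting a single fraud case counts several hundred times as much as a legitimate transaction, which is one simple way to keep a classifier from ignoring the minority class.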
License:
The dataset is provided under the CC0 (Public Domain) license, allowing users to freely use, modify, and distribute the data without any restrictions.
Acknowledgements
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.
More details on current and past projects on related topics are available at https://www.researchgate.net/project/Fraud-detection-5 and on the page of the DefeatFraud project.
Please cite the following works:
Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.
Andrea Dal Pozzolo, Olivier Caelen, Yann-Aël Le Borgne, Serge Waterschoot and Gianluca Bontempi. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915-4928, Pergamon, 2014.
Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi and Gianluca Bontempi. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784-3797, IEEE, 2018.
Andrea Dal Pozzolo. Adaptive Machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi).
Fabrizio Carcillo, Andrea Dal Pozzolo, Yann-Aël Le Borgne, Olivier Caelen, Yannis Mazzer and Gianluca Bontempi. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, Elsevier, 2018.
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen and Gianluca Bontempi. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4), 285-300, Springer International Publishing, 2018.
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé and Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019.
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé and Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.
Yann-Aël Le Borgne and Gianluca Bontempi. Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook.
Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé and Gianluca Bontempi. Incremental learning strategies for credit cards fraud detection. International Journal of Data Science and Analytics.