Synthetic Financial Fraud Transactions
Synthetic Data Generation
Free
About
This synthetic dataset is designed to help data scientists and machine learning enthusiasts develop fraud detection models. It contains realistic synthetic transaction data covering user details, transaction types, and calculated risk scores. It is suited to binary classification, for example with gradient-boosted models such as XGBoost and LightGBM, and can also be used for anomaly detection, risk analysis, and security research. It has 21 attributes that mix numerical, categorical, and temporal data, including a binary fraud label (0 for not fraud, 1 for fraud).
Columns
- Transaction_ID: A unique identifier for each individual transaction.
- User_ID: A distinct identifier assigned to each user.
- Transaction_Amount: The monetary value involved in the transaction, with amounts typically ranging from approximately 28.7 to 1,170.
- Transaction_Type: Categorical variable indicating the nature of the transaction, such as Online, In-Store, ATM, or POS.
- Timestamp: The date and time when the transaction occurred, covering a period from 1st January 2023 to 31st December 2023.
- Account_Balance: The user's account balance immediately prior to the transaction, ranging from approximately 500 to 100,000.
- Device_Type: Indicates the type of device used for the transaction, including Mobile, Desktop, and Tablet.
- Location: The geographical location where the transaction took place, with examples including Tokyo and Mumbai.
- Merchant_Category: The type of merchant involved in the transaction, such as Retail, Food, Travel, Clothing, or Groceries.
- IP_Address_Flag: A binary indicator (0 or 1) denoting whether the IP address used for the transaction was flagged as suspicious.
- Previous_Fraudulent_Activity: The count of past fraudulent activities associated with the user.
- Daily_Transaction_Count: The total number of transactions made by the user on that particular day, typically ranging from 1 to 14.
- Avg_Transaction_Amount_7d: The user's average transaction amount over the preceding 7 days, typically ranging from 10 to 500.
- Failed_Transaction_Count_7d: The number of failed transactions by the user within the last 7 days, typically ranging from 0 to 4.
- Card_Type: The type of payment card utilised, such as Credit, Debit, Prepaid, Mastercard, or Visa.
- Card_Age: The age of the payment card in months, typically ranging from 1 to 239 months.
- Transaction_Distance: The geographical distance between the user's usual location and the transaction location, typically ranging from 0.25 to 5,000 units.
- Authentication_Method: The method the user employed to authenticate, such as PIN or Biometric.
- Risk_Score: A calculated fraud risk score for the transaction, ranging from 0 to 1.
- Is_Weekend: A binary indicator (0 or 1) specifying whether the transaction occurred on a weekend.
- Fraud_Label: The target variable, indicating whether the transaction is fraudulent (1) or not fraudulent (0).
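The column list above can be turned into a quick schema check before any modelling. The following is a minimal sketch using only the Python standard library. The sample row, including the ID formats, the timestamp layout, and the assumption that binary fields are stored as the literal strings 0 and 1, is hypothetical and should be adjusted to match the actual file.

```python
import csv
import io

# The 21 columns documented in the Columns section, in the order listed there.
EXPECTED_COLUMNS = [
    "Transaction_ID", "User_ID", "Transaction_Amount", "Transaction_Type",
    "Timestamp", "Account_Balance", "Device_Type", "Location",
    "Merchant_Category", "IP_Address_Flag", "Previous_Fraudulent_Activity",
    "Daily_Transaction_Count", "Avg_Transaction_Amount_7d",
    "Failed_Transaction_Count_7d", "Card_Type", "Card_Age",
    "Transaction_Distance", "Authentication_Method", "Risk_Score",
    "Is_Weekend", "Fraud_Label",
]

# Columns the documentation describes as binary indicators (0 or 1).
BINARY_COLUMNS = ["IP_Address_Flag", "Is_Weekend", "Fraud_Label"]


def check_header(fieldnames):
    """Return (missing, unexpected) column names relative to the documented schema."""
    found = list(fieldnames or [])
    missing = [c for c in EXPECTED_COLUMNS if c not in found]
    unexpected = [c for c in found if c not in EXPECTED_COLUMNS]
    return missing, unexpected


def find_non_binary_rows(rows):
    """Return (row_index, column, value) for each binary field that is not '0' or '1'."""
    problems = []
    for i, row in enumerate(rows):
        for col in BINARY_COLUMNS:
            value = (row.get(col) or "").strip()
            if value not in ("0", "1"):
                problems.append((i, col, value))
    return problems


# Hypothetical one-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "TXN_1,USER_1,100.0,Online,2023-01-07 10:00:00,5000.0,Mobile,Tokyo,Retail,"
    "0,0,3,120.0,1,Visa,24,350.0,PIN,0.42,1,0\n"
)
reader = csv.DictReader(sample_csv)
missing, unexpected = check_header(reader.fieldnames)
bad_rows = find_non_binary_rows(reader)
print("missing:", missing, "unexpected:", unexpected, "non-binary:", bad_rows)
```

To run the same checks on the downloaded file, replace the in-memory sample with an opened file handle, for example `open(path, newline="")`, where `path` points to wherever the CSV was saved.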
Distribution
This dataset is provided in CSV format with a file size of 7.02 MB. It contains 50,000 rows and 21 columns. The fields mix numerical, categorical, and temporal data.
Usage
This dataset is suited to applications including:
- Training fraud detection models, particularly for binary classification.
- Anomaly detection within financial transactions.
- Developing and evaluating risk scoring systems for financial institutions such as banks and fintech companies.
- Feature engineering and model explainability research in the domain of financial security.
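As one way to approach the risk-scoring use case above, the Risk_Score column can be treated as a baseline classifier by thresholding it and comparing the result with Fraud_Label. The sketch below uses only the standard library. The scores, labels, and the 0.5 threshold are made-up illustrative values, and the dataset description does not state how strongly Risk_Score relates to Fraud_Label.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count (tp, fp, fn, tn) when a score >= threshold is predicted as fraud."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(scores, labels, threshold):
    """Return (precision, recall); a ratio with a zero denominator is reported as 0.0."""
    tp, fp, fn, _ = confusion_at_threshold(scores, labels, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative values only, not taken from the dataset.
example_scores = [0.1, 0.4, 0.8, 0.9]
example_labels = [0, 1, 0, 1]
print(precision_recall(example_scores, example_labels, threshold=0.5))
```

Sweeping the threshold over a range of values gives a simple precision-recall trade-off curve against which a trained model can be compared.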
Coverage
The dataset covers transactions dated from 1st January 2023 to 31st December 2023. Transaction locations include cities such as Tokyo and Mumbai. It contains no demographic information about users; user-level information is limited to transactional patterns and device usage (Mobile, Desktop, Tablet). The data spans several merchant categories and authentication methods.
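The stated time range can be checked against the Timestamp column, and Is_Weekend can be compared with the weekday implied by each timestamp. This is a minimal sketch with two assumptions that the description does not confirm: that timestamps use the layout `YYYY-MM-DD HH:MM:SS`, and that Is_Weekend was derived from Timestamp with Saturday and Sunday counted as the weekend.

```python
from datetime import datetime

# Assumed timestamp layout; adjust if the file uses a different one.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Documented coverage: 1 January 2023 to 31 December 2023 (end bound exclusive).
COVERAGE_START = datetime(2023, 1, 1)
COVERAGE_END = datetime(2024, 1, 1)


def parse_timestamp(text):
    """Parse a Timestamp field using the assumed layout."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)


def in_coverage(ts):
    """Return True if the timestamp falls inside the documented 2023 range."""
    return COVERAGE_START <= ts < COVERAGE_END


def weekend_flag(ts):
    """Return 1 for Saturday or Sunday, else 0 (assumed meaning of Is_Weekend)."""
    return 1 if ts.weekday() >= 5 else 0


# Illustrative value only: 7 January 2023 was a Saturday.
ts = parse_timestamp("2023-01-07 10:00:00")
print(in_coverage(ts), weekend_flag(ts))
```

Rows where `weekend_flag` disagrees with the stored Is_Weekend value would suggest that the column was generated independently of Timestamp, which is worth knowing before using both as features.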
License
CC0: Public Domain
Who Can Use It
This dataset is primarily intended for data scientists and machine learning enthusiasts. It is especially useful for those looking to:
- Build and test robust fraud detection models.
- Perform binary classification tasks.
- Conduct anomaly detection, risk analysis, and security research related to financial transactions.
Dataset Name Suggestions
- Synthetic Financial Fraud Transactions
- ML Fraud Detection Dataset 2023
- Digital Transaction Risk Model Data
- Fraudulent Transaction Simulation
- Financial Security Analysis Dataset
Attributes
Original Data Source: Synthetic Financial Fraud Transactions