Synthetic Retail Transactions Dataset
E-commerce & Online Transactions
Free
About
This dataset provides simulated retail transaction data describing customer purchasing behaviour and store operations. It is designed for market basket analysis, customer segmentation, and other retail analytics tasks. Each row records one transaction: a unique identifier, the date and time of purchase, the customer's name, the list of purchased products, the total number of items, the total cost, the payment method, and location details such as city and store type. Each row also indicates whether a discount or promotion was applied, and gives a customer category (based on background or age group) and the season of purchase. The dataset is entirely synthetic, generated with the Python Faker library. Researchers, data scientists, and analysts can therefore use it to develop and test algorithms, models, and analytical tools without handling real customer data.
Columns
- Transaction_ID: A unique 10-digit identifier for each individual transaction, ensuring each purchase can be uniquely identified.
- Date: The precise date and time when each transaction occurred, providing a timestamp for every purchase.
- Customer_Name: The name of the customer who completed the purchase, offering a means to identify individual customers.
- Product: A detailed list of all products included in a specific transaction.
- Total_Items: The total quantity of items purchased within a single transaction.
- Total_Cost: The overall monetary value of the transaction (the currency unit is not stated).
- Payment_Method: The chosen payment method for the transaction, such as credit card, debit card, cash, or mobile payment.
- City: The geographical location (city) where the transaction took place.
- Store_Type: The classification of the store where the purchase was made, e.g., supermarket, convenience store, department store.
- Discount_Applied: A boolean indicator (True/False) showing whether a discount was applied to the transaction.
- Customer_Category: A categorisation of the customer based on their background or age group.
- Season: The season (e.g., spring, summer, autumn, winter) in which the purchase was made.
- Promotion: The specific type of promotion applied to the transaction, if any (e.g., "None", "BOGO", "Discount on Selected Items").
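The column list above does not say how each field is encoded on disk. The sketch below shows one way to load the rows with Python's standard library into typed values. It rests on assumptions to verify against the real file:

- Product is stored as a Python-style list literal such as `['Milk', 'Bread']`.
- Date uses the `YYYY-MM-DD HH:MM:SS` format.
- Booleans and "no promotion" appear as the text `True`/`False` and `None`.

The sample row and the helper names (`parse_row`, `load_transactions`, `check_transaction_ids`) are hypothetical placeholders.

```python
import ast
import csv
import io
from datetime import datetime

# Placeholder sample in an ASSUMED layout: Product is a Python-style list
# literal and Date is "YYYY-MM-DD HH:MM:SS". Check both against the real file.
SAMPLE_CSV = (
    "Transaction_ID,Date,Customer_Name,Product,Total_Items,Total_Cost,"
    "Payment_Method,City,Store_Type,Discount_Applied,Customer_Category,"
    "Season,Promotion\n"
    "1000000001,2022-01-21 06:27:29,Jane Doe,\"['Milk', 'Bread']\",2,7.50,"
    "Cash,Boston,Supermarket,True,Category A,Winter,None\n"
)


def parse_row(row: dict) -> dict:
    """Convert one csv.DictReader row (all strings) into typed values."""
    return {
        # Kept as a string so a leading zero, if any, is not lost.
        "transaction_id": row["Transaction_ID"],
        "date": datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S"),
        "customer_name": row["Customer_Name"],
        # literal_eval parses the list literal without executing arbitrary code.
        "products": ast.literal_eval(row["Product"]),
        "total_items": int(row["Total_Items"]),
        "total_cost": float(row["Total_Cost"]),
        "payment_method": row["Payment_Method"],
        "city": row["City"],
        "store_type": row["Store_Type"],
        # Assumed text encoding of the boolean column.
        "discount_applied": row["Discount_Applied"].strip() == "True",
        "customer_category": row["Customer_Category"],
        "season": row["Season"],
        # Treat the literal text "None" (or an empty cell) as no promotion.
        "promotion": None if row["Promotion"] in ("", "None") else row["Promotion"],
    }


def load_transactions(text: str) -> list:
    """Parse a whole CSV document held in a string."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


def check_transaction_ids(rows: list) -> bool:
    """True if every Transaction_ID is exactly 10 digits and none repeats."""
    ids = [r["transaction_id"] for r in rows]
    well_formed = all(len(i) == 10 and i.isdigit() for i in ids)
    return well_formed and len(ids) == len(set(ids))


rows = load_transactions(SAMPLE_CSV)
print(rows[0]["products"], rows[0]["discount_applied"], check_transaction_ids(rows))
```

For the real file, replace `io.StringIO(text)` with `open(path, newline="", encoding="utf-8")`. The sketch deliberately does not require Total_Items to equal the length of the product list, because the column is described as a quantity, not a count of distinct products.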
Distribution
This dataset is typically provided as a CSV file. It contains approximately 1 million individual transaction records. The data spans a time range from 2020-01-01 to 2024-05-19. There are 329,738 unique customer names and 571,947 unique product entries. Payment methods are distributed as 25% Cash, 25% Debit Card, and 50% Other. Transaction locations include Boston (10%), Dallas (10%), and other cities (80%). Store types are categorised as Supermarket (17%), Pharmacy (17%), and other types (67%); these percentages are rounded, so they sum to slightly more than 100%. Discounts were applied to approximately 50% of the transactions.
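The distribution figures above report the largest categories and fold everything else into "Other". The sketch below shows how such a summary can be computed from a column's values. It also shows why rounded shares can total 101%.

The six equally frequent store types are a hypothetical example. The card does not say how many store types exist or how they are named. The helper names are also illustrative.

```python
from collections import Counter


def value_shares(values) -> dict:
    """Return each distinct value's share of all values, as a percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


def top_k_with_other(shares: dict, k: int) -> dict:
    """Keep the k largest shares and fold the remainder into an 'Other' bucket."""
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    result = dict(ranked[:k])
    rest = sum(share for _, share in ranked[k:])
    if rest:
        result["Other"] = rest
    return result


# Hypothetical: six equally frequent store types with placeholder names T1..T6.
store_types = [f"T{i}" for i in range(1, 7)] * 100
summary = top_k_with_other(value_shares(store_types), k=2)
rounded = {name: round(share) for name, share in summary.items()}
print(rounded, sum(rounded.values()))  # {'T1': 17, 'T2': 17, 'Other': 67} 101
```

Each type holds 16.67%, which rounds to 17, while the four remaining types together hold 66.67%, which rounds to 67. The unrounded shares still sum to 100%.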
Usage
This dataset is ideally suited for:
- Market Basket Analysis: Uncovering associations between products and identifying common buying patterns.
- Customer Segmentation: Grouping customers based on their purchasing behaviour to target specific offers.
- Pricing Optimisation: Developing strategies to optimise pricing and identify opportunities for discounts and promotions.
- Retail Analytics: Analysing overall store performance and emerging customer trends.
- Algorithmic Development: Testing and refining machine learning models for retail forecasting or recommendation systems.
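For the market basket analysis use case, a minimal starting point is to count how often pairs of products appear in the same transaction. From those counts you can derive the usual association-rule measures:

- support: the share of baskets containing both A and B;
- confidence: of the baskets containing A, the share that also contain B;
- lift: confidence divided by B's overall frequency.

The sketch below is a brute-force pair count, not Apriori or FP-Growth. It assumes each Product value has already been parsed into a Python list of item names. The baskets and the helper name `pair_rules` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return (lhs, rhs, support, confidence, lift) for every item pair rule.

    Brute-force pair counting: fine for exploration, but Apriori or FP-Growth
    scale better to larger itemsets.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("Bread", "Milk") and ("Milk", "Bread") match.
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), n_ab in pair_counts.items():
        support = n_ab / n
        if support < min_support:
            continue
        # Each unordered pair yields two directed rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n_ab / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules first: by confidence, then by support.
    return sorted(rules, key=lambda r: (r[3], r[2]), reverse=True)


# Hypothetical baskets standing in for parsed Product lists.
baskets = [["Milk", "Bread"], ["Milk", "Bread", "Eggs"], ["Milk", "Eggs"], ["Bread"]]
for lhs, rhs, support, confidence, lift in pair_rules(baskets)[:3]:
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

With roughly 1 million transactions, set a `min_support` threshold to keep the rule list manageable. Because the data is synthetic, also sanity-check whether any strong associations appear at all before reading much into them.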
Coverage
The dataset's geographic coverage includes transactions from various cities, such as Boston and Dallas, representing a broad, though simulated, geographic scope. The time range of the transactions extends from 1st January 2020 to 19th May 2024. Demographic insights are provided through the Customer_Category column, which classifies customers by background or age group and so supports demographic-based analyses. Because the dataset is synthetic, real-world demographic notes do not apply.
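A simple demographic-based analysis groups transactions by Customer_Category and compares spending and discount use across groups. The sketch below does this with the standard library. It assumes rows come from `csv.DictReader`, so every value is still a string, and that Discount_Applied is stored as the text `True`/`False`. The category labels and the helper name `summarise_by_category` are hypothetical placeholders.

```python
from collections import defaultdict


def summarise_by_category(transactions):
    """Per Customer_Category: transaction count, mean Total_Cost, discount rate.

    Expects csv.DictReader-style rows, where every value is still a string.
    """
    groups = defaultdict(lambda: {"count": 0, "cost": 0.0, "discounted": 0})
    for t in transactions:
        g = groups[t["Customer_Category"]]
        g["count"] += 1
        g["cost"] += float(t["Total_Cost"])
        # Assumed encoding: the boolean column holds the text "True"/"False".
        g["discounted"] += (t["Discount_Applied"] == "True")
    return {
        category: {
            "transactions": g["count"],
            "mean_cost": g["cost"] / g["count"],
            "discount_rate": g["discounted"] / g["count"],
        }
        for category, g in groups.items()
    }


# Placeholder rows; real labels come from the Customer_Category column.
transactions = [
    {"Customer_Category": "Category A", "Total_Cost": "10.00", "Discount_Applied": "True"},
    {"Customer_Category": "Category A", "Total_Cost": "30.00", "Discount_Applied": "False"},
    {"Customer_Category": "Category B", "Total_Cost": "25.50", "Discount_Applied": "True"},
]
for category, stats in summarise_by_category(transactions).items():
    print(category, stats)
```

A single pass with running totals keeps memory flat, which matters at roughly 1 million rows. The same pattern works for City, Store_Type, or Season by changing the grouping key.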
License
CC0
Who Can Use It
This dataset is beneficial for a wide range of users, including:
- Researchers: For academic studies on consumer behaviour and retail economics.
- Data Scientists: To develop and validate predictive models, such as recommender systems or churn prediction models.
- Analysts: For performing in-depth retail analytics, market basket analysis, and customer segmentation to inform business decisions.
- Students: As a practical, realistic dataset for learning and applying data analysis techniques in a retail context.
Dataset Name Suggestions
- Retail Transactions Dataset
- Customer Purchasing Behaviour Data
- Market Basket Analysis Data
- Synthetic Retail Transactions
- E-commerce Transaction Log
Attributes
Original Data Source: Retail Transactions Dataset