Retail Sales Data Cleaning Challenge

Retail & Consumer Behavior

Tags and Keywords

Retail

Sales

Cleaning

Analysis

Synthetic

Free

About

This dataset contains synthetic sales transactions from a retail store, purpose-built for data cleaning challenges. It has 12,575 rows and 11 columns, with deliberately introduced inconsistencies such as missing or invalid values. The data covers eight distinct product categories, each containing 25 unique items with fixed prices. It is a useful resource for practising data cleaning techniques, exploratory data analysis, and feature engineering.

Columns

Transaction ID: A unique identifier for each transaction, always present and unique.
Customer ID: An identifier for one of the 25 distinct customers. It is always present and valid, and repeats across that customer's transactions.
Category: The product category for the purchased item, such as 'Food' or 'Furniture'. There are eight unique categories.
Item: The specific name of the purchased item. This column may contain missing or 'None' values.
Price Per Unit: The static price of a single unit of the item. This column may also have missing or 'None' values. Prices range from £5.00 to £41.00.
Quantity: The number of units of the item purchased. Missing or 'None' values can be found here. Quantities range from 1 to 10.
Total Spent: The overall amount spent on the transaction, calculated as Quantity multiplied by Price Per Unit. This column may contain missing values. Total amounts range from £5.00 to £410.00.
Payment Method: The method used for payment, which might include 'Cash' or 'Credit Card'. This column can have missing or invalid entries.
Location: The place where the transaction occurred, such as 'In-store' or 'Online'. This column may also contain missing or invalid values.
Transaction Date: The date of the transaction. This field is always present and valid, with dates spanning from 2022-01-01 to 2025-01-18.
Discount Applied: An indicator of whether a discount was applied to the transaction. Values are 'True' or 'False', with 'None' marking a missing value.
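
Because Total Spent is defined as Quantity multiplied by Price Per Unit, a row that is missing exactly one of the three can in principle be repaired from the other two. The sketch below illustrates that idea on a single row. It is not part of the listing: the function names are hypothetical, and the spellings treated as missing (an empty cell, 'None', 'nan', 'null') are assumptions that should be checked against the actual file.

```python
from typing import Optional

# Assumed spellings of a missing cell; adjust after inspecting the real file.
MISSING_TOKENS = {"", "none", "nan", "null"}


def parse_number(cell: Optional[str]) -> Optional[float]:
    """Return the cell as a float, or None when it is missing or not numeric."""
    if cell is None or cell.strip().lower() in MISSING_TOKENS:
        return None
    try:
        return float(cell)
    except ValueError:
        return None


def fill_numeric_fields(row: dict) -> dict:
    """Fill one missing value among price, quantity and total from the other two."""
    price = parse_number(row.get("Price Per Unit"))
    qty = parse_number(row.get("Quantity"))
    total = parse_number(row.get("Total Spent"))

    if total is None and price is not None and qty is not None:
        total = price * qty
    elif price is None and total is not None and qty:
        price = total / qty
    elif qty is None and total is not None and price:
        qty = total / price

    # Rows with two or more missing values are returned with those values as None.
    return {**row, "Price Per Unit": price, "Quantity": qty, "Total Spent": total}
```

Rows with two or more of these fields missing cannot be repaired this way and need another strategy, for example looking up the fixed price of the item where the Item column is present.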

Distribution

The dataset is provided as a CSV file, named 'retail_store_sales.csv'. It contains 12,575 rows and 11 columns. The data is entirely synthetic, designed to mimic real-world retail sales with introduced inconsistencies.
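
A reasonable first step after downloading the file is to confirm the row count and see how many cells are missing in each column. The following is a minimal sketch using only the Python standard library. The helper name `missing_counts`, the set of missing-value spellings, and the sample rows (including the 'T1' and 'Item_1' values) are illustrative assumptions, not taken from the real file.

```python
import csv
import io
from collections import Counter

# Assumed spellings of a missing cell; adjust after inspecting the real file.
MISSING_TOKENS = {"", "none", "nan", "null"}


def missing_counts(csv_file):
    """Return (row_count, Counter of missing cells per column) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter({name: 0 for name in (reader.fieldnames or [])})
    row_count = 0
    for row in reader:
        row_count += 1
        for name, cell in row.items():
            if name is None:
                continue  # extra cells beyond the header row
            if cell is None or cell.strip().lower() in MISSING_TOKENS:
                counts[name] += 1
    return row_count, counts


# Tiny in-memory sample standing in for retail_store_sales.csv.
sample = io.StringIO(
    "Transaction ID,Item,Quantity\n"
    "T1,Item_1,2\n"
    "T2,,None\n"
    "T3,Item_3,\n"
)
rows, counts = missing_counts(sample)
print(rows, dict(counts))
```

To run it against the real file, pass a handle opened with `open("retail_store_sales.csv", newline="", encoding="utf-8")`; the reported row count can then be compared with the 12,575 rows stated above.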

Usage

This dataset suits several analytical tasks, including:

Data Cleaning: Practising tasks such as handling missing values, inferring missing entries, and validating data integrity.
Exploratory Data Analysis (EDA): Analysing sales trends, evaluating category performance, and understanding customer behaviour.
Feature Engineering: Developing techniques to create new, insightful variables from existing data.
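
As one example of the feature engineering mentioned above, calendar features can be derived from the Transaction Date column. This sketch assumes the dates are stored as ISO strings (YYYY-MM-DD), matching the way the date range is written in the column description; the function name and the output keys are hypothetical.

```python
from datetime import date


def date_features(transaction_date: str) -> dict:
    """Derive simple calendar features from an ISO-formatted date string."""
    d = date.fromisoformat(transaction_date)
    return {
        "year": d.year,
        "month": d.month,
        "weekday": d.weekday(),          # Monday is 0, Sunday is 6
        "is_weekend": d.weekday() >= 5,  # Saturday or Sunday
    }


print(date_features("2022-01-01"))
```

Features like these can then be grouped on during exploratory analysis, for instance to compare weekend and weekday spending per category.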

Coverage

The dataset's scope includes sales transactions from a retail store. The transactions cover a time range from 1st January 2022 to 18th January 2025. It details transactions involving 25 distinct customer IDs and items from eight different product categories. No specific geographic regions or detailed demographic information beyond customer IDs are provided.

License

CC BY-SA 4.0

Who Can Use It

This dataset is ideal for:

Data Analysts: For practising data manipulation and trend identification.
Data Scientists: To build and refine models, especially for data pre-processing steps.
Students and Educators: As a teaching and learning resource for data quality and analytics.
Business Intelligence Professionals: To understand data challenges in retail operations.

Dataset Name Suggestions

Retail Sales Data Cleaning Challenge
Dirty Sales Transactions Dataset
Simulated Retail Sales for Analytics
E-commerce Sales Anomaly Data

Attributes

Original Data Source: Retail Sales Data Cleaning Challenge

Listing Stats

VIEWS

246

DOWNLOADS

LISTED

13/08/2025

REGION

GLOBAL

QUALITY

5 / 5

VERSION

1.0
