Retail Customer Spending Prediction
Retail & Consumer Behavior
About
This dataset provides purchase summaries from a retail company, ABC Private Limited, shared to help analysts understand how customers spend across products in different categories. The core goal is to build a model that predicts the purchase amount for individual customers, which in turn supports tailored, personalised offers. The dataset captures customer demographics, product details for selected high-volume items, and the total purchase amount recorded during the previous month.
Columns
- User_ID: A unique identifier for each customer.
- Product_ID: A unique identifier for the product purchased.
- Gender: The customer's gender, represented by two unique values, with males constituting approximately 75% of the entries.
- Age: The customer's age, grouped into predefined bins (e.g., 26-35 is the most common group at 40%).
- Occupation: A masked numerical field representing the customer's occupation.
- City_Category: A categorisation of the city where the customer currently resides (A, B, or C, with B being the most frequent).
- Stay_In_Current_City_Years: The number of years the customer has resided in the current city, represented as a range variable.
- Marital_Status: A masked binary representation of the customer's marital status.
- Product_Category_1: A masked numerical category indicating the primary product classification.
- Product_Category_2: A masked numerical category, indicating a secondary classification for the product (has missing values).
- Product_Category_3: A masked numerical category, indicating a tertiary classification for the product (has substantial missing values).
- Purchase: The monetary amount of the purchase; this is the target variable for prediction models.
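As a rough illustration of how the column list above might be checked on load, the sketch below reads a CSV and confirms that all 12 documented columns are present. The helper name load_purchases is hypothetical, and the single sample row is made up for demonstration; it is not taken from the dataset.

```python
import io

import pandas as pd

# Column names as documented in the Columns section.
EXPECTED_COLUMNS = [
    "User_ID", "Product_ID", "Gender", "Age", "Occupation",
    "City_Category", "Stay_In_Current_City_Years", "Marital_Status",
    "Product_Category_1", "Product_Category_2", "Product_Category_3",
    "Purchase",
]


def load_purchases(source) -> pd.DataFrame:
    """Read the CSV and fail early if a documented column is absent."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Return the columns in the documented order.
    return df[EXPECTED_COLUMNS]


# Made-up one-row sample so the sketch runs without the real file.
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,P001,F,0-17,10,A,2,0,3,,,8370\n"
)
df = load_purchases(sample_csv)
print(df.dtypes)
```

With the real file, the same helper would be called as load_purchases("train.csv"), assuming the file sits in the working directory.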
Distribution
The main data file, train.csv, is approximately 25.53 MB and contains 12 columns. It holds about 550,000 valid entries across most fields and 3,631 unique Product IDs. Several fields are imbalanced, notably Gender, which is predominantly male (75%). Missing values need close attention: Product_Category_2 is missing for roughly 32% of records (about 174,000 entries), and Product_Category_3 is missing for about 70% of records (about 383,000 entries). The target variable, Purchase, ranges from 12 to 24,000, with an average value of 9,260.
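The missing-value proportions quoted above can be verified with a small per-column report. The following is a minimal sketch; missing_report is a hypothetical helper, and the toy frame uses made-up values purely to show the output shape.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Toy frame with made-up values; replace with pd.read_csv("train.csv").
toy = pd.DataFrame({
    "Product_Category_1": [1, 5, 8, 3],
    "Product_Category_2": [2.0, None, 14.0, None],
    "Product_Category_3": [None, None, None, 12.0],
})
report = missing_report(toy)
print(report)
```

Run against the full file, the report should show Product_Category_3 and Product_Category_2 at the top, in line with the roughly 70% and 32% figures stated above.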
Usage
This dataset is well suited to building machine learning regression models that forecast customer purchase amounts. It supports extensive Exploratory Data Analysis (EDA), including univariate analysis of the Purchase distribution and bivariate analysis comparing Purchase against demographic factors such as Age, Gender, Marital Status, and Occupation. Users should preprocess the data by checking basic statistics, handling outliers, converting categorical data into integer formats, and choosing a strategy for missing values. Visualisations can include plots of Occupation vs. Purchase and Age vs. Purchase, as well as city category distribution charts.
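Two of the preprocessing steps mentioned above, converting categorical data to integers and treating missing values, could look roughly like the sketch below. It makes several assumptions that are not stated in the listing: the categorical fields are stored as strings, alphabetical category codes are an acceptable encoding, and 0 is a safe placeholder for a missing product category. The function name preprocess and the toy rows are hypothetical.

```python
import pandas as pd

# Assumed to be string-valued categorical columns.
CATEGORICAL_COLS = ["Gender", "Age", "City_Category", "Stay_In_Current_City_Years"]
# Columns the listing reports as having missing values.
FILL_COLS = ["Product_Category_2", "Product_Category_3"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with missing categories filled and strings encoded."""
    out = df.copy()
    for col in FILL_COLS:
        # Assumption: 0 is used as a sentinel for "no category recorded".
        out[col] = out[col].fillna(0).astype(int)
    for col in CATEGORICAL_COLS:
        # Integer codes follow the sorted order of the distinct values.
        out[col] = out[col].astype("category").cat.codes
    return out


# Made-up rows for demonstration only.
toy = pd.DataFrame({
    "Gender": ["M", "F", "M"],
    "Age": ["26-35", "18-25", "26-35"],
    "City_Category": ["B", "A", "C"],
    "Stay_In_Current_City_Years": ["2", "4+", "0"],
    "Product_Category_2": [8.0, None, 2.0],
    "Product_Category_3": [None, None, 5.0],
    "Purchase": [8370, 15200, 1422],
})
encoded = preprocess(toy)
print(encoded)
```

Filling with a sentinel keeps every row available for training; dropping Product_Category_3 entirely, given how much of it is missing, is another reasonable choice depending on the model.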
Coverage
The data covers customer purchase summaries from the previous month only. Demographic scope includes customer segments captured via age bins, gender, marital status, and occupation. Geographic coverage is represented by three city categories (A, B, C) and the customer’s length of stay in that city. Several categorical fields, such as Occupation and the three Product Categories, are numerically masked, which limits how directly they can be interpreted. The secondary and tertiary product category fields have significant proportions of missing data.
License
CC0: Public Domain
Who Can Use It
- Data Scientists and Machine Learning Engineers: For model development, particularly in training prediction algorithms to forecast customer value.
- Retail Analysts: To discover correlations between demographic attributes and spending habits, aiding in segment identification.
- Marketing Strategists: To gain insights into product preferences and behaviour necessary for designing effective, targeted promotional offers.
Dataset Name Suggestions
- Black Friday Sales Data
- Retail Customer Spending Prediction
- High-Volume Product Purchase Behaviour
- Customer Demographic Value Forecasting
Attributes
Original Data Source: Retail Customer Spending Prediction
