Black Friday Sales Analytics Dataset
Retail & Consumer Behavior
Free
About
This dataset provides a detailed summary of customer purchase behaviour against a range of products from various categories, collected over the last month by a retail company. It includes crucial customer demographic information such as age, gender, marital status, city type, and duration of stay in the current city, alongside specific product details like Product ID and different product categories. The primary goal for this dataset is to facilitate the development of a predictive model for customer purchase amounts, enabling the creation of personalised offers for customers across different products.
Columns
- User_ID: A unique identifier for each customer.
- Product_ID: A unique identifier for each product.
- Gender: Specifies the sex of the user, categorised as 'M' or 'F'.
- Age: Represents the age of the user, presented in distinct age bins.
- Occupation: The user's occupation, represented as a masked numerical value.
- City_Category: Denotes the category of the city the user resides in, categorised as A, B, or C.
- Stay_In_Current_City_Years: Indicates the number of years the user has resided in their current city.
- Marital_Status: The marital status of the user, represented as a masked numerical value.
- Product_Category_1: The primary product category, represented as a masked numerical value.
- Product_Category_2: An additional product category the product may belong to, also masked numerically. Note: Approximately 32% of values in this column are missing.
- Product_Category_3: A third potential product category, masked numerically. Note: Approximately 70% of values in this column are missing.
- Purchase: The total purchase amount, serving as the target variable for prediction models.
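As a minimal sketch of how the missing-value notes above could be checked, the snippet below counts empty cells per column using only Python's standard library. The four sample rows and the `missing_share` helper are invented for illustration and are not taken from the dataset. The sketch also assumes that missing values appear as empty strings in the CSV.

```python
import csv
import io

# Invented sample rows using the column names listed above (not real data).
SAMPLE_CSV = """User_ID,Product_ID,Gender,Age,Occupation,City_Category,Stay_In_Current_City_Years,Marital_Status,Product_Category_1,Product_Category_2,Product_Category_3,Purchase
1,P001,F,26-35,10,A,2,0,3,,,8000
2,P002,M,36-45,7,B,4,1,1,6,14,15000
3,P003,M,26-35,4,C,1,0,5,8,,5000
4,P001,F,36-45,1,B,3,1,3,,,7500
"""


def missing_share(rows, columns):
    """Return the fraction of empty cells for each column."""
    counts = {col: 0 for col in columns}
    total = 0
    for row in rows:
        total += 1
        for col in columns:
            # Assumption: a missing value is stored as an empty string.
            if (row[col] or "").strip() == "":
                counts[col] += 1
    if total == 0:
        return {col: 0.0 for col in columns}
    return {col: counts[col] / total for col in columns}


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
shares = missing_share(reader, reader.fieldnames)
for col, share in shares.items():
    if share > 0:
        print(f"{col}: {share:.0%} missing")
```

To try this on the real file, the `io.StringIO(SAMPLE_CSV)` object could be replaced with a handle from `open("train.csv", newline="")`. Whether the file encodes missing values as empty strings is an assumption that should be verified first.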
Distribution
The dataset is provided in CSV format, specifically as train.csv, and has a file size of 25.53 MB. Most columns contain 550,000 valid records. However, Product_Category_2 has 376,000 valid records, with 174,000 missing values, and Product_Category_3 has 167,000 valid records, with 383,000 missing values. The dataset includes 3,631 unique Product_ID values. The User_ID values span a wide range, indicating a significant number of distinct customers. The Purchase amount ranges from 12 to 24,000.
Usage
This dataset is ideal for:
- Building predictive models to forecast customer purchase amounts for various products.
- Developing strategies for creating personalised offers tailored to individual customer preferences.
- Conducting exploratory data analysis (EDA) to uncover insights into customer demographics and product interactions.
- Performing univariate and bivariate analysis to understand variable distributions and relationships.
- Identifying outliers in purchase data and other key metrics.
- Visualising relationships between features and the target, such as Age vs. Purchase, Occupation vs. Purchase, and each Product_Category column vs. Purchase.
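Two of the analyses listed above, a bivariate summary and outlier detection, could be sketched roughly as follows. The records are invented (Age bin, Purchase) pairs rather than real data, the helper names `mean_by_group` and `iqr_outliers` are hypothetical, and Tukey's 1.5 x IQR rule is just one common choice for flagging outliers, not a method prescribed by the dataset.

```python
import statistics
from collections import defaultdict

# Invented (Age bin, Purchase) pairs for illustration; not real data.
records = [
    ("26-35", 8000), ("26-35", 9000), ("26-35", 10000),
    ("36-45", 7000), ("36-45", 11000), ("36-45", 9500),
    ("26-35", 8500), ("36-45", 23000),
]


def mean_by_group(pairs):
    """Average the numeric value for each group label."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return {label: statistics.mean(values) for label, values in groups.items()}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


purchases = [amount for _, amount in records]
print(mean_by_group(records))   # mean Purchase per Age bin
print(iqr_outliers(purchases))  # Purchase values flagged as outliers
```

The same `mean_by_group` helper would work for Occupation or any Product_Category column paired with Purchase. Note that quartile definitions differ between tools, so the flagged values may vary slightly from what a plotting library's box plot shows.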
Coverage
The dataset covers customer demographic information including age (grouped into bins like 26-35, 36-45), gender (male and female), marital status, city type (A, B, C), and the duration of stay in their current city. It also includes details about products through Product_ID and three levels of Product_Category. The data represents purchase summaries collected from "last month." Geographical scope is covered by the City_Category column. It is important to note the significant amount of missing data in Product_Category_2 and Product_Category_3.
License
CC0: Public Domain
Who Can Use It
- Retail Companies: For gaining insights into customer buying patterns and optimising marketing strategies.
- Data Analysts: To perform in-depth data exploration, feature engineering, and statistical analysis.
- Machine Learning Engineers: For training and evaluating regression models to predict purchase values.
- Marketing Strategists: To design and implement targeted campaigns and personalised customer experiences.
- Business Intelligence Professionals: For reporting and dashboard creation related to sales and customer behaviour.
Dataset Name Suggestions
- Black Friday Sales Analytics Dataset
- Retail Customer Purchase Prediction
- Customer Shopping Behaviour Data
- Product Sales and Demographics
Attributes
Original Data Source: Black Friday Sales Analytics Dataset