Black Friday Customer Purchase Insights
Retail & Consumer Behavior
Free
About
This dataset is designed to help a retail company, ABC Private Limited, understand how much its customers spend on products across different categories. It provides a purchase summary for selected high-volume products from the previous month. The primary objective is to build a model that predicts customer purchase amounts, enabling personalised offers for customers across the company's product range.
Columns
- User_ID: A unique identifier for each customer.
- Product_ID: A unique identifier for each product.
- Gender: Indicates the gender of the user, either Male (M) or Female (F).
- Age: Represents the age group of the user.
- Occupation: The user's occupation, represented by a masked numerical value.
- City_Category: Denotes the category of the city the user resides in, such as Type B or Type C.
- Stay_In_Current_City_Years: The number of years the user has resided in their current city.
- Marital_Status: The marital status of the user, either Married or Unmarried, represented by a masked numerical value.
- Product_Category_1: The primary product category, represented by a masked numerical value.
- Product_Category_2: A secondary product category a product may belong to, also masked.
- Product_Category_3: A third product category a product may belong to, also masked.
- Purchase: The purchase amount, serving as the target variable for prediction.
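As a rough illustration of the column layout above, the following sketch parses rows with Python's standard csv module. The sample rows are invented for illustration. It also assumes, without confirmation from this listing, that missing category values appear as empty fields and that Purchase and the masked columns are whole numbers.

```python
import csv
import io

# Hypothetical two-row sample in the column layout described above.
# All values are invented for illustration only.
SAMPLE_CSV = """User_ID,Product_ID,Gender,Age,Occupation,City_Category,Stay_In_Current_City_Years,Marital_Status,Product_Category_1,Product_Category_2,Product_Category_3,Purchase
1000001,P00000001,F,26-35,10,B,2,0,3,,,8370
1000002,P00000002,M,36-45,7,C,1,1,1,6,14,15200
"""


def optional_int(text):
    """Return the field as an int, or None when it is empty.

    Assumption: a missing value is stored as an empty field. The
    int(float(...)) form also tolerates values written like "6.0".
    """
    text = text.strip()
    return int(float(text)) if text else None


def parse_row(row):
    """Convert one csv.DictReader row into typed Python values."""
    return {
        "User_ID": row["User_ID"],
        "Product_ID": row["Product_ID"],
        "Gender": row["Gender"],
        "Age": row["Age"],  # an age group label, so kept as text
        "Occupation": int(row["Occupation"]),
        "City_Category": row["City_Category"],
        # Kept as text in case the file uses labels rather than plain integers.
        "Stay_In_Current_City_Years": row["Stay_In_Current_City_Years"],
        "Marital_Status": int(row["Marital_Status"]),
        "Product_Category_1": int(row["Product_Category_1"]),
        "Product_Category_2": optional_int(row["Product_Category_2"]),
        "Product_Category_3": optional_int(row["Product_Category_3"]),
        "Purchase": int(float(row["Purchase"])),
    }


def load_rows(fileobj):
    """Read every row from an open CSV file object."""
    return [parse_row(row) for row in csv.DictReader(fileobj)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["Product_Category_2"], rows[1]["Purchase"])
```

To read the real file instead of the sample, the same function could be called as `load_rows(open("train.csv", newline=""))`, assuming the file's header matches the column names listed above.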
Distribution
The dataset is provided as a CSV file, train.csv, with a file size of 25.53 MB. Most columns (User_ID, Product_ID, Gender, Age, Occupation, City_Category, Stay_In_Current_City_Years, Marital_Status, Product_Category_1, and Purchase) contain 550,000 valid records, indicating no missing values for these attributes. However, Product_Category_2 has 376,000 valid records (68% complete, 32% missing), and Product_Category_3 has 167,000 valid records (30% complete, 70% missing).
Key statistics include:
- Gender: 75% Male, 25% Female.
- Age: The 26-35 age group accounts for 40% of users, followed by 36-45 with 20%.
- City Category: Type B is the most frequent at 42%, followed by Type C at 31%.
- Stay in Current City Years: 35% of users have stayed for 1 year.
- Product_ID: There are 3,631 unique products.
- Purchase: Purchase amounts range from 12 to 24,000, with a mean of approximately 9,260.
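The completeness percentages and purchase statistics above can be recomputed with a few lines of standard-library Python. This is a minimal sketch on invented rows, assuming missing values have already been loaded as None or empty strings; the helper names `completeness` and `purchase_summary` are hypothetical.

```python
from statistics import mean


def completeness(rows, column):
    """Fraction of rows whose value in `column` is present (not None or empty)."""
    if not rows:
        return 0.0
    present = sum(1 for r in rows if r.get(column) not in (None, ""))
    return present / len(rows)


def purchase_summary(rows):
    """Minimum, maximum, and mean of the Purchase column."""
    values = [r["Purchase"] for r in rows]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Hypothetical rows, invented for illustration only.
rows = [
    {"Product_Category_2": 6, "Product_Category_3": None, "Purchase": 8000},
    {"Product_Category_2": None, "Product_Category_3": None, "Purchase": 12000},
    {"Product_Category_2": 8, "Product_Category_3": 14, "Purchase": 10000},
    {"Product_Category_2": 2, "Product_Category_3": None, "Purchase": 6000},
]

for col in ("Product_Category_2", "Product_Category_3"):
    print(f"{col}: {completeness(rows, col):.0%} complete")
print(purchase_summary(rows))
```

Run against the full file, the same functions should land close to the figures quoted above, though the listing's numbers appear to be rounded.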
Usage
This dataset is ideal for:
- Building predictive models to forecast customer purchase amounts.
- Developing personalised marketing offers based on predicted purchase behaviour.
- Conducting Exploratory Data Analysis (EDA), including univariate and bivariate analysis concerning purchase amounts.
- Analysing purchasing patterns across various customer demographics (gender, age, marital status, occupation, city type, years in current city) and product categories.
- Data preprocessing practice, covering tasks such as checking basic statistics, handling missing values, identifying unique values, and converting categorical data to numerical formats.
- Creating visualisations to explore relationships between variables, such as Age vs. Purchase, Occupation vs. Purchase, and Product Category vs. Purchase.
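Two of the tasks listed above, converting categorical data to numbers and relating a variable to the purchase amount, can be sketched as follows. This is an illustrative baseline on invented rows, not a method prescribed by the dataset: it label-encodes a categorical column and predicts Purchase as the mean purchase of the row's Product_Category_1 group, falling back to the overall mean for unseen categories. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def label_encode(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def mean_purchase_by(rows, column):
    """Bivariate summary: mean Purchase for each distinct value of `column`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[column]].append(r["Purchase"])
    return {key: mean(amounts) for key, amounts in groups.items()}


def predict_baseline(row, group_means, column, fallback):
    """Predict the group's mean purchase, or `fallback` for an unseen group."""
    return group_means.get(row[column], fallback)


# Hypothetical rows, invented for illustration only.
rows = [
    {"Gender": "M", "Age": "26-35", "Product_Category_1": 1, "Purchase": 15000},
    {"Gender": "F", "Age": "26-35", "Product_Category_1": 1, "Purchase": 13000},
    {"Gender": "M", "Age": "36-45", "Product_Category_1": 5, "Purchase": 6000},
    {"Gender": "M", "Age": "36-45", "Product_Category_1": 5, "Purchase": 8000},
]

gender_codes, gender_mapping = label_encode([r["Gender"] for r in rows])
by_category = mean_purchase_by(rows, "Product_Category_1")
overall_mean = mean(r["Purchase"] for r in rows)

print(gender_mapping, gender_codes)
print(by_category)
print(predict_baseline({"Product_Category_1": 5}, by_category,
                       "Product_Category_1", overall_mean))
```

A group-mean baseline like this is mainly useful as a reference point: any fitted regression model should be expected to beat it on held-out data before being used for personalised offers.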
Coverage
The dataset covers customer purchase summaries from the previous month. It includes customer demographics such as age, gender, marital status, city type, and duration of stay in the current city, alongside product details. The customer base skews male (75% Male, 25% Female), and the 26-35 age group is the largest (40%). Note the significant amount of missing data for Product_Category_2 (32%) and Product_Category_3 (70%). Geographic detail is limited to the general City_Category types, such as B and C.
License
CC0: Public Domain
Who Can Use It
- Retail Companies: To gain insights into customer buying habits and to refine their marketing strategies.
- Data Scientists and Machine Learning Engineers: For training and evaluating predictive models for sales forecasting and customer segmentation.
- Business Analysts: To understand key drivers of purchase behaviour and to inform business decisions.
- Students and Researchers: For academic projects and research in customer analytics, retail, and e-commerce.
Dataset Name Suggestions
- Black Friday Customer Purchase Insights
- Retail Sales Prediction Dataset
- Customer Transaction Behaviour Data
- ABC Private Limited Purchase Analysis
Attributes
Original Data Source: Black Friday Customer Purchase Insights