Black Friday Retail Purchase Data
Retail & Consumer Behavior
Free
About
This dataset captures customer purchase behaviour and product details from a recent Black Friday event at a retail company named "ABC Private Limited". Its primary aim is to support the development of a model that predicts customer purchase amounts, which can then be used to create personalised offers for customers on different products. The dataset includes customer demographics and product-specific details.
Columns
- User_ID: A unique identifier for each user.
- Product_ID: A unique identifier for each product.
- Gender: The gender of the user, specified as Male (M) or Female (F).
- Age: The age of the user, provided in age ranges.
- Occupation: A masked numerical value representing the user's occupation.
- City_Category: The category of the city where the user resides (e.g., Type B, Type C).
- Stay_In_Current_City_Years: The number of years the user has resided in their current city.
- Marital_Status: A masked numerical value indicating the user's marital status (0 for unmarried, 1 for married).
- Product_Category_1: A masked numerical value representing the primary product category.
- Product_Category_2: A masked numerical value representing a secondary product category, if applicable. This column contains significant missing values.
- Product_Category_3: A masked numerical value representing a tertiary product category, if applicable. This column contains a large proportion of missing values.
- Purchase: The total purchase amount made by the customer; this is the target variable for prediction.
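The column list above can double as a quick schema check when loading the file. The following is a minimal sketch using only the Python standard library; it assumes the CSV header uses exactly the column names listed above. The helper name `read_rows` and the two sample rows are hypothetical and invented for illustration, not taken from the dataset.

```python
import csv
import io

# Column names as listed in this description (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "User_ID", "Product_ID", "Gender", "Age", "Occupation", "City_Category",
    "Stay_In_Current_City_Years", "Marital_Status", "Product_Category_1",
    "Product_Category_2", "Product_Category_3", "Purchase",
]


def read_rows(handle):
    """Read CSV rows as dicts, failing early if a listed column is absent."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical two-row sample; the values are made up for illustration.
SAMPLE = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1000001,P00000001,F,0-17,10,A,2,0,3,,,8000\n"
    "1000002,P00000002,M,26-35,7,B,1,1,1,6,14,15000\n"
)

rows = read_rows(io.StringIO(SAMPLE))
print(len(rows), "rows read;", len(rows[0]), "columns each")
```

For the real file, the same function could be called as `read_rows(open("train.csv", newline=""))`, assuming the file sits in the working directory.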
Distribution
The dataset is provided as a CSV file named train.csv and is approximately 25.53 MB in size. It contains 12 columns. Key statistics about the data distribution are as follows:
- Total Valid Records: 550,000 for most columns.
- User_ID: Ranges from 1,000,001 to 1,006,040.
- Product_ID: Features 3,631 unique products, with 'P00265242' being the most frequently purchased.
- Gender: Approximately 75% of users are Male and 25% are Female.
- Age: The '26-35' age group accounts for 40% of the users. There are 7 unique age groups in total.
- Occupation: There are 20 unique masked occupation values.
- City_Category: City category 'B' is the most common, representing 42% of the data. There are 3 unique city categories.
- Stay_In_Current_City_Years: Users who have stayed 1 year in their current city comprise 35% of the data. There are 5 unique values for years stayed.
- Marital_Status: Masked values for marital status show that roughly 59% are unmarried (value 0) and 41% are married (value 1).
- Product_Category_1: Contains 20 unique masked categories.
- Product_Category_2: Has 18 unique masked categories, but 32% of values are missing.
- Product_Category_3: Has 16 unique masked categories, but a substantial 70% of values are missing.
- Purchase: The average purchase amount is approximately 9,260, with a standard deviation of 5,020. Purchase amounts range from 12 to 24,000.
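Figures such as the missing-value percentages and the Purchase mean and standard deviation can be recomputed from the rows themselves. The sketch below shows one way to do that with the standard library; it assumes rows are dicts of strings (as `csv.DictReader` produces) and that a missing value appears as an empty field. The function names and the four-row sample are hypothetical, so the printed numbers describe the sample only, not the dataset.

```python
import statistics


def missing_share(rows, column):
    """Fraction of rows whose value for `column` is empty or absent."""
    if not rows:
        return 0.0
    empty = sum(1 for r in rows if (r.get(column) or "").strip() == "")
    return empty / len(rows)


def purchase_summary(rows):
    """Mean, sample standard deviation, minimum and maximum of Purchase."""
    values = [float(r["Purchase"]) for r in rows]
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical sample rows; values are invented for illustration.
sample = [
    {"Product_Category_2": "", "Product_Category_3": "", "Purchase": "8000"},
    {"Product_Category_2": "6", "Product_Category_3": "", "Purchase": "12000"},
    {"Product_Category_2": "8", "Product_Category_3": "14", "Purchase": "10000"},
    {"Product_Category_2": "", "Product_Category_3": "", "Purchase": "6000"},
]

for col in ("Product_Category_2", "Product_Category_3"):
    print(col, "missing share:", missing_share(sample, col))
print(purchase_summary(sample))
```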
Usage
This dataset is ideal for:
- Predicting customer purchase amounts.
- Developing personalised offer strategies for customers.
- Performing Exploratory Data Analysis (EDA) to understand customer and product trends.
- Conducting data cleaning and preprocessing exercises, particularly concerning missing values and categorical data conversion.
- Creating data visualisations to explore relationships such as Age vs. Purchase, Occupation vs. Purchase, and the Product Categories vs. Purchase.
- Implementing Univariate and Bivariate Analysis focusing on the 'Purchase' column.
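Several of the uses above (handling missing values, converting categorical data, and bivariate analysis against Purchase) can be sketched in a few lines. The following is an illustrative example only, using the standard library and assuming rows are dicts of strings. The helper names are hypothetical, and the placeholder `"0"` for missing categories is an assumption that only makes sense if 0 is not already a real category code.

```python
from collections import defaultdict


def fill_missing(rows, column, placeholder="0"):
    """Return copies of the rows with empty values in `column` replaced.

    The placeholder "0" is an assumption, not something the dataset defines.
    """
    return [{**r, column: r.get(column) or placeholder} for r in rows]


def encode_categories(rows, column):
    """Map each distinct label to an integer code, in sorted label order."""
    labels = sorted({r[column] for r in rows})
    return {label: code for code, label in enumerate(labels)}


def mean_purchase_by(rows, key):
    """Average Purchase per distinct value of `key` (e.g. Age vs. Purchase)."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [sum, count]
    for r in rows:
        bucket = totals[r[key]]
        bucket[0] += float(r["Purchase"])
        bucket[1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


# Hypothetical sample rows; values are invented for illustration.
sample = [
    {"Age": "26-35", "Product_Category_2": "", "Purchase": "10000"},
    {"Age": "26-35", "Product_Category_2": "6", "Purchase": "14000"},
    {"Age": "0-17", "Product_Category_2": "", "Purchase": "8000"},
]

print(mean_purchase_by(sample, "Age"))
print(encode_categories(sample, "Age"))
print(fill_missing(sample, "Product_Category_2"))
```

Sorting the labels keeps the Age codes in range order for the ranges used in this sample; other columns may need an explicit ordering instead.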
Coverage
The data primarily covers customer purchase summaries from a specific "last month" period, associated with a Black Friday sale. It includes demographic information about customers (age, gender, marital status, city type, years in current city) and details about the products purchased (product ID and category). The occupation, marital status, and product category fields are masked (converted from categorical to numerical values). It is important to note the high percentage of missing values in Product_Category_2 and Product_Category_3.
License
CC0: Public Domain
Who Can Use It
- Retail companies and analysts aiming to understand customer buying patterns and optimise sales strategies.
- Data scientists and machine learning engineers building predictive models for sales forecasting and personalisation.
- Business intelligence professionals interested in segmenting customers and products based on purchase behaviour.
- Students and researchers for educational purposes in data analysis, machine learning, and retail analytics.
- Developers of data visualisation tools and dashboards.
Dataset Name Suggestions
- Black Friday Retail Purchase Data
- Customer Purchase Prediction Data
- ABC Pvt Ltd Sales Transaction Data
- E-commerce Purchase Behaviour Study
Attributes
Original Data Source: Black Friday Retail Purchase Data