Big Mart Sales Prediction Dataset
About
This dataset supports retail sales prediction for products sold across Big Mart outlets. It contains 2013 sales data for 1559 unique products across 10 stores located in different cities. The goal is to build a predictive model that forecasts the sales of each product at a particular store, which helps Big Mart understand which product and outlet characteristics drive sales. The data includes attributes for both products and stores, and some values are missing (notably in 'Weight' and 'OutletSize'), which is attributed to reporting issues.
Columns
- ProductID: A unique identifier for each product. The test dataset features 1543 unique product IDs with no missing values.
- Weight: The weight of individual products. Approximately 17% of this data is missing in the test set, with values ranging from 4.55 to 21.4 and a mean of 12.7.
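- FatContent: Indicates whether a product is low fat. The most common values are 'Low Fat' (60%) and 'Regular' (34%). There are 5 unique values for what is a two-way distinction, so the remaining 6% are likely alternative spellings of these two labels and should be consolidated before modelling. No missing data.
- Visibility: The share of a store's total display area allocated to a specific product, expressed as a fraction rather than a percentage. Values range from 0 to 0.32, with a mean of 0.07. No missing values are reported, although a visibility of exactly 0 for a stocked product may indicate an unrecorded value.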
- ProductType: The category to which a product belongs. There are 16 unique product categories, with 'Snack Foods' and 'Fruits and Vegetables' each making up 14% of the entries. No missing values.
- MRP: The Maximum Retail Price, which is the listed price of the products. Prices range from 32 to 267, with a mean of 141. No missing values.
- OutletID: A unique identifier for each store. There are 10 unique store IDs, with 'OUT027' and 'OUT013' being the most common (11% each). No missing values.
- EstablishmentYear: The year in which an outlet was established. Years range from 1985 to 2009, with a mean year around 2000. No missing values.
- OutletSize: Describes the size of the store based on its ground area. 'Medium' is the most frequent size (33%), but 28% of this data is missing. There are 3 unique size categories.
- LocationType: The type of city where a store is located. 'Tier 3' (39%) and 'Tier 2' (33%) are the most common types. There are 3 unique location types with no missing values.
- OutletType: Specifies the type of outlet, such as a grocery store or supermarket. 'Supermarket Type1' accounts for 65% of entries, while 'Grocery Store' accounts for 13%. There are 4 unique outlet types with no missing values.
- OutletSales: This is the target variable, representing the sales of a product within a particular store. This value is available in the training dataset and is the variable to be forecasted for the test dataset.
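The missing-value shares and unique-value counts quoted above can be re-checked after download. The following is a minimal sketch using pandas; the small `sample` frame is hand-made for illustration and does not reproduce real records, and `profile_columns` is a hypothetical helper name.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return, per column, the missing share in percent and the number of unique values."""
    return pd.DataFrame({
        "missing_pct": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })


# Illustrative rows only; the column names follow the list above.
sample = pd.DataFrame({
    "ProductID": ["A", "B", "C", "D"],
    "Weight": [9.3, None, 17.5, 5.9],
    "FatContent": ["Low Fat", "Regular", "Low Fat", "Regular"],
    "OutletSize": ["Medium", None, "Medium", None],
})

profile = profile_columns(sample)
print(profile)
```

Running the same helper on the real train or test file should show non-zero `missing_pct` only for 'Weight' and 'OutletSize' if the figures above hold.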
Distribution
The dataset is provided in CSV format and is split into two files: a train set of 8523 records, which includes the sales values, and a test set of 5681 records, for which sales need to be forecasted. Both files share the 11 product and outlet columns; the train set additionally contains the 'OutletSales' target.
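Before modelling, it can help to confirm that the two files line up as described. The sketch below assumes the files have been loaded into pandas DataFrames (the file names in the comment are hypothetical) and uses two tiny stand-in frames so that it runs on its own; `check_schema` is an illustrative helper, not part of the dataset.

```python
import pandas as pd

TARGET = "OutletSales"


def check_schema(train: pd.DataFrame, test: pd.DataFrame, target: str = TARGET) -> list:
    """Return the shared feature columns, raising ValueError if the files disagree."""
    if target not in train.columns:
        raise ValueError(f"train set is missing the target column {target!r}")
    if target in test.columns:
        raise ValueError(f"test set unexpectedly contains {target!r}")
    features = [c for c in train.columns if c != target]
    if set(features) != set(test.columns):
        raise ValueError("train and test feature columns differ")
    return features


# With the real files this would be something like:
#   train = pd.read_csv("train.csv")   # hypothetical file name
#   test = pd.read_csv("test.csv")     # hypothetical file name
train = pd.DataFrame({"ProductID": ["A"], "MRP": [141.0], "OutletSales": [1000.0]})
test = pd.DataFrame({"ProductID": ["B"], "MRP": [32.0]})

features = check_schema(train, test)
print(features)
```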
Usage
This dataset is ideal for:
- Developing and deploying predictive models for retail sales.
- Analysing the impact of product and store attributes on sales performance.
- Conducting exploratory data analysis to uncover sales trends and patterns.
- Building machine learning models for sales forecasting, which is a regression task on the continuous 'OutletSales' target.
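As a starting point for the modelling uses above, a simple reference baseline is often worth having before trying more elaborate models. The sketch below predicts the mean 'OutletSales' for each combination of 'ProductType' and 'OutletType', falling back to the global mean for unseen combinations. The choice of grouping keys, the RMSE metric, the helper names, and the toy rows are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

KEYS = ["ProductType", "OutletType"]


def fit_group_means(train, keys, target="OutletSales"):
    """Learn the mean target per key combination, plus a global fallback value."""
    means = train.groupby(keys)[target].mean().rename("prediction").reset_index()
    return means, float(train[target].mean())


def predict_group_means(df, means, fallback, keys):
    """Look up each row's group mean; use the fallback where the group was never seen."""
    merged = df[keys].merge(means, on=keys, how="left")
    return merged["prediction"].fillna(fallback).to_numpy()


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy rows for illustration only; the sales values are made up.
toy_train = pd.DataFrame({
    "ProductType": ["Snack Foods", "Snack Foods",
                    "Fruits and Vegetables", "Fruits and Vegetables"],
    "OutletType": ["Supermarket Type1", "Supermarket Type1",
                   "Grocery Store", "Grocery Store"],
    "OutletSales": [100.0, 200.0, 30.0, 50.0],
})
toy_new = pd.DataFrame({
    "ProductType": ["Snack Foods", "Snack Foods"],
    "OutletType": ["Supermarket Type1", "Grocery Store"],  # second pair is unseen
})

means, fallback = fit_group_means(toy_train, KEYS)
preds = predict_group_means(toy_new, means, fallback, KEYS)
train_rmse = rmse(toy_train["OutletSales"],
                  predict_group_means(toy_train, means, fallback, KEYS))
print(preds, train_rmse)
```

Any learned model should beat this baseline on held-out rows of the train set; if it does not, the added complexity is probably not paying for itself.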
Coverage
The data encompasses sales from 2013 for 1559 distinct products across 10 Big Mart outlets located in various cities, categorised by 'LocationType' (e.g., Tier 2, Tier 3). Outlet establishment years span from 1985 to 2009. The data may contain missing values for certain attributes, notably 'Weight' and 'OutletSize', which need to be managed during analysis.
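One possible way to manage the two incomplete attributes is sketched below. It assumes that a product's weight does not vary between stores, so a missing 'Weight' can be borrowed from other rows with the same 'ProductID', with the overall median as a fallback. Missing 'OutletSize' values are kept visible as a separate 'Unknown' category rather than guessed. Both choices, the helper name, and the toy rows are assumptions for illustration, not a prescribed method.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'Weight' and 'OutletSize' gaps filled (see assumptions above)."""
    out = df.copy()
    # Weight: mean of the observed weights for the same product, NaN if none observed.
    per_product = out.groupby("ProductID")["Weight"].transform("mean")
    # Fall back to the median of all observed weights when a product has no weight at all.
    out["Weight"] = out["Weight"].fillna(per_product).fillna(out["Weight"].median())
    # OutletSize: keep missingness as its own category.
    out["OutletSize"] = out["OutletSize"].fillna("Unknown")
    return out


# Toy rows for illustration only.
raw = pd.DataFrame({
    "ProductID": ["P1", "P1", "P2", "P3", "P3"],
    "Weight": [10.0, None, None, 6.0, 8.0],
    "OutletSize": ["Medium", None, "Medium", None, "Medium"],
})

clean = impute_missing(raw)
print(clean)
```

In the toy frame, the second P1 row inherits the weight of the first, while P2 has no observed weight anywhere and receives the overall median.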
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For building and testing predictive models.
- Retail Businesses/Analysts: To gain insights into sales drivers and inform strategic decisions regarding product placement and store operations.
- Students and Researchers: For educational purposes, data analysis exercises, and academic studies on retail analytics and forecasting.
- Machine Learning Practitioners: Interested in applying and evaluating various forecasting algorithms.
Dataset Name Suggestions
- Big Mart Sales Prediction Dataset
- Retail Product Sales Forecasting
- Mart Outlet Sales 2013 Data
- Product-Outlet Sales Analytics
Attributes
Original Data Source: Big Mart Sales Prediction Dataset