Future Sales Forecasting Challenge Data
Retail & Consumer Behavior
Free
About
This collection of historical daily sales data is the basis for a machine learning competition on predicting future retail sales. The data covers nearly three years, and contestants or analysts must build time-series models that forecast the number of each product sold per shop per month while handling a list of shops and products that changes over time. It serves as the final project for a data science competition course.
Columns
The dataset is split across several files containing transaction details and supplemental metadata, including:
- ID: A unique identifier for a specific (Shop, Item) pairing within the test set.
- shop_id: The unique identifier assigned to a retail shop.
- item_id: The unique identifier for a specific product.
- category_id: The unique identifier for an item category.
- item_cnt_day: The number of products sold on a given day (the forecast target is the monthly total of this measure).
- item_price: The current selling price of the item.
- date: The specific date of the sale in dd/mm/yyyy format.
- date_block_num: A convenient consecutive month number, where January 2013 is 0 and October 2015 is 33.
- item_name: The descriptive name of the product.
- shop_name: The name of the retail shop.
- category_name: The descriptive name of the category.
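The relationship between date, date_block_num, and the monthly target can be sketched in a few lines of Python. This is a minimal illustration, not the competition's own code: the helper names date_block_num and monthly_counts and the sample rows are hypothetical, and the date format is assumed to be dd/mm/yyyy as described above (it is left as a parameter in case the file uses a different separator).

```python
from collections import defaultdict
from datetime import datetime


def date_block_num(date_str, fmt="%d/%m/%Y"):
    """Map a sale date to a consecutive month index (January 2013 -> 0)."""
    d = datetime.strptime(date_str, fmt)
    return (d.year - 2013) * 12 + (d.month - 1)


def monthly_counts(rows):
    """Sum item_cnt_day per (date_block_num, shop_id, item_id)."""
    totals = defaultdict(float)
    for row in rows:
        key = (date_block_num(row["date"]), row["shop_id"], row["item_id"])
        totals[key] += row["item_cnt_day"]
    return dict(totals)


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"date": "02/01/2013", "shop_id": 5, "item_id": 100, "item_cnt_day": 1.0},
    {"date": "15/01/2013", "shop_id": 5, "item_id": 100, "item_cnt_day": 2.0},
    {"date": "03/10/2015", "shop_id": 5, "item_id": 100, "item_cnt_day": 4.0},
]

monthly = monthly_counts(sample_rows)
```

With these sample rows, the two January 2013 sales fall into block 0 and are summed, while the October 2015 sale falls into block 33.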
Distribution
The data consists of multiple tabular files in CSV format. The training set (sales_train.csv) provides daily historical records spanning January 2013 to October 2015. The associated test set (test.csv) requires sales forecasts for November 2015. Supporting files describe the item categories (84 unique categories), items, and shops. The data structure is fixed and will not be updated.
Usage
This data product is ideally suited for:
- Developing and benchmarking advanced time-series forecasting models.
- Educational use in predictive analytics, particularly for those studying retail or sales data.
- Simulating real-world inventory and demand management scenarios.
- Participation in machine learning and data science competitions.
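When benchmarking forecasting models, a naive baseline is a useful reference point. The sketch below is one possible baseline, not a method prescribed by the dataset: it predicts each test pair's next-month count as its count in the last training month (date_block_num 33) and falls back to 0 for pairs with no sales that month, which also covers shops or items that never appear in training. The function name naive_forecast and all values are hypothetical.

```python
def naive_forecast(monthly, test_pairs, last_block=33):
    """Predict next month's count as the count observed in `last_block`.

    monthly:    {(date_block_num, shop_id, item_id): monthly count}
    test_pairs: {ID: (shop_id, item_id)}
    Pairs with no sales in `last_block` get a forecast of 0.0.
    """
    return {
        test_id: monthly.get((last_block, shop_id, item_id), 0.0)
        for test_id, (shop_id, item_id) in test_pairs.items()
    }


# Made-up values for illustration only.
monthly = {(32, 5, 100): 2.0, (33, 5, 100): 4.0, (33, 6, 200): 1.0}
test_pairs = {0: (5, 100), 1: (6, 200), 2: (7, 300)}

forecast = naive_forecast(monthly, test_pairs)
```

Any more advanced model can then be compared against this baseline on the same (Shop, Item) pairs.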
Coverage
The temporal coverage is daily historical sales data from January 2013 through October 2015. The geographic scope is defined by the included shops. The data captures transactions across numerous distinct items and 84 unique item categories.
License
CC0: Public Domain
Who Can Use It
- Data Scientists focused on improving predictive accuracy in retail sectors.
- Students and Academics completing coursework in statistical modelling and forecasting.
- Retail Analysts looking to model the impact of item categories and pricing on sales volumes.
- Machine Learning Engineers testing robust models designed to handle data drift and feature inconsistency.
Dataset Name Suggestions
- Future Sales Forecasting Challenge Data
- Historical Shop Transactions ML Set
- Retail Demand Prediction 2013-2015
Attributes
Original Data Source: Future Sales Forecasting Challenge Data
