Future Sales Forecasting Challenge Data

Retail & Consumer Behavior

Tags and Keywords

Sales

Forecasting

Retail

Prediction

Historical

Free

About

This collection of historical daily sales data underpins a machine learning competition on predicting future retail sales. The data covers nearly three years, and contestants or analysts must build time-series models that forecast how many units of each product each shop will sell in the following month, while coping with a list of shops and products that changes over time. It serves as the final project for a data science competition course.

Columns

The dataset is split across several files containing transaction details and supplemental metadata, including:
  • ID: A unique identifier for a specific (Shop, Item) pairing within the test set.
  • shop_id: The unique identifier assigned to a retail shop.
  • item_id: The unique identifier for a specific product.
  • category_id: The unique identifier for an item category.
  • item_cnt_day: The number of units of the product sold on a given day. Daily counts are summed per month to obtain the monthly total that must be predicted.
  • item_price: The current selling price of the item.
  • date: The specific date of the sale in dd/mm/yyyy format.
  • date_block_num: A convenient consecutive month number, where January 2013 is 0 and October 2015 is 33.
  • item_name: The descriptive name of the product.
  • shop_name: The name of the retail shop.
  • category_name: The descriptive name of the category.
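
The relationship between date, date_block_num and item_cnt_day described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the column names and the dd/mm/yyyy date format exactly as listed here; the helper names date_block_num() and monthly_counts() and the sample rows are hypothetical and not part of the dataset. If the files use a different date separator, the fmt argument would need adjusting.

```python
from collections import defaultdict
from datetime import datetime


def date_block_num(date_str, fmt="%d/%m/%Y"):
    """Map a sale date to a consecutive month index (January 2013 -> 0)."""
    d = datetime.strptime(date_str, fmt)
    return (d.year - 2013) * 12 + (d.month - 1)


def monthly_counts(rows):
    """Sum item_cnt_day per (date_block_num, shop_id, item_id)."""
    totals = defaultdict(float)
    for row in rows:
        key = (date_block_num(row["date"]), row["shop_id"], row["item_id"])
        totals[key] += row["item_cnt_day"]
    return dict(totals)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = [
    {"date": "02/01/2013", "shop_id": 5, "item_id": 100, "item_cnt_day": 1.0},
    {"date": "15/01/2013", "shop_id": 5, "item_id": 100, "item_cnt_day": 2.0},
    {"date": "03/10/2015", "shop_id": 5, "item_id": 100, "item_cnt_day": 1.0},
]
monthly = monthly_counts(sample)
```

Keying the totals by (month, shop, item) mirrors the (Shop, Item) pairing that the ID column refers to, so the result can be looked up directly for any pair.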

Distribution

The data consists of multiple tabular files in CSV format. The training set (sales_train.csv) provides daily historical records from January 2013 to October 2015. The test set (test.csv) lists the (shop, item) pairs for which November 2015 sales must be forecast. Supporting files describe the items, the shops, and the 84 unique item categories. The dataset is static and will not be updated.
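
As a point of reference for the November 2015 forecast, a naive persistence baseline simply reuses each pair's October 2015 total (month index 33). The sketch below is one possible starting point, not the competition's method. It assumes monthly totals are already keyed by (date_block_num, shop_id, item_id); the function name previous_month_baseline(), the choice of 0 for pairs with no October sales, and all sample values are hypothetical.

```python
def previous_month_baseline(monthly, test_pairs, last_block=33):
    """Predict next month's count as the count observed in last_block.

    monthly    -- dict mapping (date_block_num, shop_id, item_id) to a total
    test_pairs -- iterable of (ID, shop_id, item_id) tuples from the test set
    Pairs with no sales in last_block (for example new items or shops)
    fall back to 0.0, which is an assumption of this sketch.
    """
    predictions = {}
    for pair_id, shop_id, item_id in test_pairs:
        predictions[pair_id] = monthly.get((last_block, shop_id, item_id), 0.0)
    return predictions


# Made-up values for illustration only; they are not taken from the dataset.
monthly_totals = {
    (32, 5, 100): 4.0,  # September 2015
    (33, 5, 100): 2.0,  # October 2015
    (33, 7, 200): 1.0,
}
test_rows = [(0, 5, 100), (1, 7, 200), (2, 7, 999)]  # (ID, shop_id, item_id)
forecast = previous_month_baseline(monthly_totals, test_rows)
```

The fallback for unseen pairs matters here because the shop and item lists change over time, so some test pairs may have no recent history at all.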

Usage

This data product is ideally suited for:
  • Developing and benchmarking advanced time-series forecasting models.
  • Educational use in predictive analytics, particularly for those studying retail or sales data.
  • Simulating real-world inventory and demand management scenarios.
  • Participation in machine learning and data science competitions.

Coverage

Temporal coverage runs from January 2013 through October 2015 at daily granularity. The geographic scope is defined by the included shops. The data captures transactions across numerous distinct items and 84 unique item categories.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists focused on improving predictive accuracy in retail sectors.
  • Students and Academics completing coursework in statistical modelling and forecasting.
  • Retail Analysts looking to model the impact of item categories and pricing on sales volumes.
  • Machine Learning Engineers testing robust models designed to handle data drift and feature inconsistency.

Dataset Name Suggestions

  • Future Sales Forecasting Challenge Data
  • Historical Shop Transactions ML Set
  • Retail Demand Prediction 2013-2015

Attributes

Listing Stats

VIEWS: 3
DOWNLOADS: 0
LISTED: 29/11/2025
REGION: GLOBAL
UDQS QUALITY (Universal Data Quality Score): 5 / 5
VERSION: 1.0
