PriceRunner E-commerce Offerings Data

E-commerce & Online Transactions

Tags and Keywords

Classification

Clustering

E-commerce

Retail

Products

Free

About

This dataset contains 35,311 product offers collected from the PriceRunner product comparison platform. The offers span 10 distinct product categories and come from 306 different merchants. The dataset is a useful basis for testing and evaluating classification, clustering, and entity matching algorithms in the e-commerce domain. It can also be applied to general text mining and short-text mining problems.

Columns

The dataset contains 7 core columns detailing the product offers:
  • Product ID: A unique identifier assigned to each product.
  • Product Title: The text description or name of the product offer. This data has undergone preprocessing, specifically case folding and punctuation removal.
  • Merchant ID: An identifier for the specific merchant providing the product offer.
  • Cluster ID: A numerical identifier resulting from initial clustering efforts.
  • Cluster Label: The textual description corresponding to the Cluster ID.
  • Category ID: A numerical identifier designating the product's classification category.
  • Category Label: The human-readable name of the category, such as 'Fridge Freezers' or 'Mobile Phones'.
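
As a minimal sketch of how these columns might be read, the snippet below loads offers with Python's standard csv module. The header names are an assumption based on the column list above, and the two sample rows are made up for illustration; the real file may spell or space its headers differently.

```python
import csv
import io

# Assumed header names, taken from the column list above; adjust them if the
# real file uses different spelling or spacing.
EXPECTED_COLUMNS = [
    "Product ID", "Product Title", "Merchant ID",
    "Cluster ID", "Cluster Label", "Category ID", "Category Label",
]

def load_offers(handle):
    """Read product offers from an open CSV file into a list of dicts."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    if reader.fieldnames is None:
        raise ValueError("file is empty")
    # Strip stray whitespace around header names before validating them.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)

# Two made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "Product ID,Product Title,Merchant ID,Cluster ID,Cluster Label,"
    "Category ID,Category Label\n"
    "1,example phone 64gb black,10,100,Example Phone 64GB,1,Mobile Phones\n"
    "2,example fridge freezer white,11,200,Example Fridge Freezer,2,Fridge Freezers\n"
)
offers = load_offers(sample)
print(len(offers), offers[0]["Category Label"])
```

For the downloaded file, the same function could be called on open("pricerunner_aggregate.csv", newline="", encoding="utf-8"), assuming the file is UTF-8 encoded. All values come back as strings, so the ID columns need an explicit int() conversion if numeric types are wanted.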

Distribution

The dataset consists of 35,311 records, one per product offer. It is distributed as a single CSV file, pricerunner_aggregate.csv, with a size of 3.92 MB. The data is static and will not be updated, so it is a fixed historical snapshot. No recommended data splits are provided.
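
Because no splits are provided, users have to create their own. One possible approach, sketched here with the standard library only, is a stratified split that samples each category separately so that every category appears in both parts. The function name, the 20% test fraction, and the toy rows are illustrative assumptions, not part of the dataset.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_fraction=0.2, seed=0):
    """Split rows into (train, test), sampling each label group separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        # Aim for at least one test row per label, but never move a whole
        # group to the test side; a single-row group stays in train.
        n_test = max(1, round(len(members) * test_fraction))
        n_test = min(n_test, len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Toy rows for illustration: 10 offers in category "A", 5 in category "B".
rows = [{"Product ID": i, "Category Label": "A" if i < 10 else "B"}
        for i in range(15)]
train, test = stratified_split(rows, "Category Label")
print(len(train), len(test))
```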

Usage

The data is ideally suited for machine learning initiatives focused on:
  • Product classification (multiclass classification problems).
  • Evaluating short-text clustering algorithms.
  • Benchmarking entity matching and entity resolution frameworks.
  • General text mining applications that utilise short, descriptive text.
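
To illustrate the entity matching use case, the sketch below scores pairs of product titles by the Jaccard similarity of their token sets and reports pairs above a threshold. This is one simple baseline, not a method prescribed by the dataset; the titles and the 0.3 threshold are made up for illustration. Whitespace tokenization is assumed to be reasonable because the titles are already case-folded and stripped of punctuation.

```python
from itertools import combinations

def jaccard(title_a, title_b):
    """Jaccard similarity between the whitespace-token sets of two titles."""
    tokens_a, tokens_b = set(title_a.split()), set(title_b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def match_pairs(titles, threshold):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(titles), 2)
        if jaccard(a, b) >= threshold
    ]

# Made-up titles for illustration only; they are not taken from the dataset.
titles = [
    "apple iphone 8 plus 64gb silver",
    "apple iphone 8 plus 64 gb space grey",
    "bosch serie 4 fridge freezer white",
]
matches = match_pairs(titles, threshold=0.3)
print(matches)
```

Comparing every pair is quadratic in the number of offers: 35,311 offers give roughly 623 million pairs, so a real experiment would likely restrict comparisons first, for example to offers within the same category.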

Coverage

The dataset covers product offers captured from the PriceRunner platform, across 10 product categories and 306 merchants. The source material gives no geographic, time-range, or demographic details.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Data Scientists: For training and testing classification or clustering models on real-world e-commerce data.
  • Retail Analysts: For understanding product distribution and merchant participation across different categories.
  • Academic Researchers: For studies involving short-text mining or benchmarking competitive data science algorithms.
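
For the retail analysis use case, a short sketch of how product distribution and merchant participation per category might be summarized is shown below. The column names follow the column list in this listing and are an assumption about the real file's headers; the rows are made up for illustration.

```python
from collections import Counter, defaultdict

def summarize(rows):
    """Count offers and distinct merchants for each category label."""
    offers_per_category = Counter(row["Category Label"] for row in rows)
    merchants = defaultdict(set)
    for row in rows:
        merchants[row["Category Label"]].add(row["Merchant ID"])
    merchants_per_category = {
        label: len(ids) for label, ids in merchants.items()
    }
    return offers_per_category, merchants_per_category

# Made-up rows for illustration only; they are not taken from the dataset.
demo_rows = [
    {"Category Label": "Mobile Phones", "Merchant ID": "10"},
    {"Category Label": "Mobile Phones", "Merchant ID": "10"},
    {"Category Label": "Mobile Phones", "Merchant ID": "11"},
    {"Category Label": "Fridge Freezers", "Merchant ID": "11"},
]
offer_counts, merchant_counts = summarize(demo_rows)
print(offer_counts.most_common())
print(merchant_counts)
```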

Dataset Name Suggestions

  • PriceRunner E-commerce Offerings Data
  • Product Offer Clustering Benchmark
  • Merchant Product Data for Classification

Listing Stats

  • Views: 2
  • Downloads: 1
  • Listed: 15/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0