Opendatabay APP

Textual Product Categorisation

E-commerce & Online Transactions

Tags and Keywords

E-commerce, Classification, Text, Products, Categories


Free

About

This dataset is a resource for text classification in the e-commerce domain. It is intended for developing and evaluating machine learning models that categorise products from their textual descriptions. It is designed to reflect the typical distribution of products on online retail platforms and covers four categories that account for a significant portion of e-commerce offerings. It can be used to automate product organisation and improve the customer experience.

Columns

The dataset has two columns:
  • Label: The category name assigned to each product. The categories are "Electronics", "Household", "Books", and "Clothing & Accessories".
  • Text: The product description, which typically includes the product name and a detailed textual overview as it would appear on an e-commerce website. This is the input for classification.
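
As an illustration of the two-column layout, the following is a minimal sketch of reading the file with Python's standard `csv` module. It assumes the CSV has a header row naming the columns `Label` and `Text`, which the listing does not confirm. The sample rows are made up so the snippet runs without the real file.

```python
import csv
import io

# Made-up sample rows standing in for the real file. The header names
# "Label" and "Text" follow the column description above; whether the
# real CSV ships with a header row is an assumption.
SAMPLE_CSV = """Label,Text
Electronics,"USB-C charging cable, 1 m, braided"
Books,"Paperback novel, 320 pages"
Household,"Stainless steel kettle, 1.5 litre"
Clothing & Accessories,"Cotton crew-neck t-shirt, size M"
"""

# The four category names listed for the Label column.
EXPECTED_LABELS = {"Electronics", "Household", "Books", "Clothing & Accessories"}


def read_products(handle):
    """Return a list of (label, text) pairs from a two-column CSV handle."""
    reader = csv.DictReader(handle)
    rows = []
    for record in reader:
        # Treat absent fields as empty strings rather than None.
        label = (record.get("Label") or "").strip()
        text = (record.get("Text") or "").strip()
        rows.append((label, text))
    return rows


rows = read_products(io.StringIO(SAMPLE_CSV))
# Collect any label that is not one of the four documented categories.
unknown = {label for label, _ in rows if label not in EXPECTED_LABELS}
print(len(rows), "rows;", "unexpected labels:", unknown or "none")
```

To read the downloaded file instead, the same function could be called on a handle opened with `open("products.csv", newline="", encoding="utf-8")`, where `products.csv` is a hypothetical file name.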

Distribution

The dataset is provided in CSV format. It contains 50,425 records, and the file size is approximately 36.95 MB. The records are organised into four classes representing major e-commerce categories. The dataset is largely complete: one record has a missing value in the 'Text' column.
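
Before training, it may help to drop the record with the missing description and check how records are spread across the classes. The sketch below shows one way to do that on a list of (label, text) pairs. The rows are made up for illustration, and the helper name `summarise` is hypothetical.

```python
from collections import Counter

# Made-up (label, text) pairs; the empty text mimics the single missing
# value reported for the Text column.
rows = [
    ("Electronics", "Wireless mouse with USB receiver"),
    ("Books", "Hardcover cookbook with 200 recipes"),
    ("Household", ""),
    ("Household", "Set of 4 ceramic mugs"),
    ("Clothing & Accessories", "Leather belt, black"),
]


def summarise(rows):
    """Drop rows with empty text and count the remaining rows per label."""
    kept = [(label, text) for label, text in rows if text.strip()]
    dropped = len(rows) - len(kept)
    counts = Counter(label for label, _ in kept)
    return kept, dropped, counts


kept, dropped, counts = summarise(rows)
print(f"kept {len(kept)} rows, dropped {dropped}")
for label, n in counts.most_common():
    # Share of each class among the rows that were kept.
    print(f"{label}: {n} ({n / len(kept):.0%})")
```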

Usage

This dataset is suited to several machine learning and natural language processing applications:
  • Developing and testing text classification models to automatically assign categories to new products.
  • Enhancing e-commerce search algorithms by ensuring products are accurately indexed.
  • Building automated product categorisation systems for large inventories.
  • Training models for content-based recommendation engines within online retail.
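
As a rough illustration of the first use case, the sketch below trains a small multinomial Naive Bayes classifier on word counts using only the Python standard library. This is one possible baseline, not a method prescribed by the dataset. The class `NaiveBayes`, the helper `tokenize`, and the training rows are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training rows in (label, text) form, two per category.
train = [
    ("Electronics", "Bluetooth headphones with noise cancelling"),
    ("Electronics", "USB charger and cable for smartphone"),
    ("Household", "Non-stick frying pan for kitchen"),
    ("Household", "Cotton bath towel set"),
    ("Books", "Paperback mystery novel by bestselling author"),
    ("Books", "Hardcover textbook on linear algebra"),
    ("Clothing & Accessories", "Men's slim fit denim jeans"),
    ("Clothing & Accessories", "Women's leather handbag"),
]
labels, texts = zip(*train)
model = NaiveBayes().fit(texts, labels)
prediction = model.predict("Wireless Bluetooth headphones with USB cable")
print(prediction)
```

On the full dataset, a held-out test split and a library implementation would likely be more appropriate; the hand-written version here only keeps the example dependency-free.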

Coverage

The dataset's scope is defined by its four core categories: "Electronics", "Household", "Books", and "Clothing & Accessories". These categories are stated to collectively cover approximately 80% of typical e-commerce website products. No specific geographic region or time range is detailed within the provided information.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is useful to people working in data science, machine learning, and e-commerce operations:
  • Machine learning engineers developing and deploying text classification solutions.
  • Data scientists analysing e-commerce product data and categorisation.
  • E-commerce analysts seeking to improve product data quality and organisation.
  • Researchers exploring new methods for automated text classification and natural language understanding.

Dataset Name Suggestions

  • E-commerce Product Classifier
  • Online Retail Categories
  • Textual Product Categorisation
  • E-commerce Item Tags

Attributes

Original Data Source: Textual Product Categorisation

Listing Stats

Views: 0
Downloads: 0
Listed: 27/07/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
