Textual Product Categorisation
E-commerce & Online Transactions
Free
About
This dataset supports text classification in the e-commerce domain. It is intended for developing and evaluating machine learning models that categorise products from their textual descriptions. It is designed to reflect the typical distribution of products on online retail platforms and covers four categories that account for a large share of e-commerce offerings. Typical uses include automating product organisation and improving the customer experience.
Columns
The dataset has two columns:
- Label: The category name assigned to each product. The categories are "Electronics", "Household", "Books", and "Clothing & Accessories".
- Text: The product description, which typically includes the product name and a detailed textual overview as it would appear on an e-commerce website. This is the input for classification.
Distribution
The dataset is provided as a CSV file of approximately 36.95 MB containing 50,425 records across four classes, each representing a major e-commerce category. The data is largely complete: one record has a missing value in the 'Text' column.
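A quick check of the class counts and the missing 'Text' value can be sketched with Python's standard csv module. This is a minimal sketch under stated assumptions: the column order (label first, text second) and the absence of a header row are assumptions, and the in-memory sample rows are invented stand-ins rather than records from the dataset.

```python
import csv
import io
from collections import Counter


def summarise(rows):
    """Count records per label and collect the indices of rows with empty text.

    Assumes each row is [label, text]; this column order is an assumption.
    """
    label_counts = Counter()
    missing_text = []
    for index, row in enumerate(rows):
        label = row[0].strip()
        text = row[1].strip() if len(row) > 1 else ""
        label_counts[label] += 1
        if not text:
            missing_text.append(index)
    return label_counts, missing_text


# Invented sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    'Household,"Wall clock, silent quartz movement"\n'
    'Books,"Paperback novel, 320 pages"\n'
    "Electronics,\n"
    'Clothing & Accessories,"Cotton scarf"\n'
)

label_counts, missing_text = summarise(csv.reader(sample_csv))
print(label_counts)
print(missing_text)
```

To run the same check on the real file, pass `csv.reader(open(path, newline="", encoding="utf-8"))` instead of the in-memory sample, where `path` is wherever the CSV was saved. Rows reported in `missing_text` can then be dropped before training.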
Usage
This dataset suits machine learning and natural language processing applications such as:
- Developing and testing text classification models to automatically assign categories to new products.
- Enhancing e-commerce search algorithms by ensuring products are accurately indexed.
- Building automated product categorisation systems for large inventories.
- Training models for content-based recommendation engines within online retail.
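The first use case above, assigning a category to a new product description, can be illustrated with a small multinomial Naive Bayes baseline written in plain Python. This is a sketch, not a recommended pipeline: the choice of Naive Bayes, the tokeniser, and the function names are assumptions made for illustration, and the training examples are invented rather than taken from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in examples:
        class_counts[label] += 1
        for token in tokenise(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    # Tokens never seen in training carry no information, so skip them.
    tokens = [t for t in tokenise(text) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented examples using the four category names from the dataset.
examples = [
    ("Electronics", "wireless bluetooth headphones with charging cable"),
    ("Electronics", "usb charger and hdmi cable for laptop"),
    ("Household", "stainless steel kitchen knife set"),
    ("Household", "cotton bed sheet with pillow covers"),
    ("Books", "paperback novel by a bestselling author"),
    ("Books", "hardcover textbook on linear algebra"),
    ("Clothing & Accessories", "slim fit cotton shirt"),
    ("Clothing & Accessories", "leather handbag with shoulder strap"),
]

model = train(examples)
print(predict(model, "hdmi cable for tv"))
print(predict(model, "paperback novel"))
```

On the full dataset, the same functions would be trained on the (Label, Text) pairs from the CSV, with a held-out split to measure accuracy. A library implementation with TF-IDF features would likely be a stronger baseline than this sketch.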
Coverage
The dataset covers four categories: "Electronics", "Household", "Books", and "Clothing & Accessories". Together, these categories are stated to cover approximately 80% of the products on a typical e-commerce website. The provided information does not specify a geographic region or time range.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
This dataset is useful to people working in data science, machine learning, and e-commerce operations:
- Machine learning engineers developing and deploying text classification solutions.
- Data scientists performing deep dives into e-commerce product data and categorisation.
- E-commerce analysts seeking to improve product data quality and organisation.
- Researchers exploring new methods for automated text classification and natural language understanding.
Dataset Name Suggestions
- E-commerce Product Classifier
- Online Retail Categories
- Textual Product Categorisation
- E-commerce Item Tags
Attributes
Original Data Source: Textual Product Categorisation