Preprocessed Amazon Review Sentiment
Entertainment & Media Consumption
Free
About
This dataset contains preprocessed Amazon product reviews for the Gen3EcoDot, scraped primarily from amazon.in. It is intended for training and testing classification models, particularly for sentiment analysis. The reviews have been stemmed and lemmatised with NLTK, and sentiment labels are derived from TextBlob polarity scores, so the data can be used directly in machine learning and natural language processing tasks.
Columns
- Index: A unique identifier for each record within the dataset.
- Review: The original, raw text of the customer review.
- Stemmed and Lemmatised review using nltk: The preprocessed version of the review text, optimised for text analysis and model training.
- Polarity: The numerical polarity score derived from the TextBlob analysis, indicating the sentiment expressed in the review (ranging from -1.00 for negative to 1.00 for positive sentiment).
- Division: A categorical label generated based on the polarity score, providing discrete sentiment categories.
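The relationship between the Polarity and Division columns can be illustrated with a minimal sketch. The cut-offs behind the Division column are not documented here, so the function name `polarity_to_division`, the `neutral_band` parameter, and the label strings below are all assumptions made for illustration, not the rule actually used to build the dataset.

```python
def polarity_to_division(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a discrete sentiment label.

    Assumption: scores above +neutral_band are positive, scores below
    -neutral_band are negative, and everything in between is neutral.
    The dataset does not state its real thresholds.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Example: a wider neutral band treats weak scores as neutral.
for score in (0.8, 0.05, 0.0, -0.4):
    print(score, polarity_to_division(score, neutral_band=0.1))
```

Comparing the labels produced by a rule like this against the shipped Division values is one way to infer the thresholds that were actually applied.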
Distribution
The dataset is provided in tabular form, typically as a CSV file, and is preprocessed for immediate use. No single total row count is stated. The value counts listed for the 'division' column sum to approximately 4156 records, and those for the 'polarity' column, which span positive, neutral, and negative ranges, sum to approximately 4084 records.
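Because the per-column counts differ, it may be worth checking the row totals after download. The following is a rough sketch using only the Python standard library. The header names are assumed to match the column list above exactly (the real file may use different capitalisation), and the sample rows are invented stand-ins, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for the CSV file; these rows are invented.
SAMPLE_CSV = """Index,Review,Stemmed and Lemmatised review using nltk,Polarity,Division
0,Great sound quality,great sound qualiti,0.8,positive
1,Stopped working in a week,stop work week,-0.4,negative
2,It is a speaker,speaker,0.0,neutral
3,Love it,love,,positive
"""


def summarise(handle):
    """Return (total rows, rows with a Polarity value, Counter of Division labels)."""
    total = 0
    with_polarity = 0
    divisions = Counter()
    for row in csv.DictReader(handle):
        total += 1
        # Count only rows whose Polarity cell is non-empty.
        if (row.get("Polarity") or "").strip():
            with_polarity += 1
        label = (row.get("Division") or "").strip()
        if label:
            divisions[label] += 1
    return total, with_polarity, divisions


total, with_polarity, divisions = summarise(io.StringIO(SAMPLE_CSV))
print(total, with_polarity, dict(divisions))
```

To run this against the real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object, where `path` is wherever the CSV was saved.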
Usage
This dataset is ideal for a variety of applications, including:
- Training and testing sentiment classification models.
- Developing and evaluating Natural Language Processing (NLP) algorithms.
- Conducting sentiment analysis on product reviews.
- Academic research in text analytics and machine learning.
- Building applications that require pre-classified text data.
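For the training and testing use case, a stratified split keeps the share of each sentiment label similar in both parts. The sketch below uses only the standard library. The function name `stratified_split`, the `Division` key, the 80/20 ratio, and the example rows are assumptions for illustration rather than anything prescribed by the dataset.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="Division", test_fraction=0.2, seed=0):
    """Split rows into (train, test), sampling each label group separately."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy so the caller's rows are not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Invented example rows: 10 positive and 5 negative reviews.
rows = [{"Review": f"review {i}", "Division": "positive"} for i in range(10)]
rows += [{"Review": f"review {i}", "Division": "negative"} for i in range(10, 15)]

train, test = stratified_split(rows)
print(len(train), len(test))
```

Libraries such as scikit-learn offer equivalent functionality; the hand-written version is shown only to make the idea explicit without extra dependencies.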
Coverage
The reviews come primarily from amazon.in, Amazon's Indian marketplace, although the listing gives the dataset's scope as global. The listing date for the dataset is 16/06/2025. Neither a time range for the reviews themselves nor demographic details are provided.
License
CC0
Who Can Use It
This dataset is suitable for:
- Data scientists and machine learning engineers looking for preprocessed text data to train classification models.
- Researchers and academics in NLP, sentiment analysis, and text mining.
- Students learning about text preprocessing and sentiment modelling.
- Developers building applications that require sentiment understanding from review data.
Dataset Name Suggestions
- Preprocessed Amazon Review Sentiment
- Gen3EcoDot Sentiment Analysis Dataset
- E-commerce Product Review Polarity
- NLTK TextBlob Sentiment Data
Attributes
Original Data Source: Preprocessed Dataset Sentiment Analysis