Rotten Tomatoes Binary Sentiment Dataset
Free
About
Sifting through thousands of movie reviews from Rotten Tomatoes reveals the nuanced ways in which reviewers express sentiment. This collection contains 10,662 processed sentences, evenly balanced between 5,331 positive and 5,331 negative critiques. Originally utilised in foundational academic research by Bo Pang and Lillian Lee in 2005, it is a standard benchmark for evaluating how well computational models distinguish praise from criticism. Its structured simplicity makes it a staple for anyone exploring text classification or natural language processing.
Columns
- text: The actual content of the movie review, provided as a string of text.
- label: A binary classification marker indicating whether the review is positive or negative.
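A minimal sketch for reading one split with Python's standard `csv` module. It assumes each file has a header row containing the `text` and `label` columns listed above and that labels are stored as the integers `0` and `1`; which value means "positive" is an assumption to check against the data, since it is not stated here. The helper name `load_split` is hypothetical.

```python
import csv
from pathlib import Path
from typing import Union


def load_split(path: Union[str, Path]) -> list[tuple[str, int]]:
    """Read one split file into (text, label) pairs, rejecting malformed records.

    Assumes a header row with 'text' and 'label' columns and labels encoded as 0/1.
    """
    rows: list[tuple[str, int]] = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the expected columns are absent from the header.
        missing = {"text", "label"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        for record_no, row in enumerate(reader, start=1):
            # Short rows yield None for absent fields; treat that as empty text.
            text = (row["text"] or "").strip()
            try:
                label = int(row["label"])
            except (TypeError, ValueError):
                raise ValueError(f"record {record_no}: label is not an integer") from None
            if not text or label not in (0, 1):
                raise ValueError(f"record {record_no}: empty text or non-binary label")
            rows.append((text, label))
    return rows
```

Usage would look like `train = load_split("train.csv")`. Raising on the first bad record, rather than silently skipping it, makes any deviation from the stated schema visible immediately.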
Distribution
The records are organised into three comma-separated values (CSV) files: train.csv, test.csv, and validation.csv. The test file, for example, is approximately 127.72 kB and contains 1,066 unique records. All fields are 100% valid, with no missing or mismatched values. The dataset has a usability score of 10.00 and is archived as a static resource with no further updates planned.
Usage
This resource is ideal for training and testing machine learning algorithms to identify sentiment in film critiques. It is well suited to categorising reviews by emotional tone or clustering similar critiques to identify common thematic threads. Researchers can also use it to compare the performance of text-processing libraries such as NLTK, TextBlob, and scikit-learn against the human-assigned labels.
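To illustrate the train-and-evaluate workflow without depending on any particular library, here is a from-scratch multinomial Naive Bayes baseline with add-one (Laplace) smoothing. It is a stand-in for, not a reproduction of, the NLTK, TextBlob, or scikit-learn classifiers mentioned above. The names `tokenize`, `train_nb`, `predict_nb`, and `accuracy` are hypothetical, and the simple regex tokenizer is an assumption chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def train_nb(examples: list[tuple[str, int]]):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    doc_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in doc_counts}
    for text, label in examples:
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    # Log class priors: the fraction of training documents in each class.
    log_prior = {c: math.log(n / total_docs) for c, n in doc_counts.items()}
    log_likelihood = {}
    for c, counts in word_counts.items():
        # Add-one smoothing so no vocabulary word has zero probability in a class.
        denom = sum(counts.values()) + len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
    return log_prior, log_likelihood


def predict_nb(model, text: str) -> int:
    """Return the label with the highest posterior log-score for one text."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped rather than scored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokenize(text) if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


def accuracy(model, examples: list[tuple[str, int]]) -> float:
    correct = sum(predict_nb(model, text) == label for text, label in examples)
    return correct / len(examples)


train = [
    ("a wonderful, moving film", 1),
    ("brilliant and funny", 1),
    ("a dull, tedious mess", 0),
    ("boring and flat", 0),
]
model = train_nb(train)
print(predict_nb(model, "wonderful and brilliant"))
```

On the real files you would fit on train.csv, tune any choices on validation.csv, and report `accuracy` once on test.csv so the test split stays untouched.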
Coverage
The scope is limited to film criticism sourced from the Rotten Tomatoes platform. Temporally, the sentences come from the data first utilised in Pang and Lee's 2005 study, so they reflect movie reviewing from that period rather than current opinion. The reviews are short snippets written by critics, labelled as positive or negative and sampled in equal numbers to give a balanced class distribution.
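Because the classes are balanced, a model that always predicts the same label scores 50% accuracy, which is the floor any real classifier should beat. A quick sanity check, assuming labels have already been loaded as a list of integers; `class_balance` and `majority_baseline` are hypothetical helpers.

```python
from collections import Counter


def class_balance(labels: list[int]) -> dict[int, float]:
    """Return each label's share of the total, sorted by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


def majority_baseline(labels: list[int]) -> float:
    """Accuracy of always predicting the most frequent label."""
    return max(Counter(labels).values()) / len(labels)


# Using the class counts stated on this page: 5,331 of each label.
labels = [1] * 5331 + [0] * 5331
print(len(labels), class_balance(labels), majority_baseline(labels))
# prints: 10662 {0: 0.5, 1: 0.5} 0.5
```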
License
CC0: Public Domain
Who Can Use It
Data scientists can leverage these records to build and refine sentiment analysis classifiers using algorithms like random forests or support vector machines. Academic researchers may utilise the balanced labels to benchmark the accuracy of new natural language processing techniques. Additionally, students can use the clean, structured format to practice fundamental text pre-processing tasks such as tokenization, lemmatization, and part-of-speech tagging.
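A dependency-free sketch of the first of those pre-processing steps: lowercasing, tokenization, and optional stop-word removal. The regular expression and the short stop-word list are illustrative assumptions; lemmatization and part-of-speech tagging usually come from a library such as NLTK (`WordNetLemmatizer`, `nltk.pos_tag`) and are not reimplemented here. The function name `preprocess` is hypothetical.

```python
import re

# Illustrative stop-word list only. Negations such as "not" are deliberately
# left out because they often flip the sentiment of a sentence.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "is", "it", "this", "that"})

# Alphanumeric runs, optionally followed by one apostrophe suffix (e.g. "isn't", "film's").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def preprocess(text: str, drop_stop_words: bool = True) -> list[str]:
    """Lowercase, tokenize and optionally remove stop words from one review sentence."""
    tokens = TOKEN_RE.findall(text.lower())
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


print(preprocess("It's a charming, if slightly overlong, film that isn't afraid to be silly."))
```

Keeping contractions like "isn't" as single tokens preserves the negation that a naive split on apostrophes would break apart.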
Dataset Name Suggestions
- Rotten Tomatoes Movie Review Sentiment Archive
- Cinematic Sentiment Analysis: Positive and Negative Review Log
- The Pang and Lee Movie Review Sentiment Collection
- Benchmark Film Reviews for Text Classification
- Rotten Tomatoes Binary Sentiment Dataset
Attributes
Original Data Source: Rotten Tomatoes Binary Sentiment Dataset
