Film Sentiment Classification Dataset
Entertainment & Media Consumption
Free
About
This dataset supports sentiment analysis of movie reviews drawn from sources such as Rotten Tomatoes and IMDB. It is intended for training and evaluating machine learning models that predict the sentiment or rating of a review. The data is distributed as tab-separated files with a train and test split. Sentences have been parsed into phrases; each phrase has a unique PhraseId and a SentenceId that identifies the sentence it came from. The training data carries sentiment labels, either on a five-point scale from 0 (negative) to 4 (positive) or, for some samples, as a binary label of 0 (negative) or 1 (positive). The dataset suits tasks that involve understanding, cleaning, and classifying text to infer sentiment.
Columns
The dataset typically includes the following columns:
- PhraseId: A unique identifier for each parsed phrase.
- SentenceId: An identifier linking phrases back to their original sentence.
- text: The actual movie review phrase or a full movie review.
- label: The associated sentiment label. This can be one of five values (0: negative, 1: somewhat negative, 2: neutral, 3: somewhat positive, 4: positive) or a binary sentiment (0: negative, 1: positive).
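As a rough illustration of how these columns might be read, the sketch below parses a tab-separated sample with Python's standard csv module and checks that each label falls on the five-point scale. The sample rows and the function name read_phrases are made up for the example, and the exact header spelling in the real files is an assumption based on the column list above.

```python
import csv
import io

# Made-up sample rows for illustration only. A real file would be opened with
# something like open("train.tsv", newline="", encoding="utf-8"); that file
# name is an assumption, not something the listing states.
SAMPLE_TSV = (
    "PhraseId\tSentenceId\ttext\tlabel\n"
    "1\t1\tA charming and often affecting journey\t4\n"
    "2\t1\tcharming\t3\n"
    "3\t2\tThe plot never comes together\t1\n"
)

# Five-point scale as described in the column list.
FIVE_POINT = {
    0: "negative",
    1: "somewhat negative",
    2: "neutral",
    3: "somewhat positive",
    4: "positive",
}


def read_phrases(handle):
    """Parse tab-separated rows into dicts, validating each label."""
    # QUOTE_NONE keeps quotation marks inside review text as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        label = int(record["label"])
        if label not in FIVE_POINT:
            raise ValueError(f"unexpected label {label!r}")
        rows.append(
            {
                "phrase_id": int(record["PhraseId"]),
                "sentence_id": int(record["SentenceId"]),
                "text": record["text"],
                "label": label,
                "label_name": FIVE_POINT[label],
            }
        )
    return rows


rows = read_phrases(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["phrase_id"], row["sentence_id"], row["label_name"], row["text"])
```

For the binary-labelled part, the same reader would work if FIVE_POINT were swapped for a two-entry mapping (0: negative, 1: positive).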
Distribution
The dataset is provided as tab-separated files with a train and test split for benchmarking. One part of the dataset contains 40,000 training samples with binary sentiment labels: approximately 20,019 are labelled negative (0) and 19,981 are labelled positive (1), a near-even split. Beyond this training sample count, the total number of rows across all files is not stated.
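The class counts quoted above imply an almost perfectly balanced binary problem. A small sketch of the arithmetic, using only the two figures from the listing, shows each class share and the accuracy a trivial "always predict the majority class" baseline would reach on data with the same balance. The variable names are illustrative.

```python
from collections import Counter

# Counts taken from the listing: label 0 is negative, label 1 is positive.
counts = Counter({0: 20_019, 1: 19_981})

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the most frequent label is right
# exactly as often as that label occurs.
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / total

print(f"total samples: {total}")
for label in sorted(shares):
    print(f"label {label}: {shares[label]:.4%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4%}")
```

Because the split is so close to even, plain accuracy is a reasonable headline metric for the binary part, and any useful model should land well above the roughly 50% baseline.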
Usage
This dataset is ideally suited for:
- Understanding and performing initial cleanup of textual data.
- Building classification models to predict movie review sentiment or ratings.
- Comparing the evaluation metrics of various classification algorithms, particularly in Natural Language Processing (NLP).
- Benchmarking machine learning models for sentiment analysis tasks.
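To make the comparison of evaluation metrics concrete, here is a minimal sketch that scores two classifiers on the same held-out set: a majority-class baseline and a small multinomial Naive Bayes with add-one smoothing, written with the standard library only. The toy reviews are invented for the example and are not taken from the dataset, and all function names are illustrative; a real benchmark would use the provided train and test files and, most likely, an established library.

```python
import math
from collections import Counter, defaultdict

# Invented toy reviews with binary labels (0: negative, 1: positive).
TRAIN = [
    ("a wonderful moving film", 1),
    ("great acting and a great story", 1),
    ("wonderful cast great fun", 1),
    ("a dull and boring film", 0),
    ("terrible acting boring story", 0),
    ("dull plot terrible pacing", 0),
]
TEST = [
    ("great fun wonderful story", 1),
    ("boring and terrible", 0),
    ("a moving story", 1),
    ("dull pacing", 0),
]


def train_naive_bayes(samples):
    """Count classes and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, text):
    """Return the label with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred):
    """F1 score for the positive class (label 1); 0.0 if no true positives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [label for _, label in TEST]
model = train_naive_bayes(TRAIN)
label_counts = model[0]
# Ties between equally frequent labels are broken toward the smaller label.
majority = max(sorted(label_counts), key=label_counts.get)

predictions = {
    "majority": [majority for _ in TEST],
    "naive_bayes": [predict_naive_bayes(model, text) for text, _ in TEST],
}
results = {
    name: {"accuracy": accuracy(y_true, y_pred), "f1": f1_positive(y_true, y_pred)}
    for name, y_pred in predictions.items()
}
for name, metrics in results.items():
    print(name, metrics)
```

Reporting F1 alongside accuracy shows why a single metric can mislead: the baseline can reach a respectable accuracy on balanced data while never identifying a positive review.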
Coverage
The dataset has global coverage, meaning it is not restricted to any specific geographic region. Details on the exact time range or demographic scope of the movie reviews are not specified within the sources. The dataset was listed on 16/06/2025.
License
CC0
Who Can Use It
This dataset is particularly useful for:
- Machine learning practitioners and data scientists focusing on text analysis.
- Individuals interested in Natural Language Processing (NLP) and sentiment analysis.
- Beginners in the field of AI and machine learning looking for a straightforward classification problem.
- Researchers aiming to develop and test algorithms for movie review sentiment prediction.
Dataset Name Suggestions
- IMDB Movie Sentiment Analysis
- Movie Review Sentiment Ratings
- Film Sentiment Classification Dataset
- Rotten Tomatoes IMDB Sentiment Data
Attributes
Original Data Source: IMDB Movie Ratings Sentiment Analysis