Film Sentiment Classification Dataset

FREE DATASET LIBRARY

Verified Data Provider

£0

Entertainment & Media Consumption

Tags and Keywords

Movies

Classification

Beginner

Binary

Nlp

Sentiment

Free

About

This dataset supports sentiment analysis of movie reviews drawn from sources such as Rotten Tomatoes and IMDB. It is intended for training and evaluating machine learning models that predict movie review sentiment or ratings. The data is provided as tab-separated files with a train and test split. Sentences have been parsed into phrases, each with a unique PhraseId and the SentenceId of the sentence it came from. The training data includes sentiment labels, either on a five-point scale from 0 (negative) to 4 (positive) or, for some samples, as binary values 0 (negative) and 1 (positive). The dataset suits tasks that involve understanding, cleaning, and classifying text to infer sentiment.
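As a rough illustration of the cleanup step mentioned above, the sketch below normalises a single review string. It assumes, without confirmation from the listing, that some reviews contain HTML remnants such as line-break tags and escaped entities. The function name clean_review is hypothetical.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an HTML tag
WS_RE = re.compile(r"\s+")        # runs of whitespace

def clean_review(text):
    """Strip tag-like markup, decode entities, collapse whitespace, lowercase."""
    text = TAG_RE.sub(" ", text)      # replace tags with a space
    text = html.unescape(text)        # e.g. "&amp;" becomes "&"
    text = WS_RE.sub(" ", text)       # collapse repeated whitespace
    return text.strip().lower()

# Made-up example string, not a row from the dataset.
print(clean_review("A  <br /><br />GREAT film &amp; cast!"))
```

Whether lowercasing helps depends on the model, so treat each step as optional.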

Columns

The dataset typically includes the following columns:

PhraseId: A unique identifier for each parsed phrase.
SentenceId: An identifier linking phrases back to their original sentence.
text: The actual movie review phrase or a full movie review.
label: The associated sentiment label. This can be one of five values (0: negative, 1: somewhat negative, 2: neutral, 3: somewhat positive, 4: positive) or a binary sentiment (0: negative, 1: positive).
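A minimal sketch of reading a file with these columns, using only the Python standard library. The column names follow the list above. The sample rows are invented for illustration, and the assumption that quote characters in the text should be treated literally is not confirmed by the listing.

```python
import csv
import io

EXPECTED_COLUMNS = ("PhraseId", "SentenceId", "text", "label")

def read_reviews(handle, delimiter="\t"):
    """Parse a delimited review file into a list of typed dicts."""
    # QUOTE_NONE treats quote characters in the text as ordinary characters
    # (an assumption about the file format).
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "PhraseId": int(row["PhraseId"]),
            "SentenceId": int(row["SentenceId"]),
            "text": row["text"],
            "label": int(row["label"]),
        })
    return rows

# Invented sample standing in for the real training file.
SAMPLE_TSV = (
    "PhraseId\tSentenceId\ttext\tlabel\n"
    "1\t1\ta gripping and heartfelt drama\t4\n"
    "2\t1\tgripping\t3\n"
    "3\t2\ta dull , lifeless script\t0\n"
)

rows = read_reviews(io.StringIO(SAMPLE_TSV))
print(len(rows), rows[0])
```

With a real file, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded training file was saved. A test file without a label column would need the label check relaxed.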

Distribution

The dataset is provided in tab-separated files and includes a train and test split for benchmarking purposes. One part of the dataset contains 40,000 training samples. In this binary sentiment portion, 20,019 samples are labelled negative (0) and 19,981 are labelled positive (1), so the two classes are almost evenly balanced. The total number of rows across all files is not stated beyond this training sample count.
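The class balance can be checked with a few lines of Python. The sketch below uses the counts quoted in this section rather than the real file. The helper name label_shares is hypothetical.

```python
from collections import Counter

def label_shares(labels):
    """Return the fraction of samples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}

# Counts quoted in the listing for the binary training portion.
stated = Counter({0: 20019, 1: 19981})
shares = label_shares(stated.elements())

for label, share in shares.items():
    print(f"label {label}: {share:.2%}")

# Accuracy of always predicting the most common label.
majority_baseline = max(shares.values())
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

Because the split is so close to even, a classifier that always predicts one label scores only about 50% accuracy, which makes accuracy a reasonable first metric for the binary task.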

Usage

This dataset is ideally suited for:

Understanding and performing initial cleanup of textual data.
Building classification models to predict movie review sentiment or ratings.
Comparing the evaluation metrics of various classification algorithms, particularly in Natural Language Processing (NLP).
Benchmarking machine learning models for sentiment analysis tasks.
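To make the metric comparison concrete, here is a small sketch that computes accuracy, precision, recall and F1 for binary labels in plain Python. The two prediction lists are invented stand-ins for the output of two hypothetical classifiers and are not results from this dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Invented labels and predictions for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 1]  # one false negative, one false positive
model_b = [1, 1, 0, 0, 0, 0, 0, 0]  # two false negatives, no false positives

metrics_a = binary_metrics(y_true, model_a)
metrics_b = binary_metrics(y_true, model_b)
print("A:", metrics_a)
print("B:", metrics_b)
```

In this toy case both models reach the same accuracy, but model B trades recall for precision and ends up with a lower F1. Differences like that are what a side-by-side metric comparison is meant to surface.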

Coverage

The dataset is listed with global coverage, meaning it is not restricted to any specific geographic region. The time range and demographic scope of the movie reviews are not specified. The dataset was listed on 16/06/2025.

License

CC0

Who Can Use It

This dataset is particularly useful for:

Machine learning practitioners and data scientists focusing on text analysis.
Individuals interested in Natural Language Processing (NLP) and sentiment analysis.
Beginners in the field of AI and machine learning looking for a straightforward classification problem.
Researchers aiming to develop and test algorithms for movie review sentiment prediction.

Dataset Name Suggestions

IMDB Movie Sentiment Analysis
Movie Review Sentiment Ratings
Film Sentiment Classification Dataset
Rotten Tomatoes IMDB Sentiment Data

Attributes

Original Data Source: IMDB Movie Ratings Sentiment Analysis

Listing Stats

Listed: 16/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
