Film Sentiment Classification Dataset

Entertainment & Media Consumption

Tags and Keywords

Movies

Classification

Beginner

Binary

NLP

Sentiment

Free

About

This dataset supports sentiment analysis of movie reviews drawn from sources such as Rotten Tomatoes and IMDB. It is intended for training and evaluating machine learning models that predict the sentiment or rating expressed in a review. The data comes as tab-separated files with a train and test split. Sentences have been parsed into phrases, each with a unique PhraseId and the SentenceId of the sentence it came from. The training data carries sentiment labels, either on a five-point scale from 0 (negative) to 4 (positive) or, for some samples, as binary labels of 0 (negative) and 1 (positive). The dataset suits tasks that involve understanding, cleaning, and classifying text to infer sentiment.

Columns

The dataset typically includes the following columns:
  • PhraseId: A unique identifier for each parsed phrase.
  • SentenceId: An identifier linking phrases back to their original sentence.
  • text: The actual movie review phrase or a full movie review.
  • label: The associated sentiment label. This can be one of five values (0: negative, 1: somewhat negative, 2: neutral, 3: somewhat positive, 4: positive) or a binary sentiment (0: negative, 1: positive).
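The column layout above can be read with Python's standard csv module. The following is a minimal sketch, assuming the files carry a header row with exactly the column names listed; the sample rows and the helper name read_phrases are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample that mimics the described layout; the real files may differ.
SAMPLE_TSV = (
    "PhraseId\tSentenceId\ttext\tlabel\n"
    "1\t1\tA charming and often affecting journey\t4\n"
    "2\t1\tcharming\t3\n"
    "3\t2\tdull and lifeless\t0\n"
)

def read_phrases(handle):
    """Parse tab-separated rows into dicts with integer ids and labels."""
    # QUOTE_NONE keeps quotation marks inside review text as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        rows.append({
            "PhraseId": int(row["PhraseId"]),
            "SentenceId": int(row["SentenceId"]),
            "text": row["text"],
            "label": int(row["label"]),
        })
    return rows

rows = read_phrases(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["PhraseId"], row["SentenceId"], row["label"], row["text"])
```

To read a downloaded file instead, pass an open file handle, for example open(path, newline="", encoding="utf-8"), where path is wherever the file was saved.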

Distribution

The dataset is provided as tab-separated files with a train and test split for benchmarking. One part of the dataset contains 40,000 training samples with binary sentiment labels: 20,019 are labelled negative (0) and 19,981 are labelled positive (1). The total number of rows across all files is not stated beyond this training sample count.
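As a quick check on the class balance quoted above, the sketch below rebuilds a label list from the two stated counts and computes each class's share. The function name label_distribution is an illustrative choice; with real data the labels would come from the label column rather than being reconstructed.

```python
from collections import Counter

def label_distribution(labels):
    """Return {label: (count, share_of_total)} sorted by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}

# Labels reconstructed from the counts stated in the listing, not read from the files.
labels = [0] * 20019 + [1] * 19981

dist = label_distribution(labels)
for label, (count, share) in dist.items():
    print(f"label {label}: {count} samples ({share:.4%})")

# Accuracy of always predicting the most frequent class; a model should beat this.
majority_baseline = max(share for _, share in dist.values())
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
```

Because the two classes are almost evenly split, plain accuracy is a reasonable first metric for the binary task.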

Usage

This dataset is ideally suited for:
  • Understanding and performing initial cleanup of textual data.
  • Building classification models to predict movie review sentiment or ratings.
  • Comparing the evaluation metrics of various classification algorithms, particularly in Natural Language Processing (NLP).
  • Benchmarking machine learning models for sentiment analysis tasks.
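One way to start on the classification and benchmarking uses listed above is a small bag-of-words baseline. The sketch below implements multinomial Naive Bayes with add-one smoothing using only the standard library. It is an illustrative baseline, not a method prescribed by the dataset, and the training and test phrases are invented stand-ins for the text and label columns.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real data may need more cleanup."""
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)

# Invented toy phrases; 1 = positive, 0 = negative.
train_texts = [
    "a wonderful moving film",
    "great acting and a great story",
    "boring and badly written",
    "a dull lifeless mess",
]
train_labels = [1, 1, 0, 0]
test_texts = ["a great film", "dull and boring"]
test_labels = [1, 0]

model = NaiveBayes().fit(train_texts, train_labels)
acc = accuracy(model, test_texts, test_labels)
print(f"toy test accuracy: {acc:.2f}")
```

The same fit and accuracy calls work unchanged on the five-class labels, which makes it easy to compare this baseline against other classifiers on either labelling scheme.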

Coverage

The dataset has global coverage and is not restricted to any geographic region. The time range and demographic scope of the movie reviews are not specified. The dataset was listed on 16/06/2025.

License

CC0

Who Can Use It

This dataset is particularly useful for:
  • Machine learning practitioners and data scientists focusing on text analysis.
  • Individuals interested in Natural Language Processing (NLP) and sentiment analysis.
  • Beginners in the field of AI and machine learning looking for a straightforward classification problem.
  • Researchers aiming to develop and test algorithms for movie review sentiment prediction.

Dataset Name Suggestions

  • IMDB Movie Sentiment Analysis
  • Movie Review Sentiment Ratings
  • Film Sentiment Classification Dataset
  • Rotten Tomatoes IMDB Sentiment Data

Attributes

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

16/06/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0