IMDB Movie Reviews (Binary Sentiment)
Entertainment & Media Consumption
Source
Hugging Face Hub: link
About this dataset
This is a large dataset for binary sentiment classification, containing substantially more data than previous benchmark datasets. It provides 25,000 highly polar movie reviews for training and 25,000 for testing, along with additional unlabeled data. The data fields are consistent across all splits of the dataset.
How to use the dataset
To use this dataset, first download the IMDB Large Movie Review Dataset. You can then either use it in its original form, with the provided training and testing sets, or prepare your own split of review text. For the latter, create a new file called unsupervised.csv and copy the text column from train.csv into it. You can then split unsupervised.csv into two files: train_unsupervised.csv and test_unsupervised.csv.
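The splitting step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not an official procedure: it assumes train.csv has a header row with a column named `text`, and the function name `split_unsupervised`, the 80/20 default fraction, and the fixed seed are all illustrative choices that do not come from the dataset itself.

```python
import csv
import random
from pathlib import Path


def split_unsupervised(train_csv, out_dir, test_fraction=0.2, seed=0):
    """Copy the text column of train_csv, then split it into two CSV files.

    Hypothetical helper: the "text" column name and the default
    test_fraction are assumptions, not part of the dataset description.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Read only the "text" column; any label column is ignored.
    with open(train_csv, newline="", encoding="utf-8") as f:
        texts = [row["text"] for row in csv.DictReader(f)]

    def write(name, rows):
        # Write a single-column CSV with a "text" header.
        with open(out_dir / name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["text"])
            writer.writerows([t] for t in rows)

    write("unsupervised.csv", texts)

    # Shuffle a copy reproducibly, then cut it at the requested fraction.
    shuffled = texts[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    write("test_unsupervised.csv", shuffled[:n_test])
    write("train_unsupervised.csv", shuffled[n_test:])
    return len(shuffled) - n_test, n_test
```

Calling `split_unsupervised("train.csv", "splits")` would write the three files into a `splits` directory and return the number of rows in each half. Shuffling before the cut avoids a split that simply follows the file's original ordering.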
Once you have either the original dataset or the training and testing sets, you can begin using them for binary sentiment classification. This requires a machine learning algorithm capable of binary classification, such as logistic regression or a support vector machine. After training your model on the training set, evaluate its performance on the test set by comparing its predictions with the true labels. The model can also predict labels for the reviews in test_unsupervised.csv, but because that file contains only the text column, scoring those predictions requires the matching labels.
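To make the train-then-evaluate loop concrete, here is a small bag-of-words logistic regression written with only the Python standard library. It is a sketch under stated assumptions rather than a recommended implementation: labels are assumed to be 0 (negative) and 1 (positive), tokenization is a plain whitespace split, and the function names, learning rate, and epoch count are illustrative. In practice a library implementation, such as scikit-learn's `LogisticRegression`, would be the usual choice.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train_logreg(texts, labels, epochs=20, lr=0.1):
    """Fit bag-of-words logistic regression with plain SGD.

    Assumes labels are 0 or 1. Returns (weights, bias).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            tokens = tokenize(text)
            z = bias + sum(weights[t] for t in tokens)
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(label == 1)
            grad = p - y  # gradient of the log loss with respect to z
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return weights, bias


def predict(model, text):
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if z >= 0 else 0


def accuracy(model, texts, labels):
    # Fraction of reviews whose predicted label matches the true label.
    correct = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)
```

With the labeled splits loaded into lists, usage would look like `model = train_logreg(train_texts, train_labels)` followed by `accuracy(model, test_texts, test_labels)`, where the four list names are placeholders for your own data.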
Research Ideas
This dataset can be used to train a binary sentiment classification model.
This dataset can be used to train a model to classify movie reviews into positive and negative sentiment categories.
This dataset can be used to build a large movie review database for research purposes.
License
CC0
Original Data Source: IMDB Movie Reviews (Binary Sentiment)