Popular Movies Recommender Dataset
News & Media Articles
Free
About
This dataset contains roughly 9,800 movies, ordered primarily by popularity, and is designed for building recommender systems. It is a good resource for learners starting out in Data Science and Machine Learning, particularly those who want to apply Natural Language Processing (NLP) and Machine Learning models to real-world data. It offers a clean, structured set of movie attributes suitable for a range of analytical and model-building tasks.
Columns
- Release_Date: The date when the movie was released. Dates range from 17th April 1902 to 3rd July 2024.
- Title: The official name of the movie. There are over 9,500 unique movie titles within the dataset.
- Overview: A brief summary or synopsis of the movie's plot.
- Popularity: A key metric calculated by TMDB developers, reflecting the movie's appeal based on daily views, votes, and user engagements like "favourite" and "watchlist" additions. Values range from 7.1 to 5.08k.
- Vote_Count: The total number of votes or ratings received from viewers. Counts range from 0 to over 31k.
- Vote_Average: The average rating given by viewers, scored out of 10. The average rating is 6.44, with scores ranging from 0 to 10.
- Original_Language: The original language in which the movie was produced. Dubbed versions are not considered. English (en) is the most common language, accounting for 77% of entries.
- Genre: The categories or classifications applicable to the movie.
- Poster_Url: The URL link to the movie's poster image.
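The columns above can be read into typed records with the Python standard library alone. The following is a minimal sketch, not a reference loader: the function name load_movies, the output keys, the ISO-style date format (YYYY-MM-DD), and the comma-separated Genre values are all assumptions that should be checked against the actual file. Values that fail to parse are kept as None rather than dropped.

```python
import csv
from datetime import datetime


def _to_float(text):
    """Return text as a float, or None if it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def _to_date(text, fmt="%Y-%m-%d"):
    """Return text as a date, or None if it does not match fmt.

    Assumption: dates are stored as YYYY-MM-DD; change fmt if they are not.
    """
    try:
        return datetime.strptime(text, fmt).date()
    except (TypeError, ValueError):
        return None


def load_movies(lines):
    """Parse CSV lines (an open file or any iterable of strings) into dicts."""
    movies = []
    for raw in csv.DictReader(lines):
        genre_text = raw.get("Genre") or ""
        movies.append({
            "title": raw.get("Title"),
            "overview": raw.get("Overview") or "",
            "release_date": _to_date(raw.get("Release_Date")),
            "popularity": _to_float(raw.get("Popularity")),
            "vote_count": _to_float(raw.get("Vote_Count")),
            "vote_average": _to_float(raw.get("Vote_Average")),
            "language": raw.get("Original_Language"),
            # Assumption: multiple genres are separated by commas.
            "genres": [g.strip() for g in genre_text.split(",") if g.strip()],
            "poster_url": raw.get("Poster_Url"),
        })
    return movies


# Typical use (file name taken from the Distribution section):
# with open("mymoviedb.csv", newline="", encoding="utf-8") as handle:
#     movies = load_movies(handle)
```

Keeping unparseable values as None makes it easy to count valid entries per column afterwards instead of silently losing rows.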
Distribution
The dataset is provided as a single CSV file, mymoviedb.csv, with a size of 4.21 MB. It contains 9 columns and 9,827 to 9,828 records; the count varies slightly by column because not every column holds a valid value in every row.
Usage
This dataset is ideal for:
- Developing movie recommender systems using various machine learning algorithms.
- Practising Natural Language Processing (NLP) techniques on movie summaries and titles.
- Exploring and learning Machine Learning models for data prediction and classification.
- Educational projects and exercises in Data Science.
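As an illustration of the first two uses, the sketch below ranks movies by the similarity of their Overview text using TF-IDF weights and cosine similarity. This is one simple content-based approach chosen for illustration, not a method prescribed by the dataset; the function names (tokenize, tfidf_vectors, cosine, recommend) and the smoothed IDF formula are assumptions of this example. It uses only the standard library so it runs anywhere.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one L2-normalised {term: weight} dict per document."""
    token_lists = [tokenize(doc) for doc in documents]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(documents)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF: terms found in every document keep a small weight.
        vec = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def recommend(titles, overviews, query_title, top_n=3):
    """Return the top_n titles whose overviews best match query_title's."""
    vectors = tfidf_vectors(overviews)
    idx = titles.index(query_title)  # raises ValueError for unknown titles
    scored = [
        (cosine(vectors[idx], vec), title)
        for i, (vec, title) in enumerate(zip(vectors, titles))
        if i != idx
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]
```

Here titles and overviews are parallel lists taken from the Title and Overview columns. For larger experiments, a library implementation such as scikit-learn's TfidfVectorizer would likely be faster, and Popularity or Vote_Average could be used to break ties between similar candidates.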
Coverage
The dataset covers movies released between 17th April 1902 and 3rd July 2024. It has no explicit geographic scope; the data is sourced from TMDB, a global movie database. English is the most common original language at 77% of entries, followed by Japanese at 7%, with all other languages making up the remaining 16%.
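The language breakdown quoted above can be checked against the Original_Language column with a few lines of Python. This is a small sketch; the function name language_shares is hypothetical, and codes stands for any iterable of language codes read from that column.

```python
from collections import Counter


def language_shares(codes):
    """Map each language code to its percentage of the non-empty entries."""
    counts = Counter(code for code in codes if code)  # skip None and ""
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from the largest share to the smallest.
    return {
        code: round(100.0 * n / total, 1)
        for code, n in counts.most_common()
    }
```

For example, language_shares(["en", "en", "en", "ja"]) returns {"en": 75.0, "ja": 25.0}.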
License
CC0: Public Domain
Who Can Use It
This dataset is primarily intended for learners and aspiring professionals who wish to gain practical experience in Data Science and Machine Learning by building and experimenting with recommender systems. It is also suitable for researchers and developers interested in movie data analysis.
Dataset Name Suggestions
- Popular Movies Recommender Dataset
- TMDB Movie Popularity Data
- Cinema Insights Dataset
- Movie Recommender Systems Learning Data
- Global Movie Popularity Index
Attribution
Original Data Source: Popular Movies Recommender Dataset