Opendatabay APP

Popular Movies Recommender Dataset

News & Media Articles

Tags and Keywords

Movie, Dataset, Recommender, Machine, NLP

Free

About

This dataset lists over 9,000 movies, sorted primarily by popularity, and is designed for building recommender systems. It suits learners in Data Science and Machine Learning, particularly those who want to apply Natural Language Processing (NLP) and Machine Learning models to real-world data. Its movie attributes are clearly structured and support a range of analysis and model-building tasks.

Columns

  • Release_Date: The date when the movie was released. Dates range from 17th April 1902 to 3rd July 2024.
  • Title: The official name of the movie. There are over 9,500 unique movie titles within the dataset.
  • Overview: A brief summary or synopsis of the movie's plot.
  • Popularity: A metric calculated by TMDB developers that reflects the movie's appeal based on daily views, votes, and user engagement such as "favourite" and "watchlist" additions. Values range from 7.1 to about 5,080.
  • Vote_Count: The total number of votes or ratings received from viewers. Counts range from 0 to over 31,000.
  • Vote_Average: The average rating given by viewers, scored out of 10. The average rating is 6.44, with scores ranging from 0 to 10.
  • Original_Language: The original language in which the movie was produced. Dubbed versions are not considered. English (en) is the most common language, accounting for 77% of entries.
  • Genre: The categories or classifications applicable to the movie.
  • Poster_Url: The URL link to the movie's poster image.
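The column list above can be mapped onto a typed record when reading the file. The following is a minimal sketch using only the Python standard library. The listing does not document the date format or how multiple genres are stored, so the `%Y-%m-%d` format and the comma-separated `Genre` cell are assumptions, and the `Movie` class, the `parse_movies` helper, and the sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, Iterator, List, Optional


@dataclass
class Movie:
    release_date: Optional[date]
    title: str
    overview: str
    popularity: float
    vote_count: int
    vote_average: float
    original_language: str
    genres: List[str]
    poster_url: str


def parse_movies(lines: Iterable[str], date_format: str = "%Y-%m-%d") -> Iterator[Movie]:
    """Yield one Movie per CSV row, skipping rows whose numeric fields do not parse."""
    for row in csv.DictReader(lines):
        try:
            # Assumption: dates look like 2001-01-01; pass another format if not.
            released = datetime.strptime(row["Release_Date"], date_format).date()
        except (ValueError, TypeError):
            released = None  # keep the row, but mark the date as unknown
        try:
            movie = Movie(
                release_date=released,
                title=row["Title"],
                overview=row["Overview"],
                popularity=float(row["Popularity"]),
                vote_count=int(row["Vote_Count"]),
                vote_average=float(row["Vote_Average"]),
                original_language=row["Original_Language"],
                # Assumption: several genres share one cell, separated by commas.
                genres=[g.strip() for g in row["Genre"].split(",") if g.strip()],
                poster_url=row["Poster_Url"],
            )
        except (ValueError, TypeError, KeyError, AttributeError):
            continue  # malformed row: a field is missing or not a number
        yield movie


# Made-up sample rows that use the nine column names from the listing.
SAMPLE = """Release_Date,Title,Overview,Popularity,Vote_Count,Vote_Average,Original_Language,Genre,Poster_Url
2001-01-01,Example One,A made-up plot.,12.5,100,7.1,en,"Action, Drama",https://example.com/1.jpg
not-a-date,Example Two,Another made-up plot.,8.0,n/a,6.0,ja,Comedy,https://example.com/2.jpg
"""

movies = list(parse_movies(io.StringIO(SAMPLE)))
```

For the real file, the same function could be given a handle opened with `open("mymoviedb.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption.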

Distribution

The dataset is provided as a single CSV file, mymoviedb.csv, with a size of 4.21 MB. It contains 9 columns and about 9,827 to 9,828 records; the number of valid records differs slightly from column to column.
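One way to check the per-column record counts is to count the non-blank values in each column. The sketch below assumes that "valid" means "non-blank", which the listing does not define; `non_empty_counts` is a hypothetical helper and the two sample rows are invented.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def non_empty_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count, for each column, the rows that hold a non-blank value."""
    counts: Counter = Counter()
    reader = csv.DictReader(lines)
    for name in reader.fieldnames or []:
        counts[name] = 0  # report columns even if every value is blank
    for row in reader:
        for name, value in row.items():
            # Short rows yield None values; extra cells land under a None key.
            if name is not None and isinstance(value, str) and value.strip():
                counts[name] += 1
    return dict(counts)


# Made-up sample: the second row has no genre.
SAMPLE = "Title,Genre\nExample One,Drama\nExample Two,\n"
sample_counts = non_empty_counts(io.StringIO(SAMPLE))
```

Run against the real file, columns with differing totals would point to the rows that need cleaning before modelling.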

Usage

This dataset is ideal for:
  • Developing movie recommender systems using various machine learning algorithms.
  • Practising Natural Language Processing (NLP) techniques on movie summaries and titles.
  • Exploring and learning Machine Learning models for data prediction and classification.
  • Educational projects and exercises in Data Science.
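As an illustration of the first two uses, a content-based recommender can rank movies by how similar their Overview texts are. The sketch below is one possible approach, not a method prescribed by the dataset: it builds smoothed TF-IDF vectors with the standard library and ranks by cosine similarity. The function names, titles, and overviews are all invented for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Return one L2-normalised TF-IDF vector (as a sparse dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF: a term found in every document keeps a positive weight.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(titles: List[str], overviews: List[str], query_title: str,
              k: int = 3) -> List[Tuple[str, float]]:
    """Return the k titles whose overviews are most similar to the query's."""
    vectors = tfidf_vectors(overviews)
    idx = titles.index(query_title)  # raises ValueError for an unknown title
    scored = [
        (titles[i], cosine(vectors[idx], vectors[i]))
        for i in range(len(titles)) if i != idx
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up titles and overviews, standing in for the Title and Overview columns.
titles = ["Space One", "Space Two", "Kitchen Tale"]
overviews = [
    "An astronaut crew explores a distant planet.",
    "A lone astronaut is stranded on a distant planet.",
    "A chef opens a small restaurant in Paris.",
]
top = recommend(titles, overviews, "Space One", k=1)
```

On the full dataset, this all-pairs comparison is fine for a single query but recomputes every vector on each call; caching the vectors, or using a library vectoriser, would be the natural next step.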

Coverage

The dataset covers movies released between 17th April 1902 and 3rd July 2024. It has no explicit geographic dimension, but the data is sourced from TMDB, a global movie database. English is the most common original language at 77% of entries, followed by Japanese at 7%; various other languages make up the remaining 16%.
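Language shares like those quoted above can be recomputed from the Original_Language column. This is a minimal sketch with a hypothetical `language_shares` helper; the four sample codes are invented and do not reproduce the dataset's actual proportions.

```python
from collections import Counter
from typing import Dict, Iterable


def language_shares(languages: Iterable[str]) -> Dict[str, float]:
    """Map each language code to its percentage of all entries, most common first."""
    counts = Counter(
        code.strip().lower() for code in languages if code and code.strip()
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.most_common()}


# Made-up sample values for the Original_Language column.
shares = language_shares(["en", "en", "en", "ja"])
```

Blank cells are ignored, so the percentages are relative to the rows that actually carry a language code.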

License

CC0: Public Domain

Who Can Use It

This dataset is primarily intended for learners and aspiring professionals who wish to gain practical experience in Data Science and Machine Learning by building and experimenting with recommender systems. It is also suitable for researchers and developers interested in movie data analysis.

Dataset Name Suggestions

  • Popular Movies Recommender Dataset
  • TMDB Movie Popularity Data
  • Cinema Insights Dataset
  • Movie Recommender Systems Learning Data
  • Global Movie Popularity Index

Listing Stats

  • Views: 2
  • Downloads: 1
  • Listed: 13/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
