Global Movie Popularity Dataset
Entertainment & Media Consumption
Free
About
This dataset provides details on the 10,000 most popular films globally, sourced from The Movie Database (TMDb) via its read API. TMDb is a crowd-sourced movie information database used by many film-related platforms and applications. The dataset suits film-related analysis, recommender systems, and natural language processing tasks. Because it contains some missing values, it also gives those new to data analysis a chance to practise data cleaning.
Columns
- index: An identifier for each record.
- title: The name of the movie.
- overview: A concise summary or synopsis of the movie.
- original_language: The primary language in which the movie was filmed.
- vote_count: The number of votes received for the movie.
- vote_average: The average rating given to the movie by voters.
- popularity: A metric indicating the popularity score of the movie.
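A minimal sketch of reading a file with these columns, using only the Python standard library. The sample rows are invented placeholders, not values from the dataset, and the numeric types chosen for vote_count, vote_average, and popularity are assumptions based on the column descriptions above. The helper names (`load_movies`, `_to_number`) are hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "index", "title", "overview", "original_language",
    "vote_count", "vote_average", "popularity",
]

# Invented placeholder rows for illustration only (not real dataset values).
SAMPLE_CSV = """index,title,overview,original_language,vote_count,vote_average,popularity
0,Example Film A,A placeholder synopsis.,en,120,7.5,310.2
1,Example Film B,,fr,45,6.1,88.0
"""


def _to_number(value, cast):
    """Return cast(value), or None when the field is empty or absent."""
    value = (value or "").strip()
    return cast(value) if value else None


def _to_int(text):
    """Parse an integer that may have been written as '120' or '120.0'."""
    return int(float(text))


def load_movies(handle):
    """Read movie rows from an open CSV handle into a list of typed dicts."""
    reader = csv.DictReader(handle)
    absent = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"CSV is missing columns: {absent}")
    movies = []
    for row in reader:
        movies.append({
            "index": _to_number(row["index"], _to_int),
            "title": row["title"] or None,
            "overview": row["overview"] or None,  # empty text becomes None
            "original_language": row["original_language"] or None,
            "vote_count": _to_number(row["vote_count"], _to_int),
            "vote_average": _to_number(row["vote_average"], float),
            "popularity": _to_number(row["popularity"], float),
        })
    return movies


movies = load_movies(io.StringIO(SAMPLE_CSV))
```

To read the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.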
Distribution
The dataset is provided as a CSV file containing approximately 10,000 movie records; the exact row count is not specified. The data is tabular: each row represents one movie, and each column holds one of the attributes listed above.
Usage
This dataset is well-suited for a variety of applications, including:
- Developing and enhancing film-related consoles, websites, and mobile applications.
- Creating movie recommender systems.
- Performing data visualisations related to film trends and popularity.
- Conducting natural language processing (NLP) tasks on movie overviews.
- Data analysis and exploration, particularly for those looking to practise handling missing data.
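As one illustration of the recommender and NLP uses listed above, the sketch below ranks movies by how many words their overviews share (Jaccard similarity on token sets). This is a deliberately simple stand-in chosen for illustration, not a method prescribed by the dataset. The function names and the sample records are hypothetical placeholders.

```python
import re


def tokenize(text):
    """Lowercase a synopsis and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", (text or "").lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(movies, query_title, top_n=3):
    """Rank the other movies by overview overlap with the query movie."""
    by_title = {m["title"]: m for m in movies}
    query_tokens = tokenize(by_title[query_title]["overview"])
    scored = [
        (jaccard(query_tokens, tokenize(m["overview"])), m["title"])
        for m in movies
        if m["title"] != query_title
    ]
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Invented placeholder records (not real dataset values).
sample_movies = [
    {"title": "Example A", "overview": "A detective hunts a thief in Paris"},
    {"title": "Example B", "overview": "A thief hides in Paris"},
    {"title": "Example C", "overview": None},  # missing overview
]

ranking = most_similar(sample_movies, "Example A")
```

A movie with a missing overview simply scores 0.0 here rather than raising an error, which matters because the dataset is known to contain null values. Titles are assumed to be unique in this sketch; with the real data, keying on the index column would be safer.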
Coverage
The dataset covers movies from across the world, offering a global scope. No specific time range for the movies is stated. The data is fetched from TMDb, whose data is updated periodically, so the records reflect TMDb at the time of retrieval. The dataset includes some null values where information was missing from the original TMDb database.
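Before any analysis, it can help to measure how many values are missing in each column. The sketch below does this with the standard library, assuming that a missing value appears as an empty field in the CSV (how nulls are actually encoded in the file is not stated). The sample rows are invented placeholders and `count_missing` is a hypothetical helper name.

```python
import csv
import io

# Invented placeholder rows for illustration only (not real dataset values).
SAMPLE_CSV = """index,title,overview,original_language,vote_count,vote_average,popularity
0,Example Film A,A placeholder synopsis.,en,120,7.5,310.2
1,Example Film B,,fr,45,,88.0
2,Example Film C,,en,,6.0,12.5
"""


def count_missing(handle):
    """Return (row_count, {column: number of empty fields}) for a CSV handle."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = {name: 0 for name in fieldnames}
    total = 0
    for row in reader:
        total += 1
        for name in fieldnames:
            value = row.get(name)
            # Assumption: an absent or whitespace-only field counts as missing.
            if value is None or not value.strip():
                missing[name] += 1
    return total, missing


total_rows, missing_counts = count_missing(io.StringIO(SAMPLE_CSV))
for column, count in missing_counts.items():
    print(f"{column}: {count} of {total_rows} missing")
```

If the file encodes nulls with a marker such as `NaN` or `null` instead of an empty field, the check inside the loop would need to compare against that marker as well.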
License
CC0
Who Can Use It
This dataset is intended for a broad audience including:
- Young analysts: To practise data cleaning and analysis with datasets containing missing values.
- Developers: For integrating movie information into media managers, mobile apps, and social sites.
- Researchers: For studies on movie popularity, audience reception, and content analysis.
- Data scientists: For building and testing machine learning models such as recommender systems and NLP models.
Dataset Name Suggestions
- TMDb Popular Movies
- Global Movie Popularity Dataset
- Top Movies from TMDb API
- Movie Data for Film Analysis
- TMDb Film Insights
Attributes
Original Data Source: Popular Movies of IMDb