Global Movie Analytics Dataset
Free
About
This dataset combines movie information from The Movie Database (TMDb) and IMDb into a single collection of film-related data. It includes movie attributes such as titles, release dates, ratings, genres, production companies, and more, which makes it suitable for film analysis, recommendation systems, and other data science tasks. The dataset extends a TMDb collection of roughly 400k movies by adding directors, writers, and cast from IMDb.
Columns
- id: A unique identifier for each movie in the dataset. (type: int64)
- title: The movie title in English. (type: object)
- vote_average: The average vote score from TMDb users. (type: float64)
- vote_count: The total number of votes cast for the movie on TMDb. (type: int64)
- status: Indicates whether the movie is released, in production, or in post-production. (type: object)
- release_date: The official release date of the movie in yyyy-mm-dd format. (type: object)
- revenue: The total earnings generated by the movie. (type: int64)
- runtime: The duration of the movie in minutes. (type: int64)
- adult: Indicates whether the movie is rated for adults (True/False). (type: bool)
- backdrop_path: The URL path for the movie’s backdrop image. (type: object)
- budget: The total production budget of the movie. (type: int64)
- homepage: The official website of the movie, if available. (type: object)
- tconst: The IMDb-specific identifier for the movie. (type: object)
- original_language: The original language in which the movie was produced. (type: object)
- original_title: The title of the movie in its original language. (type: object)
- overview: A brief summary or plot description of the movie. (type: object)
- popularity: A popularity score based on TMDb analytics. (type: float64)
- poster_path: The URL path for the movie poster image. (type: object)
- tagline: The movie's tagline. (type: object)
- genres: A list of genres the movie falls under (e.g., Action, Drama, Comedy). (type: object)
- production_companies: A list of companies that produced the movie. (type: object)
- production_countries: A list of countries where the movie was produced. (type: object)
- spoken_languages: A list of languages spoken in the movie. (type: object)
- keywords: Keywords associated with the movie, used for categorisation or search. (type: object)
- directors: A list of directors of the movie. (type: object)
- writers: A list of writers who contributed to the screenplay. (type: object)
- averageRating: The average rating of the movie on IMDb. (type: float64)
- numVotes: The number of votes the movie received on IMDb. (type: float64)
- cast: A list of actors and actresses involved. (type: object)
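As a minimal sketch of how the columns above might be loaded with pandas: the helper name load_movies is hypothetical, and the tiny in-memory CSV below is made-up sample data used only to show the call. Because release_date is stored as text (type: object), the sketch converts it to a datetime and turns unparseable or missing values into NaT. For the real file, pass its path (the CSV named in the Distribution section) instead of the sample buffer; reading only the columns you need via usecols may help with a file of this size.

```python
import io

import pandas as pd


def load_movies(source, columns=None):
    """Read the movie CSV, optionally restricted to a subset of columns.

    `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(source, usecols=columns, low_memory=False)
    if "release_date" in df.columns:
        # release_date is documented as yyyy-mm-dd text; values that do not
        # parse (or are missing) become NaT instead of raising an error.
        df["release_date"] = pd.to_datetime(
            df["release_date"], format="%Y-%m-%d", errors="coerce"
        )
    return df


# Made-up sample rows, NOT taken from the dataset.
sample_csv = io.StringIO(
    "id,title,release_date,vote_average\n"
    "1,Example A,1999-03-31,7.5\n"
    "2,Example B,,6.0\n"
)
demo = load_movies(sample_csv, columns=["id", "title", "release_date"])
print(demo.dtypes)
```

With the real file, the call would look like load_movies("TMDB IMDB Movies Dataset.csv", columns=["id", "title", "release_date"]), assuming the file sits in the working directory.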
Distribution
The dataset is provided as a CSV file, named TMDB IMDB Movies Dataset.csv, with a size of 277.59 MB. It contains approximately 434,000 records (rows) and 29 columns of movie-related data.
Usage
This dataset is ideal for:
- Movie Analysis: Exploring trends in movie genres, production companies, revenue, and ratings.
- Recommendation Systems: Building models to suggest movies based on genres, average ratings, and popularity.
- Sentiment Analysis: Utilising overviews, taglines, and keywords for text analysis to extract sentiments or topics.
- Visualisation: Creating visual representations of movie ratings, revenue versus budget, and genre popularity.
- Predicting Movie Ratings: Forecasting movie ratings (vote_average) using features like revenue, popularity, genre, and runtime to assess their impact on audience reception.
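The genre-level analysis mentioned above could be sketched as follows. This is an illustration under stated assumptions, not a definitive method: it assumes the genres column is stored as a comma-separated string such as "Action, Drama" (the listing only says it is "a list of genres"), the helper name genre_summary is hypothetical, and the toy rows are invented for the demonstration.

```python
import pandas as pd


def genre_summary(df, sep=","):
    """Count movies and average vote_average per genre.

    Assumes `genres` holds a delimited string (e.g. "Action, Drama");
    adjust `sep` or the parsing if the file stores it differently.
    """
    exploded = (
        df.assign(genre=df["genres"].fillna("").str.split(sep))
        .explode("genre")          # one row per (movie, genre) pair
        .reset_index(drop=True)
    )
    exploded["genre"] = exploded["genre"].str.strip()
    exploded = exploded[exploded["genre"] != ""]  # drop movies with no genre
    return (
        exploded.groupby("genre")
        .agg(movies=("id", "count"), mean_vote=("vote_average", "mean"))
        .sort_values("movies", ascending=False)
    )


# Invented toy rows, NOT taken from the dataset.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "genres": ["Action, Drama", "Drama", None],
        "vote_average": [6.0, 8.0, 5.0],
    }
)
summary = genre_summary(toy)
print(summary)
```

The same per-genre table can feed a bar chart for the visualisation use case, or serve as a quick sanity check before building rating-prediction features.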
Coverage
The dataset spans a significant time range, with movie release dates from 12th September 1874 to 13th October 2029. Geographic coverage is indicated by production_countries, with the United States of America being the most frequent. Spoken languages are also listed, with English being the most common. No specific demographic data beyond the 'adult' rating is provided.
License
CC BY-NC-SA 4.0
Who Can Use It
This dataset is suitable for:
- Data Scientists and Analysts: For research into film industry trends, predictive modelling, and statistical analysis.
- Developers: To build and enhance movie recommendation engines or film information applications.
- Researchers: Studying cinematic history, audience reception, and box office performance.
- Educators and Students: For teaching and learning data science principles using real-world film data.
- Movie Enthusiasts: Exploring detailed movie attributes and uncovering insights into their favourite films.
Dataset Name Suggestions
- Merged Movie Data: TMDb & IMDb
- Film Data Insights Hub
- Dual-Source Cinema Dataset
- Movie Ratings and Details Collection
- Global Movie Analytics Dataset
Attributes
Original Data Source: Global Movie Analytics Dataset