Global Movie Analytics Dataset

Product Reviews & Feedback

Tags and Keywords: Movies, Ratings, Cinema, Analysis, Genres

Free

About

This dataset combines movie information from The Movie Database (TMDb) and IMDb. It covers attributes such as titles, release dates, ratings, genres, and production companies, which makes it suitable for film analysis, recommendation systems, and other data science tasks. It expands on a 400k TMDb collection by adding directors, writers, and cast from IMDb.

Columns

  • id: A unique identifier for each movie in the dataset. (type: int64)
  • title: The movie title in English. (type: object)
  • vote_average: The average vote score from TMDb users. (type: float64)
  • vote_count: The total number of votes cast for the movie on TMDb. (type: int64)
  • status: Indicates whether the movie is released, in production, or in post-production. (type: object)
  • release_date: The official release date of the movie in yyyy-mm-dd format. (type: object)
  • revenue: The total earnings generated by the movie. (type: int64)
  • runtime: The duration of the movie in minutes. (type: int64)
  • adult: Indicates whether the movie is rated for adults (True/False). (type: bool)
  • backdrop_path: The URL path for the movie’s backdrop image. (type: object)
  • budget: The total production budget of the movie. (type: int64)
  • homepage: The official website of the movie, if available. (type: object)
  • tconst: The IMDb-specific identifier for the movie. (type: object)
  • original_language: The original language in which the movie was produced. (type: object)
  • original_title: The title of the movie in its original language. (type: object)
  • overview: A brief summary or plot description of the movie. (type: object)
  • popularity: A popularity score based on TMDb analytics. (type: float64)
  • poster_path: The URL path for the movie poster image. (type: object)
  • tagline: The movie's tagline. (type: object)
  • genres: A list of genres the movie falls under (e.g., Action, Drama, Comedy). (type: object)
  • production_companies: A list of companies that produced the movie. (type: object)
  • production_countries: A list of countries where the movie was produced. (type: object)
  • spoken_languages: A list of languages spoken in the movie. (type: object)
  • keywords: Keywords associated with the movie, used for categorisation or search. (type: object)
  • directors: A list of directors of the movie. (type: object)
  • writers: A list of writers who contributed to the screenplay. (type: object)
  • averageRating: The average rating of the movie on IMDb. (type: float64)
  • numVotes: The number of votes the movie received on IMDb. (type: float64)
  • cast: A list of actors and actresses involved. (type: object)
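
The column types above can be passed straight to pandas when loading the file. The following is a minimal sketch, not part of the original listing: the helper name load_movies and the inline sample rows are hypothetical, and only a subset of the 29 columns is shown. numVotes is kept as float64 as listed, presumably because some rows have no IMDb match and pandas stores missing numbers as floats.

```python
import io
import pandas as pd

# Dtypes copied from the column list; only a subset of the 29 columns.
# Assumption: numVotes is float64 because unmatched rows hold missing values.
COLUMN_DTYPES = {
    "id": "int64",
    "title": "object",
    "vote_average": "float64",
    "vote_count": "int64",
    "tconst": "object",
    "averageRating": "float64",
    "numVotes": "float64",
}

def load_movies(source, columns=COLUMN_DTYPES):
    """Read only the listed columns from a CSV path or file-like object."""
    return pd.read_csv(source, usecols=list(columns), dtype=columns)

# Hypothetical sample rows standing in for the real CSV file.
sample = io.StringIO(
    "id,title,vote_average,vote_count,tconst,averageRating,numVotes\n"
    "1,Example A,7.5,120,tt0000001,7.1,3400\n"
    "2,Example B,6.0,15,tt0000002,,\n"
)
movies = load_movies(sample)
print(movies.dtypes)
```

For the real file, the same function would be called with the path to the downloaded CSV in place of the in-memory sample.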

Distribution

The dataset is provided as a CSV file, named TMDB IMDB Movies Dataset.csv, with a size of 277.59 MB. It contains approximately 434,000 records (rows) and 29 columns of movie-related data.
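
At roughly 278 MB, the file may not fit comfortably in memory on every machine. One option is to stream it in chunks; the sketch below counts rows and columns that way. The function name count_rows_and_columns and the chunk size are illustrative choices, and the three-row sample stands in for the real CSV.

```python
import io
import pandas as pd

def count_rows_and_columns(source, chunksize=100_000):
    """Stream a CSV in chunks and return (row_count, column_count)."""
    rows = 0
    n_cols = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        rows += len(chunk)
        n_cols = chunk.shape[1]
    return rows, n_cols

# Hypothetical sample; pass the path to the downloaded CSV instead.
demo = io.StringIO(
    "id,title,runtime\n1,Example A,90\n2,Example B,105\n3,Example C,120\n"
)
print(count_rows_and_columns(demo, chunksize=2))
```

Run against the full file, the result can be compared with the approximate record count and the 29 columns stated above.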

Usage

This dataset is ideal for:
  • Movie Analysis: Exploring trends in movie genres, production companies, revenue, and ratings.
  • Recommendation Systems: Building models to suggest movies based on genres, average ratings, and popularity.
  • Sentiment Analysis: Utilising overviews, taglines, and keywords for text analysis to extract sentiments or topics.
  • Visualisation: Creating visual representations of movie ratings, revenue versus budget, and genre popularity.
  • Predicting Movie Ratings: Forecasting movie ratings (vote_average) using features like revenue, popularity, genre, and runtime to assess their impact on audience reception.
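
As an illustration of the genre-level analysis mentioned above, the sketch below computes the mean vote_average per genre. It assumes that genres is stored as a comma-separated string such as "Action, Drama"; the listing does not state the exact encoding, so the split step may need adjusting. The function name mean_rating_by_genre and the min_votes parameter are hypothetical.

```python
import pandas as pd

def mean_rating_by_genre(movies, min_votes=0):
    """Average vote_average per genre, counting a movie once per genre."""
    rated = movies[movies["vote_count"] >= min_votes].copy()
    # Assumption: genres is a comma-separated string such as "Action, Drama".
    rated["genre"] = rated["genres"].fillna("").str.split(",")
    exploded = rated.explode("genre")
    exploded["genre"] = exploded["genre"].str.strip()
    # Drop rows that had no genre information.
    exploded = exploded[exploded["genre"] != ""]
    means = exploded.groupby("genre")["vote_average"].mean()
    return means.sort_values(ascending=False)

# Hypothetical sample rows using the dataset's column names.
demo = pd.DataFrame({
    "title": ["Example A", "Example B", "Example C"],
    "genres": ["Action, Drama", "Drama", None],
    "vote_average": [8.0, 6.0, 5.0],
    "vote_count": [100, 50, 10],
})
print(mean_rating_by_genre(demo))
```

Raising min_votes filters out titles with very few votes, whose averages tend to be noisy.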

Coverage

Release dates range from 12th September 1874 to 13th October 2029; dates later than the listing date presumably belong to titles that have not yet been released. Geographic coverage is indicated by production_countries, with the United States of America being the most frequent value. Spoken languages are also listed, with English being the most common. The dataset contains no demographic data; the only audience-related field is the adult flag.
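
The date range can be checked by parsing release_date, which is stored as text in yyyy-mm-dd format. The sketch below is one way to do so; the function name release_date_range is hypothetical, and values that fail to parse are treated as missing rather than raising an error.

```python
import pandas as pd

def release_date_range(movies):
    """Return (earliest, latest) release dates, ignoring unparseable values."""
    dates = pd.to_datetime(
        movies["release_date"], format="%Y-%m-%d", errors="coerce"
    )
    return dates.min(), dates.max()

# Hypothetical sample including a missing and a malformed value.
demo = pd.DataFrame(
    {"release_date": ["1874-09-12", "2029-10-13", None, "not a date"]}
)
earliest, latest = release_date_range(demo)
print(earliest.date(), latest.date())
```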

License

CC BY-NC-SA 4.0

Who Can Use It

This dataset is suitable for:
  • Data Scientists and Analysts: For research into film industry trends, predictive modelling, and statistical analysis.
  • Developers: To build and enhance movie recommendation engines or film information applications.
  • Researchers: Studying cinematic history, audience reception, and box office performance.
  • Educators and Students: For teaching and learning data science principles using real-world film data.
  • Movie Enthusiasts: Exploring detailed movie attributes and uncovering insights into their favourite films.

Dataset Name Suggestions

  • Merged Movie Data: TMDb & IMDb
  • Film Data Insights Hub
  • Dual-Source Cinema Dataset
  • Movie Ratings and Details Collection
  • Global Movie Analytics Dataset

Attributes

Original Data Source: Global Movie Analytics Dataset

Listing Stats

Views: 1
Downloads: 0
Listed: 31/08/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
