Opendatabay APP

TMDB 5000 Movie Financials and Ratings

Product Reviews & Feedback

Tags and Keywords

Movies, Ratings, Revenue, Analytics, NLP

Free

About

This dataset provides voting and rating information for movies, together with budget, revenue, popularity, and language data. It is suited to building recommendation systems, text classification, and other natural language processing (NLP) tasks. It combines financial data, audience metrics, and descriptive details for thousands of films, which makes it possible to explore which factors are associated with a movie's success.

Columns

  • Budget: The financial budget of the movie.
  • Genres: The genres associated with the movie, such as Drama or Comedy.
  • Homepage: The official website link for the movie.
  • Id: A unique identifier for each movie.
  • Keywords: Keywords or tags related to the movie's plot and themes.
  • Original_language: The original language the movie was produced in.
  • Original_title: The movie's title in its original language.
  • Overview: A brief summary of the movie's plot.
  • Popularity: A metric indicating the movie's popularity.
  • Production_companies: The companies involved in the production of the movie.
  • Production_countries: The countries where the movie was produced.
  • Release_date: The date the movie was released.
  • Revenue: The total revenue generated by the movie.
  • Runtime: The duration of the movie in minutes.
  • Spoken_languages: Languages spoken within the movie.
  • Status: The release status of the movie (e.g., Released).
  • Tagline: The movie's promotional slogan.
  • Title: The title of the movie.
  • Vote_average: The average rating the movie has received.
  • Vote_count: The total number of votes the movie has received.
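As a minimal sketch of how these columns might be read, the following standard-library Python loads the CSV into a list of dictionaries, converts the numeric columns, and normalises the multi-value columns into lists of names. Several things here are assumptions rather than facts stated in this listing: that the header names may be capitalised or lower-case (so they are lower-cased on load), and that multi-value cells such as Genres may be stored either as JSON lists of objects with a "name" field or as plain comma-separated text. The helper names (parse_names, to_float, load_movies) are hypothetical.

```python
import csv
import json

# Column groups taken from the column list above (lower-cased on load).
NUMERIC_COLUMNS = ("budget", "revenue", "popularity", "runtime",
                   "vote_average", "vote_count")
LIST_COLUMNS = ("genres", "keywords", "production_companies",
                "production_countries", "spoken_languages")


def parse_names(cell):
    """Return a list of names from a multi-value cell.

    Assumption: the cell is either a JSON list (of objects with a "name"
    key, or of plain values) or a comma-separated string.
    """
    cell = (cell or "").strip()
    if not cell:
        return []
    try:
        items = json.loads(cell)
    except json.JSONDecodeError:
        return [part.strip() for part in cell.split(",") if part.strip()]
    if not isinstance(items, list):
        return [str(items)]
    names = []
    for item in items:
        name = item.get("name", "") if isinstance(item, dict) else str(item)
        if name:
            names.append(name)
    return names


def to_float(value):
    """Convert a cell to float, returning None for blank or invalid text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_movies(handle):
    """Read movie rows from an open CSV file handle."""
    movies = []
    for raw in csv.DictReader(handle):
        row = {key.lower(): value for key, value in raw.items() if key}
        for column in NUMERIC_COLUMNS:
            if column in row:
                row[column] = to_float(row[column])
        for column in LIST_COLUMNS:
            if column in row:
                row[column] = parse_names(row[column])
        movies.append(row)
    return movies


# Usage (assumes the file sits in the working directory):
# with open("tmdb_5000_movies.csv", newline="", encoding="utf-8") as handle:
#     movies = load_movies(handle)
```

Blank or unparseable numeric cells become None rather than 0, so that missing values are not mistaken for real zeros in later analysis.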

Distribution

The data is distributed as a single CSV file (tmdb_5000_movies.csv) of 5.7 MB, containing 4,804 rows and 20 columns.
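A quick way to check a downloaded copy against these figures is to count its rows and columns. The sketch below uses Python's csv module so that line breaks inside quoted text fields are not counted as extra rows. The function name csv_shape is hypothetical, and the listing does not say whether the stated row count includes the header line, so the function reports data rows only.

```python
import csv


def csv_shape(handle):
    """Return (data_rows, columns) for an open CSV file handle.

    The header line is not counted as a data row.
    """
    reader = csv.reader(handle)
    header = next(reader, [])
    data_rows = sum(1 for _ in reader)
    return data_rows, len(header)


# Usage (assumes the file sits in the working directory):
# with open("tmdb_5000_movies.csv", newline="", encoding="utf-8") as handle:
#     print(csv_shape(handle))
```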

Usage

  • Recommendation Systems: Develop algorithms to suggest movies to users based on their viewing history and preferences.
  • Natural Language Processing (NLP): Analyse movie overviews, taglines, and keywords for sentiment analysis and topic modelling.
  • Text Classification: Classify movies into genres based on textual data like the overview.
  • Exploratory Data Analysis: Investigate relationships between budget, revenue, popularity, and audience ratings.
  • Data Visualisation: Create visual representations of trends in the movie industry.
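To illustrate the recommendation use case, here is a minimal content-based sketch. It ranks films by Jaccard overlap of their keyword sets, which is a simple stand-in technique and not a method prescribed by this listing. Because the dataset describes films rather than individual users' viewing history, the sketch recommends titles similar to a given film. It assumes each movie is a dictionary with a "title" string and a "keywords" list; the function names and the toy data are hypothetical.

```python
def jaccard(first, second):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(first), set(second)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(movies, title, top_n=3):
    """Return up to top_n (score, title) pairs most similar to `title`."""
    target = next((m for m in movies if m["title"] == title), None)
    if target is None:
        raise ValueError(f"unknown title: {title!r}")
    scored = [
        (jaccard(target["keywords"], movie["keywords"]), movie["title"])
        for movie in movies
        if movie["title"] != title
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Toy data for illustration only (not taken from the dataset).
toy_movies = [
    {"title": "A", "keywords": ["space", "alien", "war"]},
    {"title": "B", "keywords": ["space", "alien"]},
    {"title": "C", "keywords": ["romance"]},
    {"title": "D", "keywords": ["space", "war", "robot"]},
]
print(most_similar(toy_movies, "A", top_n=2))
```

The same ranking function could be applied to genres or to tokenised overviews; keyword overlap is used here only because it needs no external libraries.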

Coverage

The dataset covers movies released between 1916 and 2017. Geographically, it includes films from various production countries, with a significant majority from the United States. While English is the most common original language (94%), 36 other languages are also represented.
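The coverage figures above could be recomputed from a downloaded copy along the following lines. This is a sketch under assumptions: that rows are dictionaries with lower-case keys, that release_date begins with a four-digit year (for example YYYY-MM-DD), and that original_language holds a language code. The function name coverage_summary is hypothetical.

```python
from collections import Counter


def coverage_summary(rows):
    """Return ((first_year, last_year), {language: share_of_rows})."""
    years = []
    for row in rows:
        year_text = (row.get("release_date") or "")[:4]
        if len(year_text) == 4 and year_text.isdigit():
            years.append(int(year_text))
    languages = Counter(
        row["original_language"] for row in rows if row.get("original_language")
    )
    total = sum(languages.values())
    shares = {lang: count / total for lang, count in languages.most_common()}
    span = (min(years), max(years)) if years else None
    return span, shares
```

Rows with a blank release date are skipped when computing the year range but still count towards the language shares.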

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For building and training machine learning models for recommendation engines or predictive analytics.
  • NLP Engineers: For tasks involving text analysis, sentiment extraction, and topic modelling on movie descriptions and keywords.
  • Market Researchers: To analyse trends in the film industry, such as the financial performance of different genres.
  • Students and Academics: For research projects related to film studies, data analysis, and computer science.
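As one example of the genre-level financial analysis mentioned above, the sketch below computes the median revenue-to-budget ratio per genre. It assumes each movie is a dictionary with numeric "budget" and "revenue" values and a "genres" list of names, and it treats a missing or zero budget or revenue as unknown, which is an assumption about how gaps are recorded rather than something this listing states. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def revenue_ratio_by_genre(movies):
    """Return {genre: median(revenue / budget)} over movies with both values."""
    ratios = defaultdict(list)
    for movie in movies:
        budget = movie.get("budget")
        revenue = movie.get("revenue")
        if not budget or not revenue:
            continue  # skip missing or zero values (assumed to mean unknown)
        for genre in movie.get("genres", []):
            ratios[genre].append(revenue / budget)
    return {genre: median(values) for genre, values in sorted(ratios.items())}
```

The median is used instead of the mean so that a few films with very small budgets and large revenues do not dominate a genre's figure.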

Dataset Name Suggestions

  • TMDB 5000 Movie Financials and Ratings
  • Movie Performance and Audience Metrics
  • Film Industry Analytics: Budget, Revenue, and Ratings
  • Cinema Insights: A Movie Metrics Collection
  • NLP Movie Dataset with Ratings and Revenue

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 02/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
