Film Performance Analytics
About
This dataset contains film information found on the IMDB website. It is tailored for Exploratory Data Analysis, Machine Learning applications, and data visualisation tasks. The data was gathered via web scraping in Python and combined with material from an IMDB repository. It has been filtered to include only movies released after 1970 that have received more than 50,000 ratings. In addition, only films whose budget and gross figures are stated in USD are included, to maintain consistency.
Columns
- id: A unique identifier for the movie, as used by the IMDB repository.
- primaryTitle: The primary title of the movie, presented in English.
- originalTitle: The movie's original title in its native language.
- isAdult: A binary indicator flagging adult titles, useful for parental guidance.
- runtimeMinutes: The total duration of the movie, measured in minutes.
- genres: The categories or genres the movie belongs to.
- averageRating: The final rating of the movie, derived from all submitted ratings.
- numVotes: The total count of votes or ratings received by the movie.
- budget: The total production budget of the movie, expressed in USD.
- gross: The total worldwide earnings of the movie, expressed in USD.
- release_date: The initial release date of the movie.
- directors: The director(s) of the movie.
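As a minimal sketch of how the columns above might be loaded with pandas: the column names come from the list, but the dtypes, the date handling, and the helper name `load_films` are assumptions rather than part of the dataset's documentation. The two sample rows are made up purely to demonstrate the call.

```python
import io
import pandas as pd

# Column names follow the listing; the dtypes are assumptions.
DTYPES = {
    "id": "string",
    "primaryTitle": "string",
    "originalTitle": "string",
    "isAdult": "Int64",
    "runtimeMinutes": "Int64",
    "genres": "string",
    "averageRating": "float64",
    "numVotes": "Int64",
    "budget": "float64",
    "gross": "float64",  # float so that missing values can be stored as NaN
    "directors": "string",
}

def load_films(source):
    """Read the CSV and parse release_date; unparseable dates become NaT."""
    df = pd.read_csv(source, dtype=DTYPES)
    df["release_date"] = pd.to_datetime(df["release_date"], errors="coerce")
    return df

# Made-up sample rows (not real dataset records), only to show the call.
sample = io.StringIO(
    "id,primaryTitle,originalTitle,isAdult,runtimeMinutes,genres,"
    "averageRating,numVotes,budget,gross,release_date,directors\n"
    "tt0000001,Example One,Example One,0,120,Drama,7.5,60000,1000000,5000000,1999-05-01,Director A\n"
    "tt0000002,Example Two,Exemple Deux,0,95,Comedy,6.1,75000,2000000,,,Director B\n"
)
films = load_films(sample)
```

With the real file, the call would presumably be `load_films("imdb_data.csv")`, assuming the header row matches the column names listed above.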
Distribution
The dataset is typically provided in CSV format, with a sample file, imdb_data.csv, weighing approximately 447.11 kB. It contains 3348 individual observations or records, each described by 12 distinct attributes. The data is structured tabularly, with nearly all columns featuring 100% valid entries. There are 51 missing values for 'gross' (affecting 2% of records) and 5 missing values for 'release_date'. The id, primaryTitle, originalTitle, genres, directors, and release_date columns feature unique entries, while isAdult is a binary field. Numerical attributes like runtimeMinutes, averageRating, numVotes, budget, and gross exhibit varied distributions.
Usage
This dataset is ideal for a range of analytical and predictive tasks. It can be used for in-depth Exploratory Data Analysis to uncover trends in the film industry, for building and training Machine Learning models to predict movie success or audience ratings, and for creating insightful visualisations of cinematic performance over time. Researchers can explore the correlation between budget, gross earnings, and audience reception.
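One way to start on the budget, gross, and rating relationship mentioned above is a pairwise correlation matrix. The sketch below is illustrative only: the function name `financial_correlations` is hypothetical, the toy numbers are invented, and Pearson correlation is just one reasonable choice (a rank-based method may suit skewed financial figures better).

```python
import pandas as pd

def financial_correlations(df, method="pearson"):
    """Pairwise correlation of budget, gross and averageRating.

    pandas excludes missing values pair by pair, so rows without a
    gross figure still contribute to the budget/averageRating pair.
    """
    cols = ["budget", "gross", "averageRating"]
    return df[cols].corr(method=method)

# Invented toy values (not taken from the dataset), with one missing gross.
toy = pd.DataFrame({
    "budget": [1.0, 2.0, 3.0, 4.0],
    "gross": [2.0, 4.0, None, 9.0],
    "averageRating": [6.0, 6.5, 7.0, 8.0],
})
corr = financial_correlations(toy)
```

The result is a 3x3 DataFrame indexed by column name, so `corr.loc["budget", "gross"]` gives the budget/gross coefficient directly.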
Coverage
The dataset focuses on films released globally, with financial figures (budget and gross) consistently denominated in USD. It includes movies released from 1970 onwards and specifically targets films with significant audience engagement, evidenced by more than 50,000 ratings. The dataset was last updated on 12th November 2023, and updates are expected annually.
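The coverage criteria can be checked against a loaded table. The following is a sketch under stated assumptions: the helper `coverage_violations` is hypothetical, 1970 itself is treated as in range (the description does not pin down the boundary), and "over 50,000 ratings" is read as strictly greater than 50,000. The toy rows are invented.

```python
import pandas as pd

def coverage_violations(df, min_year=1970, min_votes=50_000):
    """Return rows that fall outside the stated coverage criteria."""
    year = pd.to_datetime(df["release_date"], errors="coerce").dt.year
    too_old = year < min_year  # rows with a missing date compare as False
    too_few_votes = df["numVotes"] <= min_votes
    return df[too_old | too_few_votes]

# Invented rows (not dataset records): B is too old, C has too few votes.
toy = pd.DataFrame({
    "primaryTitle": ["A", "B", "C"],
    "release_date": ["1985-06-01", "1965-01-01", None],
    "numVotes": [60_000, 80_000, 40_000],
})
bad = coverage_violations(toy)
```

Rows with a missing release_date are not flagged on the year criterion here, since their year cannot be determined; a stricter check could report them separately.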
License
The dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Who Can Use It
This dataset is suitable for data analysts, data scientists, machine learning engineers, and academic researchers interested in the entertainment industry. Film students can utilise it for studying movie history and trends, while market researchers might analyse audience preferences and box office performance. Individuals looking to develop recommendation systems or predict cinematic outcomes will also find it valuable.
Dataset Name Suggestions
- IMDB Film Metrics
- Movie Industry Data
- Global Cinema Insights
- Film Performance Analytics
- IMDB Film Archive
Attributes
Original Data Source: Film Performance Analytics