Movie Analytics Ready Dataset
News & Media Articles
Free
About
This dataset provides a curated collection of approximately 45,000 IMDb movie records released between 1874 and 2017. It has been modified from its original form for ease of use: null entries are imputed (0 for numerical fields, 'not available' for text fields), and dictionary entries are converted into plain strings or lists. It is suited to data analytics and machine learning tasks, including recommendation system development.
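Because nulls are imputed, a 0 or a 'not available' may stand for a missing value rather than a real one. The following is a minimal sketch of undoing the imputation before computing statistics. It assumes the text placeholder is spelled exactly "not available" (matched case-insensitively) and that every numeric 0 is an imputed null, which cannot be verified from the listing. The helper names and the sample rows are hypothetical.

```python
from statistics import mean

# Assumption: the text placeholder is spelled "not available";
# numeric nulls were written as 0.
TEXT_SENTINEL = "not available"


def restore_text(value):
    """Return None for the text placeholder, else the stripped string."""
    cleaned = value.strip()
    return None if cleaned.lower() == TEXT_SENTINEL else cleaned


def restore_number(value):
    """Return None for a 0, else the value as a float.

    A genuine zero cannot be told apart from an imputed one,
    so this treats every 0 as missing.
    """
    number = float(value)
    return None if number == 0 else number


# Made-up rows standing in for records read from the CSV.
rows = [
    {"title": "A", "tagline": "not available", "budget": "0"},
    {"title": "B", "tagline": "Some tagline", "budget": "1000000"},
    {"title": "C", "tagline": "Not available", "budget": "3000000"},
]

budgets = [restore_number(r["budget"]) for r in rows]
known = [b for b in budgets if b is not None]

print(mean(float(r["budget"]) for r in rows))  # mean including imputed zeros
print(mean(known))                             # mean over known budgets only
```

The two means differ because imputed zeros pull the average down, which is worth keeping in mind when reading the mean budget and mean revenue figures quoted for this dataset.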
Columns
The dataset contains 23 columns:
- adult: A boolean indicating if the movie is classified as adult content (False or True).
- belongs_to_collection: Details if the movie is part of a collection. Approximately 90% are 'not available'.
- budget: The movie's budget in US dollars, with nulls imputed as 0. The mean budget is approximately $4.27 million.
- original_language: The original language of the movie. English (en) accounts for 71% of entries and French (fr) for 5%; 90 unique languages are represented.
- original_title: The movie's original title, which may be in any language. There are around 43,000 unique titles.
- overview: A summary of the movie's plot. 'Not available' is common.
- popularity: A numerical score indicating the movie's popularity. The mean popularity score is 2.94.
- release_date: The movie's release date, formatted as YY-MM-DD. There are over 17,000 unique release dates.
- revenue: The movie's revenue in US dollars, with nulls imputed as 0. The mean revenue is approximately $11.3 million.
- runtime: The movie's length in minutes. The mean runtime is 93.7 minutes.
- tagline: A brief descriptive phrase for the movie. 'Not available' is the most common entry (55%).
- title: The primary title of the movie. There are approximately 41,900 unique titles.
- vote_average: The average rating of the movie on a scale from 0 to 10. The mean vote average is 5.62.
- vote_count: The number of votes received for the movie. The mean vote count is 111.
- languages: A list of languages featured in the movie. 'English' is the most common entry (49%).
- day_of_week: The day of the week the movie was released. Friday is the most common (31%).
- month: The month the movie was released. January is the most common (13%).
- season: The quarter of the year the movie was released. Q1 and Q4 are equally common (27% each).
- year: The year the movie was released, ranging from 1874 to 2017. The majority (44,622 records) fall between 1916 and 2017.
- has_homepage: A boolean indicating if the movie has an associated homepage (True or False).
- genre: A list of genres associated with the movie. 'Drama' is the most common (11%).
- companies: A list of production companies involved in the movie.
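- countries: A list of countries associated with the movie's production. The 'United States of America' is the most common (39%). The column has 2,378 unique values; because each entry is a list, this figure most likely counts distinct country combinations rather than individual countries.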
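The calendar columns (day_of_week, month, season, year) are all functions of release_date, so they can be recomputed as a consistency check. The sketch below assumes the date is stored as an ISO string with a four-digit year (for example 2017-07-21), that day and month names are in English, and that seasons are labelled Q1 to Q4. None of these encodings is confirmed by the listing, and `date_features` is a hypothetical helper.

```python
from datetime import date

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def date_features(release_date):
    """Derive day_of_week, month, season and year from an ISO date string."""
    d = date.fromisoformat(release_date)   # assumes YYYY-MM-DD
    quarter = (d.month - 1) // 3 + 1       # Jan-Mar -> 1, ..., Oct-Dec -> 4
    return {
        "day_of_week": DAYS[d.weekday()],  # weekday(): Monday is 0
        "month": MONTHS[d.month - 1],
        "season": f"Q{quarter}",
        "year": d.year,
    }


print(date_features("2017-07-21"))
```

Comparing these derived values with the stored columns for a handful of rows is a quick way to learn which encoding the file actually uses.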
Distribution
The dataset is provided as a single file, MOVIES.csv (25.86 MB), containing approximately 45,000 movie records in a tabular format with the 23 columns detailed above.
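A first step after downloading is to confirm that the file's header matches the 23 documented columns. The sketch below uses only the standard library and compares column names as a set, since the column order in the file is not stated. `check_header` and `read_movies` are hypothetical helpers, and the in-memory sample is a made-up stand-in for MOVIES.csv.

```python
import csv
import io

# Column names as given in the Columns section of this listing.
EXPECTED_COLUMNS = [
    "adult", "belongs_to_collection", "budget", "original_language",
    "original_title", "overview", "popularity", "release_date", "revenue",
    "runtime", "tagline", "title", "vote_average", "vote_count", "languages",
    "day_of_week", "month", "season", "year", "has_homepage", "genre",
    "companies", "countries",
]


def check_header(header):
    """Return (missing, unexpected) column names relative to the listing."""
    expected = set(EXPECTED_COLUMNS)
    found = set(header)
    return sorted(expected - found), sorted(found - expected)


def read_movies(handle):
    """Read all rows from an open CSV handle into a list of dicts."""
    reader = csv.DictReader(handle)
    missing, _unexpected = check_header(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Tiny in-memory stand-in: the header plus one placeholder row.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    + ",".join(["x"] * len(EXPECTED_COLUMNS)) + "\n"
)
movies = read_movies(sample)
print(len(movies), len(movies[0]))
```

With the real file, the same function would be called as `read_movies(open("MOVIES.csv", newline="", encoding="utf-8"))`, where the UTF-8 encoding is an assumption.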
Usage
This dataset is ideally suited for:
- Film industry analysis: Exploring trends in movie production, budgets, and revenues.
- Recommendation system development: Building models to suggest movies to users based on various attributes.
- Data analytics projects: Conducting statistical analysis on movie characteristics, popularity, and success metrics.
- Machine learning tasks: Developing classification models (e.g., predicting movie genre or success) and regression models.
- Academic research: Studying cinematic history, language distribution, and global film production.
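As an illustration of the recommendation use case, here is a minimal content-based sketch that ranks movies by the overlap of their genre sets using Jaccard similarity. It is only one simple technique among many and is not tied to this dataset's tooling. The titles and genres are made up, and `jaccard` and `recommend` are hypothetical helpers.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(title, genres_by_title, k=3):
    """Return up to k other titles ranked by genre overlap with `title`."""
    target = genres_by_title[title]
    scored = [
        (jaccard(target, genres), other)
        for other, genres in genres_by_title.items()
        if other != title
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop candidates that share no genre at all.
    return [other for score, other in scored[:k] if score > 0]


# Made-up catalog, for illustration only.
catalog = {
    "Film A": {"Drama", "Romance"},
    "Film B": {"Drama"},
    "Film C": {"Comedy"},
    "Film D": {"Drama", "Romance", "War"},
}
print(recommend("Film A", catalog))
```

In practice the genre sets would come from the genre column, and further attributes such as languages or companies could be folded into the similarity score.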
Coverage
The dataset covers movies released from 1874 to 2017. Geographically, it includes movies from a wide array of countries, with a significant portion from the United States of America (39%). It also covers a broad range of original languages, with English being predominant.
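Country and language shares depend on how the list columns are counted. The sketch below parses list cells and contrasts the number of individual countries with the number of distinct country combinations. The serialization of lists in the CSV is not documented here, so the code assumes either a Python-style list literal such as "['France', 'Germany']" or a comma-separated string. `parse_list_cell` is a hypothetical helper and the sample cells are made up.

```python
import ast
from collections import Counter


def parse_list_cell(cell):
    """Turn a list-like CSV cell into a list of strings."""
    text = cell.strip()
    if not text or text.lower() == "not available":
        return []
    if text.startswith("["):
        try:
            parsed = ast.literal_eval(text)  # safe parsing of a list literal
        except (ValueError, SyntaxError):
            parsed = None
        if isinstance(parsed, (list, tuple)):
            return [str(item).strip() for item in parsed]
    # Fallback: comma-separated values, with stray brackets and quotes removed.
    parts = (part.strip(" '\"") for part in text.strip("[]").split(","))
    return [part for part in parts if part]


cells = [
    "['United States of America']",
    "['France', 'United States of America']",
    "['France']",
    "not available",
]
parsed = [parse_list_cell(c) for c in cells]

per_country = Counter(c for countries in parsed for c in countries)
combinations = {tuple(sorted(countries)) for countries in parsed if countries}
share_us = sum("United States of America" in c for c in parsed) / len(parsed)

print(len(per_country), len(combinations), share_us)
```

Whether a quoted share counts rows that list only one country or rows that include it among others changes the result, so it is worth computing both on the real file.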
License
CC0: Public Domain
Who Can Use It
This dataset is intended for:
- Data scientists looking for rich, pre-processed movie data.
- Machine learning engineers developing recommendation systems or classification models.
- Film researchers and academics studying cinematic trends and history.
- Students and enthusiasts interested in exploring movie data for personal projects.
- Business analysts in the entertainment sector seeking market insights.
Dataset Name Suggestions
- IMDb Curated Movies Dataset
- Global Movie Database (1874-2017)
- User-Friendly IMDb Films
- Movie Analytics Ready Dataset
- Cinema Data Collection
Attributes
Original Data Source: Movie Analytics Ready Dataset