Global Cinematic Database
Entertainment & Media Consumption
About
This dataset features data on over 10,000 films from TMDB (The Movie Database), gathered using the TMDB API. It includes film identifiers, titles, release dates, average vote scores, vote counts, overviews, and popularity metrics. The dataset may contain null values where information was not available from TMDB. It is particularly useful for new analysts looking to practise handling missing data and for those developing film recommendation systems.
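As a starting point for practising missing-data handling, the sketch below fills gaps in one numeric column with the median of the observed values and records which rows were imputed. Median imputation is only one option (dropping incomplete rows or model-based imputation are alternatives), and the row-per-dict layout and the `impute_median` name are illustrative assumptions, not part of the dataset.

```python
from statistics import median


def impute_median(rows: list, column: str) -> list:
    """Return copies of rows with missing values in `column` replaced by the median.

    Rows are plain dicts in which None (or an absent key) marks a missing value.
    A boolean "<column>_imputed" flag is added so imputed values can be told
    apart from observed ones. The input rows are not modified.
    """
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed) if observed else None
    result = []
    for r in rows:
        missing = r.get(column) is None
        new_row = dict(r)  # shallow copy keeps the original row untouched
        new_row[column] = fill if missing else r[column]
        new_row[f"{column}_imputed"] = missing
        result.append(new_row)
    return result
```

Keeping the `_imputed` flag makes it easy to check later whether imputation shifted any downstream statistics.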
Columns
- id: Unique identifier for the film.
- title: The name of the film.
- overview: A brief summary or synopsis of the film.
- release_date: The original release date of the film.
- popularity: A numerical score indicating the film's popularity.
- vote_average: The average vote score received by the film.
- vote_count: The total number of votes cast for the film.
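The columns above can be read with nothing but the Python standard library. This is a minimal sketch that assumes the CSV header uses exactly these column names and that `release_date` is written as YYYY-MM-DD; neither is guaranteed by the description. Empty cells are mapped to `None`, which makes the null values easy to count.

```python
import csv
from datetime import date
from typing import Optional

# Column names as listed in the description. The ISO date format is an assumption.
COLUMNS = ("id", "title", "overview", "release_date",
           "popularity", "vote_average", "vote_count")
INT_COLUMNS = ("id", "vote_count")
FLOAT_COLUMNS = ("popularity", "vote_average")


def _clean(value: Optional[str]) -> Optional[str]:
    """Map empty or whitespace-only cells (and absent columns) to None."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def parse_row(raw: dict) -> dict:
    """Convert one CSV record into typed values, keeping None for missing data."""
    row = {col: _clean(raw.get(col)) for col in COLUMNS}
    for col in INT_COLUMNS:
        if row[col] is not None:
            row[col] = int(float(row[col]))  # also accepts values written as "123.0"
    for col in FLOAT_COLUMNS:
        if row[col] is not None:
            row[col] = float(row[col])
    if row["release_date"] is not None:
        row["release_date"] = date.fromisoformat(row["release_date"])
    return row


def load_films(handle) -> list:
    """Parse every record from an open CSV file handle."""
    return [parse_row(raw) for raw in csv.DictReader(handle)]


def null_counts(rows: list) -> dict:
    """Count missing (None) values per column."""
    return {col: sum(1 for r in rows if r[col] is None) for col in COLUMNS}
```

For example, opening a local copy with `open("movies.csv", newline="", encoding="utf-8")` (the file name is hypothetical) and passing the handle to `load_films` yields a list of dicts, and `null_counts` on that list shows which columns need cleaning.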
Distribution
The dataset contains records for over 10,000 films and is typically distributed as a CSV file that can be loaded into a pandas DataFrame. The id column holds nearly 10,000 unique values, which suggests that some films may appear more than once. Release dates span from 17th April 1902 to 7th September 2022. Popularity scores are right-skewed: most films fall into the lower ranges, while a few reach much higher values. Vote counts are also broadly distributed, and average vote scores range from approximately 5.00 to 8.70. Some fields may contain null values.
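A quick way to check these figures against a local copy is to compute them directly. The sketch below assumes each row is a dict whose `release_date` is already a `datetime.date` (or `None`) and whose numeric fields are already `int`/`float` (or `None`); the `summarize` name is illustrative. If the description is accurate, `earliest_release` and `latest_release` should come out as 17th April 1902 and 7th September 2022, and a non-zero `duplicate_ids` would confirm repeated films.

```python
def summarize(rows: list) -> dict:
    """Summarise row count, id uniqueness and value ranges, skipping None values.

    Assumes each row is a dict with release_date as a datetime.date (or None)
    and vote_average as a float (or None).
    """
    ids = [r["id"] for r in rows if r.get("id") is not None]
    dates = [r["release_date"] for r in rows if r.get("release_date") is not None]
    votes = [r["vote_average"] for r in rows if r.get("vote_average") is not None]
    unique = len(set(ids))
    return {
        "rows": len(rows),
        "unique_ids": unique,
        "duplicate_ids": len(ids) - unique,  # extra rows sharing an existing id
        "earliest_release": min(dates) if dates else None,
        "latest_release": max(dates) if dates else None,
        "vote_average_range": (min(votes), max(votes)) if votes else None,
    }
```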
Usage
This dataset is ideal for:
- Developing and testing film recommendation systems.
- Practising data cleaning and handling of missing values, particularly beneficial for new data analysts.
- Exploratory data analysis of film trends and audience reception.
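For recommendation or ranking work, a raw `vote_average` over-rewards films with only a handful of votes. One common remedy, not prescribed by this dataset, is the IMDb-style weighted rating (a Bayesian average) WR = v/(v+m)·R + m/(v+m)·C, where R is `vote_average`, v is `vote_count`, C is the mean score across films, and m is a minimum-votes threshold. Using the 90th percentile of `vote_count` for m is an arbitrary assumption, as is the `weighted_ratings` name; this is a sketch rather than a complete recommender.

```python
import math
from statistics import mean


def weighted_ratings(films: list, percentile: float = 0.9) -> list:
    """Rank films by an IMDb-style weighted rating, highest first.

    WR = v / (v + m) * R + m / (v + m) * C, where R is vote_average,
    v is vote_count, C is the mean vote_average over rated films, and
    m is the vote_count at the given percentile (nearest-rank method).
    Films missing either field, or with fewer than m votes, are skipped.
    Returns a list of (score, film) pairs.
    """
    rated = [f for f in films
             if f.get("vote_average") is not None and f.get("vote_count") is not None]
    if not rated:
        return []
    c = mean(f["vote_average"] for f in rated)
    counts = sorted(f["vote_count"] for f in rated)
    # Nearest-rank percentile, clamped to a valid index.
    rank = min(max(math.ceil(percentile * len(counts)), 1), len(counts))
    m = counts[rank - 1]

    def score(film: dict) -> float:
        v, r = film["vote_count"], film["vote_average"]
        if v + m == 0:  # no votes at all: fall back to the global mean
            return c
        return v / (v + m) * r + m / (v + m) * c

    qualified = [f for f in rated if f["vote_count"] >= m]
    return sorted(((score(f), f) for f in qualified),
                  key=lambda pair: pair[0], reverse=True)
```

Lowering `percentile` admits more films into the ranking at the cost of trusting scores backed by fewer votes.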
Coverage
The dataset's coverage is global. It includes films released between 17th April 1902 and 7th September 2022. No specific demographic scope is noted; coverage is based on films available through the TMDB API.
License
CC0
Who Can Use It
- Data Analysts: Especially those new to data analysis, to gain experience with data manipulation and missing value imputation.
- Machine Learning Engineers: For building and evaluating film recommendation algorithms.
- Researchers: Studying film industry trends, audience preferences, and cinematic history.
- Developers: Creating applications that require film metadata.
Dataset Name Suggestions
- TMDB Movies Data
- Film Insights Collection
- Global Cinematic Database
- Movie Popularity and Ratings
- Open Film Dataset
Attributes
Original Data Source: TMDB MOVIES DATASET