IMDB Movies Dataset
Entertainment & Media Consumption
Free
About
This dataset provides a curated list of films, including pre-release metadata such as release date, runtime, and textual information like cast, directors, and plot overviews. It has been compiled from the IMDB dataset and enhanced with additional details gathered via the TMDB API. This movie metadata is ideal for applying diverse analytical techniques, from predictive modelling to text mining, to uncover data-driven insights and inform decision-making across the entertainment industry. The dataset has undergone a cleaning process.
Columns
- tconst: The unique IMDB identifier for the title.
- titleType: The category of the show (e.g., movie, TV series).
- primaryTitle: The popular name of the film or show.
- originalTitle: The original name of the film or show.
- startYear: The release year of the title.
- runtimeMinutes: The duration of the movie in minutes.
- genres: The genres associated with the title, typically separated by commas.
- averageRating: The average IMDB rating for the title.
- numVotes: The number of user votes received on IMDB.
- director: The name of the director.
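As a minimal sketch of how these columns might be read into typed values, the snippet below parses a CSV with Python's standard `csv` module. The two sample rows and the helper names `parse_row` and `load_movies` are invented for illustration and are not taken from the dataset; the sketch also assumes every numeric field is populated, which may not hold for the real file.

```python
import csv
import io

# Hypothetical two-row sample that uses the column names listed above.
# The values are made up for illustration and do not come from the dataset.
SAMPLE_CSV = """tconst,titleType,primaryTitle,originalTitle,startYear,runtimeMinutes,genres,averageRating,numVotes,director
tt0000001,movie,Example Film,Example Film,2005,98,"Drama,Comedy",7.1,1500,Jane Doe
tt0000002,movie,Another Film,Un Autre Film,2019,121,Drama,6.4,320,John Roe
"""


def parse_row(row):
    """Convert the raw string fields of one CSV row to typed values."""
    return {
        **row,
        "startYear": int(row["startYear"]),
        "runtimeMinutes": int(row["runtimeMinutes"]),
        # genres is a comma-separated string; split it into a list of names
        "genres": [g.strip() for g in row["genres"].split(",") if g.strip()],
        "averageRating": float(row["averageRating"]),
        "numVotes": int(row["numVotes"]),
    }


def load_movies(file_obj):
    """Read every row from an open CSV file object into a list of dicts."""
    return [parse_row(row) for row in csv.DictReader(file_obj)]


movies = load_movies(io.StringIO(SAMPLE_CSV))
print(movies[0]["genres"])  # ['Drama', 'Comedy']
```

To read a real file, an open file handle (for example `open(path, newline="", encoding="utf-8")`) could be passed to `load_movies` in place of the in-memory sample.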
Distribution
The dataset is typically provided as a CSV file. While a specific total number of rows/records is not available, the data includes unique values such as 29,236 distinct IMDB IDs (tconst) and 20,582 unique director names. The number of user votes (numVotes) can range significantly, reaching up to approximately 1.51 million votes for some titles.
Usage
This dataset is suited for:
- Predictive modelling for audience behaviour or film success.
- Text mining on movie descriptions, cast, and director information.
- Developing Natural Language Processing (NLP) models.
- Feature extraction for analytical purposes.
- Model comparison studies in the entertainment domain.
- Gaining insights to support decision-making within the entertainment industry ecosystem.
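One way the feature-extraction use case could look in practice is sketched below: the comma-separated genres field is turned into a multi-hot vector and numVotes is log-scaled to tame its wide range. The helper names `genre_vocabulary` and `encode_title` and the three sample rows are hypothetical, and the choice of `log1p` scaling is an assumption, not something the dataset prescribes.

```python
import math


def genre_vocabulary(genre_strings):
    """Collect the sorted set of genre names seen across all titles."""
    vocab = set()
    for genres in genre_strings:
        vocab.update(g.strip() for g in genres.split(",") if g.strip())
    return sorted(vocab)


def encode_title(genres, num_votes, vocab):
    """Return a multi-hot genre vector followed by log(1 + numVotes)."""
    present = {g.strip() for g in genres.split(",")}
    multi_hot = [1 if genre in present else 0 for genre in vocab]
    return multi_hot + [math.log1p(num_votes)]


# Hypothetical (genres, numVotes) pairs, invented for illustration only.
rows = [("Drama,Comedy", 1500), ("Drama", 320), ("Action", 101)]

vocab = genre_vocabulary(genres for genres, _ in rows)
features = [encode_title(genres, votes, vocab) for genres, votes in rows]

print(vocab)            # ['Action', 'Comedy', 'Drama']
print(features[0][:3])  # [0, 1, 1]
```

Vectors of this shape could then be fed to whichever model is being compared; the encoding itself is independent of the modelling library.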
Coverage
- Geographic Scope: The dataset's scope is global.
- Time Range: Film release years (startYear) span from 2001 to 2023.
- Runtime: Film durations (runtimeMinutes) range from 46 minutes to 140 minutes.
- Ratings: Average IMDB ratings (averageRating) range from 1 to 10.
- Votes: The number of user votes (numVotes) ranges from 101 to over 1.5 million.
- Genre Distribution: Major genres include Drama (15%) and Comedy (7%), with other genres making up the remaining 79% (the listed shares sum to 101%, presumably because of rounding).
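The stated coverage ranges lend themselves to a quick sanity check after loading. The sketch below flags any field of a row that falls outside the documented bounds; the `COVERAGE` table simply restates the figures above, while the function name `out_of_range_fields` and both sample rows are hypothetical. Rows are assumed to hold already-parsed numeric values.

```python
# Bounds restated from the coverage figures above.
# None means no exact upper bound is documented.
COVERAGE = {
    "startYear": (2001, 2023),
    "runtimeMinutes": (46, 140),
    "averageRating": (1.0, 10.0),
    "numVotes": (101, None),  # upper end is only given as "over 1.5 million"
}


def out_of_range_fields(row):
    """Return the names of fields whose values fall outside COVERAGE."""
    problems = []
    for field, (low, high) in COVERAGE.items():
        value = row[field]
        if value < low or (high is not None and value > high):
            problems.append(field)
    return problems


# Hypothetical rows, invented for illustration only.
ok_row = {"startYear": 2010, "runtimeMinutes": 95,
          "averageRating": 6.8, "numVotes": 2400}
bad_row = {"startYear": 1999, "runtimeMinutes": 180,
           "averageRating": 6.8, "numVotes": 50}

print(out_of_range_fields(ok_row))   # []
print(out_of_range_fields(bad_row))  # ['startYear', 'runtimeMinutes', 'numVotes']
```

A non-empty result does not necessarily mean the row is wrong; it only indicates a mismatch with the ranges quoted in this listing.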
License
CC0
Who Can Use It
This dataset is suitable for:
- Data scientists and analysts interested in film and media analytics.
- Researchers studying entertainment consumption trends.
- Developers building recommendation systems or content analysis tools.
- Anyone needing robust movie metadata for predictive modelling, text mining, or NLP applications.
Dataset Name Suggestions
- IMDB Movies Dataset
- Film Metadata Collection
- Entertainment Industry Films
- Movie Data for Analytics
- Global Movie Database
Attributes
Original Data Source: IMDB Movies