Opendatabay APP

IMDB Movies Dataset

Entertainment & Media Consumption

Tags and Keywords

  • Movies and TV Shows
  • NLP
  • Feature Extraction
  • Model Comparison

IMDB Movies Dataset on the Opendatabay data marketplace


Free

About

This dataset provides a curated list of films with metadata available before release, such as release date and runtime, along with textual information such as cast, directors, and plot overviews. It was compiled from IMDB data and enriched with additional details retrieved through the TMDB API. The metadata lends itself to a range of analytical techniques, from predictive modelling to text mining, for uncovering data-driven insights and informing decisions across the entertainment industry. The data has been cleaned.
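The listing does not say how the TMDB enrichment was performed. One plausible approach, sketched here as an assumption rather than the publisher's method, is to resolve each IMDB ID (tconst) to a TMDB record through TMDB's v3 "Find by ID" endpoint (`/3/find/{external_id}` with `external_source=imdb_id`). The API key is a placeholder you must supply, and which TMDB fields were actually added to this dataset is not stated.

```python
import json
import urllib.parse
import urllib.request

# TMDB v3 "Find by ID" endpoint: maps an external ID (here an IMDB tconst) to TMDB records.
TMDB_FIND_URL = "https://api.themoviedb.org/3/find/{external_id}"


def build_find_url(tconst: str, api_key: str) -> str:
    """Build the lookup URL for one IMDB ID; api_key is a placeholder you supply."""
    query = urllib.parse.urlencode({"external_source": "imdb_id", "api_key": api_key})
    return TMDB_FIND_URL.format(external_id=urllib.parse.quote(tconst)) + "?" + query


def fetch_movie_results(tconst: str, api_key: str) -> list:
    """Call TMDB (network access required) and return matching movie records, possibly empty."""
    with urllib.request.urlopen(build_find_url(tconst, api_key), timeout=10) as resp:
        payload = json.load(resp)
    # The Find response groups matches by kind; only movie matches are kept here.
    return payload.get("movie_results", [])
```

Only `build_find_url` runs offline; `fetch_movie_results` makes a live HTTP request and needs a valid key.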

Columns

  • tconst: The unique IMDB identifier for the title.
  • titleType: The type of title (e.g., movie, TV series).
  • primaryTitle: The popular name of the film or show.
  • originalTitle: The original name of the film or show.
  • startYear: The release year of the title.
  • runtimeMinutes: The runtime of the title in minutes.
  • genres: The genres associated with the title, typically separated by commas.
  • averageRating: The average IMDB rating for the title.
  • numVotes: The number of user votes received on IMDB.
  • director: The name of the director.
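As a minimal sketch of how these columns map to types, the snippet below parses rows with Python's `csv.DictReader` into a typed record. It assumes the CSV header uses exactly the column names listed above and that, because the dataset has been cleaned, no values are missing. The sample row is made up for illustration and is not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Title:
    """One row of the dataset, typed according to the column descriptions."""
    tconst: str
    titleType: str
    primaryTitle: str
    originalTitle: str
    startYear: int
    runtimeMinutes: int
    genres: List[str]
    averageRating: float
    numVotes: int
    director: str


def parse_row(row: dict) -> Title:
    """Convert a csv.DictReader row (all strings) into a typed Title."""
    return Title(
        tconst=row["tconst"],
        titleType=row["titleType"],
        primaryTitle=row["primaryTitle"],
        originalTitle=row["originalTitle"],
        startYear=int(row["startYear"]),
        runtimeMinutes=int(row["runtimeMinutes"]),
        # Genres are one comma-separated string, e.g. "Comedy,Drama".
        genres=[g.strip() for g in row["genres"].split(",") if g.strip()],
        averageRating=float(row["averageRating"]),
        numVotes=int(row["numVotes"]),
        director=row["director"],
    )


# Hypothetical sample row for illustration only (not from the dataset).
SAMPLE_CSV = """tconst,titleType,primaryTitle,originalTitle,startYear,runtimeMinutes,genres,averageRating,numVotes,director
tt0000000,movie,Example Film,Example Film,2010,95,"Comedy,Drama",6.8,1200,Jane Doe
"""

titles = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(titles[0].genres)  # ['Comedy', 'Drama']
```

Note that a genres value containing commas must be quoted in the CSV for the reader to keep it in one field.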

Distribution

The dataset is typically provided as a CSV file. The listing does not state the total number of rows, but the data contains 29,236 distinct IMDB IDs (tconst) and 20,582 distinct director names. The number of user votes (numVotes) varies widely, reaching approximately 1.51 million for the most-voted titles.
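Since the total row count is not given, it can be measured directly from the file. The sketch below counts rows, distinct tconst values, distinct director strings, and the largest numVotes value. The file name you pass to `summarize_file` (for example `imdb_movies.csv`) is a hypothetical placeholder for your local copy, and the code assumes one director string per row; whether co-directed titles list several names in that field is not stated.

```python
import csv
from typing import Iterable, Mapping


def summarize(rows: Iterable[Mapping[str, str]]) -> dict:
    """Count rows, distinct IDs and director strings, and find the largest vote count."""
    n_rows = 0
    ids, directors = set(), set()
    max_votes = 0
    for row in rows:
        n_rows += 1
        ids.add(row["tconst"])
        directors.add(row["director"])
        max_votes = max(max_votes, int(row["numVotes"]))
    return {
        "rows": n_rows,
        "distinct_tconst": len(ids),
        "distinct_directors": len(directors),
        "max_numVotes": max_votes,
    }


def summarize_file(path: str) -> dict:
    """Run summarize() over a local CSV file (path is a placeholder)."""
    with open(path, newline="", encoding="utf-8") as f:
        return summarize(csv.DictReader(f))
```

If `rows` equals `distinct_tconst`, every title appears exactly once; a larger row count would indicate duplicate IDs.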

Usage

This dataset is suited for:
  • Predictive modelling for audience behaviour or film success.
  • Text mining on movie descriptions, cast, and director information.
  • Developing Natural Language Processing (NLP) models.
  • Feature extraction for analytical purposes.
  • Model comparison studies in the entertainment domain.
  • Gaining insights to support decision-making within the entertainment industry ecosystem.
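To make the feature-extraction and model-comparison use cases concrete, here is a small sketch using only the listed columns. It turns each row into a numeric vector (runtime, release year, log-scaled votes, and one 0/1 flag per genre) and compares a constant-mean baseline with a one-feature least-squares regression of averageRating on log10(numVotes). The choice of features and of log votes as the predictor is an illustrative assumption, not a claim about what predicts ratings well.

```python
import math
from typing import Dict, List, Sequence


def genre_vocabulary(rows: Sequence[Dict[str, str]]) -> List[str]:
    """Collect every genre label, sorted so feature columns have a stable order."""
    return sorted({g.strip() for r in rows for g in r["genres"].split(",") if g.strip()})


def extract_features(row: Dict[str, str], vocab: List[str]) -> List[float]:
    """Numeric vector: runtime, release year, log10(votes), then one 0/1 flag per genre."""
    genres = {g.strip() for g in row["genres"].split(",")}
    numeric = [
        float(row["runtimeMinutes"]),
        float(row["startYear"]),
        # Vote counts span several orders of magnitude, so compress them with a log.
        math.log10(float(row["numVotes"])),
    ]
    return numeric + [1.0 if g in genres else 0.0 for g in vocab]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_simple_regression(x: Sequence[float], y: Sequence[float]):
    """One-feature ordinary least squares; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def compare_models(rows: Sequence[Dict[str, str]]) -> Dict[str, float]:
    """MAE of a constant-mean baseline vs. a regression on log10(numVotes)."""
    y = [float(r["averageRating"]) for r in rows]
    x = [math.log10(float(r["numVotes"])) for r in rows]
    baseline = [sum(y) / len(y)] * len(y)
    slope, intercept = fit_simple_regression(x, y)
    regression = [slope * xi + intercept for xi in x]
    return {
        "baseline_mae": mean_absolute_error(y, baseline),
        "regression_mae": mean_absolute_error(y, regression),
    }
```

`compare_models` scores both models on the same rows they were fitted to; a fair comparison would fit on one split of the data and evaluate on a held-out split.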

Coverage

  • Geographic Scope: The dataset's scope is global.
  • Time Range: Film release years (startYear) span from 2001 to 2023.
  • Runtime: Film durations (runtimeMinutes) range from 46 minutes to 140 minutes.
  • Ratings: Average IMDB ratings (averageRating) range from 1 to 10.
  • Votes: The number of user votes (numVotes) ranges from 101 to over 1.5 million.
  • Genre Distribution: Major genres include Drama (about 15%) and Comedy (about 7%), with all other genres making up the remainder (listed as 79%; the rounded figures sum to slightly over 100%).
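The ranges above can double as sanity checks when loading the file. The sketch below flags columns that fall outside the documented bounds (treated as inclusive) and computes genre shares. The listing does not say how its genre percentages were calculated; this sketch assumes each comma-separated label is counted separately, whereas counting whole genre strings such as "Comedy,Drama" as one category would give different figures.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Documented ranges from the Coverage section, treated as inclusive bounds.
EXPECTED_RANGES: Dict[str, Tuple[float, float]] = {
    "startYear": (2001, 2023),
    "runtimeMinutes": (46, 140),
    "averageRating": (1, 10),
    "numVotes": (101, float("inf")),
}


def out_of_range(row: Dict[str, str]) -> List[str]:
    """Return the names of columns whose value falls outside the documented range."""
    bad = []
    for col, (lo, hi) in EXPECTED_RANGES.items():
        value = float(row[col])
        if not lo <= value <= hi:
            bad.append(col)
    return bad


def genre_shares(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Percentage of genre labels per genre, after splitting comma-separated values."""
    counts = Counter(
        g.strip() for r in rows for g in r["genres"].split(",") if g.strip()
    )
    total = sum(counts.values())
    return {g: 100.0 * n / total for g, n in counts.most_common()}
```

Rounding each share to a whole percentage, as the listing does, can make the totals drift slightly away from 100%.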

License

CC0

Who Can Use It

This dataset is suitable for:
  • Data scientists and analysts interested in film and media analytics.
  • Researchers studying entertainment consumption trends.
  • Developers building recommendation systems or content analysis tools.
  • Anyone needing robust movie metadata for predictive modelling, text mining, or NLP applications.

Dataset Name Suggestions

  • IMDB Movies Dataset
  • Film Metadata Collection
  • Entertainment Industry Films
  • Movie Data for Analytics
  • Global Movie Database

Attributes

Original Data Source: IMDB Movies

Listing Stats

  • Views: 1
  • Downloads: 2
  • Listed: 27/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0


Download Dataset in ZIP Format