Filtered IMDb Movies & TV Shows Dataset

Entertainment & Media Consumption

Tags and Keywords

Movies

Exploratory

NLP

Multiclass

Recommender

Free

About

This dataset provides detailed information on IMDb movies and television shows, integrating descriptions sourced from Rotten Tomatoes. It contains data for approximately 7800 titles, primarily from the 1990s onwards, and has been filtered to include English language content with specific criteria for ratings and votes. The purpose of this dataset is to facilitate projects involving cross-content analysis, content-based recommendation systems, and genre prediction tasks. It offers a rich resource for understanding entertainment media consumption and developing machine learning applications.

Columns

  • SNo.: Serial number for each record.
  • index: An internal index for the record.
  • tconst: A unique identifier for the title.
  • titleType: Specifies the type of content, such as 'movie' or 'tvSeries'.
  • primaryTitle: The most commonly known title for the content.
  • originalTitle: The official original title of the content.
  • isAdult?: A boolean indicator for adult content.
  • startYear: The year the title was released or started.
  • endYear: The year the title concluded (for TV series) or was released.
  • runtimeMinutes: The duration of the content in minutes.
  • Genres: Categories or types of content (multiple values may be present).
  • Average Rating: The average rating of the title as found on IMDb.
  • Num. of Votes: The total number of votes received for the rating on IMDb.
  • Region: The geographic region associated with the title's availability or origin.
  • Number of Ratings Types: Details related to how ratings are categorised.
  • Attributes: Additional characteristics or tags associated with the title.
  • Description: A textual description of the title, likely from Rotten Tomatoes.
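
As a rough illustration of how the columns above might be read, the following sketch parses a few of them with Python's standard `csv` module. The sample rows are invented, and the comma-separated `Genres` format and the `\N` missing-value marker are assumptions carried over from the raw IMDb exports; the actual file may differ.

```python
import csv
import io

# Invented sample rows standing in for the real CSV (not actual dataset records).
SAMPLE_CSV = r'''tconst,titleType,primaryTitle,startYear,runtimeMinutes,Genres,Average Rating,Num. of Votes
tt0000001,movie,Example Film,1999,120,"Drama,Thriller",8.1,150000
tt0000002,tvSeries,Example Show,2010,\N,Comedy,7.2,32000
'''

# Assumption: missing values appear as empty strings or as the IMDb-style \N marker.
MISSING = {"", r"\N"}


def to_number(value, cast):
    """Convert a CSV cell with `cast`, returning None for missing markers."""
    value = value.strip()
    return None if value in MISSING else cast(value)


def parse_rows(handle):
    """Read rows and normalise the numeric and multi-valued columns."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "tconst": raw["tconst"],
            "titleType": raw["titleType"],
            "primaryTitle": raw["primaryTitle"],
            "startYear": to_number(raw["startYear"], int),
            "runtimeMinutes": to_number(raw["runtimeMinutes"], int),
            # Assumption: multiple genres are separated by commas.
            "genres": [g.strip() for g in raw["Genres"].split(",") if g.strip()],
            "rating": to_number(raw["Average Rating"], float),
            "votes": to_number(raw["Num. of Votes"], int),
        })
    return rows


rows = parse_rows(io.StringIO(SAMPLE_CSV))
```

To read the downloaded file instead, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved) to `parse_rows`.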

Distribution

The dataset comprises approximately 7,800 individual movie and TV show records. It is typically provided in a CSV file format. The data has been curated, filtering the original IMDb dataset to focus on content from the 1990s through to 2023. Only titles in English ('en') have been retained, and specific rating and vote thresholds have been applied, such as movies/shows from the 90s-00s with ratings of 7.9 or higher, and those from the 2000s onwards with ratings of 6.5 or higher. Titles from Canada, Great Britain, India, and the USA are represented.
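
The rating thresholds described above can be sketched as a small check, for example to sanity-test rows after loading. The description leaves the boundary between the two eras ambiguous, so the `cutoff_year` of 2000 below is an assumption, as are the function and parameter names; the vote threshold is not stated in the listing and is therefore left out.

```python
def passes_rating_filter(start_year, rating,
                         cutoff_year=2000, early_min=7.9, late_min=6.5):
    """Return True if a title meets the era-specific rating threshold.

    Assumption: titles released before `cutoff_year` need `early_min`,
    and titles from `cutoff_year` onwards need `late_min`.
    """
    if start_year is None or rating is None:
        return False
    # The listing describes coverage from the 1990s through 2023.
    if start_year < 1990 or start_year > 2023:
        return False
    threshold = early_min if start_year < cutoff_year else late_min
    return rating >= threshold


checks = {
    "1995 title rated 8.0": passes_rating_filter(1995, 8.0),
    "1995 title rated 7.0": passes_rating_filter(1995, 7.0),
    "2010 title rated 6.5": passes_rating_filter(2010, 6.5),
    "1985 title rated 9.0": passes_rating_filter(1985, 9.0),
}
```

If the real data contains rows that fail this check, the most likely explanation is that the assumed cutoff year differs from the one used when the dataset was built.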

Usage

This dataset is highly suitable for various analytical and machine learning tasks, including:
  • Developing content-based recommendation systems using genres, descriptions, and ratings.
  • Performing exploratory data analysis on movie and TV show trends.
  • Implementing Natural Language Processing (NLP) techniques on title descriptions for insights or feature extraction.
  • Executing multi-label classification to predict genres from description data.
  • Clustering movies and shows based on their descriptions and genre attributes.
  • Aiding projects that require cross-content analysis across different media types.
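
A minimal sketch of the content-based recommendation idea from the list above, using TF-IDF weights and cosine similarity over description text. It uses only the standard library, the titles and descriptions are invented for illustration, and the smoothed IDF formula is one common choice rather than anything prescribed by the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(titles, descriptions, query_index, top_n=2):
    """Return the titles whose descriptions are most similar to the query title's."""
    vectors = tfidf_vectors(descriptions)
    scored = [
        (cosine(vectors[query_index], vec), titles[i])
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]


# Invented examples; real input would come from primaryTitle and Description.
titles = ["Space Heist", "Moon Robbery", "Garden Romance"]
descriptions = [
    "A crew of thieves plans a daring heist aboard a space station.",
    "Thieves attempt a heist on a lunar space colony.",
    "Two gardeners fall in love over a summer in the countryside.",
]
similar = recommend(titles, descriptions, query_index=0, top_n=1)
```

For the full dataset, a library vectoriser with stop-word removal would likely be a better fit; genres and ratings could then be added as extra features or used to re-rank the results.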

Coverage

The dataset primarily covers movies and TV shows released from 1990 to 2023. Geographically, the data includes titles relevant to Canada, Great Britain, India, and the USA. There is no specific demographic scope mentioned beyond the inclusion of English-language titles. The dataset applies filtering criteria based on rating scores and the number of votes, ensuring a focus on well-received or highly engaged content.

License

CC0

Who Can Use It

This dataset is ideal for:
  • Data Scientists and Analysts: For conducting exploratory data analysis, building predictive models, and deriving insights into media consumption.
  • Machine Learning Engineers: For developing and training recommendation engines, NLP models, and classification algorithms.
  • Researchers: For studying trends in film and television, cross-media analysis, and content categorisation.
  • Developers: For creating applications that require rich movie and TV show data, such as content discovery platforms.
  • Academics and Students: For educational purposes, coursework, and research projects in data science, AI, and media studies.

Dataset Name Suggestions

  • IMDb Films & Shows with Descriptions
  • Nineties and Beyond IMDb Data
  • Rotten Tomatoes-IMDb Integrated Dataset
  • Filtered IMDb Movies & TV Shows
  • Entertainment Content Analytics Dataset

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 08/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
