Opendatabay APP

Film Recommendation Dataset

Entertainment & Media Consumption

Tags and Keywords

  • Movies
  • TV Shows
  • NLP
  • Recommender Systems

Free

About

This dataset provides details for 3772 movies, including their summaries, genres, and release years. It is designed to facilitate the development of movie recommendation systems and Natural Language Processing (NLP) applications by leveraging movie descriptions and categorisations. The data was compiled from IMDb, making it a valuable resource for understanding film content and user preferences.

Columns

  • titles: The title of the movie. There are 3716 unique titles across the 3772 records, so a small number of titles appear on more than one record.
  • summary: A brief summary of the movie as provided on IMDb. There are 3772 unique summaries, suggesting each movie has a distinct summary.
  • genre: The genre(s) assigned to the movie, also sourced from IMDb. Common genres include Comedy, Drama, and Romance, with a significant portion falling into "Other" categories.
  • year: The release year of the movie.
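As a minimal sketch, the four columns above could be read with Python's standard library as follows. The inline sample rows are invented placeholders, not records from the dataset, and the function name parse_movies is hypothetical. The sketch also assumes that multiple genres share one comma-separated field, which is inferred from values such as "Comedy, Drama, Romance".

```python
import csv
import io

EXPECTED_COLUMNS = ["titles", "summary", "genre", "year"]

# Invented placeholder rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """titles,summary,genre,year
Example Film A,A baker opens a shop in a small town.,"Comedy, Drama, Romance",2015
Example Film B,Two rivals race across a desert.,Action,1987
"""

def parse_movies(handle):
    """Read movie rows, split genres into a list, and convert year to int."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    movies = []
    for row in reader:
        movies.append({
            "title": row["titles"].strip(),
            "summary": row["summary"].strip(),
            # Assumption: multiple genres are comma-separated in one field.
            "genres": [g.strip() for g in row["genre"].split(",") if g.strip()],
            "year": int(row["year"]),
        })
    return movies

movies = parse_movies(io.StringIO(SAMPLE_CSV))
```

For the downloaded file, the same function could be given a handle opened with open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved; the encoding is an assumption.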

Distribution

The dataset comprises 3772 records, with each record representing a single movie. The listing offers the data as a CSV download. The movies span release years from 1960 to 2020, and the 2014 to 2020 period contributes the most films (817). The genre breakdown reports 'Other' as the dominant category (90%), followed by 'Comedy, Drama, Romance' (6%) and 'Drama' (5%); these figures appear to be rounded, since they sum to 101%.
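The summary figures above can be recomputed from the raw rows. The following is a rough sketch under two assumptions: the records are invented placeholders rather than dataset rows, and each full genre string (for example 'Comedy, Drama, Romance') is counted as one category, which is how the breakdown above appears to be reported. The helper names are hypothetical.

```python
from collections import Counter

# Invented placeholder records (year, genre string); not real dataset rows.
records = [
    (1975, "Drama"),
    (2016, "Comedy, Drama, Romance"),
    (2018, "Horror, Thriller"),
    (2019, "Drama"),
]

def genre_shares(rows):
    """Return each distinct genre string's share of all rows, as a percentage."""
    counts = Counter(genre for _, genre in rows)
    total = sum(counts.values())
    return {genre: 100 * n / total for genre, n in counts.items()}

def count_in_range(rows, start, end):
    """Count rows whose release year falls in [start, end], inclusive."""
    return sum(1 for year, _ in rows if start <= year <= end)

shares = genre_shares(records)
recent = count_in_range(records, 2014, 2020)
```

Run against the full file, count_in_range(rows, 2014, 2020) would be expected to match the 817 films quoted above if the listing uses the same inclusive year boundaries.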

Usage

This dataset is ideal for:
  • Developing and training movie recommendation engines based on content (summaries and genres).
  • Conducting Natural Language Processing (NLP) tasks, such as text classification, sentiment analysis, or topic modelling on movie summaries.
  • Analysing film trends and historical movie data across different genres and release periods.
  • Building applications that allow users to discover new movies based on their preferences.
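To illustrate the first use case, here is a minimal content-based recommender sketch that ranks movies by the cosine similarity of TF-IDF vectors built from their summaries. It is one possible approach, not a method prescribed by the dataset, and it uses only the standard library. The summaries are invented placeholders, and the function names (tokenize, tfidf_vectors, cosine, recommend) are hypothetical.

```python
import math
import re
from collections import Counter

# Invented placeholder summaries; not taken from the dataset.
SUMMARIES = {
    "Example Film A": "A young wizard attends a school of magic.",
    "Example Film B": "A wizard and a dragon defend a school of magic.",
    "Example Film C": "Two detectives chase a thief through the city.",
}

def tokenize(text):
    """Lowercase the text and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Map each title to a dict of term -> TF-IDF weight."""
    tokens = {title: tokenize(text) for title, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: number of summaries containing each term.
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for title, words in tokens.items():
        tf = Counter(words)
        vectors[title] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(title, docs, top_n=2):
    """Return the top_n other titles most similar to the given title."""
    vectors = tfidf_vectors(docs)
    query = vectors[title]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

recommendations = recommend("Example Film A", SUMMARIES)
```

With the placeholder data, "Example Film B" ranks first because it shares informative terms with the query, while terms that occur in every summary receive zero weight. Genre overlap or release year could be added as further signals.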

Coverage

The dataset covers movies released globally between 1960 and 2020. Entries become more numerous in recent periods, with the 2014 to 2020 range alone accounting for 817 films. No demographic information is included beyond this global scope.

License

CC0

Who Can Use It

  • Data Scientists and Machine Learning Engineers: For building and testing movie recommendation algorithms and NLP models.
  • Researchers: Studying trends in film, natural language understanding, or information retrieval in the entertainment domain.
  • Film Enthusiasts and Analysts: For personal projects or academic studies on movie characteristics and evolution over time.

Dataset Name Suggestions

  • IMDb Movie Summaries & Genres
  • Film Recommendation Dataset
  • Global Movie Archive (1960-2020)
  • Cinema Data for NLP & AI

Attributes

Original Data Source: imdb_movies_data

Listing Stats

VIEWS

21

DOWNLOADS

1

LISTED

27/06/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0

Download Dataset in CSV Format