Film Recommendation Dataset
Entertainment & Media Consumption
Free
About
This dataset describes 3772 movies, giving each one's title, summary, genre(s), and release year. It is intended to support the development of movie recommendation systems and Natural Language Processing (NLP) applications that work from movie descriptions and categorisations. The data was compiled from IMDb. The listed columns contain no user ratings or viewing histories, so the dataset suits content-based analysis of films rather than direct modelling of user preferences.
Columns
- titles: The title of the movie. There are 3716 unique titles across the 3772 records, so a title is not a unique key: some titles are shared by more than one record.
- summary: A brief summary of the movie as provided on IMDb. There are 3772 unique summaries, one per record, so every movie has a distinct summary.
- genre: The genre(s) assigned to the movie, also sourced from IMDb. A single value can combine several genres, for example 'Comedy, Drama, Romance'. The most frequent individual values are 'Comedy, Drama, Romance' and 'Drama'; most records fall into the "Other" category.
- year: The release year of the movie.
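As a rough illustration of how these columns might be read and checked, the sketch below parses CSV text with the four column names listed above and counts unique titles and summaries. It assumes the data is delivered as CSV with a header row, which the listing does not confirm. The three sample rows are invented placeholders, not records from the dataset, and `load_movies` is a hypothetical helper name.

```python
import csv
import io

# Column names as given in the Columns list above.
EXPECTED_COLUMNS = ["titles", "summary", "genre", "year"]

# Placeholder rows invented for illustration; they are NOT taken from the dataset.
SAMPLE_CSV = """titles,summary,genre,year
Example Film A,A detective hunts a thief.,"Crime, Drama",1975
Example Film B,Two friends open a bakery.,Comedy,2016
Example Film A,A remake of the detective story.,"Crime, Drama",2018
"""


def load_movies(handle):
    """Read movie rows from an open CSV file object and convert year to int."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["year"] = int(row["year"])
        rows.append(row)
    return rows


movies = load_movies(io.StringIO(SAMPLE_CSV))
unique_titles = {m["titles"] for m in movies}
unique_summaries = {m["summary"] for m in movies}

# Fewer unique titles than rows means titles cannot serve as a unique key.
print(len(movies), len(unique_titles), len(unique_summaries))
```

With the real file, replacing `io.StringIO(SAMPLE_CSV)` with an opened file handle should be enough, provided the header names match those listed above.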
Distribution
The dataset comprises 3772 records, with each record representing a single movie. The file format is not stated, although data files of this kind are typically provided as CSV. The movies span release years from 1960 to 2020, with the highest count of films released between 2014 and 2020 (817 movies). In the genre distribution, 'Other' is the dominant category (about 90%), followed by 'Comedy, Drama, Romance' (about 6%) and 'Drama' (about 5%). These percentages are rounded as reported, which is why they sum to slightly more than 100%.
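Figures like these can be recomputed from the genre and year columns. The sketch below is one possible way to do it, assuming that 'Other' means every genre value outside the most frequent few, which the listing does not state explicitly. The rows are invented placeholders, and `count_in_range` and `genre_shares` are hypothetical helper names.

```python
from collections import Counter

# Placeholder records invented for illustration; NOT taken from the dataset.
rows = [
    {"genre": "Drama", "year": 1962},
    {"genre": "Comedy, Drama, Romance", "year": 2015},
    {"genre": "Drama", "year": 2016},
    {"genre": "Horror", "year": 2019},
    {"genre": "Action, Sci-Fi", "year": 1999},
]


def count_in_range(rows, first_year, last_year):
    """Count records whose release year lies in [first_year, last_year]."""
    return sum(1 for r in rows if first_year <= r["year"] <= last_year)


def genre_shares(rows, top_n):
    """Share of records per genre value, with all but the top_n values pooled as 'Other'."""
    counts = Counter(r["genre"] for r in rows)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = {genre: n / total for genre, n in top}
    # Assumption: 'Other' pools every genre value outside the top_n.
    shares["Other"] = (total - sum(n for _, n in top)) / total
    return shares


recent = count_in_range(rows, 2014, 2020)
shares = genre_shares(rows, top_n=1)
print(recent, shares)
```

Note that the whole genre string is treated as one label here, so 'Comedy, Drama, Romance' and 'Drama' count as different values, matching how the distribution above is reported.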
Usage
This dataset is ideal for:
- Developing and training movie recommendation engines based on content (summaries and genres).
- Conducting Natural Language Processing (NLP) tasks, such as text classification, sentiment analysis, or topic modelling on movie summaries.
- Analysing film trends and historical movie data across different genres and release periods.
- Building applications that allow users to discover new movies based on their preferences.
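To make the content-based use case more concrete, here is a minimal sketch of recommending movies by summary similarity. It uses TF-IDF weighting with a smoothed inverse document frequency and cosine similarity, implemented with the standard library only. This is one common approach, not a method prescribed by the dataset. The movie records are invented placeholders, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Placeholder records invented for illustration; NOT taken from the dataset.
movies = [
    {"titles": "Example Film A",
     "summary": "A detective chases a jewel thief across the city."},
    {"titles": "Example Film B",
     "summary": "Two friends open a bakery in a small town."},
    {"titles": "Example Film C",
     "summary": "A rookie detective tracks a thief through the city at night."},
]


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(index, movies, vectors, k=2):
    """Return the k movies most similar to movies[index] as (title, score) pairs."""
    scored = [(movies[j]["titles"], cosine(vectors[index], vectors[j]))
              for j in range(len(movies)) if j != index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = tfidf_vectors([m["summary"] for m in movies])
for title, score in recommend(0, movies, vectors):
    print(title, round(score, 3))
```

For the full 3772 records, a library implementation with stop-word removal would probably be preferable, and genre overlap could be added as a second similarity signal.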
Coverage
The dataset covers movies released worldwide between 1960 and 2020. Entries become more numerous in recent periods, with 817 movies falling in the 2014 to 2020 range alone. No demographic information is included beyond this general scope of global movie releases.
License
CC0
Who Can Use It
- Data Scientists and Machine Learning Engineers: For building and testing movie recommendation algorithms and NLP models.
- Researchers: For studying trends in film, natural language understanding, or information retrieval in the entertainment domain.
- Film Enthusiasts and Analysts: For personal projects or academic studies on movie characteristics and evolution over time.
Dataset Name Suggestions
- IMDb Movie Summaries & Genres
- Film Recommendation Dataset
- Global Movie Archive (1960-2020)
- Cinema Data for NLP & AI
Attributes
Original Data Source: imdb_movies_data