Film Industry Metadata Catalogue
Free
About
A collection of Bollywood films spanning more than a century, from 1920 to 2024. Each record carries a film title, a release year, and one or more genres. The dataset suits film analysis and recommendation-system work, and it is expected to be updated annually.
Columns
- Movie ID: A randomly generated unique identifier for each film record.
- Title: The name of the movie. There are 8,117 unique titles, and 12 records have no title. The most common title is "toofan".
- Year: The year the movie was released, with a time range extending from 1920 to 2024.
- Genre: The genre or genres assigned to the movie. Multiple genres are separated by spaces rather than commas. This column has 1,548 missing values, and "social" is the most frequent primary genre, accounting for 12% of the valid records.
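Because genres are space-delimited and some titles and genres are missing, a loader has to split and null-check those fields. The following is a minimal sketch using only the Python standard library. The header spellings ("Movie ID", "Title", "Year", "Genre") are assumed from the column list above, and the sample rows are invented for illustration; they are not taken from the dataset.

```python
import csv
import io

# Invented sample rows. The header names are an assumption based on the
# column descriptions; check them against the real CSV before use.
SAMPLE_CSV = """Movie ID,Title,Year,Genre
a1,example film one,1954,action drama
b2,,1931,social
c3,example film three,1946,
"""

def parse_genres(raw):
    """Split a space-delimited genre string; return [] when it is missing."""
    return (raw or "").split()

rows = []
for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    rows.append({
        "movie_id": record["Movie ID"],
        "title": record["Title"].strip() or None,  # blank title -> None
        "year": int(record["Year"]),
        "genres": parse_genres(record["Genre"]),
    })

for row in rows:
    # Treat the first listed genre as the "primary" genre (an assumption).
    primary = row["genres"][0] if row["genres"] else None
    print(row["movie_id"], row["title"], row["year"], primary)
```

One caveat with splitting on whitespace: if any genre label itself contains a space, it would be broken into two tokens, so it is worth inspecting the distinct tokens before relying on them.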
Distribution
The dataset is a single CSV file of approximately 318.23 kB with 4 columns. Movie ID and Year each have 9,443 valid records. Data quality notes: Genre has 1,548 missing entries and Title has 12.
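The missing-value counts above can be re-checked after download with a short script. This sketch counts blank cells per column using the standard library; the function name `count_missing`, the header spellings, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

def count_missing(csv_text, columns):
    """Return (total_rows, {column: number_of_blank_cells})."""
    missing = {column: 0 for column in columns}
    total = 0
    for record in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        for column in columns:
            # A cell counts as missing if it is absent, empty, or whitespace.
            if not (record.get(column) or "").strip():
                missing[column] += 1
    return total, missing

# Invented sample; the real file would be read from disk instead.
sample = """Movie ID,Title,Year,Genre
a1,example film one,1954,action drama
b2,,1931,social
c3,example film three,1946,
d4,example film four,2020,
"""

total, missing = count_missing(sample, ["Movie ID", "Title", "Year", "Genre"])
print(total, missing)
```

Run against the full file, the Title and Genre counts would be expected to match the 12 and 1,548 figures quoted above if the file is unchanged.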
Usage
This data product is suited to building movie and TV show applications. Specific use cases include:
- Developing collaborative filtering or content-based recommender systems.
- Conducting historical film studies and trend analysis across different decades.
- Practising data manipulation with Python libraries such as pandas.
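As an illustration of the content-based use case, the sketch below ranks films by Jaccard similarity between their genre sets. This is one simple approach chosen for the example, not a method prescribed by the dataset; the movie IDs, genre lists, and the `recommend` helper are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two genre collections; 0.0 if either is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend(target_id, catalogue, top_n=3):
    """Return up to top_n movie IDs whose genres overlap most with the target."""
    target = catalogue[target_id]
    scored = [
        (jaccard(target, genres), movie_id)
        for movie_id, genres in catalogue.items()
        if movie_id != target_id
    ]
    # Drop films with no genre overlap, then sort by score (descending),
    # breaking ties by movie ID so the result is deterministic.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [movie_id for _, movie_id in scored[:top_n]]

# Hypothetical catalogue: movie ID -> list of genres (already split on spaces).
catalogue = {
    "m1": ["action", "drama"],
    "m2": ["action"],
    "m3": ["social"],
    "m4": ["drama", "action", "family"],
    "m5": [],  # record with a missing genre
}

print(recommend("m1", catalogue))
```

Records with a missing genre score zero against everything here, so with 1,548 such records a real system would need a fallback signal such as release year.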
Coverage
The collection covers Bollywood films released from 1920 to 2024. Record counts vary across periods: for example, 1940 to 1951 accounts for 1,194 records, whereas the most recent years (2013 to 2024) account for only 246.
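To examine how coverage varies over time, release years can be grouped into bins and counted. The sketch below uses calendar decades, which is an assumption; the figures quoted above appear to use 12-year bins, so the bin width would need adjusting to reproduce them. The list of years is invented sample data.

```python
from collections import Counter

def records_per_decade(years):
    """Count records per calendar decade, keyed by the decade's first year."""
    return Counter((year // 10) * 10 for year in years)

# Invented release years standing in for the Year column.
years = [1920, 1945, 1948, 1951, 2013, 2024]

counts = records_per_decade(years)
for decade in sorted(counts):
    print(f"{decade}s: {counts[decade]}")
```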
License
CC0: Public Domain
Who Can Use It
- Data scientists focused on entertainment data and building recommendation models.
- Researchers studying the evolution of Indian cinema genres and production rates.
- Developers needing robust film metadata for mobile or web applications.
- Students practising data cleaning and aggregation techniques.
Dataset Name Suggestions
- Bollywood Film List 1920-2024
- Historical Indian Movie Genres
- Film Industry Metadata Catalogue
- Bollywood Recommender System Data
Attributes
Original Data Source: Film Industry Metadata Catalogue