Film Industry Metadata Catalogue

Tags and Keywords

Bollywood, Movies, Cinema, Genre, Recommender

Film Industry Metadata Catalogue Dataset on Opendatabay data marketplace

Free

About

A detailed collection featuring Bollywood movies spanning over a century, from 1920 up to 2024. The data includes essential metadata such as film titles, release years, and associated genres. This resource is suitable for film analysis and building recommendation systems, and it is expected to receive annual updates.

Columns

  • Movie ID: A randomly generated unique identifier for each film record.
  • Title: The name of the movie. There are 8,117 unique titles, and 12 records lack a title. The most common title is "toofan."
  • Year: The year the movie was released, with a time range extending from 1920 to 2024.
  • Genre: The genre or genres assigned to the movie. Multiple genres are separated by spaces, not commas. This column has 1,548 missing values, and "social" is the most frequent primary genre, accounting for 12% of the valid records.
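
Because genres are space-delimited and some Title and Genre values are missing, a loader needs to split the Genre field and tolerate blanks. The following is a minimal sketch using only the Python standard library. It assumes the CSV header names match the column names listed above (Movie ID, Title, Year, Genre); the sample rows and the helper names parse_genres and load_records are hypothetical and are not taken from the actual file.

```python
import csv
import io

# Hypothetical sample rows; the real file's header names and values may differ.
SAMPLE_CSV = """Movie ID,Title,Year,Genre
a1,toofan,1954,action drama
b2,,1931,social
c3,example film,2019,
"""


def parse_genres(raw):
    """Split a space-delimited genre string; an empty or missing value gives []."""
    return raw.split() if raw else []


def load_records(text):
    """Read CSV text into dicts, adding a parsed 'Genres' list to each row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Genres"] = parse_genres(row.get("Genre"))
        rows.append(row)
    return rows


records = load_records(SAMPLE_CSV)
for r in records:
    # Show a placeholder where the title is missing.
    print(r["Title"] or "<missing title>", r["Year"], r["Genres"])
```

To read the downloaded file instead of the inline sample, the same reader can be pointed at an open file handle.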

Distribution

The dataset is a single CSV file of approximately 318.23 kB with 4 columns. Movie ID and Year each have 9,443 valid records. Data quality notes: the Genre column has 1,548 missing entries and the Title column has 12.
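
The figures above (row count, missing entries per column, unique titles) can be re-checked after download. The sketch below shows one way to compute such a summary with the standard library. The header names are assumed to match the listed column names, and the sample data and the function name quality_summary are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for the downloaded CSV.
SAMPLE_CSV = """Movie ID,Title,Year,Genre
a1,toofan,1954,action
b2,toofan,1989,action drama
c3,,1931,social
d4,example film,2019,
"""


def quality_summary(text):
    """Count rows, missing values per column, and distinct non-empty titles."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # A value counts as missing when it is absent or blank after stripping.
    missing = {
        col: sum(1 for r in rows if not (r.get(col) or "").strip())
        for col in columns
    }
    titles = Counter(
        (r.get("Title") or "").strip()
        for r in rows
        if (r.get("Title") or "").strip()
    )
    most_common = titles.most_common(1)[0][0] if titles else None
    return {
        "rows": len(rows),
        "missing": missing,
        "unique_titles": len(titles),
        "most_common_title": most_common,
    }


summary = quality_summary(SAMPLE_CSV)
print(summary)
```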

Usage

This data product is ideally suited for building Movie and TV Show applications. Specific use cases include:
  • Developing collaborative filtering or content-based recommender systems.
  • Conducting historical film studies and trend analysis across different decades.
  • Practising data manipulation with Python libraries such as pandas.
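
As an illustration of the content-based use case, the sketch below ranks films by Jaccard similarity between their genre sets. This is only one simple technique chosen for illustration, not a method prescribed by the listing. The film names, genre sets, and the functions jaccard and recommend are hypothetical. Note that the columns described above contain no user ratings or interactions, so a collaborative filtering approach would need additional data from elsewhere.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical films with genre sets (as produced by splitting the Genre column).
FILMS = {
    "film a": {"action", "drama"},
    "film b": {"action", "thriller"},
    "film c": {"social"},
    "film d": {"action", "drama", "family"},
}


def recommend(title, films, k=2):
    """Return up to k other films most similar to `title`, skipping zero scores."""
    target = films[title]
    scored = [
        (jaccard(target, genres), other)
        for other, genres in films.items()
        if other != title
    ]
    # Highest similarity first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, round(score, 3)) for score, other in scored[:k] if score > 0]


print(recommend("film a", FILMS))
# [('film d', 0.667), ('film b', 0.333)]
```

Records with a missing Genre value would have an empty genre set and score 0.0 against every other film, so they would need to be filtered out or handled separately.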

Coverage

The collection covers Bollywood films released from 1920 to 2024. Record availability varies across periods: for example, 1940 to 1951 has 1,194 records, while the most recent years (2013 to 2024) account for only 246.
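
Uneven coverage of this kind can be inspected by counting records per period. The sketch below groups years into calendar decades, which is an assumption made for simplicity; the ranges quoted above (1940 to 1951, 2013 to 2024) use different bin edges. The sample years and the function name decade_counts are hypothetical.

```python
from collections import Counter


def decade_counts(years):
    """Count records per calendar decade, keyed by the decade's first year."""
    counts = Counter((int(y) // 10) * 10 for y in years)
    return dict(sorted(counts.items()))


# Hypothetical sample of Year values; int() also accepts strings read from CSV.
sample_years = [1920, "1947", 1949, 1951, 2013, 2024]
print(decade_counts(sample_years))
# {1920: 1, 1940: 2, 1950: 1, 2010: 1, 2020: 1}
```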

License

CC0: Public Domain

Who Can Use It

  • Data scientists focused on entertainment data and building recommendation models.
  • Researchers studying the evolution of Indian cinema genres and production rates.
  • Developers needing robust film metadata for mobile or web applications.
  • Students practising data cleaning and aggregation techniques.

Dataset Name Suggestions

  • Bollywood Film List 1920-2024
  • Historical Indian Movie Genres
  • Film Industry Metadata Catalogue
  • Bollywood Recommender System Data

Attributes

Original Data Source: Film Industry Metadata Catalogue

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 07/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0