Bollywood & Regional Film Dataset
Free
About
This dataset brings together a collection of Indian movies from IMDb, offering a central repository for film analysis. Its primary purpose is to facilitate exploratory data analysis (EDA), allowing users to uncover trends and patterns within the Indian film industry. It is particularly useful for investigating aspects such as the highest-rated movies by year, the correlation between movie duration and ratings, and the most prolific directors or actors. The dataset provides a foundation for various analytical tasks, from identifying popular releases to predicting future trends based on historical performance.
Columns
The dataset contains 10 distinct columns:
- Name: The title of the movie. There are 13,838 unique names, with 15.5k valid entries.
- Year: The year the movie was released. Approximately 3% of entries are missing, with 15.0k valid entries spanning 102 unique years.
- Duration: The running time of the movie in minutes. A substantial 53% of entries are missing, leaving 7,240 valid entries and 182 unique durations.
- Genre: Categorisation of movies by genre. About 12% of entries are missing, with 13.6k valid entries across 485 unique genres, 'Drama' being the most common.
- Rating: The rating given to the movie. Almost half (49%) of the entries are missing, with 7,919 valid ratings ranging from 1.1 to 10, and a mean of 5.84.
- Votes: The number of votes a movie received. Similar to ratings, 49% of entries are missing, leaving 7,919 valid vote counts. The votes range from 5 to 591k, with a mean of 1.94k.
- Director: The director of the movie. Only 3% of entries are missing, with 15.0k valid entries and 5,938 unique directors.
- Actor 1: The main actor in the movie. 10% of entries are missing, with 13.9k valid entries and 4,718 unique actors.
- Actor 2: The second main actor in the movie. 15% of entries are missing, with 13.1k valid entries and 4,891 unique actors.
- Actor 3: The third main actor in the movie. 20% of entries are missing, with 12.4k valid entries and 4,820 unique actors.
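The missing-value percentages above can be recomputed from the file itself. The following is a minimal sketch using only the Python standard library. It assumes that a missing value appears as an empty cell, which the listing does not confirm, and it runs on a tiny invented sample rather than real rows from the dataset.

```python
import csv
import io

# Column names as listed in the Columns section.
COLUMNS = ["Name", "Year", "Duration", "Genre", "Rating", "Votes",
           "Director", "Actor 1", "Actor 2", "Actor 3"]


def missing_share(rows, columns=COLUMNS):
    """Return {column: fraction of rows where the value is absent or blank}."""
    rows = list(rows)
    if not rows:
        return {col: 0.0 for col in columns}
    share = {}
    for col in columns:
        # Assumption: a missing value shows up as an empty (or whitespace) cell.
        missing = sum(1 for row in rows if not (row.get(col) or "").strip())
        share[col] = missing / len(rows)
    return share


# Tiny invented sample for illustration only, NOT real rows from the dataset.
SAMPLE_CSV = """Name,Year,Duration,Genre,Rating,Votes,Director,Actor 1,Actor 2,Actor 3
Film A,2019,120,Drama,7.0,100,Director X,Actor P,Actor Q,Actor R
Film B,2019,,Drama,,,Director X,Actor P,,
Film C,,95,Comedy,5.0,20,Director Y,Actor S,Actor Q,
Film D,2001,,,6.0,50,Director Y,,,
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sample_missing = missing_share(sample_rows)

for column, fraction in sample_missing.items():
    print(f"{column}: {fraction:.0%} missing")
```

To run this against the real file, replace the `io.StringIO(SAMPLE_CSV)` argument with a file opened via `open("IMDb Movies India.csv", newline="")`. The file's text encoding and the exact formatting of its values are not stated in this listing, so an `encoding` argument or extra parsing may be needed.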
Distribution
The dataset is typically provided as a CSV file named IMDb Movies India.csv, with a file size of 1.38 MB. An explicit total row count is not provided, but the number of valid entries across columns (15.5k for Name, which has no reported missing values) suggests approximately 15,500 records.
Usage
This dataset is ideal for a variety of analytical applications. Users can clean the data by handling missing values to prepare it for further manipulation and analysis. It facilitates the exploration of trends such as identifying the year with the best average movie ratings, examining how movie length might influence ratings, and determining the top-performing movies overall or per year. It can also be used to count popular movies released annually, analyse vote distribution, and discover prolific directors or popular actors. Furthermore, it supports general exploratory data analysis and can contribute to predictive modelling related to cinema trends.
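Three of the analyses mentioned above (average rating per year, the relationship between duration and rating, and the most prolific directors) can be sketched as follows. This is an illustrative outline using only the standard library, not a prescribed method. It assumes each row is a dict keyed by the column names and that Duration and Rating hold plain numeric text. The helper names and the demo rows are invented for the example.

```python
from collections import Counter, defaultdict
from math import sqrt


def to_float(value):
    """Parse a number, returning None for blank or non-numeric text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def mean_rating_by_year(rows):
    """Average Rating per Year, skipping rows that lack either field."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum of ratings, count]
    for row in rows:
        year, rating = row.get("Year"), to_float(row.get("Rating"))
        if year and rating is not None:
            totals[year][0] += rating
            totals[year][1] += 1
    return {year: total / count for year, (total, count) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation; None with fewer than 2 points or zero variance."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def duration_rating_correlation(rows):
    """Correlate Duration with Rating over rows where both are present."""
    pairs = [(to_float(r.get("Duration")), to_float(r.get("Rating"))) for r in rows]
    pairs = [(d, r) for d, r in pairs if d is not None and r is not None]
    return pearson([d for d, _ in pairs], [r for _, r in pairs])


def top_directors(rows, n=5):
    """The n directors with the most films, as (name, count) pairs."""
    counts = Counter(r["Director"] for r in rows if r.get("Director"))
    return counts.most_common(n)


# Invented demo rows for illustration only, NOT real data.
demo_rows = [
    {"Name": "Film A", "Year": "2019", "Duration": "100", "Rating": "6.0", "Director": "Director X"},
    {"Name": "Film B", "Year": "2019", "Duration": "120", "Rating": "8.0", "Director": "Director X"},
    {"Name": "Film C", "Year": "2001", "Duration": "140", "Rating": "5.0", "Director": "Director Y"},
    {"Name": "Film D", "Year": "2001", "Duration": "", "Rating": "", "Director": ""},
]

print(mean_rating_by_year(demo_rows))
print(duration_rating_correlation(demo_rows))
print(top_directors(demo_rows, n=1))
```

Rows with a blank Duration or Rating are skipped rather than treated as zero, so the large share of missing values in those columns reduces the sample size without biasing the averages toward zero.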
Coverage
The dataset covers only Indian movies sourced from IMDb. Release years span 102 unique values, with 2019 being a frequently represented year. There is no specific demographic scope beyond the Indian film industry and its audience. Several key columns have substantial missing data, notably Duration (53%), Rating (49%), and Votes (49%), so pre-processing is usually needed before in-depth analysis.
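Two common ways to handle the missing values noted here are to drop incomplete rows or to fill gaps with a typical value. The sketch below shows one possible version of each. The function names are hypothetical, the demo rows are invented, and the code assumes that missing cells are blank and that Duration holds plain numeric text.

```python
from statistics import median


def is_present(row, column):
    """True when the row has a non-blank value in the given column."""
    return bool((row.get(column) or "").strip())


def drop_incomplete(rows, required=("Rating", "Votes")):
    """Keep only rows that have a value in every required column."""
    return [row for row in rows if all(is_present(row, col) for col in required)]


def fill_with_median(rows, column="Duration"):
    """Return copies of rows with blank `column` replaced by the median of known values.

    Known values are converted to float; if no value is known, rows are
    returned unchanged (as copies).
    """
    known = [float(row[column]) for row in rows if is_present(row, column)]
    if not known:
        return [dict(row) for row in rows]
    fill = median(known)
    return [
        {**row, column: float(row[column]) if is_present(row, column) else fill}
        for row in rows
    ]


# Invented demo rows for illustration only, NOT real data.
cleaning_demo = [
    {"Name": "Film A", "Duration": "100", "Rating": "6.0", "Votes": "10"},
    {"Name": "Film B", "Duration": "", "Rating": "", "Votes": ""},
    {"Name": "Film C", "Duration": "140", "Rating": "7.0", "Votes": "25"},
    {"Name": "Film D", "Duration": "", "Rating": "5.5", "Votes": "8"},
]

rated_only = drop_incomplete(cleaning_demo)
filled = fill_with_median(cleaning_demo)

print([row["Name"] for row in rated_only])
print([row["Duration"] for row in filled])
```

Which approach fits depends on the analysis. With more than half of the Duration values missing, filling them with a single median can flatten any relationship between duration and rating, so dropping incomplete rows may be the safer choice for correlation work.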
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for a wide range of users, including:
- Beginner data enthusiasts looking to gain practical experience with data cleaning and basic analytical tasks.
- Data analysts and data scientists who want to conduct exploratory data analysis on real-world movie data.
- Researchers interested in the trends and dynamics of the Indian film industry.
- Anyone developing applications related to movie recommendations, cinematic history, or entertainment analytics.
Dataset Name Suggestions
- IMDb Indian Movies Dataset
- Indian Cinema Data on IMDb
- Bollywood & Regional Film Dataset
- IMDb India Film Collection
Attributes
Original Data Source: Bollywood & Regional Film Dataset