IMDb Netflix Movie & TV Show Data
Data Science and Analytics
Free
About
This dataset is a detailed collection of IMDb's top Netflix movies and TV shows, intended for data analysis and predictive modelling. It is particularly suited to feature extraction, exploratory data analysis (EDA), and building recommendation systems. One stated purpose is to support models that predict the genre of a movie or TV show from its other attributes. The data was scraped from the IMDb website with Python as part of a beginner's web scraping project.
Columns
- MOVIES: Contains the names of the movies or TV shows, with 6817 unique values and 9999 valid entries.
- YEAR: Indicates the year of telecast for the movie or TV show. It has 438 unique values, with common entries like (2020– ) and (2021– ). 9355 entries are valid.
- GENRE: Lists various genres, highly valuable for recommendation systems. It has 510 unique genres, with Comedy being the most common. 9919 entries are valid.
- RATING: Represents the audience's rating for the movie or TV show, ranging from 1.1 to 9.9. The mean rating is 6.92, with 8179 valid entries.
- ONE-LINE: Provides a short description or first impression summary of the movie or TV show. All 9999 entries are valid, with 8688 unique descriptions.
- STARS: Lists the cast, indicating the main actors. All 9999 entries are valid, with 7877 unique star listings.
- VOTES: The number of votes cast by the audience, useful for gauging the reach of the content. Valid for 8179 entries, with a mean of 15.1k votes.
- RunTime: Specifies the duration or running time of the content. Valid for 7041 entries, with a mean runtime of 68.7 units.
- Gross: Represents the total amount earned worldwide. Only 460 entries are valid, with a significant majority being missing.
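The valid and unique counts quoted above can be recomputed with a few lines of standard-library Python. The following is a minimal sketch, not the method used to produce the figures in this listing: it assumes that a missing value appears as an empty cell, and the `SAMPLE` rows are invented purely for illustration (they are not taken from movies.csv).

```python
import csv
import io

# Invented rows that mimic the listed column names; NOT real dataset content.
SAMPLE = """MOVIES,YEAR,GENRE,RATING,Gross
Show A,(2020– ),"Comedy, Drama",7.5,
Show B,(2021– ),Comedy,,
Show A,,Comedy,6.0,$1.2M
"""

def column_summary(stream):
    """Return {column: (valid_count, unique_count)} for a CSV text stream.

    Assumption: a cell counts as valid when it is non-empty after stripping.
    """
    reader = csv.DictReader(stream)
    seen = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in seen:
            cell = (row.get(name) or "").strip()
            if cell:
                seen[name].append(cell)
    return {name: (len(cells), len(set(cells))) for name, cells in seen.items()}

summary = column_summary(io.StringIO(SAMPLE))
for name, (valid, unique) in summary.items():
    print(f"{name}: {valid} valid, {unique} unique")
```

To run the same summary on the real file, pass an opened file instead of the in-memory sample, for example `open("movies.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.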
Distribution
The dataset is provided as a single CSV file, movies.csv, with a size of 3.11 MB. It contains more than 9 columns, with a total of 9999 rows/records based on the most populated columns.
Usage
This dataset is ideal for several applications, including:
- Feature Extraction: Deriving meaningful features from raw data for machine learning models.
- Exploratory Data Analysis (EDA): Gaining insights into data patterns and distributions.
- Recommendation Systems: Building models to suggest movies or TV shows to users.
- Genre Prediction Models: Developing predictive models to determine the genre of unclassified content.
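As an illustration of the recommendation use case, the sketch below ranks titles by the overlap of their genre labels (Jaccard similarity). It is one simple content-based approach among many, not a method prescribed by the dataset. It assumes that each GENRE cell holds a comma-separated list of genres; the function names and the tiny catalog are hypothetical.

```python
def genre_set(cell):
    """Split a GENRE cell such as 'Comedy, Drama' into a set of lowercase labels."""
    return {g.strip().lower() for g in cell.split(",") if g.strip()}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(title, catalog, k=3):
    """Return up to k other titles whose genres overlap most with `title`.

    `catalog` maps a title to its raw GENRE string. Ties are broken
    alphabetically, and titles with no genre overlap are dropped.
    """
    target = genre_set(catalog[title])
    scored = [
        (jaccard(target, genre_set(genres)), other)
        for other, genres in catalog.items()
        if other != title
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]

# Hypothetical catalog for illustration only.
catalog = {
    "A": "Comedy, Drama",
    "B": "Comedy",
    "C": "Horror",
    "D": "Drama, Comedy",
}
print(recommend("A", catalog, k=2))
```

Genre overlap alone is a coarse signal; in practice RATING, VOTES, or text features from ONE-LINE could be combined with it.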
Coverage
The dataset covers top movies and TV shows available on Netflix, with data originally scraped from IMDb. The YEAR column has 438 unique entries; the most common are ongoing ranges such as (2020– ) and (2021– ), alongside a wide variety of other years. The scope is global, reflecting Netflix's and IMDb's international reach.
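Because YEAR values are stored as strings such as (2020– ), they usually need parsing before any time-based analysis. The sketch below is one possible parser. Only the open-ended form is confirmed by this listing; the closed-range form (2015–2019) and the single-year form (2018) are assumptions about what else the column may contain, and `parse_year` is a hypothetical helper name.

```python
import re

# A four-digit start year, an optional en dash or hyphen, and an optional end year.
_YEAR = re.compile(r"(?P<start>\d{4})(?P<dash>\s*[–-]\s*)?(?P<end>\d{4})?")

def parse_year(raw):
    """Parse a YEAR cell into (start, end).

    Returns None when no four-digit year is found.
    `end` is None for an open-ended range such as '(2020– )'
    and equals `start` when the cell holds a single year.
    """
    match = _YEAR.search(raw or "")
    if match is None:
        return None
    start = int(match.group("start"))
    if match.group("end"):
        return start, int(match.group("end"))
    if match.group("dash"):
        return start, None  # still running, no end year given
    return start, start

for cell in ["(2020– )", "(2015–2019)", "(2018)", ""]:
    print(repr(cell), "->", parse_year(cell))
```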
License
CC0 Public Domain
Who Can Use It
This dataset is suitable for:
- Data Scientists and Analysts: For conducting deep dives into movie and TV show data.
- Machine Learning Engineers: For developing and training recommendation and genre prediction models.
- Beginners in Data Science: To practice web scraping, feature engineering, and basic predictive modelling.
- Researchers: Studying trends in entertainment content and audience reception.
Dataset Name Suggestions
- IMDb Netflix Movie & TV Show Data
- Netflix IMDb Entertainment Dataset
- Movie & TV Show Analysis Data
- IMDb Netflix Content Insights
- Global Movie TV Dataset
Attributes
Original Data Source: IMDb Netflix Movie & TV Show Data