Global Netflix Titles Database
Entertainment & Media Consumption
Free
About
This dataset provides detailed information about Netflix shows and movies, collected through web scraping. It contains raw, unlabelled text data for 8,807 unique titles, including details such as cast, release year, rating, and description. Its primary purpose is to offer a structured overview of the Netflix content library for analysis and research into media consumption and content trends.
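The 8,807 figure can be checked directly against the raw CSV. Below is a minimal sketch using only the Python standard library. It assumes the file has a header row containing columns named show_id, type and title; the file name netflix_titles.csv in the comment is hypothetical, and the two sample rows are placeholders rather than real records.

```python
import csv
import io
from typing import Dict, TextIO


def summarise_records(handle: TextIO) -> Dict[str, int]:
    """Count rows and distinct show_id, type and title values in a CSV stream.

    Assumes a header row containing the column names show_id, type and title.
    """
    rows = 0
    show_ids, types, titles = set(), set(), set()
    for row in csv.DictReader(handle):
        rows += 1
        show_ids.add(row["show_id"])
        types.add(row["type"])
        titles.add(row["title"])
    return {
        "rows": rows,
        "unique_show_id": len(show_ids),
        "unique_type": len(types),
        "unique_title": len(titles),
    }


if __name__ == "__main__":
    # Placeholder rows for illustration only; they are not real records.
    sample = io.StringIO(
        "show_id,type,title\n"
        "s1,Movie,Example Film\n"
        "s2,TV Show,Example Series\n"
    )
    print(summarise_records(sample))
    # With the real file (hypothetical file name):
    # with open("netflix_titles.csv", newline="", encoding="utf-8") as f:
    #     print(summarise_records(f))
```

If rows equals unique_show_id, every record has its own identifier; unique_type should be 2 (Movie and TV Show).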
Columns
- show_id: A unique identifier for each show or movie.
- type: Indicates whether the entry is a 'Movie' (approximately 70% of entries) or a 'TV Show' (approximately 30% of entries).
- title: The name of the show or movie.
- director: The director(s) of the show or movie.
- cast: The overall cast involved in the production.
- country: The country (or countries, separated by commas) where the show or movie was produced, with the United States accounting for about 32% of entries and India for about 11%.
- date_added: The date the content was added to Netflix, with entries ranging from 1st January 2008 to 25th September 2021.
- release_year: The original release year of the content, spanning from 1925 to 2021.
- rating: The Netflix content rating, such as 'TV-MA' (around 36%) and 'TV-14' (around 25%).
- duration: The length of the content. For movies it is given in minutes (e.g., '90 min'); for TV shows it gives the number of seasons (e.g., '1 Season', which accounts for approximately 20% of all entries).
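Because duration mixes two units, it usually needs to be split before analysis. The sketch below assumes values follow the patterns '<N> min' for movies and '<N> Season' or '<N> Seasons' for TV shows; any other value is treated as unparseable and returns None. The function name parse_duration is illustrative, not part of the dataset.

```python
import re
from typing import Optional, Tuple

# Assumed formats: "<N> min" for movies, "<N> Season(s)" for TV shows.
_DURATION_RE = re.compile(r"^\s*(\d+)\s*(min|Seasons?)\s*$")


def parse_duration(value: str) -> Optional[Tuple[int, str]]:
    """Split a raw duration string into (amount, unit).

    "90 min" -> (90, "min"); "2 Seasons" -> (2, "season").
    Empty or unrecognised values return None.
    """
    if not value:
        return None
    match = _DURATION_RE.match(value)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = "min" if match.group(2) == "min" else "season"
    return amount, unit


if __name__ == "__main__":
    for raw in ["90 min", "1 Season", "3 Seasons", ""]:
        print(repr(raw), "->", parse_duration(raw))
```

Keeping the unit alongside the number avoids accidentally averaging minutes and seasons together.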
Distribution
The dataset comprises 8,807 unique entries representing Netflix shows and movies. The raw data is distributed as a CSV file and was obtained by web scraping with Selenium. The number of records can be confirmed by counting the unique values in the show_id column; the type and title columns can be summarised the same way.
Usage
This dataset is well suited to exploratory data analysis (EDA), helping users uncover patterns in Netflix's catalogue. Its text-based descriptions and details also make it valuable for natural language processing (NLP) tasks. Typical applications include building recommendation systems, analysing content trends over time, understanding regional content distribution, and studying rating patterns. Its simple structure makes it suitable for beginners in data analysis.
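Regional distribution and rating patterns both come down to computing value shares for a column. The sketch below is one way to do that under two assumptions: shares are taken relative to non-empty entries, and multi-country entries are comma-separated (so an entry listing two countries counts towards both, and country shares can sum to more than 100%). How the percentages quoted in the Columns section were derived is not stated, so results from this sketch may differ slightly. The function name value_shares is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def value_shares(values: Iterable[str], split_multi: bool = False) -> Dict[str, float]:
    """Return each value's share (in percent) of the non-empty entries.

    With split_multi=True, a field such as "United States, India" counts
    once for each listed value, so shares can sum to more than 100.
    """
    counts: Counter = Counter()
    entries = 0
    for raw in values:
        text = (raw or "").strip()
        if not text:
            continue  # missing values are excluded from the denominator
        entries += 1
        if split_multi:
            # dict.fromkeys removes duplicates while keeping the listed order
            parts = list(dict.fromkeys(p.strip() for p in text.split(",") if p.strip()))
            counts.update(parts)
        else:
            counts[text] += 1
    if entries == 0:
        return {}
    return {value: 100.0 * n / entries for value, n in counts.most_common()}


if __name__ == "__main__":
    ratings = ["TV-MA", "TV-14", "TV-MA", "PG"]
    print(value_shares(ratings))  # {'TV-MA': 50.0, 'TV-14': 25.0, 'PG': 25.0}
    countries = ["United States", "India", "United States, India", ""]
    print(value_shares(countries, split_multi=True))
```

Applied to the real type, rating and country columns, this gives figures comparable to the shares listed under Columns.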
Coverage
- Geographic: The content originates from countries worldwide, with notably large contributions from the United States and India. Overall regional coverage is global.
- Time Range: The date_added column covers content added to Netflix between 1st January 2008 and 25th September 2021. The release_year column indicates content released between 1925 and 2021.
- Demographic Scope: While no explicit demographic data is present, the content ratings (e.g., TV-MA, TV-14) give an implicit indication of each title's target audience.
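To check the date_added range, or to count additions per year for trend analysis, the values first need parsing. This sketch assumes date_added is written like 'September 25, 2021', possibly with stray leading spaces; both the format and the whitespace handling are assumptions about the CSV, and unparseable or empty values are skipped. The helper names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, List, Optional, Tuple


def parse_date_added(raw: str) -> Optional[date]:
    """Parse a value like 'September 25, 2021'; return None if it does not match."""
    text = (raw or "").strip()  # tolerate None and surrounding whitespace
    if not text:
        return None
    try:
        return datetime.strptime(text, "%B %d, %Y").date()
    except ValueError:
        return None


def added_range_and_per_year(
    values: Iterable[str],
) -> Tuple[Optional[Tuple[date, date]], Counter]:
    """Return ((earliest, latest), additions per year), skipping bad values."""
    dates: List[date] = [d for d in map(parse_date_added, values) if d is not None]
    per_year = Counter(d.year for d in dates)
    if not dates:
        return None, per_year
    return (min(dates), max(dates)), per_year


if __name__ == "__main__":
    sample = ["January 1, 2008", " September 25, 2021", "", "not a date"]
    date_range, per_year = added_range_and_per_year(sample)
    print(date_range)
    print(per_year)
```

Run over the full column, the first result should match the 2008 to 2021 range stated above, and the per-year counter gives a simple view of how the catalogue grew.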
License
CC0
Who Can Use It
This dataset is perfect for data analysts, researchers, students, and developers interested in media consumption, streaming services, and content analysis. It is particularly useful for those looking to perform exploratory data analysis or engage in NLP projects related to movie and TV show descriptions. Its beginner-friendly nature makes it accessible for those new to data science.
Dataset Name Suggestions
- Netflix Content Metadata
- Global Netflix Titles Database
- Netflix Streaming Catalog
- Netflix Shows and Films Data
- Netflix Content Analysis Dataset
Attributes
Original Data Source: Dataset: NetFlix Shows