Opendatabay APP

Global Netflix Titles Database

Entertainment & Media Consumption

Tags and Keywords

Movies

Beginner

Text

Exploratory

NLP

Free

About

This dataset provides detailed information about Netflix shows and movies, collected through web scraping. It contains raw, unlabelled text data for 8,807 unique titles, including cast, release year, rating, and description. Its primary purpose is to offer a structured overview of the Netflix content library for analysis and research into media consumption and content trends.

Columns

  • show_id: A unique identifier for each show or movie.
  • type: Indicates whether the entry is a 'Movie' (approximately 70% of entries) or a 'TV Show' (approximately 30% of entries).
  • title: The name of the show or movie.
  • director: The director(s) of the show or movie.
  • cast: The overall cast involved in the production.
  • country: The country where the show or movie was released, with the United States accounting for about 32% of entries and India for about 11%.
  • date_added: The date the content was added to Netflix, with entries ranging from 1st January 2008 to 25th September 2021.
  • release_year: The original release year of the content, spanning from 1925 to 2021.
  • rating: The Netflix content rating, such as 'TV-MA' (around 36%) and 'TV-14' (around 25%).
  • duration: The length of the content. For TV shows this is the number of seasons (e.g., '1 Season', which accounts for approximately 20% of entries).
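Because duration packs a number and a unit into one string, it usually has to be split before any numeric analysis. The following is a minimal sketch, assuming TV show values look like '1 Season' or '3 Seasons' and movie values look like '90 min'. The minute format is an assumption, not something this listing states, and `parse_duration` is a hypothetical helper name.

```python
import re

# Hypothetical helper: split a duration string into (amount, unit).
# Assumed formats: "1 Season", "3 Seasons", or "90 min".
_DURATION_RE = re.compile(r"^\s*(\d+)\s+(\w+)\s*$")

def parse_duration(value):
    """Return (amount, unit), or None when the value is missing or malformed."""
    if not value:
        return None
    match = _DURATION_RE.match(value)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower()
    # Normalise the plural so "Season" and "Seasons" compare equal.
    if unit == "seasons":
        unit = "season"
    return amount, unit

print(parse_duration("1 Season"))
print(parse_duration("90 min"))
```

Returning None for unparseable values keeps missing or unexpected entries visible instead of silently turning them into zeros.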

Distribution

The dataset comprises 8,807 unique entries, each representing a Netflix show or movie. The raw data is available as a CSV file and was obtained via web scraping using Selenium. The row count matches the number of unique show_id values, since show_id holds one identifier per entry.
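The row count and the movie/TV show split can be checked directly from the CSV. This is a rough sketch using only the Python standard library. The `summarise` function name and the in-memory sample rows are made up for illustration, and only the show_id, type, and title column names come from this listing.

```python
import csv
import io
from collections import Counter

def summarise(csv_file):
    """Count rows, unique show_id values, and the share of each type."""
    rows = list(csv.DictReader(csv_file))
    unique_ids = {row["show_id"] for row in rows}
    type_counts = Counter(row["type"] for row in rows)
    total = len(rows)
    type_share = {t: n / total for t, n in type_counts.items()} if total else {}
    return {"rows": total, "unique_ids": len(unique_ids), "type_share": type_share}

# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "show_id,type,title\n"
    "s1,Movie,Example Film A\n"
    "s2,TV Show,Example Series B\n"
    "s3,Movie,Example Film C\n"
    "s4,Movie,Example Film D\n"
)
summary = summarise(sample)
print(summary)
```

For the downloaded file, the same function can be passed a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. If `rows` and `unique_ids` differ, the file contains duplicate identifiers.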

Usage

This dataset is well suited to exploratory data analysis (EDA), allowing users to uncover patterns in Netflix's content. Its text-based descriptions and details also make it useful for natural language processing (NLP) tasks. Typical applications include building recommendation systems, analysing content trends over time, understanding regional content distribution, and studying rating patterns. Its simple tabular structure makes it suitable for beginners in data analysis.
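As one illustration of the recommendation use case, titles can be compared by the word overlap of their descriptions. The sketch below uses Jaccard similarity on token sets, a deliberately simple stand-in for a real recommender. The function names, titles, and descriptions are all made up for the example.

```python
import re

def tokenize(text):
    """Lowercase a description and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(title, descriptions):
    """Return the other title whose description overlaps most with `title`."""
    target = tokenize(descriptions[title])
    scores = {
        other: jaccard(target, tokenize(text))
        for other, text in descriptions.items()
        if other != title
    }
    return max(scores, key=scores.get)

# Made-up titles and descriptions, for illustration only.
descriptions = {
    "Example A": "A detective hunts a killer in a rainy city",
    "Example B": "A rookie detective chases a killer across the city",
    "Example C": "Two friends open a bakery in a small town",
}
print(most_similar("Example A", descriptions))
```

Token overlap ignores word order and treats common words like "a" and "the" as signal, so a practical version would likely remove stop words or use weighted vectors instead.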

Coverage

  • Geographic: Global. Content originates from many countries, with the United States (about 32% of entries) and India (about 11%) as notable contributors.
  • Time Range: The date_added column covers content added to Netflix between 1st January 2008 and 25th September 2021. The release_year column indicates content released between 1925 and 2021.
  • Demographic Scope: While no explicit demographic data is present, the inclusion of content ratings (e.g., TV-MA, TV-14) provides an implicit indication of target audiences for specific titles.

License

CC0

Who Can Use It

This dataset is aimed at data analysts, researchers, students, and developers interested in media consumption, streaming services, and content analysis. It is particularly useful for exploratory data analysis and for NLP projects based on movie and TV show descriptions. Its simple structure also makes it accessible to those new to data science.

Dataset Name Suggestions

  • Netflix Content Metadata
  • Global Netflix Titles Database
  • Netflix Streaming Catalog
  • Netflix Shows and Films Data
  • Netflix Content Analysis Dataset

Attributes

Original Data Source: Dataset: NetFlix Shows

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

08/06/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
