
Netflix Data Analysis Project

Tags and Keywords

Netflix

Movies

TV

Streaming

Data

Netflix Data Analysis Project Dataset on Opendatabay data marketplace

Free

About

This dataset contains cleaned information about content available on Netflix, including movies, TV shows, and original productions. It covers titles added to the service between 2008 and 2021, with release years going back as far as 1925. The data was cleaned in PostgreSQL and is intended for analysis and visualisation in tools such as Tableau. It supports questions about content types, geographical distribution, addition trends over time, top directors and genres, and content ratings.

Columns

  • show_id: A unique identifier for each movie or TV show entry.
  • type: Specifies whether the content is a "Movie" or a "TV Show". Movies make up approximately 70% of the records and TV shows about 30%.
  • title: The name or title of the content.
  • director: The director(s) associated with the content. Missing values were filled using director-cast relationships found in other records, or set to 'Not Given' where no such relationship could be established.
  • country: The primary country of production for the content. This column was reduced to a single country per record; where several countries were listed, the first was typically kept. Missing values were handled in the same way as in the director column.
  • date_added: The specific date on which the content was added to the Netflix platform. A small number of records with missing dates were removed.
  • release_year: The original year the content was released.
  • rating: The content rating, indicating its suitability for different audiences (e.g., TV-MA, which signifies content for mature audiences). Records with missing ratings were removed.
  • duration: The running time of movies or the number of seasons for TV shows. Records with missing durations were removed.
  • listed_in: The genre or categories under which the content is listed.
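The column notes describe three cleaning rules: keep only the first country, remove records missing date_added, rating, or duration, and store duration as either minutes or seasons. The helpers below are a minimal Python sketch of those rules, not the original PostgreSQL code. The function names are hypothetical, the duration formats ("90 min", "1 Season", "3 Seasons") are an assumption, and the 'Not Given' fallback for an empty country borrows the placeholder described for the director column.

```python
import re
from typing import Dict, List, Optional, Tuple

def first_country(raw: Optional[str]) -> str:
    """Keep only the first country from a comma-separated list.

    Empty or missing values fall back to 'Not Given' (assumed placeholder,
    borrowed from the director column's description).
    """
    if raw is None or not raw.strip():
        return "Not Given"
    return raw.split(",")[0].strip()

# Assumed duration formats: "90 min", "1 Season", "3 Seasons".
_DURATION_RE = re.compile(r"^\s*(\d+)\s*(min|Seasons?)\s*$", re.IGNORECASE)

def parse_duration(raw: str) -> Tuple[int, str]:
    """Split a duration string into (amount, unit), where unit is 'min' or 'season'."""
    match = _DURATION_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognised duration: {raw!r}")
    amount = int(match.group(1))
    unit = "min" if match.group(2).lower() == "min" else "season"
    return amount, unit

# Columns whose missing values caused a record to be removed.
REQUIRED_COLUMNS = ("date_added", "rating", "duration")

def drop_incomplete(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Return only the records where every required column is non-empty."""
    return [
        row for row in rows
        if all((row.get(col) or "").strip() for col in REQUIRED_COLUMNS)
    ]
```

Splitting duration into a number and a unit keeps movies and TV shows comparable only within their own type, which is usually what an analysis of running times needs.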

Distribution

The dataset is provided as a single CSV (Comma-Separated Values) file named netflix1.csv, with a file size of 1.07 MB. It contains 8790 records (rows) and 10 columns.
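Before any analysis it is worth confirming that the file matches this description. The sketch below, using only Python's standard csv module, reads the file into one dictionary per record and checks that the header contains exactly the ten documented columns. The function name is hypothetical, and the header is compared as a set because the column order in the file is not stated here.

```python
import csv
from typing import Dict, Iterable, List

# The ten columns described in the Columns section, compared as a set
# so that the order in which they appear in the file does not matter.
EXPECTED_COLUMNS = {
    "show_id", "type", "title", "director", "country",
    "date_added", "release_year", "rating", "duration", "listed_in",
}

def load_catalogue(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per record, checking the header first."""
    reader = csv.DictReader(lines)
    header = set(reader.fieldnames or [])
    if header != EXPECTED_COLUMNS:
        missing = sorted(EXPECTED_COLUMNS - header)
        extra = sorted(header - EXPECTED_COLUMNS)
        raise ValueError(f"header mismatch: missing={missing}, extra={extra}")
    return list(reader)
```

With the downloaded file this could be called as `load_catalogue(open("netflix1.csv", newline="", encoding="utf-8"))` (the UTF-8 encoding is an assumption). If the file matches the listing, the length of the result would be 8790.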

Usage

This dataset is ideal for:
  • Data Cleaning Practice: Utilising SQL (specifically PostgreSQL) to practise handling null values, identifying and removing duplicates, populating missing data, and splitting columns for refined analysis.
  • Data Visualisation: Creating compelling visualisations with tools like Tableau to explore content trends, such as the proportion of movies versus TV shows, geographical distribution of content, and how content additions have evolved over the years.
  • Content Analysis: Identifying top directors, most frequent genres (e.g., Drama & International Movies, Documentary), and prevalent content ratings on the platform.
  • Time Series Analysis: Examining the number of titles added each year, identifying peak addition years, and comparing movie and TV show addition rates over time.
  • Educational Purposes: Serving as an exercise to enhance data cleaning, analysis, and visualisation skills.
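Two of the analyses listed here, finding the most frequent genres and counting additions per year, come down to simple counting. The sketch below expresses both in plain Python. Several points are assumptions: the date_added formats it accepts ("9/25/2021" and "September 25, 2021"), the function names, and the choice to split listed_in on commas and count each genre on its own. Counting the whole listed_in string instead would produce combined categories such as the "Drama & International Movies" example.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Mapping, Tuple

# Assumed date_added formats; the listing does not state which one the file uses.
_DATE_FORMATS = ("%m/%d/%Y", "%B %d, %Y")

def parse_date_added(raw: str) -> datetime:
    """Parse a date_added value using the first format that matches."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date_added: {raw!r}")

def top_genres(rows: Iterable[Mapping[str, str]], n: int = 5) -> List[Tuple[str, int]]:
    """Count each comma-separated genre in listed_in and return the n most common."""
    counts: Counter = Counter()
    for row in rows:
        for genre in row["listed_in"].split(","):
            genre = genre.strip()
            if genre:
                counts[genre] += 1
    return counts.most_common(n)

def additions_per_year(rows: Iterable[Mapping[str, str]]) -> Dict[int, Counter]:
    """Count titles added each year, broken down by type (Movie / TV Show)."""
    per_year: Dict[int, Counter] = {}
    for row in rows:
        year = parse_date_added(row["date_added"]).year
        per_year.setdefault(year, Counter())[row["type"]] += 1
    # Sort by year so the result reads as a time series.
    return dict(sorted(per_year.items()))
```

Keeping the per-year counts split by type makes the movie versus TV show comparison a direct lookup, and the result can be exported for charting in Tableau.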

Coverage

  • Geographic Scope: The dataset includes content from many countries. The countries with the most titles include the United States of America, India, and the United Kingdom.
  • Time Range: The dataset covers content added to Netflix from 2008 to 2021. However, the release years of the content itself range from as early as 1925 up to 2021, demonstrating a wide historical span of content available.
  • Demographic Scope: While not directly demographic, content ratings provide an indication of the intended audience for various content pieces, with TV-MA being the most common rating.
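The coverage figures above (the share of movies versus TV shows, the leading countries, and the most common rating) are frequency counts over single columns. The sketch below shows one way to compute them in Python. The function names are hypothetical, and skipping 'Not Given' assumes that placeholder is what marks unknown countries in the file. Run over the full file, these counts should reproduce the listing's figures, although this sketch has not been checked against the actual data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple

def share_by_type(rows: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return each content type's fraction of all records (e.g. Movie vs TV Show)."""
    counts = Counter(row["type"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: count / total for kind, count in counts.items()}

def top_values(
    rows: Iterable[Mapping[str, str]],
    column: str,
    n: int = 3,
    placeholder: str = "Not Given",
) -> List[Tuple[str, int]]:
    """Most frequent values in a column, skipping the (assumed) unknown placeholder."""
    counts = Counter(row[column] for row in rows if row[column] != placeholder)
    return counts.most_common(n)
```

For example, `top_values(rows, "country")` would rank countries and `top_values(rows, "rating", 1)` would return the single most common rating.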

License

CC0: Public Domain

Who Can Use It

This dataset is particularly suitable for:
  • Data Analysts and Scientists: Those seeking to refine their data manipulation and cleaning skills using SQL, and to perform in-depth analysis of streaming service content.
  • Business Intelligence Professionals: Individuals interested in understanding content trends, market distribution, and historical data patterns within the streaming industry.
  • Students and Educators: A practical resource for learning and teaching data cleaning, database management (PostgreSQL), and data visualisation (Tableau) techniques.
  • Researchers: Anyone studying media trends, content production, and audience consumption patterns on major streaming platforms.

Dataset Name Suggestions

  • Netflix Content Insights
  • Global Streaming Catalogue
  • Movies and TV Shows on Netflix
  • Netflix Data Analysis Project
  • StreamFlow: Netflix Content

Attributes

Original Data Source: Netflix Data Analysis Project

Listing Stats

VIEWS

1

DOWNLOADS

1

LISTED

08/07/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
