Opendatabay APP

Customer Retention Simulation Dataset

Data Science and Analytics

Tags and Keywords

Churn, Netflix, Streaming, Customer, Prediction

Free

About

The data simulates customer behaviour for a hypothetical video streaming service modelled on platforms like Netflix. It consists of 5,000 synthetic records across 14 columns: a customer identifier, the churn target, and 12 descriptive features. It is intended for building churn prediction models, deriving business insights, and segmenting customers of Over-The-Top (OTT) platforms.

Columns

  • customer_id: A unique identifier assigned to each simulated customer.
  • age: The user's age, ranging from 18 to 70.
  • gender: A categorical field detailing the user's gender, with categories including Female, Male, and Other.
  • subscription_type: Indicates the type of subscription held (e.g., Premium, Basic).
  • watch_hours: The total number of viewing hours logged by the customer over a defined period.
  • last_login_days: The number of days since the customer's most recent login, up to 60.
  • region: The customer's geographic area, one of 6 distinct regions such as South America and Europe.
  • device: Identifies the primary device used for streaming, including options like Tablet and Laptop.
  • monthly_fee: The recurring cost charged to the customer, ranging from 8.99 to 18.
  • churned: The binary target variable (0 or 1), indicating whether the customer decided to leave the service.
  • payment_method: The type of method used for billing (e.g., Debit Card, PayPal).
  • number_of_profiles: The count of user profiles associated with the main account, ranging from 1 to 5.
  • avg_watch_time_per_day: The calculated average time spent watching content daily.
  • favorite_genre: The user’s preferred content category, such as Drama or Documentary (7 unique types).
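The documented column names and value ranges can be turned into a simple row check. The following is a minimal sketch using only the Python standard library, not part of the dataset itself. The sample rows are made up for illustration, the `check_row` helper is hypothetical, and the lower bound of 0 for last_login_days is an assumption, since the listing only states an upper limit.

```python
import csv
import io

# The 14 column names from the list above (file column order is not assumed).
EXPECTED_COLUMNS = {
    "customer_id", "age", "gender", "subscription_type", "watch_hours",
    "last_login_days", "region", "device", "monthly_fee", "churned",
    "payment_method", "number_of_profiles", "avg_watch_time_per_day",
    "favorite_genre",
}

# Inclusive numeric bounds taken from the column descriptions.
# Assumption: last_login_days starts at 0; the listing only says "up to 60".
RANGES = {
    "age": (18, 70),
    "last_login_days": (0, 60),
    "monthly_fee": (8.99, 18),
    "number_of_profiles": (1, 5),
}


def check_row(row):
    """Return a list of problems found in one row (a dict of column -> string)."""
    problems = []
    missing = sorted(EXPECTED_COLUMNS - set(row))
    if missing:
        problems.append(f"missing columns: {missing}")
    for column, (low, high) in RANGES.items():
        if column in row and not low <= float(row[column]) <= high:
            problems.append(f"{column}={row[column]} is outside [{low}, {high}]")
    if row.get("churned") not in ("0", "1"):
        problems.append(f"churned={row.get('churned')} is not 0 or 1")
    return problems


# Made-up sample rows (not taken from the real file): one valid, one invalid.
SAMPLE = """customer_id,age,gender,subscription_type,watch_hours,last_login_days,region,device,monthly_fee,churned,payment_method,number_of_profiles,avg_watch_time_per_day,favorite_genre
c-001,34,Female,Premium,12.5,3,Europe,Tablet,8.99,0,PayPal,2,0.4,Drama
c-002,17,Male,Basic,4.0,10,South America,Laptop,8.99,2,Debit Card,1,0.1,Documentary
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["customer_id"], check_row(row))
```

The second sample row is deliberately invalid (age below 18 and a churned value other than 0 or 1), so the check reports two problems for it and none for the first row.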

Distribution

The data is provided as a single CSV file, netflix_customer_churn.csv (545.93 kB), with 14 columns and 5,000 distinct records. The listing reports 100% validity, with no missing or mismatched entries in any column.
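The stated shape and completeness can be checked after download. This is a rough sketch with the standard csv module; the `summarize` helper is hypothetical, and the local path assumes the file was saved under its listed name in the working directory. The small in-memory table is made up so the sketch runs without the file.

```python
import csv
import io
import os

# File name from the listing; its location on disk is an assumption.
CSV_PATH = "netflix_customer_churn.csv"


def summarize(lines):
    """Count rows, columns, and empty cells in CSV text given as an iterable of lines."""
    reader = csv.DictReader(lines)
    rows = 0
    empty_cells = 0
    for record in reader:
        rows += 1
        for value in record.values():
            # DictReader fills short rows with None; blank strings also count as empty.
            if value is None or (isinstance(value, str) and not value.strip()):
                empty_cells += 1
    return {"rows": rows, "columns": len(reader.fieldnames or []), "empty_cells": empty_cells}


# Made-up three-column table with one empty cell, used only to exercise the function.
demo = summarize(io.StringIO("customer_id,age,churned\na,30,0\nb,,1\n"))
print(demo)

# Runs only if the real file is present in the working directory.
if os.path.exists(CSV_PATH):
    with open(CSV_PATH, newline="") as handle:
        print(summarize(handle))
```

If the listing's figures hold, the real file should report 5,000 rows, 14 columns, and no empty cells.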

Usage

This data is ideally suited for:
  • Training machine learning classifiers to distinguish churning from non-churning users.
  • Performing exploratory data analysis (EDA) to understand underlying patterns in usage and demographics.
  • Developing detailed customer behaviour models specifically tailored for OTT streaming environments.
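As a starting point for the uses listed above, the sketch below computes the churn rate per segment and a majority-class baseline that any classifier would need to beat. It is an illustration under assumptions, not a recommended workflow: both helper functions are hypothetical, the records are made up, and rows are assumed to be dicts of strings as produced by `csv.DictReader`.

```python
from collections import Counter, defaultdict


def churn_rate_by(records, column):
    """Return the fraction of churned customers for each distinct value of `column`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        key = record[column]
        totals[key] += 1
        churned[key] += int(record["churned"])  # churned is documented as 0 or 1
    return {key: churned[key] / totals[key] for key in totals}


def majority_baseline_accuracy(records):
    """Accuracy of always predicting the most common churned label."""
    labels = Counter(int(record["churned"]) for record in records)
    return max(labels.values()) / sum(labels.values())


# Made-up records for illustration; real rows would come from the CSV file.
demo = [
    {"subscription_type": "Premium", "churned": "0"},
    {"subscription_type": "Premium", "churned": "1"},
    {"subscription_type": "Basic", "churned": "1"},
    {"subscription_type": "Basic", "churned": "1"},
]

rates = churn_rate_by(demo, "subscription_type")
baseline = majority_baseline_accuracy(demo)
print(rates, baseline)
```

The same `churn_rate_by` call works for any categorical column, such as region, device, or favorite_genre.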

Coverage

The dataset is synthetic, so it is not tied to a specific real-world geography or time period. It simulates a user population aged 18 to 70, distributed across six regions. Key behavioural metrics, such as viewing hours and subscription fees, are engineered to reflect realistic streaming service dynamics.

License

CC BY-SA 4.0

Who Can Use It

  • Data Scientists: For training predictive models focused on customer retention and attrition.
  • Market Researchers: To segment users based on subscription type, device usage, and viewing habits.
  • Students and Educators: For hands-on practice with data analysis and predictive classification in a business setting.

Dataset Name Suggestions

  • Streaming Platform Churn Predictor
  • Customer Retention Simulation Dataset
  • OTT Service User Analytics

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 28/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
