Customer Retention Simulation Dataset
Data Science and Analytics
Free
About
The data simulates customer behaviour for a hypothetical video streaming service, mirroring platforms like Netflix. It consists of 5,000 synthetic records, each described by 14 engineered features. It is intended to support the development of churn prediction models, the extraction of business insights, and detailed customer segmentation analysis for Over-The-Top (OTT) platforms.
Columns
- customer_id: A unique identifier assigned to each simulated customer instance.
- age: The age of the user, ranging from 18 to 70.
- gender: A categorical field detailing the user's gender, with categories including Female, Male, and Other.
- subscription_type: Indicates the type of subscription held (e.g., Premium, Basic).
- watch_hours: The total number of viewing hours logged by the customer over a defined period.
- last_login_days: The number of days elapsed since the customer's most recent login, up to 60 days.
- region: Specifies the customer's geographic area, with 6 distinct regions such as South America and Europe.
- device: Identifies the primary device used for streaming, including options like Tablet and Laptop.
- monthly_fee: The recurring cost charged to the customer, ranging from 8.99 to 18.
- churned: The binary target variable (0 or 1), indicating whether the customer decided to leave the service.
- payment_method: The type of method used for billing (e.g., Debit Card, PayPal).
- number_of_profiles: The count of user profiles associated with the main account, ranging from 1 to 5.
- avg_watch_time_per_day: The calculated average time spent watching content daily.
- favorite_genre: The user's preferred content category, such as Drama or Documentary (7 unique types).
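The numeric ranges listed above can be turned into a quick per-record check. The following is a minimal sketch, not part of the dataset itself: the `RANGES` table and the `check_row` helper are hypothetical names, the lower bound of 0 for last_login_days is an assumption (the listing only states the upper bound), and values are assumed to arrive as strings, as they would from a CSV reader.

```python
# Documented ranges, taken from the column descriptions.
# ASSUMPTION: last_login_days starts at 0; only the upper bound is documented.
RANGES = {
    "age": (18, 70),
    "last_login_days": (0, 60),
    "monthly_fee": (8.99, 18.0),
    "number_of_profiles": (1, 5),
}


def check_row(row):
    """Return a list of problems found in one record (a dict of strings)."""
    problems = []
    for column, (low, high) in RANGES.items():
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{column}: missing or not numeric")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    # The target variable is documented as binary.
    if row.get("churned") not in ("0", "1"):
        problems.append("churned: expected 0 or 1")
    return problems
```

A record that respects every documented range yields an empty list; any other result names the offending columns.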
Distribution
The information is available as a CSV file, specifically netflix_customer_churn.csv, which has a size of 545.93 kB. It is structured with 14 columns and contains 5,000 distinct records. All records are validated, showing 100% validity with zero missing or mismatched entries across all features.
Usage
This data is ideally suited for:
- Executing machine learning classification tasks aimed at distinguishing between churning and non-churning users.
- Performing exploratory data analysis (EDA) to understand underlying patterns in usage and demographics.
- Developing detailed customer behaviour models specifically tailored for OTT streaming environments.
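As a starting point for the EDA and segmentation uses listed above, the churn rate can be compared across the values of any categorical column. This is a rough sketch using only the Python standard library; `load_rows` and `churn_rate_by` are hypothetical helper names, and the file path assumes the CSV sits in the working directory.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def churn_rate_by(rows, column):
    """Return {segment: share of churned customers} for one categorical column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        segment = row[column]
        totals[segment] += 1
        churned[segment] += int(row["churned"])  # churned is 0 or 1
    return {segment: churned[segment] / totals[segment] for segment in totals}


# Example usage (path is an assumption):
# rows = load_rows("netflix_customer_churn.csv")
# print(churn_rate_by(rows, "subscription_type"))
```

The same function works for region, device, payment_method, or favorite_genre, which makes it a cheap first pass before training a classifier.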
Coverage
The dataset is synthetic and therefore has no specific real-world geographic or time constraints. It simulates a user population spanning ages 18 to 70, distributed across six regions. Key behavioural metrics, such as viewing hours and subscription fees, are engineered to reflect realistic streaming service dynamics.
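The stated coverage (ages 18 to 70, six regions) and the structural figures (14 columns, no missing entries) can be confirmed with a short summary pass. The sketch below is illustrative only: `summarize` is a hypothetical helper that expects an iterable of dicts, such as the rows produced by `csv.DictReader`.

```python
def summarize(rows, expected_columns=14):
    """Collect basic coverage facts from an iterable of record dicts."""
    rows = list(rows)
    # Skip empty age values so one blank cell does not abort the summary.
    ages = [float(r["age"]) for r in rows if r.get("age")]
    return {
        "records": len(rows),
        "columns_ok": all(len(r) == expected_columns for r in rows),
        "missing": sum(1 for r in rows for v in r.values() if v in ("", None)),
        "regions": sorted({r["region"] for r in rows}),
        "age_span": (min(ages), max(ages)) if ages else None,
    }


# Example usage (path is an assumption):
# import csv
# with open("netflix_customer_churn.csv", newline="") as f:
#     print(summarize(csv.DictReader(f)))
```

If the listing is accurate, the summary should report 5,000 records, zero missing values, six region names, and an age span within 18 to 70.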
License
CC BY-SA 4.0
Who Can Use It
- Data Scientists: For training predictive models focused on customer retention and attrition.
- Market Researchers: To segment users based on subscription type, device usage, and viewing habits.
- Students and Educators: For practical application and study of data analysis and predictive classification in the business domain.
Dataset Name Suggestions
- Streaming Platform Churn Predictor
- Customer Retention Simulation Dataset
- OTT Service User Analytics
Attributes
Original Data Source: Customer Retention Simulation Dataset
