Opendatabay APP

Synthetic Bank Customer Churn Data

Synthetic Data Generation

Tags and Keywords

Churn, Banking, Classification, Synthetic, Tabular



Free

About

This dataset supports binary classification tasks on bank customer attrition. The synthetic records were created for the Playground Series S4 E1 competition and are well suited to developing and testing machine learning models that predict whether a customer will exit the bank. The data includes both raw attributes and several engineered features intended to aid model performance.

Columns

  • Surname: Label Encoded Surnames.
  • Surname_tfidf_0 through Surname_tfidf_4: Five features derived by applying a TF-IDF vectorizer to surnames.
  • Credit Score: A numerical value indicating the customer's credit score, ranging from 350 to 850.
  • Geography: The customer's country of residence (France, Spain, or Germany).
  • Gender: The customer's gender (Male or Female).
  • Age: The customer's age, spanning 18 to 92 years.
  • Tenure: The number of years the customer has maintained an account with the bank, from 0 to 10 years.
  • Balance: The customer's account balance, with a maximum value around 251k.
  • NumOfProducts: The quantity of bank products utilized (e.g., savings account, credit card), ranging from 1 to 4.
  • HasCrCard: Binary indicator (1 = yes, 0 = no) showing credit card ownership.
  • IsActiveMember: Binary indicator (1 = yes, 0 = no) showing active membership status.
  • EstimatedSalary: The estimated salary of the customer, up to 200k.
  • Exited: The target variable, indicating customer churn (1 = yes, 0 = no).
  • Germany, France, Spain: One-Hot Encoded geography features.
  • Male, Female: One-Hot Encoded gender features.
  • Mem__no__Products: Engineered feature calculated as NumOfProducts multiplied by IsActiveMember.
  • Cred_Bal_Sal: Engineered feature calculated as (Credit Score * Balance) / EstimatedSalary.
  • Bal_sal: Engineered feature calculated as Balance / EstimatedSalary.
  • Tenure_Age: Engineered feature calculated as Tenure / Age.
  • Age_Tenure_product: Engineered feature calculated as Age * Tenure.
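
The five engineered features above follow directly from their stated formulas. The sketch below recomputes them for a single record held as a plain dictionary. The function name `add_engineered_features` and the sample values are illustrative, and the key "Credit Score" is copied from the column list above, so the header in the actual CSV may be spelled differently.

```python
def add_engineered_features(row):
    """Return a copy of `row` with the five engineered columns added.

    Assumes the numeric inputs are already parsed and that
    EstimatedSalary and Age are non-zero (otherwise division fails).
    """
    out = dict(row)
    # NumOfProducts multiplied by IsActiveMember
    out["Mem__no__Products"] = row["NumOfProducts"] * row["IsActiveMember"]
    # (Credit Score * Balance) / EstimatedSalary
    out["Cred_Bal_Sal"] = (row["Credit Score"] * row["Balance"]) / row["EstimatedSalary"]
    # Balance / EstimatedSalary
    out["Bal_sal"] = row["Balance"] / row["EstimatedSalary"]
    # Tenure / Age
    out["Tenure_Age"] = row["Tenure"] / row["Age"]
    # Age * Tenure
    out["Age_Tenure_product"] = row["Age"] * row["Tenure"]
    return out


# Made-up record for illustration only; not taken from the dataset.
example = {
    "Credit Score": 600,
    "Balance": 100000.0,
    "EstimatedSalary": 50000.0,
    "Tenure": 5,
    "Age": 40,
    "NumOfProducts": 2,
    "IsActiveMember": 1,
}
enriched = add_engineered_features(example)
```

Recomputing these columns and comparing them with the supplied ones is a quick way to confirm that a downloaded copy matches the definitions given here.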

Distribution

The file is provided in CSV format and is approximately 36.27 MB in size. It contains 25 columns and 175,000 records, with no missing values. The underlying data is entirely synthetic.
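
The stated shape and completeness can be checked after download. The following is a minimal sketch using only the Python standard library; the helper name `summarize_csv` is hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io


def summarize_csv(handle):
    """Return (row_count, column_count, empty_cell_count) for a CSV stream."""
    reader = csv.reader(handle)
    header = next(reader)  # first line holds the column names
    n_rows = 0
    n_missing = 0
    for record in reader:
        n_rows += 1
        # Count cells that are empty or whitespace-only as missing.
        n_missing += sum(1 for cell in record if cell.strip() == "")
    return n_rows, len(header), n_missing


# Tiny made-up sample with one missing Tenure value.
sample = io.StringIO("Age,Tenure,Exited\n42,3,0\n35,,1\n")
summary = summarize_csv(sample)
```

For the real file, the same function can be called on a handle opened with `open(path, newline="")`, where `path` is wherever the CSV was saved. If the figures in this listing hold, the result should report 175,000 rows, 25 columns, and no empty cells.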

Usage

This collection is intended for developing binary classification models, specifically predictive analytics for customer churn risk in the banking sector. It is well suited to educational purposes, machine learning competitions, and initial exploration of classification techniques on tabular data. It can also support investment-related analyses regarding banking stability.

Coverage

The geographical scope covers customer records from France, Spain, and Germany. Demographic details include customer ages ranging from 18 to 92 years and bank tenure spanning 0 to 10 years. The data includes both Male and Female customer genders. This is a static collection with no expected future updates.
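
The coverage limits described here can be turned into a simple per-record check. The sketch below assumes the raw Geography and Gender columns hold the country and gender names as strings and that Age and Tenure are already numeric; the helper name `coverage_violations` and both sample rows are illustrative.

```python
# Ranges and categories as stated in the Coverage section.
EXPECTED_RANGES = {"Age": (18, 92), "Tenure": (0, 10)}
EXPECTED_GEOGRAPHY = {"France", "Spain", "Germany"}
EXPECTED_GENDER = {"Male", "Female"}


def coverage_violations(row):
    """Return the names of columns whose values fall outside the stated coverage."""
    problems = []
    for column, (low, high) in EXPECTED_RANGES.items():
        if not low <= row[column] <= high:
            problems.append(column)
    if row["Geography"] not in EXPECTED_GEOGRAPHY:
        problems.append("Geography")
    if row["Gender"] not in EXPECTED_GENDER:
        problems.append("Gender")
    return problems


# Made-up rows for illustration only.
ok_row = {"Age": 40, "Tenure": 5, "Geography": "France", "Gender": "Female"}
bad_row = {"Age": 17, "Tenure": 11, "Geography": "Italy", "Gender": "Male"}

ok_result = coverage_violations(ok_row)
bad_result = coverage_violations(bad_row)
```

A record that passes returns an empty list, so any non-empty result flags a row worth inspecting before training.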

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Data Scientists and ML Engineers: For training, testing, and benchmarking binary classification algorithms focused on retention strategies.
  • Students and Beginners: For learning core machine learning concepts, helped by the dataset's clear structure and synthetic nature.
  • Banking Analysts: For simulating and understanding the key drivers behind customer attrition risk within financial institutions.

Dataset Name Suggestions

  • Synthetic Bank Customer Churn Data
  • Financial Attrition Binary Classification
  • Bank Customer Exit Prediction Dataset
  • S4 E1 Churn Analysis Data

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 07/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0