Opendatabay APP

Imbalanced Client Risk Prediction Data

Data Science and Analytics

Tags and Keywords

Imbalance

Credit

Risk

Client

Classification

Free

About

This starter dataset supports client risk classification under class imbalance, a condition in which observations are distributed disproportionately across categories. Imbalance is a common and difficult problem in machine learning classification: a model can reach high accuracy simply by always predicting the majority class, so accuracy alone does not measure performance reliably, and training needs extra care. The dataset is particularly useful for developing predictive models in critical domains such as anti-fraud and anti-spam systems.

Columns

  • month: The month of the associated purchase.
  • credit_amount: The amount requested for the loan.
  • credit_term: The duration or terms of the loan.
  • age: The customer's age (ranging from 18 to 90).
  • sex: The customer's gender (Male or Female).
  • education: The level of education attained by the customer (e.g., Secondary special education, Higher education).
  • product_type: The category of the purchased product (e.g., Cell phones, Household appliances).
  • having_children_flg: A binary flag indicating whether the client has children.
  • region: The customer location category.
  • income: The customer's total income (ranging up to 401k).
  • family_status: The client's familial status (e.g., Married).
  • phone_operator: The mobile operator category used by the client.
  • is_client: A flag indicating if the individual is an existing client of the institution.
  • bad_client_target: The classification target variable, indicating whether the client is high-risk.
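
As a quick check of the column list above, the following minimal sketch parses the file with Python's standard csv module and verifies the header. It assumes a comma-delimited file with a header row and a target encoded as 0/1; the function name load_clients and the constant EXPECTED_COLUMNS are illustrative, not part of the dataset.

```python
import csv

# Column names as listed above; their order in the actual file is not assumed.
EXPECTED_COLUMNS = {
    "month", "credit_amount", "credit_term", "age", "sex", "education",
    "product_type", "having_children_flg", "region", "income",
    "family_status", "phone_operator", "is_client", "bad_client_target",
}


def load_clients(lines):
    """Parse CSV text (an open file or any iterable of lines) into dicts.

    Raises ValueError if the header does not match the documented columns.
    Only the target is converted to int (assumed to be 0/1); other fields
    stay as strings because their exact encodings are not documented here.
    """
    reader = csv.DictReader(lines)
    header = set(reader.fieldnames or [])
    if header != EXPECTED_COLUMNS:
        missing = sorted(EXPECTED_COLUMNS - header)
        extra = sorted(header - EXPECTED_COLUMNS)
        raise ValueError(f"unexpected header: missing={missing}, extra={extra}")
    rows = []
    for row in reader:
        row["bad_client_target"] = int(row["bad_client_target"])
        rows.append(row)
    return rows


# Hypothetical usage, assuming clients.csv is in the working directory:
# with open("clients.csv", newline="", encoding="utf-8") as f:
#     clients = load_clients(f)
```

If the real file carries an extra index column or a different delimiter, the header check will fail loudly, which is the intended behaviour for a first inspection.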

Distribution

The data is delivered as a CSV file named clients.csv and contains 1,723 valid records across 14 columns. The dataset is entirely clean, showing zero missing or mismatched entries for all attributes. Its defining structural feature is the significant class imbalance within the bad_client_target variable: 1,527 records (about 88.6%) belong to the majority class (0), while only 196 records (about 11.4%) belong to the minority class (1), a ratio of roughly 7.8 to 1.
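
To make the effect of this split concrete, the sketch below scores a trivial baseline that always predicts class 0, using only the class counts stated above. The helper binary_metrics is a hypothetical name written for this illustration; the labels are reconstructed from the counts, not read from the file.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Labels rebuilt from the documented counts: 1,527 zeros and 196 ones.
y_true = [0] * 1527 + [1] * 196
# A "model" that never flags anyone as high-risk.
y_pred = [0] * len(y_true)

report = binary_metrics(y_true, y_pred)
print(report)
```

The baseline reaches an accuracy of about 0.886 while its recall on the high-risk class is zero, which is why recall, precision, or F1 on the minority class are more informative here than accuracy.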

Usage

This dataset suits practitioners who want to practise mitigating class imbalance. Typical applications include benchmarking classification algorithms on skewed data, experimenting with cost-sensitive training methods, applying sampling techniques (such as up-sampling the minority class or down-sampling the majority class), and developing predictive risk assessment models using tree-based algorithms. It is specifically built for identifying rare or high-risk events.
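
The sampling and cost-sensitive ideas mentioned above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a prescribed workflow: the function names are hypothetical, the labels are rebuilt from the documented class counts, and the weight formula is the common "balanced" heuristic n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def _group_by_class(rows, labels):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_upsample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(g) for g in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_downsample(rows, labels, seed=0):
    """Keep a random subset of larger classes down to the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(g) for g in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Stand-in data: row ids only, with labels rebuilt from the documented counts.
labels = [0] * 1527 + [1] * 196
rows = list(range(len(labels)))

weights = balanced_class_weights(labels)
up_rows, up_labels = random_upsample(rows, labels)
down_rows, down_labels = random_downsample(rows, labels)
print(weights, Counter(up_labels), Counter(down_labels))
```

Resampling should be applied to the training split only, so that evaluation still reflects the original class ratio. The weight dictionary has the {label: weight} shape that many libraries accept for cost-sensitive training, for example the class_weight parameter of scikit-learn's tree-based classifiers.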

Coverage

The data covers a range of customer demographics: age (from 18 up to 90), gender (54% male), education (Secondary special education is the most common category at 49%), and family status. Financially, it spans loan requests between 5,000 and 301k and customer incomes up to 401k. The only temporal information is the month of purchase, and geographic coverage is limited to a categorised customer location (region).

License

CC0: Public Domain

Who Can Use It

  • Machine Learning Specialists: Seeking to refine performance metrics and modelling strategies for classification problems where data classes are heavily skewed.
  • Banking and Finance Researchers: Analysing how customer profiles relate to the probability of default or loan risk.
  • Data Scientists: Learning practical methods for dealing with real-world complexities like highly unbalanced class distributions.
  • Risk Management Developers: Building and testing predictive engines to identify high-risk clients or potential fraudulent activities.

Dataset Name Suggestions

  • Imbalanced Client Risk Prediction Data
  • Starter Credit Classification Set
  • Financial Risk Scoring Data with Skewed Classes
  • Imbalanced Customer Profile Data

Attributes

Listing Stats

VIEWS

5

DOWNLOADS

1

LISTED

15/10/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
