Taiwan Credit Default Risk Model Data
E-commerce & Online Transactions
Free
About
Predicting the probability of customer default is essential for risk management in the financial sector. This dataset covers credit card clients in Taiwan and provides a foundation for building and comparing data mining and machine learning models that forecast client solvency. It was originally compiled for research comparing the predictive accuracy of six data mining methods, in which an artificial neural network model was found to accurately estimate the real probability of default. It includes demographic information, credit limit details, and six months of historical payment behaviour.
Columns
The collection contains 25 fields, including 23 explanatory variables, one ID, and the binary response variable. Monetary amounts are specified in NT (New Taiwan) dollars.
- ID: Client identification number.
- LIMIT_BAL (X1): Amount of the credit granted (includes both individual and supplementary family credit).
- SEX (X2): Gender (1 = male; 2 = female).
- EDUCATION (X3): Educational background (1 = graduate school; 2 = university; 3 = high school; 4 = others).
- MARRIAGE (X4): Marital status (1 = married; 2 = single; 3 = others).
- AGE (X5): Age in years.
- PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6 (X6-X11): Repayment status for each month, from September 2005 (PAY_0) back to April 2005 (PAY_6). Note that the column names skip PAY_1. The status scale ranges from -1 (paid duly) to 9 (payment delayed for nine months or more).
- BILL_AMT1 to BILL_AMT6 (X12-X17): Amount of bill statement for six consecutive months, starting with September 2005 (BILL_AMT1) down to April 2005 (BILL_AMT6).
- PAY_AMT1 to PAY_AMT6 (X18-X23): Amount of previous payment made for six consecutive months, starting with September 2005 (PAY_AMT1) down to April 2005 (PAY_AMT6).
- default payment_next_month (Y): The binary response variable indicating default payment in the subsequent month (Yes = 1, No = 0).
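The month-by-month column layout and the category codes listed above can be captured in a small lookup, which helps avoid off-by-one mistakes (PAY_0 is followed by PAY_2, not PAY_1). The following is a minimal sketch using only the Python standard library; the names MONTHS, COLUMN_MONTH, CODES and decode are illustrative and not part of the dataset, and the intermediate months are inferred from the "six consecutive months" description.

```python
# Months covered, most recent first (September 2005 back to April 2005).
MONTHS = ["2005-09", "2005-08", "2005-07", "2005-06", "2005-05", "2005-04"]

# Repayment-status columns skip PAY_1: PAY_0, PAY_2, ..., PAY_6.
PAY_STATUS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
BILL_AMT = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_AMT = [f"PAY_AMT{i}" for i in range(1, 7)]

# Map every monthly column to the calendar month it describes.
COLUMN_MONTH = {}
for group in (PAY_STATUS, BILL_AMT, PAY_AMT):
    COLUMN_MONTH.update(zip(group, MONTHS))

# Category codes exactly as documented in the column list.
CODES = {
    "SEX": {1: "male", 2: "female"},
    "EDUCATION": {1: "graduate school", 2: "university", 3: "high school", 4: "others"},
    "MARRIAGE": {1: "married", 2: "single", 3: "others"},
}


def decode(column, value):
    """Return the documented label, or 'undocumented' if the code is not listed."""
    return CODES.get(column, {}).get(value, "undocumented")


print(COLUMN_MONTH["PAY_2"])    # month described by the second status column
print(decode("EDUCATION", 2))   # label for education code 2
```

Returning "undocumented" rather than raising an error is a deliberate choice in this sketch: it lets any codes outside the documented ranges be counted and inspected instead of stopping the analysis.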
Distribution
The dataset is available as a CSV file, typically named credit_card_default.csv, with a file size of approximately 2.9 MB. It contains 30,000 records or rows, and 25 columns. Based on the data quality report, all variables are fully populated with 100% valid records and zero missing entries.
Usage
This data is an ideal resource for:
- Developing and evaluating machine learning models for binary classification and credit scoring.
- Risk management modelling, particularly focused on estimating the probability of default rather than just a simple credible/not credible classification.
- Academic research comparing the performance of different data mining techniques, such as logistic regression, support vector machines, or neural networks, in a financial forecasting context.
- Analysing trends in credit behaviour across different demographic segments.
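To illustrate the difference between estimating a probability of default and making a simple credible/not credible classification, the sketch below compares two sets of predicted probabilities that classify identically but differ in how well they estimate the probability. It uses the Brier score, which is one common choice made here for illustration and is not claimed to be the measure used in the original research. The labels and probabilities are made-up toy values, not taken from the dataset.

```python
def brier_score(y_true, p_default):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(y_true) != len(p_default):
        raise ValueError("y_true and p_default must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_default)) / len(y_true)


def accuracy(y_true, p_default, threshold=0.5):
    """Share of clients classified correctly when cutting probabilities at threshold."""
    predicted = [1 if p >= threshold else 0 for p in p_default]
    return sum(int(pred == y) for pred, y in zip(predicted, y_true)) / len(y_true)


# Toy example (hypothetical values): 1 = default, 0 = no default.
y = [0, 0, 1, 1]
sharp = [0.10, 0.20, 0.90, 0.80]   # probabilities close to the outcomes
vague = [0.40, 0.45, 0.55, 0.60]   # same classifications, weaker estimates

for name, probs in (("sharp", sharp), ("vague", vague)):
    print(name, accuracy(y, probs), round(brier_score(y, probs), 4))
```

Both sets of predictions reach the same accuracy at a 0.5 threshold, yet the Brier score separates them. This is why a probability-focused evaluation can rank models differently from a pure classification metric.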
Coverage
The data covers credit card clients geographically located in Taiwan. The time series data for payment status, bill statements, and previous payments spans six months, specifically from April 2005 to September 2005. Demographic coverage includes gender, education level, marital status, and age for 30,000 clients.
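The figures stated in this listing (30,000 records, 25 columns, no missing entries) can be verified on a downloaded copy before any modelling. The following is a minimal sketch using only the Python standard library; summarize_csv and matches_listing are hypothetical helper names, and the in-memory sample uses made-up values purely to show the behaviour.

```python
import csv
import io

EXPECTED_ROWS = 30_000     # record count stated in the listing
EXPECTED_COLUMNS = 25      # column count stated in the listing


def summarize_csv(handle):
    """Return (row_count, column_count, missing_count) for an open CSV file."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = 0
    missing = 0
    for record in reader:
        rows += 1
        # Count empty cells, plus any cells absent from a short row.
        missing += sum(1 for cell in record if cell.strip() == "")
        missing += max(0, len(header) - len(record))
    return rows, len(header), missing


def matches_listing(summary):
    """True if a summary agrees with the figures stated in the listing."""
    rows, columns, missing = summary
    return rows == EXPECTED_ROWS and columns == EXPECTED_COLUMNS and missing == 0


# Tiny in-memory example with made-up values: the second row lacks LIMIT_BAL.
sample = io.StringIO("ID,LIMIT_BAL,AGE\n1,20000,24\n2,,26\n")
print(summarize_csv(sample))  # (2, 3, 1)
```

For the real file, the same function can be called on a handle opened with open(path, newline=""), where path points to the downloaded CSV; the result can then be passed to matches_listing.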
License
CC0: Public Domain
Who Can Use It
- Data Scientists: For training and testing predictive models for financial risk.
- Financial Risk Managers: To understand factors contributing to client default and refine internal scoring systems.
- Academic Researchers: For studies in data mining, financial modelling, and predictive analytics.
Dataset Name Suggestions
- Taiwan Credit Default Risk Model Data
- Client Payment History and Default Prediction
- Taiwanese Credit Card Client Default Payments Data
Attributes
Original Data Source: Taiwan Credit Default Risk Model Data
