Loan Default Prediction Data

Fraud Detection & Risk Management

Tags and Keywords

Finance

Banking

Classification

Investing

Loans

Free

About

This dataset is designed to help banks predict loan defaults using machine learning. Lending is a primary revenue source for banks, but it carries the inherent risk that borrowers will default. To mitigate this risk, banks want robust machine learning models that classify whether a new borrower is likely to default.

The dataset is substantial (approximately 149,000 records across 34 columns) and includes numerous determining factors such as the borrower's income, gender, and loan purpose. Users should be aware that the dataset exhibits strong multicollinearity and contains missing values, both of which complicate model development. The primary objective is to clean and understand the dataset, build a classification model to predict loan defaults, fine-tune hyperparameters, and compare the evaluation metrics of various classification algorithms.
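One common way to look for the multicollinearity mentioned above is to list pairs of numeric columns with a high absolute Pearson correlation. The following is a minimal sketch, assuming the data has been loaded into a pandas DataFrame. The helper name `correlated_pairs`, the 0.8 threshold, and the toy values are illustrative assumptions, not part of the dataset or its documentation.

```python
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """List numeric column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if r > threshold:  # NaN correlations compare as False and are skipped
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Made-up values that reuse three column names from the listing.
# They are NOT taken from the dataset; they only exercise the helper.
toy = pd.DataFrame(
    {
        "loan_amount": [100, 200, 300, 400, 500],
        "property_value": [150, 290, 460, 610, 740],
        "Credit_Score": [700, 650, 800, 600, 750],
    }
)

pairs = correlated_pairs(toy)
for left, right, r in pairs:
    print(f"{left} ~ {right}: |r| = {r:.3f}")
```

On the real data, a pair flagged this way is a candidate for dropping one column or combining the two. Pairwise correlation only catches two-column relationships, so a variance inflation factor check may still be worthwhile.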

Columns

ID: Unique identifier for each record.
year: The year the data was recorded. This dataset primarily covers 2019.
loan_limit: Indicates the type or limit of the loan, with 'cf' (91%) and 'ncf' (7%) as common values.
Gender: The gender of the loan applicant, including categories like 'Male' (28%) and 'Joint' (28%).
approv_in_adv: Status of approval in advance, primarily 'nopre' (84%) and 'pre' (16%).
loan_type: Categorisation of the loan type, with 'type1' being the most common (76%).
loan_purpose: The stated purpose of the loan, with 'p3' (38%) and 'p4' (37%) being frequent.
Credit_Worthiness: Reflects the borrower's credit standing, largely 'l1' (96%).
open_credit: Status of open credit, almost entirely 'nopc' (100% after rounding).
business_or_commercial: Indicates if the loan is for business or commercial purposes, mainly 'nob/c' (86%).
loan_amount: The value of the loan requested, ranging from £16.5k to £3.58m, with a mean of £331k.
rate_of_interest: The interest rate applied to the loan, ranging from 0 to 8, with a mean of 4.05.
Interest_rate_spread: The spread in the interest rate, ranging from -3.64 to 3.36, with a mean of 0.44.
Upfront_charges: Any upfront charges associated with the loan, ranging from £0 to £60k, with a mean of £3.22k.
term: The term of the loan, predominantly 360 units (e.g., months), with a mean of 335.
Neg_ammortization: Indicates if negative amortisation is present, mostly 'not_neg' (90%).
interest_only: Specifies if the loan is interest-only, primarily 'not_int' (95%).
lump_sum_payment: Indicates if a lump sum payment is involved, largely 'not_lpsm' (98%).
property_value: The value of the property associated with the loan, ranging from £8k to £16.5m, with a mean of £498k.
construction_type: The type of construction, exclusively 'sb' (100%).
occupancy_type: The occupancy type of the property, primarily 'pr' (93%).
Secured_by: How the loan is secured, exclusively 'home' (100%).
total_units: The number of units, mostly '1U' (99%).
income: The borrower's income, ranging from £0 to £579k, with a mean of £6.96k.
credit_type: The type of credit, with 'CIB' (32%) and 'CRIF' (30%) being common.
Credit_Score: The borrower's credit score, ranging from 500 to 900, with a mean of 700.
co-applicant_credit_type: The co-applicant's credit type, split evenly between 'CIB' (50%) and 'EXP' (50%).
age: The age range of the borrower, with '45-54' (23%) and '35-44' (22%) being common.
submission_of_application: How the application was submitted, mostly 'to_inst' (64%).
LTV: Loan-to-value ratio, ranging from 0.97 to 7.83k, with a mean of 72.7. The mean suggests the values are percentages, and the maximum points to extreme outliers.
Region: The geographic region, with 'North' (50%) and 'south' (43%) being common.
Security_Type: The type of security, exclusively 'direct' (100%).
Status: The target variable, indicating loan default status (0 or 1).
dtir1: Debt-to-income ratio, ranging from 5 to 61, with a mean of 37.7.
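Several columns above are reported as dominated by a single value (for example construction_type, Secured_by, and Security_Type at 100%). Columns like these carry little signal for a classifier and are often dropped during cleaning. The sketch below shows one possible way to flag them, assuming a pandas DataFrame. The helper name `dominant_value_share`, the 0.99 cut-off, and the toy values are assumptions made for illustration.

```python
import pandas as pd


def dominant_value_share(df: pd.DataFrame) -> pd.Series:
    """Share of rows taken by the most frequent value in each column."""
    # value_counts sorts by frequency, so the first entry is the dominant value.
    return df.apply(
        lambda col: col.value_counts(normalize=True, dropna=False).iloc[0]
    )


# Made-up rows that reuse column names from the listing; the values
# (including 'land' and 'central') are illustrative, not dataset facts.
toy = pd.DataFrame(
    {
        "Security_Type": ["direct"] * 5,
        "Region": ["North", "south", "North", "central", "south"],
        "Secured_by": ["home"] * 4 + ["land"],
    }
)

shares = dominant_value_share(toy)
near_constant = shares[shares >= 0.99].index.tolist()
print(shares)
print("Candidates to drop:", near_constant)
```

The cut-off is a judgment call: a rare category can still matter if it is strongly associated with the Status target, so it is worth checking before dropping a column.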

Distribution

The dataset is provided as a CSV file of 28.48 MB with 34 columns and approximately 149,000 records. Most columns are fully populated, but some, including 'rate_of_interest', 'Interest_rate_spread', 'Upfront_charges', 'property_value', 'income', and 'dtir1', have between 2% and 27% of their values missing.
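The missing-value rates quoted above can be checked per column once the file is loaded. This is a minimal sketch, assuming pandas. The helper name `missing_share` and the toy values are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    share = df.isna().mean()
    return share[share > 0].sort_values(ascending=False)


# Tiny made-up frame that reuses a few column names from the listing;
# the values are illustrative only, not taken from the dataset.
toy = pd.DataFrame(
    {
        "loan_amount": [116500, 206500, 406500, 456500],
        "rate_of_interest": [np.nan, 4.56, 4.25, np.nan],
        "income": [1740.0, 4980.0, np.nan, 11880.0],
        "Status": [1, 0, 0, 0],
    }
)

report = missing_share(toy)
print(report)
```

With the real file, the same helper would be applied to the result of `pd.read_csv(path)`, where `path` is wherever the downloaded CSV was saved (the listing does not state a file name).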

Usage

This dataset is ideally suited for:

Developing and testing machine learning classification models to predict loan defaults.
Risk assessment in the financial sector to identify potentially defaulting borrowers.
Performing data cleaning and preprocessing techniques to handle multicollinearity and missing values.
Hyperparameter tuning and comparing performance of various classification algorithms.
Building predictive analytics solutions for banking and lending institutions.
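The workflow in the list above (impute missing values, compare classifiers, tune hyperparameters) could be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not a reference solution. The data is synthetic, generated only to make the example runnable. The column subset, the rule used to create the target, the two models, and the parameter grid are all arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data: column names and value ranges follow the listing,
# but every value here is randomly generated.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame(
    {
        "loan_amount": rng.uniform(16_500, 3_580_000, n),
        "Credit_Score": rng.integers(500, 901, n).astype(float),
        "dtir1": rng.uniform(5, 61, n),
        "Region": rng.choice(["North", "south"], n),
        "approv_in_adv": rng.choice(["nopre", "pre"], n),
    }
)
# Arbitrary rule so that the target has some learnable signal (an assumption).
y = (X["dtir1"] + rng.normal(0, 5, n) > 35).astype(int)
# Blank out roughly 20% of dtir1 to mimic the missing values in the real data.
X.loc[rng.random(n) < 0.2, "dtir1"] = np.nan

numeric = ["loan_amount", "Credit_Score", "dtir1"]
categorical = ["Region", "approv_in_adv"]

# Median imputation and scaling for numbers; mode imputation and one-hot for categories.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
    ]
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Compare the candidate models on the same folds using ROC AUC.
scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores[name] = cross_val_score(pipe, X, y, cv=3, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {scores[name]:.3f}")

# Small grid search over the random forest as a hyperparameter tuning example.
search = GridSearchCV(
    Pipeline([("prep", preprocess), ("model", RandomForestClassifier(random_state=0))]),
    param_grid={"model__n_estimators": [25, 50], "model__max_depth": [3, None]},
    cv=3,
    scoring="roc_auc",
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

Keeping imputation and encoding inside the pipeline means they are fitted on the training folds only, which avoids leaking information from the validation folds into the preprocessing step.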

Coverage

The dataset covers the year 2019. It includes demographic and financial factors such as borrower gender, income, age, credit score, loan type, and loan purpose. Geographic coverage is limited to general 'Region' categories such as 'North' and 'south'; no finer location detail is given. Several attributes are collinear or have missing values, and both issues should be addressed during analysis.

License

CC0: Public Domain

Who Can Use It

This dataset is intended for:

Data Scientists and Machine Learning Engineers for building and validating predictive models.
Financial Analysts and Risk Managers within banking institutions for assessing credit risk.
Researchers and Academics studying financial stability, credit behaviour, and predictive modelling.
Students and Beginners in data science looking to gain practical experience with a real-world classification problem.

Dataset Name Suggestions

Loan Default Prediction Data
Bank Loan Risk Classification
Borrower Default Likelihood
Credit Risk Assessment Dataset
Financial Default Predictor

Attributes

Original Data Source: Loan Default Prediction Data

Listing Stats

Views: 459
Downloads: (not shown)
Listed: 14/07/2025
Region: Global
Quality: 5 / 5
Version: 1.0
