Loan Default Risk Assessment
Finance & Banking Analytics
About
This dataset is designed for predicting whether an individual will default on a bank loan. Banks face substantial financial losses when customers fail to repay loans, and these losses can weigh on wider economic growth. By analysing attributes such as the funded amount, loan details, and the borrower's financial behaviour, the dataset supports the development of models that identify likely defaulters. This helps financial institutions manage risk and reduce monetary losses. The data originates from a MachineHack - Deloitte Hackathon.
Columns
- ID: Unique identifier for each loan record.
- Loan Amount: The principal amount of the loan applied for.
- Funded Amount: The amount committed by the bank for the loan.
- Funded Amount Investor: The portion of the loan amount funded by investors.
- Term: The duration of the loan.
- Batch Enrolled: An identifier for the batch the loan was processed in.
- Interest Rate: The annual interest rate on the loan.
- Grade: The loan grade assigned by the lending institution.
- Sub Grade: A more detailed sub-grade of the loan.
- Employment Duration: The length of the borrower's employment.
- Home Ownership: The borrower's home ownership status (e.g., MORTGAGE, RENT, OWN).
- Verification Status: Indicates if the borrower's income was verified.
- Payment Plan: Whether a payment plan is in place.
- Loan Title: The purpose or title of the loan (e.g., Debt Consolidation, Credit card refinancing, Home improvement, Green loan).
- Debit to Income: The borrower's debt-to-income ratio.
- Delinquency - two years: Number of past due payments in the last two years.
- Inquires - six months: Number of credit inquiries in the last six months.
- Open Account: Number of open credit lines.
- Public Record: Number of derogatory public records.
- Revolving Balance: Outstanding balance on a revolving credit line.
- Revolving Utilities: The share of available revolving credit currently in use (credit utilisation).
- Total Accounts: The total number of credit accounts a borrower has.
- Initial List Status: The initial listing status of the loan.
- Total Received Interest: The total interest amount received to date.
- Total Received Late Fee: The total late fees collected.
- Recoveries: Amount recovered after a loan default.
- Collection Recovery Fee: Fee associated with collection efforts.
- Collection 12 months Medical: Number of medical collections in the past 12 months.
- Application Type: Type of loan application (e.g., INDIVIDUAL).
- Last week Pay: Payments made in the last week.
- Accounts Delinquent: Number of accounts on which the borrower is delinquent.
- Total Collection Amount: The total amount in collection.
- Total Current Balance: The current total balance across all accounts.
- Total Revolving Credit Limit: The total revolving credit line available.
- Loan Status: The target variable, indicating whether the borrower is a defaulter (1) or non-defaulter (0).
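As a starting point for modelling, this sketch reads the CSV with Python's standard `csv` module and separates the Loan Status target from the features. It drops ID because that column only identifies a record. The sketch assumes the CSV headers match the column names listed above exactly. The function name `split_features_target` is a hypothetical helper, and the two sample rows are made-up values for illustration, not records from the dataset.

```python
import csv
import io
from typing import Dict, List, Tuple

TARGET = "Loan Status"  # target column: 1 = defaulter, 0 = non-defaulter
ID_COL = "ID"           # record identifier, not a predictive feature


def split_features_target(
    rows: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[int]]:
    """Separate per-row feature dicts from the integer Loan Status labels."""
    features, labels = [], []
    for row in rows:
        labels.append(int(row[TARGET]))
        # Keep every column except the target and the identifier.
        features.append({k: v for k, v in row.items() if k not in (TARGET, ID_COL)})
    return features, labels


# Hypothetical, truncated CSV content; with the real file, pass open(path, newline="").
sample = io.StringIO(
    "ID,Loan Amount,Grade,Loan Status\n"
    "1,10000,B,0\n"
    "2,5000,C,1\n"
)
rows = list(csv.DictReader(sample))
X, y = split_features_target(rows)
print(X[0])  # {'Loan Amount': '10000', 'Grade': 'B'}
print(y)     # [0, 1]
```

All values from `csv.DictReader` arrive as strings. Numeric columns such as Loan Amount or Interest Rate therefore still need converting before they reach a model.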
Distribution
The data is typically provided in CSV format. The training set has 67,463 rows and 35 columns. The test set has 28,913 rows and 34 columns, one fewer than the training set, which is consistent with the Loan Status target being withheld. A sample of the data is available for review.
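Because the two files differ by one column, it is worth comparing their headers before modelling to confirm which column the test set lacks. This is a minimal standard-library sketch. The helper names are hypothetical, and the headers are abbreviated stand-ins for the real files. With actual files, you would pass open file handles instead of `StringIO` objects.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_header(f: TextIO) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    return next(csv.reader(f))


def compare_columns(
    train_cols: List[str], test_cols: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (columns only in train, columns only in test), preserving order."""
    only_train = [c for c in train_cols if c not in test_cols]
    only_test = [c for c in test_cols if c not in train_cols]
    return only_train, only_test


# Hypothetical, truncated headers standing in for the real train/test files.
train_file = io.StringIO("ID,Loan Amount,Grade,Loan Status\n1,10000,B,0\n")
test_file = io.StringIO("ID,Loan Amount,Grade\n2,5000,C\n")

only_train, only_test = compare_columns(read_header(train_file), read_header(test_file))
print(only_train)  # ['Loan Status']
print(only_test)   # []
```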
Usage
This dataset is well suited to developing and evaluating machine learning models that predict loan defaulters. It is particularly useful for:
- Building credit risk assessment models.
- Financial fraud detection.
- Optimising lending strategies for banks.
- Academic research in finance and predictive analytics.
- Practice for hackathons focused on financial prediction.
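Building and evaluating a credit risk model typically requires a labelled holdout taken from the training set. A stratified split keeps the defaulter/non-defaulter ratio similar in both parts. This matters most if defaulters are a minority class, which the listing does not state. The function name `stratified_split` and the 20% validation fraction are illustrative assumptions, not part of the dataset.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(
    labels: List[int], val_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx) with each class split in the same proportion."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx: List[int] = []
    val_idx: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_frac)  # per-class validation count
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels (not from the dataset): 80 non-defaulters, 20 defaulters.
labels = [0] * 80 + [1] * 20
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
print(sum(labels[i] for i in val_idx))
```

Splitting each class separately guarantees that the validation set contains defaulters in the same proportion as the training data. Under a purely random split of a small or imbalanced sample, that proportion could drift.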
Coverage
The listing does not specify the geographic region or time period that the loans cover. The data originates from a MachineHack - Deloitte Hackathon. Deloitte operates in around 150 countries, but this does not indicate where the loans themselves were issued, so insights drawn from the data should be validated before being applied to other markets. The records describe individual loan accounts.
License
CC0: Public Domain
Who Can Use It
This dataset is intended for:
- Data Scientists and Machine Learning Engineers for building and testing predictive models.
- Financial Analysts and Risk Managers in banking and lending institutions for strategic decision-making.
- Researchers and Academics studying credit risk, financial stability, and predictive analytics.
- Users who are comfortable handling large datasets, understand underfitting versus overfitting, and can optimise log loss (log_loss) so that models generalise well.
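Log loss (binary cross-entropy) averages −[y·ln(p) + (1 − y)·ln(1 − p)] over all records. Here y is the 0/1 Loan Status and p is the predicted probability of default. Confident wrong predictions are penalised heavily. This minimal standard-library sketch clips probabilities to avoid ln(0). It also computes a constant base-rate baseline, which any useful model should beat. The function names are hypothetical, and the toy labels and predictions are invented for illustration.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary log loss; probabilities are clipped to [eps, 1 - eps]."""
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def base_rate_baseline(y_true: Sequence[int]) -> float:
    """Log loss of always predicting the observed default rate."""
    rate = sum(y_true) / len(y_true)
    return log_loss(y_true, [rate] * len(y_true))


labels = [1, 0, 0, 0]        # toy labels, not taken from the dataset
preds = [0.9, 0.1, 0.2, 0.1]  # toy predicted default probabilities
print(round(log_loss(labels, preds), 4))
print(round(base_rate_baseline(labels), 4))
```

In this toy case, the predictions score a lower (better) log loss than the base-rate baseline. Comparing training and validation log loss in the same way is one practical check for overfitting.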
Dataset Name Suggestions
- Bank Loan Defaulter Prediction
- Loan Default Risk Assessment
- Customer Credit Risk Dataset
- Financial Loan Outcomes
Attributes
Original Data Source: Loan Default Risk Assessment