Opendatabay APP

Financial Delinquency Prediction Data

Finance & Banking Analytics

Tags and Keywords

Credit

Risk

Finance

Delinquency

AutoML



Free

About

This resource is designed as a structured benchmark for automated machine learning (AutoML) and predictive modeling in the financial domain. Its main task is credit risk assessment: predicting whether a borrower will experience serious delinquency (a payment 90 days or more past due) within the next two years. The data combines personal attributes and financial metrics, allowing users to develop, test, and rigorously evaluate credit scoring models.

Columns

The dataset includes 10 predictor variables and 1 target variable:
  • rev_util: The ratio of the total outstanding balance on revolving credit lines to the total credit limit available on those lines. This reflects how much of their available credit the borrower is using.
  • age: The age of the borrower, measured in years.
  • late_30_59: The count of instances where the borrower was 30 to 59 days past due on a payment, but no worse. This measures short-term delinquency behaviour.
  • debt_ratio: The ratio of the borrower’s monthly debt payments (including alimony and loans) to their monthly gross income, indicating overall debt burden.
  • monthly_inc: The gross income the borrower receives each month.
  • open_credit: The total number of open instalment loans and revolving credit lines the borrower possesses.
  • late_90: The count of times the borrower has been 90 days or more late on a payment, signifying severe delinquency issues.
  • real_estate: The count of credit lines or loans secured by real estate, such as mortgages or home equity lines.
  • late_60_89: The count of times the borrower was 60 to 89 days past due on a payment, providing insight into mid-term delinquency behaviour.
  • dependents: The count of individuals who are financially dependent on the borrower.
  • dlq_2yrs: The binary target variable: 1 if the borrower experienced a serious delinquency in the next two years, and 0 otherwise.
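The column list above can be captured as a small schema for sanity-checking records after loading. This is a minimal Python sketch: the range rules (non-negative whole-number counts, non-negative ratios and income, a 0/1 target) are assumptions inferred from the column descriptions, not constraints published with the dataset, and validate_row is a hypothetical helper name.

```python
from typing import Dict, List

# Column names as given in the dataset description.
PREDICTORS: List[str] = [
    "rev_util", "age", "late_30_59", "debt_ratio", "monthly_inc",
    "open_credit", "late_90", "real_estate", "late_60_89", "dependents",
]
TARGET = "dlq_2yrs"

# Assumption: these columns are counts, so they should be whole numbers >= 0.
COUNT_COLUMNS = ("late_30_59", "late_60_89", "late_90",
                 "open_credit", "real_estate", "dependents")

# Assumption: these continuous columns should never be negative.
NON_NEGATIVE_COLUMNS = ("rev_util", "debt_ratio", "monthly_inc", "age")


def validate_row(row: Dict[str, float]) -> List[str]:
    """Return human-readable problems found in one record (empty list if none).

    The checks are assumptions drawn from the column descriptions,
    not rules documented by the dataset publisher.
    """
    problems = [f"{col}: missing" for col in PREDICTORS + [TARGET]
                if row.get(col) is None]
    if problems:
        return problems
    for col in COUNT_COLUMNS:
        value = row[col]
        if value < 0 or value != int(value):
            problems.append(f"{col}: expected a non-negative count, got {value}")
    for col in NON_NEGATIVE_COLUMNS:
        if row[col] < 0:
            problems.append(f"{col}: negative value {row[col]}")
    if row[TARGET] not in (0, 1):
        problems.append(f"{TARGET}: expected 0 or 1, got {row[TARGET]}")
    return problems
```

rev_util is deliberately not capped at 1: a revolving balance above the credit limit yields a utilisation ratio greater than 1, which follows directly from the definition above.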

Distribution

The dataset is structured for binary classification, with 10 numerical predictors and 1 binary target variable. The data file is named Credit Risk Benchmark Dataset.csv and is 1.02 MB in size. The listing does not state an exact row count, but reports approximately 16.7 thousand valid records. Users are advised to perform exploratory data analysis, handle potential missing values and outliers, and experiment with feature engineering techniques such as scaling and transformation.
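The preparation steps suggested above (checking missing values, handling outliers, transforming skewed columns) can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the file is assumed to have a header row using the column names listed earlier, missing values are assumed to appear as empty cells or "NA", and the 1st/99th percentile clipping bounds and the log1p transform are common choices, not recommendations from the dataset publisher. All function names are hypothetical.

```python
import csv
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]

# Assumption: missing values are encoded as empty cells or the text "NA".
MISSING_TOKENS = {"", "NA"}


def parse_rows(lines: Iterable[str]) -> List[Row]:
    """Parse CSV text (header first) into dicts of floats; missing cells become None."""
    rows: List[Row] = []
    for raw in csv.DictReader(lines):
        rows.append({
            key: None if value is None or value.strip() in MISSING_TOKENS else float(value)
            for key, value in raw.items()
            if key is not None  # ignore overflow fields from malformed lines
        })
    return rows


def missing_counts(rows: List[Row], columns: Sequence[str]) -> Dict[str, int]:
    """Count missing (None) values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def positive_rate(rows: List[Row], target: str) -> float:
    """Share of non-missing target values equal to 1 (class balance check)."""
    labels = [r[target] for r in rows if r.get(target) is not None]
    return sum(labels) / len(labels)


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of an already sorted list, q in [0, 1]."""
    if not sorted_vals:
        raise ValueError("no values")
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def clip_column(rows: List[Row], col: str,
                lower_q: float = 0.01, upper_q: float = 0.99) -> Tuple[float, float]:
    """Clip a column in place to its [lower_q, upper_q] percentiles; return the bounds."""
    values = sorted(r[col] for r in rows if r.get(col) is not None)
    lo, hi = percentile(values, lower_q), percentile(values, upper_q)
    for r in rows:
        if r.get(col) is not None:
            r[col] = min(max(r[col], lo), hi)
    return lo, hi


def log1p_column(rows: List[Row], col: str) -> None:
    """Apply log(1 + x) in place to reduce right skew (e.g. income); None is kept."""
    for r in rows:
        if r.get(col) is not None:
            r[col] = math.log1p(r[col])
```

With the file downloaded, opening "Credit Risk Benchmark Dataset.csv" with newline="" and passing the file object to parse_rows yields the records; missing_counts and positive_rate then give a first view of data gaps and class balance before any clipping or transformation is applied.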

Usage

This data product is suitable for several predictive and analytical applications:
  • Risk Management: Developing and validating robust credit scoring models that forecast borrower default risk.
  • AutoML Benchmarking: Evaluating and comparing the efficiency and performance of diverse AutoML frameworks on a standardised, industry-relevant financial dataset.
  • Academic Research: Investigating trends and underlying relationships in credit behaviour, and analysing the predictive value of individual financial indicators.
  • Model Interpretability: Because credit models are heavily regulated, the dataset is a good foundation for testing feature importance methods and building explainable AI (XAI) solutions that provide the required transparency.
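Comparing credit scoring models or AutoML frameworks on this data requires a shared evaluation metric. The listing does not prescribe one; ROC AUC is assumed here because it is widely used for binary credit-risk scoring and does not depend on a decision threshold. The sketch below computes it from the Mann-Whitney U statistic, i.e. the probability that a randomly chosen positive (dlq_2yrs = 1) receives a higher score than a randomly chosen negative.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    Tied scores share the average of the ranks they span, which counts a
    positive/negative tie as half a correct ordering.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos * (n_pos + 1) / 2; AUC = U / (n_pos * n_neg).
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) evaluates to 0.75: three of the four positive/negative pairs are ordered correctly.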

Coverage

The dataset captures key demographic indicators (age, dependents) and financial indicators (income, credit usage, debt burden) for individual borrowers. It also records short-, mid-, and long-term delinquency history through the counts of payments 30 to 59, 60 to 89, and 90 or more days past due. The prediction task is the likelihood of serious delinquency over the following two years.
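The three delinquency counts can also be summarised into a single ordinal feature recording the most severe past-due level a borrower has reached. This is a hypothetical feature engineering sketch, not part of the dataset; the function names and the 0 to 3 encoding are illustrative assumptions.

```python
from typing import Mapping


def worst_delinquency_bucket(row: Mapping[str, float]) -> int:
    """Hypothetical ordinal feature: most severe past-due level on record.

    0 = no recorded late payments, 1 = worst was 30-59 days,
    2 = worst was 60-89 days, 3 = at least one payment 90+ days late.
    """
    if row["late_90"] > 0:
        return 3
    if row["late_60_89"] > 0:
        return 2
    if row["late_30_59"] > 0:
        return 1
    return 0


def total_late_payments(row: Mapping[str, float]) -> float:
    """Hypothetical feature: total late-payment events across all three buckets."""
    return row["late_30_59"] + row["late_60_89"] + row["late_90"]
```

An ordinal encoding keeps the severity ordering that separate counts leave implicit, which can help simple linear models; tree-based models can usually recover it from the raw counts.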

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Data Scientists: For testing machine learning algorithms, including modern approaches like neural networks and gradient boosting, against classical methods like logistic regression.
  • Financial Analysts: For building models used in internal risk systems and determining optimal lending practices.
  • Researchers: To study the factors that drive serious credit default and the relationships between personal finance variables.
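Comparing newer methods against a classical baseline starts with the baseline itself. Below is a minimal logistic regression fitted by full-batch gradient descent on mean log-loss, written in plain Python so it runs without external libraries. It is a sketch under assumptions: features are expected to be numeric and already scaled, and the learning rate, epoch count, and optional L2 penalty are illustrative defaults, not tuned values. In practice a library implementation would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 500,
    l2: float = 0.0,
) -> Tuple[List[float], float]:
    """Fit weights and bias by gradient descent on mean log-loss.

    The gradient of the mean log-loss is (1/n) * sum((p_i - y_i) * x_i);
    the optional penalty (l2 / 2) * ||w||^2 adds l2 * w (bias not penalised).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float,
                  X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted probability of the positive class (e.g. dlq_2yrs = 1) per row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
```

Gradient boosting or neural network models can then be judged by how much they improve on this baseline under the same train/test split and metric.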

Dataset Name Suggestions

  • Credit Default Risk Benchmark
  • Financial Delinquency Prediction Data
  • Borrower Credit Scoring Indicators
  • Two-Year Delinquency Forecast Data

Listing Stats

Listed: 10/10/2025

Region: Global

Quality (Universal Data Quality Score, UDQS): 5 / 5

Version: 1.0
