Credit_Defaulter_Classifier_1659348484
Anubhab Bose
Soumya Ghatak
July 31, 2022
1 Introduction:
Banks play a crucial role in market economies. They decide who can get finance and on what terms
and can make or break investment decisions. For markets and society to function, individuals and
companies need access to credit.
Credit scoring algorithms, which make a guess at the probability of default, are the method banks
use to determine whether or not a loan should be granted. This competition requires participants to
improve on the state of the art in credit scoring, by predicting the probability that somebody will
experience financial distress in the next two years.
The goal of this project is to build a model that borrowers can use to help make the best financial
decisions.
Historical data are provided on 150,000 borrowers.
2 Dataset Variables:
1) SeriousDlqin2yrs: Person experienced 90 days past due delinquency or worse.
4) DebtRatio: Monthly debt payments, alimony, and living costs divided by monthly gross income.
7) NumberOfTimes90DaysLate: Number of times the borrower has been 90 days or more past due.
3 Exploratory Data Analysis And Data Pre-Processing:
• Generally, younger people defaulted more often than older people, as is evident from the second histogram.
• About 2.5 percent of the borrowers, roughly 4,000 people, have a debt ratio over 3489. The figures concerning DebtRatio show many outliers, so we replace the values lying above the 3rd quartile + 1.5 × IQR (i.e. above 1.908), nearly 20.85 percent of the data set, with that cap value.
• The ’NA’ values in ’MonthlyIncome’ are replaced by 0, since ’MonthlyIncome’ would be 0 in the worst-case scenario. This makes our model more robust.
• ’NA’ values in ’NumberOfDependents’ are replaced by the median of the observations, rounded to the nearest integer, as the observations are positively skewed.
• Our original dataset contains 6.684 percent defaulters, so it is an imbalanced data set. We therefore apply SMOTE before building a model to identify defaulters. In the SMOTE’d dataset, the proportion of defaulters rises to 48.35 percent, and the total number of samples increases from 150,000 to 216,744.
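The pre-processing steps above can be sketched with pandas on a small synthetic stand-in for the real dataset. The column names follow the report; the toy values, random seed, and the imblearn pointer in the final comment are illustrative assumptions, not the report's actual code:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real 150,000-row dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "DebtRatio": np.concatenate([rng.uniform(0, 2, 950), rng.uniform(10, 5000, 50)]),
    "MonthlyIncome": rng.choice([np.nan, 3000.0, 6500.0], 1000),
    "NumberOfDependents": rng.choice([np.nan, 0, 1, 2, 3], 1000),
})

# 1) Cap DebtRatio outliers at Q3 + 1.5*IQR (the report's cap works out to 1.908).
q1, q3 = df["DebtRatio"].quantile([0.25, 0.75])
cap = q3 + 1.5 * (q3 - q1)
df["DebtRatio"] = df["DebtRatio"].clip(upper=cap)

# 2) Missing MonthlyIncome -> 0 (worst-case assumption).
df["MonthlyIncome"] = df["MonthlyIncome"].fillna(0)

# 3) Missing NumberOfDependents -> rounded median (distribution is right-skewed).
median_dep = round(df["NumberOfDependents"].median())
df["NumberOfDependents"] = df["NumberOfDependents"].fillna(median_dep)

# The class imbalance would then be handled with SMOTE, e.g. via imblearn:
# from imblearn.over_sampling import SMOTE
# X_res, y_res = SMOTE().fit_resample(X, y)
```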
We divide the dataset into a training set and a test set in an 80:20 ratio, build our supervised classification models on the training set, and validate the results using the AUC score on the test set.
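A minimal sketch of this split-and-validate workflow with scikit-learn. The synthetic features, the placeholder decision-tree estimator, and all parameter values here are assumptions for illustration only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the (pre-processed, SMOTE'd) feature matrix and labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

# 80:20 split, stratified so both classes appear in the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit any supervised classifier on the training set, score AUC on the test set.
clf = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```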
4 Supervised Models:
1. Random Forest Classifier: We built a random forest classifier with 500 decision trees grown using the bootstrap method, with Gini impurity as the split criterion. The hyperparameters yielding the best AUC were chosen by grid-search cross-validation. The performance of the classifier on the test set was observed as below:
Class   Precision   Recall   f1-Score   Support
0       0.95        0.78     0.85       28036
1       0.11        0.38     0.16       1964
The ROC curve along with the AUC value is presented below:
AUC:0.576
Accuracy:0.75
The feature importances from the random forest model are observed as below:
The three least significant features, namely ”Age”, ”MonthlyIncome”, and ”NumberOfOpenCreditLinesAndLoans”, are removed from the dataset, and another random forest model is built on the modified data set with the same hyperparameters as above. The performance of the modified classifier on the test set was observed as below:
Class   Precision   Recall   f1-Score   Support
0       0.95        0.76     0.85       28036
1       0.13        0.49     0.20       1964
The ROC curve along with the AUC value is presented below:
AUC:0.624
Accuracy:0.74
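The random-forest workflow above, grid-searching hyperparameters by AUC and then dropping the three least important features before refitting, can be sketched with scikit-learn as follows. The synthetic data and the deliberately small parameter grid are assumptions to keep the sketch fast; the report used 500 trees:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1500, n_features=10,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid-search hyperparameters, scoring by AUC (gini criterion, bootstrap trees).
grid = GridSearchCV(
    RandomForestClassifier(criterion="gini", bootstrap=True, random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    scoring="roc_auc", cv=3,
).fit(X_tr, y_tr)
best = grid.best_estimator_

# Rank features by importance and drop the three least important,
# then refit with the same hyperparameters ("modified" model).
order = np.argsort(best.feature_importances_)
keep = np.sort(order[3:])
best2 = RandomForestClassifier(**grid.best_params_, criterion="gini",
                               random_state=0).fit(X_tr[:, keep], y_tr)
auc2 = roc_auc_score(y_te, best2.predict_proba(X_te[:, keep])[:, 1])
```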
2. Logistic Classifier: We standardized the training data and built a logistic classifier on the standardized training set. The classification threshold was set to 0.7. The performance of the logistic classifier on the test set was observed as below:
Class   Precision   Recall   f1-Score   Support
0       0.97        0.82     0.89       28036
1       0.20        0.64     0.30       1964
The ROC curve along with the AUC value is presented below:
AUC:0.73
Accuracy:0.81
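A sketch of the logistic classifier with standardization and the 0.7 decision threshold, again on synthetic placeholder data; the pipeline structure and solver settings are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize, then fit logistic regression on the standardized training set.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

# Classify as defaulter only when P(default) >= 0.7 (the report's threshold).
proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.7).astype(int)
auc = roc_auc_score(y_te, proba)  # AUC itself is threshold-free
```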
3. Decision Tree Classifier: We built a decision tree classifier with maximum depth 10 and minimum samples per leaf 5, with Gini impurity as the split criterion. The hyperparameters yielding the best AUC were chosen by grid-search cross-validation. The performance of the decision tree classifier on the test set was observed as below:
Class   Precision   Recall   f1-Score   Support
0       0.95        0.78     0.85       28036
1       0.11        0.39     0.17       1964
The ROC curve along with the AUC value is presented below:
AUC:0.624
Accuracy:0.75
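The decision-tree search can be sketched the same way; the grid values here are assumptions chosen so that the report's best setting (depth 10, 5 samples per leaf) is among the candidates:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid-search max_depth and min_samples_leaf, scoring candidates by AUC.
grid = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"max_depth": [5, 10, 15], "min_samples_leaf": [1, 5, 10]},
    scoring="roc_auc", cv=3,
).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1])
```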
5
4. Naive Bayes Classifier: We built a Naive Bayes Classifier to classify defaulters accurately.
The performance of the Naive Bayes classifier on the test set was observed as below:
Class   Precision   Recall   f1-Score   Support
0       0.94        0.18     0.97       28036
1       0.55        0.07     0.12       1964
The ROC curve along with the AUC value is presented below:
AUC:0.531
Accuracy:0.94
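A minimal Naive Bayes sketch on synthetic, imbalanced placeholder data. The Gaussian variant is an assumption, since the report does not specify which Naive Bayes model was used:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score, classification_report

# weights=[0.93] makes class 1 rare, mimicking the imbalance in the report.
X, y = make_classification(n_samples=1500, n_features=8,
                           weights=[0.93], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, nb.predict_proba(X_te)[:, 1])
report = classification_report(y_te, nb.predict(X_te))  # per-class P/R/f1
```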
5. KNN Classifier: We plotted the misclassification error for different numbers of neighbours. From the plot, it is evident that the misclassification error increases as the number of neighbours increases, so we built a 1-NN classifier. The performance of the KNN classifier on the test set was observed as below:
Class   Precision   Recall   f1-Score   Support
0       0.96        0.68     0.80       28036
1       0.11        0.56     0.18       1964
The ROC curve along with the AUC value is presented below:
AUC:0.624
Accuracy:0.68
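The neighbour sweep behind the misclassification-error plot can be sketched as follows; the range of k and the synthetic data are illustrative assumptions (on the report's real data, k = 1 was best):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Misclassification error (1 - accuracy) for each candidate k.
ks = list(range(1, 16))
errors = [1 - KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
          for k in ks]
best_k = ks[int(np.argmin(errors))]  # k with the lowest test error
```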
5 Conclusion:
All the supervised classification models with their corresponding AUC scores are listed as below:
Classification Model AUC
Random Forest 0.576
Modified Random Forest 0.624
Logistic 0.73
Decision Tree 0.624
Naive Bayes 0.531
KNN 0.624
The AUC value of the Logistic Classifier is 0.73, which is the highest among all the supervised classifiers considered. Also, the recall for defaulters under the logistic model was 0.64, which means the logistic classifier identifies 64 percent of the actual defaulters in the test set, the highest rate among all the classifiers. So the Logistic Classifier should be used to predict whether a borrower will default based on their portfolio.
6 References:
1. Code: Github Code
2. Dataset: Github Dataset
3. Kaggle Competition: Kaggle Link