Proceedings of the 7th North American International Conference on Industrial Engineering
and Operations Management, Orlando, Florida, USA, June 12-14, 2022

Predicting Bank Loan Eligibility Using Machine Learning Models and Comparison Analysis
Miraz Al Mamun
Department of Economics & Decision Sciences
The University of South Dakota
Vermillion, South Dakota, USA
[email protected]

Afia Farjana and Muntasir Mamun

Department of Computer Science
The University of South Dakota
Vermillion, South Dakota, USA
[email protected], [email protected]

Abstract

As people's demands grow, so does the need for bank loans. Every day, banks receive many loan applications from
customers and other individuals, but not every applicant is accepted. Typically, banks process a loan application after
verifying and evaluating the applicant's eligibility, which is a time-consuming and challenging process. When
examining loan applications and making credit approval decisions, most banks use their own credit scoring and risk
assessment systems. Despite this, some borrowers fail to repay each year, causing financial institutions to lose
a substantial amount of money. In this study, Machine Learning (ML) algorithms are employed to extract patterns
from a common loan-approval dataset and predict deserving loan applicants. Customers' previous data are used in
the study, including their age, income type, loan annuity, last credit bureau report, type of organization they
work for, and length of employment. ML methods such as Random Forest, XGBoost, AdaBoost, LightGBM, Decision
Tree, and K-Nearest Neighbor were used to discover the most relevant features, i.e., the elements that have the
most impact on the prediction output. These algorithms are compared and assessed against one another
using standard metrics. Among these, LightGBM achieved the highest accuracy of 92%. It was also
determined to be the best model, and it matched or exceeded the other machine learning methods in terms of
F1-score, which is 96%.

Keywords
Loan Sanction, Machine Learning, XGBoost, Adaboost, Lightgbm.

1. Introduction
People increasingly prefer to apply for loans online, and data is growing daily due to digitization in the financial sector.
Artificial intelligence (AI) is gaining popularity as a common tool for data analysis. Individuals from diverse
industries are using AI algorithms to solve problems based on their sector knowledge. Banks have a difficult
time processing loan approvals. Every day, bank staff face a large number of applications to manage, and the
odds of making a mistake are significant. Distributing loans is a core operation of almost every bank, and the
profit earned from those loans accounts for much of a bank's income. A single mistake can therefore cause a massive loss to a bank
(Gupta et al. 2020).

The primary goal in the banking sector is to place their funds in safe hands. Many banks and financial institutions now
grant loans after a lengthy process of verification and validation, but there is no guarantee that the chosen applicant is
the most deserving of all applicants. We can forecast whether a given applicant is safe or not using our method, and
the entire feature validation process is automated using machine learning techniques. Loan Prediction is extremely
beneficial to both bank employees and applicants (Kumar et al. 2016).

The purpose of this paper is to provide a quick, straightforward, and efficient method of selecting qualified applicants,
which may give the bank distinct benefits. The Loan Prediction System can automatically calculate the weight of each
characteristic involved in loan processing, and the same features are processed according to their
associated weights on new test data. The applicant can be given a deadline by which he or she will learn whether the loan
will be approved. The Loan Prediction System also allows staff to jump to a specific application and review it on a priority
basis [2], so that applications that deserve to be accepted can be handled first. Gender,
Married, Dependents, Education, Self-Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan Amount
Term, Credit History, Property Area, and Loan Status are some of the features used in the forecast.

This paper is organized into six sections. Section 2 surveys the literature. Section 3 presents the proposed
methodology, including the machine learning classifiers that were employed. Section 4 gives a brief overview of
our dataset and its pre-processing. Section 5 discusses the findings and analysis, and Section 6 presents the
conclusion.

2. Literature Survey
A prediction is an assertion about what one believes will occur in the future. Predictions are made all the time. Some
are highly serious and based on scientific calculations, while others are simply guessing. Prediction aids us in a variety
of ways, such as predicting what will happen after a period of time, a year, or 10 years. Predictive analytics is a branch
of advanced analytics that analyzes current data and makes forecasts using a variety of approaches from data mining,
statistics, modeling, machine learning, and artificial intelligence. Kumar Arun et al. (2016) studied how to forecast
whether a bank will approve a loan. They presented a model using machine learning technologies such as SVM and neural
networks. This assessment of the literature aided us in carrying out our research and developing a reliable bank loan
prediction model.

Mohammad et al. (2020) proposed a study to predict whether or not a bank would give a loan to a customer. The goal
of the model was classification; hence, Logistic Regression with a sigmoid function was used for
developing the model. The dataset for training and prediction was obtained from Kaggle and consisted of two data
sets, one for training and the other for testing. To handle missing values in the data set, the data had to be cleansed first.
After that, performance measures including sensitivity and specificity were used to compare the models. The model
produced an accuracy of 81%, according to the final results. The model was marginally better because it included
variables (such as a customer's age, purpose, credit history, credit amount, credit duration, and so on) other than
checking account information (which indicates a customer's wealth) that should be considered when calculating the
probability of loan default correctly. As a result, by calculating the chance of default on a loan, the suitable customers
to target for lending could be identified simply using a logistic regression approach.
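
To make the logistic regression idea concrete, the following minimal Python sketch shows how a sigmoid function turns a weighted sum of applicant features into a default probability. The feature choice, the weights, the bias, and the 0.5 decision threshold are hypothetical values picked for illustration; they are not taken from Mohammad et al. (2020).

```python
import math


def sigmoid(z):
    """Map any real number to a probability in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def default_probability(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked coefficients for illustration only.
# A real model would learn them from the training data.
weights = [-0.8, 0.5, 0.3]   # credit history, credit amount, credit duration
bias = -0.2

applicant = [1.0, 0.4, 0.6]  # assumed to be already scaled features
p_default = default_probability(applicant, weights, bias)

# Assumed decision rule: lend only when the default probability is below 0.5.
target_for_loan = p_default < 0.5
```

A lower threshold would make the lender more conservative, at the cost of rejecting more applicants who would have repaid.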

Pidikiti et al. (2019) designed an effective model. The major goal of their paper was to lower the risk associated
with picking a safe individual to receive a loan, in order to save time and money for the bank. The paper had four sections:
(i) data collection, (ii) machine learning model comparison using the data acquired, (iii) system training
using the most promising model, and (iv) testing. They forecasted loan data using machine learning classification
algorithms such as logistic regression, Decision Tree, and gradient boosting. Compared to the other
algorithms, the decision tree method was found to be the most accurate, with an accuracy of 82 percent. It was
successful because it produced improved results on the classification problem. It was also user friendly, simple to
install, and provided interpretable results.

According to Pandey et al. (2021), predicting loan defaulters is one of the most difficult challenges for any bank.
However, by predicting loan defaulters, banks can significantly reduce their losses by lowering non-performing assets. As
a result, research on loan approval prediction has become crucial. Machine learning
techniques are extremely important and useful for predicting this type of data. Four classification-based machine learning algorithms, namely
Logistic Regression, Decision Tree, Support Vector Machine, and Random Forest, were used in this study, with the
Support Vector Machine approach being the most accurate in predicting loan acceptance, with an accuracy of
79.67%. They gathered a dataset of past clients' information from numerous banks that had previously approved
loans.

Ndayisenga et al. (2021) worked with commercial banks to predict the behavior of borrowers by
developing and testing the accuracy of different models using data from the Bank of Kigali. The data was divided into two
sets, training and test, with the training dataset accounting for 70% of the total and the test dataset accounting
for 30%. Ensembles were utilized to discover the best machine learning strategies for predicting bank loan
default. Gradient Boosting (accuracy 80.40%) was shown to be the best model for predicting bank loan default,
followed by XGBoosting, with decision trees, random forest, and logistic regression performing poorly.

Tejaswini et al. (2020) presented a robust predictive modeling method to approve or reject loan applications
based on the customers' historical financial and credit scores. The purpose of their paper was to create a quick,
straightforward, and efficient method of selecting qualified applicants. The data was gathered from a variety of
financial institutions. The training data set was provided to the machine learning model, and the model was trained
using that data set. Every new applicant's information entered on the application form serves as test data. In their
paper, they used three machine learning methods to predict client loan approval: Logistic Regression (LR), Decision
Tree (DT), and Random Forest (RF). The testing results show that the Decision Tree algorithm has
a higher accuracy, 82.00%, than the Logistic Regression and Random Forest techniques.

Kumar (2021) developed a model for predicting whether or not a person will be approved for a loan. The main goal
of this work was to determine whether a person could obtain a loan by analyzing the data with decision tree
classifiers, which provided a forecast accuracy of 76.40%. Datasets were acquired from Kaggle and separated into two
categories: existing customers and new customers. Every new applicant's information serves as test data.

Madane et al. (2019) constructed a model using the decision tree induction technique and attempted to analyze
the credit scores of mortgage loans and applicant requirements. The credit score plays a role in loan approval. They built a
model to predict if loan sanctioning is safe or not, and it was discovered that most low-income applicants are approved
for loans because they are more likely to repay them. The dataset was gathered from an online source. The model they developed
for bankers would help them identify the trustworthy individuals who have applied for a loan,
boosting the likelihood that loans are repaid on time.

Shrishti et al. (2018) proposed a robust machine learning model to predict loan approval. This model's
major goal was to approve loans to applicants in a short amount of time. They used three machine learning algorithms:
Logistic Regression, Decision Tree, and Random Forest. After reviewing the data sets for the various models, it was
discovered that the Random Forest algorithm had the highest accuracy of all the models.

Karthiban (2019) presented a review of machine learning classification strategies for bank loan approval.
Almost all applications in today's world are influenced and controlled by machine learning algorithms. Despite the
fact that a number of researchers are working on various machine learning algorithms, the algorithms' performance
and precision remain a difficulty. The authors obtained data from a bank. Their research looked at the performance of various
classification algorithms in terms of precision, recall, and F-measure in order to predict whether or not a bank loan will
be approved. Gradient Boosting outperformed all other classifiers in terms of classification metrics (accuracy,
precision, recall, and F1-score), achieving 98.06% accuracy and an F1-score of 99.20%, as shown in Table 1.

Table 1. Bank Loan Approval prediction model performance analysis

| Authors (year) | Dataset Collection (samples) | Applied Models | Measures (Proposed model) |
| --- | --- | --- | --- |
| Mohammad et al. (2020) | Kaggle (1500 cases) | Logistic Regression [proposed model] | Accuracy: 81.00% |
| Pidikiti Supriya et al. (2019) | Previous customers of a bank (1000 cases; 7 numerical and 6 categorical attributes) | Logistic Regression, Decision Tree [proposed model], and Gradient Boosting | Accuracy: 82.00% |
| Nitesh Pandey et al. (2021) | Past clients of different banks | Logistic Regression, Decision Tree, Support Vector Machine (SVM) [proposed model], and Random Forest | Accuracy: 79.67%; Precision: 46.00%; Recall: 95.00%; F1-Score: 61.00% |
| Ndayisenga et al. (2021) | Bank of Kigali | Gradient Boosting [proposed model], XGBoosting, Decision Trees, Random Forest, Logistic Regression | Accuracy: 80.40%; Precision: 82.59%; Recall: 80.25%; F1-Score: 81.00% |
| Tejaswini et al. (2020) | Financial institution | Logistic Regression (LR), Decision Tree (DT) [proposed model], and Random Forest (RF) | Accuracy: 82.00%; Precision: 83.00%; Recall: 82.00%; F1-Score: 75.00% |
| Kumar, Sourav et al. (2021) | Kaggle data source | Decision Tree (DT) [proposed model] | Accuracy: 76.40%; Precision: 59.00%; Recall: 79.83% |
| Nikhil Madane et al. (2019) | Online | Decision Tree (DT) [proposed model] | Accuracy: 85% |
| Shrishti et al. (2018) | Kaggle | Logistic Regression, Decision Tree, and Random Forest [proposed model] | Accuracy: 89.22% |
| Karthiban, R. et al. (2019) | Bank | Logistic Regression, Decision Tree, Naive Bayes, Random Forest, Deep Learning, Gradient Boosting [proposed model], Generalized Linear Model | Accuracy: 98.06%; Precision: 99.10%; Recall: 99.30%; F1-Score: 99.20% |


3. Proposed Methodology
Data collection is the first step in the proposed methodology, followed by data pre-processing. Using the
standard hold-out approach, the selected classifiers, namely XGBoost, AdaBoost, LightGBM, Random Forest, Decision
Tree, and K-Nearest Neighbor, are then trained and tested on the provided dataset. To establish the most effective bank
loan eligibility prediction method, the results are computed and analyzed. Figure 1 depicts an overview of the
proposed strategy.

A. Dataset Collection
In this paper, the dataset was collected from the Kaggle website. It has 10,128 instances,
with 23 predictive attributes and 1 class attribute. Bank loan eligibility is
predicted from these attributes, which describe the applicant. The 23 predictive attributes are
associated mainly with a person's age, gender, educational background, ownership, properties,
financial status, types of income source, credit card information, etc., and the class attribute is bank loan
eligibility.

B. Dataset pre-processing
Dataset pre-processing has been done by using feature extraction, data cleaning, missing values handling, and
categorical variables transformation.
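
Two of these steps, missing value handling and categorical variable transformation, can be sketched in plain Python as follows. The paper does not state which imputation or encoding strategy was used, so the median and most-frequent-value fill and the one-hot encoding below are assumptions, and the toy records are hypothetical (only the column names come from Table 2).

```python
from statistics import median, mode

# Hypothetical toy records; None marks a missing value.
records = [
    {"Customer_Age": 45, "Gender": "M", "Education": "Graduate"},
    {"Customer_Age": None, "Gender": "F", "Education": "Graduate"},
    {"Customer_Age": 31, "Gender": None, "Education": "Under Graduate"},
    {"Customer_Age": 52, "Gender": "F", "Education": "Graduate"},
]


def impute(rows, column, numeric):
    """Fill missing values with the median (numeric) or the most common value."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = median(present) if numeric else mode(present)
    for r in rows:
        if r[column] is None:
            r[column] = fill


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for category in categories:
            r[f"{column}_{category}"] = int(value == category)


# Assumed order: impute first, so that encoding never sees a missing value.
impute(records, "Customer_Age", numeric=True)
impute(records, "Gender", numeric=False)
for col in ("Gender", "Education"):
    one_hot(records, col)
```

After these calls every record holds only numeric fields, which is the form most of the classifiers discussed here expect.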

C. Validation process
Selecting the appropriate validation process for a particular dataset is crucial. The hold-out validation process is one
of the effective methods for getting appropriate results [12]. We applied hold-out validation, using
70% of the data for training and 30% for testing. With this validation process, we assessed performance using the
confusion matrix and computed accuracy, precision, recall, area under the curve (AUC), and F1-score for every
machine learning technique.
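
A 70/30 hold-out split of this kind can be sketched in a few lines of Python. The random shuffle and the fixed seed are assumptions made for reproducibility of the illustration; the paper does not say whether its split was random, stratified, or seeded, and the integer list below merely stands in for the applicant records.

```python
import random


def holdout_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 10,128 applicant records described in the paper.
rows = list(range(10128))
train_rows, test_rows = holdout_split(rows)
```

With 10,128 records this yields 7,090 training rows and 3,038 test rows. The test rows must not be used during training, otherwise the reported metrics would be optimistic.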

Figure 1. An overview of the study (Bank loan eligibility prediction)

4. Dataset Descriptions and Pre-processing

The bank loan prediction dataset comes from a Kaggle competition and includes applicants of various ages
and genders. The data set contains twenty-three attributes, such as education, marital status, income, assets, and so on,
as shown in Table 2. There are a total of 10,128 applicant records, with attribute values in categorical
and numerical form. We handle missing values and normalize the data through pre-processing and feature
engineering so that it can be used by an ML algorithm. The dataset is separated into two parts: training and testing.
The model is trained using machine learning methods and then makes predictions on the test data, as detailed in the
following section.

Table 2. Some of the dataset attribute names and information

| Variable Name | Description of Variable | Data Type |
| --- | --- | --- |
| Loan ID CLIENTNUM | Unique Loan ID | Integer |
| Customer_Age | Age of customer | Integer |
| Gender | Male/Female | Character |
| Dependents | Number of dependents | Integer |
| Married | Applicant married (Y/N) | Character |
| Education | Graduate/Under Graduate | String |
| Income_Category | Income type | String |
| Card_Category | Card type | String |
| Self_Employed | Self-employed (Y/N) | Character |
| ApplicantIncome | Applicant income | Integer |
| CoapplicantIncome | Coapplicant income | Integer |
| Loan_Amount | Loan amount in thousands | Integer |
| Loan_Amount_Term | Term of loan in months | Integer |
| Credit_History | Credit history meets guidelines | Integer |
| Property_Area | Urban/Semi Urban/Rural | String |
| Loan_Status | Loan approved (Y/N) | String |

5. Result Analysis
Table 3 shows the average performance of the selected machine learning classifiers: XGBoost, LightGBM,
AdaBoost, Decision Tree, Random Forest, and KNN. The results of the models are also shown in Figures 2
and 3. To assess the models' performance, we report Accuracy, Precision, Recall (Sensitivity), Specificity, F1-score, and AUC
in Table 3.

Table 3. Values of different measures for different machine learning classifiers for predicting the Bank loan
eligibility

| Model | Accuracy | Sensitivity | Specificity | Precision | F1-score | AUC |
| --- | --- | --- | --- | --- | --- | --- |
| XGBoost | 0.9180 | 0.9223 | 0.4456 | 0.9969 | 0.9582 | 0.74 |
| AdaBoost | 0.9187 | 0.9217 | 0.4111 | 0.9976 | 0.9581 | 0.74 |
| LightGBM | 0.9189 | 0.9214 | 0.5316 | 0.9990 | 0.9586 | 0.75 |
| Random forest | 0.9188 | 0.9205 | 0.75 | 1.0 | 0.9586 | 0.70 |
| Decision tree | 0.8497 | 0.9252 | 0.1244 | 0.9088 | 0.9169 | 0.53 |
| KNN | 0.9167 | 0.9206 | 0.1400 | 0.9975 | 0.9575 | 0.54 |
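
As a reading aid for Table 3, the sketch below shows how these measures follow from the four confusion-matrix counts, using the standard definitions. The counts passed to `classification_metrics` are hypothetical and not taken from the paper; the only values from the paper are the LightGBM precision and sensitivity, which are used to check that the reported F1-score is their harmonic mean.

```python
def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, sensitivity, specificity, precision and F1 from counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": precision,
        "f1": f1_from(precision, sensitivity),
    }


# Hypothetical confusion-matrix counts, for illustration only.
example = classification_metrics(tp=90, fp=10, tn=40, fn=10)

# Consistency check on the LightGBM row of Table 3
# (precision 0.9990, sensitivity 0.9214, reported F1-score 0.9586).
lightgbm_f1 = f1_from(0.9990, 0.9214)
```

The function assumes every denominator is non-zero; a classifier that never predicts the positive class would need a guard around the precision term.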

Figure 2 shows that LightGBM has the highest accuracy score of 91.89%, while Decision Tree has the lowest accuracy
score of 84.97%. Random Forest also fared well, with a score of 91.88%. The results for XGBoost, AdaBoost,
and KNN are 91.80%, 91.87%, and 91.67%, respectively.

[Bar chart: accuracy (%) by machine learning model. LightGBM 91.89, AdaBoost 91.87, Random Forest 91.88, XGBoost 91.80, KNN 91.67, Decision Tree 84.97]

Figure 2. Accuracy analysis for predicting the Bank Loan eligibility using machine learning models

Accuracy, however, cannot be the only parameter used to assess model performance. The AUC
value, which measures a model's ability to distinguish between classes, is therefore an important metric for evaluating the
model's performance. It is the area under the ROC curve, a probability curve that plots the True Positive Rate against
the False Positive Rate at different thresholds. The AUC should be as high as possible. Its values range from 0 to 1,
with 0 representing a fully inaccurate test and 1 representing a completely accurate test. An AUC of 0.5 indicates no
discrimination (i.e., no ability to distinguish a customer's eligibility for a loan based on the test), 0.7 to 0.8 indicates
acceptable performance, 0.8 to 0.9 indicates excellent results, and more than 0.9 indicates outstanding performance.
For the above machine learning models, we produced AUC graphs and mean results using
hold-out validation, shown in Figures 3 to 6.
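
One way to see what the AUC measures is to compute it directly as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, which is equivalent to the area under the ROC curve. The pairwise sketch below is illustrative only: the labels and scores are hypothetical, and this is not necessarily how the AUC values in this paper were computed.

```python
def auc_score(labels, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = eligible) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3]
auc = auc_score(labels, scores)
```

Here five of the six positive/negative pairs are ranked correctly, so the AUC is 5/6. The pairwise loop is quadratic in the number of examples, so a rank-based implementation would be preferable on a dataset of this paper's size.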

Figures 3 to 6. Area under curve (AUC) output graphs: Figure 3a LightGBM, Figure 3b AdaBoost, Figure 3c XGBoost, Figure 4 Random Forest, Figure 5 KNN, and Figure 6 Decision Tree

As can be seen in Figure 7, LightGBM outperformed the other machine learning classifiers in terms of AUC, which was
75 percent. XGBoost and AdaBoost both did well, scoring 74 percent, which is quite close to LightGBM's score.
Furthermore, AUCs of 70%, 54%, and 53% were reached using Random Forest, KNN, and Decision Tree, respectively.
Overall, LightGBM outperformed the other machine learning classifiers in terms of both Accuracy and
AUC.

[Bar chart: AUC (%) by machine learning model. LightGBM 75, AdaBoost 74, XGBoost 74, Random Forest 70, KNN 54, Decision Tree 53]

Figure 7. Area under curve (AUC) analysis for predicting the Bank Loan eligibility using machine learning models

6. Conclusion and Future Scope

Today's fast-growing IT sector requires the development of new technology and the updating of existing technology
to reduce human intervention and boost job productivity. This model can be used by the banking system
or by anyone who wants to apply for a loan. Based on the examination of the data, it can help reduce fraud
committed during the loan approval process. Time is valuable to everyone, and with this approach both the bank's
processing time and the applicant's waiting time will be reduced. Cleaning and processing of data, imputation of missing values,
experimental analysis of the data set, model construction, and testing on test data are all steps in the prediction process.
The best-case accuracy attained on the original data set is 0.9189. After analyzing the data, the following
conclusions were drawn: applicants with the lowest credit scores will be denied a loan since they have a higher
risk of defaulting on the loan. Most of the time, applicants with a high income who request a smaller loan are more
likely to be approved, which makes sense because they are more likely to repay their debts. Other factors, such as
gender and marital status, do not appear to be considered by the corporation. This prediction module can be enhanced
and integrated in the future. The system is trained on previous training data, but in the future the software could be
changed so that it accepts new test data, adds it to the training data, and predicts
accordingly.

7. References
Gupta, Anshika, et al. "Bank Loan Prediction System using Machine Learning." 2020 9th International Conference
System Modeling and Advancement in Research Trends (SMART). IEEE, 2020.
Kumar, Arun, Garg Ishan, and Kaur Sanmeet. "Loan approval prediction based on machine learning approach." IOSR
J. Comput. Eng 18.3, 18-21, 2016.
M. A. Sheikh, A. K. Goel and T. Kumar, "An Approach for Prediction of Loan Approval using Machine Learning
Algorithm," 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC),
pp. 490-494, 2020.
Supriya, P. Usha, et al. "Loan Prediction by using Machine Learning Models." 2019.
Pandey, Nitesh, et al. Loan Approval Prediction using Machine Learning Algorithms Approach. 2021 [Ebook]. Retrieved from
https://ijirt.org/master/publishedpaper/IJIRT151769_PAPER.pdf
Ndayisenga, Theoneste. Bank Loan Approval Prediction Using Machine Learning Techniques. Diss. 2021.
Tejaswini, J., et al. "Accurate loan approval prediction based on machine learning approach." Journal of Engineering
Science, vol. 11, no. 4, pp. 523-532, 2020.
Kumar, Sourav. "Loan Prediction System." 2021.
Nikhil Madane and Siddharth Nanda, "Loan Prediction using Decision Tree," Journal of the Gujrat Research History, Volume
21, Issue 14s, December 2019.
Shrishti Srivastava, Ayush Garg, Arpit Sehgal, and Ashok Kumar, "Analysis and Comparison of Loan Sanction Prediction
Model using Python," International Journal of Computer Science Engineering and Information Technology
Research (IJCSEITR), Vol and issue 2, 2018.
Karthiban, R., M. Ambika and K. E. Kannammal, "A Review on Machine Learning Classification Technique for Bank
Loan Approval," 2019 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-6,
2019, doi: 10.1109/ICCCI.2019.8822014.
Yadav, S. and S. Shukla, "Analysis of k-Fold Cross-Validation over Hold-Out Validation on Colossal Datasets for
Quality Classification," 2016 IEEE 6th International Conference on Advanced Computing (IACC), pp. 78-83,
2016.
Mandrekar, J., "Receiver Operating Characteristic Curve in Diagnostic Test Assessment," Journal of Thoracic
Oncology, vol. 5, no. 9, pp. 1315-1316, 2010.

Biographies
Miraz Al Mamun is a recent graduate of the University of South Dakota (USD) and currently serves as an Applications
Support Analyst at Sanford Health under practical training. Sanford Health is a non-profit research organization
affiliated with the University of South Dakota. At USD he studied for a Master of Science in Business Analytics. He
earned a bachelor's degree in business administration from North South University, Bangladesh. He also worked as a
Graduate Research Assistant at the Beacom School of Business from 2019 to 2021, where he gained experience
working on different research projects on data analytics using machine learning methods.
Currently, he is involved in several research projects utilizing machine learning models for business process automation, and
his research interests include financial fraud detection, customer privacy, customer churn modeling, customer
sentiment analysis, and business process automation.

Afia Farjana is a current graduate student of Computer Science at the University of South Dakota. She is working as
a Research Assistant in the Department of Computer Science and is involved in different research projects utilizing
machine learning algorithms. She completed a Bachelor of Science in Computer Science at American International
University of Bangladesh (AIUB), Bangladesh. She has worked on sentiment analysis, in the form of a survey on machine
learning for emotion and mental health detection, analysis, and visualization using COVID-19 social media data. In
addition, she is currently doing her thesis on federated learning for lung sound analysis. Her research interests
include data privacy, analysis of customer sentiment in the business sector, image processing, and pattern recognition.

Muntasir Mamun is a graduate student in the Computer Science Department of the University of South Dakota. Currently,
he is working as a Research and Teaching Assistant at the University of South Dakota. He completed his bachelor's degree in
Electrical and Electronic Engineering at American International University of Bangladesh. He completed his
research thesis and work on COVID-19 screening from cough sounds using machine learning and deep learning methods.
This research work has been accepted at a Springer Nature conference, and another review work has been accepted in the PeerJ
journal (impact factor: 2.98). Currently, he is doing research on lung cancer and heart disease prediction
models using ensemble learning techniques. Apart from that, he has other research publications in IEEE Xplore
about nanotechnology.

No ratings yet
Pedagogy of Experience-Field Trips
17 pages
Syllabus in MAS 101
No ratings yet
Syllabus in MAS 101
6 pages
Derivatives and Risk Management
0% (1)
Derivatives and Risk Management
82 pages
Track Location 1
No ratings yet
Track Location 1
6 pages
Stock Market Prediction Using Reinforcement Learning With Sentiment Analysis
No ratings yet
Stock Market Prediction Using Reinforcement Learning With Sentiment Analysis
20 pages
Track Location: Project Guided By: Project Done by
No ratings yet
Track Location: Project Guided By: Project Done by
5 pages
Python Predictive Modeling
No ratings yet
Python Predictive Modeling
24 pages
Developing Machine Learning Applications With TensorFlow
No ratings yet
Developing Machine Learning Applications With TensorFlow
22 pages
Analysis and Linear Algebra For Finance Part I
No ratings yet
Analysis and Linear Algebra For Finance Part I
127 pages
Growth - Fixed Mindset - Short Video Analysis
No ratings yet
Growth - Fixed Mindset - Short Video Analysis
2 pages
XL Wings
No ratings yet
XL Wings
214 pages
Predicting Credit Risk For Unsecured Lending
No ratings yet
Predicting Credit Risk For Unsecured Lending
9 pages
Research RRL
No ratings yet
Research RRL
2 pages
Forecast Time Series With R Language
No ratings yet
Forecast Time Series With R Language
98 pages
How To Help Players Watch Film
No ratings yet
How To Help Players Watch Film
7 pages
Importance of Outcome Based Education (OBE) To Advance Educational Quality and Enhance Global Mobility
No ratings yet
Importance of Outcome Based Education (OBE) To Advance Educational Quality and Enhance Global Mobility
10 pages
Sudoku Solver PDF
No ratings yet
Sudoku Solver PDF
16 pages
Animatic Storyboard Project
No ratings yet
Animatic Storyboard Project
9 pages
106 - Machine Learning and Credit Risk Modelling
100% (1)
106 - Machine Learning and Credit Risk Modelling
8 pages
Curriculum (Reflection)
50% (2)
Curriculum (Reflection)
1 page
Rating Models and Its Applications, Setting Credit Limits
No ratings yet
Rating Models and Its Applications, Setting Credit Limits
16 pages
L1 - Machine Learning For Finance
100% (1)
L1 - Machine Learning For Finance
131 pages
Vasicek Model
No ratings yet
Vasicek Model
4 pages
ML Canvas
No ratings yet
ML Canvas
1 page
Lecture 3 EdgeDetection
No ratings yet
Lecture 3 EdgeDetection
52 pages
Accident Investigation Root Cause Analysis
No ratings yet
Accident Investigation Root Cause Analysis
4 pages
Episode 7 Name of Website Topics
No ratings yet
Episode 7 Name of Website Topics
2 pages
IJERT Data Analysis Using Python
No ratings yet
IJERT Data Analysis Using Python
6 pages
A Review On Credit Card Default Modelling Using Data Science
No ratings yet
A Review On Credit Card Default Modelling Using Data Science
7 pages
Banking Credit Risk Analysis With Naive Bayes Approach and Cox Proportional Hazard
No ratings yet
Banking Credit Risk Analysis With Naive Bayes Approach and Cox Proportional Hazard
6 pages
FXCM
No ratings yet
FXCM
1 page
Econometric Methods With Applications in Business
No ratings yet
Econometric Methods With Applications in Business
9 pages
Python Markov Decision Process Toolbox Documentation: Release 4.0-b4
No ratings yet
Python Markov Decision Process Toolbox Documentation: Release 4.0-b4
44 pages
TOGAF ADM Software Tool
No ratings yet
TOGAF ADM Software Tool
15 pages
Lab Experiment Name: - : Rubric
No ratings yet
Lab Experiment Name: - : Rubric
1 page
Electoral College Lesson Plan
No ratings yet
Electoral College Lesson Plan
3 pages
CBM Basel III - Group 7
No ratings yet
CBM Basel III - Group 7
16 pages
Long Term Wave Statistics
No ratings yet
Long Term Wave Statistics
8 pages
Anomaly Detection in Images CIFAR-10
No ratings yet
Anomaly Detection in Images CIFAR-10
9 pages
Literature Survey
No ratings yet
Literature Survey
3 pages
BSF 4230 - Advanced Portfolio Management - April 2022
No ratings yet
BSF 4230 - Advanced Portfolio Management - April 2022
8 pages
Hult H. - Lindskog F. - Mathematical Modeling and Statistical Methods For Risk Management (2007)
No ratings yet
Hult H. - Lindskog F. - Mathematical Modeling and Statistical Methods For Risk Management (2007)
108 pages
Forecasting Default With The KMV-Merton Model
No ratings yet
Forecasting Default With The KMV-Merton Model
35 pages
RPH Mat THN 3 DLP 2.3.1
No ratings yet
RPH Mat THN 3 DLP 2.3.1
2 pages
Credit Concentration Risk
No ratings yet
Credit Concentration Risk
15 pages
Enterprise Credit Risk Evaluation Based On Neural Network Algorithm
No ratings yet
Enterprise Credit Risk Evaluation Based On Neural Network Algorithm
8 pages
C Boe Taxes and Investing
No ratings yet
C Boe Taxes and Investing
27 pages
Data Visualization Ebook
No ratings yet
Data Visualization Ebook
15 pages
12 Financial Modelling & Valuation THEMED TRACK
No ratings yet
12 Financial Modelling & Valuation THEMED TRACK
6 pages
Pattern Recognition and Machine Learning Errata and Additional Comments
0% (1)
Pattern Recognition and Machine Learning Errata and Additional Comments
7 pages
Financial Modeling Case Study (Enercon)
No ratings yet
Financial Modeling Case Study (Enercon)
2 pages
PERDEV DLL 2nd Q SEPT.9-13,2019
100% (3)
PERDEV DLL 2nd Q SEPT.9-13,2019
4 pages
Machine Learning by Joerg Kienitz
No ratings yet
Machine Learning by Joerg Kienitz
5 pages
pr2 Chapter3
No ratings yet
pr2 Chapter3
8 pages
MFE Formulas
No ratings yet
MFE Formulas
7 pages
SAgarwal CV April 2011
No ratings yet
SAgarwal CV April 2011
2 pages
CV Kulbir Minhas
No ratings yet
CV Kulbir Minhas
3 pages

© IEOM Society International 1423

Married, Dependents, Education, Self-Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan Amount

Table 1. Bank Loan Approval prediction model performance analysis

Authors (year) Dataset Collection Applied Models Measures (Proposed model)

Mohammad et al. (2020) Kaggle (1500 cases) Logistic regression Accuracy:

Ndayisenga et al. (2021) Bank of Kigali Gradient Accuracy:

TejaswiniIn et al. Financial Institution Logistic Regression (LR), Accuracy:

KUMAR, SOURAV et Kaggle data source Decision Tree (DT) Accuracy:

NIKHIL MADANE et al. Online Decision Tree (DT) Accuracy: 85%

Shrishti et al. Kaggle Logistic Regression, Accuracy:

Karthiban, R. et al. (2019) Bank Logistic Regression, Accuracy:

Figure 1. An overview of the study (Bank loan eligibility prediction)

4. Dataset Descriptions and Pre-processing

Table 2. Selected dataset attribute names and information

Variable Name Description of Variable Data Type

Loan ID CLIENTNUM Unique Loan ID Integer

Education Graduate/ Under Graduate String

Income_Category Income type String

Loan_Amount Loan amount in thousands Integer

Model          Accuracy  Sensitivity  Specificity  Precision  F1-score  AUC
XGBoost        0.9180    0.9223       0.4456       0.9969     0.9582    0.74
AdaBoost       0.9187    0.9217       0.4111       0.9976     0.9581    0.74
LightGBM       0.9189    0.9214       0.5316       0.9990     0.9586    0.75
Random forest  0.9188    0.9205       0.75         1.0        0.9586    0.70
Decision tree  0.8497    0.9252       0.1244       0.9088     0.9169    0.53
KNN            0.9167    0.9206       0.1400       0.9975     0.9575    0.54
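
The columns in the table above are not independent: under the standard definitions, the F1-score is the harmonic mean of precision and sensitivity (recall). The following is a minimal sketch, assuming those standard definitions, that recomputes F1 from the reported precision and sensitivity and shows how each metric is derived from confusion-matrix counts. The names `reported`, `f1_from`, and `confusion_metrics` are illustrative and do not come from the paper; the paper's own evaluation code is not shown in this text.

```python
# Reported (sensitivity, precision, F1-score) per model, copied from the results table.
reported = {
    "XGBoost":       (0.9223, 0.9969, 0.9582),
    "AdaBoost":      (0.9217, 0.9976, 0.9581),
    "LightGBM":      (0.9214, 0.9990, 0.9586),
    "Random forest": (0.9205, 1.0,    0.9586),
    "Decision tree": (0.9252, 0.9088, 0.9169),
    "KNN":           (0.9206, 0.9975, 0.9575),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (hypothetical helper for illustration).
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # true positive rate, also called recall
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": precision,
        "f1": f1_from(precision, sensitivity),
    }


if __name__ == "__main__":
    # Compare the recomputed F1 with the value reported in the table.
    for model, (sens, prec, f1_reported) in reported.items():
        f1 = f1_from(prec, sens)
        print(f"{model:14s} recomputed F1 = {f1:.4f}, reported = {f1_reported:.4f}")
```

Because the table values are rounded to four decimals, the recomputed F1 can differ from the reported one in the last digit; agreement within about 0.0001 is what this check should be expected to show.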

Figure 3a. LightGBM   Figure 3b. AdaBoost   Figure 3c. XGBoost

Figure 4. Random Forest Figure 5. KNN Figure 6. Decision Tree

6. Conclusion and Future Scope

KUMAR, SOURAV. "LOAN PREDICTION SYSTEM." 2021.
