Credit Scoring Model Implementation in a Microfinance Context

Ajla Terko, Emir Zunic, Dzenana Donko
Faculty of Electrical Engineering, University of Sarajevo
Sarajevo, Bosnia and Herzegovina
aterko1@etf.unsa.ba, emir.zunic@etf.unsa.ba, ddonko@etf.unsa.ba
Abstract-The purpose of the credit scoring process is the classification of a loan as default or non-default, aiming to reduce the risk for financial institutions. This paper illustrates the implementation of a credit scoring model using boosting techniques. Specifically, the proposed solution is implemented using the XGBoost algorithm, and the roles of hyperparameter tuning and feature selection in result optimization are discussed. The data used for obtaining performance scores is real-world data provided by a microfinance institution based in Bosnia and Herzegovina. The results suggest that significant optimization of XGBoost can be achieved, yet the model fails to outperform the approaches typically recommended for solving the credit scoring problem. Given that, it is suggested that although boosting techniques are increasingly being relied upon, it is irresponsible to make a decision without understanding the specificity of the data and questioning whether other techniques are more suitable.

Keywords-credit scoring, classification algorithms, boosting, risk analysis, microeconomics

I. INTRODUCTION

Throughout history, the decision of whether a person shall be granted a loan by a bank has been subjective. Such a decision-making process was enhanced in the nineties by the introduction of statistical techniques and the widespread adoption of credit scoring models [1].

Credit scoring is one of the well-known risk model types used for the classification, i.e., scoring, of the credit risk of individuals or corporations. The model's output is an assessment of the relative likelihood of a specific credit event occurring for the given observable inputs, i.e., the likelihood of the credit being repaid in a satisfactory way. In its simplest form, a credit score can be a weighted numerical sum of the given inputs predicting creditworthiness. Such models can be used for a number of activities, including but not limited to the approval of new credits, the assessment of risk regarding ongoing credits, and the prediction of expected credit loss.

The typical workflow of a credit scoring model can be illustrated as determining whether a person qualifies for a loan, determining the interest rate or the credit limit which will be offered to the client, and in general, determining a client's profitability, all based on the obtained credit score. Naturally, that makes the credit scoring process an important data science topic for the banking industry, especially for being time-saving and easily comprehensible. In general, the use of a credit scoring model assures that clients applying for a loan will be assessed using a consistent, accurate and fair form of credit risk assessment. Moreover, that implies such a process can be considered objective, ensuring no prejudices.

The purpose of this paper is to illustrate how boosting algorithms can be applied to the problem of credit scoring, based on data provided by a microfinance institution based in Bosnia and Herzegovina. The proposed solution should support the decision-making process of approving a loan request. Model implementation is followed by equally important feature selection and hyperparameter tuning, as well as a brief comparison of the main proposed solution with similar boosting algorithms.

In order to achieve its purpose, the paper is structured as follows: firstly, it gives a general overview of relevant related research papers. Such analysis is followed by a description of the research methodology, as well as the pre-processing methods that were applied. Finally, the data mining algorithms and their performances are illustrated, and the paper is concluded.

II. RELATED WORK

A great amount of effort has been put into researching the topic of credit scoring and financial risk assessment. A brief overview of several analyzed research papers is given below.

A study [2] gives a brief overview of currently available and mostly used statistical-based and artificial-intelligence-based methods for solving the problem of credit scoring. Logistic regression is pointed out as a superior method for predicting defaults in practice. As an alternative, the use of neural networks is mentioned as well. It is highlighted that although the use of neural networks results in the best overall accuracy rate, the logistic regression approach stands out for its stability and the ease of understanding of both the process and the outcome.

Similarly, paper [3] offers a detailed comparison of the performances of different methods (e.g. neural networks, genetic programming, support vector machines, k-nearest neighbors, etc.) used in credit scoring systems based on the literature. Average accuracy varies between 69.0% and 84.7% of correctly classified instances based on approximately 30 collected sources. One recorded performance stands out, being 92.7% of correctly classified instances while using the neural networks method.
network (ANN), decision tree, and logistic regression. It suggests that the ANN-linear model outperforms all other models considering the error rate metrics. Moreover, this work outlines the use of nearest neighbor imputation algorithms for handling missing data.

III. RESEARCH METHODOLOGY

A. Description of a Dataset

The dataset used for the development of the appropriate data mining model for credit scoring was provided by a microfinance institution based in Bosnia and Herzegovina. This real-world data was collected based on the approved loan requests in the time period between 2013 and 2016. It includes 75386 records with 49 attributes covering clients' personal data (demographic, financial, behavioral, etc. characteristics), as well as data related to clients' credit history. Based on these variables, the economic profile of a client can be built and used within a predictive model. The attributes can be grouped into the following categories:

- Client's household data, which includes marital status, number of children, number of (non)adults, number of dependent members of the household, number of pets, etc.
- Client's financial data, which includes the total incomes of the household, as well as the total expenses of the household, etc.
- Client's credit history data, which includes the total repayment in the client's previous credit repayment cycle, etc.
- Data that describes the specific approved credit request, which includes its unique id, date of credit approval, its purpose, principal credit amount, number of installments, installment amount, etc.

The target variable is defined by the number of days a credit is in arrears. It is labeled as DEFAULT_01, where the positive value 1 denotes the credit being default, while the negative value denotes the credit not being default. The dataset includes data related to both active and fully repaid credits.

B. Data Exploration and Pre-processing

Data exploration and pre-processing are performed in order to develop a credit scoring model which will rely on relevant attributes and serve its purpose within an optimal running time. That implies feature selection can be considered one of the crucial steps within the process of credit scoring model development.

Before other data transformations were considered, the problem of missing values was handled. As Fig. 1 suggests, there were 8 features that exceeded a predefined threshold. Features with a fraction of missing values above the defined threshold of 40% were removed.

Fig. 1. Fraction of missing values histogram

When it comes to the other categories of data, it was observed that the number of features with missing rows was low, i.e. it is possible to assume that a client's personal data and credit request data were mandatory. The problem of missing data was handled by replacing string values with the fixed value "UNKNOWN", while for numerical attributes the median was used.

Fig. 2. KNIME transformations for handling missing data (CSV Reader, Cell Splitter, Column Filter, Column Rename, String To Number, Math Formula)

Figure 2 shows the set of transformations performed in the analytics platform KNIME. Besides actual missing value
handling, the date of birth attribute was transformed into the client's age. Finally, features that uniquely identify a credit request were filtered out to avoid overfitting based on row unique identifiers.

In order to reach full potential while training prediction models, discretization of continuous features was performed, i.e. continuous values of features were converted to discrete intervals.

Considering the limitations of numerous algorithms when it comes to handling categorical features, the ordinal encoding of categorical features was performed as well. The approach used for encoding categorical values was the "label encoding" technique provided by Pandas.
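The missing-value handling and the Pandas label encoding described above can be sketched as follows; the column names here are illustrative, not taken from the institution's dataset:

```python
import pandas as pd

# Illustrative records with gaps in a string and a numeric column
df = pd.DataFrame({
    "EDU_STATUS": ["HIGH_SCHOOL", None, "UNIVERSITY", "HIGH_SCHOOL"],
    "TOTAL_INCOME": [1200.0, 800.0, None, 1000.0],
})

# String attributes: replace missing values with the fixed value "UNKNOWN"
str_cols = df.select_dtypes(include="object").columns
df[str_cols] = df[str_cols].fillna("UNKNOWN")

# Numerical attributes: replace missing values with the column median
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Label encoding of categorical values using Pandas category codes
df["EDU_STATUS_ENC"] = df["EDU_STATUS"].astype("category").cat.codes
```

Pandas assigns category codes in alphabetical order of the category values, so the fixed "UNKNOWN" value is encoded like any other category.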
The final core concept applied within the pre-processing stage was feature selection. Its purpose was defined as selecting features in such a manner that no irrelevant or partially relevant features are incorporated in model training, so there is no negative impact on model performance. Features were selected based on four methods:

- Collinear features: in order to improve generalization performance, collinear features are identified and removed. A number of attributes related to residence type, credit purpose, and household data were found to be highly correlated.
- Zero importance features: identifying features with zero importance according to a machine learning model, i.e. features that do not contribute to generating a prediction.
- Single unique value features: identifying features with zero variance. As was the case with the zero importance method, client type was identified by this method.
- Low importance features: identifying features with low cumulative importance, as illustrated in Fig. 4.

Fig. 3. The plot of the top 15 features and their corresponding normalized importance values

Figure 5 gives the final list of features used by the learner and predictor of the chosen model, with their corresponding importance values as generated by FeatureSelector.

Fig. 4. Cumulative importance versus the number of features

Fig. 5. Features used by the learner and predictor with their corresponding importance values (including INDICATOR_1, NUM_OF_INSTALLMENTS, CREDIT_PRODUCT, CREDIT_AMOUNT, AGE, EDU_STATUS, NATIONALITY, CREDIT_PRODUCT_TYPE, NO_GUARANTEE, REGISTERED_BUSINESS)

Fig. 6. Comparison of three methods: single, bagging and boosting [7]
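Two of the selection methods above, single-unique-value features and collinear features, can be sketched with plain Pandas; the correlation threshold and the feature names are illustrative assumptions, not values from the paper's FeatureSelector run:

```python
import pandas as pd

# Toy feature matrix; names and values are illustrative only
df = pd.DataFrame({
    "CREDIT_AMOUNT":      [1000, 2000, 1500, 3000],
    "INSTALLMENT_AMOUNT": [100, 200, 150, 300],  # perfectly collinear with CREDIT_AMOUNT
    "CLIENT_TYPE":        [1, 1, 1, 1],          # single unique value (zero variance)
    "AGE":                [25, 60, 33, 41],
})

# Single unique value features: zero variance, carry no information
single_unique = [c for c in df.columns if df[c].nunique() == 1]

# Collinear features: drop one feature of each pair above a correlation threshold
corr = df.drop(columns=single_unique).corr().abs()
threshold = 0.98  # assumed cutoff for this sketch
to_drop = set()
cols = corr.columns
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if corr.loc[a, b] > threshold:
            to_drop.add(b)  # keep the first feature of the pair

selected = [c for c in df.columns if c not in single_unique and c not in to_drop]
```

The zero-importance and low-importance methods additionally require a fitted model to supply importance values, which is why FeatureSelector bundles all four behind one interface.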
In general, ensemble learning refers to techniques (bagging, boosting, etc.) that combine the outputs of multiple learners, i.e. base learners, in order to improve the accuracy of a single combined prediction. What this means, in general, is that a number of similar or dissimilar algorithms are combined with the goal of delivering superior results.

A boosting algorithm, being one of the ensemble approaches, is understood as a way of improving the performance and accuracy of a model by combining a number of weak learners to form a strong one. This is achieved by predictors learning from the mistakes made by previous predictors. This points out one of the main differences between bagging and boosting, as Fig. 6 suggests: while iterations run in parallel in the case of bagging, each boosting iteration focuses on the instances which were wrongly classified by prior iterations.

The goal is to implement a credit scoring predictive model in order to predict whether a failure to repay a loan is going to occur. Given its reputation in data mining competitions, the proposed solution is implemented using the XGBoost algorithm. XGBoost, which stands for eXtreme Gradient Boosting, implements the gradient boosting decision tree algorithm. The name comes from the fact that it uses the gradient descent algorithm to minimize loss when a new model is added [8]. The significant growth of the XGBoost community and its application in both research and industry projects can be attributed to its ease of use, execution speed, the possibility for customized evaluation and parameter tuning, etc. The approach supports both regression and classification predictive modeling problems [9]. When considering the specific use case of this paper, it is important to mention that XGBoost is primarily adequate for solving problems involving structured, tabular data given as a small-to-medium dataset.

Hyperparameter tuning was done using the native XGBoost API as it, unlike the scikit-learn API of XGBoost, supports finding the best number of boosting rounds automatically. Moreover, it has built-in cross-validation, which was used extensively. In order to optimize the performance of the tuning process, the DMatrix data structure was used, making the algorithm more efficient. Tuned hyperparameter values were obtained by performing Bayesian optimization, which was chosen over approaches such as GridSearch and RandomizedSearch. Unlike those methods, Bayesian optimization uses previous evaluation results and limits the next evaluations to hyperparameter inputs that have done well in the previous evaluations [10].

TABLE I. CONFUSION MATRIX (rows: actual values, columns: predicted values)

                    Predicted FALSE    Predicted TRUE
   Actual FALSE          15081               505
   Actual TRUE            3613              3417

TABLE II. OVERVIEW OF PREDICTOR'S PERFORMANCES

                    FALSE     TRUE     Overall
   Recall           0.968     0.486       -
   Precision        0.807     0.871       -
   Sensitivity      0.968     0.486       -
   Specificity      0.486     0.968       -
   F-measure        0.880     0.624       -
   Accuracy           -         -       0.818
   Cohen's kappa      -         -       0.516
Finally, the feature importance based on the trained model is discussed. When fitting an XGBoost model for a classification problem, importance values [11] are calculated as:

- gain, the relative contribution of the corresponding feature to the model, i.e. for each tree in the model, which relates to the importance of the feature for generating a prediction;
- coverage, the relative number of observations related to the corresponding feature.

Given that, the indicators can be interpreted as intermediary expected outputs, yet both the pre-analysis and the given results suggest accuracy increases with formalizing a model and introducing additional features.

Fig. 9. Feature importance calculated by the total gain importance metric
Fig. 8. Feature importance calculated by the total coverage importance metric

V. CONCLUSION AND FUTURE WORK

This paper provides an overview of the implementation of a credit scoring model using ensemble data mining techniques, primarily XGBoost as one of the most relevant representatives of boosting algorithms. The given results are collected using data provided by a microfinance institution based in Bosnia and Herzegovina. The data includes attributes varying from personal data, through credit history data, to data recorded on a specific request.

Throughout recent years, XGBoost has built its reputation as a model that potentially outperforms other models within data mining competitions. Specifically, XGBoost is expected to reach its full potential when used for solving problems involving structured, tabular data. Given that, the problem presented by this paper can be considered an appropriate use case for XGBoost. When it comes to implementation and obtaining results, the learning and prediction phases are expanded by a discussion of hyperparameter tuning and feature selection. Both proved to be important steps in modeling and improving performance, i.e. hyperparameter tuning and feature selection resulted in an increase of the accuracy score of approx. 6%.

It should be noted that the accuracy result of 81.8% compares well to the overview of results from the literature. When compared to the results given by [3], it can be observed that the XGBoost accuracy score outperforms a number of approaches including Latent Dirichlet allocation, the decision tree algorithm, the k-nearest neighbors algorithm, etc. Additionally, it can be observed that XGBoost did not outperform the models which are generally suggested as the
most appropriate for the credit scoring problem and said to obtain the highest accuracy, i.e. logistic regression, neural networks, and genetic programming.

It is important to point out that the previously compared scores were not obtained using the same data set and under constant, controlled training conditions; therefore, no clear and reliable conclusions can be made. At the same time, it should be suggested that despite its potential and reputation, it is irresponsible to choose XGBoost or similar boosting algorithms by default without analyzing the specificity of the data and whether other techniques are more suitable.

The main goal for future work can be defined as comparing the results obtained by XGBoost with the results of additional boosting approaches, including recently released models such as LightGBM developed by Microsoft or CatBoost developed by Yandex. Such work would be crucial for drawing clear conclusions on the general performance of boosting algorithms when working with problems such as credit scoring. Moreover, such an upgrade to this work would make it possible to illustrate structural differences or performance differences between the mentioned techniques.

REFERENCES

[1] J. S. Marquez, "An Introduction to Credit Scoring For Small and Medium Size Enterprises," The World Bank, 2008.
[2] E. M. Nazri, E. A. Bakar, Y. Lizar Eddy, E. Muhammad, N. Engku, and A. Bakar, "Credit scoring models: techniques and issues," J. Adv. Res. Bus. Manag. Stud., 2017.
[3] R. Sultana, S. Muntaha, D. M. Anisuzzaman, F. Sarker and K. Mamun, "Automated Credit Scoring System for Financial Services in Developing Countries," International Conference On Advanced Information And Communication Technology, Bangladesh, 2016.
[4] C. L. Huang, M. C. Chen, and C. J. Wang, "Credit scoring with a data mining approach based on support vector machines," Expert Syst. Appl., 2007.
[5] S. Imtiaz and A. J., "A Better Comparison Summary of Credit Scoring Classification," Int. J. Adv. Comput. Sci. Appl., 2017.
[6] W. Koehrsen, "A Feature Selection Tool for Machine Learning in Python," Towards Data Science, 22 June 2018. Available at: https://towardsdatascience.com/a-feature-selection-tool-for-machine-learning-in-python-b64dd23710f0 [Online, Accessed: 30.05.2019.]
[7] "What is the difference between bagging and boosting?," QuantDare, 20 April 2016. Available at: https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/ [Online, Accessed: 30.05.2019.]
[8] N. Vijayalakshmi and S. Dipayan, "Ensemble Machine Learning Cookbook," Packt Publishing, January 2019.
[9] T. He, "XGBoost: eXtreme Gradient Boosting." Available at: https://www.saedsayad.com/docs/xgboost.pdf [Online, Accessed: 30.05.2019.]
[10] M. Restrepo, "Doing XGBoost Hyper-Parameter Tuning the Smart Way - Part 1 of 2," Towards Data Science, 29 Aug. 2018. Available at: https://towardsdatascience.com/doing-xgboost-hyper-parameter-tuning-the-smart-way-part-1-of-2-f6d255a45dde [Online, Accessed: 31.05.2019.]
[11] A. Abu-Rmileh, "Be Careful When Interpreting Your Features Importance in XGBoost!," Towards Data Science, 8 Feb. 2019. Available at: https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7 [Online, Accessed: 28.05.2019.]