Customer Churn Prediction
I certify that this dissertation which I now submit for examination for the award of
MSc in Computing (Data Analytics), is entirely my own work and has not been taken
from the work of others save and to the extent that such work has been cited and
acknowledged within the text of my work.
This dissertation was prepared according to the regulations for postgraduate study of
the Technological University Dublin and has not been submitted in whole or part for
an award in any other Institute or University.
The work reported on in this dissertation conforms to the principles and requirements
of the Institute’s guidelines for ethics in research.
ABSTRACT
Identifying churned customers plays an essential role in the functioning and growth
of any business. It helps the business understand the reasons for churn and plan its
market strategies accordingly to support growth. This research aims to develop a
machine learning model that can precisely identify the churned customers among the
total customers of a Credit Union financial institution.
Quantitative and deductive research strategies are employed to build a supervised
machine learning model that addresses the class imbalance problem, handles feature
selection and efficiently predicts customer churn. The overall accuracy of the
model, the Receiver Operating Characteristic curve and the Area Under the Receiver
Operating Characteristic Curve are used as the evaluation metrics to identify the
best classifier.
A comparative study of the most popular supervised machine learning methods –
Logistic Regression, Random Forest, Support Vector Machine (SVM) and Neural
Network – was carried out for customer churn prediction in a CU context. In the
first phase of the experiments, various feature selection techniques were studied.
In the second phase, all models were applied to the imbalanced dataset and the
results were evaluated. The SMOTE technique was then used to balance the data, the
same models were applied to the balanced dataset, and the results were evaluated and
compared. The best overall classifier was Random Forest, with an accuracy of almost
97%, a precision of 91% and a recall of 98%.
ACKNOWLEDGEMENTS
I would also like to thank DIT and Prof. Luca Longo, M.Sc. thesis coordinator, for
providing me with the opportunity to work on this thesis.
Finally, I would like to thank all my friends and family for all their encouragement,
support and motivation during my studies. Special gratitude to my parents Pradeep
and Chetna, and my husband Nitin for their love, support and encouragement
throughout my studies. This accomplishment would not have been possible without
them.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF FIGURES
TABLE OF TABLES
LIST OF ACRONYMS
1. INTRODUCTION
1.1 BACKGROUND
1.2 RESEARCH PROJECT
1.3 RESEARCH OBJECTIVES
1.4 RESEARCH METHODOLOGIES
1.4.1 Based on type: Primary Vs. Secondary Research
1.4.2 Based on objective: Qualitative Vs. Quantitative Research
1.4.3 Based on form: Exploratory Vs. Constructive Vs. Empirical
1.4.4 Based on reasoning: Deductive Vs. Inductive Research
1.5 SCOPE AND LIMITATIONS
1.6 DOCUMENT OUTLINE
2. LITERATURE REVIEW
2.1 BACKGROUND
2.2 CUSTOMER CHURN PREDICTION
2.3 DATA EXPLORATION AND PRE-PROCESSING
2.3.1 Class Imbalance
2.3.2 Feature Selection
2.4 MACHINE LEARNING
2.4.1 Supervised Machine Learning
2.5 MACHINE LEARNING TECHNIQUES
2.5.1 Logistic Regression
2.5.2 Random Forest
2.5.3 Support Vector Machine
2.5.4 Neural Network
2.6 MODEL EVALUATION
2.7 HISTORIC CUSTOMER CHURN PREDICTION
2.8 CUSTOMER CHURN PREDICTION USING MACHINE LEARNING
2.9 APPROACHES TO SOLVE THE PROBLEM
2.10 SUMMARY, LIMITATIONS AND GAPS IN THE LITERATURE SURVEY
4.2.6 Data Splitting
4.3 MODELLING
4.3.1 Logistic Regression
4.3.2 Random Forest
4.3.3 Support Vector Machine
4.3.4 Neural Network
4.4 RESULTS
4.5 SECONDARY RESEARCH
6. CONCLUSION
BIBLIOGRAPHY
APPENDIX A
TABLE OF FIGURES
TABLE OF TABLES
LIST OF ACRONYMS
CU Credit Union
CRISP-DM Cross Industry Standard Process for Data Mining
BOI Bank of Ireland
AIB Allied Irish Banks
ILCU Irish League of Credit Unions
SVM Support Vector Machine
CRM Customer Relationship Management
SMOTE Synthetic Minority Oversampling Technique
AUC Area Under Curve
ROC Receiver Operating Characteristic
ANN Artificial Neural Network
SOM Self Organizing Map
DT Decision Tree
MLP Multi Layer Perceptron
TDL Top Decile Lift
EDA Exploratory Data Analysis
RBF Radial Basis Function
RELU Rectified Linear Unit
TP True Positive
FP False Positive
TN True Negative
FN False Negative
TPR True Positive Rate
FPR False Positive Rate
1. INTRODUCTION
1.1 Background
A Credit Union (CU) is a non-profit organisation that exists to serve its members.
CUs have operated in Ireland since 1958 and have more than 3.6 million members
there. CUs function much like banks: they accept deposits, provide loans at a
reasonable rate of interest and offer a wide variety of financial services. A CU is
a group of people connected by a ‘common bond’ based on the area they live in, their
occupation, or the employer they work for, who can save together and lend to each
other at a fair and reasonable rate of interest.1 CUs are organised by geographical
area: in every area, there is one CU for its members.
CUs differ from the banks (BOI, AIB, Ulster Bank) in many ways –
1 https://www.creditunion.ie/about-credit-unions/what-is-a-credit-union/
The Irish League of Credit Unions (ILCU) describes a CU as “a group of people who
save together and lend to each other at a fair and reasonable rate of interest”. CUs
offer their members the chance to have control over their finances. Regular savings
form a
common pool of money, which provides many benefits for members.
The ILCU has an affiliated membership of 351 CUs – 259 in the Republic of Ireland
and 92 in Northern Ireland. In this research, we are using the member/customer data of
one of these CUs to predict customer churn.
Supervised machine learning techniques have been used in customer churn prediction
problems in the past with SVM-POLY using AdaBoost as the best overall model
(Vafeiadis, Diamantaras, Chatzisavvas & Sarigiannidis, 2015). The most common
techniques applied for predicting customer churn are Decision tree, Multilayer
perceptron, and SVM.
Techniques that are most commonly used to predict customer churn are neural
networks, support vector machines and logistic regression models. Data mining
research literature suggests that machine learning techniques, such as neural networks
should be used for non-parametric datasets because they often outperform traditional
statistical techniques such as linear and quadratic discriminant analysis approaches
(Zoric, 2016).
Based on the previous literature in this area, and for reasons mentioned further on
in this section, four supervised machine learning techniques are compared for
predicting customer churn: logistic regression, random forest, SVM and neural
network.
Currently, customer churn is not predicted for CU members’ data using any machine
learning technique. The logistic regression model is selected; in previous
research, SVM and random forest have been observed to outperform logistic regression
when predicting customer churn.
The research question is framed as:
“Which supervised machine learning technique (Logistic regression, Random forest,
SVM or Neural network) can best predict the customer churn of a CU, with the best
accuracy, specificity, precision, and recall?”
The key objective of the research is to identify whether supervised machine
learning can precisely predict customer churn from CU customer data. Currently, the
CU has not adopted any specific method to identify customer churn. This research
helps identify the customers who are more likely to churn; the CU can in turn focus
more on those customers and retain its existing customers, which leads to growth.
1) To collect the required customer data from the business for the research.
2) Understanding the data, identifying any data issues and then rectifying those to
apply machine learning algorithms.
3) Preparing the data using sampling, encoding, feature selection and splitting the
data.
5) Validating the models on the Validation data set and based on evaluation
metrics identifying the best model among all for predicting the customer churn.
6) Then testing the best performance model amongst all supervised models on the
Test data set and then evaluating the results.
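The data splitting mentioned in the objectives can be sketched as follows. This is a minimal illustration, not the split used in this research: the 60/20/20 proportions, the `stratified_split` helper and the example labels are assumptions chosen for the sketch. It only shows how a stratified split keeps the churn ratio similar across the training, validation and test sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, validation, test) index lists that keep the class ratio.

    The 60/20/20 fractions are an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, validation, test


# Hypothetical labels: 1 = churned, 0 = retained (10% churn rate).
labels = [1] * 10 + [0] * 90
train_idx, val_idx, test_idx = stratified_split(labels)
```

Models would be fitted on the training indices, compared on the validation indices, and only the best model would be scored on the test indices.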
Primary research is also known as field research. It is carried out to collect
original data that does not already exist. Secondary research, also known as desk
research, involves the summary, collation and/or synthesis of existing research.
This research on customer churn prediction for a CU is primary research, as it
collects original data from the financial institution. It is unique in that no such
work has been performed on the CU member dataset.
The current research is quantitative research: it uses data mining, involves the
systematic investigation of customer data and aims to develop models and verify
their results. The hypothesis is then either accepted or rejected based on the
precision of the customer churn prediction (Borrego, Douglas & Amelink, 2013).
Exploratory research is carried out for a problem that has not been clearly defined.
It helps to determine the best research design and data collection method.
Constructive research refers to a new contribution: a completely new approach, model
or theory is formulated. It often involves validating the research via analytical
comparison with existing research or benchmark tests. Empirical research refers to
gaining knowledge through direct observation or experience. It involves defining a
hypothesis and then the predictions, which can be tested with a suitable experiment.
A deductive approach is a top-down approach that moves from the more general to the
more specific: the hypothesis is defined based on a pre-defined theory, and the
conclusion is then drawn based on the research.
Inductive research, also known as a bottom-up approach, goes from specific
observations to broader generalizations and theories.
and the Supervised Machine learning models were built on the CU customer data to
predict the churned customers. Then the champion model is selected based on the
accuracy of the model.
The Python programming language is used for statistical exploration of data, data cleaning,
data preparation, building supervised machine learning models and evaluation of those
models.
The scope of this research is to develop a machine learning model using the CU’s
customer data to predict customer churn.
The main limitation of the research is that the customer data is obtained from one
CU only, so it cannot be representative of other CU financial institutions. The
customer base would be different for different CU institutions.
Another limitation is that the many DateTime variables present in the data are not
considered when building the classifiers. Data imbalance is a further limitation to
overcome: churned customers were less common, so less data was available for the
classifiers to learn the features of churned customers.
This thesis report starts by defining and explaining the research problem and its
importance, together with the methodologies adopted, and by exploring the problem
and its purpose with a proper research question.
Chapter 3 (Design and Methodology) describes the design and methodology adopted
to solve the research problem in detail. It follows the CRISP-DM methodology and
each step is carried out and explained in detail in this chapter.
Chapter 4 (Implementation and Results) presents the implementation details and the
results of the implementation. It describes in detail which models are chosen and
how they performed, with proper justification. The results are compared and the
hypothesis of the research is evaluated.
Chapter 6 (Conclusion) discusses the research problem with the result obtained and
evaluation. It summarises the research, discusses the contribution of the research
towards the research question. Also, it recommends some future research work in a
similar area.
2. LITERATURE REVIEW
This chapter provides a review of the literature available on CUs, Customer Churn
prediction methods, various approaches adopted to solve the problem and evaluation
metrics used for evaluating the models. The chapter concludes with the gaps in the
existing research and forms the objective for the research.
2.1 Background
The term Customer Attrition refers to customers leaving one business service for
another. Customer Churn Prediction is used to identify likely churners before they
leave the company. This helps the company to plan the retention policies needed to
attract and retain the likely churners, which in turn reduces the company’s
financial loss (Umayaparvathi & Iyakutti, 2012).
Customer churn is a concern for several industries, and it is particularly acute in the
strongly competitive industries. Losing customers leads to financial loss because of
reduced sales and leads to an increasing need for attracting new customers (Guo-en &
Wei-dong, 2008).
customer churn due to the sparsity of the data as compared to other domains. This
requires longer investigation periods for churn prediction (Kaya, et.al., 2018).
The economic value of customer retention is widely recognized (Poel & Lariviere,
2004):
(1) Successful customer retention allows organizations to focus more on the needs
of their existing customers instead of seeking new and potentially risky ones.
(2) Long term customers would be more beneficial and, if satisfied, may provide
new referrals.
(3) Long term customers tend to be less sensitive towards a competitive market.
(4) Long term customers become less expensive to serve due to the bank’s
knowledge
(5) Losing customers leads to reduced sales and an increased need to attract new
customers.
Customer churn has become a major problem in all industries, including the banking
industry, and banks have always tried to track customer interactions so that they
can detect the customers who are likely to leave. Customer churn modelling focuses
mainly on those customers who are likely to leave, so that steps can be taken to
prevent churn (Oyeniyi & Adeyemo, 2015).
In a competitive world, more and more companies realize that their most precious
asset is their existing customer base and its data. We mainly investigate the
predictors of churn incidence as part of customer relationship management (CRM).
Churn management is an important task for retaining valuable customers.
Previous research indicates that there are two types of targeted approaches to
managing customer churn: reactive and proactive. In a reactive approach, the company
waits until the customer asks to cancel their service. In a proactive approach, the
company tries to identify customers who are likely to churn and then tries to retain
them by providing incentives. If churn predictions are inaccurate, companies will
waste their money on retention incentives, so churn predictions should be accurate
(Tsai & Lu, 2009).
Data Exploration is required to gain further understanding of the data and the
business problem. The CRISP-DM methodology is widely accepted for data mining
projects. It describes how to conduct a data mining process, whose life cycle
consists of six phases as shown in the figure below.
2 https://www.kdnuggets.com/2017/01/four-problems-crisp-dm-fix.html
task includes selecting relevant data, attribute selection, removing anomalies,
eliminating duplicate records. This stage also deals with filling the missing values,
reducing ambiguity and removing outliers (Zhang, Zhang & Yang, 2003).
This stage is of high importance due to the following:
(1) the real data is impure;
(2) high-performance mining requires quality data;
(3) quality data yields high-quality patterns
Feature Selection
Feature Selection is the process of identifying the fields which are the best for
prediction, and it is a critical process (Hadden, Tiwari, Roy & Ruta, 2005). This
step is important in customer churn prediction. Feature selection, the process of
selecting a subset of the original features, is an important and frequently used
dimensionality reduction technique for data mining.
In research by Khan, Manoj, Singh & Blumenstock (2015), a t-test was performed
separately for each feature, which indicated the extent to which a single feature
can accurately differentiate between people who have churned and those who have not.
A tree-based method was also used for feature selection. This method was useful in
producing a list of correlated predictors.
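The per-feature t-test idea can be illustrated with a short sketch. It is not the procedure of the cited study: the Welch (unequal variance) form of the statistic, the feature names and the values are all assumptions made for illustration, and no p-value is computed. A larger absolute t statistic suggests that the feature separates churned and retained customers more clearly.

```python
import math
from statistics import mean, variance


def welch_t(churned, retained):
    """Welch's t statistic for one feature measured in two groups."""
    standard_error = math.sqrt(
        variance(churned) / len(churned) + variance(retained) / len(retained)
    )
    return (mean(churned) - mean(retained)) / standard_error


# Hypothetical feature values: (values for churned, values for retained).
features = {
    "monthly_logins": ([1, 2, 1, 3, 2], [8, 9, 7, 10, 9]),
    "branch_distance_km": ([4, 6, 5, 7, 5], [5, 6, 4, 7, 6]),
}

# Rank features by the absolute t statistic, strongest separation first.
ranking = sorted(
    features,
    key=lambda name: abs(welch_t(*features[name])),
    reverse=True,
)
```

In this made-up data the login count differs strongly between the two groups while the branch distance barely differs, so the login count is ranked first.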
The feature selection was categorized into two categories based on Label Information
and Search Strategy. The below diagram will detail the division.
performance as an evaluation criterion to select features. Algorithms with an embedded
model, e.g., C4.5 and LARS, were the examples of wrapper models which incorporate
variable selection as a part of the training process, and feature relevance was obtained
analytically from the objective of the learning model (Miao & Niu, 2016).
According to Cai, Luo, Wang & Yang (2018), supervised feature selection for
classification problems uses the correlation between a feature and the class label
as its fundamental principle. The correlation between features is determined and
compared to a threshold to decide whether a feature is redundant. This method was
described as an optimal feature selection method which maximizes the classifier’s
accuracy.
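A generic correlation filter of this kind can be sketched as below. It is a simplified illustration rather than the method of the cited authors: Pearson correlation, the 0.9 threshold, the `select_features` helper and the column names are assumptions. Features are considered in order of their correlation with the class label, and a feature is dropped when it is too strongly correlated with one that has already been kept.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.9):
    """Keep relevant features and drop those redundant with a kept feature."""
    # Consider the features most correlated with the class label first.
    ordered = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    selected = []
    for name in ordered:
        redundant = any(
            abs(pearson(columns[name], columns[kept])) > threshold
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical columns; the two balance columns carry the same information.
columns = {
    "balance": [10, 20, 30, 40, 50, 60],
    "balance_in_cents": [1000, 2000, 3000, 4000, 5000, 6000],
    "age": [25, 61, 33, 47, 52, 29],
}
churned = [1, 1, 1, 0, 0, 0]
selected = select_features(columns, churned)
```

Only one of the two balance columns survives the filter, because their mutual correlation exceeds the threshold.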
Figure 2.4: Machine Learning Techniques – Unsupervised and Supervised Learning3
3 https://vitalflux.com/dummies-notes-supervised-vs-unsupervised-learning/
Figure 2.5: Supervised Machine Learning Model
(Source: Vladimir, 2017)
Several machine learning techniques have previously been used in similar customer
churn prediction problems.
Logistic Regression is a very widely used statistical model for customer churn
prediction and has proven to be a powerful algorithm.
The formula in figure 7 below represents logistic regression, where p_i is the
probability of the outcome and x_i are the independent variables used to predict p_i.
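In its standard form, logistic regression maps a weighted sum of the independent variables to a probability through the logistic function, p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))). The sketch below evaluates that expression; the coefficients, the intercept, the feature values and the 0.5 decision threshold are hypothetical and are not taken from the models built in this research.

```python
import math


def churn_probability(features, coefficients, intercept):
    """Logistic regression: p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    linear_score = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear_score))


# Hypothetical coefficients for two standardised features.
coefficients = [-1.2, 0.8]
intercept = -0.5

p_churn = churn_probability([0.5, 1.0], coefficients, intercept)
predicted_label = int(p_churn >= 0.5)  # 1 = churn, 0 = no churn
```

A linear score of zero corresponds to a probability of exactly 0.5, which is why 0.5 is a common default threshold.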
(Source: Nie, 2011)
The Support Vector Machine is a supervised machine learning model which can be used
for classification as well as regression problems. SVM is mostly used in
classification problems, where it separates two classes using a hyperplane. The
objective of SVM is to find a hyperplane that distinctly classifies the data.
Hyperplanes are decision boundaries that help classify the data points. Support
vectors are the data points that lie closest to the hyperplane and influence its
position and orientation.
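The role of the hyperplane can be illustrated for the linear case. The sketch assumes an already trained linear SVM; the weights, the bias and the points are hypothetical. A point is classified by the sign of w·x + b, and its distance to the hyperplane is |w·x + b| / ||w||.

```python
import math


def decision_value(point, weights, bias):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(point, weights, bias):
    return 1 if decision_value(point, weights, bias) >= 0 else -1


def distance_to_hyperplane(point, weights, bias):
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_value(point, weights, bias)) / norm


# Hypothetical trained model: the hyperplane is x1 + x2 - 3 = 0.
weights, bias = [1.0, 1.0], -3.0
points = [(1.0, 1.0), (3.0, 2.0), (2.0, 2.0)]

labels = [classify(p, weights, bias) for p in points]
distances = [distance_to_hyperplane(p, weights, bias) for p in points]
```

In this example the first and third points have a decision value of magnitude 1, so they lie on the margin and would act as support vectors, while the second point lies further away and does not affect the boundary.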
Neural Networks are a set of algorithms that are designed to recognize patterns. The
basic building blocks of a neural network are neurons. The output of a neuron
depends on its activation function.
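A single neuron can be sketched as a weighted sum of its inputs followed by an activation function. The inputs, weights and bias below are hypothetical; the example only shows that the same weighted sum gives different outputs under different activation functions (here ReLU and the sigmoid).

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Hypothetical inputs and parameters; the weighted sum is -0.5.
inputs, weights, bias = [0.5, -1.0], [2.0, 1.0], -0.5

relu_output = neuron(inputs, weights, bias, relu)
sigmoid_output = neuron(inputs, weights, bias, sigmoid)
```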
Zoric (2016) used a neural network model within the software package Alyuda
NeuroIntelligence for research on customer churn prediction in the banking industry,
because neural networks work well for pattern recognition, image processing,
optimization problems, etc.
Another group of researchers, Huang, Kechadi, Buckley, Kiernan, Keogh & Rashid
(2010), compared the popular modelling techniques Multilayer Perceptron Neural
Networks and Decision Tree with the innovative modelling technique SVM for customer
churn prediction in the telecom industry. MLP and SVM were more efficient than
Decision Tree.
(2) Predictive Analytics – This approach focuses on the retention of customers.
Predictive Analytics is the customer churn analysis which mainly focuses on
retaining the customers.
As time passes the data grows, and its immense volume makes analysis a daunting task
for data analysts. Customer churn prediction using machine learning and data mining
techniques has therefore come to play a significant role.
Customer churn prediction using machine learning models follows a set of steps. The
data is collected; next, the selected data is pre-processed and transformed into a
suitable form for building a machine learning model. After modelling, testing is
performed, and finally the model is deployed (Kim, Shin & Park, 2005). Machine
learning investigates the data and detects the underlying data patterns for the
customer churn analysis (Kim, Shin & Park, 2005). Using machine learning, the
prediction of customer churn is more accurate than with the traditional approach.
Figure 2.7: Churn Rate Prediction using Machine Learning
(Source: Beker, 2019)
Several features are involved as variables in customer churn analysis. The
categories of variables include customer variables of recency, frequency and
monetary value (RFM) and demographic features such as geographical details, cultural
information and age (Senanayake, Muthugama, Mendis & Madushanka, 2015).
Many researchers have worked on the prediction of customer churn. Most of the
research was based on applying machine learning algorithms on customer data and
predicting the customer churn rate. A few of the studies are discussed in this section.
Guo-en & Wei-dong (2008) applied SVM, a machine learning method based on structural
risk minimization, to predict customer churn on telecom industry customer datasets.
They compared the results of the SVM model with those of an artificial neural
network, decision tree, logistic regression and naïve Bayesian classifiers. In the
experiment, SVM outperformed the other models, with the best accuracy rate, hit
rate, covering rate and lift coefficient. Two datasets were used in the research,
and the kernel function for the SVM model was selected using MATLAB 6.5. For the
first dataset, SVM achieved good results using the radial basis function kernel, and
for the other dataset the Cauchy kernel function was used. The SVM model accuracy
was 90% and 59% for dataset 1 and dataset 2 respectively. Decision Tree C4.5 had the
lowest performance on both datasets, with accuracies of 83% and 52% respectively.
Another study, on European bank customer data, was conducted by Poel & Lariviere
(2004) using the Cox proportional hazard method to investigate customer attrition.
The focus was on churn incidence. SAS Enterprise Miner was used in this research.
They combined several different types of predictors into one comprehensive
proportional hazard model. By analysing this bank customer dataset, two critical
customer churn periods were identified: the early years after becoming a customer,
and a second period after some 20 years. Demographic and environmental changes were
of major concern and had a great impact on customer retention. In this research,
four retention predictor categories were used; it would have been more advantageous
if the data obtained had been merged and incorporated into a single retention model
instead of four different models.
Tsai & Lu (2009) built hybrid neural networks and compared their performance with a
baseline ANN model. Customer churn was predicted on data from an American telecom
company. They built one baseline ANN model and two hybrid models that combine
clustering and classification methods to improve on the performance of single
clustering or classification techniques. Each hybrid model comprised two learning
stages: the first was used for pre-processing the data and the second for the final
output prediction. The two hybrid models were ANN+ANN (Artificial Neural Network)
and SOM (Self Organizing Maps)+ANN. These models were evaluated based on their Type
I and Type II error rates and their accuracy. In statistical hypothesis testing, a
Type I error is the rejection of a true null hypothesis, while a Type II error is
the non-rejection of a false null hypothesis. The results showed that the ANN+ANN
model performed better than both the ANN and SOM+ANN models in terms of Type I error
rates. The prediction accuracy of the ANN+ANN hybrid model was also better than that
of the ANN and SOM+ANN models. Thus, of the two hybrid techniques, the model with
two ANNs performed better than the SOM+ANN model. Feature selection was not
considered in this research.
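The Type I and Type II error rates can be computed from a confusion matrix, as in the sketch below. It assumes that the null hypothesis is "the customer does not churn", so that a Type I error is a false positive and a Type II error is a false negative; the cited study may define the two the other way round. The labels and predictions are made up for illustration.

```python
def error_rates(actual, predicted):
    """Accuracy plus Type I (false positive) and Type II (false negative) rates.

    Assumes 1 = churned and 0 = retained, with "does not churn" as the null.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "type_i_rate": fp / (fp + tn),   # retained customers flagged as churners
        "type_ii_rate": fn / (fn + tp),  # churners that the model missed
    }


# Hypothetical ground truth and predictions for ten customers.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
rates = error_rates(actual, predicted)
```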
In a research paper on customer churn in the financial industry, Kaya, et. al.
(2018) emphasized the impact of spatio-temporal features. They adopted Random Forest
as the classification model and trained it with 500 trees and a maximum of 2
features per tree. Stratified 8-fold cross-validation was adopted for evaluation.
Spatio-temporal and choice features were found to be superior to demographic
features in predicting financial churn decisions. It was also observed that young
people were more likely to leave the bank. The results suggested that customer churn
can be predicted from mobility, temporal and choice entropy patterns, which can be
extracted from customer behaviour data. The evaluation was performed using the AUC
ROC metric.
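The AUC ROC metric can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it directly from that pairwise definition; the labels and scores are hypothetical, and the quadratic loop is only suitable for small examples.

```python
def roc_auc(labels, scores):
    """AUC as the share of (churner, non-churner) pairs ranked correctly.

    Ties between a churner and a non-churner count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical churn labels and model scores for six customers.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of churners above non-churners.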
Oyeniyi & Adeyemo (2015) addressed the customer churn problem on a Nigerian bank
dataset, using the WEKA tool for knowledge analysis. The K-means clustering
algorithm was used for the clustering phase, followed by the JRip algorithm for the
rule generation phase.
Bin, Peiji & Juan (2007) performed customer churn prediction for the Personal Handy
Phone System Service. They built a Decision Tree and conducted three experiments to
build an effective and accurate customer churn model. 180 days of data were randomly
sampled and used for churn prediction. In the first experiment, the sub-periods for
the training datasets were changed; in the second, the misclassification cost in the
churn model was changed; and in the third, the sampling methods for the training
datasets were changed. In the first experiment, the number of sub-periods was set to
18, 9, 6 and 3, meaning that the 180 days of call record data were divided into 18,
9, 6 and 3 parts. In the second experiment, the misclassification cost meant setting
the proportion of non-churn and churn customers in the training dataset. In the
third experiment, various sampling techniques were adopted to balance the dataset.
This research helped in churn prediction and in improving the performance of churn
prediction models. The performance of the model was superior when the number of
sub-periods was set to 18. For the misclassification cost, the results were superior
when it was set to 1:2, 1:3 and 1:5, and among the sampling methods, random sampling
yielded the best results.
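Setting the proportion of churn to non-churn customers in a training set by random sampling can be sketched as follows. This is an illustration under assumptions, not the procedure of the cited study: the ratio is read here as churners to non-churners (1:2 means two non-churners per churner), and the `undersample` helper and the records are hypothetical.

```python
import random


def undersample(records, ratio=2, seed=0):
    """Keep all churners and at most `ratio` random non-churners per churner."""
    rng = random.Random(seed)
    churners = [r for r in records if r["churned"] == 1]
    non_churners = [r for r in records if r["churned"] == 0]
    keep = min(len(non_churners), ratio * len(churners))
    sample = churners + rng.sample(non_churners, keep)
    rng.shuffle(sample)  # avoid leaving the classes grouped together
    return sample


# Hypothetical imbalanced training data: 5 churners among 100 customers.
records = [{"id": i, "churned": 1 if i < 5 else 0} for i in range(100)]
balanced = undersample(records, ratio=2)
```

Only the training data would be resampled in this way, so that validation and test results still reflect the original class distribution.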
Vafeiadis, et.al. (2016) performed a comparative study of customer churn prediction
on a telecom dataset. The performance of multi-layer perceptron, Decision Tree, SVM,
Naïve Bayes and Logistic regression was compared. All the models were built and
evaluated using cross-validation. Monte Carlo simulations were used, and SVM
outperformed the other models with an accuracy of 97% and an F-measure of 84%.
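Accuracy and F-measure can differ noticeably on imbalanced churn data, because accuracy also rewards the many correctly classified non-churners. The sketch below uses made-up confusion matrix counts, not the figures of the cited study, to show how the F-measure is derived from precision and recall.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (the F-measure)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for 1000 customers, of whom 50 actually churned.
tp, fp, fn, tn = 40, 10, 10, 940

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

With these counts the accuracy is 98% while the F-measure is 80%, which is why both are usually reported together.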
In earlier research on customer churn prediction, the researchers used traditional
supervised machine learning algorithms (Decision Tree and Regression Analysis) for
prediction, as well as Soft Computing methodologies such as fuzzy logic, neural
networks and genetic algorithms (Hadden, Tiwari, Roy & Ruta, 2005).
Sharma & Panigrahi (2011) performed customer churn prediction on a telecom dataset
using a Neural Network, which achieved an accuracy of 92%. The researchers focused
on changing the number of neurons and increasing the number of hidden layers in the
neural network model. Feature selection and the class imbalance problem were not
considered in that research.
In a comparative study by Xie, Li, Ngai & Ying (2009), it was observed that balanced
Random Forest outperformed the other classifiers (ANN, SVM and DT) in terms of
precision and recall.
Most of the research (Xie et al., 2009; Sharma & Panigrahi, 2011; Vafeiadis et al.,
2016) was performed on telecom customer datasets, and a few studies (Oyeniyi &
Adeyemo, 2015; Kaya et al., 2018) were performed on financial datasets. Bin, Peiji &
Juan (2007) performed customer churn prediction on a Personal Handyphone System
Service. No research has focused on customer churn prediction for a CU financial
institution.
Currently, the vital and active areas of research in customer churn prediction are
feature selection for data mining purposes and, when implementing SVM, how to
select a fitting kernel function and parameters and how to weigh customer samples
(Guo-en & Wei-dong, 2008). For further research, it would be a challenge to
incorporate customer behaviour, customer perceptions, customer demographics and
the macroenvironment into one comprehensive retention model (Poel & Lariviere,
2004). More emphasis should be placed on the pre-processing stage, that is,
dimensionality reduction or feature selection, for better performance. Also, datasets
from other domains can be used for further comparisons of churn prediction (Tsai &
Lu, 2009). Research should be aligned towards improving the predictive ability of
churn models by using other data mining techniques, for example neural networks,
logistic regression, self-organizing maps and support vector machines (Bin, Peiji &
Juan, 2007).
Most of the studies were done using archived data. The existing research provides
little guidance on how to analyse a real-world application dataset. To address the
limitations and research gaps presented in this section, this research focused on
the data pre-processing steps of feature selection, using the correlation technique
and the Extra Trees classifier method, and on handling class imbalance using the
SMOTE technique. Further, secondary research was conducted in the form of a
comparative study between the prediction results of the current research and those
of a churn prediction study on a banking domain dataset (Kumar & Vadlamani, 2008).
3. DESIGN AND METHODOLOGY
In this chapter, the design of the research and the methodology used to answer the
research question are explained in detail. The experiment design followed the
CRISP-DM process throughout the research lifecycle. Python was used to carry out
the experiments.
This research aimed at building and comparing supervised machine learning
techniques on a CU customer dataset to predict customer churn. Logistic
Regression, Random Forest, SVM and Neural Network models were built, and the
results were compared. The secondary research focused on a comparative study of
these results with the results of an existing research paper on the banking domain
(Kumar & Vadlamani, 2008).
The thesis followed the CRISP-DM methodology, and each of its phases is described
in detail below.
3.1 Data Understanding
The Data Understanding phase deals with the collection and exploration of data to
gain basic insight into the type of data available.
The dataset used in this research was the customer data of the financial institution
referred to as CU. The dataset was completely original, and no statistical research
had previously been done on it. It consists of the data of all customers who joined
the CU from 1911 to 2019. The dataset has 96967 records of distinct members with 48
features. Customer churn was defined in terms of the customers who have closed
their accounts. In this research, customers who were not deceased and whose accounts
were either closed or dormant were considered as churned from the CU.
The data was loaded using the pandas library of Python. The number of records was
explored, and the datatype of each independent variable was identified using the
info() function.
A basic quantitative analysis of the data was carried out. The measures of central
tendency and dispersion (mean, range, standard deviation, maximum and minimum) of
the variables were computed using descriptive statistics. The skew and kurtosis of
the variables were also measured to check their normality. Exploratory Data Analysis
(EDA) was performed. Data visualisation was carried out using the matplotlib and
seaborn Python libraries; histograms and box plots were created to view the data
distribution, check normality and identify outliers in the variables.
The correlation matrix was built using the Spearman method to identify the
correlation between the dependent and independent variables, and the correlation
among the independent variables in order to avoid multicollinearity.
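The exploration steps described above could be sketched roughly as follows. This is a minimal illustration on a small hypothetical DataFrame, not the CU dataset; the column names and values are assumptions made only for the example.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative, not taken from the CU dataset.
df = pd.DataFrame({
    "age": [23, 35, 41, 52, 67, 29, 48, 75],
    "balance": [120.0, 950.5, 300.0, 4100.0, 15.0, 610.0, 2200.0, 8.0],
    "churn": [0, 0, 0, 0, 1, 0, 0, 1],
})

summary = df.describe()                  # count, mean, std, min, quartiles, max
shape_stats = pd.DataFrame({
    "skew": df.skew(),                   # asymmetry of each distribution
    "kurtosis": df.kurt(),               # excess kurtosis (0 for a normal distribution)
})
spearman = df.corr(method="spearman")    # rank-based correlation matrix

print(summary.loc[["mean", "std", "min", "max"]])
print(shape_stats)
print(spearman["churn"].sort_values(ascending=False))
```

Large absolute values in the off-diagonal cells between two independent variables would flag a pair that may cause multicollinearity.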
In the data preparation phase, all activities needed to convert the raw data into the
final dataset, which can be fed into the modelling algorithms to build models, were
performed. Tasks such as data cleaning, removing outliers, imputing missing values,
construction of new attributes, feature selection and transformation of data were
all performed in this phase.
3.2.3 Feature Selection
Feature selection can improve the accuracy of the model. It also lets the model train
faster and reduces the complexity of the model. In addition to the correlation
technique, a tree-based classifier (the Extra Trees classifier) was implemented,
based on the literature review, to find the most predictive features. It is an
ensemble learning method and was used to identify the features that best predict
the target.
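A tree-based feature ranking of this kind might look like the sketch below. It uses scikit-learn's ExtraTreesClassifier on synthetic data; the feature names, sample size and hyperparameters are assumptions for illustration and are not the settings used in this research.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n = 500
informative = rng.normal(size=n)          # synthetic feature that drives the target
noise_a = rng.normal(size=n)              # synthetic irrelevant features
noise_b = rng.normal(size=n)
X = np.column_stack([informative, noise_a, noise_b])
y = (informative > 0).astype(int)

model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

names = ["informative", "noise_a", "noise_b"]
# Impurity-based importances sum to 1; higher means more useful for splitting.
ranking = sorted(zip(names, model.feature_importances_), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Features at the bottom of such a ranking are candidates for removal before modelling.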
3.2.4 Encoding
The dataset contains continuous and categorical variables. Some machine learning
algorithms, such as SVM and Logistic Regression, accept only numeric data. For this
reason, the categorical data was converted into numeric codes (0 and 1 for binary
variables) using label encoding. In this dataset, a total of 21 variables were
categorical, with True and False values or some nominal values. These values were
transformed into numerical values using sklearn’s LabelEncoder function.
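As a small illustration of label encoding, the sketch below applies scikit-learn's LabelEncoder to two hypothetical columns; the values are made up for the example. LabelEncoder assigns integer codes in sorted order of the distinct values.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values standing in for a True/False column and a nominal column.
flags = ["True", "False", "True", "False", "True"]
areas = ["Rural", "Urban", "Suburban", "Urban"]

flag_encoder = LabelEncoder()
flag_codes = flag_encoder.fit_transform(flags)
area_encoder = LabelEncoder()
area_codes = area_encoder.fit_transform(areas)

print(list(flag_encoder.classes_), flag_codes.tolist())   # ['False', 'True'] [1, 0, 1, 0, 1]
print(list(area_encoder.classes_), area_codes.tolist())   # ['Rural', 'Suburban', 'Urban'] [0, 2, 1, 2]
```

Note that the integer codes of a nominal variable imply an ordering that the original categories do not have, which matters more for linear models than for tree-based ones.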
In many real-world applications, class imbalance is the most common data issue. In
such problems, most of the examples are labelled as one class, while fewer examples
are labelled as the other class, which is usually the important one. This problem is
known as class imbalance, and it exists in many application domains (Guo, 2016).
The courses of action that can be taken in the data pre-processing phase of the
project are random undersampling or random oversampling, which are the data-level
methods for handling the class imbalance problem. As observed in previous research
by Maheshwari, Jain & Jadon (2017), both undersampling and oversampling have
advantages as well as disadvantages. Oversampling can lead to overfitting and to
more computation for large datasets, whereas undersampling can lead to the removal
of some significant data records. In this research, the SMOTE technique was used to
handle the class imbalance problem.
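The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea in plain NumPy; it is a simplified illustration, not the implementation used in this research, and the function name smote_sketch, the value of k and the toy data are assumptions.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating towards neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between the minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))           # pick a random minority sample
        nb = rng.choice(neighbours[base])         # pick one of its k nearest neighbours
        gap = rng.random()                        # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic

# Hypothetical minority-class points with two features.
X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
new_points = smote_sketch(X_min, n_new=6)
print(new_points.shape)
```

Because each synthetic point lies on a segment between two real minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing records.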
3.2.6
3.2.7 Neural Network
Finally, the Neural Network was evaluated. A Neural Network is a nonlinear predictive
model which learns through training, and its structure resembles a biological neural
network. Neurons act as the basic building blocks of the network. A neuron takes
inputs, and each input is multiplied by a weight. All the weighted inputs are then
summed together with a bias. Finally, the sum is passed through an activation
function, so the output depends on the activation function of the neuron. In this
research, the ReLU (Rectified Linear Unit) activation function was used, based on
previous research, as it is computationally less expensive. The most common
activation function used for Neural Networks is the sigmoid function. This function
is useful for binary classification as it outputs values in the range between 0 and 1
(Zoric, 2016).
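The computation performed by a single neuron can be written out in a few lines. The following NumPy sketch uses arbitrary example inputs, weights and bias (all assumptions) to show the weighted sum followed by the ReLU and sigmoid activations.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)            # passes positive values, zeroes out negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes any real value into (0, 1)

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    return activation(np.dot(inputs, weights) + bias)

x = np.array([0.5, -1.0, 2.0])           # arbitrary example inputs
w = np.array([0.4, 0.3, -0.2])           # arbitrary example weights
b = 0.1

relu_out = neuron(x, w, b, relu)         # weighted sum is -0.4, so ReLU gives 0
sigmoid_out = neuron(x, w, b, sigmoid)   # sigmoid(-0.4) is about 0.401
print(relu_out, round(float(sigmoid_out), 3))
```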
In this research, the techniques selected were Logistic Regression, Random Forest,
SVM and Neural Network. All these techniques were used in previous research on
different datasets to address the customer churn problem, and this research adds to
that body of work.
In this research, a for loop was used to divide the data randomly into train and test
datasets. In each iteration, every model was fitted on the same training split, and
the accuracy score was appended to a list. Because the split changed between
iterations, a different accuracy was obtained each time the models were run. Finally,
the average accuracy of each classifier was computed and compared to identify the
champion model. Fitting each model to the same split of data ensured that the model
results could be compared, while repeating the split in a for loop ensured that the
results were generalised and that a particular split of the data did not have an
undue impact on the measured model performance.
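The evaluation loop described above might be sketched as follows. The data are synthetic, only two of the four classifiers are included, and the number of iterations is reduced for brevity; these choices and all hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {name: [] for name in models}

n_iterations = 5   # the research reports 40 iterations; fewer are used here for speed
for i in range(n_iterations):
    # A new random 80/20 split per iteration, shared by all models.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=i
    )
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name].append(accuracy_score(y_test, model.predict(X_test)))

mean_scores = {name: float(np.mean(vals)) for name, vals in scores.items()}
champion = max(mean_scores, key=mean_scores.get)
print(mean_scores, champion)
```

Sharing the split within an iteration keeps the comparison between models fair, while averaging over iterations reduces the influence of any single split.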
The dataset consists of 7043 rows and 21 columns, where each row represents a
customer and each column represents a customer attribute.
There are 21 columns, which we divide into independent and dependent columns:-
Independent variables:-
Dependent variables:-
[ ‘Churn’ ]
Now it is time to start building the artificial neural network. First, we import the
important libraries.
#import pandas
import pandas as pd
#import numpy
import numpy as np
#import matplotlib
import matplotlib.pyplot as plt
#import seaborn
import seaborn as sb
So, we import pandas for data analysis, NumPy for working with N-dimensional
arrays, and seaborn and matplotlib to visualize the data. These are the basic
libraries required.
Now we load our dataset and take a first look at the churn data for an overview.
The dataset has 7043 rows and 21 columns, some of which are categorical.
Preprocess Dataset
Now it is time to preprocess the data. First, we inspect the dataset, which means
looking at the data types of the columns and the other properties of each column.
First, we check the dataset information using the info() method
df.info()
The output shows the datatype of each column and the number of rows with non-null
values. There are 2 int columns, 1 float column, and the remaining columns are of
string datatype.
Second, we check the description of the dataset; here only the numerical variables
are shown.
df.describe()
Here you can see that the describe() method only summarises the numerical
variables. From this, we can easily read off the summary statistics of each column.
Now we drop unwanted features from our dataset, because such features add noise
that can reduce model accuracy.
We drop customerID because it carries no predictive information, and we can still
distinguish each customer using the row index.
When we look at the TotalCharges column, we find that its data type is object, but it
should be numeric.
After printing, we found that 11 rows contain an empty string " ", which affects the
datatype of the column, so we convert these into NaN values and typecast the column
to float64.
So now TotalCharges has 11 null values, which we have to fill.
Null values badly affect model performance because they carry no information. If
there are only a few null values, we replace them with other values; if they are
present in large quantity, we simply drop them.
To check for null values, we use the pandas isnull() method, which gives True where
a null value is present and False where there is none.
To handle the null values, we fill them in the TotalCharges column with the mean of
the column.
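The cleaning steps for customerID and TotalCharges could be sketched as below. The four-row DataFrame is a hypothetical stand-in for the churn data; only the two column names are taken from the text.

```python
import pandas as pd

# Hypothetical four-row stand-in for the churn data.
df = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "TotalCharges": ["29.85", " ", "108.15", "1889.5"],
})

df = df.drop(columns=["customerID"])      # identifier, not a predictor

# Blank strings cannot be parsed as numbers, so errors="coerce" turns them into NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
n_missing = int(df["TotalCharges"].isnull().sum())

# Fill the missing values with the mean of the observed values.
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].mean())
print(n_missing, df["TotalCharges"].dtype)
```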
Now we extract the numerical and categorical columns from the dataset for further
processing.
#numerical variables
num = list(df.select_dtypes(include=['int64','float64']).keys())
#categorical variables
cat = list(df.select_dtypes(include='O').keys())
print(cat)
print(num)
['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
'PaperlessBilling', 'PaymentMethod', 'Churn']
['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges']
Here we create the num variable for numerical columns and cat for the categorical columns
Now we see the value counts of each category in each categorical column.
On observation, we found that several columns contain redundant categories, so we
convert them into a more useful form. For example, we change the “No Phone
Service” category into the “No” category, and we do the same for all the columns
where such a category appears.
Next we have to handle the categorical columns, which means converting categorical
values into numerical values, because the model must be trained on a dataset that
contains only numerical values.
On observing the value counts of the dataset, we found that many columns contain
only Yes and No, so we convert these into 1 and 0, which are easier to process.
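A minimal pandas sketch of these two conversions is given below. The rows are hypothetical, and the second redundant label ("No Internet Service") is an assumption added for illustration; the text only names “No Phone Service”.

```python
import pandas as pd

# Hypothetical rows; "No Internet Service" is an assumed second redundant label.
df = pd.DataFrame({
    "MultipleLines": ["No Phone Service", "Yes", "No"],
    "OnlineSecurity": ["No Internet Service", "No", "Yes"],
})

# Collapse the redundant categories into a plain "No".
df = df.replace({"No Phone Service": "No", "No Internet Service": "No"})

# Map the two remaining categories to integers, column by column.
binary_map = {"Yes": 1, "No": 0}
df = df.apply(lambda col: col.map(binary_map))
print(df)
```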
Now we import LabelEncoder from sklearn, which encodes categorical values into
numeric ones.
You can see that all the categorical columns have now been cast to numerical values.
The handling of categorical columns is over. Now we have to scale our data, because
some columns contain much larger values than others, which will affect the runtime of
the model.
scale_cols = ['tenure','MonthlyCharges','TotalCharges']
# now we scale these columns to the [0, 1] range
from sklearn.preprocessing import MinMaxScaler
scale = MinMaxScaler()
df[scale_cols] = scale.fit_transform(df[scale_cols])
scale_cols contains the columns that have large numerical values, and with
MinMaxScaler we rescale them.
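By default, MinMaxScaler maps each column to the range 0 to 1 using (x - min) / (max - min). The short NumPy sketch below works this through by hand on hypothetical tenure values.

```python
import numpy as np

tenure = np.array([1.0, 12.0, 24.0, 72.0])    # hypothetical values
# Min-max scaling: the smallest value maps to 0 and the largest to 1.
scaled = (tenure - tenure.min()) / (tenure.max() - tenure.min())
print(scaled)
```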
This is an important step in the model-building part: we have to separate the
columns from which the target is predicted from the target values which we have to
predict.
Now we start our model training process. First, we divide our dataset into the
independent variables and the dependent variable, which is the target. All the
columns except Churn are placed in the X variable, and the Churn column is the target.
Splitting data
This is an important part: we have to split our data into training and testing parts,
which are used in the later steps.
The training set is used to train the model, and the testing set is used to evaluate
its predictions of the target column.
We import the train_test_split() function from sklearn and set its parameters so that
the test size is 30% and the remaining 70% is used as training data.
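A 70/30 split of this kind could be written as in the sketch below. The arrays are hypothetical placeholders for the prepared features and target, and the random_state value is an assumption chosen only to make the example repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)    # hypothetical feature matrix with 100 rows
y = np.arange(100) % 2                # hypothetical binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)    # (70, 2) (30, 2)
```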
Building Neural Network for Customer Churn Data
Now that all our preprocessing and splitting is over, it is time to build the neural
network. We will use the TensorFlow and Keras libraries to build the artificial
neural network.
TensorFlow is used for multiple tasks but has a particular focus on the training and
inference of deep neural networks, and Keras acts as an interface for the TensorFlow
library.
Define Model
Now we have to define our model, which means setting the parameters and layers of
the deep neural network that will be trained on the data.
Here we define a sequential model, in which the input, hidden and output layers are
connected one after another. We define one input layer which takes all 19 columns
as input; the second and third layers are hidden layers which contain 15 and 10
neurons respectively and use the ReLU activation function. Our last layer is the
output layer; as our output is in the form of 1 and 0, we use the sigmoid activation
function.
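To make the layer structure concrete, the sketch below traces a forward pass through a 19-15-10-1 network in plain NumPy. It is not the Keras model itself: the weights are random and untrained, and it assumes the 19 inputs feed directly into the 15-neuron hidden layer.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [19, 15, 10, 1]        # inputs, two hidden layers, output

# Random, untrained weights: this sketch only traces shapes and activations.
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(X):
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)           # hidden layers use ReLU
    z = a @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z))              # output layer uses sigmoid

X = rng.random((4, 19))              # four hypothetical, already scaled customers
probs = forward(X)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(probs.shape, n_params)         # (4, 1) 471
```

Under these assumptions the network has 471 trainable parameters (300 + 160 + 11), which is small enough to train quickly on a dataset of a few thousand rows.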
Now we compile our sequential model and fit it to the training data.
Compiling the model is the final step in creating the artificial neural network.
The compile method defines the loss function, the optimizer and the metrics, which we
pass in as parameters.
We then fit the model to the training data and set the number of epochs for training.
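The roles of the loss function, the optimizer and the epochs can be illustrated without Keras. The sketch below trains a single sigmoid unit with plain gradient descent on synthetic data; the choice of binary cross-entropy as the loss is an assumption (the text does not name the loss), as are the learning rate and the number of epochs.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Assumed loss for a sigmoid output; clipping avoids log(0).
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)      # synthetic binary target

w, b, lr = np.zeros(3), 0.0, 0.5
losses = []
for epoch in range(20):                  # one epoch = one full pass over the data
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    losses.append(binary_cross_entropy(y, p))
    grad_w = X.T @ (p - y) / len(y)      # gradient of the loss w.r.t. the weights
    grad_b = float(np.mean(p - y))       # gradient of the loss w.r.t. the bias
    w -= lr * grad_w                     # the "optimizer" step: plain gradient descent
    b -= lr * grad_b

print(round(losses[0], 4), round(losses[-1], 4))
```

The loss recorded at each epoch plays the same role as the per-epoch loss that a deep learning library reports during fitting: it should fall as training proceeds.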
Now we evaluate our model, which lets us observe a summary of its performance.
As we performed scaling on the data above, our predicted values are also in a scaled
form, so we have to convert them back into normal form; for this we write the
following program.
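One plausible reading of this step, given that the output layer uses a sigmoid, is that the predictions are values between 0 and 1 that must be converted into the class labels 0 and 1. The sketch below assumes that reading and a cut-off of 0.5; the probabilities shown are hypothetical, not model output.

```python
import numpy as np

probabilities = np.array([0.08, 0.51, 0.93, 0.50, 0.27])   # hypothetical sigmoid outputs
threshold = 0.5                                             # assumed cut-off
labels = (probabilities > threshold).astype(int)            # 1 = churn, 0 = no churn
print(labels.tolist())                                      # [0, 1, 1, 0, 0]
```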
3.3 Strengths and Limitation
In this section, the strengths and limitations of the Design and Methodology are
discussed in brief.
Feature selection was used to eliminate irrelevant features, which helped to improve
the performance of the model, reduce the training time and avoid overfitting.
Another strength was that the customer’s age, gender and area were considered in
this research, and they were prominent predictors of customer churn. The main
strength was that CU members’ data were used for customer churn prediction, an area
in which not much research has been performed.
The main limitation of the research was that the data was very imbalanced, so the
classifiers were more likely to be biased towards the majority class. The SMOTE
sampling technique was used to overcome this issue. Also, the dataset contained many
datetime variables which were not taken into consideration in this research. A
time-series model could be built to utilise the datetime variables. However, a
single snapshot of data was used in this research, so it was difficult to build a
time-series model for prediction; time-series modelling requires different sets of
data with proper dates.
The main strength of the research was its ability to precisely identify customer
churn. The results suggest that the Random Forest model was the best predictor on
the CU customer data when compared to Logistic Regression, SVM and Neural Network.
Another strength of this research was that the customer’s age, gender and area were
also considered, and they were prominent predictors of customer churn. A further
strength was that CU members’ data was used for customer churn prediction, an area
in which not much research has been performed. Feature selection was used in this
research, which increased the accuracy of the models.
The main limitation of the research was that the data was very imbalanced, so the
classifiers were more likely to be biased towards the majority class. In this
research, the supervised machine learning models performed better on the imbalanced
dataset than on the balanced dataset. Also, the dataset contained many datetime
variables which were not taken into consideration in this research, and time-series
modelling was not supported by this dataset. Another limitation was that the
customer data of only one CU was used for this research, so it cannot be considered
representative of other CU financial institutions. The customer base would be
different for different CU institutions.
4. CONCLUSION
This chapter gives an overview of the research carried out. It summarises the results
of the experiments performed in predicting customer churn and interprets them. It
summarises the findings with respect to the research question set at the beginning of the
research: “Which supervised machine learning: Logistic regression, Random
forest, SVM or Neural network; can best predict the customer churn of CU with
the best accuracy, specificity, precision and recall?”
The goal of this research was to examine the predictive power of supervised machine
learning algorithms on a CU customer dataset in predicting customer churn. CU is a
financial institution which is owned by its members, and it is growing because of its
reasonable rate of interest, as discussed in the literature review. Four supervised
machine learning algorithms were examined, namely Logistic Regression, Random
Forest, Support Vector Machine and Neural Network, to predict whether a member will
churn or be retained by the institution. These four models were chosen for this
research based on previous research.
The supervised machine learning models aimed to predict whether a customer of the CU
will churn or not. Many previous papers dealt with the customer churn problem in
industries such as banking and telecommunications. The papers reviewed for this
project did not cover CU customer data for churn prediction.
The main objective was to identify the supervised machine learning model with the
best accuracy in predicting customer churn. Chapter two described previous research
carried out in this area and the various techniques and approaches applied to solve
the problem. Chapter three detailed the method and design approach adopted in the
current research to solve the problem. Chapter four outlined the implementation of
the models. Chapter five outlined the result analysis of the four supervised machine
learning models and compared their performance. Accuracy was used as the evaluation
metric to select the best model. It was found that the Random Forest technique
outperformed the other algorithms in both experiments, one performed with the
imbalanced dataset and the other with the dataset balanced using the SMOTE technique.
In this research, it was found that all the models performed better on the
imbalanced dataset, contrary to the findings of previous research on imbalanced
datasets. Therefore, the alternative hypothesis was accepted: a Random Forest
supervised machine learning model built using the CU customer data achieves higher
accuracy (97%) than the other supervised machine learning algorithms, namely
Logistic Regression, Support Vector Machine and Neural Network, in predicting
customer churn.
Customer churn calculation and monitoring are very important in all sectors of
industry because it is far cheaper to retain old customers than to acquire new ones.
CU is a financial institution owned by its members, so churn prediction will help it
to retain its existing members.
The literature review confirmed that many supervised machine learning techniques
have been evaluated for predicting customer churn. The SVM and Random Forest
techniques were seen to perform well for customer churn prediction in previous
research.
This research aimed to determine which supervised machine learning algorithm would
be best at predicting customer churn on the CU member dataset.
Currently, the CU has not adopted any technique to identify the members who are
likely to leave the institution. Building and evaluating supervised machine learning
techniques on the CU member dataset has contributed further insight into the members
and helps CUs to predict churn.
The first step was data exploration and data cleaning. Variables with more than 60%
missing values were not included in the final dataset. Variables with 2% to 30%
missing values were imputed with the mean value for continuous variables and the
mode value for categorical variables. Feature selection was also performed using a
correlation matrix and the Extra Trees classifier algorithm. The normality of the
continuous variables was assessed using histograms and by measuring skew and
kurtosis. Label encoding was performed because the machine learning models do not
accept the string data type.
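The missing-value rules summarised above could be expressed in pandas roughly as follows. The DataFrame and its column names are hypothetical, and for simplicity the sketch imputes every column that survives the 60% cut rather than checking the 2% to 30% range.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one continuous, one categorical and one mostly empty column.
df = pd.DataFrame({
    "balance": [100.0, np.nan, 300.0, 500.0, 100.0, 300.0, 200.0, 400.0, 250.0, 150.0],
    "account_type": ["A", "B", None, "A", "A", "B", "A", "B", "A", "A"],
    "legacy_code": [np.nan] * 7 + ["x", "y", "z"],
})

missing_share = df.isnull().mean()                        # fraction missing per column
df = df.drop(columns=missing_share[missing_share > 0.60].index)

for col in df.columns:
    if df[col].isnull().any():
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())      # continuous: mean
        else:
            df[col] = df[col].fillna(df[col].mode()[0])   # categorical: mode

print(df.isnull().sum().sum(), list(df.columns))
```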
Finally, the data was divided into train and test datasets, with 80% of the data used
for training and the remaining 20% for testing. Four supervised machine learning
models (Logistic Regression, Random Forest, SVM and Neural Network) were built on
the training dataset. In each of 40 iterations of a for loop, a training set was
picked at random and all the models were trained on that same training set and then
tested on the corresponding test set. The models were evaluated using accuracy as
the evaluation metric. A ROC curve was also plotted for the best model. In this
research, Random Forest was the best at predicting customer churn for the CU dataset.
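For reference, the points of a ROC curve and the area under it can be computed with scikit-learn as in the sketch below. The labels and scores are a small made-up example, not results from this research.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])                 # made-up true labels
y_score = np.array([0.1, 0.4, 0.35, 0.8])       # made-up predicted probabilities

# False positive rate and true positive rate at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(fpr, tpr, auc)                            # AUC is 0.75 for this example
```

An AUC of 0.75 here means that in three of the four positive-negative pairs the positive example received the higher score.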
This experiment was performed twice, once on the imbalanced dataset and once on a
dataset balanced using the SMOTE sampling technique.
The Random Forest model provided the highest accuracy, 97% on the imbalanced dataset
and 96% on the balanced dataset. The precision and recall were also better for
Random Forest than for the other models. All models achieved better accuracy on the
imbalanced dataset than on the balanced dataset in this research. However, this was
not the case in previous research papers.
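The four metrics named in the research question follow directly from the counts in a confusion matrix. The sketch below computes them for hypothetical counts, chosen only to illustrate the formulas; they are not results from this research.

```python
# Hypothetical confusion-matrix counts, with churn as the positive class.
tp, fn = 40, 20      # churners predicted as churn / predicted as retained
fp, tn = 30, 910     # retained members predicted as churn / predicted as retained

accuracy = (tp + tn) / (tp + tn + fp + fn)      # share of all predictions that are right
precision = tp / (tp + fp)                      # share of predicted churners who churned
recall = tp / (tp + fn)                         # share of actual churners who were found
specificity = tn / (tn + fp)                    # share of retained members correctly kept

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```

With these assumed counts the accuracy is 0.95 even though a third of the churners are missed, which shows why accuracy alone can flatter a classifier on an imbalanced dataset.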
Most of the customer churn prediction literature is based on telecom, bank or app
datasets. In this research, a CU dataset was used, which is unique. Not much
research has been done on CU datasets for customer churn prediction, so this
research on CU churn prediction contributes to the literature for future research.
From the business point of view, it could be helpful for the institution to know
which members are most likely to leave, so that it can increase customer retention.
Moreover, retaining an old customer is more beneficial financially than acquiring
new customers. Also, as the members own the CU, it is hard for the institution to
find trustworthy members, so it is important for the CU to retain its existing
customers.
Some future work was identified throughout the project. In this research, the
dataset of only one branch was explored and analysed. In future, the dataset of
another CU branch can be explored. Further research is needed to handle datetime
variables.
Four machine learning techniques were used in this project on the CU dataset.
Other machine learning algorithms can be explored in further work.
Further research can be done to build a time-series model to predict customer churn.
BIBLIOGRAPHY
Ahmed, A., Maheshware, D. (2017). Churn prediction on huge telecom data using hybrid
firefly based classification. Egyptian Informatics Journal, 18(3), 215-220.
doi.org/10.1016/j.eij.2017.02.002
Ali, O., Ariturk, U. (2014). Dynamic churn prediction framework with more effective use of
rare event data: The case of private banking. Expert Systems with Applications,
41(17), 7889-7903. doi.org/10.1016/j.eswa.2014.06.018
Aliyu, A., Kasim, R., Martin, D. (2011). Impact of Violent Ethno-Religious Conflicts on
Residential Property Value Determination in Jos Metropolis of Northern Nigeria:
Theoretical Perspectives and Empirical Findings. Modern Applied Science, 5(5), 171-
183. doi:10.5539/mas.v5n5p171
Alwis, P., Kumara, B., Hapuarachchi, H. (2018). Customer Churn Analysis and Prediction in
Telecommunication for Decision Making. International Conference on Business
Innovation. 40-45. doi.org/10.1016/0305-0548(93)90063-O
Amin, A., Obeidat, F., Shah, B., Adnan, A., Loo, J., Anwar, S. (2019). Customer churn
prediction in telecommunication industry using data certainty. Journal of Business
Research, 94, 290-301. doi.org/10.1016/j.jbusres.2018.03.003
Bin, L., Peiji, S., & Juan, L. (2007). Customer Churn Prediction Based on the Decision Tree in
Personal Handyphone System Service. 2007 International Conference On Service
Systems And Service Management, 687-696. doi: 10.1109/icsssm.2007.4280145
Borrego, M., Douglas, E., & Amelink, C. (2009). Quantitative, Qualitative, and Mixed
Research Methods in Engineering Education. Journal Of Engineering
Education, 98(1), 53-66. doi: 10.1002/j.2168-9830.2009.tb01005.x
Cai, J., Luo, J., Wang, S., & Yang, S. (2018). Feature selection in machine learning: A new
perspective. Neurocomputing, 300, 70–79. doi.org/10.1016/j.neucom.2017.11.077
Dalvi, P., Khandge, S., Deomore, A., Bankar, A., & Kanade, V. (2016). Analysis of customer
churn prediction in telecom industry using decision trees and logistic regression. 2016
Symposium On Colossal Data Analysis And Networking (CDAN). doi:
10.1109/cdan.2016.7570883
doi.org/10.1016/j.ejor.2011.09.031
F.Y, O., J.E.T, A., O, A., J. O, H., O, O., & J, A. (2017). Supervised Machine Learning
Algorithms: Classification and Comparison. International Journal Of Computer
Trends And Technology, 48(3), 128-138. doi: 10.14445/22312803/ijctt-v48p126
Fabris, F., Magalhães, J., & Freitas, A. (2017). A review of supervised machine learning
applied to ageing research. Biogerontology, 18(2), 171-188. doi: 10.1007/s10522-017-
9683-y
Ganganwar, V. (2012). An overview of classification algorithms for imbalanced
datasets. International Journal of Emerging Technology and Advanced Engineering,
2(4). 42-47.
Gordini, N., Veglio, V. (2017). Customers churn prediction and marketing retention strategies.
An application of support vector machines based on the AUC parameter-selection
technique in B2B e-commerce industry. Industrial Marketing Management, 62, 100-107.
doi.org/10.1016/j.indmarman.2016.08.003
Guo-en, X., Wei-dong, J. (2008). Model of Customer Churn Prediction on Support Vector
Machine. SETP Journal Title, 28(1), 71-77. doi.org/10.1016/S1874-8651(09)60003-X
Hadden, J., Tiwari, A., Roy, R., Ruta, D. (2005). Computer assisted customer churn
management: State-of-the-art and future trends. Computers & Operations Research,
34(10), 2902-2917. doi.org/10.1016/j.cor.2005.11.007
He, B., Shi, Y., Wan, Q., Zhao, X. (2014). Prediction of Customer Attrition of Commercial
Banks based on SVM Model. Procedia Computer Science, 31, 423-430.
doi.org/10.1016/j.procs.2014.05.286
Huang, B. Q., Kechadi, T., Buckley, B., Keirnan, G., Keogh, E., Rashid, T. (2010). A new
feature set with new window techniques for customer churn prediction in land-line
telecommunications. Expert Systems with Applications, 37(5), 3657-3665.
doi.org/10.1016/j.eswa.2009.10.025
Idris, A., Rizwan, M., & Khan, A. (2012). Churn prediction in telecom using Random Forest
and PSO based data balancing in combination with various feature selection
strategies. Computers & Electrical Engineering, 38(6), 1808–1819.
doi.org/10.1016/j.compeleceng.2012.09.001
Jahromi, A., Stakhovych, S., Ewing, M. (2014). Managing B2B customer churn, retention and
profitability. Industrial Marketing Management, 43(7), 1258-1268.
doi.org/10.1016/j.indmarman.2014.06.016
Kang, H. (2013). The prevention and handling of the missing data. Korean Journal Of
Anesthesiology, 64(5), 402-409. doi: 10.4097/kjae.2013.64.5.402
Kaya, E., Dong, X., Suhara, Y., Balsicoy, S., Bozkaya, B., Pentland, A. (2018). Behavioral
Attributes and Financial Churn Prediction. EPJ Data Science, 7(1), 1-18.
doi.org/10.1140/epjds/s13688-018-0165-5
Kelleher, J., Mac Namee, B., & D’Arcy, A. (2015). Fundamentals of machine learning for
predictive data analytics: algorithms, worked examples, and case studies. Cambridge,
Massachusetts: The MIT Press, 2015.
Khan, M. R., Manoj, J., Singh, A., & Blumenstock, J. (2015). Behavioral Modeling for Churn
Prediction: Early Indicators and Accurate Predictors of Custom Defection and
Loyalty. IEEE International Congress on Big Data, 1-4.
doi.org/10.1109/bigdatacongress.2015.107
Kim, M., Park, M., & Jeong, D. (2004). The effects of customer satisfaction and switching
barrier on customer loyalty in Korean mobile telecommunication
services. Telecommunications Policy, 28(2), 145-159. doi:
10.1016/j.telpol.2003.12.003
Kim, S., Shin, K., & Park, K. (2005). An Application of Support Vector Machines for
Customer Churn Analysis: Credit Card Case. Lecture Notes In Computer Science,
636-647. doi: 10.1007/11539117_91
Korkmaz, M., Güney, S., & Yiğiter, Ş. (2012). The importance of logistic regression
implementations in the Turkish livestock sector and logistic regression
implementations/fields. Harran University, 16(2), 25-36.
Kotsiantis, S., Kanellopoulos, D., & Pintelas, P. (2006). Handling imbalanced datasets: A
review. GESTS International Transactions On Computer Science And Engineering, 30,
1-12.
Kumar, D., & Ravi, V. (2008). Predicting credit card customer churn in banks using data
mining. International Journal Of Data Analysis Techniques And Strategies, 1(1), 4.
doi: 10.1504/ijdats.2008.020020
Lee, H., Lee, Y., Cho, H., Im, K., Kim, Y. (2017). Mining churning behaviors and developing
retention strategies based on a partial least squares (PLS) model. Decision Support
Systems, 52(1), 207-216. doi.org/10.1016/j.dss.2011.07.005
Maheshwari, S., Jain, R. C., & Jadon, R. S. (2017). A Review on Class Imbalance Problem:
Analysis and Potential Solutions. International Journal Of Computer Science
Issues, 14(6), 43-51. doi: 10.20943/01201706.4351
Malhotra, K. (2007). Marketing research – An applied orientation (5th ed.). New Jersey:
Pearson Education.
Manjupriya, R. and Poornima, A. (2018). Customer Churn Prediction in the Mobile
Telecommunication Industry Using Decision Tree Classification Algorithm. Journal of
Computational and Theoretical Nanoscience, 15(9), 2789-2793.
doi.org/10.1166/jctn.2018.7540
Miao, J., & Niu, L. (2016). A Survey on Feature Selection. Procedia Computer Science, 91,
919-926. doi: 10.1016/j.procs.2016.07.111
Mukaka, M. (2012). Statistics Corner: A guide to appropriate use of Correlation coefficient in
medical research. Malawi Medical Journal: The Journal Of Medical Association Of
Malawi, 24(3), 69-71.
Nashwan, S., & Hassan, H. (2017). Impact of customer relationship management (CRM) on
customer satisfaction and loyalty: A systematic review. Journal Of Advanced Research
In Business And Management Studies, 6(1), 86-107. Retrieved from
https://www.researchgate.net/publication/318206357_Impact_of_customer_relationship_management_CRM_on_customer_satisfaction_and_loyalty_A_systematic_review
Nie, G., Rowe, W., Zhang, L., Tian, Y., & Shi, Y. (2011). Credit card churn forecasting by
logistic regression and decision tree. Expert Systems With Applications, 38(12),
15273-15285. doi: 10.1016/j.eswa.2011.06.028
Oyeniyi, A. O., Adeyemo, A. B. (2015). Customer Churn Analysis In Banking Sector Using
Data Mining Techniques. African Journal of Computing and ICT, 8(3), 165-174.
doi: 10.1109/IWBIS.2019.8935884
Poel, D., Lariviere, B. (2004). Customer attrition analysis for financial services using
proportional hazard models. European Journal of Operational Research, 157(1), 196-217.
doi.org/10.1016/S0377-2217(03)00069-9
Pretorius, A., Bierman, S., & Steel, S. (2016). A meta-analysis of research in random forests
for classification. 2016 Pattern Recognition Association Of South Africa And Robotics
And Mechatronics International Conference (PRASA-RobMech), 1-6. doi:
10.1109/robomech.2016.7813171
Saunders, M., Lewis, P., Thornhill, A. (2009). Research Methods for Business Students (5th
ed.). England: Pearson Education.
Sayed, H., A., M., & Kholief, S. (2018). Predicting Potential Banking Customer Churn using
Apache Spark ML and MLlib Packages: A Comparative Study. International Journal
of Advanced Computer Science and Applications, 9(11).
doi.org/10.14569/ijacsa.2018.091196
Senanayake, D., Muthugama, L., Mendis, L., & Madushanka, T. (2015). Customer Churn
Prediction: A Cognitive Approach. World Academy Of Science, Engineering And
Technology International Journal Of Computer And Information Engineering, 9(3),
767-773. doi.org/10.5281/zenodo.1100190
Shaaban, E., Helmy, Y., Khedr, A., & Nasr, M. (2012). A Proposed Churn Prediction
Model. International Journal Of Engineering Research And Applications
(IJERA), 2(4), 693-697.
Sharma, A., & Kumar Panigrahi, P. (2011). A Neural Network based Approach for Predicting
Customer Churn in Cellular Network Services. International Journal of Computer
Applications, 27(11), 26–31. doi.org/10.5120/3344-4605
Singh, A., Thakur, N., & Sharma, A. (2016). A review of supervised machine learning
algorithms. 2016 3rd International Conference on Computing for Sustainable Global
Development (INDIACom), 1310-1315.
Subramanian, V., Hung, M., Hu, M. (1992). An Experimental Evaluation of Neural Network
for Classification. Computers & Operations Research, 20(7), 769-782.
doi.org/10.1016/0305-0548(93)90063-O
Tian, Y., Shi, Y., & Liu, X. (2012). Recent Advances On Support Vector Machines
Research. Technological and Economic Development of Economy, 18(1), 5–33.
doi.org/10.3846/20294913.2012.661205
Tsai, C., Lu, Y. (2009). Customer churn prediction by hybrid neural networks. Expert Systems
with Applications, 36(10), 12547-12553. doi.org/10.1016/j.eswa.2009.05.032
Umayaparvathi, V., & Iyakutti, K. (2012). Applications of Data Mining Techniques in
Telecom Churn Prediction. International Journal Of Computer Applications, 42(20),
5-9. doi: 10.5120/5814-8122
Vafeiadis, T., Diamantaras, K., Chatzisavvas, K., Sarigiannidis, G. (2015). A comparison of
machine learning techniques for customer churn prediction. Simulation Modelling
Practice and Theory, 55, 1-9. doi: 10.1016/j.simpat.2015.03.003
Van den Poel, D., & Larivière, B. (2004). Customer attrition analysis for financial services
using proportional hazard models. European Journal Of Operational Research, 157(1),
196-217. doi: 10.1016/s0377-2217(03)00069-9
Verbeke, W., Dejaeger, K., Martens, D., Hur, J., Baesens, B. (2012). New insights into churn
prediction in the telecommunication sector: A profit driven data mining approach.
European Journal of Operational Research, 218(1), 211-229.
Wieringa, R., Maiden, N., Mead, N., & Rolland, C. (2005). Requirements engineering paper
classification and evaluation criteria: a proposal and a discussion. Requirements
Engineering, 11(1), 102-107. doi: 10.1007/s00766-005-0021-6
Wirth, R., Hipp, J. (2000). CRISP-DM: Towards a standard process model for data mining.
Proceedings of the Fourth International Conference on the Practical Application of
Knowledge Discovery and Data Mining.
Xie, Y., Li, X., Ngai, E., & Ying, W. (2009). Customer churn prediction using improved
balanced random forests. Expert Systems With Applications, 36(3), 5445-5449. doi:
10.1016/j.eswa.2008.06.121
Zhang, S., Zhang, C., Yang, Q. (2003). Data Preparation for Data Mining. Applied Artificial
Intelligence, 17, 375–381. doi: 10.1080/08839510390219264
Zorich, A. (2018). Predicting Customer Churn In Banking Industry Using Neural Networks.
Interdisciplinary Description of Complex Systems, 14, 116-124.
https://doi.org/10.7906/indecs.14.2.1