Customer Churn Prediction

DECLARATION

I certify that this dissertation which I now submit for examination for the award of
MSc in Computing (Data Analytics), is entirely my own work and has not been taken
from the work of others save and to the extent that such work has been cited and
acknowledged within the text of my work.

This dissertation was prepared according to the regulations for postgraduate study of
the Technological University Dublin and has not been submitted in whole or part for
an award in any other Institute or University.

The work reported on in this dissertation conforms to the principles and requirements
of the Institute’s guidelines for ethics in research.

Signed: Deepshikha Wadikar

Date: 05 January 2020

ABSTRACT

Identifying churned customers plays an essential role in the functioning and growth
of any business. It can help a business understand the reasons for churn and plan its
market strategies accordingly to enhance growth. This research is aimed at developing
a machine learning model that can precisely predict churned customers among the total
customers of a Credit Union financial institution.
Quantitative and deductive research strategies are employed to build a supervised
machine learning model that addresses the class imbalance problem, incorporates
feature selection and efficiently predicts customer churn. The overall accuracy of the
model, the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC
Curve are used as the evaluation metrics in this research to identify the best
classifier.
A comparative study was performed in which the most popular supervised machine
learning methods (Logistic Regression, Random Forest, Support Vector Machine (SVM)
and Neural Network) were applied to customer churn prediction in a CU context. In the
first phase of the experiments, various feature selection techniques were studied. In
the second phase, all models were applied to the imbalanced dataset and the results
were evaluated. The SMOTE technique was then used to balance the data, the same
models were applied to the balanced dataset, and the results were evaluated and
compared. The best overall classifier was Random Forest, with accuracy of almost 97%,
precision of 91% and recall of 98%.

Key words: Credit Union, Churn Prediction, Supervised Machine Learning,
Classification, Sampling, Feature Selection.

ACKNOWLEDGEMENTS

I would first like to express my sincere thanks to my supervisor Prof. Vincent
McGrady for providing me with the data for my research. His immense knowledge,
continuous support, guidance and advice throughout the project helped and encouraged
me to do better in my thesis writing. You are an amazing mentor and without your
support, this thesis would not have been possible.

I would also like to thank DIT and Prof. Luca Longo, M.Sc. thesis coordinator, for
providing me with the opportunity to work on this thesis.

Finally, I would like to thank all my friends and family for all their encouragement,
support and motivation during my studies. Special gratitude to my parents Pradeep
and Chetna, and my husband Nitin for their love, support and encouragement
throughout my studies. This accomplishment would not have been possible without
them.

TABLE OF CONTENTS
ABSTRACT.................................................................................................................II

ACKNOWLEDGEMENTS.......................................................................................III

TABLE OF FIGURES.............................................................................................VII

TABLE OF TABLES..............................................................................................VIII

LIST OF ACRONYMS..............................................................................................IX

1. INTRODUCTION.................................................................................................1

1.1 BACKGROUND.....................................................................................................1
1.2 RESEARCH PROJECT............................................................................................2
1.3 RESEARCH OBJECTIVES.......................................................................................3
1.4 RESEARCH METHODOLOGIES..............................................................................4
1.4.1 Based on type: Primary Vs. Secondary Research...................................4
1.4.2 Based on objective: Qualitative Vs. Quantitative Research...................5
1.4.3 Based on form: Exploratory Vs. Constructive Vs. Empirical.................5
1.4.4 Based on reasoning: Deductive Vs. Inductive Research........................6
1.5 SCOPE AND LIMITATIONS....................................................................................7
1.6 DOCUMENT OUTLINE..........................................................................................7

2. LITERATURE REVIEW.....................................................................................9

2.1 BACKGROUND.....................................................................................................9
2.2 CUSTOMER CHURN PREDICTION.........................................................................9
2.3 DATA EXPLORATION AND PRE-PROCESSING.....................................................11
2.3.1 Class Imbalance....................................................................................12
2.3.2 Feature Selection..................................................................................14
2.4 MACHINE LEARNING.........................................................................................15
2.4.1 Supervised Machine Learning...............................................................16
2.5 MACHINE LEARNING TECHNIQUES...................................................................17
2.5.1 Logistic Regression...............................................................................17
2.5.2 Random Forest......................................................................................18
2.5.3 Support Vector Machine.......................................................................18
2.5.4 Neural Network.....................................................................................19

2.6 MODEL EVALUATION........................................................................................20
2.7 HISTORIC CUSTOMER CHURN PREDICTION.......................................................20
2.8 CUSTOMER CHURN PREDICTION USING MACHINE LEARNING..........................21
2.9 APPROACHES TO SOLVE THE PROBLEM.............................................................22
2.10 SUMMARY, LIMITATIONS AND GAPS IN THE LITERATURE SURVEY..............25

3. DESIGN AND METHODOLOGY....................................................................27

3.1 BUSINESS UNDERSTANDING..............................................................................28


3.2 DATA UNDERSTANDING....................................................................................29
3.3 DATA PREPARATION.........................................................................................29
3.3.1 Handling Missing Values......................................................................30
3.3.2 Normalizing Data..................................................................................30
3.3.3 Feature Selection..................................................................................31
3.3.4 Encoding...............................................................................................31
3.3.5 Data Sampling.......................................................................................32
3.4 MODELLING.......................................................................................................33
3.4.1 Logistic Regression...............................................................................33
3.4.2 Random Forest......................................................................................34
3.4.3 Support Vector Machine.......................................................................35
3.4.4 Neural Network.....................................................................................36
3.5 EVALUATION.....................................................................................................37
3.6 STRENGTHS AND LIMITATION...........................................................................38

4. IMPLEMENTATION AND RESULTS............................................................39

4.1 DATA UNDERSTANDING....................................................................................39


4.1.1 Dataset..................................................................................................39
4.1.2 Correlation Analysis.............................................................................47
4.1.3 Outlier Analysis.....................................................................................48
4.2 DATA PRE-PROCESSING.....................................................................................50
4.2.1 Handling Missing Values......................................................................50
4.2.2 Normalizing the Data............................................................................51
4.2.3 Feature Selection..................................................................................51
4.2.4 Encoding...............................................................................................52
4.2.5 Sampling................................................................................................53

4.2.6 Data Splitting........................................................................................54
4.3 MODELLING.......................................................................................................55
4.3.1 Logistic Regression...............................................................................55
4.3.2 Random Forest......................................................................................56
4.3.3 Support Vector Machine.......................................................................57
4.3.4 Neural Network.....................................................................................57
4.4 RESULTS............................................................................................................59
4.5 SECONDARY RESEARCH....................................................................................60

5. EVALUATION AND DISCUSSION.................................................................62

5.1 EVALUATION OF THE RESULTS.........................................................................62


5.2 HYPOTHESIS EVALUATION................................................................................64
5.3 STRENGTHS OF THE RESEARCH.........................................................................64
5.4 LIMITATIONS OF THE RESEARCH.......................................................................65

6. CONCLUSION....................................................................................................66

6.1 RESEARCH OVERVIEW......................................................................................66


6.2 PROBLEM DEFINITION.......................................................................................67
6.3 DESIGN, EVALUATION AND RESULTS................................................................67
6.4 CONTRIBUTIONS AND IMPACT...........................................................................68
6.5 FUTURE WORK AND RECOMMENDATIONS........................................................69

BIBLIOGRAPHY.......................................................................................................70

APPENDIX A.............................................................................................................76

TABLE OF FIGURES

Figure 1.1: Inductive Vs. Deductive Reasoning.............................................................6


Figure 2.1: The phases of the CRISP-DM data mining model.....................................11
Figure 2.2: Effects of Sample methods.........................................................................12
Figure 2.3: Feature Selection Category........................................................................14
Figure 2.4: Machine Learning Techniques – Unsupervised and Supervised
Learning........................................................................................................................15
Figure 2.5: Supervised Machine Learning Model........................................................17
Figure 2.6: Logistic Regression Formula.....................................................................17
Figure 2.7: Support Vector Machine............................................................................19
Figure 2.8: Churn Rate Prediction using Machine Learning........................................22
Figure 3.1: CRISP_DM Process...................................................................................27
Figure 3.2: Logistic Regression....................................................................................34
Figure 3.3: Random Forest...........................................................................................35
Figure 4.1: Age variable Histogram.............................................................................40
Figure 4.2: AgeAtJoining variable Histogram.............................................................41
Figure 4.3: TotalSavings variable Histogram..............................................................41
Figure 4.4: TotalLoans variable Histogram.................................................................42
Figure 4.5: Closed Variable distribution......................................................................43
Figure 4.6: Gender Variable distribution.....................................................................43
Figure 4.7: MaritalStatus Variable distribution...........................................................44
Figure 4.8: AccomodationType Variable distribution.................................................45
Figure 4.9: PaymentMethod Variable distribution......................................................45
Figure 4.10: Dormant Variable distribution................................................................46
Figure 4.11: Correlation heatmap of the variables......................................................47
Figure 4.12: Boxplot of Age variable.........................................................................49
Figure 4.13: Scatterplot of Age variable with respect to the target variable..............49
Figure 4.14: Feature Importance graph with respect to the target variable................52
Figure 4.15: Target variable distribution....................................................................53
Figure 4.16: Confusion Matrix...................................................................................59
Figure 5.1: Accuracy Comparison graph....................................................................63
Figure 5.2: ROC graph................................................................................................63

TABLE OF TABLES

Table 1.1: Qualitative Vs. Quantitative Research.........................................................5


Table 3.1: Correlation Table........................................................................................31
Table 4.1: Descriptive Statistics of Customer data......................................................46
Table 4.2: Correlation Matrix of the variables.............................................................48
Table 4.3: Target Variable Counts...............................................................................53
Table 4.4: Final Dataset Description............................................................................54
Table 4.5: Logistic Regression Results for a balanced dataset....................................55
Table 4.6: Logistic Regression Results for imbalance dataset.....................................56
Table 4.7: Random Forest Results for a balanced dataset............................................56
Table 4.8: Random Forest Results for imbalance dataset............................................56
Table 4.9: Support Vector Machine Results for a balanced dataset.............................57
Table 4.10: Support Vector Machine Results for imbalance dataset...........................57
Table 4.11: Neural Network Results for a balanced dataset........................................58
Table 4.12: Neural Network Results for imbalance dataset.........................................58
Table 4.13: Results of Supervised Machine Learning Models....................................59

Table 4.14: Results of Supervised Machine Learning Models with imbalanced


dataset..........................................................................................................................60

LIST OF ACRONYMS

CU Credit Union
CRISP-DM Cross Industry Standard Process for Data Mining
BOI Bank Of Ireland
AIB Allied Irish Bank
ILCU Irish League of Credit Unions
SVM Support Vector Machine
CRM Customer Relationship Management
SMOTE Synthetic Minority Oversampling Technique
AUC Area Under Curve
ROC Receiver Operating Characteristic
ANN Artificial Neural Network
SOM Self Organizing Map
DT Decision Tree
MLP Multi Layer Perceptron
TDL Top Decile Lift
EDA Exploratory Data Analysis
RBF Radial Basis Function
RELU Rectified Linear Unit
TP True Positive
FP False Positive
TN True Negative
FN False Negative
TPR True Positive Rate
FPR False Positive Rate

1. INTRODUCTION

1.1 Background

A Credit Union (CU) is a non-profit organisation that has served its members in
Ireland since 1958; CUs have more than 3.6 million members in Ireland. CUs function
much like banks: they accept deposits, provide loans at a reasonable rate of interest
and offer a wide variety of financial services. A CU is a group of people connected
by a ‘common bond’ based on the area they live in, their occupation, or the employer
they work for, who can save together and lend to each other at a fair and reasonable
rate of interest.1 CUs are organised by geographical area, and in every area there is
one CU present for its members.

A CU is different from the banks (BOI, AIB, Ulster Bank) in many ways:

1) A CU is a not-for-profit democratic financial institute owned by its members,
whereas banks are profit-earning financial institutes.

2) Any surplus income is either distributed amongst members in the form of dividends
or is used to develop new and existing services.

3) There are no hidden administration or transaction fees for members.

4) Loans and savings are insured at no direct cost.

5) Members are offered flexibility regarding loan repayments.

6) CUs are committed to their local communities and provide support to local youth
initiatives, charities, sporting clubs, and cultural events.

7) Banks are ahead of CUs in terms of the number of employees: BOI has 11,086
employees and AIB 10,500, whereas CUs have 3,500 employees.

1 https://www.creditunion.ie/about-credit-unions/what-is-a-credit-union/
The Irish League of Credit Unions (ILCU) describes a CU as “a group of people who
save together and lend to each other at a fair and reasonable rate of interest”. CUs
offer their members the chance to have control over their finances. Regular savings
form a common pool of money, which provides many benefits for members.

With advancements and competition amongst financial institutions, there is a need to
retain old customers. Customer retention is crucial in a variety of businesses, as
acquiring new customers is often more costly than keeping the current ones (Kaya,
Dong, Suhara, Balsicoy & Bozkaya, 2018). Customer churn has become a major problem in
all industries, including the banking industry, and banks have always tried to track
customer interaction so that they can detect the customers who are likely to leave.
Customer churn modelling mainly focuses on the customers who are likely to leave, so
that the bank can take the necessary steps to prevent churn (Oyeniyi & Adeyemo,
2015). For CUs, customer churn is important as getting new members is expensive.
Moreover, to join a CU the member must satisfy the common bond criteria: a common
bond within either a community (geographical) or an industry (employment).

The ILCU has an affiliated membership of 351 CUs: 259 in the Republic of Ireland and
92 in Northern Ireland. In this research, we use the member/customer data of one of
these CUs to predict customer churn.

1.2 Research Project

Supervised machine learning techniques have been used in customer churn prediction
problems in the past with SVM-POLY using AdaBoost as the best overall model
(Vafeiadis, Diamantaras, Chatzisavvas & Sarigiannidis, 2015). The most common
techniques applied for predicting customer churn are Decision tree, Multilayer
perceptron, and SVM.

In existing research on the customer churn prediction problem in the
telecommunications industry, the researcher Guo-en used an SVM model as it can handle
nonlinearity, high dimensionality, and local minimization problems. The model's
predictions depend on the data structure and condition.

Techniques that are most commonly used to predict customer churn are neural
networks, support vector machines and logistic regression models. Data mining
research literature suggests that machine learning techniques, such as neural networks
should be used for non-parametric datasets because they often outperform traditional
statistical techniques such as linear and quadratic discriminant analysis approaches
(Zoric, 2016).

Logistic Regression is a probabilistic statistical classification model mainly used
for classification problems (Nie, Rowe, Zhang, Tian & Shi, 2011). The technique can
work well with different combinations of variables and can help in predicting
customer churn with higher accuracy.
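As a concrete illustration, a minimal sketch of logistic regression for a binary churn label using scikit-learn; the features and labels below are synthetic stand-ins, not the CU dataset:

```python
# Illustrative sketch only: synthetic data standing in for CU member features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))          # e.g. standardised age, savings, loans (hypothetical)
score = 1.2 * X[:, 0] - 0.8 * X[:, 1]   # assumed linear churn drivers
y = (score > 0).astype(int)             # 1 = churned, 0 = retained

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 2))  # test-set accuracy
```

Because the synthetic label is a linear function of the features, the fitted coefficients recover the direction of the assumed churn drivers.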

Random Forest is an ensemble learning method for classification and regression
problems that uses the bagging technique to generate its results. The default
hyperparameters of Random Forest give good results and it is effective at avoiding
overfitting (Pretorius, Bierman & Steel, 2016).
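A minimal sketch of a random forest with its default hyperparameters, on synthetic churn-like data; the 90/10 class weighting is an assumption to mimic the rarity of churners, not a property of the CU dataset:

```python
# Illustrative sketch only: make_classification stands in for the real member data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1],  # assumed churn-like imbalance
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Default hyperparameters: 100 bagged trees, each grown on a bootstrap sample.
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(round(rf.score(X_te, y_te), 2))
```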

Based on the previous literature in this area, and for reasons mentioned further on
in this section, four supervised machine learning techniques will be compared when
aiming to predict customer churn: logistic regression, random forest, SVM and neural
network.
Currently, customer churn is not predicted using any machine learning techniques on
CU members' data. The logistic regression model is selected even though, in previous
research, SVM and random forest have been observed to outperform logistic regression
when predicting customer churn.
The research question is framed as:
“Which supervised machine learning technique (logistic regression, random forest,
SVM or neural network) can best predict the customer churn of a CU with the best
accuracy, specificity, precision, and recall?”

1.3 Research Objectives

The key objective of the research is to identify whether supervised machine learning
will help to predict the customer churn rate on CU customer data precisely.
Currently, no specific method has been adopted by the CU to identify the customer
churn rate. This research helps identify the customers who are more likely to churn;
the business can then focus more on those customers and thus retain its old
customers, which leads to growth.

The research objectives are as follows:

1) Collect the required customer data from the business for the research.

2) Understand the data, identify any data issues, and rectify those before applying
machine learning algorithms.

3) Prepare the data using sampling, encoding, feature selection and data splitting.

4) Build the supervised machine learning models (Support Vector Machine, Logistic
Regression, Random Forest and Neural Network) and assess their performance on the
training dataset.

5) Validate the models on the validation dataset and, based on the evaluation
metrics, identify the best model among all for predicting customer churn.

6) Test the best-performing model on the test dataset and evaluate the results.

7) Identify the limitations and propose areas for future research.
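The objectives above imply a three-way split of the data into training, validation and test sets. A minimal sketch; the 60/20/20 proportions are an assumption for illustration, not stated in the text:

```python
# Three-way split sketch: first carve off the test set, then split the
# remainder into training and validation portions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # toy feature matrix
y = np.arange(50) % 2              # toy binary labels

X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25,
                                                  random_state=0)  # 0.25 * 0.8 = 0.2
print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```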

1.4 Research Methodologies

The Research can be classified based on different ways –

1.4.1 Based on type: Primary Vs. Secondary Research

Primary research is also known as field research; it is carried out to collect
original data that does not already exist. Secondary research, also known as desk
research, involves the summary, collation and/or synthesis of existing research.

This research on customer churn prediction for a CU is a primary type of research,
as original data was collected from the financial institute. The research is unique
in that no such work has been performed on the CU member dataset before.

1.4.2 Based on objective: Qualitative Vs. Quantitative Research

Qualitative research is non-statistical research aimed at gaining a qualitative
understanding of the underlying reasons and motivations. It usually requires a
smaller but focused dataset, describes the research broadly and develops a deeper
understanding of a topic. Quantitative research means the systematic study of the
research problem, analysing the data statistically. It deals with investigating
quantitative data and recommending a final output of the research.

Table 1.1: Qualitative Vs. Quantitative Research (Source: Malhotra, 2007)

The current research is Quantitative research which uses data mining, involves the
systematic investigation of customer data and is aimed at developing models, then
verifying the results and then either the hypothesis is accepted or rejected based on the
customer churn precision (Borrego, Douglas & Amelink, 2013).

1.4.3 Based on form: Exploratory Vs. Constructive Vs. Empirical

In exploratory research, the research is carried out for a problem that has not been
clearly defined. It helps to determine the best research design and data collection
method. Constructive research refers to a new contribution: a completely new
approach, model or theory is formulated. It often involves the proper validation of
the research via analytical comparison with predefined research and benchmark tests.
Empirical research refers to gaining knowledge through direct observation or
experience. It involves defining the hypothesis and then the predictions, which can
be tested with a suitable experiment.

This research is an empirical form of research because it involves defining the


hypothesis and predicting the precision of customer churn by performing suitable
experiments and then collating the results and then based on the results the hypothesis
is accepted or rejected.

1.4.4 Based on reasoning: Deductive Vs. Inductive Research

A deductive approach is a top-down approach, moving from the more general to the more
specific: based on a pre-defined theory, the hypothesis is defined and the conclusion
is then drawn from the research.

Inductive research, also known as the bottom-up approach, goes from specific
observations to broader generalizations and theories.

Figure 1.1: Inductive Vs. Deductive Reasoning (Source: Aliyu, Kasim & Martin, 2011)

Deductive reasoning, also known as the top-down approach (Saunders, Lewis &
Thornhill, 2009), is employed in this research: first, the hypothesis is created
based on theory; then experiments are performed to study it, and supervised machine
learning models are built on the CU customer data to predict the churned customers.
The champion model is then selected based on the accuracy of the model.

The Python programming language is used for statistical exploration of the data, data
cleaning, data preparation, building supervised machine learning models and
evaluating those models.
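As an illustration of the evaluation step, a short sketch computing the metrics used in this study (accuracy, precision, recall and ROC AUC) with scikit-learn; the labels and scores below are hypothetical predictions, not results from the CU data:

```python
# Hypothetical predictions for 8 customers (1 = churned, 0 = retained).
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred  = [0, 0, 1, 1, 0, 0, 1, 0]                    # hard class predictions
y_score = [0.1, 0.2, 0.9, 0.8, 0.4, 0.3, 0.7, 0.2]    # predicted churn probabilities

print(accuracy_score(y_true, y_pred))    # 0.875  (7 of 8 correct)
print(precision_score(y_true, y_pred))   # 1.0    (no false positives)
print(recall_score(y_true, y_pred))      # 0.75   (3 of 4 churners found)
print(roc_auc_score(y_true, y_score))    # 1.0    (scores rank churners perfectly)
```

Precision and recall are computed from hard predictions, while AUC uses the raw scores, which is why a model can have imperfect recall yet a perfect AUC.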

1.5 Scope and Limitations

The scope of this research is to develop a machine learning model using the CU’s
customer data to predict the customer churn.

The main limitation of the research is that the customer data is obtained from only
one CU, so it cannot be representative of other CU financial institutions; the
customer base would be different for different CU institutions.

The other limitation of the research is that many DateTime variables present in the
data are not considered for building the classifiers. Also, the class imbalance is
another limitation to overcome: churned customers were less common, so less data was
provided to the classifiers to learn the features of churned customers.
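The class imbalance noted above is later handled with SMOTE. As a rough sketch of the core idea (synthesising minority-class points by interpolating between a minority sample and one of its minority-class neighbours), written in plain NumPy rather than the imbalanced-learn implementation used in practice:

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours.
    Simplified illustration of the SMOTE idea, not the reference algorithm."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# Four minority-class points at the corners of the unit square (toy data).
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth = smote_like(minority, n_new=6)
print(synth.shape)  # (6, 2)
```

Each synthetic point lies on a line segment between two real minority samples, so the oversampled class stays inside the region the minority data already occupies.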

1.6 Document Outline

This section outlines the thesis document:

This thesis report starts with defining and explaining the research problem,
establishing its importance, describing the methodologies adopted, and framing the
purpose of the work with a proper research question.

Chapter 2 (Literature Review) discusses the literature related to customer churn
prediction. This chapter reviews and compares the previous work done in this area
using supervised machine learning techniques to predict customer churn. It describes
the use of SVM, Logistic Regression, Random Forest and Neural Network in previous
work, their specifications, and discusses the most valuable work.

Chapter 3 (Design and Methodology) describes the design and methodology adopted
to solve the research problem in detail. It follows the CRISP-DM methodology and
each step is carried out and explained in detail in this chapter.

Chapter 4 (Implementation and Results) presents the implementation details and the
results of the implementation. It describes in detail which models were chosen and
how they performed, with proper justification. The hypothesis of the research is
considered, the results are compared, and the hypothesis is evaluated.

Chapter 5 (Evaluation and Discussion) discusses the evaluation criteria of the
supervised machine learning models. The champion model is determined and the
results are discussed in relation to the research problem, and the hypothesis is
evaluated. The strengths and limitations of the thesis are also discussed.

Chapter 6 (Conclusion) summarises the research, discusses the result obtained, its
evaluation and the contribution of the research towards the research question, and
recommends future research work in similar areas.

2. LITERATURE REVIEW

This chapter provides a review of the literature available on CUs, Customer Churn
prediction methods, various approaches adopted to solve the problem and evaluation
metrics used for evaluating the models. The chapter concludes with the gaps in the
existing research and forms the objective for the research.

2.1 Background

Customer Churn Prediction is important in all businesses because it helps a company
gain a better understanding of its customers and of future expected revenue. It can
also help a business identify and improve areas where customer service is lacking. A
lot of work has been done on this problem, yet there is still customer data from many
industries to explore, and the results differ across industries.

2.2 Customer Churn Prediction

The term Customer Attrition refers to a customer leaving one business’s service for
another. Customer Churn Prediction is used to identify possible churners in advance,
before they leave the company. This helps the company plan the retention policies
required to win back likely churners and retain them, which in turn reduces the
company’s financial loss (Umayaparvathi & Iyakutti, 2012).

Customer churn is a concern for several industries, and it is particularly acute in
highly competitive ones. Losing customers leads to financial loss through reduced
sales and an increasing need to attract new customers (Guo-en & Wei-dong, 2008).

Customer retention is crucial in a variety of businesses, as acquiring new customers
is often more costly than keeping the current ones. Due to the unpredictable nature of
customers, it is quite a daunting task to predict whether a customer will quit the
company or not. For financial institutes it is even more complex to identify customer
churn, due to the sparsity of the data compared to other domains, which requires
longer investigation periods for churn prediction (Kaya, et al., 2018).

The economic value of customer retention is widely recognized (Poel & Lariviere,
2004):

(1) Successful customer retention allows organizations to focus more on the needs
of their existing customers instead of seeking new and potentially risky ones.

(2) Long term customers would be more beneficial and, if satisfied, may provide
new referrals.

(3) Long term customers tend to be less sensitive towards a competitive market.

(4) Long term customers become less expensive to serve due to the bank’s
knowledge of them.

(5) Losing customers leads to reduced sales and to increased costs of attracting
new customers.

Customer churn has become a major problem in all industries, including banking, and
banks have always tried to track customer interactions so that they can detect the
customers who are likely to leave. Customer churn modelling mainly focuses on
identifying the customers who are likely to leave so that steps can be taken to
prevent the churn (Oyeniyi & Adeyemo, 2015).

In this competitive era, more and more companies realize that their most precious
asset is their existing customer base and its data. This research mainly investigates
the predictors of churn incidence as part of customer relationship management
(CRM); churn management is an important task for retaining valuable customers.

Business organizations such as banks, insurance companies and other service
providers are pushing their employees to be more customer- and service-oriented,
and they are setting strategies to ensure customer retention (Nashwan & Hassan,
2017). The best core marketing strategy for the future is to retain existing customers
and avoid customer churn (Kim, Park & Jeong, 2004).

Previous research indicates two targeted approaches to managing customer churn:
reactive and proactive. In a reactive approach, the company waits until the customer
asks to cancel their service. In a proactive approach, the company tries to identify
customers who are likely to churn and then tries to retain them by providing
incentives. If churn predictions are inaccurate, companies waste money on retention
incentives, so the predictions need to be accurate (Tsai & Lu, 2009).

2.3 Data Exploration and Pre-processing

Data exploration is required to gain further understanding of the data and the
business problem. CRISP-DM is a widely accepted methodology for conducting a
data mining process; its life cycle consists of six phases, as shown in the figure
below.

Figure 2.1: The phases of the CRISP-DM data mining model2


The most important stage of data analysis is data preparation. In general, data
cleaning and pre-processing take approximately 80% of the time, making data
preparation the most challenging and time-consuming part.

Real-world data can be noisy, incomplete and inconsistent. The data preparation
stage deals with incomplete data, where some attribute values, or even certain
important attributes, are missing. Outliers, errors and discrepancies in the data are
also handled at this stage. Data preparation generates a smaller dataset than the
original one. This task includes selecting relevant data, attribute selection, removing
anomalies and eliminating duplicate records. The stage also deals with filling in
missing values, reducing ambiguity and removing outliers (Zhang, Zhang & Yang,
2003).

This stage is of high importance because:
(1) real data is impure;
(2) high-performance mining requires quality data;
(3) quality data yields high-quality patterns.

2
https://www.kdnuggets.com/2017/01/four-problems-crisp-dm-fix.html

Feature Selection

Feature selection, the process of identifying the fields that are best for prediction, is
a critical process (Hadden, Tiwari, Roy & Ruta, 2005) and an important step in
customer churn prediction. As the selection of a subset of the original features, it is
an important and frequently used dimensionality reduction technique in data mining.
In research by Khan, Manoj, Singh & Blumenstock (2015), a t-test was performed
separately for each feature, indicating the extent to which a single feature can
accurately differentiate between people who have churned and those who have not.
A tree-based method was used for feature selection; this method was useful in
producing a list of correlated predictors.
Feature selection methods can be categorized along two dimensions, label
information and search strategy, as detailed in the diagram below.

Figure 2.3: Feature Selection Category


(Source: Miao & Niu, 2016)
The training data can be labelled, unlabelled or partially labelled, which leads to the
development of supervised, unsupervised and semi-supervised feature selection
algorithms. Supervised feature selection determines feature relevance by evaluating
a feature’s correlation with the class label. Unsupervised feature selection exploits
data variance and separability to evaluate feature relevance. Semi-supervised
feature selection algorithms use both labelled and unlabelled data to improve the
feature selection. Based on search strategy, the three categories of feature selection
are the filter, wrapper and embedded models. The filter model evaluates features
without involving any learning algorithm, relying on the general characteristics of the
data. The wrapper model requires a predetermined learning algorithm and uses its
performance as the evaluation criterion to select features. Algorithms with an
embedded model, e.g., C4.5 and LARS, incorporate variable selection as a part of
the training process, and feature relevance is obtained analytically from the objective
of the learning model (Miao & Niu, 2016).
According to researchers Cai, Luo, Wang & Yang (2018), supervised feature
selection for classification uses the correlation between a feature and the class label
as its fundamental principle. The correlation between features is determined and
compared to a threshold to decide whether a feature is redundant. This is an optimal
feature selection approach, maximizing classifier accuracy.

2.4 Machine Learning

Machine learning is a method of data analysis which assists in analytical model
building. It is a branch of Artificial Intelligence (AI). Machine learning models learn
from the data, identify general patterns in it and make decisions with minimal human
intervention.
Machine learning is mainly used for complex problems or tasks involving a huge
amount of data. It is a good option for complex data and delivers faster, more
accurate results. It helps an organization identify profitable opportunities or unknown
risks (Sayed, Fattah & Kholief, 2018).
Machine learning mainly uses two types of learning techniques:
1) Supervised Machine Learning
2) Unsupervised Machine Learning

Figure 2.4: Machine Learning Techniques – Unsupervised and Supervised Learning3

2.4.1 Supervised Machine Learning

Supervised machine learning is the computational task of learning correlations
between variables in a training dataset and then utilising this information to create a
predictive model capable of inferring annotations for new data (Fabris, Magalhaes &
Freitas, 2017). In supervised machine learning we have an input variable (X) and an
output variable (Y), and we use an algorithm to learn the mapping from the input to
the output:
Y = f(X)
The goal is to approximate the mapping function so well that when new input data
(X) is introduced, the model predicts the output variable (Y) for that data.
Learning is called supervised learning when instances are given with known labels.
The features can be continuous, categorical or binary (Kotsiantis, Kanellopoulos &
Pintelas, 2006).
Supervised learning problems can be grouped into classification and regression:
1) Classification – when the output variable is categorical, such as “red” or
“blue”, or “yes” or “no”.
2) Regression – when the output variable is a real value.
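As a toy illustration of the two groups (the data and model choices here are invented for the example, not taken from the thesis), scikit-learn can fit both kinds of mapping Y = f(X):

```python
# Toy data only: X is the input variable, Y the output variable.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_class = np.array([0, 0, 1, 1])         # categorical output -> classification
y_real = np.array([1.1, 1.9, 3.2, 3.9])  # real-valued output -> regression

clf = LogisticRegression().fit(X, y_class)  # learns a categorical mapping
reg = LinearRegression().fit(X, y_real)     # learns a continuous mapping

pred_class = clf.predict([[3.5]])  # predicts a class label
pred_value = reg.predict([[3.5]])  # predicts a real number
```

The same input produces a class label from the classifier and a continuous value from the regressor, which is the distinction drawn above.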

3 https://vitalflux.com/dummies-notes-supervised-vs-unsupervised-learning/

Figure 2.5: Supervised Machine Learning Model
(Source: Vladimir, 2017)

2.5 Machine Learning Techniques

Several machine learning techniques have previously been used in similar customer
churn prediction problems.

2.5.1 Logistic Regression

Logistic regression is a widely used statistical model for customer churn and has
proven to be a powerful algorithm.
The formula in Figure 2.6 below represents logistic regression, where 𝑝𝑖 is the
probability and 𝑥𝑖 are the independent variables which predict the outcome 𝑝𝑖.

Figure 2.6: Logistic Regression Formula

(Source: Nie, 2011)

In a study on credit card churn prediction in China’s banking industry, logistic
regression and decision tree models were built (Nie, 2011). Rather than using all 135
available variables, a subset was selected based on correlation, and it was observed
that the logistic regression model performed better than the decision tree algorithm.
Researchers have implemented the binary and ordinal logistic regression models for
customer churn prediction using SAS 9.2 with the Logistic regression procedure and
Cox regression models (Ali & Ariturk, 2014).
In another comparative study predicting customer churn on a telecom dataset, the
logistic regression model outperformed the decision tree. Logistic regression
transforms the dependent variable into a logit variable and fits the model using
maximum likelihood estimation. The proposed system provided a statistical survival
analysis tool to predict customer churn, and a confusion matrix was used for
evaluation (Khandge, Deomore, Bankar & Kanade, 2016).
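The logistic regression approach described above can be sketched as follows on synthetic data; the feature matrix, the train/test split and the settings are illustrative assumptions, not the cited studies’ setups:

```python
# Synthetic churn-style data; feature meanings are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))  # e.g. balance, tenure, activity (invented)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

probs = model.predict_proba(X_te)[:, 1]           # churn probabilities p_i
cm = confusion_matrix(y_te, model.predict(X_te))  # evaluation via confusion matrix
acc = model.score(X_te, y_te)
```

The predicted probabilities correspond to the 𝑝𝑖 in the formula of Figure 2.6, and the confusion matrix is the evaluation device used in the study cited above.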

2.5.2 Random Forest

Random forest is an ensemble learning model for classification or regression
problems. The decision tree is the building block of a random forest: a multitude of
decision trees makes up the forest, and the output is the mode of the predicted
classes for classification or the mean prediction for regression. A random forest will
not overfit provided there are enough trees in the classifier. It can handle missing
values and is also suitable for categorical variables.
In previous research on financial customer churn, the researchers used the random
forest classification technique (Kaya, et al., 2018).
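A minimal sketch of such a random forest classifier, using the 500-tree, 2-features-per-tree configuration reported by Kaya, et al. (2018) but applied here to synthetic data:

```python
# Synthetic data; n_estimators=500 and max_features=2 mirror the cited setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=500, max_features=2, random_state=0)
rf.fit(X, y)

# For classification, the forest's output is the mode (majority vote)
# of the individual trees' predicted classes.
pred = rf.predict(X[:1])
```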

2.5.3 Support Vector Machine

The Support Vector Machine (SVM) is a supervised machine learning model which
can be used for classification as well as regression problems. SVM is mostly used
for classification, where it separates two classes using a hyperplane. The objective
of SVM is to find a hyperplane that distinctly classifies the data. Hyperplanes are
decision boundaries that help classify the data points. Support vectors are the data
points closest to the hyperplane; they influence its position and orientation.

Figure 2.7: Support Vector Machine (Source: Ali, 2018)
Several researchers have implemented mainly two methods for customer churn
prediction: the traditional classification method using supervised learning, mainly for
quantitative data, and artificial intelligence methods for large-scale, high-dimensional,
nonlinear and time-series data (Guo-en & Wei-dong, 2008). In existing research on
the customer churn prediction problem in the telecommunication industry, the
researchers used an SVM model as it can handle nonlinearity, high dimensionality
and local minimization problems. The model’s predictions depended on the data
structure and condition.
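An illustrative SVM sketch with a radial basis kernel (the kernel reported for the first dataset in Guo-en & Wei-dong, 2008), on synthetic data rather than the telecom dataset:

```python
# Synthetic two-class data; kernel choice follows the cited study's first dataset.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, random_state=7)
svm = SVC(kernel="rbf", C=1.0).fit(X, y)  # radial basis kernel function

# The support vectors are the training points closest to the hyperplane;
# they determine its position and orientation.
n_support = svm.support_vectors_.shape[0]
train_acc = svm.score(X, y)
```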

2.5.4 Neural Network

Neural networks are a set of algorithms designed to recognize patterns. The basic
building blocks of a neural network are neurons, and the output depends on the
activation function of each neuron.
Zoric (2016) used a neural network model within the software package Alyuda
NeuroIntelligence for research on customer churn prediction in the banking industry,
because neural networks work well for pattern recognition, image processing,
optimization problems, etc.
Another group of researchers proposed a comparison between the popular
modelling techniques, multilayer perceptron (MLP) neural networks and decision
trees, and the innovative modelling technique SVM for customer churn prediction in
the telecom industry (Huang, Kechadi, Buckley, Kiernan, Keogh & Rashid, 2010).
MLP and SVM were more efficient than the decision tree.
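A hedged sketch of a multilayer perceptron classifier; the hidden-layer sizes, activation and data here are arbitrary illustrative choices, not those of the cited studies:

```python
# Synthetic data; layer sizes and activation are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=3)
mlp = MLPClassifier(hidden_layer_sizes=(16, 8),  # two hidden layers of neurons
                    activation="relu",           # each neuron's activation function
                    max_iter=2000, random_state=3)
mlp.fit(X, y)
acc = mlp.score(X, y)
```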

2.6 Historic Customer Churn Prediction

In any organisation, Customer Relationship Management (CRM) is a prominent field
of business analysis. It deals with retaining existing customers and with identifying,
expanding and attracting potential customers. CRM has two aspects: a technical
aspect and an operational aspect. The technical aspect of CRM is also known as
customer analytics (Senanayake, Muthugama, Mendis & Madushanka, 2015).
Customer analytics can be broken into two categories:

(1) Descriptive Analytics – in which customer identification is done.

(2) Predictive Analytics – in which the retention of customers is the focus.

Predictive analytics covers customer churn analysis, which mainly focuses on
retaining customers.

According to the researchers Senanayake, Muthugama, Mendis & Madushanka
(2015), the typical approach to identifying customer attrition without machine
learning was to analyse the data of customers who had already churned and then
identify likely churners among the existing customers based on observation of
customer behaviour.

2.7 Customer Churn Prediction using Machine Learning

As time passes, data accumulates, and because the volume of data is immense it
becomes a daunting task for data analysts to analyse. Customer churn prediction
using machine learning and data mining techniques therefore plays a significant role.

Customer churn prediction using machine learning models follows a set of steps.
First the data is collected; next, the selected data is pre-processed and transformed
into a form suitable for building a machine learning model. After modelling, testing is
performed and finally the model is deployed (Kim, Shin & Park, 2005). The machine
learning model investigates the data and detects the underlying data patterns for
customer churn analysis (Kim, Shin & Park, 2005). Using machine learning, the
prediction of customer churn is more accurate than with the traditional approach.

Figure 2.7: Churn Rate Prediction using Machine Learning
(Source: Beker, 2019)

Several features are involved as variables in customer churn analysis. The
categories of variables include customer variables such as recency, frequency and
monetary value (RFM), and demographic features such as geographical details,
cultural information and age (Senanayake, Muthugama, Mendis & Madushanka, 2015).

2.8 Approaches to solve the problem

Many researchers have worked on the prediction of customer churn. Most of the
research was based on applying machine learning algorithms on customer data and
predicting the customer churn rate. A few of the studies are discussed in this section.

Researchers Guo-en & Wei-dong (2008) applied the machine learning method SVM,
based on structural risk minimization, to predict customer churn on a telecom
industry customer dataset. They compared the results of the SVM model with
artificial neural network, decision tree, logistic regression and naïve Bayes
classifiers. In the experiment, the SVM outperformed the others in accuracy rate, hit
rate, covering rate and lift coefficient. Two datasets were used in the research, and
for the SVM model the kernel function was selected using MATLAB 6.5. For the first
dataset the SVM achieved good results using a radial basis kernel function, and for
the other dataset a Cauchy kernel function was used. The SVM model accuracy was
90% and 59% for dataset 1 and dataset 2 respectively. Decision tree C4.5 had the
worst performance on both datasets, with accuracies of 83% and 52% respectively.

Another study, on a European financial bank’s customer data, was conducted by
Poel & Lariviere (2004) using the Cox proportional hazards method to investigate
customer attrition, with a focus on churn incidence. SAS Enterprise Miner was used
in this research. They combined several different types of predictors into one
comprehensive proportional hazards model. By analysing this bank customer
dataset, two critical customer churn periods were identified: the early years after
becoming a customer, and a second period after some 20 years. Demographic and
environmental changes were of major concern and had a great impact on customer
retention. Four categories of retention predictors were used; it would have been
more advantageous if the data had been merged and incorporated into a single
retention model instead of four different models.

Hybrid neural networks were built and their performance compared with a baseline
ANN model by the researchers Tsai & Lu (2009), predicting customer churn on an
American telecom company’s data. They built one baseline ANN model and two
hybrid models, combining clustering and classification methods to improve on the
performance of the single clustering or classification techniques. Each hybrid
comprised two learning stages: the first pre-processed the data and the second
produced the final output prediction. The two hybrid models were ANN+ANN
(Artificial Neural Network) and SOM (Self-Organizing Map)+ANN. The models were
evaluated on their Type I and Type II error rates and on accuracy. In statistical
hypothesis testing, a Type I error is the rejection of a true null hypothesis, while a
Type II error is the non-rejection of a false null hypothesis. The results showed that
the ANN+ANN model performed better than both the ANN and SOM+ANN models in
terms of Type I error rate, and its prediction accuracy was also better. Feature
selection was not considered in this research.

22
In a research paper on customer churn in the financial industry, the researchers
(Kaya, et al., 2018) emphasized spatio-temporal features. They adopted random
forest as the classification model, trained with 500 trees and a maximum of 2
features per tree, and used stratified 8-fold cross-validation for evaluation.
Spatio-temporal and choice features were found to be superior to demographic
features in predicting financial churn decisions, and it was observed that young
people were more likely to leave the bank. The results suggested that customer
churn can be predicted from the mobility, temporal and choice entropy patterns
extracted from customer behaviour data. The evaluation was performed using the
AUC-ROC metric.

Researchers Oyeniyi & Adeyemo (2015) addressed the customer churn problem on
a Nigerian bank dataset using the WEKA tool for knowledge analysis. The K-means
clustering algorithm was used for the clustering phase, followed by a JRip
rule-generation phase.

Customer churn prediction was performed on a Personal Handy-phone System
service by researchers Bin, Peiji & Juan (2007). They built a decision tree, and three
experiments were conducted to build an effective and accurate customer churn
model, using 180 days of randomly sampled call record data. In the first experiment
the number of sub-periods in the training data was varied; in the second, the
misclassification cost was varied; and in the third, the sampling method was varied.
The number of sub-periods was set to 18, 9, 6 and 3, meaning the 180 days of call
records were divided into 18, 9, 6 or 3 parts. The misclassification cost was
controlled by setting the proportion of non-churn to churn customers in the training
dataset, and various sampling techniques were adopted to balance the dataset. This
research helped in churn prediction and in improving the performance of churn
prediction models: the model performed best when the number of sub-periods was
set to 18, when the misclassification ratio was set to 1:2, 1:3 or 1:5, and when
random sampling was used.

23
A comparative study of customer churn prediction was performed by Vafeiadis, et al.
(2016) on a telecom dataset, comparing the performance of the multilayer
perceptron, decision tree, SVM, naïve Bayes and logistic regression. All models
were built and evaluated using cross-validation with Monte Carlo simulations; SVM
outperformed the other models with an accuracy of 97% and an F-measure of 84%.

In previous research on customer churn prediction, researchers used traditional
supervised machine learning algorithms, decision trees and regression analysis, as
well as soft computing methodologies such as fuzzy logic, neural networks and
genetic algorithms (Hadden, Tiwari, Roy & Ruta, 2005).
Sharma & Panigrahi (2011) performed customer churn prediction on a telecom
dataset using a neural network, which yielded a good result with an accuracy of
92%. They focused on changing the number of neurons and increasing the number
of hidden layers in the neural network model. Feature selection and the class
imbalance problem were not considered in the research.

In a comparison paper by Xie, Li, Ngai & Ying (2009), a balanced random forest
outperformed the other classifiers (ANN, SVM and DT) on precision and recall.

2.9 Summary, Limitations and Gaps in the Literature Survey

A detailed study of state-of-the-art approaches to predicting customer churn was
performed for this research. It shows a need to focus more on the data
pre-processing stage: most of the research has not handled feature selection or the
class imbalance problem.

Most of the research (Xie, et al., 2009; Sharma & Panigrahi, 2011; Vafeiadis, et al.,
2016) was performed on telecom customer datasets, and only a little (Oyeniyi &
Adeyemo, 2015; Kaya, et al., 2018) on financial datasets. Bin, Peiji & Juan (2007)
worked on a Personal Handy-phone System service. No research has focused on
customer churn prediction for a CU financial institute.

Vital and active areas of research in customer churn prediction include the use of
feature selection for data mining purposes and, when implementing SVM, how to
select a fitting kernel function and parameters and how to weight customer samples
(Guo-en & Wei-dong, 2008). For further research, it would be a challenge to
incorporate customer behaviour, customer perceptions, customer demographics and
the macroenvironment into one comprehensive retention model (Poel & Lariviere,
2004). More emphasis should be placed on the pre-processing stage, on
dimensionality reduction and feature selection, for better performance, and datasets
from other domains can be used for further comparison (Tsai & Lu, 2009). Research
should also be directed towards improving the predictive ability of churn models by
using other data mining techniques, for example neural networks, logistic
regression, self-organizing maps and support vector machines (Bin, Peiji & Juan,
2007).

Most of the studies were done using archived data, and the existing research
provides little guidance on how to analyse a real-world application dataset. To
address the limitations and research gaps presented in this section, this research
focuses on the data pre-processing steps of feature selection, using a correlation
technique and the extra-trees classifier method, and on handling class imbalance
using the SMOTE technique. Further, secondary research was conducted comparing
churn prediction results on a banking domain dataset (Kumar & Vadlamani, 2008)
with the current research’s prediction results.

3. DESIGN AND METHODOLOGY

In this chapter, the design of the research and the methodology will be explained in
detail to answer the research question. The experiment design followed the CRISP-DM
process in the research lifecycle. Python programming was used to carry out the
experiments of the research.

This research aimed to build and compare supervised machine learning techniques
on a CU customer dataset to predict the customer churn rate. Logistic regression,
random forest, SVM and neural network models were built and their results
compared. The secondary research focused on a comparative study of the results
against an existing research paper on the banking domain (Kumar & Vadlamani, 2008).

The overall workflow of the research is as shown below.

Figure 3.1: CRISP-DM Process


(Source: Wirth & Hipp, 2000)

The thesis followed the CRISP-DM methodology, and each of the phases is
described in detail below.

3.1 Data Understanding

The Data Understanding phase deals with the collection of data and with data
exploration to gain basic insight into the type of data.

The dataset used in this research was the customer data of the CU financial
institute. The dataset was completely original, and no statistical research had been
done on it before. It consists of the data of all customers who joined the CU from
1911 to 2019: 96,967 records of distinct members with 48 features. Customer churn
was defined as the total number of customers who closed their accounts. In this
research, customers who are not deceased and whose accounts were either closed
or dormant were considered to have churned from the CU.

The data was loaded using the pandas library of Python. The number of records was
explored and, using the info() function, the datatype of each independent variable
was identified.

A basic quantitative analysis of the data was carried out using descriptive statistics:
measures of central tendency, range, standard deviation, mean, maximum and
minimum of the variables. The skew and kurtosis of the variables were also
measured to check normality. Exploratory Data Analysis (EDA) was performed, with
data visualisation using the matplotlib and seaborn Python libraries; histograms and
box plots were created to view the data distribution, check normality and identify
outliers in the variables.

A correlation matrix was built using the Spearman method to identify the correlation
between the dependent and independent variables, and the correlation among the
independent variables, in order to avoid multicollinearity.
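The exploration steps above can be sketched as follows; the frame and its column names are placeholders, not the actual CU variables:

```python
# Placeholder frame standing in for the CU dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "balance": rng.exponential(1000.0, 200),   # hypothetical column
    "tenure_years": rng.integers(1, 40, 200),  # hypothetical column
    "churned": rng.integers(0, 2, 200),        # hypothetical target
})

df.info()                          # datatype of each variable
stats = df.describe()              # mean, std, min, max, quartiles
skews = df.skew()                  # skewness, to check normality
kurts = df.kurtosis()              # kurtosis, to check normality
corr = df.corr(method="spearman")  # Spearman correlation matrix
```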

3.2 Data Preparation

In the data preparation phase, all activities were performed to convert the raw data
into the final dataset that can be fed to the modelling algorithms. Tasks such as data
cleaning, removing outliers, imputing missing values, construction of new attributes,
feature selection and transformation of the data were all performed in this phase.

3.2.1 Handling Missing Values

It is very important to handle missing values, as many machine learning algorithms
do not support data containing them.

The given dataset comprised many missing values, which may be caused by a number of
factors; one reason may simply be that the data was never collected. Missing data
creates various problems: it reduces statistical power, which can lead to a wrong
evaluation of the hypothesis; it can reduce the representativeness of the sample;
and it complicates the analysis. Variables with more than 60% of values missing can
be removed from the final dataset (Kelleher, Mac Namee, & D'Arcy, 2015).

For continuous variables, which can take any value between their minimum and
maximum, missing values between 2% and 30% were imputed with the mean. The mean is
a reasonable estimate for randomly selected observations from a normal
distribution. The missing values in categorical variables, which contain labels,
were imputed using maximum likelihood and last observation carried forward, the
most common techniques for imputing such values (Kang, 2013). In maximum likelihood
imputation, the missing values were replaced with the value occurring most often;
in last observation carried forward, the previous observation was carried into the
missing value.
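A minimal pandas sketch of the imputation rules described above, using hypothetical column names:

```python
import numpy as np
import pandas as pd

# Toy frame with missing values; the column names are illustrative only.
df = pd.DataFrame({
    "balance": [100.0, np.nan, 250.0, 400.0],   # continuous
    "has_loan": ["Yes", "No", np.nan, "No"],     # categorical
})

# Continuous variables with 2%-30% missing: impute with the column mean.
df["balance"] = df["balance"].fillna(df["balance"].mean())

# Categorical variables: impute with the most frequent label
# (the "maximum likelihood" style of imputation described in the text).
df["has_loan"] = df["has_loan"].fillna(df["has_loan"].mode()[0])
print(df)
```

`mode()[0]` takes the most frequent label; a last-observation-carried-forward fill would use `df["has_loan"].ffill()` instead.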

3.2.2 Normalizing Data

Normalization is a technique applied as part of data pre-processing for building
machine learning models. Its goal is to bring the values onto a common scale. The
skew and kurtosis were measured for each numerical column; if either value fell
outside the range of +/-2, the variable was considered skewed. Histograms were also
used to inspect whether the data was normally distributed.
These attributes were normalized using sklearn's MinMaxScaler() function.
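The skew/kurtosis check and the MinMaxScaler normalization might look like the following sketch (toy data, illustrative only):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative skewed column; the thesis applies this to each numeric field.
df = pd.DataFrame({"total_charges": [10.0, 12.0, 15.0, 20.0, 500.0]})

# Flag the variable as skewed if skew or kurtosis falls outside +/-2.
skewed = (abs(df["total_charges"].skew()) > 2
          or abs(df["total_charges"].kurtosis()) > 2)
print("skewed:", skewed)

# Rescale onto a common [0, 1] range with MinMaxScaler.
df[["total_charges"]] = MinMaxScaler().fit_transform(df[["total_charges"]])
print(df["total_charges"].min(), df["total_charges"].max())
```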

3.2.3 Feature Selection

Feature selection was applied because the dataset contains a large number of
independent variables. It is used to find the features relevant for model
construction. In this research, a correlation matrix with a heatmap was used for
feature selection. Correlation, a measure of how strongly one variable depends on
another, was determined between the independent variables and the dependent
variable, and among the independent variables themselves. If a pair of variables
exceeded the correlation threshold of 0.5, one of them was not considered, as it
would affect model accuracy (Mukaka, 2012).

Feature selection improves the accuracy of the model, speeds up training and
reduces model complexity. A second method, a tree-based classifier, was also
implemented to find the most predictive features, based on the literature review.
It is an ensemble learning method used to rank the features that best predict the
target.
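A sketch of both selection methods on synthetic data (the real CU features are confidential and not shown); the 0.5 correlation threshold and the tree-based importances follow the description above:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the CU feature matrix.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=42)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(8)])

# Drop one of each pair of features correlated above the 0.5 threshold
# (check only the upper triangle so exactly one of each pair is dropped).
corr = X.corr().abs()
to_drop = [c for i, c in enumerate(corr.columns)
           if any(corr.iloc[:i][c] > 0.5)]
X_reduced = X.drop(columns=to_drop)

# Tree-based (Extra Trees) feature importances for ranking predictors.
model = ExtraTreesClassifier(n_estimators=100, random_state=42).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())
```

The importances sum to 1, so the top-ranked features can be kept and the rest discarded.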

3.2.4 Encoding

The dataset contains continuous and categorical variables. A few machine learning
algorithms, such as SVM and Logistic Regression, accept only numeric data. For this
reason, the categorical data was converted into 0s and 1s using label encoding. In
this dataset a total of 21 variables were categorical, containing True/False or
nominal values. These values were transformed into numerical form using sklearn's
LabelEncoder function.

3.2.5 Data Sampling

In many real-world applications, class imbalance is the most common data issue:
most examples are labelled as one class, while far fewer, usually the important
ones, are labelled as the other. This problem is known as class imbalance and
exists in many application domains (Guo, 2016).

Before undertaking an experiment, a decision had to be made on the class imbalance
problem, as the minority class was of prime importance in this research. Here, a
class imbalance ratio of approximately 18:5 was found, meaning that for every 18
non-churned customers, 5 churned. The non-churned members made up 75% of total
members, whereas the churned members were 25% of the total CU members.

The courses of action available in the data pre-processing phase were random
undersampling or random oversampling, the data-level methods for handling class
imbalance. As observed in previous research by Maheshwari, Jain & Jadon (2017),
both undersampling and oversampling have advantages as well as disadvantages:
oversampling can lead to overfitting and to more computation for large datasets,
whereas undersampling can remove significant data records. In this research, the
SMOTE technique was used to handle the class imbalance problem.
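SMOTE synthesises new minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal illustrative sketch of that idea; in practice a library implementation such as imbalanced-learn's SMOTE class would be used:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: create synthetic minority samples by
    interpolating between each sample and a random one of its k
    nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # a random true neighbour
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy minority class of 5 points in 2 dimensions, as in a ~4:1 imbalance.
rng = np.random.default_rng(0)
X_min = rng.normal(loc=2.0, size=(5, 2))
X_new = smote_oversample(X_min, n_new=15)
print(X_new.shape)
```

Because the synthetic points lie between real minority samples rather than duplicating them, SMOTE reduces the overfitting risk of plain random oversampling.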
3.2.6 Neural Network

Finally, the Neural Network was evaluated. A Neural Network is a nonlinear
predictive model which learns through training, with a structure resembling a
biological neural network. Neurons act as the basic building blocks of the network,
and the output of each neuron depends on its activation function. In this research
the ReLU (Rectified Linear Unit) activation function was used, based on previous
research, as it is computationally inexpensive. Each neuron takes its inputs,
multiplies each input by a weight, sums the weighted inputs together with a bias,
and finally passes the sum through an activation function. The most common
activation function used in the output layer of a Neural Network was the sigmoid
function, which is useful for binary classification as it outputs values in the
range 0 to 1 (Zoric, 2016).
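The two activation functions can be written out directly; the weights, inputs and bias below are arbitrary illustrative numbers:

```python
import numpy as np

def relu(x):
    # ReLU: cheap to compute; passes positive inputs through, zeroes negatives.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes the weighted sum into (0, 1), suiting binary churn output.
    return 1.0 / (1.0 + np.exp(-x))

# A single neuron: weighted inputs summed with a bias, then activated.
inputs = np.array([0.5, -1.2, 0.3])
weights = np.array([0.8, 0.4, -0.6])
bias = 0.1
z = np.dot(inputs, weights) + bias
print(relu(z), sigmoid(z))
```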

In this research, the techniques selected were Logistic Regression, Random Forest,
SVM and Neural Network. All of these techniques were used in previous research on
different datasets to address the customer churn problem, so this research adds
value to that body of work.

Here a for loop was used to divide the data randomly into train and test datasets.
In each iteration, every model was fitted to the same split of the data, which
ensured that the model results could be compared. The accuracy score was appended
to a list on every iteration, so a different accuracy was found each time the
models were run. Finally, the average accuracy of each classifier was computed and
compared to identify the champion model. The for loop ensured that the results were
generalised and that any single split of the data did not unduly influence model
performance.
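The repeated-split evaluation loop can be sketched as follows, on synthetic data and with two of the four models for brevity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the thesis uses the CU member dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {name: [] for name in models}

# Re-split randomly each iteration; fit every model on the SAME split so
# results are comparable, then average across iterations to generalise.
for seed in range(10):   # the thesis iterates 40 times
    xtr, xte, ytr, yte = train_test_split(X, y, test_size=0.2,
                                          random_state=seed)
    for name, model in models.items():
        model.fit(xtr, ytr)
        scores[name].append(accuracy_score(yte, model.predict(xte)))

for name, accs in scores.items():
    print(f"{name}: mean accuracy {np.mean(accs):.3f}")
```

Averaging over many random splits smooths out the luck of any single split before the champion model is chosen.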

The dataset consists of 7043 rows and 21 columns, where each row represents a
customer and the columns represent that customer's attributes. These attributes are
used to predict whether a particular customer will churn.

A look at the columns in the dataset:

There are 21 columns, which we will divide into independent and dependent columns.

Independent variables:

[ ‘customerID’, ‘gender’, ‘SeniorCitizen’, ‘Partner’, ‘Dependents’, ‘tenure’, ‘PhoneService’,

‘MultipleLines’, ‘InternetService’, ‘OnlineSecurity’, ‘OnlineBackup’, ‘DeviceProtection’,

‘TechSupport’, ‘StreamingTV’, ‘StreamingMovies’, ‘Contract’, ‘PaperlessBilling’,

‘PaymentMethod’, ‘MonthlyCharges’, ‘TotalCharges’ ]

Dependent variable:

[ ‘Churn’ ]

Now it's time to start building the artificial neural network. First, we will
import the libraries needed for the subsequent steps.

Import Libraries required to create the Customer Churn Model

We import basic libraries for processing the data.

#import pandas
import pandas as pd
#import numpy
import numpy as np
#import matplotlib
import matplotlib.pyplot as plt
#import seaborn
import seaborn as sb

So, we import pandas for data analysis, NumPy for numerical operations on
N-dimensional arrays, and seaborn and matplotlib to visualize the data; these are
the basic libraries required for preprocessing the data.

Now we will load our dataset and take a first look at it.

Load Churn Prediction Dataset

For loading our churn dataset we use the pandas library:

# use pandas to import csv file
df = pd.read_csv('churn.csv')
# to see all columns
pd.set_option('display.max_columns', None)
# print dataframe
df

In this dataset there are 7043 rows and 21 columns. Some columns are categorical
and some are numerical.

Preprocess Dataset

Now it's time to preprocess the data. First we will inspect the dataset: the data
types of the columns and the characteristics of each column.

First, we check the dataset information using the info() method:

df.info()

This shows the datatype of each column and the number of non-null rows: there are
2 int columns, 1 float column, and the remaining columns are of string datatype.

Second, we check the description of the dataset using the describe() method, which
summarises only the numerical variables.

df.describe()

Here you can see that describe() reports statistics only for the numerical
variables; from this we can easily read off the parameters of each column.

Now we drop unwanted features from our dataset; these features are noise and would
affect the model's accuracy, so we drop them.

# we don't require customerID, so we drop it
df = df.drop('customerID', axis=1)

We drop customerID because it carries no predictive meaning, and we can
differentiate each customer using the row index. After dropping this column, the
dataset is ready to process.

When we look at the TotalCharges column we find that its datatype is object, but it
should be float, so we have to typecast this column.

# count of empty-string values in the column
count = 0
for i in df.TotalCharges:
    if i == ' ':
        count += 1
print('count of empty string:- ', count)
# we will replace these empty strings with nan values
df['TotalCharges'] = df['TotalCharges'].replace(" ", np.nan)
# typecasting of the TotalCharges column
df['TotalCharges'] = df['TotalCharges'].astype(float)

After printing, we find that 11 rows contain the empty string " ", which was
forcing the object datatype of the column; so we convert these to nan values and
typecast the column to float64.

So now TotalCharges has 11 null values, which we have to fill. Let's do it.

Checking Null Values in Customer Churn Data

Null values harm model performance because they are misplaced in the dataset. If
there are only a few, we replace them with other values; if they are present in
large quantity, we simply drop them.

Now we check for null values using the pandas isnull() method, which gives True
where a null value is present and False where there is none.

# checking null values
df.isnull().sum()

# fill null values with mean
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].mean())

To handle the null values, we fill the nulls in the TotalCharges column with the
mean of that column.

Now we will extract the numerical and categorical columns from the dataset for further

processes.

#numerical variables

num = list(df.select_dtypes(include=['int64','float64']).keys())

#categorical variables

cat = list(df.select_dtypes(include='O').keys())

print(cat)

print(num)
['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
'PaperlessBilling', 'PaymentMethod', 'Churn']
['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges']

Here we create the num list for the numerical columns and the cat list for the categorical columns.

Now we see the value counts of each category in each categorical column.

# value_counts of the categorical columns
for i in cat:
    print(df[i].value_counts())

# some columns have extra categories which we have to convert into No
df.MultipleLines = df.MultipleLines.replace('No phone service','No')
df.OnlineSecurity = df.OnlineSecurity.replace('No internet service','No')
df.OnlineBackup = df.OnlineBackup.replace('No internet service','No')
df.DeviceProtection = df.DeviceProtection.replace('No internet service','No')
df.TechSupport = df.TechSupport.replace('No internet service','No')
df.StreamingTV = df.StreamingTV.replace('No internet service','No')
df.StreamingMovies = df.StreamingMovies.replace('No internet service','No')

On observation we find that multiple columns have redundant categories, so we
convert them into a useful form: the "No phone service" and "No internet service"
categories are changed into the "No" category in every column where they are
present.

Handling categorical Variables in Customer Churn Data

So, here we have to handle the categorical columns. Handling means converting the
categorical values into numerical values, because the model is trained on numerical
values and will not accept string categories.

# we have to handle all these categorical variables
# most columns contain mainly Yes/No values
# we will convert Yes = 1 and No = 0
for i in cat:
    df[i] = df[i].replace('Yes',1)
    df[i] = df[i].replace('No',0)

Observing the value counts of the dataset, we find Yes and No values, so we convert
them into 1 and 0, which are easier to process. For all categorical variables, we
replace Yes with 1 and No with 0.

# we will convert male = 1 and female = 0
df.gender = df.gender.replace('Male',1)
df.gender = df.gender.replace('Female',0)

In the gender column, we replace Male with 1 and Female with 0.

Now we import LabelEncoder from sklearn, which encodes categorical values into
numeric ones.

from sklearn.preprocessing import LabelEncoder


label = LabelEncoder()
df['InternetService'] = label.fit_transform(df['InternetService'])
df['Contract'] = label.fit_transform(df['Contract'])
df['PaymentMethod'] = label.fit_transform(df['PaymentMethod'])

You can see that all the categorical columns are now converted into numerical values.

The handling of categorical columns is over. Now we have to scale our data, because
some columns contain much larger values which would dominate the model, so we
rescale them into a smaller range.

scale_cols = ['tenure','MonthlyCharges','TotalCharges']
# now we scale these columns
from sklearn.preprocessing import MinMaxScaler
scale = MinMaxScaler()
df[scale_cols] = scale.fit_transform(df[scale_cols])

scale_cols contains the columns which have large numerical values, and with
MinMaxScaler we scale them into values between 0 and 1.

Independent and Dependent Variables

This is an important step in the model-building part: we separate the columns used
to make predictions from the target column we have to predict.

Now we start the model-training process. First, we divide our dataset into
independent and dependent variables.

# independent and dependent variables
x = df.drop('Churn',axis=1)
y = df['Churn']

x contains the independent variables and y contains the dependent (target)
variable: all columns except Churn are in x, and Churn is in y.

Splitting data

Now we split our dataset into training and testing parts: the training set is used
to train the model, and the testing set is used to test predictions of the target
column.

from sklearn.model_selection import train_test_split

xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size=0.2,random_state=10)
print(xtrain.shape)
print(xtest.shape)

Output:- (5634, 19) (1409, 19)

We imported the train_test_split() method from sklearn and set the test size to
20%, with the remaining 80% used as training data.

Building Neural Network for Customer Churn Data

Now that all the preprocessing and splitting is done, it's time to build the neural
network; we will use the TensorFlow and Keras libraries to build the artificial
neural net.

First we import these libraries.

# now we create our artificial neural net.


# import tensorflow
import tensorflow as tf
#import keras
from tensorflow import keras

Tensorflow is used for multiple tasks but has a particular focus on the training and inference

of deep neural networks and Keras acts as an interface for the TensorFlow library.

Define Model

Now we have to define our model, which means we have to set the parameters and layers of

the deep neural network which will be used for training the data.

# define sequential model
model = keras.Sequential([
    # input layer
    keras.layers.Dense(19, input_shape=(19,), activation='relu'),
    # hidden layers
    keras.layers.Dense(15, activation='relu'),
    keras.layers.Dense(10, activation='relu'),
    # output layer; we use sigmoid for binary output
    keras.layers.Dense(1, activation='sigmoid')
])

Here we define a sequential model, in which the input, hidden and output layers are
connected in sequence. The input layer takes all 19 columns as input; the second
and third layers are hidden layers with 15 and 10 neurons respectively, using the
ReLU activation function. The last layer is the output layer; as our output is 1 or
0, we use the sigmoid activation function.

Now we compile our sequential model and fit the training data into our model.

Compile the Customer Churn Model

Compiling the model is the final step of creating the artificial neural network.
compile() defines the loss function, the optimizer, and the metrics, which we pass
as parameters.

Here we use the compile method with the following parameters:

# time for compilation of neural net.


model.compile(optimizer = 'adam',
loss = 'binary_crossentropy',
metrics = ['accuracy'])
# now we fit our model to training data
model.fit(xtrain,ytrain,epochs=100)

We fit the model to the training data and set the number of epochs; in each epoch,
the model tries to improve its accuracy.

Now we evaluate the model on the held-out test data.

# evaluate the model
model.evaluate(xtest,ytest)

Note that model.predict() outputs sigmoid probabilities between 0 and 1, so to
obtain churn labels the probabilities must be converted into 0/1 predictions.
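For completeness, a small illustrative sketch of converting the sigmoid outputs into churn labels; the probability values below are made up, standing in for the output of model.predict(xtest):

```python
import numpy as np

# Keras model.predict(xtest) returns sigmoid probabilities in (0, 1);
# here we use illustrative values in place of real model output.
probs = np.array([[0.91], [0.12], [0.55], [0.43]])

# Threshold at 0.5 to turn probabilities into churn labels (1 = churn).
preds = (probs > 0.5).astype(int).ravel()
print(preds)
```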
3.3 Strengths and Limitations

In this section, the strengths and limitations of the design and methodology are
discussed briefly.

Feature selection was used, which eliminated irrelevant features and thus helped
improve model performance; it also reduced training time and helped avoid
overfitting. Another strength was that the customers' age, gender and area were
considered, and they were prominent predictors of customer churn. The main strength
was that CU member data was used for customer churn prediction, an area in which
little research has been performed.

The main limitation of the research was that the data was very imbalanced, making
the classifiers more likely to be biased towards the majority class; the SMOTE
sampling technique was used to overcome this issue. Also, several datetime
variables present in the dataset were not taken into consideration in this
research. A time-series model could be built to utilise these datetime variables,
but since only a single snapshot of data was used here, building a time-series
prediction model was difficult; time-series modelling would require multiple sets
of data with proper dates.


3.4 Strengths of the Research

The main strength of the research was its ability to precisely identify customer
churn. The results suggest that the Random Forest model was the best predictor on
the CU customer data when compared to Logistic Regression, SVM and Neural Network.
Another strength was that the customers' age, gender and area were also considered,
and they were prominent predictors of customer churn. A further strength was that
CU member data was used for customer churn prediction, an area with little prior
research. Feature selection was used in this research, which increased the accuracy
of the models.

3.5 Limitations of the Research

The main limitation of the research was that the data was very imbalanced, making
the classifiers more likely to be biased towards the majority class. In this
research the supervised machine learning models performed better on the imbalanced
dataset than on the balanced one. Also, several datetime variables present in the
dataset were not taken into consideration, and a time-series model was not feasible
with this dataset. Another limitation was that the customer data of only one CU was
used for this research, so the results may not be representative of other CU
financial institutions, whose customer bases would differ.

4. CONCLUSION

This chapter gives an overview of the research carried out. It summarises the
results of the experiments performed in predicting customer churn, derives the
proper interpretation from them, and relates the findings to the research question
set at the beginning of the research: "Which supervised machine learning model
(Logistic Regression, Random Forest, SVM or Neural Network) can best predict the
customer churn of a CU with the best accuracy, specificity, precision and recall?"

4.1 Research Overview

The goal of this research was to examine the predictive power of supervised machine
learning algorithms on a CU customer dataset for predicting customer churn. A CU is
a financial institution owned by its members, and it is growing because of its
reasonable interest rates, as discussed in the literature review. Four supervised
machine learning algorithms were examined (Logistic Regression, Random Forest,
Support Vector Machine and Neural Network) to predict whether a member will churn
or be retained by the institution. These four models were chosen based on previous
research.

The supervised machine learning models aimed to predict whether a CU customer will
churn or not. Many previous papers dealt with the customer churn problem of
financial institutions such as banks and the telecom industry, but the papers
reviewed for this project did not cover CU customer data for churn prediction.

The main objective was to identify the supervised machine learning model with the
best accuracy in predicting customer churn. Chapter two described previous research
in this area and the various techniques and approaches applied to the problem.
Chapter three detailed the method and design approach adopted in the current
research. Chapter four outlined the implementation of the models. Chapter five
outlined the result analysis of the four supervised machine learning models and
compared their performance. Accuracy was considered the evaluation metric for
selecting the best model. It was found that the Random Forest technique
outperformed the other algorithms in both experiments, one with the imbalanced
dataset and one with the dataset balanced using the SMOTE technique. It was also
found that all the models performed better on the imbalanced dataset, contrary to
previous research on imbalanced data. Therefore, the alternative hypothesis was
accepted: a Random Forest supervised machine learning model built using the CU
customer data achieves higher accuracy (97%) than the other supervised machine
learning algorithms (Logistic Regression, Support Vector Machine and Neural
Network) in predicting customer churn.

4.2 Problem Definition

Customer Churn calculation and monitoring are very important in all sectors of an
industry because it is far cheaper to retain old customers than to acquire new ones. CU
is a financial institution owned by its members so churn prediction will be helpful for
them to try to retain their existing members.

The literature review confirmed that many supervised machine learning techniques
have been evaluated in this research area to predict customer churn. The SVM and
Random Forest techniques were seen to perform well for customer churn prediction in
previous research.

This research aimed to determine which supervised machine learning algorithm would
be best in predicting the customer churn on CU member dataset.

Currently, the CU has not adopted any technique to identify the members who are
likely to leave the institution. Building and evaluating supervised machine
learning techniques on the CU member dataset has contributed further insight into
the members and helps the CU with churn prediction.

4.3 Design, Evaluation and Results

The design of this project mostly followed the CRISP-DM methodology. As mentioned
previously, the data was provided by one of the CU financial institutions and
contained information about their customer base.
The first steps were data exploration and data cleaning. Variables with more than
60% missing values were not considered in the final dataset. Variables with 2% to
30% missing values were imputed with the mean for continuous variables and the mode
for categorical variables. Feature selection was performed using a correlation
matrix and the Extra Trees classifier algorithm. The normal distribution of the
continuous variables was checked using histograms and by measuring skew and
kurtosis. Label encoding was performed, as machine learning models do not accept
string datatypes.

Finally, the data was divided into train and test datasets, with 80% of the data as
training and the remaining 20% as test. Four supervised machine learning models
(Logistic Regression, Random Forest, SVM and Neural Network) were built on the
training data. All the models were trained on the same randomly drawn training
dataset, iterated 40 times using a for loop, and tested on the test dataset. The
models were evaluated using accuracy as the metric, and a ROC curve was plotted for
the best model. In this research, Random Forest was the best at predicting customer
churn for the CU dataset.

This experiment was performed twice, once on the imbalanced dataset and once on a
dataset balanced using the SMOTE sampling technique.

The Random Forest model provided the highest accuracy, 97% on the imbalanced
dataset and 96% on the balanced dataset. Its precision and recall were also better
than the other models'. All models achieved better accuracy on the imbalanced
dataset than on the balanced one in this research, contrary to previous research
papers.

4.4 Contributions and Impact

Most of the customer churn prediction literature is based on telecom, bank or app
datasets. In this research a CU dataset was used, which is unique; there is not
much research on CU data for customer churn prediction, so this work contributes
literature for future research.

From the business point of view, it could help the institution identify the members
most likely to leave and so increase customer retention. Retaining an old customer
is financially more beneficial than acquiring new ones, and since CU members own
the institution, it is hard for the institute to find trustworthy members, making
it all the more important for the CU to retain its existing customers.

4.5 Future Work and Recommendations

Some future work was identified throughout the project. In this research only one
branch's dataset was explored and analysed; in future, other branches of the CU
dataset can be explored. Further research is needed to handle the datetime
variables.

Four machine learning techniques were applied to the CU dataset in this project;
other techniques and algorithms can be explored as well.

Further research could build a time-series model to predict customer churn. There
is also scope for using unsupervised clustering techniques to examine the data and
determine similarities or patterns.

BIBLIOGRAPHY

Ahmed, A., Maheshware, D. (2017). Churn prediction on huge telecom data using hybrid
firefly based classification. Egyptian Informatics Journal,18(3) , 215-220.
doi.org/10.1016/j.eij.2017.02.002
Ali, O., Ariturk,U. (2014). Dynamic churn prediction framework with more effective use of
rare event data: The case of private banking. Expert Systems with Applications,
41(17).7889-7903. doi.org/10.1016/j.eswa.2014.06.018
Aliyu, A., Kasim, R., Martin, D. (2011). Impact of Violent Ethno-Religious Conflicts on
Residential Property Value Determination in Jos Metropolis of Northern Nigeria:
Theoretical Perspectives and Empirical Findings. Modern Applied Science, 5(5), 171-
183. doi:10.5539/mas.v5n5p171
Alwis, P., Kumara, B., Hapuarachchi, H. (2018). Customer Churn Analysis and Prediction in
Telecommunication for Decision Making. International Conference on Business
Innovation. 40-45. doi.org/10.1016/0305-0548(93)90063-O
Amin, A., Obeidat,F.,Shah,B., Adnan,A., Loo, J., Anwar,S. (2019).Customer churn prediction
in telecommunication industry using data certainty. Journal of Business Research,94.
290-301. doi.org/10.1016/j.jbusres.2018.03.003
Bin, L., Peiji, S., & Juan, L. (2007). Customer Churn Prediction Based on the Decision
Tree in Personal Handyphone System Service. 2007 International Conference on
Service Systems and Service Management, 687-696. doi: 10.1109/ICSSSM.2007.4280145
Borrego, M., Douglas, E., & Amelink, C. (2009). Quantitative, Qualitative, and Mixed
Research Methods in Engineering Education. Journal Of Engineering
Education, 98(1), 53-66. doi: 10.1002/j.2168-9830.2009.tb01005.x
Cai, J., Luo, J., Wang, S., & Yang, S. (2018). Feature selection in machine learning: A new
perspective. Neurocomputing, 300, 70–79. doi.org/10.1016/j.neucom.2017.11.077
Dalvi, P., Khandge, S., Deomore, A., Bankar, A., & Kanade, V. (2016). Analysis of
customer churn prediction in telecom industry using decision trees and logistic
regression. 2016 Symposium on Colossal Data Analysis and Networking (CDAN). doi:
10.1109/cdan.2016.7570883

F.Y, O., J.E.T, A., O, A., J. O, H., O, O., & J, A. (2017). Supervised Machine Learning
Algorithms: Classification and Comparison. International Journal of Computer
Trends and Technology, 48(3), 128-138. doi: 10.14445/22312803/ijctt-v48p126
Fabris, F., Magalhães, J., & Freitas, A. (2017). A review of supervised machine learning
applied to ageing research. Biogerontology, 18(2), 171-188. doi: 10.1007/s10522-017-
9683-y
Ganganwar, V. (2012). An overview of classification algorithms for imbalanced
datasets. International Journal of Emerging Technology and Advanced Engineering,
2(4). 42-47.
Gordini,N., Veglio, V.(2017). Customers churn prediction and marketing retention strategies.
An application of support vector machines based on the AUC parameter-selection
technique in B2B e-commerce industry. Industrial Marketing Management, 62,100-
107. doi.org/10.1016/j.indmarman.2016.08.003
Guo-en, X., Wei-dong, J.(2008). Model of Customer Churn Prediction on Support Vector
Machine. SETP Journal Title, 28(1), 71-77. doi.org/10.1016/S1874-8651(09)60003-X
Hadden, J., Tiwari, A., Roy, R., & Ruta, D. (2005). Computer assisted customer churn
management: State-of-the-art and future trends. Computers & Operations Research,
34(10), 2902-2917. doi: 10.1016/j.cor.2005.11.007
He, B., Shi, Y., Wan, Q., & Zhao, X. (2014). Prediction of Customer Attrition of Commercial
Banks based on SVM Model. Procedia Computer Science, 31, 423-430.
doi: 10.1016/j.procs.2014.05.286
Huang, B. Q., Kechadi, T., Buckley, B., Kiernan, G., Keogh, E., & Rashid, T. (2010). A new
feature set with new window techniques for customer churn prediction in land-line
telecommunications. Expert Systems with Applications, 37(5), 3657-3665.
doi: 10.1016/j.eswa.2009.10.025
Idris, A., Rizwan, M., & Khan, A. (2012). Churn prediction in telecom using Random Forest
and PSO based data balancing in combination with various feature selection
strategies. Computers & Electrical Engineering, 38(6), 1808-1819.
doi: 10.1016/j.compeleceng.2012.09.001
Jahromi, A., Stakhovych, S., & Ewing, M. (2014). Managing B2B customer churn, retention
and profitability. Industrial Marketing Management, 43(7), 1258-1268.
doi: 10.1016/j.indmarman.2014.06.016
Kang, H. (2013). The prevention and handling of the missing data. Korean Journal of
Anesthesiology, 64(5), 402-409. doi: 10.4097/kjae.2013.64.5.402
Kaya, E., Dong, X., Suhara, Y., Balcisoy, S., Bozkaya, B., & Pentland, A. (2018). Behavioral
Attributes and Financial Churn Prediction. EPJ Data Science, 7(1), 1-18.
doi: 10.1140/epjds/s13688-018-0165-5
Kelleher, J., Mac Namee, B., & D'Arcy, A. (2015). Fundamentals of machine learning for
predictive data analytics: Algorithms, worked examples, and case studies. Cambridge,
Massachusetts: The MIT Press.
Khan, M. R., Manoj, J., Singh, A., & Blumenstock, J. (2015). Behavioral Modeling for Churn
Prediction: Early Indicators and Accurate Predictors of Custom Defection and
Loyalty. IEEE International Congress on Big Data, 1-4.
doi: 10.1109/bigdatacongress.2015.107
Kim, M., Park, M., & Jeong, D. (2004). The effects of customer satisfaction and switching
barrier on customer loyalty in Korean mobile telecommunication
services. Telecommunications Policy, 28(2), 145-159. doi:
10.1016/j.telpol.2003.12.003
Kim, S., Shin, K., & Park, K. (2005). An Application of Support Vector Machines for
Customer Churn Analysis: Credit Card Case. Lecture Notes in Computer Science,
636-647. doi: 10.1007/11539117_91
Korkmaz, M., Güney, S., & Yiğiter, Ş. (2012). The importance of logistic regression
implementations in the Turkish livestock sector and logistic regression
implementations/fields. Harran University, 16(2), 25-36.
Kotsiantis, S., Kanellopoulos, D., & Pintelas, P. (2006). Handling imbalanced datasets: A
review. GESTS International Transactions on Computer Science and Engineering, 30,
1-12.
Kumar, D., & Ravi, V. (2008). Predicting credit card customer churn in banks using data
mining. International Journal of Data Analysis Techniques and Strategies, 1(1), 4.
doi: 10.1504/ijdats.2008.020020
Lee, H., Lee, Y., Cho, H., Im, K., & Kim, Y. (2017). Mining churning behaviors and
developing retention strategies based on a partial least squares (PLS) model.
Decision Support Systems, 52(1), 207-216. doi: 10.1016/j.dss.2011.07.005
Maheshwari, S., Jain, R. C., & Jadon, R. S. (2017). A Review on Class Imbalance Problem:
Analysis and Potential Solutions. International Journal of Computer Science
Issues, 14(6), 43-51. doi: 10.20943/01201706.4351
Malhotra, N. K. (2007). Marketing research – An applied orientation (5th ed.). New Jersey:
Pearson Education.
Manjupriya, R., & Poornima, A. (2018). Customer Churn Prediction in the Mobile
Telecommunication Industry Using Decision Tree Classification Algorithm. Journal of
Computational and Theoretical Nanoscience, 15(9), 2789-2793.
doi: 10.1166/jctn.2018.7540
Miao, J., & Niu, L. (2016). A Survey on Feature Selection. Procedia Computer Science, 91,
919-926. doi: 10.1016/j.procs.2016.07.111
Mukaka, M. (2012). Statistics Corner: A guide to appropriate use of Correlation coefficient in
medical research. Malawi Medical Journal: The Journal of Medical Association of
Malawi, 24(3), 69-71.
Nashwan, S., & Hassan, H. (2017). Impact of customer relationship management (CRM) on
customer satisfaction and loyalty: A systematic review. Journal of Advanced Research
in Business and Management Studies, 6(1), 86-107. Retrieved from
https://www.researchgate.net/publication/318206357_Impact_of_customer_relationshi
p_management_CRM_on_customer_satisfaction_and_loyalty_A_systematic_review
Nie, G., Rowe, W., Zhang, L., Tian, Y., & Shi, Y. (2011). Credit card churn forecasting by
logistic regression and decision tree. Expert Systems with Applications, 38(12),
15273-15285. doi: 10.1016/j.eswa.2011.06.028
Oyeniyi, A. O., & Adeyemo, A. B. (2015). Customer Churn Analysis In Banking Sector Using
Data Mining Techniques. African Journal of Computing and ICT, 8(3), 165-174.
Pretorius, A., Bierman, S., & Steel, S. J. (2016). A meta-analysis of research in random forests
for classification. 2016 Pattern Recognition Association of South Africa and Robotics
and Mechatronics International Conference (PRASA-RobMech), 1-6.
doi: 10.1109/robomech.2016.7813171
Saunders, M., Lewis, P., & Thornhill, A. (2009). Research Methods for Business Students
(5th ed.). England: Pearson Education.
Sayed, H., Abdel-Fattah, M. A., & Kholief, S. (2018). Predicting Potential Banking Customer
Churn using Apache Spark ML and MLlib Packages: A Comparative Study.
International Journal of Advanced Computer Science and Applications, 9(11).
doi: 10.14569/ijacsa.2018.091196
Senanayake, D., Muthugama, L., Mendis, L., & Madushanka, T. (2015). Customer Churn
Prediction: A Cognitive Approach. World Academy of Science, Engineering and
Technology International Journal of Computer and Information Engineering, 9(3),
767-773. doi: 10.5281/zenodo.1100190
Shaaban, E., Helmy, Y., Khedr, A., & Nasr, M. (2012). A Proposed Churn Prediction
Model. International Journal of Engineering Research and Applications
(IJERA), 2(4), 693-697.
Sharma, A., & Kumar Panigrahi, P. (2011). A Neural Network based Approach for Predicting
Customer Churn in Cellular Network Services. International Journal of Computer
Applications, 27(11), 26-31. doi: 10.5120/3344-4605
Singh, A., Thakur, N., & Sharma, A. (2016). A review of supervised machine learning
algorithms. 2016 3rd International Conference on Computing for Sustainable Global
Development (INDIACom), 1310-1315.
Subramanian, V., Hung, M., & Hu, M. (1992). An Experimental Evaluation of Neural
Networks for Classification. Computers & Operations Research, 20(7), 769-782.
doi: 10.1016/0305-0548(93)90063-O
Tian, Y., Shi, Y., & Liu, X. (2012). Recent Advances on Support Vector Machines
Research. Technological and Economic Development of Economy, 18(1), 5-33.
doi: 10.3846/20294913.2012.661205
Tsai, C., & Lu, Y. (2009). Customer churn prediction by hybrid neural networks. Expert
Systems with Applications, 36(10), 12547-12553. doi: 10.1016/j.eswa.2009.05.032
Umayaparvathi, V., & Iyakutti, K. (2012). Applications of Data Mining Techniques in
Telecom Churn Prediction. International Journal of Computer Applications, 42(20),
5-9. doi: 10.5120/5814-8122
Vafeiadis, T., Diamantaras, K., Chatzisavvas, K., & Sarigiannidis, G. (2015). A comparison of
machine learning techniques for customer churn prediction. Simulation Modelling
Practice and Theory, 55, 1-9. doi: 10.1016/j.simpat.2015.03.003
Van den Poel, D., & Larivière, B. (2004). Customer attrition analysis for financial services
using proportional hazard models. European Journal Of Operational Research, 157(1),
196-217. doi: 10.1016/s0377-2217(03)00069-9
Verbeke, W., Dejaeger, K., Martens, D., Hur, J., & Baesens, B. (2012). New insights into churn
prediction in the telecommunication sector: A profit driven data mining approach.
European Journal of Operational Research, 218(1), 211-229.
doi: 10.1016/j.ejor.2011.09.031
Wieringa, R., Maiden, N., Mead, N., & Rolland, C. (2005). Requirements engineering paper
classification and evaluation criteria: a proposal and a discussion. Requirements
Engineering, 11(1), 102-107. doi: 10.1007/s00766-005-0021-6
Wirth, R., & Hipp, J. (2000). CRISP-DM: Towards a standard process model for data mining.
Proceedings of the Fourth International Conference on the Practical Application of
Knowledge Discovery and Data Mining.
Xie, Y., Li, X., Ngai, E., & Ying, W. (2009). Customer churn prediction using improved
balanced random forests. Expert Systems with Applications, 36(3), 5445-5449.
doi: 10.1016/j.eswa.2008.06.121
Zhang, S., Zhang, C., & Yang, Q. (2003). Data Preparation for Data Mining. Applied Artificial
Intelligence, 17, 375-381. doi: 10.1080/08839510390219264
Zorich, A. (2018). Predicting Customer Churn In Banking Industry Using Neural Networks.
Interdisciplinary Description of Complex Systems, 14(2), 116-124.
doi: 10.7906/indecs.14.2.1