Bank Marketing Data

The document discusses marketing data from a Portuguese bank. Various classification models were applied, including KNN, logistic regression, decision trees, and random forest. The random forest model achieved the best performance with 91.6% accuracy. However, the conclusion was that the telemarketing strategy was not effective and a different approach should be used.

Uploaded by

sanju

BANK MARKETING DATA

BY Kotha Bala Venkata Naga PavanKumar
Problem Statement

The data relates to the direct marketing campaigns of a Portuguese
banking institution. The marketing campaigns were based on phone calls.
Often, more than one contact with the same client was required in order to
assess whether the product (a bank term deposit) would be subscribed ('yes')
or not ('no'). The task is to predict whether a customer will subscribe to
the product or not.
Method of approach

• Study the given data
• Apply data cleaning methods
• Apply various classification models
• Test the performance of the designed model
• Draw conclusions from the developed model
• Predict whether the customer will subscribe to the plan (product) or not
Data cleaning

• Imputer – numerical / continuous data
    • Mode operation
    • Mean operation
• Replace function – categorical data
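The cleaning steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the code used for the slides: the sample values, the `impute` and `replace_value` helpers, and the use of "unknown" as the missing-category label are all assumptions made for the example.

```python
from statistics import mean, mode

# Hypothetical toy columns: None marks a missing numeric value and
# "unknown" is assumed to mark a missing categorical value.
ages = [30, None, 45, 30, None, 51]
jobs = ["admin", "unknown", "technician", "admin", "services", "unknown"]


def impute(values, strategy):
    """Fill None entries with the mean or the mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]


def replace_value(values, old, new):
    """Replace every occurrence of one category label with another."""
    return [new if v == old else v for v in values]


ages_mean = impute(ages, "mean")   # mean operation
ages_mode = impute(ages, "mode")   # mode operation

# Replace the placeholder category with the most frequent known category.
known_jobs = [j for j in jobs if j != "unknown"]
jobs_clean = replace_value(jobs, "unknown", mode(known_jobs))
```

Mean imputation suits roughly symmetric numeric columns, while the mode is the usual choice when a column takes only a few distinct values.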


Classification Models

• kNN classification model
• Logistic Regression
• Support Vector Classification
• Decision Tree Classification
• Random Forest Classification
KNN classification model

• The algorithm is simple and easy to implement.
• There’s no need to build a model, tune several parameters, or make
additional assumptions.
• The algorithm is versatile. It can be used for classification, regression, and
search.
• The algorithm gets significantly slower as the number of examples and/or
predictors/independent variables increases.
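To make the points above concrete, here is a minimal kNN classifier sketch. The feature choice (age, call duration in minutes), the sample points, and the `knn_predict` helper are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Each prediction scans the whole training set, which is why kNN slows
    # down as the number of examples or features grows.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical features: (age, call duration in minutes).
points = [(25, 1.0), (30, 1.5), (28, 2.0), (55, 9.0), (60, 8.5), (58, 10.0)]
labels = ["no", "no", "no", "yes", "yes", "yes"]

prediction_young = knn_predict(points, labels, (27, 1.2))
prediction_older = knn_predict(points, labels, (57, 9.5))
```

There is no training step here: the "model" is just the stored data, and the only parameter is `k`.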
Logistic Regression

• Logistic regression works better when you remove attributes that are
unrelated to the output variable, as well as attributes that are very similar
(correlated) to each other.
• A drawback of the logistic regression model is that interpretation is more
difficult, because the effect of the weights is multiplicative (on the odds)
and not additive.
• The problem of complete separation can be solved by introducing
penalization of the weights or defining a prior probability distribution of
weights.
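The multiplicative interpretation can be checked numerically. In the sketch below the bias and weight are made-up values, not fitted to the bank data: a one-unit increase in a feature multiplies the odds by exp(weight), while the probability itself does not change by a fixed amount.

```python
from math import exp, isclose


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + exp(-z))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical fitted parameters for a single feature x.
bias, weight = -2.0, 0.7

p_before = sigmoid(bias + weight * 3.0)  # x = 3
p_after = sigmoid(bias + weight * 4.0)   # x = 4, one unit more

# The ratio of the two odds equals exp(weight), whatever the starting x.
odds_ratio = odds(p_after) / odds(p_before)
```

Here `odds_ratio` matches `exp(0.7)`, roughly 2.01, so each extra unit of `x` about doubles the odds of a "yes".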
Decision tree classification

• Decision trees are a type of supervised machine learning (that is, the
training data specifies both the input and the corresponding output) in
which the data is repeatedly split according to a certain parameter. A tree
can be described by two kinds of entities, namely decision nodes and
leaves.
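As a small sketch of these two entities, the tree below is written by hand rather than learned from data. The feature names (`duration`, `previous_contacts`) and thresholds are hypothetical.

```python
# A decision node is a dict holding a feature, a threshold and two branches;
# a leaf is simply a class label. All names and thresholds are hypothetical.
tree = {
    "feature": "duration",
    "threshold": 300,
    "left": "no",                     # leaf reached when duration <= 300
    "right": {                        # decision node reached when duration > 300
        "feature": "previous_contacts",
        "threshold": 1,
        "left": "no",
        "right": "yes",
    },
}


def classify(node, sample):
    """Walk from the root to a leaf, following one split per decision node."""
    while isinstance(node, dict):
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


short_call = classify(tree, {"duration": 120, "previous_contacts": 3})
long_call = classify(tree, {"duration": 450, "previous_contacts": 2})
```

A learning algorithm would choose the features and thresholds automatically; the traversal at prediction time stays the same.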
Diagram
Random forest

• Random forests, or random decision forests, are an ensemble learning
method for classification, regression and other tasks. They operate by
constructing a multitude of decision trees at training time and outputting
the class that is the mode of the trees' classes (classification) or the mean
of their predictions (regression).
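The aggregation step described above can be sketched as follows. The per-tree outputs are invented for the example, and the helper names `forest_classify` and `forest_regress` are hypothetical; training the individual trees is out of scope here.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Classification: return the mode (most common class) of the tree outputs."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Regression: return the mean of the tree outputs."""
    return mean(tree_outputs)


# Hypothetical outputs of five trees for one customer.
votes = ["no", "yes", "no", "no", "yes"]
estimates = [0.2, 0.6, 0.3, 0.1, 0.8]

majority_class = forest_classify(votes)
average_estimate = forest_regress(estimates)
```

Because each tree sees a different random sample of the data, their individual errors partly cancel out in the vote or the average.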
Advantages

• A single decision tree tends to overfit the data. The process of
averaging or combining the results of different decision trees helps to overcome the
problem of overfitting.
• Random forests also have less variance than a single decision tree. This means that they
work correctly for a larger range of data items than single decision trees.
• Random forests are extremely flexible and can achieve very high accuracy.
• They also require little preparation of the input data. For example, you do not have to
scale the data.
• They can also maintain accuracy even when a large proportion of the data is missing.
Disadvantages

• The main disadvantage of random forests is their complexity. They are
much harder and more time-consuming to construct than decision trees.
• They also require more computational resources and are less intuitive.
When you have a large collection of decision trees, it is hard to have an
intuitive grasp of the relationships existing in the input data.
• In addition, the prediction process using random forests is more
time-consuming than with other algorithms.
Conclusion from model

• Model – Random Forest Classification
• Accuracy - 91.6%
• Recall score - 51.71%
• ROC AUC score - 74.13%
• Confusion matrix:
    [[ 7082   253 ]
     [  436   467 ]]
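The reported scores can be re-derived from the confusion matrix. The sketch below assumes the common layout in which rows are actual classes and columns are predicted classes, both ordered (no, yes). It also assumes the ROC score was computed from hard class predictions, in which case it equals the mean of recall and specificity. Neither assumption is stated in the slides, but both reproduce the reported figures.

```python
# Confusion matrix as reported; rows are ASSUMED to be actual classes and
# columns predicted classes, both ordered (no, yes).
tn, fp = 7082, 253
fn, tp = 436, 467

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)          # share of actual subscribers that were found
specificity = tn / (tn + fp)     # share of non-subscribers correctly rejected
precision = tp / (tp + fp)       # share of predicted subscribers that were real

# With hard 0/1 predictions the ROC curve has a single interior point, so its
# area reduces to the mean of recall and specificity.
roc_auc_hard = (recall + specificity) / 2

for name, value in [
    ("accuracy", accuracy),
    ("recall", recall),
    ("specificity", specificity),
    ("precision", precision),
    ("roc_auc_hard", roc_auc_hard),
]:
    print(f"{name:13s}{value:.4f}")
```

Under these assumptions the accuracy comes out at about 0.916, the recall at about 0.517 and the ROC score at about 0.741. The high accuracy is driven largely by the 7,335 non-subscribers, while nearly half of the 903 actual subscribers are missed.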
Conclusion

• The telemarketing strategy is not giving the expected results
• The investment is going in vain
• The bank should go ahead with a different marketing strategy
