0% found this document useful (0 votes)

43 views19 pages

Project Presentation

This document presents a predictive modeling project aimed at analyzing factors influencing customers' decisions to subscribe to term deposits and developing a model to predict the likelihood of subscription. The project outlines the data collection and preprocessing steps, feature engineering approaches, model development and evaluation, and interpretation of results. The findings provide insights to help financial institutions optimize marketing strategies and improve subscription rates.

Uploaded by

Akash Rajput

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

43 views19 pages

Project Presentation

Uploaded by

Akash Rajput

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 19

[Akash kumar] [Dev Bhoomi Uattarakhand University] [Date- 15/06/23]

[Predicting Whether The Customer Will Subscribe To Term Deposit.]

Abstract: This project report aims to analyze the factors influencing a customer's decision to
subscribe to a term deposit and develop a predictive model to forecast the likelihood of
subscription. The report outlines the data collection process, data preprocessing techniques,
feature engineering approaches, model development, evaluation metrics, and interpretation of
results. The project findings provide insights for financial institutions to optimize their marketing
strategies and improve the subscription rate.

Problem Statement

Business Use Case

There has been a revenue decline for a Portuguese bank and they would like to know what
actions to take. After investigation, they found out that the root cause is that their clients are not
depositing as frequently as before. Knowing that term deposits allow banks to hold onto a
deposit for a specific amount of time, so banks can invest in higher gain financial products to
make a profit. In addition, banks also hold better chance to persuade term deposit clients into
buying other products such as funds or insurance to further increase their revenues. As a result,
the Portuguese bank would like to identify existing clients that have higher chance to subscribe
for a term deposit and focus marketing efforts on such clients.

Data Science Problem Statement

Predict if the client will subscribe to a term deposit based on the analysis of the marketing
campaigns the bank performed.

Evaluation Metric
We will be using ROC-AUC for evaluation.

Objective of this template notebook

The main objective of this template is to take you through the entire working pipeline that you
may follow while appraoching a Machine Learning problem.

We will be defining a task to be performed and write the code to solve the task.

The tasks performed below should serve as a good guide regarding the steps that you
should go about a Machine Learning Problem. But kindly do not restrict yourself to only
the tasks that have been performed in this notebook and feel free to bring your ideas,skills
and strategies and implement them as well.
Word of caution
This template is just an example of a data-science pipeline, every data science problem is unique
and there are multiple ways to tackle them. Go through this template and try to leverage the
information in this while solving your hackathon problems but you may not be able to use all the
functions created here.

Understanding the dataset

Data Set Information

The data is related to direct marketing campaigns of a Portuguese banking institution. The
marketing campaigns were based on phone calls. Often, more than one contact to the same client
was required, in order to access if the product (bank term deposit) would be subscribed ('yes') or
not ('no') subscribed.

There are two datasets: train.csv with all examples (32950) and 21 inputs including the target
feature, ordered by date (from May 2008 to November 2010), very close to the data analyzed in
[Moro et al., 2014]

test.csv which is the test data that consists of 8238 observations and 20 features without the
target feature

Goal:- The classification goal is to predict if the client will subscribe (yes/no) a term deposit
(variable y).

Features

Feature Feature Type Description

age numeric age of a person

type of job ('admin.','blue-

Categorical,
job collar','entrepreneur','housemaid','management','retired','self-
nominal
employed','services','student','technician','unemployed','unknown')

categorical, marital status ('divorced','married','single','unknown'; note:

marital
nominal ‘divorced' means divorced or widowed)
Feature Feature Type Description

educatio categorical, ('basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.c

n nominal ourse','university.degree','unknown')

categorical,
default has credit in default? ('no','yes','unknown')
nominal

categorical,
housing has housing loan? ('no','yes','unknown')
nominal

categorical,
loan has personal loan? ('no','yes','unknown')
nominal

categorical,
contact contact communication type ('cellular','telephone')
nominal

categorical,
month last contact month of year ('jan', 'feb', 'mar', ..., 'nov', 'dec')
ordinal

day_of_ categorical,
last contact day of the week ('mon','tue','wed','thu','fri')
week ordinal

last contact duration, in seconds . Important note: this attribute

duration numeric
highly affects the output target (e.g., if duration=0 then
y='no')

campaig number of contacts performed during this campaign and for this
numeric
n client (includes last contact)
Feature Feature Type Description

number of days that passed by after the client was last contacted
pdays numeric from a previous campaign (999 means client was not previously
contacted)

number of contacts performed before this campaign and for this

previous numeric
client

P categorical, outcome of the previous marketing campaign

outcome nominal ('failure','nonexistent','success')

Target variable (desired output):

Feature Feature_Type Description

y binary has the client subscribed a term deposit? ('yes','no')

Importing necessary libraries

The following code is written in Python 3.x. Libraries provide pre-written functionality to
perform necessary tasks.

In [1]:
import numpy as np
import pandas as
pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwar
nings('ignore')

Loading Data
Modelling
Libraries
We will use the popular scikit-learn library to develop our machine learning algorithms. In
sklearn, algorithms are called Estimators and implemented in their own classes. For data
visualization, we will use the matplotlib and seaborn library. Below are common classes to load.
In [2]:
from sklearn.preprocessing import LabelEncoder,MinMaxScaler,StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier ,RandomForestClassifier ,GradientBo
ostingClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import Ridge,Lasso
from sklearn.metrics import roc_auc_score ,mean_squared_error,accuracy_score,classification_r
eport,roc_curve,confusion_matrix
import warnings
warnings.filterwarnings('ignore')
from scipy.stats.mstats import winsorize
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
pd.set_option('display.max_columns',None)
import six
import
sys
sys.modul
es['sklearn
.externals.
six'] = six

Data Loading and Cleaning

Applying vanilla models on the data
Since we have performed preprocessing on our data and also done with the EDA part, it is now
time to apply vanilla machine learning models on the data and check their performance.

Fit vanilla classification models

Since we have label encoded our categorical variables, our data is now ready for applying
machine learning algorithms.

There are many Classification algorithms are present in machine learning, which are used for
different classification applications. Some of the main classification algorithms are as follows-

 Logistic Regression
 DecisionTree Classifier
 RandomForest Classfier
The code we have written below internally splits the data into training data and validation data. It
then fits the classification model on the train data and then makes a prediction on the validation
data and outputs the scores for this prediction.

PREPARING THE TRAIN AND TEST DATA

In
[4]:
# Predictors
X = dataframe.iloc[:,:-1]

# Target
y = dataframe.iloc[:,-1]

# Dividing the data into train and test subsets

x_train,x_val,y_train,y_val = train_test_split(X,y,test_size=0.2,random_state=42)

FITTING THE MODEL AND PREDICTING THE VALUES

In [5]:
# run Logistic Regression model
model = LogisticRegression()
# fitting the model
model.fit(x_train, y_train)
# predicting the values
y_scores =
model.predict(x_val)

GETTING THE METRICS

TO CHECK OUR MODEL
PERFORMANCE
I
n

[
6
]
:
# getting the auc roc curve
auc = roc_auc_score(y_val, y_scores)
print('Classification Report:')
print(classification_report(y_val,y_scores))
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_val, y_scores)
print('ROC_AUC_SCORE is',roc_auc_score(y_val, y_scores))

#fpr, tpr,
0 _ =0.90
roc_curve(y_test,
0.98 predictions[:,1])
0.93 5798
1 0.50 0.17 0.26 792
plt.plot(false_positive_rate, true_positive_rate)
plt.xlabel('FPR')
accuracy 0.88 659
plt.ylabel('TPR') 0
plt.title('ROC
macro avg curve') 0.70 0.57 0.60 6590
plt.show()
weighted avg 0.85 0.88 0.85 6590
Classification
ROC_AUC_SCORE Report:is 0.5742166403601381
precision

recall f1-
score
support
The above two steps are combined and run in a single cell for all the remaining models
respectively
In [7]:
# Run Decision Tree Classifier
model = DecisionTreeClassifier()

model.fit(x_train, y_train)
y_scores = model.predict(x_val)
auc = roc_auc_score(y_val, y_scores)
print('Classification Report:')
print(classification_report(y_val,y_scores))
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_val, y_scores)
print('ROC_AUC_SCORE is',roc_auc_score(y_val, y_scores))

#fpr, tpr, _ = roc_curve(y_test, predictions[:,1])

plt.plot(false_positive_rate, true_positive_rate)
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('ROC curve')
plt.show()
Classification Report:
precision

recall f1-
score
support
accuracy 0.87 6590
macro
0 avg 0.69 0.69 0.69 6590
0.93
0.93
0.93
5798
1
0.46
weighted avg 0.87 0.87 0.87 6590

ROC_AUC_SCORE is 0.6924298608715649

In [8]:
from sklearn import tree
from sklearn.tree import export_graphviz # display the tree within a Jupyter notebook
from IPython.display import SVG
from graphviz import Source
from IPython.display import
display
from ipywidgets import interactive, IntSlider, FloatSlider, interact
import ipywidgets
from IPython.display import Image
from subprocess import call
import matplotlib.image as mpimg
I
n

[
9
]
:
@interact
def plot_tree(crit=["gini", "entropy"],
split=["best", "random"],
depth=IntSlider(min=1,max=30,valu
e=2,
continuous_update=False),
min_split=IntSlider(min=2,max=5,value=2, continuous_update=False),
min_leaf=IntSlider(min=1,max=5,value=1, continuous_update=False)):

estimator = DecisionTreeClassifier(random_state=0,
criterion=crit, splitter =
split, max_depth =
depth,
print('Decision Tree Training Accuracy: {:.3f}'.format(accuracy_score(y_train, estimator.predi
ct(x_train))))
print('Decision Tree Test Accuracy: {:.3f}'.format(accuracy_score(y_val, estimator.predict(x_
val))))

graph = Source(tree.export_graphviz(estimator,
out_file=None,
feature_names=x_train.columns,
class_names=['0', '1'],
filled = True))

display(Image(data=graph.pipe(format='png'

))) return estimator

Decision Tree Training Accuracy: 0.896
Decision Tree Test Accuracy: 0.889

DecisionTreeClassifier(max_depth=2, random_state=0)
In [10]:
# run Random Forrest Classifier
model = RandomForestClassifier()

#fpr, tpr, _ = roc_curve(y_test, predictions[:,1])

plt.plot(false_positive_rate, true_positive_rate)
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('ROC curve')
plt.show()
Classification Report:
precision

recall f1-
score
support
accuracy 0.90 6590
macro0 avg 0.78 0.66 0.70 6590
weighted avg
0.92 0.88 0.90 0.89 6590
0.97
ROC_AUC_SCORE
0.94 is 0.662138372340166
5798
1
0.64
0.35
0.45
792

Feature Selection
Now that we have applied vanilla models on our data, we now have a basic understanding of
what our predictions look like. Let's now use feature selection methods for identifying the best
set of features for each model.

Using RFE for feature selection

In this task let's use Recursive Feature Elimination for selecting the best features. RFE is a
wrapper method that uses the model to identify the best features.
 For the below task, we have inputted 8 feature. You can change this value and input the
number of features you want to retain for your model
In
[11]:
# Selecting 8 number of features
# selecting models
models = LogisticRegression()
# using rfe and selecting 8
features
rfe = RFE(models,8)
# fitting the model
rfe = rfe.fit(X,y)
# ranking features
feature_ranking = pd.Series(rfe.ranking_, index=X.columns)
plt.show()
print('Features to be selected for Logistic Regression model are:')
print(feature_ranking[feature_ranking.values==1].index.tolist())
print('===='*30)
Features to be selected for Logistic Regression model are:
['job', 'marital', 'education', 'housing', 'contact', 'day_of_week',
'campaign', 'poutcome']
===============================================
======================
===============================================
====
I
n

[
1
2
]
:
# Selecting 8 number of features
# Random Forrest classifier model
models = RandomForestClassifier()
# using rfe and selecting 8 features
rfe = RFE(models,8)
# fitting the model
rfe = rfe.fit(X,y)
# ranking features
feature_ranking = pd.Series(rfe.ranking_, index=X.columns)
plt.show()
print('Features to be selected for Random Forrest Classifier are:')
print(feature_ranking[feature_ranking.values==1].index.tolist())
print('===='*30)
Features to be selected for Random Forrest Classifier are:
['age', 'job', 'education', 'month', 'day_of_week', 'duration',
'campaign', 'poutcome']
===============================================
======================
In [13]:
# splitting the data into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=
y)
# selecting the data
rfc = RandomForestClassifier(random_state=42)
# fitting the data
rfc.fit(X_train, y_train)
# predicting the data
y_pred =
rfc.predict(X_test)
# feature importances
rfc_importances =
pd.Series(rfc.feature_i
mportances_,
index=X.columns).sort
_values().tail(10)
# plotting bar chart
according to feature
importance
rfc_importances.plot(kind='bar')
plt.show()

Observations :
We can test the features obtained from both the feature selection techniques by inserting these
features to the model and depending on which set of features perform better, we can retain them
for the model.

The Feature Selection techniques can differ from problem to problem and the techniques
applied for this problem may or may not work for the other problems. In those cases, feel
free to try out other methods like PCA, SelectKBest(), SelectPercentile(), tSNE etc.

Grid-Search & Hyperparameter Tuning

Hyperparameters are function attributes that we have to specify for an algorithm. By now, you
should be knowing that grid search is done to find out the best set of hyperparameters for your
model.
Grid Search for Random Forest
In the below task, we write a code that performs hyperparameter tuning for a random forest
classifier. We have used the hyperparameters max_features, max_depth and criterion for this
task. Feel free to play around with this function by introducing a few more hyperparameters and
chaniging their values

In
[14]:
# splitting the data
x_train,x_val,y_train,y_val = train_test_split(X,y, test_size=0.3, random_state=42, stratify=y)
# selecting the classifier
rfc = RandomForestClassifier()
# selecting the parameter
param_grid = {
'max_features': ['auto', 'sqrt',
'log2'],
'max_depth' : [4,5,6,7,8],
'criterion' :['gini', 'entropy']
}
# using grid search with respective parameters
grid_search_model = GridSearchCV(rfc, param_grid=param_grid)
# fitting the model
grid_search_model.fit(x_train, y_train)
# printing the best parameters
print('Best Parameters are:',grid_search_model.best_params_)
Best Parameters are: {'criterion': 'gini', 'max_depth': 8,
'max_features': 'log2'}

Applying the best parameters obtained using Grid Search on

Random Forest model
In the task below, we fit a random forest model using the best parameters obtained using Grid
Search. Since the target is imbalanced, we apply Synthetic Minority Oversampling (SMOTE) for
undersampling and oversampling the majority and minority classes in the target respectively.

Kindly note that SMOTE should always be applied only on the training data and not on the
validation and test data.

You can try experimenting with and without SMOTE and check for the difference in recall.

In [15]:
from sklearn.metrics import roc_auc_score,roc_curve,classification_report
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import SMOTE
from yellowbrick.classifier import roc_auc

# A function to use smote

def
grid_search_random_forrest_best(dataframe,ta
rget):
# splitting the data
x_train,x_val,y_train,y_val = train_test_split(dataframe,target, test_size=0.3, random_state=42
)

# Applying Smote on train data for dealing with class imbalance

smote = SMOTE()

X_sm, y_sm = smote.fit_sample(x_train, y_train)

rfc = RandomForestClassifier(n_estimators=11, max_features='auto', max_depth=8, criterion=

'entropy',random_state=42)

rfc.fit(X_sm, y_sm)
y_pred = rfc.predict(x_val)
print(classification_report(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
visualizer = roc_auc(rfc,X_sm,y_sm,x_val,y_val)

grid_search_random_forrest_best(X,y)
precision recall f1-score
support

0 0.96 0.78 0.86

8723
1 accuracy 0.31 0.77 0.43
0.73 9885
macro1162
avg 0.63 0.76 0.65 9885
weighted avg 0.88 0.77 0.81 9885

[[6801 1922]
[ 309 853]]
Applying the grid search function for random forest only on the best features obtained
using Random Forest
In [16]:
grid_search_random_forrest_best(X[['age', 'job', 'education', 'month', 'day_of_week', 'duration', 'c
ampaign', 'poutcome']],y)
precision recall f1-score support

0 0.96 0.81 0.88 8723

1 0.36 0.78 0.49 1162

accuracy 0.81 9885

macro avg 0.66 0.80 0.69 9885
weighted avg 0.89 0.81 0.84 9885

[[7099 1624]
[ 258 904]]
Ensembling
Ensemble learning uses multiple machine learning models to obtain better predictive
performance than could be obtained from any of the constituent learning algorithms alone. In the
below task, we have used an ensemble of three models
- RandomForestClassifier(), GradientBoostingClassifier(), LogisticRegression(). Feel free to
modify this function as per your requirements and fit more models or change the parameters for
every model.

In [17]:
from sklearn.metrics import confusion_matrix
from sklearn.ensemble import VotingClassifier

# splitting the data

x_train,x_val,y_train,y_val =
train_test_split(X, y, test_size=0.3,
random_state=42)
# using smote
smote = SMOTE()
X_sm, y_sm = smote.fit_sample(x_train,
y_train)
# models to use for ensembling
model1 = RandomForestClassifier()
model3 = GradientBoostingClassifier()
model2 = LogisticRegression()
# fitting the model
model = VotingClassifier(estimators=[('rf', model1), ('lr', model2), ('xgb',model3)], voting='soft')
model.fit(X_sm,y_sm)
# predicting balues and getting the metrics
y_pred = model.predict(x_val)
In [18]:
print(classification_report(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
visualizer = roc_auc(model,X_sm,y_sm,x_val,y_val)
precision recall f1-score support

0 0.95 0.85 0.90 8723

1 0.38 0.69 0.49 1162

accuracy 0.83 988

5
macro avg 0.67 0.77 0.70 9885
weighted avg 0.89 0.83 0.85 9885

[[7420 1303]
[ 358 804]]

Prediction on the test data

In the below task, we have performed a prediction on the test data. We have used Logistic
Regression for this prediction. You can use the model of your choice that will give you the best
metric score on the validation data.

In this task below, we will read the test file and store the Id column from the test file in a
variable Id. This column would be of use to us while submission since we need to have an Id
column in the submission file which is the same Id of the observations in the test data.
We have to perform the same preprocessing operations on the test data that we have performed
on the train data. For demonstration purposes, we have preprocessed the test data and this
preprocessed data is present in the csv file test_preprocessed.csv

We then make a prediction on the preprocessed test data using the Grid Search Logisitic
regression model. And as the final step, we concatenate this prediction with the Id column and
then convert this into a csv file which becomes the submission.csv

In
[19]
:
# Preprocessed Test File
test = pd.read_csv('../input/banking-project-term-deposit/new_test.csv')
test.head()
a O
jo mar educa defa hous lo cont mo day_of_ durat camp upoutc
b ital tion ult ing act week ion aign t ome
g
an nth [
e 1
9
]
3
0 4 0 6 0 0 0 0 3 3 131 5 :1
2

3 1
1 3 6 0 0 0 0 4 3 100 1 1
7 0

5
2 5 0 5 1 2 0 0 3 2 131 2 1
5

4
3 2 1 0 1 0 0 1 4 3 48 2 1
4

2
4 0 2 3 0 0 0 0 5 0 144 2 1
8

#Creating Pre-processed Test

[21]: In
smote = SMOTE()

X_sm, y_sm = smote.fit_sample(x_train, y_train)

rfc = RandomForestClassifier()
# selecting the parameter
param_grid = {
'max_features': ['auto', 'sqrt',
'log2'],
'max_depth' : [4,5,6,7,8],
'criterion' :['gini', 'entropy']
}
# using grid search with
respective parameters
grid_search_model =
GridSearchCV(rfc,
param_grid=param_grid)

# fitting the model

grid_search_model.fit(X_sm, In [22]:
y_sm)
prediction = pd.DataFrame(y_pred,columns=['y'])
submission = pd.concat([prediction['y']],1)
# Predict on the preprocessed
test file
submission.to_csv('submission.csv',index=False)
y_pred =
grid_search_model.predict(test)
Conclusion-this project report outlines the steps taken to develop a predictive model for
forecasting whether a customer will subscribe to a term deposit. By leveraging machine learning
algorithms and analyzing customer attributes, the report provides valuable insights to financial
institutions for optimizing their marketing efforts and improving the subscription rate.

Project Presentation.
No ratings yet
Project Presentation.
19 pages
Project Report
No ratings yet
Project Report
19 pages
Banking Dataset - Marketing Targets
No ratings yet
Banking Dataset - Marketing Targets
19 pages
Quadexp IDS Project
No ratings yet
Quadexp IDS Project
22 pages
Untitled Document
No ratings yet
Untitled Document
5 pages
Project Report Asg
No ratings yet
Project Report Asg
30 pages
BDMDM Telemarketing
No ratings yet
BDMDM Telemarketing
16 pages
ANIL DS PROJECT
No ratings yet
ANIL DS PROJECT
33 pages
Digital Transformation in Banking
No ratings yet
Digital Transformation in Banking
4 pages
Data Analytics on Banking
No ratings yet
Data Analytics on Banking
3 pages
Predictive Analysis For Retail Banking
No ratings yet
Predictive Analysis For Retail Banking
28 pages
PWC PPT DOC
No ratings yet
PWC PPT DOC
19 pages
Random forest and logistic regression algorithms A comparison of classification methods for bank ma
No ratings yet
Random forest and logistic regression algorithms A comparison of classification methods for bank ma
4 pages
Classification - Bank - Marketing - Dataset - Jupyter Notebook
No ratings yet
Classification - Bank - Marketing - Dataset - Jupyter Notebook
23 pages
ML-Adv.pptx
No ratings yet
ML-Adv.pptx
51 pages
Marketing Project: Reza Marzban
No ratings yet
Marketing Project: Reza Marzban
18 pages
Bank Names
No ratings yet
Bank Names
2 pages
Predicting The Term Deposit Subscription
No ratings yet
Predicting The Term Deposit Subscription
38 pages
Bank Additional Names
No ratings yet
Bank Additional Names
1 page
Dataset Information
No ratings yet
Dataset Information
1 page
Machine Learning Paper BD
No ratings yet
Machine Learning Paper BD
16 pages
Finance and Risk Analytics Project Sai Vinayak Sanam PDF
No ratings yet
Finance and Risk Analytics Project Sai Vinayak Sanam PDF
99 pages
Part A Doc 1
No ratings yet
Part A Doc 1
21 pages
EEE - 559: Mathematical Pattern Recognition Individual Project Abinaya Manimaran
No ratings yet
EEE - 559: Mathematical Pattern Recognition Individual Project Abinaya Manimaran
41 pages
NTCC Seminar Sem6 Prachi Kumari A35400719009
No ratings yet
NTCC Seminar Sem6 Prachi Kumari A35400719009
30 pages
CASE STUDY STOCK MARKET PREDICITON
No ratings yet
CASE STUDY STOCK MARKET PREDICITON
10 pages
Default_of_Credit_Card_Clients
No ratings yet
Default_of_Credit_Card_Clients
33 pages
Final Report (1)
No ratings yet
Final Report (1)
17 pages
Project Report
No ratings yet
Project Report
12 pages
1_Understanding_the_problem_and_the_data.ipynb - Colaboratory (1)
No ratings yet
1_Understanding_the_problem_and_the_data.ipynb - Colaboratory (1)
9 pages
12113667 an Kit
No ratings yet
12113667 an Kit
12 pages
Bank Additional Names
No ratings yet
Bank Additional Names
2 pages
Articles Xgboost Classification With Smote-Enn Algorithm
No ratings yet
Articles Xgboost Classification With Smote-Enn Algorithm
11 pages
Project Report
No ratings yet
Project Report
11 pages
DWDM Cep
No ratings yet
DWDM Cep
13 pages
Default Payment Analysis of Credit Card Clients: July 2018
No ratings yet
Default Payment Analysis of Credit Card Clients: July 2018
7 pages
Amazon_Sales_Analysis_Presentation
No ratings yet
Amazon_Sales_Analysis_Presentation
24 pages
ML Project.docx
No ratings yet
ML Project.docx
5 pages
Credit Risk Analysis
No ratings yet
Credit Risk Analysis
6 pages
Bank Marketing Targets 1724510938
No ratings yet
Bank Marketing Targets 1724510938
13 pages
Daa-01
No ratings yet
Daa-01
11 pages
Mid Semester Project Review UditSoni
No ratings yet
Mid Semester Project Review UditSoni
25 pages
Revenue Predictor - Udit Ennam PDF
No ratings yet
Revenue Predictor - Udit Ennam PDF
30 pages
Ensemble Techniques Project
100% (2)
Ensemble Techniques Project
28 pages
24msp3077 1st Rev
No ratings yet
24msp3077 1st Rev
20 pages
Ml Project Part b
No ratings yet
Ml Project Part b
8 pages
Machine Learning Using Python Question Paper 1
No ratings yet
Machine Learning Using Python Question Paper 1
4 pages
Varshini Phase 3
No ratings yet
Varshini Phase 3
12 pages
Bank Marketing Case-Study:: Relevant Information About The Data
100% (1)
Bank Marketing Case-Study:: Relevant Information About The Data
1 page
DS-Food
No ratings yet
DS-Food
23 pages
Finclub Summer Project 2 (2025) (1)
No ratings yet
Finclub Summer Project 2 (2025) (1)
7 pages
Ex 5.1 Customer Behaviour Prediction
No ratings yet
Ex 5.1 Customer Behaviour Prediction
8 pages
INSY446 - 4 - Classification Part 1
No ratings yet
INSY446 - 4 - Classification Part 1
26 pages
Logistic Regression
No ratings yet
Logistic Regression
8 pages
Master_thesis_full_report_v3_livroChurneELifetimevalue_ltv
No ratings yet
Master_thesis_full_report_v3_livroChurneELifetimevalue_ltv
51 pages
Project 2 Classification Models
No ratings yet
Project 2 Classification Models
5 pages
Paper - Predicting customer subscription in bank telemarketing campaigns using ensemble learning models 2025
No ratings yet
Paper - Predicting customer subscription in bank telemarketing campaigns using ensemble learning models 2025
5 pages
Loan-Prediction Using Machine Learning
No ratings yet
Loan-Prediction Using Machine Learning
31 pages
Building a Retirement Planner Web App with Vanilla HTML, CSS, and JavaScript.: A Practical Q&A Guide Using a Retirement Planner
From Everand
Building a Retirement Planner Web App with Vanilla HTML, CSS, and JavaScript.: A Practical Q&A Guide Using a Retirement Planner
Lumavalle Press
No ratings yet
Building a Mortgage Calculator Web App with Vanilla HTML, CSS, and JavaScript.: A Practical Q&A Guide Using a Mortgage Calculator
From Everand
Building a Mortgage Calculator Web App with Vanilla HTML, CSS, and JavaScript.: A Practical Q&A Guide Using a Mortgage Calculator
Lumavalle Press
No ratings yet
Applied General Statistics (HIS 223)
No ratings yet
Applied General Statistics (HIS 223)
35 pages
Education and Crime - Evidence From Italian Regions
No ratings yet
Education and Crime - Evidence From Italian Regions
6 pages
Andrei Virtusio - MT Act3
No ratings yet
Andrei Virtusio - MT Act3
2 pages
Multiple Regression: Department of Government, University of Essex GV207 - Political Analysis, Week 10
No ratings yet
Multiple Regression: Department of Government, University of Essex GV207 - Political Analysis, Week 10
6 pages
Correlation and Regression
0% (1)
Correlation and Regression
45 pages
Auronova Consulting
No ratings yet
Auronova Consulting
8 pages
Correlation and Regression
No ratings yet
Correlation and Regression
6 pages
Geographical Analysis - 2022 - Comber - A Route Map For Successful Applications of Geographically Weighted Regression
No ratings yet
Geographical Analysis - 2022 - Comber - A Route Map For Successful Applications of Geographically Weighted Regression
24 pages
ANOVA MCQ (Free PDF) - Objective Question Answer For ANOVA Quiz - Download Now!
No ratings yet
ANOVA MCQ (Free PDF) - Objective Question Answer For ANOVA Quiz - Download Now!
10 pages
FYBCOM CORRELATION
No ratings yet
FYBCOM CORRELATION
2 pages
Chapter 7-Tahoe-Salt
No ratings yet
Chapter 7-Tahoe-Salt
13 pages
Brown Durbin CUSUM
No ratings yet
Brown Durbin CUSUM
15 pages
CHAPTER 22 &23: Bivariate Statistical Analysis
No ratings yet
CHAPTER 22 &23: Bivariate Statistical Analysis
10 pages
1 s2.0 S1470160X16300279 Main
No ratings yet
1 s2.0 S1470160X16300279 Main
6 pages
Assiment 3 FM Final
No ratings yet
Assiment 3 FM Final
31 pages
Econometrics by Shalabh Sir
No ratings yet
Econometrics by Shalabh Sir
488 pages
Iftar Adiandra Pratama_Tugas Uji Validitas Dan Reabilitas
No ratings yet
Iftar Adiandra Pratama_Tugas Uji Validitas Dan Reabilitas
2 pages
Planar/Bilinear Least Squares Regression
No ratings yet
Planar/Bilinear Least Squares Regression
2 pages
Indian Institute of Technology, Kanpur: Applied Machine Learning
100% (1)
Indian Institute of Technology, Kanpur: Applied Machine Learning
4 pages
Chapter 13
No ratings yet
Chapter 13
129 pages
Chapter 16 - Logistic Regression Model
No ratings yet
Chapter 16 - Logistic Regression Model
7 pages
E32 - Analytical Performance - v.01
No ratings yet
E32 - Analytical Performance - v.01
2 pages
Chapter 3 two variable regression model
No ratings yet
Chapter 3 two variable regression model
7 pages
ml unit 3
No ratings yet
ml unit 3
3 pages
Data Mining: Ensemble Techniques Introduction To Data Mining, 2 Edition by Tan, Steinbach, Karpatne, Kumar
No ratings yet
Data Mining: Ensemble Techniques Introduction To Data Mining, 2 Edition by Tan, Steinbach, Karpatne, Kumar
11 pages
Worksheet - Correlation Coefficients
No ratings yet
Worksheet - Correlation Coefficients
4 pages
KMO and Bartlett's Test
No ratings yet
KMO and Bartlett's Test
3 pages
RJwrapper
No ratings yet
RJwrapper
24 pages
ML lab manual
No ratings yet
ML lab manual
25 pages
Univariate and Bivariate Analysis
No ratings yet
Univariate and Bivariate Analysis
21 pages

Project Presentation

Uploaded by

Project Presentation

Uploaded by

[Akash kumar] [Dev Bhoomi Uattarakhand University] [Date- 15/06/23]

[Predicting Whether The Customer Will Subscribe To Term Deposit.]

Business Use Case

Data Science Problem Statement

Objective of this template notebook

Understanding the dataset

Feature Feature Type Description

age numeric age of a person

type of job ('admin.','blue-

categorical, marital status ('divorced','married','single','unknown'; note:

educatio categorical, ('basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.c

last contact duration, in seconds . Important note: this attribute

number of contacts performed before this campaign and for this

P categorical, outcome of the previous marketing campaign

Target variable (desired output):

Feature Feature_Type Description

y binary has the client subscribed a term deposit? ('yes','no')

Importing necessary libraries

Data Loading and Cleaning

Fit vanilla classification models

PREPARING THE TRAIN AND TEST DATA

# Dividing the data into train and test subsets

FITTING THE MODEL AND PREDICTING THE VALUES

GETTING THE METRICS

#fpr, tpr, _ = roc_curve(y_test, predictions[:,1])

))) return estimator

#fpr, tpr, _ = roc_curve(y_test, predictions[:,1])

Using RFE for feature selection

Grid-Search & Hyperparameter Tuning

Applying the best parameters obtained using Grid Search on

# A function to use smote

# Applying Smote on train data for dealing with class imbalance

X_sm, y_sm = smote.fit_sample(x_train, y_train)

rfc = RandomForestClassifier(n_estimators=11, max_features='auto', max_depth=8, criterion=

0 0.96 0.78 0.86

0 0.96 0.81 0.88 8723

accuracy 0.81 9885

# splitting the data

0 0.95 0.85 0.90 8723

accuracy 0.83 988

Prediction on the test data

#Creating Pre-processed Test

X_sm, y_sm = smote.fit_sample(x_train, y_train)

# fitting the model

You might also like