All Life Bank - AIML_ML_Project_low_code_notebook
Context
AllLife Bank is a US bank that has a growing customer base. The majority of these customers
are liability customers (depositors) with varying sizes of deposits. The number of customers
who are also borrowers (asset customers) is quite small, and the bank is interested in
expanding this base rapidly to bring in more loan business and in the process, earn more
through the interest on loans. In particular, the management wants to explore ways of
converting its liability customers to personal loan customers (while retaining them as
depositors).
A campaign that the bank ran last year for liability customers showed a healthy conversion
rate of over 9%. This has encouraged the retail marketing department to devise campaigns
with better target marketing to increase the success ratio.
As a Data Scientist at AllLife Bank, you have to build a model that will help the marketing
department identify the potential customers who have a higher probability of purchasing
the loan.
Objective
To predict whether a liability customer will buy a personal loan, to understand which customer
attributes are most significant in driving purchases, and to identify which segment of
customers to target more.
Data Dictionary
ID: Customer ID
Age: Customer's age in completed years
Experience: Number of years of professional experience
Income: Annual income of the customer (in thousand dollars)
ZIPCode: Home address ZIP code
Family: Family size of the customer
CCAvg: Average spending on credit cards per month (in thousand dollars)
Education: Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
Mortgage: Value of house mortgage, if any (in thousand dollars)
Personal_Loan: Did this customer accept the personal loan offered in the last campaign? (0: No, 1: Yes)
Securities_Account: Does the customer have a securities account with the bank? (0: No, 1: Yes)
CD_Account: Does the customer have a certificate of deposit (CD) account with the bank? (0: No, 1: Yes)
Online: Does the customer use internet banking facilities? (0: No, 1: Yes)
CreditCard: Does the customer use a credit card issued by any other bank (excluding AllLife Bank)? (0: No, 1: Yes)
Blanks '___' are provided in the notebook that need to be filled with appropriate code
to get the correct result. With every '___' blank, there is a comment that briefly describes
what needs to be filled in.
Identify the task to be performed correctly, and only then proceed to write the required
code.
Fill in the code wherever asked by comment lines like "# write your code here" or "#
complete the code". Running incomplete code may throw an error.
Please run the code cells sequentially from the beginning to avoid any unnecessary errors.
Add the results/observations (wherever mentioned) derived from the analysis to the
presentation and submit the same.
Note:
1. After running the above cell, kindly restart the notebook kernel (for Jupyter Notebook) or
runtime (for Google Colab) and run all cells sequentially from the next cell.
2. On executing the above line of code, you might see a warning regarding package
dependencies. This warning can be ignored, as the above code ensures that all
necessary libraries and their dependencies are installed to successfully execute the
code in this notebook.
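A minimal sketch of the imports and data load that the later cells assume (the CSV file name is illustrative; use the actual dataset file):

# Core libraries used throughout the notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# scikit-learn pieces used in the modelling sections
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, recall_score, precision_score, f1_score, confusion_matrix
)

# Load the dataset (file name assumed; adjust to the actual file)
data = pd.read_csv("Loan_Modelling.csv")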
Data Overview
   ID  Age  Experience  Income  ZIPCode  Family  CCAvg  Education  Mortgage
0   1   25           1      49    91107       4    1.6          1         0
1   2   45          19      34    90089       3    1.5          1         0
2   3   39          15      11    94720       1    1.0          1         0
4   5   35           8      45    91330       4    1.0          2         0
(5000, 14)
Check the data types of the columns for the dataset
ID int64
Age int64
Experience int64
Income int64
ZIPCode int64
Family int64
CCAvg float64
Education int64
Mortgage int64
Personal_Loan int64
Securities_Account int64
CD_Account int64
Online int64
CreditCard int64
dtype: object
Dropping columns
data = data.drop(['ID'], axis=1)  ## Complete the code to drop a column from the dataframe
Dropping ID column, since it is a running serial number and won't have any bearing on
the model
data.head()
   Age  Experience  Income  ZIPCode  Family  CCAvg  Education  Mortgage
0   25           1      49    91107       4    1.6          1         0
1   45          19      34    90089       3    1.5          1         0
2   39          15      11    94720       1    1.0          1         0
4   35           8      45    91330       4    1.0          2         0
data.isnull().sum()
Age 0
Experience 0
Income 0
ZIPCode 0
Family 0
CCAvg 0
Education 0
Mortgage 0
Personal_Loan 0
Securities_Account 0
CD_Account 0
Online 0
CreditCard 0
dtype: int64
Data Preprocessing
data["Experience"].unique()
array([ 1, 19, 15, 9, 8, 13, 27, 24, 10, 39, 5, 23, 32, 41, 30, 14, 18,
21, 28, 31, 11, 16, 20, 35, 6, 25, 7, 12, 26, 37, 17, 2, 36, 29,
3, 22, -1, 34, 0, 38, 40, 33, 4, -2, 42, -3, 43], dtype=int64)
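The Experience column contains a few negative values (-1, -2, -3), which are not valid year counts. A minimal cleanup sketch, assuming the negatives are simply clipped at zero (replacing them with their absolute values is another common option):

# Replace invalid negative experience values with 0 (assumed cleanup step)
data["Experience"] = data["Experience"].clip(lower=0)
data["Experience"].unique()  # verify that no negative values remain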
data["Education"].unique()
# data["Mortgage"].unique()
data["Personal_Loan"].unique()
data["CD_Account"].unique()
data["Online"].unique()
Feature Engineering
The ZIPCode column has 467 unique values, too many levels to use directly, so we keep only the first two digits below and treat them as a category.
data["ZIPCode"] = data["ZIPCode"].astype(str)
print(
"Number of unique values if we take first two digits of ZIPCode: ",
data["ZIPCode"].str[0:2].nunique(),
)
data["ZIPCode"] = data["ZIPCode"].str[0:2]
data["ZIPCode"] = data["ZIPCode"].astype("category")
Univariate Analysis
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
"""
Boxplot and histogram combined
data: dataframe
feature: dataframe column
figsize: size of figure (default (12,7))
kde: whether to show the density curve (default False)
bins: number of bins for histogram (default None)
"""
f2, (ax_box2, ax_hist2) = plt.subplots(
nrows=2, # Number of rows of the subplot grid= 2
sharex=True, # x-axis will be shared among all subplots
gridspec_kw={"height_ratios": (0.25, 0.75)},
figsize=figsize,
) # creating the 2 subplots
sns.boxplot(
data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
) # boxplot will be created and a star will indicate the mean value of the col
    if bins:
        sns.histplot(
            data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
        )  # histogram with the requested number of bins
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)  # histogram with default bins
ax_hist2.axvline(
data[feature].mean(), color="green", linestyle="--"
) # Add mean to the histogram
ax_hist2.axvline(
data[feature].median(), color="black", linestyle="-"
) # Add median to the histogram
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with the count (or percentage) displayed at the top of each bar

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # total number of observations
    count = data[feature].nunique()
    plt.figure(figsize=(count + 1 if n is None else n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data, x=feature, order=data[feature].value_counts().index[:n]
    )  # one bar per category level (top n levels if n is given)
    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # x-position: center of the bar
        y = p.get_height()  # y-position: top of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate each bar
    plt.show()
Observations on Age
data.dtypes
Age int64
Experience int64
Income int64
ZIPCode category
Family int64
CCAvg float64
Education category
Mortgage int64
Personal_Loan category
Securities_Account category
CD_Account category
Online category
CreditCard category
dtype: object
histogram_boxplot(data, "Age")
Observations on Experience
Observations on Income
histogram_boxplot(data, "Income") ## Complete the code to create histogram_boxplot
Observations on CCAvg
histogram_boxplot(data, "CCAvg") ## Complete the code to create histogram_boxplot
[Figure: histogram and boxplot of CCAvg (Count on the y-axis, CCAvg on the x-axis)]
Observations on Mortgage
histogram_boxplot(data, "Mortgage") ## Complete the code to create histogram_boxpl
Observations on Family
labeled_barplot(data, "Family", perc=True)
Observations on Education
labeled_barplot(data, "Education") ## Complete the code to create labeled_barplot
Observations on Securities_Account
labeled_barplot(data, "Securities_Account") ## Complete the code to create labele
Observations on CD_Account
labeled_barplot(data, "CD_Account") ## Complete the code to create labeled_barplo
Observations on Online
labeled_barplot(data, "Online") ## Complete the code to create labeled_barplot fo
Observation on CreditCard
labeled_barplot(data, "CreditCard") ## Complete the code to create labeled_barplo
Observation on ZIPCode
labeled_barplot(data, "ZIPCode") ## Complete the code to create labeled_barplot f
Bivariate Analysis
def stacked_barplot(data, predictor, target):
"""
Print the category counts and plot a stacked bar chart
data: dataframe
predictor: independent variable
target: target variable
"""
count = data[predictor].nunique()
sorter = data[target].value_counts().index[-1]
tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
by=sorter, ascending=False
)
print(tab1)
print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
plt.legend(
loc="lower left", frameon=False,
)
plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
plt.show()
### function to plot distributions wrt target
def distribution_plot_wrt_target(data, predictor, target):
    # Minimal version: one histogram of the predictor per target class
    target_uniq = data[target].unique()
    fig, axs = plt.subplots(1, len(target_uniq), figsize=(12, 5))
    for ax, val in zip(axs, target_uniq):  # subplot titled with the target class value
        sns.histplot(data=data[data[target] == val], x=predictor, kde=True, ax=ax).set_title(f"{target} = {val}")
    plt.tight_layout()
    plt.show()
Correlation check
plt.figure(figsize=(15, 7))
sns.heatmap(data.corr(numeric_only=True), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
Let's check how a customer's interest in purchasing a loan varies with their
education
stacked_barplot(data, "Education", "Personal_Loan")
Personal_Loan 0 1 All
Education
All 4520 480 5000
3 1296 205 1501
2 1221 182 1403
1 2003 93 2096
-------------------------------------------------------------------------------
Higher education levels correlate with higher loan acceptance rates. This trend may be
due to greater financial awareness and higher earning potential among graduate and advanced-degree customers.
Personal_Loan vs Family
stacked_barplot(data, "Personal_Loan", "Family") ## Complete the code to plot stac
Family 1 2 3 4 All
Personal_Loan
All 1472 1296 1010 1222 5000
0 1365 1190 877 1088 4520
1 107 106 133 134 480
-------------------------------------------------------------------------------
Personal_Loan vs Securities_Account
stacked_barplot(data, "Personal_Loan","Securities_Account") ## Complete the code to
Securities_Account 0 1 All
Personal_Loan
All 4478 522 5000
0 4058 462 4520
1 420 60 480
-------------------------------------------------------------------------------
Personal_Loan vs CD_Account
stacked_barplot(data, "Personal_Loan", "CD_Account") ## Complete the code to plot s
CD_Account 0 1 All
Personal_Loan
All 4698 302 5000
0 4358 162 4520
1 340 140 480
-------------------------------------------------------------------------------
Personal_Loan vs Online
stacked_barplot(data, "Personal_Loan", "Online") ## Complete the code to plot stack
Online 0 1 All
Personal_Loan
All 2016 2984 5000
0 1827 2693 4520
1 189 291 480
-------------------------------------------------------------------------------
Personal_Loan vs CreditCard
stacked_barplot(data, "Personal_Loan", "CreditCard") ## Complete the code to plot s
CreditCard 0 1 All
Personal_Loan
All 3530 1470 5000
0 3193 1327 4520
1 337 143 480
-------------------------------------------------------------------------------
Personal_Loan vs ZIPCode
stacked_barplot(data, "Personal_Loan", "ZIPCode") ## Complete the code to plot stac
ZIPCode 90 91 92 93 94 95 96 All
Personal_Loan
All 703 565 988 417 1472 815 40 5000
0 636 510 894 374 1334 735 37 4520
1 67 55 94 43 138 80 3 480
-------------------------------------------------------------------------------
Let's check how a customer's interest in purchasing a loan varies with their age
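A minimal sketch of this check, assuming the distribution helper defined above is used:

# Distribution of Age for loan takers vs. non-takers
distribution_plot_wrt_target(data, "Age", "Personal_Loan")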
Outlier Detection
num_cols = data.select_dtypes(include=["float64", "int64"])  # numerical columns only
Q1 = num_cols.quantile(0.25)  # first quartile
Q3 = num_cols.quantile(0.75)  # third quartile
IQR = Q3 - Q1  # inter-quartile range
lower = Q1 - 1.5 * IQR  # lower bound; values below this are treated as outliers
upper = Q3 + 1.5 * IQR  # upper bound; values above this are treated as outliers

# Percentage of outliers in each numerical column
((num_cols < lower) | (num_cols > upper)).sum() / len(data) * 100
Age 0.00
Experience 0.00
Income 1.92
Family 0.00
CCAvg 6.48
Mortgage 5.82
dtype: float64
Model Building
Primary Focus: Recall (to minimize missed loan takers) and F1-Score (to balance
precision and recall).
Supplementary Metrics: Precision (to optimize targeting) and Accuracy (as a general
indicator).
First, let's create functions to calculate different metrics and confusion matrix so that we don't
have to use the same code repeatedly for each model.
def model_performance_classification_sklearn(model, predictors, target):
    """Compute Accuracy, Recall, Precision, and F1 for a fitted classifier.
    model: classifier; predictors: independent variables; target: dependent variable
    """
    pred = model.predict(predictors)  # predictions on the given data
    df_perf = pd.DataFrame(
        {"Accuracy": accuracy_score(target, pred), "Recall": recall_score(target, pred),
         "Precision": precision_score(target, pred), "F1": f1_score(target, pred)}, index=[0])
    return df_perf
def confusion_matrix_sklearn(model, predictors, target):
"""
To plot the confusion_matrix with percentages
model: classifier
predictors: independent variables
target: dependent variable
"""
y_pred = model.predict(predictors)
cm = confusion_matrix(target, y_pred)
labels = np.asarray(
[
["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum()
for item in cm.flatten()
]
).reshape(2, 2)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=labels, fmt="")
plt.ylabel("True label")
plt.xlabel("Predicted label")
DecisionTreeClassifier(random_state=1)
decision_tree_perf_train = model_performance_classification_sklearn(
model, X_train, y_train
)
decision_tree_perf_train
Evaluate the model's performance on the test data to verify if it generalizes well:
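A minimal sketch of this evaluation, assuming the helper defined above is used; the get_depth/get_n_leaves calls are assumptions that match the printed output:

# Baseline tree performance on the held-out test set (reused in the final comparison)
decision_tree_perf_test = model_performance_classification_sklearn(model, X_test, y_test)
print("Test Performance:")
print(decision_tree_perf_test)

# Size of the fully grown tree
print("Tree Depth:", model.get_depth())
print("Number of Leaves:", model.get_n_leaves())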
Test Performance:
Accuracy Recall Precision F1
Tree Depth: 10
Number of Leaves: 49
feature_names = list(X_train.columns)
print(feature_names)
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
print(
pd.DataFrame(
model.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
Imp
Income 0.308098
Family 0.259255
Education_2 0.166192
Education_3 0.147127
CCAvg 0.048798
Age 0.033150
CD_Account 0.017273
ZIPCode_94 0.007183
ZIPCode_93 0.004682
Mortgage 0.003236
Online 0.002224
Securities_Account 0.002224
ZIPCode_91 0.000556
ZIPCode_92 0.000000
ZIPCode_95 0.000000
ZIPCode_96 0.000000
CreditCard 0.000000
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
Checking model performance on test data
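A minimal sketch of this check, assuming the confusion-matrix helper defined earlier is applied to the baseline model on the test data:

# Confusion matrix of the baseline tree on the test set
confusion_matrix_sklearn(model, X_test, y_test)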
Pre-pruning
Note: The parameters provided below are a sample set. Feel free to update them and try
out other combinations.
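The search loop that produces best_estimator is only partially shown below; a minimal sketch of the setup it assumes (the parameter grid values and the initializations are illustrative):

from sklearn.model_selection import ParameterGrid

# Illustrative grid of pre-pruning parameters; other combinations can be tried
parameters = {
    "max_depth": np.arange(2, 8),
    "min_samples_leaf": [5, 10, 20, 25],
    "max_leaf_nodes": [5, 10, 15, None],
}

best_score_diff = np.inf  # smallest train-test recall gap seen so far
best_test_score = 0.0     # best test recall seen so far
best_estimator = None

for params in ParameterGrid(parameters):
    estimator = DecisionTreeClassifier(random_state=1, **params)
    estimator.fit(X_train, y_train)
    train_recall_score = recall_score(y_train, estimator.predict(X_train))
    test_recall_score = recall_score(y_test, estimator.predict(X_test))
    score_diff = abs(train_recall_score - test_recall_score)  # train-test recall gap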
    # Update the best estimator and best score if the current one has a smaller
    # train-test gap and a higher test recall
    if (score_diff < best_score_diff) & (test_recall_score > best_test_score):
        best_score_diff = score_diff
        best_test_score = test_recall_score
        best_estimator = estimator
# Fit the best algorithm to the data.
estimator = best_estimator
estimator.fit(X_train, y_train) ## Complete the code to fit model on train data
decision_tree_tune_perf_train = model_performance_classification_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_train
plt.figure(figsize=(10, 10))
out = tree.plot_tree(
estimator,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
# Importance of each feature in building the pre-pruned decision tree -
print(
pd.DataFrame(
estimator.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
Imp
Income 0.662361
Education_2 0.143155
CCAvg 0.087565
Education_3 0.050404
Family 0.039987
CD_Account 0.007829
Mortgage 0.004987
Age 0.003711
Online 0.000000
ZIPCode_91 0.000000
ZIPCode_92 0.000000
ZIPCode_93 0.000000
ZIPCode_94 0.000000
ZIPCode_95 0.000000
ZIPCode_96 0.000000
Securities_Account 0.000000
CreditCard 0.000000
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
decision_tree_tune_perf_test = model_performance_classification_sklearn(estimator, X_test, y_test)
decision_tree_tune_perf_test
Post-pruning
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
ccp_alphas impurities
0 0.000000 0.000000
1 0.000186 0.001114
2 0.000214 0.001542
3 0.000242 0.002750
4 0.000250 0.003250
5 0.000268 0.004324
6 0.000272 0.004868
7 0.000276 0.005420
8 0.000381 0.005801
9 0.000527 0.006329
10 0.000625 0.006954
11 0.000700 0.007654
12 0.000769 0.010731
13 0.000882 0.014260
14 0.000889 0.015149
15 0.001026 0.017200
16 0.001305 0.018505
17 0.001647 0.020153
18 0.002333 0.022486
19 0.002407 0.024893
20 0.003294 0.028187
21 0.006473 0.034659
22 0.025146 0.084951
23 0.039216 0.124167
24 0.047088 0.171255
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Next, we train a decision tree using the effective alphas. The last value in ccp_alphas is the
alpha value that prunes the whole tree, leaving the tree, clfs[-1], with only one node.
clfs = []
for ccp_alpha in ccp_alphas:
clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
clf.fit(X_train, y_train)  ## Complete the code to fit decision tree on train data
clfs.append(clf)
print(
"Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
clfs[-1].tree_.node_count, ccp_alphas[-1]
)
)
recall_test = []
for clf in clfs:
pred_test = clf.predict(X_test)
values_test = recall_score(y_test, pred_test)
recall_test.append(values_test)
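A minimal sketch of how best_model and best_alpha could be selected, assuming the pruned tree with the highest test recall is chosen:

# Pick the pruned tree with the best recall on the test set
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
best_alpha = ccp_alphas[index_best_model]  # the corresponding ccp_alpha, reused below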
print(best_model)
DecisionTreeClassifier(random_state=1)
print(ccp_alpha)
0.04708834100596766
estimator_2 = DecisionTreeClassifier(
    ccp_alpha=best_alpha, class_weight={0: 0.15, 1: 0.85}, random_state=1
)
estimator_2.fit(X_train, y_train)
plt.figure(figsize=(10, 10))
out = tree.plot_tree(
estimator_2,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
arrow = o.arrow_patch
if arrow is not None:
arrow.set_edgecolor("black")
arrow.set_linewidth(1)
plt.show()
# Importance of each feature in building the post-pruned decision tree -
print(
pd.DataFrame(
estimator_2.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
Imp
Income 0.597264
Education_2 0.138351
CCAvg 0.078877
Education_3 0.067293
Family 0.066244
Age 0.018973
CD_Account 0.011000
Mortgage 0.005762
Securities_Account 0.004716
ZIPCode_94 0.004702
ZIPCode_91 0.003587
CreditCard 0.002428
ZIPCode_92 0.000802
Online 0.000000
ZIPCode_93 0.000000
ZIPCode_95 0.000000
ZIPCode_96 0.000000
importances = estimator_2.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(8, 8))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
decision_tree_tune_post_test = model_performance_classification_sklearn(estimator_2, X_test, y_test)
decision_tree_tune_post_test
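A minimal sketch of the post-pruned tree's training-set performance, assuming it is computed with the same helper (it is used in the comparison below):

decision_tree_tune_post_train = model_performance_classification_sklearn(
    estimator_2, X_train, y_train
)
decision_tree_tune_post_train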
models_train_comp_df = pd.concat(
    [decision_tree_perf_train.T, decision_tree_tune_perf_train.T, decision_tree_tune_post_train.T],
    axis=1,
)
models_train_comp_df.columns = [
    "Decision Tree (sklearn default)",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
models_test_comp_df = pd.concat(
    [decision_tree_perf_test.T, decision_tree_tune_perf_test.T, decision_tree_tune_post_test.T],
    axis=1,
)
models_test_comp_df.columns = [
    "Decision Tree (sklearn default)",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Test set performance comparison:")
models_test_comp_df
Actionable Insights
1. Key Drivers of Loan Acceptance:
Income: The most critical predictor of loan acceptance. Customers with higher incomes are more likely to accept loans, reflecting better financial stability and repayment ability.
Education Level: Graduate and advanced education levels correlate strongly with loan acceptance, likely due to better financial literacy and earning potential.
Credit Card Spending (CCAvg): High-spending customers are more inclined toward loans, possibly for debt consolidation or managing high expenses.
Family Size: Customers with larger families show a moderate likelihood of accepting loans, potentially driven by higher financial responsibilities.
2. Targeted Segments:
The post-pruned decision tree model offers the best balance of recall (85.23%)
and precision (93.38%), making it suitable for identifying high-probability loan
takers while minimizing false positives.
Business Recommendations
1. Targeted Marketing Campaigns:
Use the model to identify customers with high probabilities of loan acceptance
based on their income, education, and spending habits.
Develop personalized loan offers tailored to these segments to increase
conversion rates.
2. Precision-Driven Strategy:
Use the post-pruned decision tree, which balances high recall with high precision, so that campaigns reach likely loan takers while keeping false positives and marketing costs low.
3. Digital and Educational Outreach:
Given that a significant portion of the bank's customers use online banking, prioritize digital marketing channels to reach these customers effectively.
4. Customer Feedback:
Collect and analyze feedback from customers who decline loans to identify potential barriers (e.g., interest rates, terms) and improve future offerings.
Expected Benefits
- Improved conversion rates for personal loan campaigns by focusing on high-probability segments.
- Reduced marketing costs by targeting precise customer segments and minimizing false positives.
- Enhanced customer satisfaction through personalized and relevant loan offers.