Description: Bank - Marketing - Part1 - Data - CSV
Dear Participants,
Please find below the Data Mining Project instructions:
You have to submit 2 files:
1. Answer Report: submit the answers to all the questions in sequential order. It should include a detailed explanation of the approach used, insights, inferences, and all code outputs such as graphs and tables. Your report should not be filled with code; you will be evaluated on the business report.
2. Jupyter Notebook file: this is a must and will be used for reference while evaluating.
Problem 1: Clustering
A leading bank wants to develop a customer segmentation to give promotional offers to its customers. It has collected a sample that summarizes the activities of users during the past few months. You are given the task of identifying the segments based on credit card usage.
1.1 Read the data, do the necessary initial steps, and perform exploratory data analysis (univariate, bivariate, and multivariate analysis).
1.2 Do you think scaling is necessary for clustering in this case? Justify.
1.3 Apply hierarchical clustering to scaled data. Identify the optimum number of clusters using a dendrogram and briefly describe them.
1.4 Apply K-Means clustering on scaled data and determine the optimum number of clusters using the elbow curve and silhouette score. Explain the results properly, and interpret and write inferences on the finalized clusters.
1.5 Describe cluster profiles for the clusters defined. Recommend different
promotional strategies for different clusters.
Dataset for Problem 1: bank_marketing_part1_Data.csv
Data Dictionary for Market Segmentation:
1. spending: Amount spent by the customer per month (in 1000s)
2. advance_payments: Amount paid by the customer in advance by cash (in
100s)
3. probability_of_full_payment: Probability of payment done in full by the
customer to the bank
4. current_balance: Balance amount left in the account to make purchases (in
1000s)
5. credit_limit: Limit of the amount on the credit card (in 10000s)
6. min_payment_amt: Minimum amount paid by the customer while making payments for purchases made monthly (in 100s)
7. max_spent_in_single_shopping: Maximum amount spent in one purchase (in
1000s)
Problem 2: CART-RF-ANN
An insurance firm providing tour insurance is facing a higher claim frequency. The management decides to collect data from the past few years. You are assigned the task of building a model that predicts the claim status and of providing recommendations to management. Use CART, RF & ANN and compare the models' performances on the train and test sets.
2.1 Read the data, do the necessary initial steps, and perform exploratory data analysis (univariate, bivariate, and multivariate analysis).
2.2 Data Split: Split the data into train and test sets, and build the classification models CART, Random Forest, and Artificial Neural Network.
2.3 Performance Metrics: Check and comment on the performance of predictions on the train and test sets using accuracy, the confusion matrix, the ROC curve with the ROC_AUC score, and classification reports for each model.
2.4 Final Model: Compare all the models and write an inference on which model is best/optimized.
2.5 Inference: Based on the whole analysis, what are the business insights and recommendations?
Dataset for Problem 2: insurance_part2_data-1.csv
Attribute Information:
1. Target: Claim Status (Claimed)
2. Code of tour firm (Agency_Code)
3. Type of tour insurance firms (Type)
4. Distribution channel of tour insurance agencies (Channel)
5. Name of the tour insurance products (Product)
6. Duration of the tour (Duration in days)
7. Destination of the tour (Destination)
8. Amount of sales per customer from procuring tour insurance policies, in rupees (in 100s)
9. Commission received by the tour insurance firm (Commission, as a percentage of sales)
10. Age of the insured (Age)
Important Note: Please reflect on all that you have learned while working on
this project. This step is critical in cementing all your concepts and closing the
loop. Please write down your thoughts here.
All the very best!
Regards,
Program Office
Scoring guide (Rubric) - Project - Data Mining
1.1 Read the data and do exploratory data analysis (3 pts). Describe the data briefly and interpret the inferences for each (3 pts). Points: 6
Initial steps such as head(), .info(), data types, and a null-value check. Distribution plots (histograms) or similar plots for the continuous columns, box plots, correlation plots, and appropriate plots for categorical variables, with inferences on each plot. Summary statistics, skewness, and the proportion of outliers should be discussed, with inferences drawn from the plots used above. There is no restriction on how the learner wishes to implement this, but the code should produce the correct output and the inferences should be logical and correct.
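For illustration, a minimal sketch of the initial steps and univariate/bivariate/multivariate plots this criterion asks for, assuming Python with pandas, seaborn, and matplotlib and the dataset file named in the brief; which plots to include and how to interpret them is left to the learner.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset and take a first look
df = pd.read_csv("bank_marketing_part1_Data.csv")
print(df.head())
print(df.info())
print(df.isnull().sum())           # null value check
print(df.describe().T)             # summary statistics
print(df.skew(numeric_only=True))  # skewness of the continuous columns

# Univariate: distribution and box plots for each continuous column
for col in df.select_dtypes(include="number").columns:
    fig, axes = plt.subplots(1, 2, figsize=(10, 3))
    sns.histplot(df[col], kde=True, ax=axes[0])
    sns.boxplot(x=df[col], ax=axes[1])
    plt.show()

# Multivariate: correlation heatmap and pair plot
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()
sns.pairplot(df)
plt.show()
```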
1.2 Do you think scaling is necessary for clustering in this case? Justify. Points: 2
The learner is expected to check and comment on the difference in scale of the different features on the basis of an appropriate measure, for example standard deviation or variance. The learner should justify whether scaling is necessary, state which method is used for the scaling, and can also comment on how that method works.
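As a sketch of how the scale check and the scaling itself might look (z-score scaling via scikit-learn's StandardScaler is one common choice; the file name is the one given in the brief):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bank_marketing_part1_Data.csv")
num_cols = df.select_dtypes(include="number").columns

# A large gap in std dev / variance across features means distance-based clustering
# would be dominated by the larger-scale columns, which argues for scaling
print(df[num_cols].agg(["mean", "std", "var"]).T)

# z-score scaling so every feature contributes comparably to the distance metric
scaled = pd.DataFrame(StandardScaler().fit_transform(df[num_cols]), columns=num_cols)
print(scaled.describe().T[["mean", "std"]])   # roughly mean 0 and std 1 after scaling
```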
1.3 Apply hierarchical clustering to scaled data (3 pts). Identify the optimum number of clusters using a dendrogram and briefly describe them (4 pts). Points: 7
Students are expected to apply hierarchical clustering; it can be obtained via fcluster or AgglomerativeClustering. The report should discuss the criterion, affinity, and linkage used. It must contain a dendrogram, a logical reason behind the chosen number of clusters, and inferences on the dendrogram. Customer segmentation can be visualized using a few features or the whole data, but it should be clear, correct, and logical. Use appropriate plots to visualize the clusters.
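A sketch of one acceptable route, using scipy's linkage, dendrogram, and fcluster with Ward linkage; the linkage method and the cut at 3 clusters are placeholders to be justified from the dendrogram, not prescribed values.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bank_marketing_part1_Data.csv")
scaled = StandardScaler().fit_transform(df.select_dtypes(include="number"))

# Ward linkage on Euclidean distance; average or complete linkage are also acceptable
Z = linkage(scaled, method="ward")

plt.figure(figsize=(12, 5))
dendrogram(Z, truncate_mode="lastp", p=25)   # truncated view keeps the plot readable
plt.title("Dendrogram (Ward linkage)")
plt.show()

# Cut the tree; 3 clusters is only a placeholder to be read off the dendrogram
df["h_cluster"] = fcluster(Z, t=3, criterion="maxclust")
print(df.groupby("h_cluster").mean(numeric_only=True))
```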
1.4 Apply K-Means clustering on scaled data and determine the optimum clusters (2 pts). Apply the elbow curve and silhouette score (3 pts). Interpret the inferences from the model (2.5 pts). Points: 7
K-Means clustering should be applied with different numbers of clusters, with the WSS (inertia) calculated for each value of k. The elbow method must be applied and visualized for different values of k, and the reasoning behind the selection of the optimal value of k must be explained properly. The silhouette score must be calculated for the same values of k and commented on. The report must contain logical and correct explanations for choosing the optimum clusters using both the elbow method and the silhouette scores. Append the cluster labels obtained from K-Means clustering to the original data frame. Customer segmentation can be visualized using appropriate graphs.
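One way this might be coded, sweeping k from 2 to 10 and recording inertia and silhouette scores; the range of k, the random_state, and the final k of 3 are illustrative placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bank_marketing_part1_Data.csv")
X = StandardScaler().fit_transform(df.select_dtypes(include="number"))

wss, sil = [], {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wss.append(km.inertia_)                    # within-cluster sum of squares for the elbow
    sil[k] = round(silhouette_score(X, km.labels_), 3)

plt.plot(range(2, 11), wss, marker="o")
plt.xlabel("k"); plt.ylabel("WSS (inertia)"); plt.title("Elbow curve")
plt.show()
print("Silhouette scores:", sil)

# Refit with the k suggested by the elbow and silhouette evidence (3 is a placeholder)
final_km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
df["km_cluster"] = final_km.labels_            # append labels to the original data frame
```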
1.5 Describe cluster profiles for the clusters defined (2.5 pts). Recommend different promotional strategies for different clusters in the context of the business problem in hand (2.5 pts). Points: 5
After adding the final clusters to the original data frame, do the cluster profiling: divide the data into the finalized groups, check their variable means, and explain each group briefly. There should be at least 3-4 recommendations. Recommendations should be easily understandable and business specific; students should not give any technical suggestions. Full marks will only be allotted if the recommendations are correct and business specific. Students should explain the profiles and suggest a mechanism to approach each cluster; any logical explanation is acceptable.
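Continuing the K-Means sketch above (it assumes the data frame df already carries the appended km_cluster labels), the profiling step can be as simple as a group-wise mean:

```python
# Continues from the K-Means sketch: df holds the original columns plus "km_cluster"
profile = df.groupby("km_cluster").mean(numeric_only=True)
profile["count"] = df["km_cluster"].value_counts().sort_index()
print(profile.round(2))   # per-cluster variable means that feed the written profiles
```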
2.1 Read the data and do exploratory data analysis (4 pts). Describe the data briefly and interpret the inferences for each (2 pts). Points: 6
Initial steps such as head(), .info(), data types, and a null-value check. Distribution plots (histograms) or similar plots for the continuous columns, box plots, correlation plots, and appropriate plots for categorical variables, with inferences on each plot. Summary statistics, skewness, and the proportion of outliers should be discussed, with inferences drawn from the plots used above. There is no restriction on how the learner wishes to implement this, but the code should produce the correct output and the inferences should be logical and correct.
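A brief sketch of the categorical side of the EDA for the insurance data, assuming the file name from the brief and the target column name Claimed from the attribute list; the continuous columns can be handled exactly as in the Problem 1 sketch.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

ins = pd.read_csv("insurance_part2_data-1.csv")
print(ins.info())
print(ins.isnull().sum())
print(ins.describe(include="all").T)

# Frequency counts and count plots for categorical columns, split by the target
for col in ins.select_dtypes(include="object").columns:
    print(ins[col].value_counts(), "\n")
    if col != "Claimed":
        sns.countplot(data=ins, x=col, hue="Claimed")
        plt.xticks(rotation=45)
        plt.show()
```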
2.2 Data Split: Split the data into train and test (1 pt), build the classification models CART (1.5 pts), Random Forest (1.5 pts), and Artificial Neural Network (1.5 pts). Points: 5.5
Object data should be converted into categorical/numerical data to fit the models (e.g., pd.Categorical(...).codes or pd.get_dummies(drop_first=True)). The data split and the ratio defined for the train-test split should be discussed; any reasonable split is acceptable. Use of a random state is mandatory. Each model should be implemented successfully, with a logical reason behind the selection of the values of the parameters involved in each model. Apply grid search for each model and build the models on the best parameters. Report feature importance for each model.
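A hedged sketch of the encoding, split, and grid-search steps; the split ratio, parameter grids, and the choice of pd.Categorical encoding are illustrative, and the target column name Claimed comes from the attribute list.

```python
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ins = pd.read_csv("insurance_part2_data-1.csv")

# Encode object columns; pd.Categorical(...).codes or pd.get_dummies(drop_first=True) both work
for col in ins.select_dtypes(include="object").columns:
    ins[col] = pd.Categorical(ins[col]).codes

X = ins.drop(columns="Claimed")
y = ins["Claimed"]
# Any reasonable split ratio is acceptable; a random state is mandatory for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Small illustrative parameter grids; widen them as needed
models = {
    "CART": GridSearchCV(DecisionTreeClassifier(random_state=42),
                         {"max_depth": [4, 6, 8], "min_samples_leaf": [10, 25, 50]}, cv=5),
    "RF":   GridSearchCV(RandomForestClassifier(random_state=42),
                         {"n_estimators": [100, 300], "max_depth": [4, 6, 8]}, cv=5),
    # Scaling inside a pipeline because the neural network is sensitive to feature scale
    "ANN":  GridSearchCV(make_pipeline(StandardScaler(),
                                       MLPClassifier(max_iter=2000, random_state=42)),
                         {"mlpclassifier__hidden_layer_sizes": [(50,), (100,)],
                          "mlpclassifier__alpha": [1e-4, 1e-2]}, cv=5),
}
for name, gs in models.items():
    gs.fit(X_train, y_train)
    print(name, gs.best_params_, round(gs.best_score_, 3))
```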
2.3 Performance Metrics: Check the performance of predictions on the train and test sets using accuracy (1 pt), the confusion matrix (2 pts), the ROC curve and ROC_AUC score for each model (2 pts), and classification reports for each model, writing inferences on each model (2 pts). Points: 7
Calculate the train and test accuracies for each model and comment on the validity of the models (overfitting or underfitting). Build a confusion matrix for each model and comment on the positive class in hand; the matrix must clearly show observed versus predicted values in the rows and columns. Plot the ROC curve and calculate the ROC_AUC score for each model, and comment on the calculated scores and plots. Build classification reports for each model and comment on the F1 score, precision, and recall, and which of them is important here.
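One possible shape for the metrics step, written as a small helper (the function name report is hypothetical) that can be called on each fitted model from the previous sketch with the train and test splits.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score, RocCurveDisplay)

def report(model, X_tr, y_tr, X_te, y_te, name="model"):
    """Print accuracy, confusion matrix, classification report and ROC_AUC on both splits."""
    for split, X_, y_ in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
        pred = model.predict(X_)
        proba = model.predict_proba(X_)[:, 1]        # probability of the positive class
        print(f"{name} {split} accuracy: {accuracy_score(y_, pred):.3f}")
        print(confusion_matrix(y_, pred))            # rows = observed, columns = predicted
        print(classification_report(y_, pred))
        print(f"{name} {split} ROC_AUC: {roc_auc_score(y_, proba):.3f}")
        RocCurveDisplay.from_predictions(y_, proba)  # ROC curve plot
        plt.title(f"{name} ({split})")
        plt.show()

# Example: report(models["CART"], X_train, y_train, X_test, y_test, name="CART")
```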
2.4 Final Model: Compare all models on the basis of the performance metrics in a structured tabular manner (2.5 pts). Describe which model is best/optimized (1.5 pts). Points: 4
Provide a table containing the accuracy, precision, recall, ROC_AUC score, and F1 score of every model. Compare the different (final) models on the basis of the values in that table, and state which model suits the problem in hand best on the basis of the different measures. Comment on the final model.
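The comparison table can be assembled directly from the fitted models of the earlier sketch (this snippet reuses the models dict, X_test, and y_test defined there):

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

rows = {}
for name, gs in models.items():
    pred = gs.predict(X_test)
    proba = gs.predict_proba(X_test)[:, 1]
    rows[name] = {"accuracy":  accuracy_score(y_test, pred),
                  "precision": precision_score(y_test, pred),
                  "recall":    recall_score(y_test, pred),
                  "f1":        f1_score(y_test, pred),
                  "roc_auc":   roc_auc_score(y_test, proba)}

# One row per model, one column per metric, for the report's comparison table
comparison = pd.DataFrame(rows).T.round(3)
print(comparison)
```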
2.5 Based on your analysis and work on the business problem, detail appropriate insights and recommendations to help the management solve the business objective. Points: 4.5
There should be at least 3-4 recommendations and insights in total. Recommendations should be easily understandable and business specific; students should not give any technical suggestions. Full marks should only be allotted if the recommendations are correct and business specific.
Quality of Business Report (please refer to the Evaluation Guidelines for the business report checklist; marks in this criterion are at the moderator's discretion). Points: 6
Please reflect on all that you learnt and fill in this reflection report: https://docs.google.com/forms/d/e/1FAIpQLSd7e_bJVCiFpZAYbBTMtKrr9TLRnl8kuvZT7IsZ5MSjRtfjcQ/viewform?usp=sf_link Points: 0