Agglomerative Clustering - Customer Segmentation Term Paper

The document discusses customer segmentation of a grocery mart's customer data using agglomerative clustering. The Kaggle dataset, containing 2240 customers and 29 attributes, is preprocessed and reduced to 3 dimensions using PCA. Agglomerative clustering identifies 4 customer segments, which are profiled based on spending patterns, demographics and responses to marketing campaigns.

Uploaded by vedant wakankar


Unveiling Customer Diversity: Exploring Agglomerative Clustering for Grocery Mart Segmentation

ABSTRACT
This project explores the application of agglomerative clustering to customer data taken from a grocery mart's database. The aim is to segment the customers into distinct clusters based on their purchasing behaviour. The dataset was first streamlined using a dimensionality reduction method (Principal Component Analysis, PCA), and agglomerative clustering was then applied to identify clusters. The research resulted in 4 distinct customer segments, which were profiled based on factors such as family structure, income level and spending patterns. These insights offer opportunities to develop marketing strategies targeted at the needs of each customer segment, thereby increasing the effectiveness of marketing in the retail industry.

INTRODUCTION
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. A cluster of data objects can be treated collectively as one group, and so clustering may be considered a form of data compression. Although classification is an effective means of distinguishing groups or classes of objects, it requires the often costly collection and labelling of a large set of training tuples or patterns, which the classifier uses to model each group. Clustering is also called data segmentation in some applications, because it partitions large data sets into groups according to their similarity (Ali and Kadhum, 2017).

The reasons for customer segmentation are elaborated below:

• Understanding Customer Mindset: Customers are the most important stakeholders of a business, especially in the retail industry. It is pivotal for management to understand the needs and preferences of their customers to make the business successful.
• Targeted Marketing Strategies: Marketing strategies deployed without understanding the customers and their preferences are unlikely to succeed. By segmenting the customers, tailor-made marketing campaigns can be deployed to target each distinct segment.
• High Revenue: Increasing revenue is one of the main goals of any customer segmentation process, and it follows from the combined effect of the advantages mentioned above.

Customer segmentation can be done by cluster analysis, an unsupervised machine learning technique. Clusters are formed based on similarity or distance measures. Various clustering methods are available, such as:
1. K-means clustering,
2. Hierarchical clustering (agglomerative or divisive),
3. Density-based clustering (DBSCAN),
and so on. For this project, agglomerative clustering has been used. Agglomerative clustering is a bottom-up hierarchical clustering algorithm: it starts by treating each data point as a separate cluster and iteratively merges the two closest clusters until the specified number of clusters remains. Through the analysis, 4 was chosen as the optimal number of clusters by plotting the silhouette scores.
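The merge loop described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch, not the implementation used in this project: it assumes Euclidean distance and average linkage (the distance between two clusters is the mean distance between their members), and the function name `agglomerate` is hypothetical. It is deliberately naive and only suitable for tiny inputs; scikit-learn's `AgglomerativeClustering` (Ward linkage by default) is the practical choice for real data.

```python
import numpy as np


def agglomerate(points: np.ndarray, n_clusters: int) -> np.ndarray:
    """Naive average-linkage agglomerative clustering (illustrative only)."""
    n = len(points)
    # Start with every data point as its own cluster.
    clusters = [[i] for i in range(n)]
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best_d, best_a, best_b = np.inf, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance between members of the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        # Merge the closest pair (best_a < best_b, so deleting best_b is safe).
        clusters[best_a].extend(clusters[best_b])
        del clusters[best_b]
    # Turn the final list of clusters into one label per point.
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0]])
print(agglomerate(pts, n_clusters=2))  # prints [0 0 0 1 1]
```

Stopping the loop at `n_clusters` mirrors the "merge until the specified number of clusters remains" rule; a full hierarchical implementation would instead record every merge to build a dendrogram.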

DATA COLLECTION AND RESEARCH METHODOLOGY

The dataset, named “marketing_campaign.csv”, was taken from Kaggle.com, an online community platform for data scientists and machine learning practitioners. Source: Akash Patel (2023), Customer Personality Analysis, Kaggle (kaggle.com).

The dataset consists of 2240 data points and 29 attributes. It can be categorised into the following subsets:
1. Customer’s Information:
• ID – Customer’s unique identifier.
• Year_Birth – Customer’s birth year.
• Education – Customer’s education level.
• Marital_Status – Customer’s marital status.
• Income – Customer’s yearly household income.
• Kidhome – Number of children at home.
• Teenhome – Number of teenagers at home.
• Dt_Customer – Date of customer’s enrolment with the company.
• Recency – Number of days since customer’s last purchase.
• Complain – 1 if customer complained in the last 2 years, 0 otherwise.

2. Products (amount spent on different products in the last 2 years)

• MntWines – amount spent on wines.
• MntFruits – amount spent on fruits.
• MntMeatProducts – amount spent on meat products.
• MntFishProducts – amount spent on fish products.
• MntSweetProducts – amount spent on sweets.
• MntGoldProds – amount spent on gold products.
3. Promotions
• NumDealsPurchases – Number of purchases made with a discount.
• AcceptedCmp1 – 1 if customer accepted the offer in the 1st campaign, 0 otherwise.
• AcceptedCmp2 – 1 if customer accepted the offer in the 2nd campaign, 0 otherwise.
• AcceptedCmp3 – 1 if customer accepted the offer in the 3rd campaign, 0 otherwise.
• AcceptedCmp4 – 1 if customer accepted the offer in the 4th campaign, 0 otherwise.
• AcceptedCmp5 – 1 if customer accepted the offer in the 5th campaign, 0 otherwise.
• Response – 1 if customer accepted the offer in the last campaign, 0 otherwise.

4. Place
• NumWebPurchases – Number of purchases made through the company’s website.
• NumCatalogPurchases – Number of purchases made using a catalogue.
• NumStorePurchases – Number of purchases made directly in stores.
• NumWebVisitsMonth – Number of visits to the company’s website in the last month.
For this project, the model has been built in Python, as it is one of the most widely used programming languages for machine learning applications. The code for the agglomerative clustering model has been executed in Jupyter Notebook, an interactive, browser-based environment for running Python code.
The dataset has been imported into the notebook and then cleaned to deal with missing values. After cleaning, feature engineering has been done to derive new features that aid the dimensionality reduction later in the project.
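A minimal sketch of the loading, cleaning and feature-engineering step with pandas. The derived feature names follow those used in the profiling section of this paper, but how each one is computed here is an assumption, as are the helper names `load_raw` and `clean_and_engineer`, the tab separator, the day-month-year date format, dropping (rather than imputing) rows with a missing Income, and measuring Age and Customer_for from the most recent enrolment date.

```python
import pandas as pd

# Spending columns as named in the Kaggle file.
MNT_COLS = ["MntWines", "MntFruits", "MntMeatProducts",
            "MntFishProducts", "MntSweetProducts", "MntGoldProds"]
# Assumption: these marital statuses count as living with a partner.
PARTNERED = {"Married", "Together"}


def load_raw(path: str) -> pd.DataFrame:
    # Assumption: marketing_campaign.csv is tab-separated.
    return pd.read_csv(path, sep="\t")


def clean_and_engineer(df: pd.DataFrame) -> pd.DataFrame:
    # Assumption: rows with a missing Income are dropped rather than imputed.
    out = df.dropna(subset=["Income"]).copy()
    # Assumption: Dt_Customer is stored as day-month-year, e.g. "04-09-2012".
    enrolled = pd.to_datetime(out["Dt_Customer"], format="%d-%m-%Y")
    latest = enrolled.max()
    # Days since enrolment, measured from the most recent enrolment date.
    out["Customer_for"] = (latest - enrolled).dt.days
    # Age relative to the year of the most recent enrolment (assumed reference point).
    out["Age"] = latest.year - out["Year_Birth"]
    out["Spent"] = out[MNT_COLS].sum(axis=1)
    out["Children"] = out["Kidhome"] + out["Teenhome"]
    out["Is_parent"] = (out["Children"] > 0).astype(int)
    out["Living_with"] = out["Marital_Status"].map(
        lambda s: "Partner" if s in PARTNERED else "Alone")
    adults = out["Living_with"].map({"Partner": 2, "Alone": 1})
    out["Family_Size"] = adults + out["Children"]
    return out


# Tiny made-up frame standing in for load_raw("marketing_campaign.csv").
raw = pd.DataFrame({
    "Year_Birth": [1970, 1985, 1990],
    "Income": [58000.0, None, 30000.0],
    "Kidhome": [0, 1, 1],
    "Teenhome": [1, 0, 0],
    "Dt_Customer": ["04-09-2012", "08-03-2014", "21-08-2013"],
    "Marital_Status": ["Married", "Single", "Single"],
    "MntWines": [600, 10, 20], "MntFruits": [80, 1, 5],
    "MntMeatProducts": [500, 6, 15], "MntFishProducts": [150, 2, 3],
    "MntSweetProducts": [80, 1, 2], "MntGoldProds": [80, 6, 10],
})
features = clean_and_engineer(raw)
print(features[["Age", "Spent", "Children", "Family_Size", "Customer_for"]])
```

Dropping rows is the simplest way to handle missing incomes; imputing, for example, the median income would keep every customer at the cost of slightly distorting the Income distribution.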
The features have been plotted, and the plots show a few outliers in the Income and Age features; these outliers have been removed.
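The paper does not state the rule used to decide which Income and Age values count as outliers. As a hedged illustration, the sketch below uses a swapped-in technique, Tukey's interquartile-range (IQR) fences, with a hypothetical helper `remove_iqr_outliers` and made-up demo values; fixed thresholds read off the plots would be an equally valid choice.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values lie within Tukey's fences for every given column."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Rows outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers.
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


data = pd.DataFrame({
    "Income": [30000, 40000, 50000, 60000, 70000, 666666],
    "Age": [30, 40, 45, 50, 121, 60],
})
print(remove_iqr_outliers(data, ["Income", "Age"]))
```

Here the row with Age 121 and the row with Income 666666 are dropped. Raising `k` makes the rule more lenient, which matters for a skewed feature such as Income.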

DATA PREPROCESSING AND DIMENSIONALITY REDUCTION

Firstly, the correlation amongst the features was plotted (excluding the categorical attributes). The data is quite clean, and the newly engineered features have also been included. Following this, label encoding was applied to the categorical features, and all features were scaled using a standard scaler.
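The encoding and scaling step might look like this with scikit-learn's `LabelEncoder` and `StandardScaler`. The helper `encode_and_scale` and the tiny demo frame are hypothetical, and the sketch assumes every non-categorical column is already numeric.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def encode_and_scale(df: pd.DataFrame, categorical) -> pd.DataFrame:
    out = df.copy()
    for col in categorical:
        # Map each category to an integer code (categories sorted alphabetically).
        out[col] = LabelEncoder().fit_transform(out[col])
    # Standardise every column to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(out)
    return pd.DataFrame(scaled, columns=out.columns, index=out.index)


demo = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Master", "Graduation"],
    "Living_with": ["Partner", "Alone", "Partner", "Alone"],
    "Income": [58000.0, 46000.0, 71000.0, 26000.0],
})
scaled = encode_and_scale(demo, ["Education", "Living_with"])
print(scaled.round(2))
```

Label encoding implies an ordering of the categories that may not be meaningful, and after standard scaling those codes carry the same weight as genuinely numeric features in the distance computations used by clustering.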
A subset of the dataset has been created for dimensionality reduction using Principal Component Analysis (PCA). A large number of features is harder to model and visualise, so dimensionality reduction was applied. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Post PCA, the dimensions have been reduced to 3, and summary statistics have been computed for the three components. The mean of each component is close to zero, as expected, since PCA centres the data. The standard deviation of each component shows how widely the data points are spread along it; PCA orders the components so that the first captures the most variance and the third the least.
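Reducing the scaled features to three components and summarising them can be sketched with scikit-learn's `PCA`. The random data is a synthetic stand-in for the scaled customer features, and `reduce_to_3d` is a hypothetical helper; the sketch shows that the component means are close to zero by construction and that the standard deviations decrease from PC1 to PC3.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def reduce_to_3d(scaled: pd.DataFrame) -> pd.DataFrame:
    # Project the (already scaled) features onto the first three principal components.
    comps = PCA(n_components=3).fit_transform(scaled)
    return pd.DataFrame(comps, columns=["PC1", "PC2", "PC3"], index=scaled.index)


# Synthetic stand-in for the scaled customer features (assumption).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 8)), columns=[f"f{i}" for i in range(8)])
reduced = reduce_to_3d(demo)
# Means are ~0 for every component; std decreases from PC1 to PC3.
print(reduced.describe().T[["mean", "std", "min", "max"]])
```

Calling `PCA(n_components=3).fit(...).explained_variance_ratio_` on the same data shows what share of the original variance the three components retain, which is a useful check before relying on a 3-D projection.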
Figure: 3D projection of the data onto the three principal components, PC1, PC2 and PC3 (axis ranges approximately -6 to 6 for PC1, -4 to 4 for PC2 and -2 to 2 for PC3).
CLUSTERING AND MODEL EVALUATION
Steps involved in clustering:
• Plotting silhouette scores to determine the optimal number of clusters.
• Fitting the agglomerative clustering model.
• Examining the clusters via a scatter plot.

Based on the silhouette score plot, the optimal number of clusters chosen is 4. This is the point just before the curve plateaus, indicating that the clusters have high cohesion.
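A sketch of how silhouette scores can be computed over a range of cluster counts with scikit-learn. The assumptions are Ward linkage (scikit-learn's default), synthetic blob data standing in for the PCA-reduced customer data, and a hypothetical helper `silhouette_by_k`. Here the k with the highest score is selected, which is one common way to read a silhouette plot.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values):
    """Fit agglomerative clustering for each k and return its silhouette score."""
    scores = {}
    for k in k_values:
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)  # Ward linkage
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic 3-D data with four well-separated groups (stand-in for PC1 to PC3).
centers = [[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=42)
scores = silhouette_by_k(X, range(2, 9))
for k, s in scores.items():
    print(k, round(s, 3))
best_k = max(scores, key=scores.get)
print("best k:", best_k)
```

Silhouette scores range from -1 to 1; values near 1 mean each point is much closer to its own cluster than to the nearest other cluster.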

The clusters have then been examined via a scatter plot.

The cluster sizes have been plotted as a bar graph to check their distribution; the customers appear to be fairly evenly distributed across the clusters.
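Once the number of clusters is fixed, the final model can be fitted and the cluster sizes counted; these counts are the numbers behind the bar graph. The sketch assumes scikit-learn's `AgglomerativeClustering` with its default Ward linkage, a hypothetical helper `assign_clusters`, and synthetic data in place of the PC1 to PC3 columns.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs


def assign_clusters(reduced: pd.DataFrame, n_clusters: int = 4) -> pd.DataFrame:
    out = reduced.copy()
    # Ward linkage is scikit-learn's default for AgglomerativeClustering.
    out["Clusters"] = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(reduced)
    return out


centers = [[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)
reduced = pd.DataFrame(X, columns=["PC1", "PC2", "PC3"])
clustered = assign_clusters(reduced)
# Cluster sizes per label.
print(clustered["Clusters"].value_counts().sort_index())
```

In practice the `Clusters` column would then be copied back onto the unscaled customer table, so that the cluster profiles can be read in original units such as Income and Spent.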
Income vs Spending Plot:

Based on the graph, the clusters can be characterised as follows:

Group 0: High spending & average income
Group 1: High spending & high income
Group 2: Low spending & low income
Group 3: High spending & low income
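The Income vs Spending reading can be made reproducible by comparing each cluster's median Income and Spent with the overall medians. The two-level "high"/"low" rule, the helper `income_spending_profile` and the demo values are hypothetical; the "average income" category used above would need a third band, for example based on tertiles.

```python
import numpy as np
import pandas as pd


def income_spending_profile(df: pd.DataFrame) -> pd.DataFrame:
    prof = df.groupby("Clusters")[["Income", "Spent"]].median()
    overall = df[["Income", "Spent"]].median()
    # Hypothetical rule: above the overall median counts as "high".
    prof["income_level"] = np.where(prof["Income"] > overall["Income"], "high", "low")
    prof["spending_level"] = np.where(prof["Spent"] > overall["Spent"], "high", "low")
    return prof


demo = pd.DataFrame({
    "Clusters": [0, 0, 1, 1],
    "Income": [80000, 90000, 20000, 25000],
    "Spent": [1500, 1800, 50, 80],
})
print(income_spending_profile(demo))
```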

Distribution of Clusters based on products:

From this plot, cluster 1 appears to be the biggest set of customers, closely followed by cluster 0.

Exploration of past campaigns:

The plot shows that no customer has accepted the offers in all 5 campaigns, and the overall response to the campaigns is underwhelming.
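Whether any customer accepted all five campaign offers can be checked by summing the AcceptedCmp1 to AcceptedCmp5 flags. The helper `campaign_summary`, the `Total_Promos` column and the demo rows are hypothetical, and the `Clusters` column is assumed to hold the labels from the clustering step.

```python
import pandas as pd

CAMPAIGNS = [f"AcceptedCmp{i}" for i in range(1, 6)]


def campaign_summary(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Hypothetical column: how many of the five campaigns each customer accepted.
    out["Total_Promos"] = out[CAMPAIGNS].sum(axis=1)
    # Rows: clusters; columns: number of accepted campaigns; values: customer counts.
    return pd.crosstab(out["Clusters"], out["Total_Promos"])


demo = pd.DataFrame({
    "Clusters":     [0, 0, 1, 1, 2],
    "AcceptedCmp1": [0, 1, 0, 0, 1],
    "AcceptedCmp2": [0, 0, 0, 0, 1],
    "AcceptedCmp3": [1, 0, 0, 0, 1],
    "AcceptedCmp4": [0, 1, 0, 0, 1],
    "AcceptedCmp5": [0, 0, 0, 0, 0],
})
table = campaign_summary(demo)
print(table)
# A missing "5" column means no customer accepted all five offers.
accepted_all = table[5].sum() if 5 in table.columns else 0
print("customers who accepted all 5:", accepted_all)
```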
Deals offered:

The deals offered have performed well, with the best outcomes in cluster 0 and cluster 3; clusters 1 and 2 have not been attracted as much.
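Deal uptake per cluster can be summarised with a grouped aggregate over the NumDealsPurchases column (the name used in the Kaggle file). The helper `deals_by_cluster` and the demo values are made up for illustration.

```python
import pandas as pd


def deals_by_cluster(df: pd.DataFrame) -> pd.DataFrame:
    # Mean and median number of purchases made with a discount, per cluster.
    return df.groupby("Clusters")["NumDealsPurchases"].agg(["mean", "median"])


demo = pd.DataFrame({
    "Clusters": [0, 0, 1, 1, 2, 2, 3, 3],
    "NumDealsPurchases": [4, 6, 1, 1, 2, 1, 5, 3],
})
deals = deals_by_cluster(demo)
print(deals)
```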

CUSTOMER SEGMENT PROFILING AND INTERPRETATION

To profile the customers in each segment, 9 plots have been made, each plotting one of the following features against expenditure (“Spent”):

1. “Kidhome”,
2. “Teenhome”,
3. “Customer_for”,
4. “Age”,
5. “Children”,
6. “Family_Size”,
7. “Is_parent”,
8. “education”,
9. “Living_with”
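A compact numeric companion to these nine plots is a per-cluster table of medians for the numeric features and modes for the categorical ones. The helper `profile_clusters` and the demo data are hypothetical, and the sketch assumes the engineered feature names used in this paper.

```python
import pandas as pd


def profile_clusters(df: pd.DataFrame, numeric, categorical) -> pd.DataFrame:
    # Median of each numeric feature per cluster.
    medians = df.groupby("Clusters")[numeric].median()
    # Most frequent value of each categorical feature per cluster.
    modes = df.groupby("Clusters")[categorical].agg(lambda s: s.mode().iloc[0])
    return medians.join(modes)


demo = pd.DataFrame({
    "Clusters": [0, 0, 0, 1, 1],
    "Age": [55, 60, 48, 30, 41],
    "Children": [2, 1, 1, 0, 0],
    "Spent": [400, 350, 500, 1600, 1400],
    "Living_with": ["Partner", "Alone", "Partner", "Partner", "Alone"],
    "Education": ["Graduation", "PhD", "Graduation", "Master", "Master"],
})
profile = profile_clusters(demo, ["Age", "Children", "Spent"], ["Living_with", "Education"])
print(profile)
```

When two categories tie, `mode()` returns them in sorted order, so the first one is reported; counts per category are a safer basis when ties are common.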
Based on these plots, the following can be deduced about the customers in each cluster:
Cluster 0:
• A parent.
• At least 2 and at most 4 members in the family.
• Single parents are a subset of this group.
• Most have a teenager at home.
• Relatively older.

Cluster 1:
• Not a parent.
• At most 2 family members.
• A slight majority are couples.
• Span all ages.
• High income.

Cluster 2:
• The majority are parents.
• At most 3 members in the family.
• Most have one child.
• Relatively younger.

Cluster 3:
• A parent.
• At least 2 and at most 5 family members.
• The majority have a teenager at home.
• Relatively older.
• Lower-income group.

DISCUSSION AND CONCLUSION

Understanding its customer base is extremely important for any business organisation, and customer segmentation is one way to gain a deeper understanding of customer behaviour. It is one of many applications of cluster analysis across different domains. Sales and marketing efforts can be designed around these customer clusters to achieve a high return on investment. Unsupervised machine learning algorithms such as agglomerative clustering can be applied easily using Python libraries, which also help to summarise and visualise the clusters. This research applied agglomerative clustering to the marketing campaign dataset and discovered 4 distinct customer clusters. These clusters can help the grocery mart’s marketing team to approach each customer segment differently and increase profit.

REFERENCES:
1. H. H. Ali and L. E. Kadhum, ‘K-Means Clustering Algorithm Applications in Data Mining and Pattern Recognition’, Int. J. Sci. Res., vol. 6, no. 8, pp. 1577–1584, 2017.
2. A. Patel, ‘Customer Personality Analysis’, Kaggle (kaggle.com), 2023.
