
Customer Journey Analysis Using Clustering and Dimensionality Reduction
Phase 3: Model Training and Evaluation
3.1 Overview of Model Training and Evaluation
In this phase, we focus on selecting suitable algorithms, training the models using processed
customer journey data, and evaluating their performance. The goal is to identify distinct
customer behavior patterns and enhance user experience by optimizing touchpoints along
their journey. Principal Component Analysis (PCA) is used for dimensionality reduction,
followed by K-Means clustering for segmentation. Various evaluation metrics are employed
to assess clustering effectiveness, ensuring robust model performance.

3.2 Choosing Suitable Algorithms


For customer journey analysis, the key algorithms employed are:
1. Principal Component Analysis (PCA) – for feature extraction and dimensionality reduction. PCA projects the high-dimensional customer interaction data into a lower-dimensional space while retaining the key behavioral features.
2. K-Means Clustering – for behavior segmentation. After dimensionality reduction, K-Means clustering groups customers based on their journey patterns.
Source Code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import seaborn as sns

# Load and preprocess data


data = pd.read_csv('customer_journey_data.csv')
data = data.dropna()
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
# Apply PCA
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)
pca_df = pd.DataFrame(data=pca_data, columns=['PCA1', 'PCA2'])

# Determine optimal clusters


sil_scores = []
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=42, n_init='auto')
    kmeans.fit(pca_df)
    sil_scores.append(silhouette_score(pca_df, kmeans.labels_))

optimal_clusters = sil_scores.index(max(sil_scores)) + 2

# Apply K-Means with optimal clusters


kmeans = KMeans(n_clusters=optimal_clusters, random_state=42, n_init='auto')
cluster_labels = kmeans.fit_predict(pca_df)
pca_df['Cluster'] = cluster_labels

3.3 Hyperparameter Tuning


Hyperparameter tuning is crucial to ensure optimal clustering. We optimize PCA components
to retain maximum variance while reducing dimensions effectively. Additionally, we
determine the best number of clusters using silhouette scores.
# Evaluate optimal PCA components (cumulative explained variance)
explained_variance = []
for n in range(1, data.shape[1] + 1):
    pca = PCA(n_components=n)
    pca.fit(scaled_data)
    explained_variance.append(sum(pca.explained_variance_ratio_))
optimal_pca_components = next(i for i, var in enumerate(explained_variance) if var > 0.95) + 1

# Evaluate optimal clusters using Silhouette Score


best_k = optimal_clusters
best_score = max(sil_scores)
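
Section 3.7 lists a "Silhouette Score vs. Cluster Count" plot among the visualizations, but no code for it appears in this document. A minimal sketch, reusing the sil_scores list computed in Section 3.2 (the figure size and styling are assumptions), could look like this:
# Plot silhouette score against candidate cluster counts (k = 2 to 10)
plt.figure(figsize=(8, 5))
plt.plot(range(2, 11), sil_scores, marker='o')
plt.axvline(optimal_clusters, color='red', linestyle='--', label=f'Optimal k = {optimal_clusters}')
plt.title('Silhouette Score vs. Cluster Count')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Silhouette score')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()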

3.4 Model Evaluation Metrics


We evaluate clustering quality and the autoencoder's reconstruction accuracy using:
1. Silhouette Score – Measures how well-separated the clusters are.
2. Adjusted Rand Index (ARI) – Compares predicted clusters with ground-truth labels, when such labels are available.
3. Mean Squared Error (MSE) – Measures the difference between the original and reconstructed data.
Source Code:
from sklearn.metrics import adjusted_rand_score

# Applying K-Means with the best number of clusters
kmeans = KMeans(n_clusters=best_k, random_state=42, n_init='auto')
clusters = kmeans.fit_predict(latent_features)

# Silhouette Score
sil_score = silhouette_score(latent_features, clusters)
print(f"Silhouette Score: {sil_score:.4f}")

# Adjusted Rand Index (provide ground-truth labels if available)
true_labels = None
if true_labels is not None:
    ari_score = adjusted_rand_score(true_labels, clusters)
    print(f"Adjusted Rand Index: {ari_score:.2f}")

# Autoencoder Reconstruction Loss (MSE between original and reconstructed data)
reconstructed_data = autoencoder.predict(scaled_data)
reconstruction_loss = np.mean(np.square(scaled_data - reconstructed_data))
print(f"Reconstruction Loss: {reconstruction_loss:.4f}")
3.5 Cross-Validation
Cross-validation ensures the model's robustness by testing it on multiple data subsets.
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Use only the PCA features (exclude the 'Cluster' column added earlier)
pca_features = pca_df[['PCA1', 'PCA2']]

silhouette_scores = []
for train_idx, test_idx in kf.split(pca_features):
    X_train, X_test = pca_features.iloc[train_idx], pca_features.iloc[test_idx]
    kmeans = KMeans(n_clusters=best_k, random_state=42, n_init='auto')
    kmeans.fit(X_train)
    clusters_pred = kmeans.predict(X_test)
    score = silhouette_score(X_test, clusters_pred)
    silhouette_scores.append(score)

avg_silhouette_score = np.mean(silhouette_scores)
print(f'Average Silhouette Score: {avg_silhouette_score:.4f}')

3.6 Enhanced Visualizations


Scatter plot, bar graph, and cluster insights.

Source Code:
# Scatter Plot with Centroids
plt.figure(figsize=(14, 10))
sns.scatterplot(x=latent_features[:, 0], y=latent_features[:, 1], hue=clusters, palette='viridis',
                s=100, alpha=0.7, edgecolor='w', linewidth=0.6)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=400, c='red',
            marker='X', edgecolor='black', linewidth=1.5, label='Centroids')
plt.title('Customer Journey Clusters with Centroids', fontsize=18, weight='bold')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()

# Bar Graph for Cluster Distribution
plt.figure(figsize=(10, 6))
sns.countplot(x=clusters, palette='Set3')
plt.title('Customer Count per Cluster', fontsize=16, weight='bold')
plt.xlabel('Cluster', fontsize=14)
plt.ylabel('Count', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()
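
The OUTPUT section below also describes a histogram of session duration per cluster, for which no code appears in this document. A possible sketch, assuming the original data contains a session_duration column (a hypothetical column name) and reusing the cluster labels from Section 3.4, is:
# Histogram: Session Duration Distribution per Cluster
# Assumes a 'session_duration' column exists in the original data (hypothetical name)
data_with_clusters = data.copy()
data_with_clusters['Cluster'] = clusters
plt.figure(figsize=(10, 6))
sns.histplot(data=data_with_clusters, x='session_duration', hue='Cluster', palette='viridis',
             element='step')
plt.title('Session Duration Distribution per Cluster', fontsize=16, weight='bold')
plt.xlabel('Session Duration', fontsize=14)
plt.ylabel('Count', fontsize=14)
plt.show()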

3.7 Conclusion of Phase 3


In this phase, we applied PCA for dimensionality reduction and used K-Means clustering to
segment customers based on their journey data. Hyperparameter tuning was conducted to
optimize PCA components and cluster numbers. Evaluation metrics such as silhouette score
and cluster distribution provided insights into clustering quality. Cross-validation ensured
robustness. These insights enable businesses to enhance user experience by tailoring
interactions based on customer behavior.
Visualizations:
1. PCA Projection Scatter Plot – Visualizing clusters in 2D space.
2. Silhouette Score vs. Cluster Count – Determining the optimal number of clusters.
3. Cluster Distribution – Analyzing customer distribution across segments.
This document outlines the customer journey analysis framework using PCA and K-Means
clustering for segmentation.
OUTPUT
Here are the generated visualizations:
• Enhanced Scatter Plot with Centroids: It helps identify the distinct boundaries between clusters and the relative positioning of cluster centers, highlighting customer segments with similar behaviors.
• Modified Bar Graph: Customer Count per Cluster: It reveals the popularity or dominance of certain journey patterns and helps identify niche versus mainstream customer behaviors.
• Histogram: Session Duration Distribution per Cluster: It highlights engagement patterns, helping to distinguish between short-session users and more engaged customer segments.
• Line Graph: Mean Feature Values per Cluster: It provides a comparative view of how clusters differ across multiple dimensions, enabling targeted marketing strategies based on behavioral tendencies (a sketch of this plot appears after this list).
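
A sketch of the line graph of mean feature values per cluster, reusing the original data and the cluster labels from Section 3.4 (the figure styling is an assumption), is:
# Line Graph: Mean Feature Values per Cluster
# Average each original feature within a cluster and draw one line per cluster
cluster_means = data.assign(Cluster=clusters).groupby('Cluster').mean(numeric_only=True)
plt.figure(figsize=(12, 6))
for cluster_id, row in cluster_means.iterrows():
    plt.plot(row.index, row.values, marker='o', label=f'Cluster {cluster_id}')
plt.title('Mean Feature Values per Cluster', fontsize=16, weight='bold')
plt.xlabel('Feature', fontsize=14)
plt.ylabel('Mean value', fontsize=14)
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()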
