
Advanced Market Segmentation Using Deep Clustering

Phase 3: Model Training and Evaluation

3.1 Overview of Model Training and Evaluation

In this phase, we focus on selecting suitable algorithms, training the models using the processed
data, and evaluating their performance. We aim to choose algorithms that are well-suited for
deep clustering and market segmentation tasks. Hyper parameter tuning is performed to optimize
model performance, and various evaluation metrics are employed to assess the model's predictive
capabilities. Cross-validation is also performed to ensure that the model generalizes well to
unseen data.

3.2 Choosing Suitable Algorithms

For the Advanced Market Segmentation using Deep Clustering project, the key algorithms are:

1. Autoencoder (for feature extraction and dimensionality reduction) – This deep learning
model is used to encode customer data into a lower-dimensional latent space.
2. K-Means Clustering (for customer segmentation) – After dimensionality reduction,
K-Means clustering is applied to group customers into distinct segments.

Source code:

# Import necessary libraries
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Assume 'data' is the dataset that has been preprocessed (cleaned);
# scale features to zero mean and unit variance before training
data_scaled = StandardScaler().fit_transform(data)

# Step 1: Train an autoencoder for feature extraction
inputs = tf.keras.Input(shape=(data_scaled.shape[1],))
x = tf.keras.layers.Dense(128, activation='relu')(inputs)
x = tf.keras.layers.Dense(64, activation='relu')(x)
latent = tf.keras.layers.Dense(32, activation='relu')(x)  # Latent space
x = tf.keras.layers.Dense(64, activation='relu')(latent)
x = tf.keras.layers.Dense(128, activation='relu')(x)
# Linear output suits standardized inputs (sigmoid assumes [0, 1] scaling)
outputs = tf.keras.layers.Dense(data_scaled.shape[1], activation='linear')(x)

autoencoder = tf.keras.Model(inputs, outputs)
encoder = tf.keras.Model(inputs, latent)  # Encoder half, for feature extraction

autoencoder.compile(optimizer='adam', loss='mean_squared_error')
autoencoder.fit(data_scaled, data_scaled, epochs=50, batch_size=256, validation_split=0.2)

# Step 2: Extract latent features with the encoder
# (the full autoencoder would return reconstructions, not latent codes)
latent_features = encoder.predict(data_scaled)

# Step 3: Apply K-Means clustering to the latent features
kmeans = KMeans(n_clusters=5, random_state=42)
clusters = kmeans.fit_predict(latent_features)

# Step 4: Evaluate the clustering quality
silhouette_avg = silhouette_score(latent_features, clusters)
print("Silhouette Score:", silhouette_avg)

3.3 Hyperparameter Tuning

Hyperparameter tuning is a crucial step to ensure that the model performs optimally. In this
project, we will perform grid search for the K-Means algorithm to find the best number of
clusters. Additionally, the autoencoder model's architecture and training parameters (e.g.,
learning rate, batch size) can be tuned using techniques like random search or Bayesian
optimization.

Source code for grid search for K-Means to find the best number of clusters:

# Grid search over the number of clusters, scored by silhouette
# (reuses 'latent_features' from the code in Section 3.2)
best_n_clusters = None
best_score = -1.0

for n_clusters in range(2, 11):
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    labels = kmeans.fit_predict(latent_features)
    score = silhouette_score(latent_features, labels)
    print(f"n_clusters={n_clusters}: silhouette={score:.4f}")
    if score > best_score:
        best_score = score
        best_n_clusters = n_clusters

print(f"Best number of clusters: {best_n_clusters} (silhouette={best_score:.4f})")

3.4 Model Evaluation Metrics

The performance of the model is evaluated using several metrics that measure clustering quality
and the reconstruction accuracy of the autoencoder. These include:

1. Silhouette Score – Measures how similar data points are within their cluster compared to
other clusters. A higher score indicates better clustering.

Source code:

silhouette_avg = silhouette_score(latent_features, clusters)
print("Silhouette Score:", silhouette_avg)

2. Adjusted Rand Index (ARI) – Measures the similarity between the predicted clusters and
ground truth labels, adjusting for chance. ARI values closer to 1 indicate better alignment
with true labels.

Source code:

from sklearn.metrics import adjusted_rand_score

# Ground truth labels are required for ARI
true_labels = ...  # Replace with your actual ground truth labels

ari_score = adjusted_rand_score(true_labels, clusters)
print(f"Adjusted Rand Index: {ari_score:.2f}")

3. Mean Squared Error (MSE) – Measures the difference between the original and
reconstructed data, indicating how well the autoencoder captures the data's structure.

Source code:

# Predict reconstructed data
reconstructed_data = autoencoder.predict(data_scaled)

# Calculate mean squared error (MSE) between original and reconstructed data
reconstruction_loss = np.mean(np.square(data_scaled - reconstructed_data))
print(f"Reconstruction Loss: {reconstruction_loss:.4f}")

3.5 Cross-Validation

Cross-validation is performed to assess the model's generalizability and ensure it is not
overfitting to the training data. Since clustering does not inherently provide a validation set,
techniques such as K-Fold cross-validation can be adapted by splitting the data into multiple
subsets, training the model on some subsets and testing it on others.

Source code:

from sklearn.model_selection import KFold
from sklearn.metrics import silhouette_score

# Define KFold cross-validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)

silhouette_scores = []

# Perform cross-validation
for train_index, test_index in kf.split(latent_features):
    X_train, X_test = latent_features[train_index], latent_features[test_index]

    # Fit KMeans on the training data ('best_n_clusters' from the grid search in 3.3)
    kmeans = KMeans(n_clusters=best_n_clusters, random_state=42)
    kmeans.fit(X_train)

    # Predict clusters on the test set
    clusters_pred = kmeans.predict(X_test)

    # Evaluate clustering quality using Silhouette Score
    score = silhouette_score(X_test, clusters_pred)
    silhouette_scores.append(score)

# Average Silhouette Score across all folds
avg_silhouette_score = np.mean(silhouette_scores)
print(f"Average Silhouette Score from cross-validation: {avg_silhouette_score:.4f}")

3.6 Conclusion of Phase 3

In Phase 3, the model was trained using the autoencoder for dimensionality reduction and
K-Means for clustering. We tuned the K-Means algorithm's hyperparameters using grid
search and evaluated the model's performance using several metrics, including silhouette score,
adjusted Rand index, and reconstruction loss. Cross-validation was applied to assess the model's
robustness and ensure generalizability. The evaluation metrics provided insights into the
clustering quality and the effectiveness of the autoencoder in capturing the underlying data
patterns.
