9 - IAI5101 Unsupervised Learning - 20-40

How to Determine K?

Find the point where:

▪ Adding more clusters will not improve the solution considerably
▪ Having a smaller number of clusters will increase the error significantly

[Figure: SSE vs. number of clusters; a clear 'elbow' is visible at 5 clusters, hence K = 5]

The value of k should be chosen at the elbow: increasing k beyond this point leaves the SSE nearly constant (a short code sketch follows).
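A minimal sketch of this elbow heuristic, assuming scikit-learn's KMeans and a hypothetical blob dataset (the data and the range of k are illustrative, not from the slides):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data with 5 underlying groups
X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

# SSE (inertia) for each candidate k; look for the bend ('elbow') in the curve
ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("SSE (inertia)")
plt.show()
```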



Hierarchical Clustering



Hierarchical Clustering
▪ Works by grouping data objects into a hierarchy or tree of clusters

▪ Two approaches are used when building the hierarchy of clusters:

▪ Agglomerative (Bottom-up):
▪ Start with each instance in its own (singleton) cluster
▪ At each step, join the two closest clusters

▪ Divisive (Top-down):
▪ Start with one universal cluster containing all instances
▪ Split it into two clusters
▪ Proceed recursively on each subset

▪ Both approaches produce a dendrogram



Dendrogram
▪ A tree-like diagram that illustrates hierarchical clustering techniques
▪ Each level shows the clusters for that level
▪ Leaves – individual (singleton) clusters
▪ Root – one all-inclusive cluster
▪ A cluster at level i is the union (agglomerative) or division (divisive) of clusters at level i+1

[Figure: dendrogram, read bottom-up for agglomerative and top-down for divisive clustering]



Agglomerative vs. Divisive Clustering
▪ Agglomerative: uses distance as the clustering criterion, e.g., merges the pair of clusters with the minimum Euclidean distance between them
▪ Divisive: uses the opposite principle, e.g., the maximum Euclidean distance; an algorithm splits a cluster & reassigns its data instances around the most distant pair of instances



AGNES (Agglomerative Nesting)
▪ Introduced in Kaufmann and Rousseeuw (1990)
▪ Implemented in statistical packages, e.g., S-PLUS
▪ Uses the single-link method & the dissimilarity matrix (see the sketch below)
▪ Merges the nodes that have the least dissimilarity
▪ Continues in a non-descending fashion (merge distances never decrease)
▪ Eventually all nodes belong to the same cluster
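A minimal sketch of AGNES-style single-link clustering, using scipy's hierarchy module on a hypothetical 2-D dataset (the data and point labels are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical 2-D observations
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))

# Single-link agglomerative clustering; Z encodes the sequence of merges
Z = linkage(X, method="single", metric="euclidean")

# The dendrogram shows the non-descending merge distances
dendrogram(Z, labels=[chr(ord("a") + i) for i in range(len(X))])
plt.show()
```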



Agglomerative Nesting

[Figure: AGNES example, clusters merged step by step]


DIANA (Divisive Analysis)
▪ Introduced in Kaufmann and Rousseeuw (1990)
▪ Implemented in statistical analysis packages, e.g., S-PLUS
▪ Inverse order of AGNES
▪ Eventually each node forms a cluster on its own



Distance between Clusters
▪ Singleton clusters are iteratively combined, based on the linkage method used
▪ Linkage measures used:
▪ Single link (Nearest Neighbor): Smallest distance between two observations, one from each cluster
▪ Complete link: Largest distance between an element in one cluster and an element in the other
▪ Average link: Average distance between an element in one cluster & an element in the other
▪ The measure of proximity is based on distance, e.g., Euclidean distance
Linkage Measures Derivation
▪ 4 widely used linkage measures for the distance between clusters $C_i$ and $C_j$ (with means $m_i$, $m_j$ and sizes $n_i$, $n_j$) are listed below, followed by a short sketch computing them:

Minimum distance: $d_{\min}(C_i, C_j) = \min_{p \in C_i,\ q \in C_j} \lVert p - q \rVert$

Maximum distance: $d_{\max}(C_i, C_j) = \max_{p \in C_i,\ q \in C_j} \lVert p - q \rVert$

Mean distance: $d_{\mathrm{mean}}(C_i, C_j) = \lVert m_i - m_j \rVert$

Average distance: $d_{\mathrm{avg}}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{p \in C_i} \sum_{q \in C_j} \lVert p - q \rVert$
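A minimal sketch computing the four measures for two hypothetical clusters with scipy (the point coordinates are illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters of 2-D points
Ci = np.array([[0.0, 0.0], [1.0, 0.0]])
Cj = np.array([[4.0, 0.0], [5.0, 1.0]])

pairwise = cdist(Ci, Cj)  # all pairwise Euclidean distances |p - q|

print("minimum :", pairwise.min())                                     # d_min
print("maximum :", pairwise.max())                                     # d_max
print("mean    :", np.linalg.norm(Ci.mean(axis=0) - Cj.mean(axis=0)))  # d_mean
print("average :", pairwise.mean())                                    # d_avg
```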



Exercise



Exercise 1 - Dendrogram
▪ The daily expenditures on food (X1) & clothing (X2) of 5 persons are shown in a table, and the distance between each pair of observations is given in a distance matrix
▪ For example, the Euclidean distance between a & b is: $d(a, b) = \sqrt{(X_{1a} - X_{1b})^2 + (X_{2a} - X_{2b})^2}$

[Table of expenditures and the derived 5 x 5 distance matrix not reproduced here]



Dendrogram
▪ After deriving the distance matrix, form the clusters:
1. Find the minimum value present in the matrix
▪ Min = 1 (join b & e in the dendrogram)
2. Recalculate the distance matrix & update. To update, use the minimum (single link):
▪ Min[(be, a)] = min[(b,a), (e,a)] = min(6, 7) = 6
▪ Min[(be, c)] = min[(b,c), (e,c)] = 1
▪ Min[(be, d)] = min[(b,d), (e,d)] = 7
3. Reconstruct the matrix:

Cluster   a   be   c   d
a         0    6   7   1
be        6    0   1   7
c         7    1   0   8
d         1    7   8   0

4. Repeat (go to step 1)



Dendrogram
1. Find the minimum value present in the matrix
▪ Min = 1 (join a & d in the dendrogram)

Cluster   a   be   c   d
a         0    6   7   1
be        6    0   1   7
c         7    1   0   8
d         1    7   8   0

2. Recalculate the distance matrix & update. To update, use the minimum (single link):
▪ Min[(ad, be)] = min[(d,be), (a,be)] = min(7, 6) = 6
▪ Min[(ad, c)] = min[(d,c), (a,c)] = min(8, 7) = 7
3. Reconstruct the matrix:

Cluster   ad   be   c
ad         0    6   7
be         6    0   1
c          7    1   0

4. Repeat (go to step 1)



Dendrogram
1. Find the minimum value present in the matrix
▪ Min = 1 (join c with be in the dendrogram)

Cluster   ad   be   c
ad         0    6   7
be         6    0   1
c          7    1   0

2. Recalculate the distance matrix & update. To update, use the minimum (single link):
▪ Min[(bec, ad)] = min[(be, ad), (c, ad)] = min(6, 7) = 6
3. Reconstruct the matrix:

Cluster   ad   bec
ad         0     6
bec        6     0

(The remaining merge joins ad & bec at distance 6; the full sequence is replayed in the sketch below.)
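A minimal sketch of the single-link update rule used in these steps. It starts from the 4 x 4 matrix above (after b & e were merged, taken from the slides) and replays the remaining merges; the implementation itself is illustrative, not from the slides:

```python
import numpy as np

# Distance matrix after the first merge (b & e joined), from the slides
labels = ["a", "be", "c", "d"]
D = np.array([[0., 6., 7., 1.],
              [6., 0., 1., 7.],
              [7., 1., 0., 8.],
              [1., 7., 8., 0.]])

while len(labels) > 1:
    # Step 1: find the minimum off-diagonal entry
    masked = D.copy()
    np.fill_diagonal(masked, np.inf)
    i, j = np.unravel_index(np.argmin(masked), masked.shape)
    i, j = min(i, j), max(i, j)
    print(f"merge {labels[i]} & {labels[j]} at distance {D[i, j]:g}")

    # Step 2: single-link update -- the merged cluster's distances are the
    # element-wise minimum of the two merged rows
    merged = np.minimum(D[i], D[j])
    D = np.delete(np.delete(D, j, axis=0), j, axis=1)
    D[i, :] = np.delete(merged, j)
    D[:, i] = D[i, :]
    D[i, i] = 0.0
    labels[i] += labels[j]
    del labels[j]
```

Running it prints the merges in the same order as the worked example: a & d at 1, be & c at 1, and finally ad & bec at 6.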



Exercise II - Dendrogram
▪ Show your results by drawing a dendrogram. The dendrogram
should clearly show the order in which the points are merged.
▪ How many sets of clusters can you deduce from the dendrogram?



Model Evaluation



Performance Metrics - Clustering
▪ More difficult than classification due to absence of ground truth (i.e.,
absence of true labels in the data)
▪ Approaches:
1. External Validation: supervised, i.e., the ground truth is
available
▪ Compare clustering against the ground truth using certain clustering
quality measure
▪ Popular Quality Metrics:
▪ Homogeneity: All clusters contain only data points that are members
of a single class (based on the true class labels)
▪ Completeness: All data points of a specific ground truth class label are
also elements of the same cluster
▪ V-measure: Harmonic mean of homogeneity & completeness scores
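A minimal sketch of these three metrics using scikit-learn (the label vectors are hypothetical):

```python
from sklearn.metrics import (completeness_score, homogeneity_score,
                             v_measure_score)

# Hypothetical ground-truth classes and predicted cluster assignments
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

print(f"homogeneity:  {homogeneity_score(labels_true, labels_pred):.3f}")
print(f"completeness: {completeness_score(labels_true, labels_pred):.3f}")
print(f"v-measure:    {v_measure_score(labels_true, labels_pred):.3f}")
```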



Example: External Validation

[Table: homogeneity, completeness & V-measure scores for a 2-cluster model and a 5-cluster model]

Note:
▪ Values are typically bounded between 0 & 1
▪ Higher values are better
▪ The V-measure of the first model (2 clusters) is better than that of the 5-cluster model because of its higher completeness score



Performance Metrics - Clustering
2. Internal Validation: unsupervised, i.e., the ground truth is unavailable
▪ Validate a clustering model by defining metrics that capture the expected behavior of a good clustering model

▪ A good clustering model can be identified by 2 traits:
▪ Compactness, i.e., the data points within a cluster are close to each other
▪ Separation, i.e., any 2 clusters are distant from each other

▪ Define metrics (e.g., based on Euclidean distance) that mathematically calculate the goodness of these 2 traits & use them to evaluate clustering models
▪ Example: the Silhouette coefficient



Example - Silhouette Coefficient
▪ Metric combines the 2 traits of a good clustering model
▪ Uses a combination of similarity to the data points in a cluster & dissimilarity to the data points not in the cluster (see the sketch below)

[Figure: example clustering model]

SC for each sample:

$SC = \frac{b - a}{\max(a, b)}$

where $a$ is the mean distance between the sample & the other points in the same cluster, and $b$ is the mean distance between the sample & the points in the nearest cluster.

The SC value is bounded between -1 (incorrect clustering) and +1 (highly dense, well-separated clusters); values near 0 indicate overlapping clusters.
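A minimal sketch of computing the Silhouette coefficient with scikit-learn, assuming a hypothetical blob dataset and K-means cluster assignments:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated Gaussian blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Mean SC across all samples, for several candidate values of k
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(f"k={k}: mean silhouette = {silhouette_score(X, labels):.3f}")
```

The best k should show the highest mean score, since its clusters are the most compact & well separated.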
