Clustering for new discovery in data
Mean shift clustering
Hierarchical clustering
- Kunal Parmar
Houston Machine
Learning Meetup
1/21/2017
Clustering: A world
without labels
• Finding hidden structure in data when we don’t
have labels/classes for the data
• We group data
together based
on some notion
of similarity in
the feature space
Clustering approaches
covered in previous lecture
• k-means clustering
o Iterative partitioning into k clusters based on proximity of an observation to
the cluster mean
Clustering approaches
covered in previous lecture
• DBSCAN
o Partition the feature space based on density
In this segment:
• Mean shift clustering
• Hierarchical clustering
Mean shift clustering
• Mean shift clustering is a non-parametric, iterative,
mode-based clustering technique built on kernel
density estimation.
• It is very commonly used in the field of computer
vision because of its high efficiency in image
segmentation.
Mean shift clustering
• It assumes that our data is sampled from an
underlying probability distribution
• The algorithm finds the modes (peaks) of that
distribution; each mode corresponds to a cluster, and
points that shift to the same mode are grouped together
Kernel density estimation
(figure: a “Set of points” and the “KDE surface” estimated from it)
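To make the figure concrete, here is a minimal sketch of kernel density estimation using SciPy's gaussian_kde; the two-blob sample data is invented for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Two Gaussian blobs -> a KDE surface with two modes
points = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 200)])

kde = gaussian_kde(points)        # bandwidth picked by Scott's rule by default
xs = np.linspace(-5, 7, 400)
density = kde(xs)                 # estimated density at each grid point
print(xs[np.argmax(density)])     # location of the tallest mode
```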
Algorithm: Mean shift
1. Define a window (bandwidth of the kernel to be
used for estimation) and place the window on a
data point
2. Calculate mean of all the points within the window
3. Move the window to the location of the mean
4. Repeat steps 2-3 until convergence
• On convergence, all data points within that window
form a cluster.
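As a rough illustration of steps 1-4, here is a minimal NumPy sketch using a flat (uniform) kernel; the data, bandwidth, and tolerance are all assumptions made for the example:

```python
import numpy as np

def mean_shift_point(x, X, bandwidth, tol=1e-4, max_iter=100):
    """Shift one window until it converges to a mode of X."""
    for _ in range(max_iter):
        # Step 2: mean of all points inside the window
        in_window = np.linalg.norm(X - x, axis=1) < bandwidth
        new_x = X[in_window].mean(axis=0)
        # Step 4: stop when the window no longer moves
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x

X = np.random.default_rng(1).normal(size=(300, 2))
modes = np.array([mean_shift_point(x, X, bandwidth=1.0) for x in X])
# Points whose windows converged to (nearly) the same mode form one cluster
```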
Example: Mean shift
(figures: four slides showing the window shifting step by step toward a mode)
Types of kernels
• Generally, a Gaussian kernel is used for probability
estimation in mean shift clustering.
• However, other kinds of kernels that can be used
are,
o Rectangular kernel
o Flat kernel, etc.
• The choice of kernel affects the clustering result
Types of kernels
• The choice of the bandwidth of the kernel (window)
will also impact the clustering result
o Small kernels will result in lots of clusters, some even being individual data
points
o Big kernels will result in one or two huge clusters
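The bandwidth effect is easy to see with scikit-learn's MeanShift and estimate_bandwidth (a sketch; the data and quantile values are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

for quantile in (0.05, 0.2, 0.5):          # small -> large window
    bw = estimate_bandwidth(X, quantile=quantile)
    labels = MeanShift(bandwidth=bw).fit_predict(X)
    print(f"bandwidth={bw:.2f} -> {len(np.unique(labels))} clusters")
```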
Pros and cons: Mean Shift
• Pros
o Model-free, doesn’t assume predefined shape of clusters
o Only relies on one parameter: kernel bandwidth h
o Robust to outliers
• Cons
o The selection of window size is not trivial
o Computationally expensive; O(n²)
o Sensitive to selection of kernel bandwidth; small h will slow down convergence,
large h speeds it up but might merge two modes
Applications: Mean Shift
• Clustering and segmentation
Hierarchical Clustering
• Hierarchical clustering creates clusters that have a
predetermined ordering from top to bottom.
• There are two types of hierarchical clustering:
o Divisive
• Top to bottom approach
o Agglomerative
• Bottom to top approach
Algorithm:
Hierarchical agglomerative clustering
1. Place each data point in its own singleton group
2. Iteratively merge the two closest groups
3. Repeat step 2 until all the data points are merged
into a single cluster
• We obtain a dendrogram (tree-like structure) at the
final step. We cut the dendrogram at a certain level
to obtain the final set of clusters.
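A minimal sketch of this procedure using SciPy's hierarchical-clustering utilities; the data and the cut height are invented for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(2).normal(size=(20, 2))

# Steps 1-3: linkage() records the full sequence of merges
Z = linkage(X, method='complete', metric='euclidean')

# Cut the dendrogram at height 2.0 to get the final clusters
labels = fcluster(Z, t=2.0, criterion='distance')

dendrogram(Z)  # draws the tree (requires matplotlib)
```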
Cluster similarity or
dissimilarity
• Distance metric
o Euclidean distance
o Manhattan distance
o Jaccard index, etc.
• Linkage criteria
o Single linkage
o Complete linkage
o Average linkage
Linkage criteria
• The linkage criterion quantifies the distance between
sets of observations (the intermediate clusters formed
during the agglomeration process)
Single linkage
• Distance between two clusters is the shortest
distance between any two points, one from each cluster
Complete linkage
• Distance between two clusters is the longest
distance between any two points, one from each cluster
Average linkage
• Distance between clusters is the average distance
between each point in one cluster and every point in
the other cluster (all three criteria are sketched in code below)
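The three criteria can be written directly in NumPy; A and B below are two made-up clusters of 2-D points:

```python
import numpy as np

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [4.0, 1.0]])

# Pairwise distances between every point in A and every point in B
D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

print("single  :", D.min())   # shortest pair distance
print("complete:", D.max())   # longest pair distance
print("average :", D.mean())  # mean over all pairs
```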
Example: Hierarchical
clustering
• We consider a small dataset with seven samples;
o (A, B, C, D, E, F, G)
• Metrics used in this example
o Distance metric: Jaccard index
o Linkage criteria: Complete linkage
Example: Hierarchical
clustering
• We construct a dissimilarity matrix based on the
Jaccard index (dissimilarity = 1 − Jaccard similarity).
• B and F are merged in this step as they have the
lowest dissimilarity
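The original dataset behind the matrix is not shown on the slide, so the sketch below invents boolean sample vectors to show how such a matrix could be built with SciPy (pdist with metric='jaccard' returns the Jaccard dissimilarity):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

samples = "ABCDEFG"
X = np.random.default_rng(3).integers(0, 2, size=(7, 8)).astype(bool)

# Pairwise Jaccard dissimilarity = 1 - Jaccard index
D = squareform(pdist(X, metric='jaccard'))

# The first merge is the off-diagonal minimum (eye() masks the diagonal zeros)
i, j = np.unravel_index(np.argmin(D + np.eye(7)), D.shape)
print(f"first merge: ({samples[i]}, {samples[j]})")
```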
Example: Hierarchical
clustering
• How do we calculate distance of (B,F) with other
clusters?
o This is where the choice of linkage criteria comes in
o Since we are using complete linkage, we take the maximum of the pairwise
distances between the two clusters
o So,
• Dissimilarity(B, A) : 0.5000
• Dissimilarity(F, A) : 0.6250
• Hence, Dissimilarity((B,F), A) : 0.6250
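In code, the complete-linkage update is just a max over the values quoted above (single linkage would take the min, average linkage the mean):

```python
# Complete-linkage update using the dissimilarities quoted above
d_B_A, d_F_A = 0.5000, 0.6250
d_BF_A = max(d_B_A, d_F_A)
print(d_BF_A)  # 0.625
```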
Example: Hierarchical
clustering
• We iteratively merge clusters at each step until all
the data points are covered,
i. merge the two clusters with the lowest dissimilarity
ii. update the dissimilarity matrix based on the merged clusters
Dendrogram
• At the end of the agglomeration process, we
obtain a dendrogram that looks like this:
(figure: the dendrogram for the seven samples)
Cutting the tree
• We cut the dendrogram at a level where there is a
jump in the clustering levels/dissimilarities
Cutting the tree
• If we cut the tree at 0.5, then we can say that within
each cluster the samples have more than 50%
similarity (this cut is sketched in code after the list below)
• So our final set of clusters is,
i. (B,F),
ii. (A,E,C,G) and
iii. (D)
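With SciPy, cutting at 0.5 is one call to fcluster; this sketch reuses invented boolean data, since the original samples are not shown on the slides:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.random.default_rng(3).integers(0, 2, size=(7, 8)).astype(bool)
Z = linkage(pdist(X, metric='jaccard'), method='complete')

labels = fcluster(Z, t=0.5, criterion='distance')
print(labels)  # samples sharing a label form one final cluster
```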
Final set of clusters
Impact of metrics
• The metrics chosen for hierarchical clustering can
lead to vastly different clusters.
• Distance metric
o In a 2-dimensional space, the distance between the point (1,1) and the
origin (0,0) is 2 under Manhattan distance but √2 under Euclidean
distance.
• Linkage criteria
o Distance between two clusters can be different based on linkage criteria
used
Linkage criteria
• Complete linkage is the most popular linkage criterion
used for hierarchical clustering. It is less sensitive to
outliers.
• Single linkage can handle non-elliptical shapes. But
single linkage can lead to clusters that are quite
heterogeneous internally, and it is more sensitive to
outliers and noise
Pros and Cons:
Hierarchical Clustering
• Pros
o No assumption of a particular number of clusters
o May correspond to meaningful taxonomies
• Cons
o Once a decision is made to combine two clusters, it can’t be undone
o Too slow for large data sets, O(n² log n)
References
i. https://spin.atomicobject.com/2015/05/26/mean-shift-clustering/
ii. http://vision.stanford.edu/teaching/cs131_fall1314_nope/lectures/lecture13_kmeans_cs131.pdf
iii. http://84.89.132.1/~michael/stanford/maeb7.pdf
Thank you!