
Cluster Analysis

Hierarchical Cluster Analysis


• Hierarchical cluster analysis is an algorithm that groups similar
objects into groups called clusters.
• The endpoint is a set of clusters, where each cluster is distinct from
every other cluster, and the objects within each cluster are broadly
similar to each other.
• Hierarchical clustering creates clusters that have a predetermined
ordering from top to bottom. For example, all files and folders on a
hard disk are organized in a hierarchy.
Agglomerative vs Divisive method
• Agglomerative (bottom-up) clustering starts with each observation in its own cluster and repeatedly merges clusters; divisive (top-down) clustering starts with all observations in one cluster and repeatedly splits it.
How hierarchical clustering works
Hierarchical clustering starts by treating each observation as a separate
cluster. Then, it repeatedly executes the following two steps:
(1) identify the two clusters that are closest together, and
(2) merge the two most similar clusters. This iterative process continues
until all the clusters are merged together.
The main output of hierarchical clustering is a dendrogram, which shows the
hierarchical relationship between the clusters. The record of which clusters are
combined at each step, and at what distance, is known as the agglomeration schedule.
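As a minimal sketch of this bottom-up process (assuming SciPy and Matplotlib are available; the data points are made up for illustration), the snippet below builds the merge table for a few 2-D points and plots the resulting dendrogram:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# A small, made-up set of 2-D observations
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2],
              [5.2, 4.8], [9.0, 9.1], [8.8, 9.4]])

# Each row of Z records one merge: the two clusters joined, the distance
# at which they were joined, and the size of the new cluster. This table
# plays the role of the agglomeration schedule.
Z = linkage(X, method="average", metric="euclidean")

dendrogram(Z)                      # visualize the hierarchy of merges
plt.xlabel("observation index")
plt.ylabel("merge distance")
plt.show()
```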
Usage of Hierarchical clustering
• Hierarchical clustering is a popular and widely used method for
analyzing social network data. In this method, nodes are compared with
one another based on their similarity, and larger groups are built by
joining groups of nodes according to that similarity. A criterion is
introduced to compare nodes based on their relationship.
Measures of distance
Methods:
1. Distance between two points
2. Distance between a point and a cluster
3. Distance between two clusters
Distance methods (k-means)
• Euclidean distance
• Manhattan distance
• Minkowski distance
• Hamming distance
Euclidean distance (L2 norm)
Manhattan distance (L1 norm)
Minkowski distance
• Minkowski distance is the generalized form of the Euclidean and
Manhattan distances.
• When the order p is 1, it reduces to the Manhattan distance, and
when the order is 2, it reduces to the Euclidean distance.
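For reference, the standard Minkowski distance of order p between two n-dimensional points x and y can be written as:

```latex
D(x, y) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p}
```

Setting p = 1 recovers the Manhattan (L1) distance and p = 2 recovers the Euclidean (L2) distance.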
Hamming distance
• The Hamming distance measures the similarity between two strings of
the same length: it is the number of positions at which the
corresponding characters are different.
• Example: "euclidean" and "manhattan" (both nine characters long)
differ at seven positions, so their Hamming distance is 7.
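As a short illustration (the vectors and strings below are made-up examples), all four metrics can be computed with NumPy and plain Python:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

# Euclidean (L2): square root of the sum of squared differences
euclidean = np.sqrt(np.sum((x - y) ** 2))

# Manhattan (L1): sum of absolute differences
manhattan = np.sum(np.abs(x - y))

# Minkowski of order p: generalizes both (p=1 -> Manhattan, p=2 -> Euclidean)
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)

# Hamming: number of positions where two equal-length strings differ
s1, s2 = "euclidean", "manhattan"
hamming = sum(a != b for a, b in zip(s1, s2))

print(euclidean, manhattan, minkowski, hamming)  # ~3.61  5.0  ~3.27  7
```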
Hierarchical linkage criteria
1. Single linkage: shortest distance between clusters.
2. Complete linkage: farthest distance between clusters.
3. Mean or average linkage: the centers of the clusters or some other
averaging criterion.
4. Ward linkage: a minimum-variance criterion that minimizes the total
within-cluster variance.
Single linkage
• The distance between two clusters is defined as the shortest distance
between two points, one in each cluster.
Complete linkage
• The distance between two clusters is defined as the longest distance
between two points, one in each cluster.
Average linkage
• The distance between two clusters is defined as the average distance
between each point in one cluster and every point in the other cluster
(the average of all pairwise distances).
Ward’s method
• In Ward's minimum-variance method, the distance between two
clusters is the ANOVA sum of squares between the two clusters added
up over all the variables.
• At each generation, the within-cluster sum of squares is minimized
over all partitions obtainable by merging two clusters from the
previous generation.
• The sums of squares are easier to interpret when they are divided by
the total sum of squares to give proportions of variance (squared semi-
partial correlations).
Important:
• The choice of distance metric should be made based on theoretical concerns from
the domain of study. For example, if clustering crime sites in a city, city-block
distance may be appropriate, or, better yet, the time taken to travel between each
location. Where there is no theoretical justification for an alternative, Euclidean
distance should generally be preferred, as it is usually the appropriate measure of
distance in the physical world.
• The choice of linkage criterion should also be made based on theoretical
considerations from the domain of application. A key theoretical issue is what causes
variation. For example, in archaeology we expect variation to occur through
innovation and natural resources, so deciding whether two groups of artifacts are
similar may reasonably be based on identifying the most similar members of the
clusters. Where there is no clear theoretical justification for the choice of linkage
criterion, Ward's method is a sensible default: it decides which observations to group
by minimizing the sum of squared distances of each observation from the average
observation in its cluster.
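As a rough sketch of how these criteria are tried out in practice (scikit-learn is assumed to be available, and the data is synthetic), the same dataset can be clustered under each linkage rule and the resulting cluster sizes compared:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Synthetic data: three loose blobs in 2-D
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Fit the same data under each linkage criterion
for linkage in ("single", "complete", "average", "ward"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    sizes = np.bincount(labels)
    print(f"{linkage:>8} linkage -> cluster sizes: {sizes}")
```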
K-means clustering
• Partitional, non-deterministic
• Works on centroids
K-means clustering
The idea behind k-means is that we want to add k new points to the data we have.

Each of those points, called a centroid, moves around trying to center itself in the middle of one of the k clusters.

Once those centroids stop moving, the clustering algorithm stops.

The value of k is of great importance. This k is called a hyperparameter: a variable whose value we set before training. It specifies the number of clusters we want the algorithm to yield, which is the number of centroids moving around in the data.

Each data point is assigned to the cluster whose mean (centroid) is nearest.
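A minimal sketch of this procedure with scikit-learn (the data and the choice k = 3 are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic 2-D data drawn around three made-up centers
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(50, 2))
               for c in ([0, 0], [6, 6], [0, 6])])

# k is a hyperparameter: the number of clusters is fixed before training
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)       # each point goes to its nearest centroid

print(kmeans.cluster_centers_)       # final centroid positions
print(kmeans.inertia_)               # WCSS: within-cluster sum of squared distances
```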
Identifying k: the elbow method
• For each value of k, we calculate the WCSS (Within-Cluster Sum of
Squares): the sum of the squared distances between each point and the
centroid of its cluster. As k increases, WCSS decreases; the point
after which the decrease slows down sharply defines the optimal number
of clusters and is known as the "elbow point".
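A short sketch of the elbow computation (reusing the same kind of synthetic data; the range of candidate k values is an arbitrary choice):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(50, 2))
               for c in ([0, 0], [6, 6], [0, 6])])

# WCSS (inertia) for each candidate k; look for the "elbow" where the
# curve stops dropping sharply
for k in range(1, 9):
    wcss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(f"k={k}: WCSS={wcss:.1f}")
```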
Identifying k: the silhouette coefficient
• This coefficient is a measure of cluster cohesion and separation.
• It ranges between -1 and 1; values close to 1 indicate dense,
well-separated clusters.
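A brief sketch using scikit-learn's silhouette_score (again with synthetic data; the candidate k values are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(50, 2))
               for c in ([0, 0], [6, 6], [0, 6])])

# The k with the highest average silhouette coefficient is preferred
for k in range(2, 7):                      # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```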
Advantages of K-means
• Convergence is guaranteed.
• Specialized to clusters of different sizes and shapes.
• Can handle big data.

Advantages of Hierarchical clustering
• Ease of handling any form of similarity or distance measure.
• Consequently, applicable to any type of attribute.

Disadvantages of K-means
• The value of k is difficult to predict.
• K-means produces clusters with uniform sizes (in terms of density and
number of observations), even though the underlying data might behave
very differently.
• K-means is very sensitive to outliers, since centroids can be dragged
by noisy data.

Disadvantages of Hierarchical clustering
• Cannot handle big data: it requires the computation and storage of an
n×n distance matrix, which can be expensive and slow for very large
datasets.
