Clustering Algorithms
• Main families covered: Agglomerative, Divisive, K-means, and Self-Organizing Maps
Requirements of a Good Clustering Method
• The ability to discover some or all of the hidden clusters
• High within-cluster similarity and between-cluster dissimilarity
• Ability to deal with various types of attributes
• Can deal with noise and outliers
• Can handle high dimensionality
• Scalable, interpretable, and usable
Similarity Measuring Methods
• To determine similarity among objects/points, distance functions are used (as in k-NN)
• A distance function returns a lower value for pairs of objects that are more similar to one another
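A minimal sketch of such a distance function in Python (using Euclidean distance, the same measure commonly used in k-NN; the function name is illustrative):

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Lower value => more similar points
print(euclidean_distance((2, 3), (2, 4)))  # 1.0
print(euclidean_distance((2, 3), (7, 6)))  # ~5.83
```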
Hierarchical Clustering
• Produces a hierarchy of clusters ordered from top to bottom
• Two kinds: Divisive and Agglomerative
Intuitive Explanation
Divisive
• Assign all the observations to a single cluster, then partition that cluster into the two least similar clusters
• Proceed recursively on each cluster until there is one cluster for each observation
Agglomerative
• Assign each observation to its own cluster
• Compute the distances between clusters and merge the two clusters that are closest to each other
• Repeat until all data points are joined in a single cluster
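As a hedged illustration of this bottom-up merging, the sketch below uses SciPy's hierarchical-clustering routines (assuming SciPy is installed); the sample points are made up for demonstration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative 2-D points (not from the slides)
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])

# Agglomerative merging: each point starts as its own cluster,
# then the two closest clusters are repeatedly joined.
Z = linkage(X, method="single")   # "single" = shortest-distance linkage

# Cut the resulting hierarchy into two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)                     # e.g. [1 1 1 2 2 2]
```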
Proximity Matrix
• Before clustering is performed, the proximity matrix containing the distance between each pair of points must be computed using a distance function (see the sketch after this list)
• The matrix is then updated to hold the distance between each pair of clusters
• The following methods differ in how the distance between two clusters is measured:
• Single Linkage
• Complete Linkage
• Average Linkage
• Minimum Variance (Ward’s method)
• Centroid Method
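A rough sketch of computing the proximity matrix with pairwise Euclidean distances, assuming SciPy is available (the points are illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[2, 3], [3, 5], [2, 4], [7, 6]])   # illustrative points

# Condensed pairwise distances, expanded into a square proximity matrix:
# entry (i, j) is the distance between point i and point j
proximity = squareform(pdist(points, metric="euclidean"))
print(np.round(proximity, 2))
```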
Single Linkage
• The distance between two clusters is defined as the shortest distance between a point in one cluster and a point in the other
• For example, the distance between cluster “Blue” and cluster “Red” equals the distance between their two closest points
Complete Linkage
• The distance between two clusters is defined as the longest distance between a point in one cluster and a point in the other
• For example, the distance between clusters “r” and “s” equals the distance between their two furthest points
Average Linkage
• The distance between two clusters is defined as the average distance from each point in one cluster to every point in the other
• For example, the distance between clusters “r” and “s” equals the average of all pairwise distances connecting the points of one cluster to the points of the other
Ward’s and Centroid Methods
• Ward’s Method (Minimum Variance)
• The increase in the total within-cluster error sum of squares caused by merging the two clusters is used as the distance
• Centroid Method
• The distance between the centroids (mean points) of the two clusters is used as the distance
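To show how the choice of method changes the measured cluster distances, here is a small sketch using SciPy's linkage function on made-up points; it is illustrative rather than a definitive implementation of the slides' examples:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[2, 3], [3, 5], [2, 4], [7, 6], [8, 7], [9, 6]])  # illustrative

# The same data, clustered under different definitions of inter-cluster distance
for method in ["single", "complete", "average", "ward", "centroid"]:
    Z = linkage(X, method=method)
    # The last row of Z describes the final merge; column 2 is its distance
    print(f"{method:>8}: final merge distance = {Z[-1, 2]:.2f}")
```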
K-means Clustering
• The k-means algorithm is an unsupervised machine learning technique used for
clustering similar data points together. The algorithm partitions a dataset into k
clusters, where each cluster represents a group of data points that are similar to each
other in some way.
• The intuitive working of the k-means algorithm can be summarized in the following
steps:
1. Initialization: Choose the number of clusters (k) and randomly initialize the centroids
of each cluster. A centroid is simply the mean of all the data points in the cluster.
2. Assignment: Assign each data point to the closest centroid based on their distance
to the centroid. The distance can be calculated using a variety of distance measures,
such as Euclidean distance or Manhattan distance.
3. Recalculation: Recalculate the centroids of each cluster based on the mean of all the
data points assigned to it.
4. Repeat: Repeat steps 2 and 3 until the centroids no longer move or a specified
number of iterations is reached.
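A rough from-scratch sketch of these four steps (all names here are my own; in practice a library routine such as scikit-learn's KMeans would normally be used):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # 2. Assignment: each point goes to its nearest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recalculation: new centroid = mean of the points assigned to it
        #    (empty clusters are not handled in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Repeat until the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```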
A Working Example
• To illustrate this algorithm, consider the following example of a
dataset consisting of points in a two-dimensional space:
[(2, 3), (3, 5), (2, 4), (7, 6), (8, 7), (9, 6), (11, 8), (12, 9)]
• Let’s say we want to cluster these points into two groups (k=2). We
can start by randomly initializing the centroids of each cluster, for
example:
• centroid1 = (2, 3)
• centroid2 = (7, 6)
Continued…
• Next, we assign each data point to the closest centroid based on their distance. In this
case, we can calculate the Euclidean distance between each point and each centroid:
• (2, 3) centroid1: 0, centroid2: 5.8
• (3, 5) centroid1: 2.2, centroid2: 4.1
• (2, 4) centroid1: 1, centroid2: 5.4
• (7, 6) centroid1: 5.8, centroid2: 0
• (8, 7) centroid1: 7.2, centroid2: 1.4
• (9, 6) centroid1: 7.6, centroid2: 2
• (11, 8) centroid1: 10.3, centroid2: 4.5
• (12, 9) centroid1: 11.7, centroid2: 5.8
• Based on these distances, we can assign each point to the closest centroid:
• Cluster 1: [(2, 3), (3, 5), (2, 4)]
• Cluster 2: [(7, 6), (8, 7), (9, 6), (11, 8), (12, 9)]
Continued…
• Next, we recalculate the centroids of each cluster based on the mean
of all the data points assigned to it:
• centroid1 = (2.33, 4)
• centroid2 = (9.4, 7.2)
• We then repeat the assignment and recalculation steps until the
centroids no longer move or a specified number of iterations is
reached.
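The first iteration above can be checked with a short NumPy sketch (variable names are illustrative):

```python
import numpy as np

X = np.array([(2, 3), (3, 5), (2, 4), (7, 6), (8, 7), (9, 6), (11, 8), (12, 9)])
centroids = np.array([(2, 3), (7, 6)], dtype=float)   # initial centroid1, centroid2

# Euclidean distance from every point to each centroid
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)          # 0 -> cluster 1, 1 -> cluster 2
print(np.round(dists, 1))              # matches the distance table above
print(labels)                          # [0 0 0 1 1 1 1 1]

# Recompute each centroid as the mean of its assigned points
new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
print(np.round(new_centroids, 2))      # [[2.33 4.  ] [9.4  7.2 ]]
```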
• In k-means clustering, the centroid of a cluster is the mean of all the
data points assigned to that cluster. It is entirely possible that, after
recalculating the centroid, the new centroid does not exist in the
original dataset.
Is this a problem?
• No, this is expected behavior in standard k-means. The centroid does
not need to be an actual data point; it simply represents the average
location of the cluster. The algorithm still works by iterating until the
centroids stabilize.
Why does this happen?
• Centroid is an average: The centroid is computed as the mean of all
points in a cluster. The mean of multiple points may result in a value
that is not present in the dataset.
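A small sketch using the example's first cluster makes this concrete:

```python
import numpy as np

cluster1 = np.array([(2, 3), (3, 5), (2, 4)])
centroid = cluster1.mean(axis=0)                        # approximately (2.33, 4)
print(centroid)
# The computed centroid is not one of the original data points
print(any((centroid == p).all() for p in cluster1))     # False
```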