3 Clustering

Clustering is an unsupervised machine learning technique that groups similar data points together. It relies on a distance metric between data points to determine clusters. The k-means algorithm partitions observations into k clusters by iteratively updating cluster centroids so as to minimize the within-cluster sum of squares.


Clustering

Clustering
• Unsupervised algorithm
• Groups a number of similar things
• Clustering algorithms rely on a distance metric between data points
• Distance calculation is the most important step

Can you group these items?

Distance Metrics

• Similarity and dissimilarity are measured in terms of distance
• Distance measures should satisfy the conditions below:
  1. d(x, y) \ge 0
  2. d(x, y) = 0 iff x = y
  3. d(x, y) = d(y, x)
  4. d(x, z) \le d(x, y) + d(y, z)

• euclidean distance = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}
• manhattan distance = \sum_{i=1}^{k} |x_i - y_i|

• Cosine distance
• Edit distance
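A minimal NumPy sketch of the two metrics written out above (the function names are my own, for illustration):

```python
# Euclidean and Manhattan distance between two points, as defined on this slide.
import numpy as np

def euclidean_distance(x, y):
    """sqrt(sum_i (x_i - y_i)^2)"""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan_distance(x, y):
    """sum_i |x_i - y_i|"""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y))

print(euclidean_distance([0, 0], [3, 4]))   # 5.0
print(manhattan_distance([0, 0], [3, 4]))   # 7.0
```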
Good Clustering Algorithm
1. The ability to discover some or all of the hidden clusters.
2. Within-cluster similarity and between-cluster dissimilarity.
3. Ability to deal with various types of attributes.
4. Can deal with noise and outliers.
5. Can handle high dimensionality.
6. Scalable, interpretable and usable.

Error Metrics

Sum of squared errors
• Cohesion
  • Intra-cluster distance
  • Sum of squared errors between all the points inside a cluster
  • Minimum is better
• Separation
  • Inter-cluster distance
  • Sum of squared errors between points in different clusters
  • Maximum is better
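A small NumPy sketch of these two quantities. The separation formula used here is the size-weighted between-group sum of squares (distances of cluster centroids to the overall mean), which is one common way to make the slide's description concrete; the helper names are my own.

```python
import numpy as np

def cohesion(X, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    sse = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        sse += np.sum((pts - pts.mean(axis=0)) ** 2)
    return sse

def separation(X, labels):
    """Size-weighted squared distances of cluster centroids to the overall mean."""
    overall = X.mean(axis=0)
    sse = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        sse += len(pts) * np.sum((pts.mean(axis=0) - overall) ** 2)
    return sse

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
print(cohesion(X, labels), separation(X, labels))  # 1.0 50.0 — tight, well-separated clusters
```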
• Cohesion: minimize
• Separation: maximize
• A = mean intra-cluster distance
• B = mean nearest-cluster distance
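These A and B definitions match the silhouette coefficient, s = (B − A) / max(A, B), computed per point (my reading; the slide does not name the metric). A minimal sketch using scikit-learn's silhouette_score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# silhouette_score averages (B - A) / max(A, B) over all points: close to 1 is good.
print(silhouette_score(X, labels))
```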

Between-Cluster Distance Functions
• Single linkage: the distance between two clusters is defined as the shortest distance between two points, one in each cluster.
• Complete linkage: the distance between two clusters is defined as the longest distance between two points, one in each cluster.
• Average linkage: the distance between two clusters is defined as the average distance between each point in one cluster and every point in the other cluster.
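A minimal sketch of the three definitions above, computed from the pairwise point distances between two clusters (the helper name is my own):

```python
import numpy as np
from scipy.spatial.distance import cdist

def linkage_distances(cluster_a, cluster_b):
    d = cdist(cluster_a, cluster_b)    # all pairwise point-to-point distances
    return {"single": d.min(),         # shortest pairwise distance
            "complete": d.max(),       # longest pairwise distance
            "average": d.mean()}       # mean over all pairs

a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[4.0, 0.0], [6.0, 0.0]])
print(linkage_distances(a, b))  # {'single': 3.0, 'complete': 6.0, 'average': 4.5}
```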

Hierarchical Clustering
• Agglomerative clustering (bottom-up)
• Divisive clustering (top-down)

Agglomerative approach (Hierarchical)
[Dendrogram: individual points A–H are merged bottom-up into progressively larger clusters, e.g. {B, C} → {B, C, G} and {A, D, F} → {A, D, F, H}, until a single cluster remains.]
Flow of Agglomerative Method
1. N objects given
2. Assign each point to its own cluster
3. Calculate the distance between these clusters
4. Merge the two clusters with minimum distance
5. Repeat steps 3 and 4 until you reach one big cluster (see the sketch below)
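A minimal SciPy sketch of this flow: linkage performs the repeated merging (steps 3–4), and fcluster cuts the resulting tree into a flat partition.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)

# Start from one cluster per point and repeatedly merge the two closest clusters
# until a single cluster (the full merge tree) remains.
Z = linkage(X, method="average")

# Cut the dendrogram to recover a flat partition, here 2 clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 1 2 2 2]
```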

Divisive approach (Hierarchical)
[Dendrogram read top-down: the full set of points A–H is split into progressively smaller clusters, e.g. {A, D, F, H} → {A, D, F} and {B, C, G} → {B, C}, until each point is its own cluster.]

Flow of Divisive Method
1. N objects given
2. Assign all the points to a single cluster
3. Partition the cluster into two clusters with maximum distance between them (or the two clusters which are least similar)
4. Continue step 3 until you finally reach clusters of individual points (N clusters) (see the sketch below)
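Neither SciPy nor scikit-learn ships a generic divisive routine, so the rough sketch below approximates the flow with repeated 2-means splits (a bisecting-k-means-style stand-in for the slide's "least similar halves" rule, stopping at a chosen number of clusters instead of N singletons; the helper name is my own).

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    clusters = [np.arange(len(X))]                       # step 2: one big cluster
    while len(clusters) < n_clusters:
        # pick the loosest cluster (largest within-cluster SSE) to split next
        sse = [np.sum((X[idx] - X[idx].mean(axis=0)) ** 2) for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        # step 3: partition that cluster into two halves (2-means as a stand-in)
        halves = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters += [idx[halves == 0], idx[halves == 1]]
    labels = np.empty(len(X), dtype=int)
    for lab, idx in enumerate(clusters):
        labels[idx] = lab
    return labels

X = np.array([[0, 0], [0, 1], [8, 8], [8, 9], [20, 20], [20, 21]], dtype=float)
print(divisive(X, 3))   # e.g. [2 2 1 1 0 0] — the three well-separated pairs
```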

K-means Clustering
• Partitions n objects into k clusters
• Each object belongs to the cluster with the nearest mean
• Produces exactly k clusters
• The objective of k-means clustering is to minimize the total intra-cluster variance, i.e., the squared error function
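Written out, that squared-error objective is the standard within-cluster sum of squares (the formula itself is not shown on the slide):

J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2, where \mu_j is the mean (centroid) of cluster C_j.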

Flow of k-Means Clustering Method
1. The objective is to partition n given points into k (non-empty) clusters
2. Randomly select k points as the initial cluster centroids
3. Assign every point to its closest cluster centroid according to the Euclidean distance function
4. Calculate the new centroid, i.e., the mean of all objects in each cluster
5. Repeat steps 3 and 4 until the centroids are fixed and no longer change (the same points are assigned to each cluster in consecutive rounds) (see the sketch below)
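A from-scratch NumPy sketch of these steps, for illustration only (no empty-cluster handling; the function name is my own):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # step 2: random init
    labels = None
    for _ in range(max_iter):
        # step 3: assign every point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                   # step 5: assignments are fixed
        labels = new_labels
        # step 4: recompute each centroid as the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)     # e.g. [0 0 0 1 1 1]
```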

K-means Algorithm
1. Randomly initialize k points (centroids)
2. Assign each object to the cluster of the nearest seed point, measured with a specific distance metric
3. Compute new centroids of the clusters of the current partition (the centroid is the centre, i.e., the mean point, of the cluster)

How many clusters in k-means?

Clustering Demos
• Agglomerative Clustering: https://youtu.be/XJ3194AmH40?t=4m47s
• K-Means Algorithm: https://www.naftaliharris.com/blog/visualizing-k-means-clustering/

