Chapter 5 Clustering
• Clustering Quality
• Partitioning clustering
• Hierarchical clustering
Clustering
• Clustering is a data mining (machine learning) technique
that finds similarities between data according to the
characteristics found in the data & groups similar data
objects into one cluster
• Given a set of points, with a notion of distance between points, group the points into some number of clusters, so that members of a cluster are in some sense as close to each other as possible.
• While data points in the same cluster are similar, those in separate clusters are dissimilar to one another.
[Figure: scatter plot of points grouped into clusters.]
Example: clustering
• The example below demonstrates the clustering of
padlocks of the same kind. There are a total of 10
padlocks, which vary in color, size, shape, etc.
Cluster Evaluation: Ground Truth
• We use some labeled data (originally intended for classification)
– Assumption: each class corresponds to one cluster.
• After clustering, a confusion matrix is
constructed. From the matrix, we compute
various measures: entropy, purity (see the sketch
below), precision, recall and F-score.
– Let the classes in the data D be C = (c1, c2, …, ck). The
clustering method produces k clusters, which divide D
into k disjoint subsets, D1, D2, …, Dk.
Evaluation of Cluster Quality using Purity
• Purity measures the extent to which each cluster contains objects of a single class. For cluster Di:
  Purity(Di) = (1/|Di|) max_j |Di ∩ cj|
• The overall purity is the average of the cluster purities, weighted by cluster size.
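A minimal sketch of this computation in Python (the function name and the tiny example are illustrative assumptions, not from the slides):

```python
from collections import Counter

def purity(class_labels, cluster_labels):
    """Weighted purity: each cluster contributes the share of its
    most frequent class, weighted by the cluster's size."""
    assert len(class_labels) == len(cluster_labels)
    n = len(class_labels)
    # Group the true class labels by the cluster they were assigned to.
    clusters = {}
    for c, k in zip(class_labels, cluster_labels):
        clusters.setdefault(k, []).append(c)
    # Sum the size of the majority class in every cluster.
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / n

# Illustrative example: 3 classes, 3 clusters.
classes  = ["a", "a", "a", "b", "b", "c", "c", "c"]
clusters = [ 1,   1,   2,   2,   2,   3,   3,   3 ]
print(purity(classes, clusters))  # 0.875
```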
Similarity and Dissimilarity Between Objects
• The Minkowski distance between two data objects is:
  dis(X, Y) = ( Σ_{i=1}^{n} |xi − yi|^q )^{1/q}
  where X = (x1, x2, …, xn) and Y = (y1, y2, …, yn) are two n-dimensional data objects; n is the size of the vector of attributes of the data objects; q = 1, 2, 3, …
• If q = 1, dis is the Manhattan distance:
  dis(X, Y) = Σ_{i=1}^{n} |xi − yi|
Similarity and Dissimilarity Between Objects
• If q = 2, dis is the Euclidean distance:
  dis(X, Y) = sqrt( Σ_{i=1}^{n} (xi − yi)^2 )
• Cosine Similarity
– If X and Y are two vector attributes of data objects, then the cosine similarity measure is given by:
  cos(X, Y) = (X • Y) / (||X|| ||Y||)
– Example: for two document vectors d1 and d2, cos(d1, d2) = 0.94 (a sketch of these measures follows below).
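Below is a minimal sketch of these three measures in plain Python (function names and the sample vectors are illustrative):

```python
import math

def manhattan(x, y):
    # q = 1: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # q = 2: square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    # dot product divided by the product of the vector lengths
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

print(manhattan((2, 10), (5, 8)))                   # 5
print(euclidean((0, 0), (3, 4)))                    # 5.0
print(round(cosine_similarity((1, 2), (2, 4)), 2))  # 1.0
```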
The need for a representative
• Key problem: as you build clusters, how do you
represent the location of each cluster, so that you can
tell which pair of clusters is closest?
• For each cluster assign a centroid (the point closest to all
other points) = the average of its points:
  Cm = ( Σ_{i=1}^{N} t_ip ) / N
  where t_ip denotes the i-th point of cluster Cm and N is the number of points in the cluster.
• Measure inter-cluster distances by the distances between centroids (see the sketch below).
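A minimal sketch of computing a centroid and a centroid-based inter-cluster distance (illustrative helper names, not from the slides):

```python
def manhattan(a, b):
    # Manhattan (q = 1) distance, as defined earlier.
    return sum(abs(x - y) for x, y in zip(a, b))

def centroid(points):
    # Component-wise mean of the cluster's points.
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def inter_cluster_distance(cluster_a, cluster_b, dist=manhattan):
    # Distance between two clusters = distance between their centroids.
    return dist(centroid(cluster_a), centroid(cluster_b))

cluster_a = [(4, 4), (8, 4)]
cluster_b = [(24, 4), (24, 12)]
print(centroid(cluster_a))                           # (6.0, 4.0)
print(inter_cluster_distance(cluster_a, cluster_b))  # 22.0
```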
Major Clustering Approaches
• Partitioning clustering approach:
– Construct various partitions and then evaluate them by
some criterion, e.g., minimizing the sum of squared errors
– Typical methods:
• distance-based: K-means clustering
• model-based: expectation maximization (EM)
clustering.
• Hierarchical clustering approach:
– Create a hierarchical decomposition of the set of data (or
objects) using some criterion
– Typical methods:
• Agglomerative vs. Divisive
• Single link vs. Complete link
Partitioning Algorithms: Basic Concept
• Partitioning method: construct a partition of a database D of
n objects into a set of k clusters such that the sum of squared
distances is minimized (see the SSE sketch below)
• Given k, find a partition of k clusters that optimizes the
chosen partitioning criterion
– Global optimum: exhaustively enumerate all partitions
– Heuristic methods: k-means and k-medoids algorithms
– k-means: each cluster is represented by the center of the cluster
• k is the number of clusters to partition the dataset
• "means" refers to the average location of the members of a
particular cluster
– k-medoids or PAM (Partitioning Around Medoids): each cluster is
represented by one of the objects in the cluster
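A minimal sketch of the sum-of-squared-errors criterion mentioned above (illustrative code; the two small clusters are made-up data):

```python
def sse(clusters):
    """Sum of squared Euclidean distances from each point to its
    cluster's centroid, summed over all clusters."""
    total = 0.0
    for points in clusters:
        n = len(points)
        center = tuple(sum(c) / n for c in zip(*points))
        total += sum(sum((a - b) ** 2 for a, b in zip(p, center))
                     for p in points)
    return total

# Two small illustrative clusters.
print(sse([[(1, 1), (2, 2)], [(8, 8), (9, 9)]]))  # 2.0
```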
The K-Means Clustering Method
• Algorithm: given k, the k-means algorithm is implemented
as follows (a runnable sketch follows below):
• Select k points as the initial centroids (the initial
centroids are selected randomly)
• Repeat
– Partition the objects into k nonempty subsets
– Recompute the centroid of each of the k clusters of the
current partition (the centroid is the center, i.e.,
mean point, of the cluster)
– Assign each object to the cluster with the nearest
seed point
• Until the centroids don't change
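Below is a minimal from-scratch sketch of this loop, written in the usual assign-then-update order (function and parameter names are assumptions, not from the slides); the distance function is passed as a parameter so the Manhattan-distance example that follows can reuse it:

```python
def kmeans(points, initial_centroids, dist, max_iter=100):
    """Plain k-means: assign points to the nearest centroid, recompute
    centroids as coordinate-wise means, stop when centroids are stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: new centroid = mean of the cluster's points.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: centroids unchanged
            break
        centroids = new_centroids
    return centroids, clusters
```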
The K-Means Clustering Method
• Example (k = 2):
[Figure: arbitrarily choose k objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; reassign the objects; repeat updating and reassigning until the means no longer change.]
Example Problem
• Cluster the following eight points (with (x, y)
representing locations) into three clusters: A1(2, 10),
A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4),
A7(1, 2), A8(4, 9).
– Assume that the initial cluster centers are: A1(2, 10),
A4(5, 8) and A7(1, 2).
• The distance function between two points a = (x1, y1)
and b = (x2, y2) is defined as:
dis(a, b) = |x2 – x1| + |y2 – y1|
• Use the k-means algorithm to find the optimal centroids that
group the given data into three clusters (a runnable check follows below).
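As a usage check, the kmeans sketch shown after the algorithm slide can be run on this data with the Manhattan distance and the stated initial centers (the variable names below are illustrative):

```python
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
initial = [(2, 10), (5, 8), (1, 2)]   # A1, A4, A7 as the initial centers

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])

centroids, clusters = kmeans(points, initial, manhattan)
print(centroids)  # approximately [(3.67, 9.0), (7.0, 4.33), (1.5, 3.5)]
print(clusters)   # [[(2,10),(5,8),(4,9)], [(8,4),(7,5),(6,4)], [(2,5),(1,2)]]
```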
Iteration 1
First, we list all points in the first column of the table below. The initial
cluster centers (centroids) are (2, 10), (5, 8) and (1, 2), chosen
randomly.
(2,10) (5, 8) (1, 2)
Point Mean 1 Mean 2 Mean 3 Cluster
A1 (2, 10) 0 5 9 1
A2 (2, 5) 5 6 4 3
A3 (8, 4) 12 7 9 2
A4 (5, 8) 5 0 10 2
A5 (7, 5) 10 5 9 2
A6 (6, 4) 10 5 7 2
A7 (1, 2) 9 10 0 3
A8 (4, 9) 3 2 10 2
Next, we calculate the distance from each point to each of the
three centroids, using the distance function:
dis(point i, mean j) = |x2 – x1| + |y2 – y1|
Iteration 1
• Starting from point A1 calculate the distance to each of the three
means, by using the distance function:
dis (A1, mean1) = |2 – 2| + |10 – 10| = 0 + 0 = 0
dis(A1, mean2) = |5 – 2| + |8 – 10| = 3 + 2 = 5
dis(A1, mean3) = |1 – 2| + |2 – 10| = 1 + 8 = 9
– Fill these values in the table and decide in which cluster the point (2, 10)
should be placed: the one where the point has the shortest distance to
the mean, i.e., mean 1 (cluster 1), since the distance is 0.
• Next go to the second point A2 and calculate the distance:
dis(A2, mean1) = |2 – 2| + |10 – 5| = 0 + 5 = 5
dis(A2, mean2) = |5 – 2| + |8 – 5| = 3 + 3 = 6
dis(A2, mean3) = |1 – 2| + |2 – 5| = 1 + 3 = 4
– So, we fill in these values in the table and assign the point (2, 5) to
cluster 3 since mean 3 is the shortest distance from A2.
• Analogously, we fill in the rest of the table and place each point in
one of the clusters.
Iteration 1
• Next, we need to re-compute the new cluster centers (means). We do
so, by taking the mean of all points in each cluster.
• For Cluster 1, we only have one point A1(2, 10), which was the old
mean, so the cluster center remains the same.
• For Cluster 2, we have five points and need to take the average of
them as the new centroid, i.e.
( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
• For Cluster 3, we have two points. The new centroid is:
( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
• That was Iteration 1 (epoch 1). Next, we go to Iteration 2 (epoch 2),
Iteration 3, and so on, until the centroids do not change anymore.
– In Iteration 2, we basically repeat the process from Iteration 1, this
time using the new means we computed.
Second epoch
• Using the new centroids, we compute the cluster members.
(2,10) (6, 6) (1.5, 3.5)
Point Mean 1 Mean 2 Mean 3 Cluster
A1 (2, 10) 0 8 7 1
A2 (2, 5) 5 5 2 3
A3 (8, 4) 12 4 7 2
A4 (5, 8) 5 3 8 2
A5 (7, 5) 10 2 7 2
A6 (6, 4) 10 2 5 2
A7 (1, 2) 9 9 2 3
A8 (4, 9) 3 5 8 1
• After the 2nd epoch the results would be:
cluster 1: {A1,A8} with new centroid=(3,9.5);
cluster 2: {A3,A4,A5,A6} with new centroid=(6.5,5.25);
cluster 3: {A2,A7} with new centroid=(1.5,3.5)
Third epoch
• Using the new centroids, we compute the cluster members.
            (3, 9.5)  (6.5, 5.25)  (1.5, 3.5)
Point       Mean 1    Mean 2       Mean 3     Cluster
A1 (2, 10)  1.5       9.25         7          1
A2 (2, 5)   5.5       4.75         2          3
A3 (8, 4)   10.5      2.75         7          2
A4 (5, 8)   3.5       4.25         8          1
A5 (7, 5)   8.5       0.75         7          2
A6 (6, 4)   8.5       1.75         5          2
A7 (1, 2)   9.5       8.75         2          3
A8 (4, 9)   1.5       6.25         8          1
• After the 3rd epoch the results would be:
cluster 1: {A1,A4,A8} with new centroid=(3.66,9);
cluster 2: {A3,A5,A6} with new centroid=(7,4.33);
cluster 3: {A2,A7} with new centroid=(1.5,3.5)
Fourth epoch
• Using the new centroids, we compute the cluster members.
            (3.66, 9)  (7, 4.33)  (1.5, 3.5)
Point       Mean 1     Mean 2     Mean 3     Cluster
A1 (2, 10)  2.66       10.67      7          1
A2 (2, 5)   5.66       5.67       2          3
A3 (8, 4)   9.34       1.33       7          2
A4 (5, 8)   2.34       5.67       8          1
A5 (7, 5)   7.34       0.67       7          2
A6 (6, 4)   7.34       1.33       5          2
A7 (1, 2)   9.66       8.33       2          3
A8 (4, 9)   0.34       7.67       8          1
• After the 4th epoch the clusters and centroids are unchanged:
cluster 1: {A1,A4,A8} with centroid=(3.66,9);
cluster 2: {A3,A5,A6} with centroid=(7,4.33);
cluster 3: {A2,A7} with centroid=(1.5,3.5)
Final results
• Finally, in the 4th epoch there is no change in the cluster
memberships or the centroids, so the algorithm stops.
• The result of the clustering is shown in the figure below.
[Figure: scatter plot of the three final clusters.]
Comments on the K-Means Method
• Strength: relatively efficient: O(tkn), where n is the number of objects,
k the number of clusters, and t the number of iterations. Normally, k, t << n.
• Weaknesses
– Applicable only when the mean is defined; what about
categorical data? Use hierarchical clustering.
– Need to specify k, the number of clusters, in advance.
– Unable to handle noisy data and outliers, since an object with
an extremely large value may substantially distort the
distribution of the data.
• K-Medoids: instead of taking the mean value of the objects in a
cluster as a reference point, a medoid can be used, which is the
most centrally located object in a cluster (see the sketch below).
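A minimal sketch of choosing a cluster's medoid, i.e., the object with the smallest total distance to the other objects in the cluster (illustrative code, not from the slides):

```python
def medoid(points, dist):
    # The medoid is the member with the smallest total distance
    # to all other members of the cluster.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])

cluster = [(2, 10), (5, 8), (4, 9)]
print(medoid(cluster, manhattan))  # (4, 9)
```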
Hierarchical Clustering
• As compared to partitioning algorithms, in hierarchical
clustering the data are not partitioned into a particular
set of clusters in a single step; instead of the unstructured
set of clusters returned by partitioning clustering, a
hierarchy of nested clusters is produced.
– It can be visualized as a dendrogram: a tree-like diagram
that records the sequences of merges or splits (a SciPy
sketch follows below).
Dendrogram: Shows How the Clusters are Merged
[Figure: example dendrogram recording the merge order of five points, with the merge distance on the vertical axis.]
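A minimal sketch of producing such a dendrogram with SciPy and Matplotlib (assuming those libraries are available; the five sample points are illustrative):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Five illustrative 2-D points.
points = [(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]

# 'centroid' linkage merges the pair of clusters with the closest centroids.
Z = linkage(points, method="centroid")

dendrogram(Z, labels=["1", "2", "3", "4", "5"])
plt.ylabel("merge distance")
plt.show()
```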
Example of hierarchical clustering
Two main types of hierarchical clustering
• Agglomerative: it is a Bottom Up clustering technique
– Start with all sample units in n clusters of size 1.
– Then, at each step of the algorithm, the pair of clusters with the shortest distance
between them is combined into a single cluster.
– The algorithm stops when all sample units are grouped into one cluster of size n.
• Divisive: it is a Top Down clustering technique
– Start with all sample units in a single cluster of size n.
– Then, at each step of the algorithm, a cluster is partitioned into a pair of
daughter clusters, selected to maximize the distance between the two daughters.
– The algorithm stops when sample units are partitioned into n clusters of size 1.
[Figure: agglomerative clustering merges clusters a, b, c, d, e bottom-up (ab, de, cde, abcde) over steps 0–4, while divisive clustering splits abcde top-down in the reverse order.]
Agglomerative Clustering Algorithm
• The more popular hierarchical clustering technique
• Basic algorithm is straightforward
1. Let each data point be a cluster
2. Compute the proximity matrix
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
• The key operation is the computation of the proximity of two clusters (see the sketch below)
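Below is a minimal from-scratch sketch of centroid-based agglomerative clustering with the Manhattan distance (helper names are assumptions, not from the slides); it repeatedly merges the two clusters whose centroids are closest, as in the example that follows:

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def agglomerative(points, dist=manhattan):
    """Return the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]     # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the closest centroids.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge cluster j into cluster i and drop j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# The five samples from the example that follows.
for a, b, d in agglomerative([(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]):
    print(a, "+", b, "distance", d)
```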
Example
• Perform an agglomerative clustering of five samples using
two features, X and Y. Calculate the Manhattan distance
between each pair of samples to measure their proximity.
Data item X Y
1 4 4
2 8 4
3 15 8
4 24 4
5 24 12
1 2 3 4 5
1 = (4,4) X 4 15 20 28
2= (8,4) X 11 16 24
3=(15,8) X 13 13
4=(24,4) X 8
5=(24,12) X
Proximity Matrix: second epoch
               {1, 2}    3      4      5
{1,2} = (6,4)    X      13     18     26
3 = (15,8)              X      13     13
4 = (24,4)                     X       8
5 = (24,12)                            X
Proximity Matrix: third epoch
               {1, 2}    3    {4, 5}
{1,2} = (6,4)    X      13     22
3 = (15,8)              X       9
{4,5} = (24,8)                  X
Proximity Matrix: fourth epoch
                 {1, 2}   {3, 4, 5}
{1,2} = (6,4)      X         19
{3,4,5} = (21,8)             X
After this merge all samples are in a single cluster, so the algorithm stops.
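For reference, running the agglomerative sketch given earlier on these five samples reproduces this merge sequence: {1, 2} at distance 4, {4, 5} at distance 8, then 3 joins {4, 5} at distance 9, and finally the two remaining clusters merge at distance 19.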
Assignment
• Single link agglomerative clustering (Abush)
• Complete link agglomerative clustering
(Desalegn)
• Average link agglomerative clustering
(Mohammed)
• Divisive clustering (Samuel & Getu)
• Expectation Maximization clustering (Kamal &
Ali)
• K-Medoid clustering (Sofonias & Yohannes)
Exercise: Hierarchical clustering
• Using the centroid method, apply the agglomerative
clustering algorithm to cluster the following 8
examples. Show the dendrogram.
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8),
A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).