Unit- 4 DMA

The document provides an overview of cluster analysis, detailing its definition, applications, and various methods such as k-means, k-medoids, and hierarchical clustering. It discusses the importance of quality in clustering, the challenges faced in data mining, and the types of data involved in clustering, including interval-scaled, binary, categorical, ordinal, and ratio-scaled variables. Additionally, it addresses distance measures used for calculating dissimilarity between objects and the significance of data standardization.


Cluster Analysis Unit - 4

Syllabus
• Cluster Analysis: Introduction
• Requirements and overview of different categories
• Partitioning method: Introduction
• k-means
• k-medoids
• Hierarchical method: Introduction
• Agglomerative vs. Divisive method
• Distance measures in algorithmic methods
• BIRCH technique
• DBSCAN technique
• STING technique
• CLIQUE technique
• Evaluation of clustering techniques
Session 1
Cluster Analysis: Introduction
Requirements and overview of different categories
• Clustering is the process of grouping a set of data objects into multiple
groups or clusters
• so that objects within a cluster have high similarity, but are very
dissimilar to objects in other clusters.
• Dissimilarities and similarities are assessed based on the attribute
values describing the objects and often involve distance measures.
• Clustering as a data mining tool has its roots in many application
areas such as biology, security, business intelligence, and Web search.
Cluster Analysis
• Cluster: A collection of data objects
• similar (or related) to one another within the same group
• dissimilar (or unrelated) to the objects in other groups
• Cluster analysis (or clustering, data segmentation, …)
• Finding similarities between data according to the characteristics found in the
data and grouping similar data objects into clusters
• Unsupervised learning: no predefined classes (i.e., learning by observations vs.
learning by examples: supervised)
• Typical applications
• As a stand-alone tool to get insight into data distribution
• As a preprocessing step for other algorithms
Applications of Cluster Analysis
• Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
• Information retrieval: document clustering
• Land use: Identification of areas of similar land use in an earth observation database
• Marketing: Help marketers discover distinct groups in their customer bases, and then use this
knowledge to develop targeted marketing programs
• City-planning: Identifying groups of houses according to their house type, value, and
geographical location
• Earthquake studies: observed earthquake epicenters should be clustered along continent faults
• Climate: understanding Earth's climate; finding patterns in atmospheric and ocean data
• Economic Science: market research
• Owing to the huge amounts of data collected in databases, cluster analysis has recently become
a highly active topic in data mining research.
Quality: What Is Good Clustering?

• A good clustering method will produce high quality clusters


• high intra-class similarity: cohesive within clusters
• low inter-class similarity: distinctive between clusters
• The quality of a clustering method depends on
• the similarity measure used by the method
• its implementation, and
• Its ability to discover some or all of the hidden patterns
Measure the Quality of Clustering
• Dissimilarity/Similarity metric
• Similarity is expressed in terms of a distance function, typically metric: d(i, j)
• The definitions of distance functions are usually rather different for interval-
scaled, boolean, categorical, ordinal, ratio-scaled, and vector variables
• Weights should be associated with different variables based on applications
and data semantics
• Quality of clustering:
• There is usually a separate “quality” function that measures the “goodness”
of a cluster.
• It is hard to define “similar enough” or “good enough”
• The answer is typically highly subjective
Requirements and Challenges
(of clustering Data mining)
• Scalability
• Clustering all the data instead of only on samples // Example: Web search scenarios
• Sample bias should not exist
• Ability to deal with different types of attributes
• Numerical, binary, categorical, ordinal, linked, and mixture of these
• complex data types such as graphs, sequences, images, and documents.
• Constraint-based clustering
• User may give inputs on constraints
• Use domain knowledge to determine input parameters//the clustering results may be sensitive to such
parameters
• Quality of clusters
• Interpretability and usability
• Others
• Discovery of clusters with arbitrary shape
• Ability to deal with noisy data // outliers; erroneous, missing, or unknown values
• need clustering methods that are robust to noise.

• Incremental clustering and insensitivity to input order // D will be updated


• High dimensionality
Types of Data in Cluster Analysis
Data matrix (or object-by-variable structure):
This represents n objects, such as persons, with p variables (also
called measurements or attributes), such as age, height, weight, gender,
and so on.
• The structure is in the form of a relational table, or n-by-p matrix (n objects × p variables):

Data matrix (n objects × p variables):

  x_11 ... x_1f ... x_1p
  ...
  x_i1 ... x_if ... x_ip
  ...
  x_n1 ... x_nf ... x_np

Dissimilarity matrix (n-by-n):

  0
  d(2,1)   0
  d(3,1)   d(3,2)   0
  ...
  d(n,1)   d(n,2)   ...   0


Types of Data in Cluster Analysis
Dissimilarity matrix (or object-by-object structure):
This stores a collection of proximities that are available for all pairs of n objects. It is often
represented by an n-by-n table.

• d(i, j) - difference or dissimilarity between objects i and j.

• d(i, j) - nonnegative number that is close to 0 when objects i and j are highly similar or “near” each
other, and becomes larger the more they differ.

• Note that d(i, j) = d(j, i) and d(i, i) = 0.

• Data matrix: a two-mode matrix (its rows and columns represent different entities).

• Dissimilarity matrix: a one-mode matrix (its rows and columns represent the same entity).
Types of Data in Cluster
Analysis
1. Interval-Scaled Variables
• Interval-scaled variables are continuous measurements of a roughly linear scale. Eg: weight
and height

• The measurement unit used can affect the clustering analysis.

• For example, changing measurement units from meters to inches for height, or from kilograms
to pounds for weight, may lead to a very different clustering structure.

• To avoid dependence on the choice of measurement units, the data should be standardized.
• Standardizing measurements attempts to give all variables an equal weight.
• Data Transformation by Normalization
• The measurement unit used can affect the data analysis.
To help avoid dependence on the choice of measurement units, the
data should be normalized or standardized. This involves transforming the data to fall
within a smaller or common range
How can the data for a variable be
standardized?
• To standardize measurements, one choice is to convert the original measurements to unit-less variables:
  • Compute the mean absolute deviation of variable f: s_f = (1/n)( |x_1f − m_f| + |x_2f − m_f| + ... + |x_nf − m_f| ), where m_f is the mean of variable f.
  • Compute the standardized measurement (z-score): z_if = (x_if − m_f) / s_f.
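A minimal Python/NumPy sketch of this standardization; the column values (height in m, weight in kg) are made up for illustration:

```python
import numpy as np

def standardize(X):
    """Standardize each column using the mean absolute deviation,
    which is less sensitive to outliers than the standard deviation."""
    m = X.mean(axis=0)                  # per-variable mean m_f
    s = np.abs(X - m).mean(axis=0)      # mean absolute deviation s_f
    return (X - m) / s                  # z-score z_if = (x_if - m_f) / s_f

X = np.array([[1.60, 52.0],
              [1.75, 80.0],
              [1.82, 95.0]])
print(standardize(X))
```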
Distance Measures
• After standardization, or without standardization in certain applications, the dissimilarity (or similarity) between the objects described by interval-scaled variables is typically computed based on the distance between each pair of objects. The most popular distance measure is Euclidean distance, defined as

d(i, j) = sqrt( (x_i1 − x_j1)^2 + (x_i2 − x_j2)^2 + ... + (x_in − x_jn)^2 )

• Another well-known metric is Manhattan (or city block) distance, defined as

d(i, j) = |x_i1 − x_j1| + |x_i2 − x_j2| + ... + |x_in − x_jn|
Distance Measure
• Both the Euclidean distance and the Manhattan distance satisfy the following mathematical requirements of a distance function:
  • d(i, j) ≥ 0: distance is a non-negative number
  • d(i, i) = 0: the distance of an object to itself is 0
  • d(i, j) = d(j, i): distance is symmetric
  • d(i, j) ≤ d(i, k) + d(k, j): the triangle inequality
Distance Measure
• Minkowski distance is a generalization of both Euclidean distance and Manhattan distance. It is defined as

d(i, j) = ( |x_i1 − x_j1|^p + |x_i2 − x_j2|^p + ... + |x_in − x_jn|^p )^(1/p)

• p is a positive integer. Such a distance is also called the Lp norm.
• It represents the Manhattan distance when p = 1 (i.e., the L1 norm)
• and the Euclidean distance when p = 2 (i.e., the L2 norm).
• If each variable is assigned a weight according to its perceived importance, the weighted Euclidean distance can be computed as

d(i, j) = sqrt( w_1 (x_i1 − x_j1)^2 + w_2 (x_i2 − x_j2)^2 + ... + w_n (x_in − x_jn)^2 )
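A small illustrative implementation of these measures (a NumPy sketch; the two sample vectors and weights are arbitrary):

```python
import numpy as np

def minkowski(x, y, p=2, w=None):
    """Minkowski distance of order p; p=1 gives Manhattan, p=2 Euclidean.
    Optional weights w give the weighted variant."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    return (w * np.abs(x - y) ** p).sum() ** (1.0 / p)

x, y = [1, 2, 3], [4, 0, 3]
print(minkowski(x, y, p=1))               # Manhattan: 5.0
print(minkowski(x, y, p=2))               # Euclidean: ~3.606
print(minkowski(x, y, p=2, w=[2, 1, 1]))  # weighted Euclidean
```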
2. Binary Variables
• How to compute the dissimilarity between objects described by either symmetric or asymmetric binary variables?

• A binary variable has only two states: 0 or 1

• 0 – absent & 1 – present

• Treating binary variables as if they were interval-scaled can lead to misleading clustering results.

• If all binary variables are thought of as having the same weight, we have the 2-by-2 contingency table:

                 object j
                   1       0      sum
  object i   1     q       r      q + r
             0     s       t      s + t
           sum   q + s   r + t      p

where q is the number of variables that equal 1 for both objects, r the number that are 1 for object i and 0 for object j, s the number that are 0 for object i and 1 for object j, and t the number that are 0 for both.
Binary Variables
• The total number of variables is p, where p = q+r+s+t.

• A binary variable is symmetric if both of its states are equally valuable and carry the same weight.

• There is no preference on which outcome should be coded as 0 or 1.

• Dissimilarity based on symmetric binary variables is called symmetric binary dissimilarity: d(i, j) = (r + s) / (q + r + s + t).

• A binary variable is asymmetric if the outcomes of the states are not equally important, such as
the positive and negative outcomes of a disease test.
• A binary variable contains two possible outcomes: 1 (positive/present) or 0
(negative/absent). If there is no preference for which outcome should be coded as 0
and which as 1, the binary variable is called symmetric.
• For example, the binary variable "is evergreen?" for a plant has the possible states
"loses leaves in winter" and "does not lose leaves in winter." Both are equally
valuable and carry the same weight when a proximity measure is computed.
• If the outcomes of a binary variable are not equally important, the binary variable is
called asymmetric.
• An example of such a variable is the presence or absence of a relatively rare
attribute, such as "is color-blind" for a human being.
• While you say that two people who are color-blind have something in common, you
cannot say that people who are not color-blind have something in common.
Jaccard Coefficient
• For asymmetric binary variables, the number of negative matches, t, is considered unimportant and thus is ignored in the computation:

d(i, j) = (r + s) / (q + r + s)

• Complementarily, we can measure the distance between two binary variables based on the notion of similarity instead of dissimilarity:

sim(i, j) = q / (q + r + s) = 1 − d(i, j)

• The coefficient sim(i, j) is called the Jaccard coefficient.


Dissimilarity between Binary
Variables
• Example
Name Gender Fever Cough Test-1 Test-2 Test-3 Test-4
Jack M Y N P N N N
Mary F Y N P N P N
Jim M Y P N N N N

• gender is a symmetric attribute


• the remaining attributes are asymmetric binary
• let the values Y and P be set to 1, and the value N be set to 0
0 1
d ( jack , mary )  0.33
2  0 1
11
d ( jack , jim )  0.67
111
1 2
d ( jim , mary )  0.75
11 2
Han: Clustering 26
Object1,object 2=(1,1) or (p,p)=a
Object1,object 2=(1,0) or (p,n) =b
Object1,object 2=(0,1) or (n,p)= c
Object1,object 2=(0,0) or (n,n)=d
a= ? b=? c=?

0 1
d ( jack , mary )  0.33
2  0 1
11
d ( jack , jim )  0.67
111
1 2
d ( jim , mary )  0.75
11 2
How can we compute the dissimilarity between
objects described by categorical, ordinal, and
ratio-scaled variables?
Categorical, Ordinal, and Ratio-Scaled
Variables
• A categorical variable is a generalization of the binary variable in that it can take on more than
two states.
• For example, map color is a categorical variable that may have, say, five states: red, yellow,
green, pink, and blue.
• Let the number of states of a categorical variable be M. The states can be denoted by letters,
symbols, or a set of integers, such as 1, 2, : : : , M.
• The dissimilarity between two objects i and j can be computed based on the ratio of mismatches (Eq. 7.3):

d(i, j) = (p − m) / p

• where m is the number of matches (i.e., the number of variables for which i and j are in the same state), and p is the total number of variables.
Suppose that we have the sample data of Table 7.3, except that only the object-identifier and
the variable (or attribute) test-1 are available, where test-1 is categorical. Let's compute the
dissimilarity matrix
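A hedged Python sketch of this computation; since Table 7.3 is not reproduced here, the four test-1 values below (code-A, code-B, code-C, code-A) are assumed for illustration:

```python
import numpy as np

def categorical_dissim_matrix(values):
    """d(i, j) = (p - m) / p with a single categorical variable (p = 1),
    so d is simply 0 on a match and 1 on a mismatch."""
    n = len(values)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = 0.0 if values[i] == values[j] else 1.0
    return D

test1 = ["code-A", "code-B", "code-C", "code-A"]   # assumed attribute values
print(categorical_dissim_matrix(test1))
```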
3. Ordinal Variables
• An ordinal variable can be discrete or continuous. (we need to convert
ordinal into ratio scale)
• Order is important, e.g. rank (junior, senior)
• Can be treated like interval-scaled
• Replace each ordinal variable value by its rank r_if ∈ {1, ..., M_f}
• The distance can then be calculated by treating the ordinal variable as quantitative
• Normalized rank: map the range of each variable onto [0.0, 1.0] by replacing the value of the i-th object in the f-th variable by

z_if = (r_if − 1) / (M_f − 1)

• Compute the dissimilarity using methods for interval-scaled variables.


Dissimilarity between ordinal variables
• Suppose that we have the sample data of Table 7.3, except that this time only the object-identifier and the continuous ordinal variable, test-2, are available.
• There are three states for test-2, namely fair, good, and excellent; that is, M_f = 3.
• Step 1: if we replace each value for test-2 by its rank, the four objects are assigned the ranks 3, 1, 2, and 3, respectively.
• Step 2: normalize the ranking by mapping rank 1 to 0.0, rank 2 to 0.5, and rank 3 to 1.0.
• Step 3: use, say, the Euclidean distance (Equation 7.5) on the normalized values, which results in the dissimilarity matrix.
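A brief Python sketch of these three steps (the four test-2 values are taken from the example above):

```python
import numpy as np

ranks = {"fair": 1, "good": 2, "excellent": 3}      # M_f = 3 ordered states
test2 = ["excellent", "fair", "good", "excellent"]  # ranks 3, 1, 2, 3

# Steps 1-2: rank, then normalize onto [0, 1] via z = (r - 1) / (M_f - 1)
M = len(ranks)
z = np.array([(ranks[v] - 1) / (M - 1) for v in test2])  # 1.0, 0.0, 0.5, 1.0

# Step 3: Euclidean distance between the normalized values
D = np.abs(z[:, None] - z[None, :])   # 1-D case: |z_i - z_j|
print(D)
```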

Ratio-Scaled Variables

• A ratio-scaled variable makes a positive measurement on a nonlinear scale, such as an exponential scale, approximately following the formula

A e^{Bt}  or  A e^{−Bt}

• where A and B are positive constants, and t typically represents time.
Examples: the growth of a bacteria population or the decay of a radioactive element.

• There are three methods to handle ratio-scaled variables for computing the dissimilarity between objects.
Ratio-Scaled Variables
• Treat ratio-scaled variables like interval-scaled variables.

• Apply a logarithmic transformation to a ratio-scaled variable f having value x_if for object i, using the formula y_if = log(x_if). The y_if values can then be treated as interval-valued.

• Treat x_if as continuous ordinal data and treat the ranks as interval-valued.

• The latter two methods are the most effective, although the choice of method used may depend on the given application.
Dissimilarity between ratio-scaled variables

• Suppose that only the ratio-scaled variable, test-3, is available. Let's try a logarithmic transformation.
• Taking the log of test-3 results in the values 2.65, 1.34, 2.21, and 3.08
for the objects 1 to 4, respectively.

4. Variables of Mixed Types
• how can we compute the dissimilarity between objects of mixed
variable types?”
• One approach is to group each kind of variable together, performing a
separate cluster analysis for each variable type.
• A more preferable approach is to process all variable types together,
performing a single cluster analysis.
• Suppose that the data set contains p variables of mixed type. The dissimilarity d(i, j) between objects i and j is defined as

d(i, j) = ( Σ_{f=1}^{p} δ_ij^(f) d_ij^(f) ) / ( Σ_{f=1}^{p} δ_ij^(f) )

where the indicator δ_ij^(f) = 0 if x_if or x_jf is missing, or if x_if = x_jf = 0 and variable f is asymmetric binary; otherwise δ_ij^(f) = 1. The contribution d_ij^(f) of variable f is computed according to its type (interval-scaled, binary/categorical, or ordinal).

Variables of Mixed Types
• Apply a logarithmic transformation to the values of test-3. Based on the transformed values of 2.65, 1.34, 2.21, and 3.08 obtained for the objects 1 to 4:
• max_h x_h = 3.08 and min_h x_h = 1.34.
• Then normalize the values in the dissimilarity matrix obtained in Example 7.5 by dividing each one by (3.08 − 1.34) = 1.74.

• We can now use the dissimilarity matrices for the three variables in
our computation.

Vector Objects
• There are several ways to define a similarity function, s(x, y), to compare two vectors x and y.
• One popular way is to define the similarity function as a cosine measure:

s(x, y) = (x^t · y) / ( ||x|| ||y|| )

• where x^t is the transpose of vector x, ||x|| is the Euclidean norm of vector x, ||y|| is the Euclidean norm of vector y, and s is essentially the cosine of the angle between vectors x and y.
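A short NumPy sketch of the cosine measure (the two example count vectors are for illustration only):

```python
import numpy as np

def cosine_sim(x, y):
    """Cosine of the angle between vectors x and y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

print(cosine_sim([5, 0, 3, 0, 2, 0, 0, 2, 0, 0],
                 [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]))  # ~0.94
```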

Session 2
Partitioning Method: Introduction
K-Means Algorithm
Partitioning Algorithms: Basic
Concept
• Partitioning method: Partitioning a database D of n objects into a set of k clusters, such that the sum of squared distances is minimized (where c_i is the centroid or medoid of cluster C_i):

E = Σ_{i=1}^{k} Σ_{p ∈ C_i} (p − c_i)^2

• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
• Global optimal: exhaustively enumerate all partitions
• Heuristic methods: k-means and k-medoids algorithms
• k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented by the center of the
cluster
• k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87): Each cluster is
represented by one of the objects in the cluster
The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in four steps:
• Partition objects into k nonempty subsets
• Compute seed points as the centroids of the clusters of the current
partitioning (the centroid is the center, i.e., mean point, of the cluster)
• Assign each object to the cluster with the nearest seed point
• Go back to Step 2, stop when the assignment does not change
K-Means Clustering-

K-Means clustering is an unsupervised iterative clustering technique.


It partitions the given data set into k predefined distinct clusters.
A cluster is defined as a collection of data points exhibiting certain similarities.

Partitions each dataset such that


• Each data point belongs to a cluster with the nearest mean.
• Data points belonging to one cluster have high degree of similarity.
• Data points belonging to different clusters have high degree of dissimilarity.
Step-01:

Choose the number of clusters K.


Step-02:
Randomly select any K data points as cluster centers.
•Select cluster centers in such a way that they are as far apart from each other as possible.

Step-03:
Calculate the distance between each data point and each cluster center.
•The distance may be calculated either by using given distance function or by using euclidean distance formula.

Step-04:
Assign each data point to some cluster.
•A data point is assigned to that cluster whose center is nearest to that data point.

Step-05:
Re-compute the center of newly formed clusters.
•The center of a cluster is computed by taking mean of all the data points contained in that cluster.

Step-06:
Keep repeating the procedure from Step-03 to Step-05 until any of the following stopping criteria is met-
•Center of newly formed clusters do not change
•Data points remain present in the same cluster
•Maximum number of iterations are reached
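The loop described in Steps 01–06 can be sketched in a few lines of NumPy (a minimal illustration, not a production implementation; it uses random initialization and Euclidean distance):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means: random initial centers, nearest-center assignment, mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]        # Step-02
    for _ in range(max_iter):                                      # Step-06
        # Steps 03-04: assign every point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step-05: recompute each center as the mean of its points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                      # stopping criterion
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans(np.random.default_rng(1).random((200, 2)), k=3)
```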
• K-Means Clustering Algorithm offers the following advantages-

• Point-01:
• It is relatively efficient with time complexity O(nkt) where-
• n = number of instances
• k = number of clusters
• t = number of iterations
• Point-02:

• It often terminates at local optimum.


• Techniques such as Simulated Annealing or Genetic Algorithms may be used to find the global optimum.

• Disadvantages-

• K-Means Clustering Algorithm has the following disadvantages-


• It requires the number of clusters (k) to be specified in advance.
• It cannot handle noisy data and outliers well.
• It is not suitable for identifying clusters with non-convex shapes.
An Example of K-Means Clustering (K = 2)

[Figure: starting from the initial data set, the objects are arbitrarily partitioned into k groups, the cluster centroids are updated, objects are reassigned to the nearest centroid, and the loop repeats if needed.]

• Partition objects into k nonempty subsets
• Repeat
  • Compute the centroid (i.e., mean point) of each partition
  • Assign each object to the cluster of its nearest centroid
• Until no change
Comments on the K-Means Method
• Strength: Efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n.
• Comparing: PAM: O(k(n − k)^2), CLARA: O(ks^2 + k(n − k)) // PAM: Partitioning Around Medoids,
• CLARA: Clustering LARge Applications
• Comment: Often terminates at a local optimal.
• Weakness
• Applicable only to objects in a continuous n-dimensional space
• Using the k-modes method for categorical data
• In comparison, k-medoids can be applied to a wide range of data
• Need to specify k, the number of clusters, in advance (there are ways to automatically determine the best k
(see Hastie et al., 2009)
• Sensitive to noisy data and outliers
• Not suitable to discover clusters with non-convex shapes
What Is the Problem of the K-Means
Method?
• The k-means algorithm is sensitive to outliers !

• Since an object with an extremely large value may substantially distort the distribution of the
data

• K-Medoids: Instead of taking the mean value of the object in a cluster as a reference point,
medoids can be used, which is the most centrally located object in a cluster

• Another variant to k-means is the k-modes method, which extends the k-means
paradigm to cluster categorical data by replacing the means of clusters with
modes, using new dissimilarity measures to deal with categorical objects and a
frequency-based method to update modes of clusters. The k-means and the k-
modes methods can be integrated to cluster data with mixed numeric and
categorical values.
• The EM (Expectation-Maximization) algorithm extends the k-means paradigm in
a different way. Whereas the k-means algorithm assigns each object to a cluster,
• In EM,each object is assigned to each cluster according to a weight representing
its probability of membership.
• In other words, there are no strict boundaries between clusters. Therefore, new
means are computed based on weighted measures.
How can we make the k-means algorithm more scalable?"
A recent approach to scaling the k-means algorithm is based on the idea of identifying three
kinds of regions in data:
1. regions that are compressible,
2. regions that must be maintained in main memory,
3. and regions that are discardable.
An object is discardable if its membership in a cluster is ascertained.
An object is compressible if it is not discardable but belongs to a tight subcluster.
A data structure known as a clustering feature is used to summarize objects that have
been discarded or compressed.
If an object is neither discardable nor compressible, then it should be retained in main memory.
To achieve scalability,
The iterative clustering algorithm only includes the clustering features of the compressible objects and the objects that
must be retained in main memory,
• thereby turning a secondary-memory-based algorithm into a main-memory- based algorithm.
• An alternative approach to scaling the k-means algorithm explores the microclustering idea, which first groups nearby objects into "microclusters" and then performs k-means clustering on the microclusters.
Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)

Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as-
Ρ(a, b) = |x2 – x1| + |y2 – y1|

Use K-Means Algorithm to find the three cluster centers after the second iteration.
Iteration-01: for example,

Calculating distance between A1(2, 10) and C2(5, 8): Ρ = |5 − 2| + |8 − 10| = 5

Calculating distance between A1(2, 10) and C3(1, 2): Ρ = |1 − 2| + |2 − 10| = 9

A1 is nearest to C1(2, 10) (distance 0), so it is assigned to Cluster-01.
(Full worked example: https://www.gatevidyalay.com/k-means-clustering-algorithm-example)
• Calculate center of the clusters
• For Cluster-01:
• We have only one point A1(2, 10) in Cluster-01.
So, cluster center remains the same.
• For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
• For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
• This is completion of Iteration-01.
• Calculating distance between A1(2, 10) and C3(1.5, 3.5): Ρ = |1.5 − 2| + |3.5 − 10| = 7

In Iteration-02 we again calculate the distance of each point from each of the three cluster centers, using the given distance function.
• Calculate new cluster centers
• After second iteration, the center of the three clusters are-
• C1(3, 9.5)
• C2(6.5, 5.25)
• C3(1.5, 3.5)
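A compact NumPy sketch that reproduces this worked example (Manhattan distance, the given initial centers, two iterations):

```python
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)   # A1..A8
centers = np.array([[2, 10], [5, 8], [1, 2]], dtype=float)         # A1, A4, A7

for it in range(2):
    # Manhattan distance of every point to every center
    d = np.abs(points[:, None, :] - centers[None, :, :]).sum(axis=2)
    labels = d.argmin(axis=1)
    # new center = mean of the points assigned to each cluster
    centers = np.array([points[labels == j].mean(axis=0) for j in range(3)])
    print(f"after iteration {it + 1}:", centers)

# after iteration 2: C1 = (3, 9.5), C2 = (6.5, 5.25), C3 = (1.5, 3.5)
```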
Practice Problem
Session 3
K-Medoids
Hierarchical Method: Introduction
Quality of clustering: measured by the variation within clusters, i.e., the within-cluster sum of squared errors

E = Σ_{i=1}^{k} Σ_{p ∈ C_i} dist(p, c_i)^2
“How can we modify the k-means algorithm to diminish such sensitivity to
outliers?”
• Instead of taking the mean value of the objects in a cluster as a reference point,
• pick actual objects to represent the clusters, using one representative object
per cluster.
• Each remaining object is assigned to the cluster of which the representative
object is the most similar.
• The partitioning method is then performed based on the principle of minimizing the sum of the dissimilarities between each object p and its corresponding representative object.
• That is, an absolute-error criterion is used, defined as

E = Σ_{j=1}^{k} Σ_{p ∈ C_j} dist(p, o_j)

where o_j is the representative object (medoid) of cluster C_j.
The K-Medoid Clustering Method
• K-Medoids Clustering: Find representative objects (medoids) in clusters

• PAM (Partitioning Around Medoids, Kaufmann & Rousseeuw 1987)

• Starts from an initial set of medoids and iteratively replaces one of the medoids by one of
the non-medoids if it improves the total distance of the resulting clustering
• PAM works effectively for small data sets, but does not scale well for large data sets (due to
the computational complexity)

• Efficiency improvement on PAM

• CLARA (Kaufmann & Rousseeuw, 1990): PAM on samples

• CLARANS (Ng & Han, 1994): Randomized re-sampling


• The Partitioning Around Medoids (PAM) algorithm is a popular realization of k-medoids
clustering.
• It tackles the problem in an iterative, greedy
way.
• Like the k-means algorithm, the initial representative objects (called seeds) are chosen
arbitrarily.
• We consider whether replacing a representative object by a nonrepresentative
object would improve the clustering quality.
• All the possible replacements are tried out. The iterative process of replacing
representative objects by other objects
• continues until the quality of the resulting clustering cannot be improved by any
replacement.
• Generalization of k-means
• But more robust to noise than k-means
• This quality is measured by a cost function of the average dissimilarity between
an object and the representative object of its cluster
1. Specifically, let {o_1, ..., o_k} be the current set of representative objects (i.e., medoids).
2. To determine whether a nonrepresentative object, denoted by o_random, is a good replacement for a current medoid o_j:
3. Calculate the distance from every object p to the closest object in the set {o_1, ..., o_{j−1}, o_random, o_{j+1}, ..., o_k}, and use the distance to update the cost function.
• The reassignment of objects to the new medoids is then straightforward.
PAM: A Typical K-Medoids Algorithm
[Figure: PAM with K = 2. Arbitrarily choose k objects as the initial medoids and assign each remaining object to its nearest medoid (total cost = 20). Then, in a loop: randomly select a nonmedoid object O_random, compute the total cost of swapping a medoid O with O_random (e.g., total cost = 26), and perform the swap only if it improves the quality. Repeat until no change.]
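A minimal NumPy sketch of the PAM idea (absolute-error cost plus a greedy medoid/non-medoid swap loop); it is illustrative only and scans all possible swaps in each pass:

```python
import numpy as np

def pam(X, k, max_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)   # pairwise Manhattan distances
    medoids = list(rng.choice(len(X), size=k, replace=False))

    def cost(meds):
        # absolute-error criterion: each object contributes its distance
        # to the closest medoid
        return D[:, meds].min(axis=1).sum()

    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for j in range(k):                       # try replacing each medoid...
            for o in range(len(X)):              # ...with each non-medoid object
                if o in medoids:
                    continue
                trial = medoids[:j] + [o] + medoids[j + 1:]
                c = cost(trial)
                if c < best:                     # keep the swap only if cost drops
                    best, medoids, improved = c, trial, True
        if not improved:
            break
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels, best

medoids, labels, total_cost = pam(np.random.default_rng(2).random((60, 2)), k=3)
```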
• More robust to noise and outliers than k-means, but each iteration is more expensive (time complexity of O(k(n − k)^2) per iteration for PAM).
• "How can we scale up the k-medoids method?"
• CLARA: run PAM on random samples of the data instead of on the whole data set.
• In some situations we may want to partition our data into groups at
different levels such as in a hierarchy.
• A hierarchical clustering method works by grouping data objects into
a hierarchy or “tree” of clusters.
• Representing data objects in the form of a hierarchy is useful for data
summarization and visualization.
• Handwriting recognition, hierarchy of species(animals,birds
etc),employee,gaming (chess).
• Agglomerative versus divisive hierarchical clustering,
• which organize objects into a hierarchy using a bottom-up or top-
down strategy, respectively.
• Agglomerative methods start with individual objects as clusters,
which are iteratively merged to form larger clusters.
• Conversely, divisive methods initially let all the given objects form one cluster, which they iteratively split into smaller clusters.
• Hierarchical clustering methods can encounter difficulties regarding
the selection of merge or split points. Such a decision is critical,
• merge or split decisions, if not well chosen, may lead to low-quality
clusters.
Moreover, the methods do not scale well because each decision of
merge or split needs to examine and evaluate many objects or clusters.
Solution: can be combined with Multiphase clustering
Hierarchical Clustering

• Use distance matrix as clustering criteria. This method does


not require the number of clusters k as an input, but needs a
termination condition

Step 0 → Step 4 (agglomerative, AGNES – AGglomerative NESting):
  {a}, {b}, {c}, {d}, {e} → {a, b} → {d, e} → {c, d, e} → {a, b, c, d, e}
Step 4 → Step 0 (divisive, DIANA – DIvisive ANAlysis): the same hierarchy traversed in reverse, starting from one all-inclusive cluster and splitting down to singletons.
Session 4
Agglomerative vs. Divisive method
Distance measures in Algorithmic methods
BIRCH technique

DBSCAN technique

STING technique

CLIQUE technique

Evaluation of clustering techniques


AGNES (Agglomerative Nesting)

• Introduced in Kaufmann and Rousseeuw (1990)


• Implemented in statistical packages, e.g., Splus
• Use the single-link method and the dissimilarity matrix
• Merge nodes that have the least dissimilarity
• Go on in a non-descending fashion
• Eventually all nodes belong to the same cluster

• Decompose data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.

• A clustering of the data objects is obtained by cutting the dendrogram at


the desired level, then each connected component forms a cluster.
• A dendrogram is a diagram that shows the hierarchical relationship
between objects. It is most commonly created as an output from
hierarchical clustering. The main use of a dendrogram is to work out
the best way to allocate objects to clusters
Dendrogram: Shows How Clusters are
Merged

DIANA (Divisive Analysis)

• Introduced in Kaufmann and Rousseeuw (1990)


• Implemented in statistical analysis packages, e.g., Splus
• Inverse order of AGNES
• Eventually each node forms a cluster on its own

Distance Between Clusters
• Single Link: smallest distance between points
• Complete Link: largest distance between points
• Average Link: average distance between points
• Centroid: distance between centroids
Distance between Clusters

• Single link: smallest distance between an element in one cluster and an element in the other, i.e., dist(K_i, K_j) = min(t_ip, t_jq) // used when updating the distance matrix

• Complete link: largest distance between an element in one cluster and an element in the other, i.e., dist(K_i, K_j) = max(t_ip, t_jq)

• Average: average distance between an element in one cluster and an element in the other, i.e., dist(K_i, K_j) = avg(t_ip, t_jq)

• Centroid: distance between the centroids of two clusters, i.e., dist(K_i, K_j) = dist(C_i, C_j)

• Medoid: distance between the medoids of two clusters, i.e., dist(K_i, K_j) = dist(M_i, M_j)
  • Medoid: a chosen, centrally located object in the cluster
Centroid, Radius and Diameter of a Cluster (for numerical data sets)
• Centroid: the "middle" of a cluster:

C_m = ( Σ_{i=1}^{N} t_i ) / N

• Radius: square root of the average distance from any point of the cluster to its centroid:

R_m = sqrt( Σ_{i=1}^{N} (t_i − c_m)^2 / N )

• Diameter: square root of the average mean squared distance between all pairs of points in the cluster:

D_m = sqrt( Σ_{i=1}^{N} Σ_{j=1}^{N} (t_i − t_j)^2 / ( N (N − 1) ) )
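These three statistics are easy to compute directly; a small NumPy sketch (the sample points are arbitrary):

```python
import numpy as np

def cluster_stats(T):
    """Centroid, radius and diameter of a cluster given as an (N, d) array."""
    N = len(T)
    centroid = T.mean(axis=0)
    radius = np.sqrt(((T - centroid) ** 2).sum(axis=1).mean())
    pair_sq = ((T[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)   # squared pair distances
    diameter = np.sqrt(pair_sq.sum() / (N * (N - 1)))
    return centroid, radius, diameter

T = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8]], dtype=float)
print(cluster_stats(T))
```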
Hierarchical Clustering
• Clusters are created in levels actually creating sets of clusters at each
level.
• Agglomerative
• Initially each item in its own cluster
• Iteratively clusters are merged together
• Bottom Up
• Divisive
• Initially all items in one cluster
• Large clusters are successively divided
• Top Down
Hierarchical Algorithms
• Single Link
• MST Single Link
• Complete Link
• Average Link
Dendrogram
• Dendrogram: a tree data
structure which illustrates
hierarchical clustering
techniques.
• Each level shows clusters for that
level.
• Leaf – individual clusters
• Root – one cluster
• A cluster at level i is the union of
its children clusters at level i+1.
Levels of Clustering
Agglomerative Example

Distance matrix for five items A–E:

      A  B  C  D  E
  A   0  1  2  2  3
  B   1  0  2  4  3
  C   2  2  0  1  5
  D   2  4  1  0  3
  E   3  3  5  3  0

[Figure: the corresponding graph over items A–E and the dendrogram obtained by merging clusters at increasing distance thresholds 1, 2, 3, 4, 5.]
Problem: For the one-dimensional data set {7, 10, 20, 28, 35}, perform hierarchical clustering and plot the dendrogram to visualize it.

Solution: First, let's visualize the data on a number line.

Observing the data, we can intuitively conclude that:
1.The first two points (7 and 10) are close to each other and should be
in the same cluster
2.Also, the last two points (28 and 35) are close to each other and
should be in the same cluster
3.Cluster of the center point (20) is not easy to conclude
Let’s solve the problem by hand using both the types of agglomerative
hierarchical clustering :
• Single Linkage : In single link hierarchical clustering, we merge in each
step the two clusters, whose two closest members have the smallest
distance.

Using single linkage two clusters are formed :


Cluster 1 : (7,10)
Cluster 2 : (20,28,35)
Complete Linkage : In complete link hierarchical
clustering, we merge in the members of the clusters in each
step, which provide the smallest maximum pairwise distance.

Using complete linkage two clusters are


formed :
Cluster 1 : (7,10,20)
Cluster 2 : (28,35)
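The same result can be checked with SciPy's hierarchical-clustering utilities (a short sketch; scipy assumed to be installed, and dendrogram plotting additionally needs matplotlib):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

X = np.array([[7], [10], [20], [28], [35]])   # 1-D points as a column vector

for method in ("single", "complete"):
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into two clusters
    print(method, labels)
    # single   -> {7, 10} and {20, 28, 35}
    # complete -> {7, 10, 20} and {28, 35}

# dendrogram(Z) draws the merge tree when a matplotlib figure is available
```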
• https://online.stat.psu.edu/stat555/node/86
Agglomerative Algorithm
Single Link
• View all items with links (distances) between them.
• Finds maximal connected components in this graph.
• Two clusters are merged if there is at least one edge which
connects them.
• Uses threshold distances at each level.
• Could be agglomerative or divisive.
MST Single Link Algorithm
Single Link Clustering
Extensions to Hierarchical Clustering

• Major weakness of agglomerative clustering methods

• Can never undo what was done previously

• Do not scale well: time complexity of at least O(n^2), where n is the total number of objects

• Integration of hierarchical & distance-based clustering

• BIRCH (1996): uses CF-tree and incrementally adjusts the


quality of sub-clusters
• CHAMELEON (1999): hierarchical clustering using dynamic
modeling
Session 5
BIRCH

Multiphase Hierarchical Clustering


Using Clustering Feature Trees
Balanced Iterative Reducing and Clustering using
Hierarchies (BIRCH) is designed for

• clustering a large amount of numeric data by integrating hierarchical


clustering (at the initial microclustering stage) and other clustering
methods such as iterative partitioning (at the later macroclustering
stage).
• It overcomes the two difficulties of agglomerative clustering methods:
• (1) scalability and (2) the inability to undo what was done in the previous step.
• BIRCH uses the notions of clustering feature to summarize a cluster,
and
• Clustering feature tree (CF-tree) to represent a cluster hierarchy.
• Given a limited amount of main memory, an important consideration
in BIRCH is to minimize the time required for input/output (I/O).
• BIRCH applies a multiphase clustering technique,
• A single scan of the data set yields a basic, good clustering, and
• one or more additional scans can optionally be used to further improve the
quality
BIRCH (Balanced Iterative Reducing
and Clustering Using Hierarchies)
• Zhang, Ramakrishnan & Livny, SIGMOD’96
• Incrementally construct a CF (Clustering Feature) tree, a hierarchical data structure for
multiphase clustering
• Phase 1: BIRCH scans the database to build an initial in-memory CF-tree, which can be viewed as a multilevel compression of the data that tries to preserve the data's inherent clustering structure.
• Phase 2: BIRCH applies a (selected) clustering algorithm to cluster the leaf nodes of the CF-tree, which removes sparse clusters as outliers and groups dense clusters into larger ones.
• Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans.
• Weakness: handles only numeric data, and is sensitive to the order of the data records.
Clustering Feature Vector in
BIRCH
Clustering Feature (CF): CF = (N, LS, SS)
• N: number of data points in the subcluster
• LS: linear sum of the N points: Σ_{i=1}^{N} X_i
• SS: square sum of the N points: Σ_{i=1}^{N} X_i^2

Example: for the five 2-D points (3, 4), (2, 6), (4, 5), (4, 7), (3, 8):
CF = (5, (16, 30), (54, 190))
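A quick NumPy check of this CF vector, which also illustrates why CFs are additive (the CF of a merged cluster is the component-wise sum of the two CFs):

```python
import numpy as np

X = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8]])

def cf(points):
    """Clustering feature (N, LS, SS) of a set of points."""
    return len(points), points.sum(axis=0), (points ** 2).sum(axis=0)

N, LS, SS = cf(X)
print(N, LS, SS)                        # 5 [16 30] [ 54 190]

# additivity: CF(A ∪ B) = CF(A) + CF(B), component-wise
N1, LS1, SS1 = cf(X[:2])
N2, LS2, SS2 = cf(X[2:])
print(N1 + N2, LS1 + LS2, SS1 + SS2)    # same values as above
```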
CF-Tree in BIRCH
• Clustering feature:
• Summary of the statistics for a given subcluster: the 0-th, 1st, and 2nd moments of the
subcluster from the statistical point of view
• Registers crucial measurements for computing cluster and utilizes storage efficiently
• A CF tree is a height-balanced tree that stores the clustering features for a hierarchical
clustering
• A nonleaf node in a tree has descendants or “children”
• The nonleaf nodes store sums of the CFs of their children
• A CF tree has two parameters
• Branching factor: max # of children
• Threshold: max diameter of sub-clusters stored at the leaf nodes
• The branching factor specifies
• the maximum number of children per nonleaf node.
• The threshold parameter specifies
the maximum diameter of subclusters stored at the leaf nodes of the tree.
These two parameters implicitly control the resulting tree’s size.
The CF Tree Structure
[Figure: a CF-tree with branching factor B = 7 and leaf capacity L = 6. The root and each nonleaf node hold up to B entries of the form (CF_i, child_i); each leaf node holds up to L CF entries and is linked to its neighbouring leaves by prev/next pointers.]
The Birch Algorithm
• Cluster diameter:

D = sqrt( (1 / (n (n − 1))) Σ_{i ≠ j} (x_i − x_j)^2 )

• For each point in the input


• Find closest leaf entry
• Add point to leaf entry and update CF
• If entry diameter > max_diameter, then split leaf, and possibly parents
• Algorithm is O(n)
• Concerns
• Sensitive to the insertion order of the data points
• Because the size of leaf nodes is fixed, the resulting clusters may not be very natural
• Clusters tend to be spherical, given the radius and diameter measures
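For experimentation, scikit-learn ships a BIRCH implementation; a minimal usage sketch (the data set is synthetic and the parameter values are illustrative):

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=3, random_state=0)

# threshold ~ maximum subcluster radius at the leaves,
# branching_factor ~ maximum number of CF entries per node
model = Birch(threshold=0.8, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)
print(np.bincount(labels))   # cluster sizes
```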

Partitioning and hierarchical
methods are designed to find
spherical-shaped clusters.
Session 6
DBSCAN
Density-based clusters are dense areas in the data space separated from each other by sparser areas. Given such data, partitioning and hierarchical methods would likely identify convex regions inaccurately, including noise or outliers in the clusters.

• To find clusters of arbitrary shape, we may instead model clusters as dense regions in the data space, separated by sparse regions.
• This is the main strategy behind density-based clustering methods, which can discover clusters of nonspherical shape.
• “How can we find dense regions in density-based clustering?”
• The density of an object o can be measured by the number of objects
close to o.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
finds core objects, that is, objects that have dense neighborhoods.
• It connects core objects and their neighborhoods to form dense
regions as clusters.
• “How does DBSCAN quantify the neighborhood of an object?”
Density-Based Clustering
Methods
• Clustering based on density (local cluster criterion), such as
density-connected points
• Major features:
• Discover clusters of arbitrary shape
• Handle noise
• One scan
• Need density parameters as termination condition
• Several interesting studies:
• DBSCAN: Ester, et al. (KDD’96)
• OPTICS: Ankerst, et al (SIGMOD’99).
• DENCLUE: Hinneburg & D. Keim (KDD’98)
• CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based)
Density-Based Clustering: Basic
Concepts
• Two parameters:
• Eps: Maximum radius of the neighbourhood
• MinPts: Minimum number of points in an Eps-
neighbourhood of that point
• NEps(p): {q belongs to D | dist(p,q) ≤ Eps}
• Directly density-reachable: A point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
  • p belongs to NEps(q), and
  • q satisfies the core point condition: |NEps(q)| ≥ MinPts
  (e.g., MinPts = 5, Eps = 1 cm)
Density-Reachable and Density-Connected

• Density-reachable: A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p_1, ..., p_n with p_1 = q and p_n = p such that p_{i+1} is directly density-reachable from p_i

• Density-connected: A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
DBSCAN: Density-Based Spatial
Clustering of Applications with Noise
• Relies on a density-based notion of cluster: A cluster is defined as
a maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with
noise

[Figure: core, border, and outlier (noise) points for Eps = 1 cm and MinPts = 5.]

DBSCAN: The Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and
MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p
and DBSCAN visits the next point of the database
• Continue the process until all of the points have been
processed
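A quick scikit-learn sketch of DBSCAN on a non-convex data set (the eps and min_samples values are illustrative and must be tuned to the data):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))   # cluster ids; -1 marks noise/outlier points
```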
DBSCAN: Sensitive to Parameters

Session 7
STING
Grid-Based Clustering Method
• Using multi-resolution grid data structure
• Several interesting methods
• STING (a STatistical INformation Grid approach) by Wang,
Yang and Muntz (1997)
• WaveCluster by Sheikholeslami, Chatterjee, and Zhang
(VLDB’98)
• A multi-resolution clustering approach using wavelet method
• CLIQUE: Agrawal, et al. (SIGMOD’98)
• Both grid-based and subspace clustering
STING: A Statistical Information Grid
Approach
• Wang, Yang and Muntz (VLDB’97)
• The spatial area is divided into rectangular cells
• There are several levels of cells corresponding to different levels of
resolution
[Figure: the grid hierarchy — the 1st (top) layer is recursively refined, so that each cell at the (i−1)-st layer is partitioned into smaller cells at the i-th layer.]

The STING Clustering Method
• Each cell at a high level is partitioned into a number of smaller
cells in the next lower level
• Statistical info of each cell is calculated and stored beforehand
and is used to answer queries
• Parameters of higher level cells can be easily calculated from
parameters of lower level cell
• count, mean, s, min, max
• type of distribution—normal, uniform, etc.
• Use a top-down approach to answer spatial data queries
• Start from a pre-selected layer—typically with a small number of
cells
• For each cell in the current level compute the confidence
interval
STING Algorithm and Its Analysis

• Remove the irrelevant cells from further consideration


• When finished examining the current layer, proceed to the next lower level
• Repeat this process until the bottom layer is reached
• Advantages:
• Query-independent, easy to parallelize, incremental update
• O(K), where K is the number of grid cells at the lowest level
• Disadvantages:
• All the cluster boundaries are either horizontal or vertical,
and no diagonal boundary is detected
Session 8
CLIQUE
CLIQUE (Clustering In QUEst)

• Agrawal, Gehrke, Gunopulos, Raghavan (SIGMOD’98)


• Automatically identifying subspaces of a high dimensional data space that allow
better clustering than original space
• CLIQUE can be considered as both density-based and grid-based
• It partitions each dimension into the same number of equal-length intervals
• It partitions an m-dimensional data space into non-overlapping rectangular
units
• A unit is dense if the fraction of total data points contained in the unit
exceeds the input model parameter
• A cluster is a maximal set of connected dense units within a subspace
CLIQUE: The Major Steps

• Partition the data space and find the number of points that lie
inside each cell of the partition.
• Identify the subspaces that contain clusters using the Apriori
principle
• Identify clusters
• Determine dense units in all subspaces of interests
• Determine connected dense units in all subspaces of interests.
• Generate minimal description for the clusters
• Determine maximal regions that cover a cluster of connected
dense units for each cluster
• Determination of minimal cover for each cluster
[Figure: CLIQUE example — the salary (10,000) and vacation (week) dimensions are each partitioned into intervals over age (20–60); units whose point count exceeds the density threshold (τ = 3) are dense, and intersecting the dense regions found in the (age, salary) and (age, vacation) subspaces (roughly age 30–50) yields a candidate cluster in the combined subspace.]

Strength and Weakness of
CLIQUE
• Strength
• automatically finds subspaces of the highest dimensionality
such that high density clusters exist in those subspaces
• insensitive to the order of records in input and does not
presume some canonical data distribution
• scales linearly with the size of input and has good scalability
as the number of dimensions in the data increases
• Weakness
• The accuracy of the clustering result may be degraded at the
expense of simplicity of the method
Session 9
Evaluation of Clustering Techniques
Assessing Clustering Tendency

• Assess if non-random structure exists in the data by measuring the probability that the data is
generated by a uniform data distribution
• Test spatial randomness by a statistical test: the Hopkins statistic
• Given a dataset D regarded as a sample of a random variable o, determine how far away o is
from being uniformly distributed in the data space
• Sample n points, p1, …, pn, uniformly from D. For each pi, find its nearest neighbor in D: xi =
min{dist (pi, v)} where v in D
• Sample n points, q1, …, qn, uniformly from D. For each qi, find its nearest neighbor in D – {qi}:
yi = min{dist (qi, v)} where v in D and v ≠ qi
• Calculate the Hopkins statistic:

H = ( Σ_{i=1}^{n} y_i ) / ( Σ_{i=1}^{n} x_i + Σ_{i=1}^{n} y_i )

• If D is uniformly distributed, Σ x_i and Σ y_i will be close to each other and H is close to 0.5. If D is highly skewed (clustered), H is close to 0.
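A small NumPy/SciPy sketch of this statistic; it follows the common formulation in which the first sample is drawn uniformly from the data space (here approximated by the bounding box of D), and the sample size n is a free parameter:

```python
import numpy as np
from scipy.spatial.distance import cdist

def hopkins(D, n=50, seed=0):
    """Hopkins statistic of a data set D given as an (m, d) array, n <= m."""
    rng = np.random.default_rng(seed)
    lo, hi = D.min(axis=0), D.max(axis=0)

    # p_i: points sampled uniformly from the data space; x_i = distance to nearest point of D
    P = rng.uniform(lo, hi, size=(n, D.shape[1]))
    x = cdist(P, D).min(axis=1)

    # q_i: points sampled from D itself; y_i = distance to nearest neighbor in D - {q_i}
    idx = rng.choice(len(D), size=n, replace=False)
    dq = cdist(D[idx], D)
    dq[np.arange(n), idx] = np.inf      # exclude the point itself
    y = dq.min(axis=1)

    return y.sum() / (x.sum() + y.sum())

# H ≈ 0.5 for uniformly distributed data, close to 0 for strongly clustered data
```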
Determine the Number of Clusters
• Empirical method
• # of clusters ≈ √(n/2) for a dataset of n points
• Elbow method
• Use the turning point in the curve of sum of within cluster variance w.r.t
the # of clusters
• Cross validation method
• Divide a given data set into m parts
• Use m – 1 parts to obtain a clustering model
• Use the remaining part to test the quality of the clustering
• E.g., For each point in the test set, find the closest centroid, and use
the sum of squared distance between all points in the test set and the
closest centroids to measure how well the model fits the test set
• For any k > 0, repeat it m times, compare the overall quality measure w.r.t.
different k’s, and find # of clusters that fits the data the best
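A sketch of the elbow method with scikit-learn (the synthetic data and the k range are illustrative; the "elbow" is read off the printed values or a plot of them):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ = sum of squared distances to the closest centroid
    print(k, round(km.inertia_, 1))
# for this data the curve typically drops sharply up to k = 4
# and flattens afterwards: that turning point is the elbow
```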
Measuring Clustering Quality
• Two methods: extrinsic vs. intrinsic
• Extrinsic: supervised, i.e., the ground truth is available
• Compare a clustering against the ground truth using certain
clustering quality measure
• Ex. BCubed precision and recall metrics
• Intrinsic: unsupervised, i.e., the ground truth is unavailable
• Evaluate the goodness of a clustering by considering how well
the clusters are separated, and how compact the clusters are
• Ex. Silhouette coefficient
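The intrinsic route can be sketched with scikit-learn's silhouette coefficient (values near 1 indicate compact, well-separated clusters; the synthetic data below is only for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
# the k with the highest average silhouette is the preferred clustering
```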
Measuring Clustering Quality: Extrinsic
Methods

• Clustering quality measure: Q(C, Cg), for a clustering C given the ground truth Cg.
• Q is good if it satisfies the following 4 essential criteria
• Cluster homogeneity: the purer, the better
• Cluster completeness: should assign objects belong to the same category in
the ground truth to the same cluster
• Rag bag: putting a heterogeneous object into a pure cluster should be
penalized more than putting it into a rag bag (i.e., “miscellaneous” or “other”
category)
• Small cluster preservation: splitting a small category into pieces is more
harmful than splitting a large category into pieces
References
• Jiawei Han and Micheline Kamber, “ Data Mining: Concepts and
Techniques”, 3rd Edition, Morgan Kauffman Publishers, 2011.
• http://ccs1.hnue.edu.vn/hungtd/DM2012/DataMining_BOOK.pdf
