Cluster Analysis
Data Mining IOE - Chapter 5 Notes

5. Cluster Analysis (9 Hrs)

Pukar Karki
Assistant Professor
[email protected]
Contents
1. Basics and Algorithms
2. K-means Clustering (video reference: https://www.youtube.com/watch?v=5FpsGnkbEpM, Gate Smashers)
3. Hierarchical Clustering (video reference: Gate Smashers)
4. DBSCAN Clustering (video reference: 5 Minutes Engineering)
5. Issues: Evaluation, Scalability, Comparison

Data Mining(CT725) 2
Contents
1. Basics and Algorithms
2. K-means Clustering
3. Hierarchical Clustering
4. DBSCAN Clustering
5. Issues: Evaluation, Scalability, Comparison

Data Mining(CT725) 3
What is Cluster Analysis?
 Cluster: A collection of data objects
 similar (or related) to one another within the same group
 dissimilar (or unrelated) to the objects in other groups
 Cluster analysis (or clustering, data segmentation, …)
 Finding similarities between data according to the characteristics
found in the data and grouping similar data objects into clusters
 Unsupervised learning: no predefined classes (i.e., learning by
observations vs. learning by examples: supervised)
 Typical applications
 As a stand-alone tool to get insight into data distribution
 As a preprocessing step for other algorithms 4
Clustering for Data Understanding and Applications
 Biology: taxonomy of living things: kingdom, phylum, class, order, family,
genus and species
 Information retrieval: document clustering
 Land use: Identification of areas of similar land use in an earth observation
database
 Marketing: Help marketers discover distinct groups in their customer bases,
and then use this knowledge to develop targeted marketing programs
 City-planning: Identifying groups of houses according to their house type,
value, and geographical location
 Earthquake studies: Observed earthquake epicenters should be clustered
along continent faults
 Climate: understanding Earth's climate, finding patterns of atmospheric and ocean data
 Economic science: market research 5
Clustering as a Preprocessing Tool (Utility)
 Summarization:
 Preprocessing for regression, PCA, classification, and association
analysis
 Compression:
 Image processing: vector quantization
 Finding K-nearest Neighbors
 Localizing search to one or a small number of clusters
 Outlier detection
 Outliers are often viewed as those “far away” from any cluster
6
Vector Quantization
 Left: original image; middle: using 23.9% of the storage; right: using 6.25% of
the storage

K-means is often called “Lloyd’s algorithm” in computer science and engineering, and is
used in vector quantization for compression

Basic idea: run K-means clustering on 4 × 4 squares of pixels in an image, and keep only
the clusters and labels. Smaller K means more compression
7
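A minimal sketch of this idea in Python with scikit-learn is shown below. The function name, block size, and choice of K are illustrative assumptions (not from the slides), and `img` is assumed to be a grayscale NumPy array whose height and width are multiples of the block size.

```python
import numpy as np
from sklearn.cluster import KMeans

def vector_quantize(img, k=16, block=4):
    """Compress a grayscale image by clustering its block x block patches."""
    h, w = img.shape
    # Cut the image into non-overlapping block x block patches, one row vector per patch
    patches = (img.reshape(h // block, block, w // block, block)
                  .swapaxes(1, 2)
                  .reshape(-1, block * block))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(patches)
    # Compressed representation: k codebook vectors plus one small label per patch
    codebook, labels = km.cluster_centers_, km.labels_
    # Reconstruct an approximation of the image from the codebook entries
    recon = (codebook[labels]
             .reshape(h // block, w // block, block, block)
             .swapaxes(1, 2)
             .reshape(h, w))
    return recon, codebook, labels
```

Smaller K means fewer codebook vectors and fewer bits per patch label, i.e. more compression, at the cost of a blockier reconstruction.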
Quality: What Is Good Clustering?
 A good clustering method will produce high quality clusters
 high intra-class similarity: cohesive within clusters
 low inter-class similarity: distinctive between clusters
 The quality of a clustering method depends on
 the similarity measure used by the method
 its implementation, and
 its ability to discover some or all of the hidden patterns
8
Measure the Quality of Clustering
 Dissimilarity/Similarity metric
 Similarity is expressed in terms of a distance function, typically metric:
d(i, j)
 The definitions of distance functions are usually rather different for
interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
 Weights should be associated with different variables based on
applications and data semantics
 Quality of clustering:
 There is usually a separate “quality” function that measures the
“goodness” of a cluster.
 It is hard to define “similar enough” or “good enough”
 The answer is typically highly subjective
9
Considerations for Cluster Analysis
 Partitioning criteria
 Single level vs. hierarchical partitioning (often, multi-level hierarchical
partitioning is desirable)
 Separation of clusters
 Exclusive (e.g., one customer belongs to only one region) vs. non-
exclusive (e.g., one document may belong to more than one class)
 Similarity measure
 Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-
based (e.g., density or contiguity)
 Clustering space
 Full space (often when low dimensional) vs. subspaces (often in high-
dimensional clustering)
10
Requirements and Challenges
 Scalability
 Clustering all the data instead of only on samples
 Ability to deal with different types of attributes
 Numerical, binary, categorical, ordinal, linked, and mixture of these
 Constraint-based clustering
 User may give inputs on constraints
 Use domain knowledge to determine input parameters
 Interpretability and usability
 Others
 Discovery of clusters with arbitrary shape
 Ability to deal with noisy data
 Incremental clustering and insensitivity to input order
 High dimensionality 11
Major Clustering Approaches
 Partitioning approach:
 Construct various partitions and then evaluate them by some criterion, e.g., minimizing
the sum of square errors
 Typical methods: k-means, k-medoids, CLARANS
 Hierarchical approach:
 Create a hierarchical decomposition of the set of data (or objects) using some criterion
 Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
 Density-based approach:
 Based on connectivity and density functions
 Typical methods: DBSCAN, OPTICS, DENCLUE
 Grid-based approach:
 based on a multiple-level granularity structure
 Typical methods: STING, WaveCluster, CLIQUE
12
Major Clustering Approaches (II)

13
Contents
1. Basics and Algorithms
2. K-means Clustering
3. Hierarchical Clustering
4. DBSCAN Clustering
5. Issues: Evaluation, Scalability, Comparison

Data Mining(CT725) 14
Partitioning Algorithms: Basic Concept
 Partitioning method: Partitioning a database D of n objects into a set of k
clusters, such that the sum of squared distances is minimized (where ci is the
centroid or medoid of cluster Ci)
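The criterion shown on the slide did not survive extraction as text; it is the sum of squared distances from each object to its cluster representative:

E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, c_i)^2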

 Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
 Global optimal: exhaustively enumerate all partitions
 Heuristic methods: k-means and k-medoids algorithms
 k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented by the
center of the cluster
 k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87):
Each cluster is represented by one of the objects in the cluster 15
The K-Means Clustering Method

16
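The algorithm figure on this slide is an image that did not survive extraction. The standard Lloyd-style procedure it refers to is: choose k initial centroids, assign every object to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the assignments stop changing. A minimal NumPy sketch of that loop (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on an (n, d) data matrix X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):              # converged
            break
        centroids = new_centroids
    return centroids, labels
```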
An Example of K-Means Clustering
Comments on the K-Means Method
 Strength: Efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations.
Normally, k, t << n.
 Comparing: PAM: O(k(n-k)²), CLARA: O(ks² + k(n-k))
 Comment: Often terminates at a local optimal.
 Weakness
 Applicable only to objects in a continuous n-dimensional space
 Using the k-modes method for categorical data
 In comparison, k-medoids can be applied to a wide range of data
 Need to specify k, the number of clusters, in advance (there are ways to
automatically determine the best k; see Hastie et al., 2009)
 Sensitive to noisy data and outliers
 Not suitable to discover clusters with non-convex shapes 18
Variations of the K-Means Method
 Most variants of k-means differ in
 Selection of the initial k means
 Dissimilarity calculations
 Strategies to calculate cluster means
 Handling categorical data: k-modes
 Replacing means of clusters with modes
 Using new dissimilarity measures to deal with categorical objects
 Using a frequency-based method to update modes of clusters
 A mixture of categorical and numerical data: k-prototype method 19
What Is the Problem of the K-Means Method?
 The k-means algorithm is sensitive to outliers !
 Since an object with an extremely large value may substantially distort the
distribution of the data
 K-Medoids: Instead of taking the mean value of the objects in a cluster as a
reference point, a medoid can be used, which is the most centrally located
object in the cluster

20
PAM: A Typical K-Medoids Algorithm

(Figure: PAM on a small 2-D data set with K = 2.)
1. Arbitrarily choose k objects as the initial medoids.
2. Assign each remaining object to the nearest medoid (Total Cost = 20 in the example).
3. Randomly select a non-medoid object, O_random.
4. Compute the total cost of swapping a medoid with O_random (Total Cost = 26 in the example).
5. Swap if the quality (total cost) is improved.
6. Repeat the loop until no change.

21
The K-Medoid Clustering Method

22
The K-Medoid Clustering Method
 K-Medoids Clustering: Find representative objects (medoids) in clusters
 PAM (Partitioning Around Medoids, Kaufmann & Rousseeuw 1987)
 Starts from an initial set of medoids and iteratively replaces one of the medoids
by one of the non-medoids if it improves the total distance of the resulting
clustering
 PAM works effectively for small data sets, but does not scale well for large data
sets (due to the computational complexity)
 Efficiency improvement on PAM
 CLARA (Kaufmann & Rousseeuw, 1990): PAM on samples
 CLARANS (Ng & Han, 1994): Randomized re-sampling
23
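A compact sketch of the PAM swap loop described above, assuming a precomputed n × n distance matrix `D` as a NumPy array. This is a simplified illustration of the swap idea, not a full or optimized PAM implementation.

```python
import numpy as np

def pam(D, k, max_iter=100, seed=0):
    """Greedy PAM sketch: swap a medoid with a non-medoid whenever total cost improves."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    cost = D[:, medoids].min(axis=1).sum()      # each object pays its distance to the nearest medoid
    for _ in range(max_iter):
        best_cost, best_medoids = cost, medoids
        for i in range(k):                      # try replacing the i-th medoid ...
            for o in range(n):                  # ... with every non-medoid object o
                if o in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = o
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < best_cost:
                    best_cost, best_medoids = trial_cost, trial
        if best_cost >= cost:                   # no improving swap left: stop
            break
        cost, medoids = best_cost, best_medoids
    labels = D[:, medoids].argmin(axis=1)       # assign each object to its nearest medoid
    return medoids, labels, cost
```

The O(k(n-k)) candidate swaps per iteration, each requiring a pass over the distance matrix, are what make PAM expensive on large data sets and motivate CLARA and CLARANS.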
Contents
1. Basics and Algorithms
2. K-means Clustering
3. Hierarchical Clustering
4. DBSCAN Clustering
5. Issues: Evaluation, Scalability, Comparison

Data Mining(CT725) 24
Hierarchical Clustering
 There are two basic approaches for generating a hierarchical
clustering:
Agglomerative: Start with the points as individual clusters and, at each
step, merge the closest pair of clusters. This requires defining a notion
of cluster proximity.
Divisive: Start with one, all-inclusive cluster and, at each step, split a
cluster until only singleton clusters of individual points remain. In this
case, we need to decide which cluster to split at each step and how to
do the splitting.

25
Hierarchical Clustering

26
Hierarchical Clustering
 Use distance matrix as clustering criteria.
 This method does not require the number of clusters k as an input, but
needs a termination condition
(Figure: AGNES merges a, b, c, d, e step by step — {a, b}, {d, e}, {c, d, e}, then {a, b, c, d, e} — while DIANA runs the same steps in reverse, splitting one all-inclusive cluster back into singletons.)

27
Basic Agglomerative Hierarchical Clustering Algorithm

28
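The algorithm box on this slide is an image. The standard agglomerative procedure is: (1) compute the proximity matrix, (2) merge the two closest clusters, (3) update the proximity matrix to reflect the merge, and (4) repeat until only one cluster remains. In practice this is usually a library call; a short SciPy sketch (the toy data and the choice of linkage are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).random((10, 2))     # toy 2-D points
D = pdist(X)                                     # condensed pairwise (proximity) matrix
Z = linkage(D, method='single')                  # MIN; also try 'complete', 'average', 'ward'
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the merge tree into 3 flat clusters
print(Z[:3])                                     # each row: clusters merged, merge distance, new size
```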
Steps 1 and 2

✔ Start with clusters of individual points and a proximity matrix.

(Figure: an empty proximity matrix over points p1, p2, p3, p4, p5, ..., with the points p1 ... p12 shown as singleton clusters.)
Intermediate Situation
✔ After some merging steps, we have some clusters: C1, C2, C3, C4, C5.

(Figure: proximity matrix over the clusters C1 ... C5.)
Step 4
✔ We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.

(Figure: proximity matrix over C1 ... C5, with C2 and C5 highlighted for merging.)
Step 5
✔ The question is: "How do we update the proximity matrix?"

(Figure: proximity matrix after merging C2 and C5 into C2 ∪ C5; the entries between C2 ∪ C5 and C1, C3, C4 are still unknown (?).)
How to Define Inter-Cluster Distance
(Figure: proximity matrix over points p1 ... p5 — which entries define the similarity between two clusters?)

✔ MIN
✔ MAX
✔ Group Average
✔ Distance Between Centroids
✔ Other methods driven by an objective function
  – Ward's Method uses squared error
Distance between Clusters
Example

Set of six two-dimensional points and the (x, y) coordinates of each point.


Example

Euclidean distance matrix for six points.


Example: Single Link or MIN
 For the single link or MIN version of hierarchical clustering, the
proximity of two clusters is defined as the minimum of the
distance (maximum of the similarity) between any two points in
the two different clusters.
 The single link technique is good at handling non-elliptical
shapes, but is sensitive to noise and outliers.
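In symbols, with d(x, y) the distance between two points (a standard formulation, not recovered from the slide):

d_{\min}(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x, y)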
Hierarchical Clustering: MIN

(Figure: nested clusters and dendrogram for single link (MIN) on the six example points; the dendrogram leaves are ordered 3, 6, 2, 5, 4, 1, with merge heights between about 0.05 and 0.2.)


Strength of MIN

Original Points Six Clusters

• Can handle non-elliptical shapes


Limitations of MIN

(Figure: original points split into two clusters and into three clusters.)

• Sensitive to noise and outliers
Example: Complete Link or MAX
 For the complete link or MAX version of hierarchical clustering,
the proximity of two clusters is defined as the maximum of the
distance (minimum of the similarity) between any two points in
the two different clusters.
 Complete link is less susceptible to noise and outliers, but it can
break large clusters and it favors globular shapes.
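In symbols (again a standard formulation):

d_{\max}(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x, y)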
Hierarchical Clustering: MAX

(Figure: nested clusters and dendrogram for complete link (MAX) on the six example points; the dendrogram leaves are ordered 3, 6, 4, 1, 2, 5, with merge heights up to about 0.4.)


Example: Complete Link or MAX
Strength of MAX

Original Points Two Clusters

• Less susceptible to noise


Limitations of MAX

Original Points Two Clusters

• Tends to break large clusters


• Biased towards globular clusters
Example: Group Average
 For the group average version of hierarchical clustering, the proximity
of two clusters is defined as the average pairwise proximity among all
pairs of points in the different clusters.
 For group average, the cluster proximity, proximity(Ci, Cj), of clusters
Ci and Cj, which are of size mi and mj, respectively, is expressed by
the following equation:
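The equation itself did not survive extraction; the standard group-average (UPGMA) definition is:

\text{proximity}(C_i, C_j) = \frac{\sum_{x \in C_i} \sum_{y \in C_j} \text{proximity}(x, y)}{m_i \, m_j}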
Hierarchical Clustering: Group Average

(Figure: nested clusters and dendrogram for group average on the six example points; the dendrogram leaves are ordered 3, 6, 4, 1, 2, 5, with merge heights up to about 0.25.)


Example: Group Average

Because dist({3, 6, 4}, {2, 5}) is smaller than dist({3, 6, 4}, {1}) and dist({2, 5}, {1}),
clusters {3, 6, 4} and {2, 5} are merged at the fourth stage.
Hierarchical Clustering: Group Average

✔ Compromise between Single and Complete Link

✔ Strengths
– Less susceptible to noise

✔ Limitations
– Biased towards globular clusters
Cluster Similarity: Ward’s Method
✔ Similarity of two clusters is based on the increase in squared
error when two clusters are merged
– Similar to group average if distance between points is distance
squared

✔ Less susceptible to noise

✔ Biased towards globular clusters

✔ Hierarchical analogue of K-means
  – Can be used to initialize K-means
Hierarchical Clustering: Comparison

(Figure: the six example points clustered with MIN, MAX, Group Average, and Ward's Method; the four linkage choices produce different nested cluster structures on the same data.)
Hierarchical Clustering: Time and Space requirements

✔ O(N²) space, since it uses the proximity matrix.
  – N is the number of points.

✔ O(N³) time in many cases
  – There are N steps, and at each step an N² proximity matrix must be updated and searched.
  – Complexity can be reduced to O(N² log N) time with some cleverness.
Hierarchical Clustering: Problems and Limitations

✔ Once a decision is made to combine two clusters, it cannot be undone.

✔ No global objective function is directly minimized.

✔ Different schemes have problems with one or more of the following:
  – Sensitivity to noise
  – Difficulty handling clusters of different sizes and non-globular shapes
  – Breaking large clusters
AGNES (Agglomerative Nesting)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical packages, e.g., Splus
 Use the single-link method and the dissimilarity matrix
 Merge nodes that have the least dissimilarity
 Go on in a non-descending fashion
 Eventually all nodes belong to the same cluster

59
DIANA (Divisive Analysis)
 Introduced in Kaufmann and Rousseeuw (1990)
 Implemented in statistical analysis packages, e.g., Splus
 Inverse order of AGNES
 Eventually each node forms a cluster on its own
Contents
1. Basics and Algorithms
2. K-means Clustering
3. Hierarchical Clustering
4. DBSCAN Clustering
5. Issues: Evaluation, Scalability, Comparison

Data Mining(CT725) 61
Density-Based Clustering Methods
 Partitioning and hierarchical methods are designed to find spherical-shaped
clusters.
 They have difficulty finding clusters of arbitrary shape, such as the "S"-shaped
and oval clusters in the figure.

62
Density-Based Clustering Methods
 To find clusters of arbitrary shape, alternatively, we can model clusters as
dense regions in the data space, separated by sparse regions.
 This is the main strategy behind density-based clustering methods, which can
discover clusters of nonspherical shape.

63
Density-Based Clustering Methods
 Clustering based on density (local cluster criterion), such as density-connected
points
 Major features:
   Discover clusters of arbitrary shape
   Handle noise
   One scan
   Need density parameters as termination condition
 Several interesting studies:
 DBSCAN: Ester, et al. (KDD’96)
 OPTICS: Ankerst, et al (SIGMOD’99).
 DENCLUE: Hinneburg & D. Keim (KDD’98)
 CLIQUE: Agrawal, et al. (SIGMOD’98) (more grid-based) 64
DBSCAN
 The density of an object o can be measured by the number of objects close to o.
 DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core objects, that is, objects that have dense neighborhoods.
 It connects core objects and their neighborhoods to form dense regions as clusters.

65
DBSCAN
“How does DBSCAN quantify the neighborhood of an object?”

→ A user-specified parameter ε > 0 is used to specify the radius of the neighborhood we consider for every object.
→ The ε-neighborhood of an object o is the space within a radius ε centered at o.
→ Due to the fixed neighborhood size parameterized by ε, the density of a neighborhood can be measured simply by the number of objects in the neighborhood.

66
DBSCAN
→ To determine whether a neighborhood is dense or not, DBSCAN uses
another user-specified parameter, MinPts, which specifies the density
threshold of dense regions.

→ An object is a core object if the ε-neighborhood of the object contains at least MinPts objects. Core objects are the pillars of dense regions.

67
DBSCAN
 Given a set, D, of objects, we can identify all core objects with respect
to the given parameters, ε and MinPts.
 The clustering task is therein reduced to using core objects and their
neighborhoods to form dense regions, where the dense regions are
clusters.

68
Directly Density-Reachable
 For a core object q and an object p, we say that p is directly
density-reachable from q (with respect to ε and MinPts) if p is within
the ε-neighborhood of q.

69
Density-Reachable and Density-Connected
 Density-reachable:
 A point p is density-reachable from a point q w.r.t. Eps, MinPts if
there is a chain of points p1, …, pn, with p1 = q and pn = p, such that pi+1 is
directly density-reachable from pi
 Density-connected:
 A point p is density-connected to a point q w.r.t. Eps, MinPts if there
is a point o such that both p and q are density-reachable from o
w.r.t. Eps and MinPts

70
Consider the following figure for a given ε, represented by the radius of the circles, and, say, let MinPts = 3.

 Of the labeled points, m, p, o, and r are core objects because each is in an ε-neighborhood containing at least three points.

 Object q is directly density-reachable from m. Object m is directly density-reachable from p and vice versa.

 Object q is (indirectly) density-reachable from p because q is directly density-reachable from m and m is directly density-reachable from p. However, p is not density-reachable from q because q is not a core object.

 Similarly, r and s are density-reachable from o, and o is density-reachable from r. Thus, o, r, and s are all density-connected.
DBSCAN: The Algorithm

77
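The pseudocode on this slide is an image that did not survive extraction. In outline, DBSCAN visits each point, checks whether it is a core point (at least MinPts points within Eps), and if so grows a cluster by absorbing the neighborhoods of all density-reachable core points; non-core points within reach of a core point become border points, and everything else is labeled noise. A short usage sketch with scikit-learn (the data set and parameter values are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)  # two non-convex clusters

db = DBSCAN(eps=0.2, min_samples=5).fit(X)    # eps = ε radius, min_samples = MinPts
labels = db.labels_                           # cluster ids; -1 marks noise points
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True     # core points found by the algorithm
print("clusters:", len(set(labels) - {-1}), "| noise points:", int((labels == -1).sum()))
```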
DBSCAN: Determining EPS and MinPts
✔ Idea is that for points in a cluster, their kth nearest neighbors are
at close distance
✔ Noise points have the kth nearest neighbor at farther distance
✔ So, plot sorted distance of every point to its kth nearest neighbor
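A sketch of this k-distance heuristic using scikit-learn's NearestNeighbors (k is normally set to MinPts; the function name and defaults are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

def k_distance_plot(X, k=4):
    """Plot every point's distance to its k-th nearest neighbor, sorted in descending order.
    The 'knee' of the curve is a reasonable Eps when MinPts = k."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own 0-th neighbor
    dists, _ = nn.kneighbors(X)
    kth = np.sort(dists[:, -1])[::-1]                 # k-th neighbor distance for every point
    plt.plot(kth)
    plt.xlabel("points sorted by k-distance")
    plt.ylabel(f"distance to {k}-th nearest neighbor")
    plt.show()
```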
DBSCAN: Core, Border and Noise Points

(Figure: original points and their point types — core, border, and noise — for Eps = 10, MinPts = 4.)
DBSCAN: Core, Border and Noise Points

Core points: These points are in the interior of a density-based cluster. A point
is a core point if there are at least MinPts points within a distance of Eps, where
MinPts and Eps are user-specified parameters. In the figure, point A is a core point
for the indicated radius (Eps) if MinPts ≤ 7.
DBSCAN: Core, Border and Noise Points

Border points: A border point is not a core point, but falls within the
neighborhood of a core point. In the figure, point B is a border point. A border
point can fall within the neighborhoods of several core points.
DBSCAN: Core, Border and Noise Points

Noise points: A noise point is any point that is neither a core point nor a
border point. In the figure, point C is a noise point.
DBSCAN Algorithm
When DBSCAN Works Well

Original Points Clusters (dark blue points indicate noise)

• Can handle clusters of different shapes and sizes


• Resistant to noise
Contents
1. Basics and Algorithms
2. K-means Clustering
3. Hierarchical Clustering
4. DBSCAN Clustering
5. Issues: Evaluation, Scalability, Comparison

Data Mining(CT725) 85
Considerations for Cluster Analysis
 Partitioning criteria
 Single level vs. hierarchical partitioning (often, multi-level hierarchical
partitioning is desirable)
 Separation of clusters
 Exclusive (e.g., one customer belongs to only one region) vs. non-
exclusive (e.g., one document may belong to more than one class)
 Similarity measure
 Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-
based (e.g., density or contiguity)
 Clustering space
 Full space (often when low dimensional) vs. subspaces (often in high-
dimensional clustering)
86
Requirements and Challenges
 Scalability
 Clustering all the data instead of only on samples
 Ability to deal with different types of attributes
 Numerical, binary, categorical, ordinal, linked, and mixture of these
 Constraint-based clustering
 User may give inputs on constraints
 Use domain knowledge to determine input parameters
 Interpretability and usability
 Others
 Discovery of clusters with arbitrary shape
 Ability to deal with noisy data
 Incremental clustering and insensitivity to input order
 High dimensionality 87
Cluster Validity
✔ For supervised classification we have a variety of measures to evaluate
how good our model is
– Accuracy, precision, recall

✔ For cluster analysis, the analogous question is: how do we evaluate the "goodness" of the resulting clusters?

✔ But "clusters are in the eye of the beholder"!
  – In practice, the clusters we find are defined by the clustering algorithm

✔ Then why do we want to evaluate them?
  – To avoid finding patterns in noise
  – To compare clustering algorithms
  – To compare two sets of clusters
  – To compare two clusters
Clusters found in Random Data
(Figure: four panels over the unit square — random points, and the clusters found in the same points by DBSCAN, K-means, and complete link; each algorithm imposes some cluster structure even on random data.)
Measures of Cluster Validity
✔ Numerical measures that are applied to judge various aspects of
cluster validity, are classified into the following two types.
– Supervised: Used to measure the extent to which cluster labels match
externally supplied class labels.
 Entropy
 Often called external indices because they use information external to the data
– Unsupervised: Used to measure the goodness of a clustering structure
without respect to external information.
 Sum of Squared Error (SSE)
 Often called internal indices because they only use information in the data

✔ You can use supervised or unsupervised measures to compare clusters or clusterings.
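As a concrete example of an internal (unsupervised) index, the SSE of a clustering can be computed directly from its centroids and labels; a minimal sketch assuming NumPy arrays `X`, `labels`, and `centroids` such as those returned by the K-means sketch earlier in these notes:

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared errors: total squared distance from each point to its cluster centroid."""
    return float(((X - centroids[labels]) ** 2).sum())
```

Plotting SSE against the number of clusters is one common way to compare clusterings of the same data, though a low SSE by itself does not guarantee that the clusters are meaningful.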
