Cluster Analysis

CLUSTER

Uploaded by

abhishekkdokania

CHAPTER 20 Cluster Analysis

If the variables are measured in vastly different units, the clustering solution will be influenced by the units of measurement. In a supermarket shopping study, attitudinal variables may be measured on a nine-point Likert-type scale; patronage, in terms of frequency of visits per month and the dollar amount spent; and brand loyalty, in terms of percentage of grocery shopping expenditure allocated to the favorite supermarket. In these cases, before clustering respondents, we must standardize the data by rescaling each variable to have a mean of zero and a standard deviation of unity. Although standardization can remove the influence of the unit of measurement, it can also reduce the differences between groups on variables that may best discriminate groups or clusters. It is also desirable to eliminate outliers (cases with atypical values).¹⁰

Use of different distance measures may lead to different clustering results. Hence, it is advisable to use different measures and compare the results. Having selected a distance or similarity measure, we can next select a clustering procedure.

hierarchical clustering
A clustering procedure characterized by the development of a hierarchy or tree-like structure.

agglomerative clustering
Hierarchical clustering procedure where each object starts out in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters.
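
To make the standardization step concrete, the following is a minimal sketch in Python of rescaling each variable to a mean of zero and a standard deviation of one. The respondent values and the function name `standardize` are hypothetical, and the sample standard deviation (n - 1 divisor) is an assumption; the text does not say which divisor is intended.

```python
from statistics import mean, stdev


def standardize(rows):
    """Rescale each column (variable) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 divisor).
    # A constant column has a standard deviation of 0 and cannot be rescaled.
    sds = [stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical respondents: attitude (1-9 scale), visits per month, dollars spent.
respondents = [
    [7, 4, 220.0],
    [3, 1, 60.0],
    [5, 8, 410.0],
    [9, 2, 150.0],
]

z = standardize(respondents)
for row in z:
    print([round(value, 3) for value in row])
```

After rescaling, the dollar amounts no longer dominate a distance computed across the three variables simply because they are expressed in larger numbers.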
Select a Clustering Procedure

Figure 20.4 is a classification of clustering procedures. Clustering procedures can be hierarchical or nonhierarchical. Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive. Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster. Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.

divisive clustering
Hierarchical clustering procedure where all objects start out in one giant cluster. Clusters are formed by dividing this cluster into smaller and smaller clusters.

linkage methods
Agglomerative methods of hierarchical clustering that cluster objects based on a computation of the distance between them.
single linkage
Linkage method that is based on minimum distance or the nearest neighbor rule.

Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods. Linkage methods include single linkage, complete linkage, and average linkage. The single linkage method is based on minimum distance or the nearest neighbor rule. The first
two objects clustered are those that have the smallest distance between them. The next shortest distance is identified, and either the third object is clustered with the first two, or a new two-object cluster is formed. At every stage, the distance between two clusters is the distance between their two closest points (see Figure 20.5). Two clusters are merged at any stage by the single shortest link between them. This process is continued until all objects are in one cluster. The single linkage method does not work well when the clusters are poorly defined. The complete linkage method is similar to single linkage, except that it is based on the maximum distance or the furthest neighbor approach. In complete linkage, the distance between two clusters is calculated as the distance between their two furthest points. The average linkage method works similarly. However, in this method, the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters (Figure 20.5). As can be seen, the average linkage method uses information on all pairs of distances, not merely the minimum or maximum distances. For this reason, it is usually preferred to the single and complete linkage methods.

complete linkage
Linkage method that is based on maximum distance or the furthest neighbor approach.

average linkage
A linkage method based on the average distance between all pairs of objects, where one member of the pair is from each of the clusters.

variance methods
An agglomerative method of hierarchical clustering in which clusters are generated to minimize the within-cluster variance.

Ward's procedure
Variance method in which the squared euclidean distance to the cluster means is minimized.

centroid methods
A variance method of hierarchical clustering in which the distance between two clusters is the distance between their centroids (means for all the variables).

Figure 20.4 A Classification of Clustering Procedures
[Tree diagram: clustering procedures divide into hierarchical and nonhierarchical. Hierarchical procedures are agglomerative or divisive. Agglomerative procedures comprise linkage methods (single linkage, complete linkage, average linkage), variance methods (Ward's method), and centroid methods. Nonhierarchical procedures comprise sequential threshold, parallel threshold, and optimizing partitioning.]

PART III Data Collection, Preparation, Analysis, and Reporting

Figure 20.5 Linkage Methods of Clustering
[Three panels, each showing Cluster 1 and Cluster 2: Single Linkage (minimum distance), Complete Linkage (maximum distance), Average Linkage (average distance).]
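
The three linkage rules differ only in how the pairwise distances between two clusters are summarized. The sketch below illustrates this under stated assumptions: Euclidean distance is used, and the two small clusters and the function names are hypothetical.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """All distances between one object from each cluster (Euclidean, assumed)."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Nearest neighbor rule: distance between the two closest points.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Furthest neighbor rule: distance between the two furthest points.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Average over all pairs with one member from each cluster.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters of two objects each, measured on two variables.
cluster_1 = [(0.0, 0.0), (1.0, 0.0)]
cluster_2 = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_1, cluster_2))    # 3.0
print(complete_linkage(cluster_1, cluster_2))  # 6.0
print(average_linkage(cluster_1, cluster_2))   # 4.5
```

The average linkage value draws on all four pairwise distances (4, 6, 3, and 5), whereas the other two rules each depend on a single pair.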
The variance methods attempt to generate clusters to minimize the within-cluster variance. A commonly used variance method is the Ward's procedure. For each cluster, the means for all the variables are computed. Then, for each object, the squared euclidean distance to the cluster means is calculated (Figure 20.6). These distances are summed for all the objects. At each stage, the two clusters with the smallest increase in the overall sum of squares within cluster distances are combined. In the centroid methods, the distance between two clusters is the distance between their centroids (means for all the variables), as shown in Figure 20.6. Every time objects are grouped, a new centroid is computed. Of the hierarchical methods, average linkage and Ward's methods have been shown to perform better than the other procedures.¹¹

nonhierarchical clustering
A procedure that first assigns or determines a cluster center and then groups all objects within a prespecified threshold value from the center.

sequential threshold method
A nonhierarchical clustering method in which a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together.
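
As an illustration of the Ward's and centroid criteria, the sketch below computes the within-cluster sum of squared euclidean distances, the increase in that sum caused by merging two clusters, and the distance between two centroids. The three clusters and all function names are hypothetical, and the code shows a single merge decision, not a full clustering run.

```python
from itertools import combinations
from math import dist


def centroid(cluster):
    """Means for all the variables in one cluster."""
    n = len(cluster)
    return tuple(sum(values) / n for values in zip(*cluster))


def within_ss(cluster):
    """Sum of squared euclidean distances from each object to the cluster means."""
    center = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(point, center)) for point in cluster)


def ward_increase(cluster_a, cluster_b):
    """Increase in the overall within-cluster sum of squares if the two are merged."""
    return within_ss(cluster_a + cluster_b) - within_ss(cluster_a) - within_ss(cluster_b)


def best_ward_merge(clusters):
    """Indices of the pair of clusters whose merge increases the sum of squares least."""
    pairs = combinations(range(len(clusters)), 2)
    return min(pairs, key=lambda pair: ward_increase(clusters[pair[0]], clusters[pair[1]]))


def centroid_distance(cluster_a, cluster_b):
    """Centroid method: distance between the two cluster centroids."""
    return dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical clusters: one with two objects and two single-object clusters.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 0.0)], [(10.0, 0.0)]]

print(best_ward_merge(clusters))                     # (0, 1)
print(centroid_distance(clusters[0], clusters[1]))   # 4.0
```

In this example both criteria favor joining the first two clusters, but the two rules need not agree in general because Ward's increase also depends on cluster sizes.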
parallel threshold method
Nonhierarchical clustering method that specifies several cluster centers at once. All objects that are within a prespecified threshold value from the center are grouped together.

The second type of clustering procedures, the nonhierarchical clustering method, is frequently referred to as k-means clustering. These methods include sequential threshold, parallel threshold, and optimizing partitioning. In the sequential threshold method, a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together. Then a new cluster center or seed is selected, and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds. The parallel threshold method
operates similarly, except that several cluster centers are selected simultaneously, and objects within the threshold level are grouped with the nearest center. The optimizing partitioning method differs from the two threshold procedures in that objects can later be reassigned to clusters to optimize an overall criterion, such as average within-cluster distance for a given number of clusters.

optimizing partitioning method
Nonhierarchical clustering method that allows for later reassignment of objects to clusters to optimize an overall criterion.

Figure 20.6 Other Agglomerative Clustering Methods
[Two panels: Ward's Method and Centroid Method.]
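
The sequential threshold method can be sketched in a few lines. This is an illustration under assumptions that the text does not specify: the next seed is taken to be the first object not yet clustered, distances are Euclidean, and the points and threshold are hypothetical.

```python
from math import dist


def sequential_threshold(points, threshold):
    """Group objects around one seed at a time; return clusters as lists of indices."""
    unclustered = list(range(len(points)))
    clusters = []
    while unclustered:
        # Assumption: the next seed is simply the first object still unclustered.
        seed = unclustered[0]
        members = [i for i in unclustered if dist(points[seed], points[i]) <= threshold]
        clusters.append(members)
        # Objects grouped with this seed are not considered for later seeds.
        unclustered = [i for i in unclustered if i not in members]
    return clusters


# Hypothetical objects measured on two variables.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0), (10.0, 0.0)]

print(sequential_threshold(points, threshold=2.0))  # [[0, 1], [2, 3], [4]]
```

Because seeds are taken in the order in which the objects appear, reordering the data can change the clusters.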
Two major disadvantages of the nonhierarchical procedures are that the number of clusters must be prespecified and the selection of cluster centers is arbitrary. Furthermore, the clustering results may depend on how the centers are selected. Many nonhierarchical programs select the first k (k = number of clusters) cases without missing values as initial cluster centers. Thus, the clustering results may depend on the order of observations in the data. Yet nonhierarchical clustering is faster than hierarchical methods and has merit when the number of objects or observations is large. It has been suggested that the hierarchical and nonhierarchical methods be used in tandem. First, an initial clustering solution is obtained using a hierarchical procedure, such as average linkage or Ward's. The number of clusters and cluster centroids so obtained are used as inputs to the optimizing partitioning method.¹²
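
The second step of the tandem approach, optimizing partitioning from supplied starting centroids, can be sketched as follows. Several points are assumptions, not taken from the text: the criterion is the k-means-style rule of assigning each object to the nearest centroid (the text mentions only "an overall criterion, such as average within-cluster distance"), the data are hypothetical, and the starting centers are deliberately poor so that a reassignment is visible. In the tandem approach they would instead be the centroids from the hierarchical solution.

```python
from math import dist


def optimizing_partition(points, centers, max_iter=100):
    """Reassign objects to the nearest center and recompute centers until stable."""
    centers = [tuple(center) for center in centers]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda k: dist(point, centers[k]))
            for point in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster, so the partition is stable
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a cluster happens to be empty
                centers[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers


# Hypothetical objects forming two visible groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
# Deliberately poor starting centers (hypothetical).
start_centers = [(0.0, 0.0), (0.0, 1.0)]

assignment, centers = optimizing_partition(points, start_centers)
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

In the first pass the second object is grouped with the three distant objects; once the centers are recomputed it is reassigned to the first cluster, which is the later reassignment that distinguishes this method from the two threshold procedures.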
Choice of a clustering method and choice of a distance measure are interrelated. For example, squared euclidean distances should be used with the Ward's and centroid methods. Several nonhierarchical procedures also use squared euclidean distances.

We will use the Ward's procedure to illustrate hierarchical clustering. The output obtained by clustering the data of Table 20.1 is given in Table 20.2. Useful information is contained in the agglomeration schedule, which shows the number of cases or clusters being combined at each stage. The first line represents stage 1, with 19 clusters. Respondents 14 and 16 are combined at this stage, as shown in the columns labeled "Clusters Combined." The squared euclidean distance between these two respondents is given under the column labeled "Coefficients." The column entitled "Stage Cluster First Appears" indicates the stage at which a cluster is first formed. To illustrate, an entry of 1 at stage 6 indicates that respondent 14 was first grouped at stage 1. The last column, "Next Stage," indicates the stage at which another case (respondent) or cluster is combined with this one. Because the number in the first line of the last column is 6, we see that at stage 6, respondent 10 is combined with 14 and 16 to form a single cluster. Similarly, the second line represents stage 2 with 18 clusters. In stage 2, respondents 6 and 7 are grouped together.

Another important part of the output is contained in the icicle plot given in Figure 20.7. The columns correspond to the objects being clustered, in this case respondents labeled 1 through 20. The rows correspond to the number of clusters. This figure is read from bottom to top. At first, all cases are considered as individual clusters. Because there are 20 respondents, there are 20 initial clusters. At the first step, the two closest objects are combined, resulting in
19 clusters. The last line of Figure 20.7 shows these 19 clusters. The two cases, respondents 14 and 16, that have been combined at this stage have between them all Xs in rows 1 through 19. Row number 18 corresponds to the next stage, with 18 clusters. At this stage, respondents 6 and 7 are grouped together. The column of Xs between respondents 6 and 7 has a blank in row 19. Thus, at this stage there are 18 clusters; 16 of them consist of individual respondents, and two contain two respondents each. Each subsequent step leads to the formation of a new cluster in one of three ways: (1) two individual cases are grouped together, (2) a case is joined to an already existing cluster, or (3) two clusters are grouped together.

TABLE 20.2
Results of Hierarchical Clustering
(… = entry not legible in the source scan)

CASE PROCESSING SUMMARY (a, b)

                         Cases
  Valid              Missing            Total
  N     Percent      N     Percent      N     Percent
  20    100.0        0     0.0          20    100.0

a. Squared Euclidean Distance used
b. Ward Linkage

WARD LINKAGE
AGGLOMERATION SCHEDULE

         Cluster Combined                       Stage Cluster First Appears
Stage    Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2        Next Stage
  1        14          16            1.000         …           …                 6
  2         6           7            2.000         …           …                 7
  3         …          13            3.500         …           …                15
  4         …           …            5.000         …           …                 …
  5         3           …            6.500         …           …                16
  6        10          14            8.167         …           1                 9
  7         6          12           10.500         …           …                 …
  8         9          20           13.000         …           …                 …
  9         …          10           15.583         0           6                12
 10         …           6           18.500         …           7                13
 11         …           …           23.000         4           8                 …
 12         …          19           27.750         …           …                 …
 13         1          17           33.100         …           …                14
 14         …           …           41.333        13           …                16
 15         2           …           51.833         3          11                 …
 16         …           3           64.500        14           …                 …
 17         …           …           79.667        12           …                 …
 18         …           4          172.667        15          17                19
 19         …           2          328.600        16          18                 …

CLUSTER MEMBERSHIP

Case    4 Clusters   3 Clusters   2 Clusters
  1         …            …            …
  2         2            …            …
  3         …            …            …
  4         …            3            2
  5         2            2            …
  6         …            …            …
  7         …            …            …
  8         …            …            …
  9         …            …            …
 10         …            3            2
 11         2            2            …
 12         …            …            …
 13         2            2            …
 14         3            …            …
 15         …            …            …
 16         3            3            …
 17         …            …            …
 18         4            3            …
 19         3            …            …
 20         …            …            …

Figure 20.7 Vertical Icicle Plot Using Ward's Procedure
[Icicle plot: columns are the cases (respondents 1 through 20); rows are the number of clusters.]

Figure 20.8 Dendrogram Using Ward's Procedure
[Dendrogram: cases listed by label and number; horizontal axis is Rescaled Distance Cluster Combine.]
Another graphic device that is useful in displaying clustering results is the
dendrogram (see Figure 20.8). The dendrogram is read from left to right. Vertical lines
represent clusters that are joined together. The position of the line on the scale indicates
the distances at which clusters were joined. Because many of the distances in the early
stages are of similar magnitude, it is difficult to tell the sequence in which some of the
early clusters are formed. However, it is clear that in the last two stages, the distances at
which the clusters are being combined are large. This information is useful in deciding
on the number of clusters.
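
The size of the jump at each stage can be checked directly from the "Coefficients" column of the agglomeration schedule in Table 20.2. The sketch below computes the ratio of each coefficient to the one before it; the function name and the choice of a simple ratio are illustrative assumptions, not a rule prescribed by the text.

```python
# "Coefficients" column of the agglomeration schedule in Table 20.2, stages 1 to 19.
coefficients = [
    1.000, 2.000, 3.500, 5.000, 6.500, 8.167, 10.500, 13.000, 15.583, 18.500,
    23.000, 27.750, 33.100, 41.333, 51.833, 64.500, 79.667, 172.667, 328.600,
]


def coefficient_jumps(coefs, n_cases):
    """For each stage after the first: (stage, clusters before the merge, ratio)."""
    jumps = []
    for i in range(1, len(coefs)):
        stage = i + 1
        clusters_before_merge = n_cases - (stage - 1)
        jumps.append((stage, clusters_before_merge, coefs[i] / coefs[i - 1]))
    return jumps


jumps = coefficient_jumps(coefficients, n_cases=20)
stage, clusters_kept, ratio = max(jumps, key=lambda jump: jump[2])
print(stage, clusters_kept, round(ratio, 3))  # 18 3 2.167
```

The largest relative jump occurs at stage 18, where the coefficient more than doubles; stopping before that merge leaves three clusters.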
It is also possible to obtain information on cluster membership of cases if the number of
clusters is specified. Although this information can be discerned from the icicle plot, a
tabular display is helpful. Table 20.2 contains the cluster membership for the cases,
depending on whether the final solution contains two, three, or four clusters. Information of
this type can be obtained for any number of clusters and is useful for deciding on the number
of clusters.

Decide on the Number of Clusters

A major issue in cluster analysis is deciding on the number of clusters. Although there are
no hard and fast rules, some guidelines are available:
1. Theoretical, conceptual, or practical considerations may suggest a certain number of
clusters. For example, if the purpose of clustering is to identify market segments,
management may want a particular number of clusters.
2. In hierarchical clustering, the distances at which clusters are combined can be used as
criteria. This information can be obtained from the agglomeration schedule or from
the dendrogram. In our case, we see from the agglomeration schedule in Table 20.2
that the value in the "Coefficients" column suddenly more than doubles between
stages 17 (3 clusters) and 18 (2 clusters). Likewise, at the last two stages of the de
