Cluster Analysis GP Seminar
SABYASACHI GHATWAL
RA 1832001010027
CLUSTER ANALYSIS
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called
clusters. It is also called classification analysis or numerical taxonomy. In cluster analysis, there is no
prior information about the group or cluster membership of any of the objects.
Cluster analysis has been used in marketing for various purposes. Consumers can be segmented on the basis of
the benefits they seek from purchasing a product, and cluster analysis can be used to identify homogeneous
groups of buyers.
Cluster analysis involves formulating the problem, selecting a distance measure, selecting a clustering procedure,
deciding on the number of clusters, interpreting and profiling the clusters and, finally, assessing the validity of the clustering.
The variables on which the cluster analysis is based should be selected with past research in mind. The selection
should also be guided by theory, the hypotheses being tested, and the judgment of the researcher. An
appropriate measure of distance or similarity should be selected; the most commonly used measure is the
Euclidean distance or its square.
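As an illustration of the distance measure named above, the following is a minimal sketch in Python. The function names and the two respondents are hypothetical, and the values are made up for the example.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences across all clustering variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

# Hypothetical respondents rated on two variables (made-up values).
respondent_a = (1.0, 2.0)
respondent_b = (4.0, 6.0)

print(euclidean(respondent_a, respondent_b))          # 5.0
print(squared_euclidean(respondent_a, respondent_b))  # 25.0
```

The squared form gives progressively more weight to objects that are farther apart, which is one reason the choice of measure and the choice of procedure are interrelated.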
The non-hierarchical methods in cluster analysis are frequently referred to as k-means clustering. The two-step
procedure can automatically determine the optimal number of clusters by comparing the values of model choice
criteria across different clustering solutions. The choice of clustering procedure and the choice of distance measure
are interrelated. The relative sizes of clusters in cluster analysis should be meaningful. The clusters should be
interpreted in terms of cluster centroids.
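A minimal sketch of k-means clustering may help make the idea concrete. It assumes the common assign-then-update iteration (often called Lloyd's algorithm), squared Euclidean distance, and, purely for reproducibility, the first k points as starting centroids. All names and data values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Centroid: the mean of each variable over the given points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def k_means(points, k, max_iter=100):
    # Assumption: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return labels, centroids

# Made-up data with two visibly separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The returned centroids are what would then be inspected to interpret and profile each cluster.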
• USES AND OBJECTIVES
• Used to classify objects (cases) into homogeneous
groups called clusters.
• Objects in each cluster tend to be similar to each other and
dissimilar to objects in the other clusters.
• Both cluster analysis and discriminant analysis are
concerned with classification.
• Discriminant analysis requires prior knowledge of
group membership.
• In cluster analysis groups are suggested by the data.
[Figure: An Ideal Clustering Situation. Axes: Variable 1 and Variable 2.]
Statistics Associated with Cluster Analysis
• Cluster centroid. Mean values of the variables for all the cases
in a particular cluster.
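The centroid definition can be sketched in a few lines of Python. The helper name `cluster_centroids`, the cases, and the cluster labels below are hypothetical and chosen only to illustrate the calculation.

```python
from collections import defaultdict

def cluster_centroids(cases, labels):
    """Return, for each cluster, the mean of every variable over its cases."""
    groups = defaultdict(list)
    for case, label in zip(cases, labels):
        groups[label].append(case)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in groups.items()
    }

# Made-up cases measured on two variables, with assumed cluster membership.
cases = [(2.0, 4.0), (4.0, 6.0), (10.0, 1.0), (12.0, 3.0)]
labels = ["A", "A", "B", "B"]

centroids = cluster_centroids(cases, labels)
print(centroids)  # {'A': (3.0, 5.0), 'B': (11.0, 2.0)}
```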
[Figure: classification of clustering procedures. Labels: Hierarchical, Nonhierarchical, Agglomerative, Divisive, Ward’s Method.]
[Figure: linkage methods between Cluster 1 and Cluster 2. Complete linkage: maximum distance. Average linkage: average distance.]
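Complete linkage measures the distance between two clusters as the maximum distance between their members, and average linkage as the average of all such pairwise distances. A minimal sketch, assuming Euclidean distance and hypothetical cluster members:

```python
import math
from itertools import product

def pairwise_distances(cluster_1, cluster_2):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_1, cluster_2)]

def complete_linkage(cluster_1, cluster_2):
    """Distance between clusters = largest pairwise distance."""
    return max(pairwise_distances(cluster_1, cluster_2))

def average_linkage(cluster_1, cluster_2):
    """Distance between clusters = mean of all pairwise distances."""
    distances = pairwise_distances(cluster_1, cluster_2)
    return sum(distances) / len(distances)

# Made-up clusters lying on a line so the arithmetic is easy to check.
cluster_1 = [(0.0, 0.0), (1.0, 0.0)]
cluster_2 = [(4.0, 0.0), (6.0, 0.0)]

print(complete_linkage(cluster_1, cluster_2))  # 6.0
print(average_linkage(cluster_1, cluster_2))   # 4.5
```

In an agglomerative procedure, the pair of clusters with the smallest such distance would be merged at each step.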
Hierarchical Agglomerative Clustering: Variance and Centroid Methods
• Variance methods generate clusters to minimize the within-cluster
variance.
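To illustrate the variance idea, the sketch below computes the within-cluster sum of squared distances to the centroid and the increase in that quantity caused by merging two clusters. Ward's procedure is commonly described as merging, at each step, the pair with the smallest such increase. The function names and data are hypothetical.

```python
def centroid(points):
    """Mean of each variable over the points in a cluster."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def within_cluster_ss(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def merge_cost(cluster_1, cluster_2):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    merged = cluster_1 + cluster_2
    return (within_cluster_ss(merged)
            - within_cluster_ss(cluster_1)
            - within_cluster_ss(cluster_2))

# Made-up clusters: a and b are close together, c is far from both.
a = [(1.0, 1.0), (1.0, 3.0)]
b = [(2.0, 2.0)]
c = [(9.0, 9.0), (9.0, 11.0)]

print(merge_cost(a, c))                     # 128.0
print(merge_cost(a, b) < merge_cost(a, c))  # True
```

Here merging a with b adds far less within-cluster variation than merging a with c, so a variance method would combine a and b first.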
[Figure: Ward’s Procedure and Centroid Method.]