
Cluster Analysis GP Seminar

Cluster analysis is used to classify objects into homogeneous groups called clusters without prior knowledge of group membership. It involves selecting variables and a distance measure, choosing a clustering procedure (hierarchical or non-hierarchical), deciding the number of clusters, and interpreting the results. Hierarchical methods include agglomerative approaches (single, complete, and average linkage), which join clusters, and divisive methods, which split them.

Uploaded by

Arnab Mukherjee

Presented By:-

SABYASACHI GHATWAL
RA 1832001010027
CLUSTER ANALYSIS
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called
clusters. Cluster analysis is also called classification analysis or numerical taxonomy. In cluster analysis, there is no
prior information about the group or cluster membership of any of the objects.

Cluster analysis has been used in marketing for various purposes. One is segmenting consumers on the basis of the
benefits they seek from the purchase of a product. Another is identifying homogeneous groups of buyers.

Cluster analysis involves formulating the problem, selecting a distance measure, selecting a clustering procedure,
deciding on the number of clusters, interpreting and profiling the clusters and, finally, assessing the validity of
the clustering. The variables on which the cluster analysis is based should be selected with past research in mind.
The selection should also be guided by theory, the hypotheses being tested, and the judgment of the researcher. An
appropriate measure of distance or similarity should be selected; the most commonly used measure is the
Euclidean distance or its square.
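
As a minimal sketch of the distance measure named above, the Euclidean distance between two objects is the square root of the sum of squared differences across the variables, and the squared version simply omits the root. The function names and the two sample respondents below are hypothetical, chosen only for illustration.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared differences between two objects, variable by variable."""
    if len(a) != len(b):
        raise ValueError("both objects need the same number of variables")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


# Hypothetical ratings of two respondents on three variables.
respondent_a = [6, 4, 7]
respondent_b = [2, 3, 1]

print(squared_euclidean(respondent_a, respondent_b))  # 16 + 1 + 36 = 53
print(euclidean(respondent_a, respondent_b))
```

Both versions rank pairs of objects identically, so the choice between them matters mainly for procedures that use the numeric value itself.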

Clustering procedures in cluster analysis may be hierarchical, non-hierarchical, or a two-step procedure. A
hierarchical procedure in cluster analysis is characterized by the development of a tree-like structure. A hierarchical
procedure can be agglomerative or divisive. Agglomerative methods in cluster analysis consist of linkage methods,
variance methods, and centroid methods. Linkage methods in cluster analysis comprise single linkage,
complete linkage, and average linkage.

The non-hierarchical methods in cluster analysis are frequently referred to as k-means clustering. The two-step
procedure can automatically determine the optimal number of clusters by comparing the values of model choice
criteria across different clustering solutions. The choice of clustering procedure and the choice of distance measure
are interrelated. The relative sizes of clusters in cluster analysis should be meaningful. The clusters should be
interpreted in terms of cluster centroids.
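
The k-means idea mentioned above can be sketched roughly as follows, assuming squared Euclidean distance and seeds supplied by the analyst. The function names, the `max_iter` parameter, and the sample points are assumptions made for this illustration, not part of the seminar material.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two objects."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Centroid: the mean of each variable over the given points."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, seeds, max_iter=100):
    """Assign points to the nearest center, then recompute centers, until stable."""
    centers = [list(seed) for seed in seeds]
    membership = None
    for _ in range(max_iter):
        new_membership = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_membership == membership:
            break  # no object changed cluster, so the solution is stable
        membership = new_membership
        for c in range(len(centers)):
            members = [p for p, m in zip(points, membership) if m == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = mean_point(members)
    return membership, centers


# Hypothetical data: two visibly separated groups on two variables.
points = [[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 9.5], [8.5, 9]]
membership, centers = k_means(points, seeds=[points[0], points[3]])
print(membership)  # [0, 0, 0, 1, 1, 1]
print(centers)
```

The returned centers are the cluster centroids in terms of which the clusters would then be interpreted.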
USES AND OBJECTIVES
• Used to classify objects (cases) into homogeneous
groups called clusters.
• Objects in each cluster tend to be similar to each other
and dissimilar to objects in the other clusters.
• Both cluster analysis and discriminant analysis are
concerned with classification.
• Discriminant analysis requires prior knowledge of
group membership.
• In cluster analysis, groups are suggested by the data.
An Ideal Clustering Situation
[Figure: scatter plot with axes Variable 1 and Variable 2]
Statistics Associated with Cluster Analysis

• Agglomeration schedule. Gives information on the objects or
cases being combined at each stage of a hierarchical clustering
process.
• Cluster centroid. Mean values of the variables for all the cases
in a particular cluster.
• Cluster centers. Initial starting points in nonhierarchical
clustering. Clusters are built around these centers, or seeds.
• Cluster membership. Indicates the cluster to which each
object or case belongs.
Statistics Associated with Cluster Analysis (continued)
• Dendrogram (a tree graph). A graphical device for displaying
clustering results.
-Vertical lines represent clusters that are joined together.
-The position of the line on the scale indicates the distance at
which clusters were joined.
• Distances between cluster centers. These distances indicate how
separated the individual pairs of clusters are. Clusters that are widely
separated are distinct, and therefore desirable.
• Icicle diagram. Another type of graphical display of clustering results.


Conducting Cluster Analysis
Formulate the Problem

Select a Distance Measure

Select a Clustering Procedure

Decide on the Number of Clusters

Interpret and Profile Clusters

Assess the Validity of Clustering


Classification of Clustering Procedures
Clustering Procedures
• Hierarchical
  -Agglomerative
    -Linkage Methods: Single Linkage, Complete Linkage, Average Linkage
    -Variance Methods: Ward’s Method
    -Centroid Methods
  -Divisive
• Nonhierarchical
  -Sequential Threshold
  -Parallel Threshold
  -Optimizing Partitioning
Hierarchical Clustering Methods
• Hierarchical clustering is characterized by the development of
a hierarchy or tree-like structure.
-Agglomerative clustering starts with each object in a
separate cluster. Clusters are formed by grouping objects into
bigger and bigger clusters.
-Divisive clustering starts with all the objects grouped in a
single cluster. Clusters are divided or split until each object is in
a separate cluster.
• Agglomerative methods are commonly used in marketing
research. They consist of linkage methods, variance methods,
and centroid methods.
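
The agglomerative process described above can be sketched as a loop that starts with one cluster per object and repeatedly merges the closest pair, recording each merge in an agglomeration schedule. This is a minimal illustration, assuming Euclidean distance and single linkage as the default rule; the function names and data are hypothetical.

```python
import math
from itertools import combinations


def single_link(points, a, b):
    """Smallest distance between a member of cluster a and a member of cluster b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, cluster_distance=single_link):
    """Merge the two closest clusters until one remains; return the schedule."""
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    schedule = []
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(
                points, clusters[pair[0]], clusters[pair[1]]
            ),
        )
        distance = cluster_distance(points, clusters[a], clusters[b])
        schedule.append((clusters[a], clusters[b], distance))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return schedule


# Hypothetical objects measured on two variables.
points = [[0, 0], [0, 1], [5, 5], [5, 6.5]]
schedule = agglomerate(points)
for first, second, distance in schedule:
    print(first, second, round(distance, 3))
```

Each row of the schedule lists the two clusters combined at that stage (as object indices) and the distance at which they were joined, which is the information a dendrogram displays graphically.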
Hierarchical Agglomerative Clustering - Linkage Method
• The single linkage method is based on the minimum
distance, or the nearest neighbor rule.
• The complete linkage method is based on the
maximum distance, or the furthest neighbor approach.
• In the average linkage method, the distance between two
clusters is defined as the average of the distances
between all pairs of objects, where one member of each
pair is from each of the clusters.
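
The three linkage rules differ only in how they summarize the distances between pairs of objects drawn one from each cluster. A minimal sketch, assuming Euclidean distance and hypothetical function names and data:

```python
import math


def pairwise_distances(cluster_1, cluster_2):
    """Distances for every pair with one object taken from each cluster."""
    return [math.dist(p, q) for p in cluster_1 for q in cluster_2]


def single_linkage(cluster_1, cluster_2):
    """Minimum distance (nearest neighbor)."""
    return min(pairwise_distances(cluster_1, cluster_2))


def complete_linkage(cluster_1, cluster_2):
    """Maximum distance (furthest neighbor)."""
    return max(pairwise_distances(cluster_1, cluster_2))


def average_linkage(cluster_1, cluster_2):
    """Average distance over all cross-cluster pairs."""
    distances = pairwise_distances(cluster_1, cluster_2)
    return sum(distances) / len(distances)


# Hypothetical clusters: the four cross-cluster distances are 4, 5, 5, and 4.
cluster_1 = [[0, 0], [0, 3]]
cluster_2 = [[4, 0], [4, 3]]

print(single_linkage(cluster_1, cluster_2))    # 4.0
print(complete_linkage(cluster_1, cluster_2))  # 5.0
print(average_linkage(cluster_1, cluster_2))   # 4.5
```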
Linkage Methods of Clustering
[Figure: three panels, each showing Cluster 1 and Cluster 2: Single Linkage (minimum distance), Complete Linkage (maximum distance), Average Linkage (average distance)]
Hierarchical Agglomerative Clustering - Variance and Centroid Method
• Variance methods generate clusters to minimize the within-cluster
variance.
• Ward's procedure is commonly used. For each cluster, the sum of
squared distances from each object to the cluster mean is calculated.
At each stage, the two clusters whose merger gives the smallest increase
in the overall within-cluster sum of squares are combined.
• In the centroid methods, the distance between two clusters is the
distance between their centroids (means for all the variables).
• Of the hierarchical methods, average linkage and Ward's method have
been shown to perform better than the other procedures.
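
To make the two criteria concrete, the sketch below computes the increase in the within-cluster sum of squares that a merger would cause (the quantity Ward's procedure minimizes at each stage) and the distance between two cluster centroids. It assumes squared Euclidean distance to the cluster mean; the function names and data are hypothetical.

```python
import math


def centroid(cluster):
    """Mean of each variable over the objects in the cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def within_sum_of_squares(cluster):
    """Sum of squared distances from each object to the cluster centroid."""
    center = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(obj, center)) for obj in cluster)


def ward_increase(cluster_1, cluster_2):
    """Increase in the overall within-cluster sum of squares if the two merge."""
    return (
        within_sum_of_squares(cluster_1 + cluster_2)
        - within_sum_of_squares(cluster_1)
        - within_sum_of_squares(cluster_2)
    )


def centroid_distance(cluster_1, cluster_2):
    """Centroid method: distance between the two cluster centroids."""
    return math.dist(centroid(cluster_1), centroid(cluster_2))


# Hypothetical clusters with centroids at (1, 0) and (11, 0).
cluster_1 = [[0, 0], [2, 0]]
cluster_2 = [[10, 0], [12, 0]]

print(ward_increase(cluster_1, cluster_2))      # 104 - 2 - 2 = 100.0
print(centroid_distance(cluster_1, cluster_2))  # 10.0
```

In an agglomerative run, `ward_increase` would be evaluated for every candidate pair and the pair with the smallest value merged.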
Other Agglomerative Clustering Methods
[Figure: two panels: Ward’s Procedure, Centroid Method]
