Business Research: Cluster Analysis

Cluster analysis is a statistical technique used to group objects based on their similarities. It assigns observations to clusters so that objects within each cluster are homogeneous and distinct from objects in other clusters. There are several methods of cluster analysis including hierarchical, non-hierarchical, and dimensionalizing. Hierarchical methods involve linking objects into tree-like groups, non-hierarchical methods assign objects to partitions, and dimensionalizing methods represent objects by their factor scores. Common hierarchical clustering techniques include single linkage, complete linkage, average linkage, centroid linkage, and Ward's method. Cluster analysis involves calculating distances or similarities between objects and grouping them together iteratively.

Uploaded by

popat vishal
Copyright
© All Rights Reserved

Business research

Cluster analysis

CLUSTER ANALYSIS
Introduction
Cluster analysis is the name given to a bewildering assortment of techniques designed to perform classification by assigning observations to groups so that each group is more or less homogeneous and distinct from the others. Given the multivariate nature of the data, the researcher is posed with the problem of identifying natural groupings of the objects. Cluster analysis deals with the process of assigning objects to groups so that similarity within groups and difference among groups are preserved. Cluster analysis is a pre-classificatory method, where groups of objects are formed on the basis of profile resemblance in the data matrix itself. Many of these procedures are relatively simple but are usually not supported by an extensive body of statistical reasoning.

Meaning and definition
Cluster analysis is a class of statistical techniques that can be applied to data that exhibit groupings. Cluster analysis classifies a set of observations into two or more mutually exclusive groups based on a combination of interval variables.

Methods of cluster analysis
Having decided on the similarity measure or coefficient, the researcher may draw upon a variety of clustering programmes, which can be grouped under the following three categories:

(1) Dimensionalising methods
(2) Nonhierarchical methods
(a) Sequential threshold
(b) Parallel threshold
(c) Partitioning method
(3) Hierarchical methods
(a) Single linkage or minimum distance
(b) Complete linkage
(c) Average linkage
(d) Centroid method
(e) Median method
(f) Ward's method

(1) Dimensionalising methods
These approaches use principal components or other factor analysis methods to find a dimensional representation of points from inter-object association measures. Clusters are then developed based on grouping their component scores.
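
As a rough illustration of the dimensionalising idea, the sketch below computes principal-component scores with NumPy and then groups objects by the sign of their first score. The data matrix, the function name `component_scores`, and the sign-based grouping rule are all assumptions made for this example; the text does not prescribe a particular grouping rule.

```python
import numpy as np

def component_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                  # centre each variable
    # SVD of the centred data: the rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical data: six objects measured on three variables
X = np.array([
    [1.0, 1.1, 0.9],
    [1.2, 0.9, 1.0],
    [0.9, 1.0, 1.1],
    [5.0, 5.2, 4.9],
    [5.1, 4.8, 5.0],
    [4.9, 5.0, 5.1],
])

scores = component_scores(X, n_components=1)
# Illustrative grouping rule (an assumption): split on the sign of the score
groups = (scores[:, 0] > 0).astype(int)
print(groups)
```

The sign of a principal axis is arbitrary, so which group is labelled 0 and which 1 may differ between runs or libraries; only the split itself is meaningful.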

(2) Nonhierarchical methods
These methods, based on the proximity matrix, fall into three categories:

(a) a sequential threshold method, which develops clusters one by one, successively determining cluster centers;
(b) a parallel threshold method, which develops several clusters simultaneously; and
(c) a partitioning method, where the clusters are formed on the basis of optimizing some overall criterion measure for a given number of clusters.
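
The sequential threshold idea can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, seeds each new cluster with the first object that is still unclustered, and uses a hypothetical `threshold` parameter to decide membership.

```python
import numpy as np

def sequential_threshold(X, threshold):
    """Form clusters one at a time around successive centres."""
    labels = np.full(len(X), -1)             # -1 means "not yet clustered"
    cluster = 0
    while (labels == -1).any():
        free = np.flatnonzero(labels == -1)
        centre = X[free[0]]                  # assumption: first free object is the centre
        dist = np.linalg.norm(X[free] - centre, axis=1)
        # Every free object within the threshold joins the current cluster
        labels[free[dist <= threshold]] = cluster
        cluster += 1
    return labels

# Hypothetical data: two tight pairs and one isolated object
X = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.2, 4.9], [10.0, 0.0]])
labels = sequential_threshold(X, threshold=1.0)
print(labels)  # [0 0 1 1 2]
```

A parallel threshold variant would pick several centres at once and assign each object to the nearest centre within the threshold.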

(3) Hierarchical methods

In these procedures, a hierarchy or tree-like structure is constructed, starting from each point as a cluster. At the next level the two closest points are placed in a cluster. At the following level, a third point joins the first two, or else a second two-point cluster is formed, based on various criterion functions for assignment. Eventually, all points are grouped into one large cluster.

(a) Single linkage or minimum distance

This rule finds the two points with the shortest Euclidean distance. These are placed in the first cluster. Then the third point with the shortest distance to the members of the cluster (smaller than the distance between the two closest unclustered points) joins this cluster. Otherwise the two closest unclustered points are placed in a cluster.
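
A small sketch of this rule, assuming Euclidean distance and a naive search over all cluster pairs; the function name `single_linkage` and the sample points are made up for illustration. It records each merge so the tree-like structure can be read off.

```python
import numpy as np

def single_linkage(X):
    """Return the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(X))]  # start: every point is its own cluster
    # Pairwise Euclidean distances between all points
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Hypothetical one-dimensional data
X = np.array([[0.0], [1.0], [2.5], [6.0], [7.0]])
merges = single_linkage(X)
for left, right, dist in merges:
    print(left, right, dist)
```

On this data the second merge joins two unclustered points (3 and 4), while the third merge adds point 2 to the existing cluster of points 0 and 1, which are the two cases the rule distinguishes.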

(b) Complete linkage

This also starts in a similar way to single linkage, but the criterion for joining points to clusters or clusters-

(c) Average linkage

This rule is similar to the previous rules; however, the distance between two clusters is the average distance from points in the first cluster to the points in the second cluster.
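
To compare the linkage rules side by side, the sketch below computes the distance between two clusters under each rule. It assumes Euclidean distance and the usual textbook definition of complete linkage (the largest pairwise distance), which is not spelled out in the text above; the function names and data are hypothetical.

```python
import numpy as np

def pairwise(A, B):
    """Euclidean distances between every point in A and every point in B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def linkage_distances(A, B):
    D = pairwise(A, B)
    return {
        "single": float(D.min()),     # closest pair of points
        "complete": float(D.max()),   # farthest pair (assumed standard definition)
        "average": float(D.mean()),   # mean over all cross-cluster pairs
    }

# Hypothetical clusters on a line
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])
d = linkage_distances(A, B)
print(d)  # {'single': 3.0, 'complete': 6.0, 'average': 4.5}
```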

(d) Centroid method

The two clusters are joined for which the distance between the two centroids (points with mean values on each clustering variable) is smallest.

(e) Median method

This is the same as the centroid method, except that when two clusters are joined, the centroid of the new cluster is computed giving equal weight to the two component clusters.
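
The difference between the centroid and median methods shows up when the two clusters have unequal sizes. A minimal sketch, with a hypothetical helper `merged_centre` and made-up cluster sizes:

```python
import numpy as np

def merged_centre(c_a, n_a, c_b, n_b, method="centroid"):
    """Centre of the cluster formed by joining clusters a and b."""
    if method == "centroid":
        # Weighted by cluster size: the true mean of all member points
        return (n_a * c_a + n_b * c_b) / (n_a + n_b)
    if method == "median":
        # Equal weight to the two component clusters, regardless of size
        return (c_a + c_b) / 2.0
    raise ValueError("method must be 'centroid' or 'median'")

# Hypothetical clusters: three points centred at (0, 0), one point at (4, 0)
c_a, n_a = np.array([0.0, 0.0]), 3
c_b, n_b = np.array([4.0, 0.0]), 1

centroid = merged_centre(c_a, n_a, c_b, n_b, "centroid")
median = merged_centre(c_a, n_a, c_b, n_b, "median")
print(centroid)  # [1. 0.]
print(median)    # [2. 0.]
```

With equal cluster sizes the two methods give the same centre.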

(f) Ward's method

The two clusters are joined that yield the smallest

Steps of cluster analysis
1. The largest off-diagonal element in the correlation matrix (the highest correlation between two variables) gives two variables to form the nucleus of the cluster.
2. Each of the remaining variables is added to the cluster in turn and the b coefficient for the cluster with that variable included is calculated.
3. The variable whose inclusion yields the highest b coefficient for the new cluster (of three variables) is added to the cluster.
4. Steps 2 and 3 are repeated for a fourth variable, adding to the cluster the variable that yields the highest b coefficient.
5. Continue adding variables by the above procedure until there is a sharp drop in the b coefficient or until the b coefficient falls below some predetermined value. What constitutes a sharp drop or a minimum acceptable value
6. When the decision is reached that the first cluster is complete, a new cluster may be started by searching among the variables that have not been clustered for the most highly correlated pair and proceeding as above, being careful not to include already clustered variables in the new cluster.
7. Variables are added to the second cluster until the b coefficient for that cluster becomes too low.
8. Additional clusters may be formed from among the remaining variables until all variables have been placed in one or another cluster or until there is no pair of variables remaining that yields a satisfactory b coefficient, at which point clustering is complete.
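
The steps above can be sketched in code under some loudly stated assumptions. The text does not define the b coefficient, so this sketch assumes the common definition: the mean correlation among variables inside the cluster divided by the mean correlation between cluster variables and all variables outside it. The cutoff `min_b=2.0`, the correlation matrix, and all function names are hypothetical. As a simplification, the sketch starts a new cluster from any remaining pair without first checking that the pair's own b coefficient is satisfactory.

```python
import numpy as np

def b_coefficient(R, members):
    """Assumed definition: mean correlation within the cluster divided by
    the mean correlation between cluster variables and all other variables."""
    members = list(members)
    others = [j for j in range(len(R)) if j not in members]
    inside = R[np.ix_(members, members)]
    # Off-diagonal correlations only (upper triangle above the diagonal)
    within = inside[np.triu_indices(len(members), k=1)].mean()
    between = R[np.ix_(members, others)].mean()
    return within / between

def grow_cluster(R, available, min_b):
    """Steps 1-5 for one cluster, drawing only on the variables in `available`."""
    available = sorted(available)
    sub = R[np.ix_(available, available)].copy()
    np.fill_diagonal(sub, -np.inf)           # ignore the diagonal
    i, j = np.unravel_index(np.argmax(sub), sub.shape)
    cluster = [available[i], available[j]]   # step 1: most highly correlated pair
    candidates = [v for v in available if v not in cluster]
    # Keep at least one variable outside so the b coefficient stays defined
    while candidates and len(cluster) + 1 < len(R):
        # Steps 2-3: try each remaining variable and keep the best one
        best_b, best_v = max((b_coefficient(R, cluster + [v]), v) for v in candidates)
        if best_b < min_b:                   # step 5: b coefficient too low, stop
            break
        cluster.append(best_v)
        candidates.remove(best_v)
    return sorted(cluster)

def cluster_variables(R, min_b=2.0):
    """Steps 6-8: repeat on the variables that have not been clustered yet."""
    remaining = list(range(len(R)))
    clusters = []
    while len(remaining) >= 2:
        cluster = grow_cluster(R, remaining, min_b)
        clusters.append(cluster)
        remaining = [v for v in remaining if v not in cluster]
    return clusters, remaining

# Hypothetical correlation matrix for five variables
R = np.array([
    [1.00, 0.80, 0.70, 0.10, 0.20],
    [0.80, 1.00, 0.75, 0.15, 0.10],
    [0.70, 0.75, 1.00, 0.20, 0.10],
    [0.10, 0.15, 0.20, 1.00, 0.60],
    [0.20, 0.10, 0.10, 0.60, 1.00],
])
clusters, leftover = cluster_variables(R, min_b=2.0)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

In this example the b coefficient is about 5.3 when variable 2 joins the nucleus (0, 1) and falls to about 1.8 for the best fourth candidate, which is the kind of sharp drop step 5 refers to.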
