Performance Measures On Cluster Analysis
the computational complexity using a parallel programming schedule, and applied it to both global and local graph clustering. They have analysed simulated and real graph data.

Xin et al. (2016) have discussed random walk based methods that use a Markov chain model to analyse the graph. In this model, vertices and edges represent the states and the transitions between states, respectively, and the graph structure determines the transition probabilities.

Chung and Kempton (2013) have analysed big graph data, for which clustering becomes more challenging. There, researchers are interested in finding the cluster that contains a given seed vertex; this is known as the "local clustering problem".

Spielman and Teng (2008) have used graph conductance as the fitness function.

Andersen et al. (2007) have introduced random walks to find important vertices around the seed vertex. Random walk methods have gained great attention for local graph clustering problems, since the walk starts from the seed vertex.

In this paper, different concepts are analysed in the light of graph clustering. First, quality metrics are identified from mathematical measurements obtained through a stochastic random walk model. The similarity of clusters is then discussed and computed using the number of edges connecting any two clusters that belong to different regions. Finally, the strength of the cluster structure is obtained, and the probabilities that a pair of vertices belongs to the same cluster or to different clusters are computed.

III. QUALITY METRICS ON CLUSTERS

In a network, a cluster is defined as a set of densely connected vertices which are also connected to other clusters in the graph. A variety of metrics evaluate the quality of a clustering based on intra-cluster and inter-cluster densities. Of these, this study considers only three: modularity, conductance and coverage. Here we discuss these three metrics mathematically, along with a real-life example.
And $a_i$ is the probability that an edge is either an intra-cluster edge in cluster $C_i$ or an inter-cluster edge incident on cluster $C_i$.

In other words, $e_{ii}$ and $a_i$ can be stated empirically as follows.

$e_{ii}$ is the ratio of the number of forward and backward edges within cluster $C_i$ to the total number of edges in the graph.

$a_i$ is the ratio of the number of forward and backward edges within cluster $C_i$, plus the number of inter-cluster edges linking $C_i$ with neighbouring clusters, to the total number of edges in the graph.

i.e.,

$$\phi(G) = 1 - \frac{1}{n}\sum_{i=1}^{n}\phi(C_i) \qquad (4)$$

It is observed that the conductance of a graph always ranges from 0 to 1. There are different ways to define the conductance of a well-clustered graph; here, we use only the inter-cluster conductance, whereas for coverage the intra-cluster conductance is adopted. Even in recent work, the definition of conductance emphasizes the notion of inter-cluster sparsity but does not capture intra-cluster density. Almeida et al. (2011) have discussed measures of conductance for inter-cluster as well as intra-cluster structure.

C. Coverage

Coverage is defined as the ratio of the number of intra-cluster edges in the graph to the total number of edges in the graph. It is mathematically stated as

$$C = \frac{\sum_{i=1}^{n} |E(C_i)|}{|E|} \qquad (5)$$

where $|E(C_i)|$ is the number of edges with both endpoints in cluster $C_i$ and $|E|$ is the total number of edges in $G$.

Fig. 1: Drunken random walk
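As a rough illustration, the three metrics can be computed from an edge list and a partition of the vertices, as in the Python sketch below. It rests on assumptions that should be checked against the full paper. Expression (1) is not reproduced in this excerpt, so modularity is assumed to take the standard Newman–Girvan form $Q = \sum_i (e_{ii} - a_i^2)$, with $e_{ii}$ and $a_i$ counted as described above. The per-cluster conductance $\phi(C_i)$ is assumed to be the usual cut size of $C_i$ divided by the smaller of its volume and the volume of the rest of the graph. Expression (4) and the coverage ratio are taken from the text. All function names and the toy graph are hypothetical.

```python
def cluster_index(partition):
    """Map each vertex to the index of the cluster that contains it."""
    return {v: i for i, cluster in enumerate(partition) for v in cluster}


def modularity(edges, partition):
    """ASSUMED Newman-Girvan form (expression (1) not shown): Q = sum_i (e_ii - a_i**2)."""
    label = cluster_index(partition)
    m = len(edges)
    intra = [0] * len(partition)  # edges with both endpoints in C_i
    ends = [0] * len(partition)   # edge endpoints falling in C_i (degree sum of C_i)
    for u, v in edges:
        ends[label[u]] += 1
        ends[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    # e_ii: forward + backward intra-cluster edges over all 2m directed edges
    # a_i:  those plus the inter-cluster edges incident on C_i, over 2m
    return sum(2 * intra[i] / (2 * m) - (ends[i] / (2 * m)) ** 2
               for i in range(len(partition)))


def cluster_conductance(edges, partition, i):
    """ASSUMED standard definition: cut(C_i) / min(vol(C_i), vol(rest of graph))."""
    label = cluster_index(partition)
    cut = vol_in = 0
    for u, v in edges:
        in_u, in_v = label[u] == i, label[v] == i
        vol_in += in_u + in_v  # degree contributed to C_i
        cut += in_u != in_v    # edge leaving C_i
    vol_out = 2 * len(edges) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0  # hypothetical convention for degenerate clusters


def graph_conductance(edges, partition):
    """Expression (4): phi(G) = 1 - (1/n) * sum_i phi(C_i)."""
    n = len(partition)
    return 1 - sum(cluster_conductance(edges, partition, i) for i in range(n)) / n


def coverage(edges, partition):
    """Intra-cluster edges divided by all edges, as in the Coverage definition."""
    label = cluster_index(partition)
    return sum(label[u] == label[v] for u, v in edges) / len(edges)


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
PARTITION = [{0, 1, 2}, {3, 4, 5}]

print("Modularity  ", round(modularity(EDGES, PARTITION), 5))
print("Conductance ", round(graph_conductance(EDGES, PARTITION), 5))
print("Coverage    ", round(coverage(EDGES, PARTITION), 5))
```

For this toy graph the sketch gives $Q = 5/14 \approx 0.357$, and both the conductance of expression (4) and the coverage equal $6/7 \approx 0.857$. These toy values are unrelated to the Fig. 1 results.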
Fig. 1 shows that the intended path of a drunkard from the origin to the destination is a straight line. Because of his unsteadiness, however, he forms various junctions (vertices) and edges by walking in a zigzag in different directions. These vertices and edges form a graph with some clusters. Using these network components of Fig. 1 and expressions (1), (4) and (5), the quality metrics, namely modularity, conductance and coverage, are obtained and presented here.

Modularity, m = 0.53482
Conductance, φ(G) = 0.66785
Coverage, C = 0.83333

IV. SIMILARITY OF CLUSTERS

Cluster analysis has been used as a general term for the techniques that deal with this problem. Selecting the clusters, and the vertices of clusters that belong to different regions, is a demanding task. The nature of clusters and their vertices, along with their edges, has been studied according to their closeness and other characteristics. Here, the similarity of clusters is analysed with the help of vertices and edges.

Consider a clustering problem formulated as a triplet (X, Y, m).

Here X is a set of N objects to be clustered, $X = \{X_1, X_2, X_3, \ldots, X_N\}$.

Y is a specific partitioning of these objects into k disjoint sets, $Y = \{Y_1, Y_2, Y_3, \ldots, Y_k\}$.

the choice between a fast bad method and a slow good method requires a special quantification that classifies a method as either good or bad.

Assessing the performance of a clustering method requires comparing its results either with standard results or with the results of another method. The purpose of this comparison is to measure the similarity between clusterings. One related classification problem is discriminant analysis, which provides a correct classification against others.

In clustering situations, there is no proper methodology for measuring clusterings and their similarity; nevertheless, comparing two clusterings is a suitable method. For clustering methods, the similarity measure is motivated by the following three considerations, which form the basis of the clustering problem.

Clustering is discrete: every point is unequivocally assigned to a particular cluster.
Clusters are defined just as much by the points they do not contain as by the points they do contain.
All points are given equal importance in the determination of the clustering.

These three considerations imply that the basic unit of comparison between two clusterings is how pairs of points are clustered. If the two elements of a point-pair are placed together in a cluster in each of the two clusterings, or are assigned to different clusters in both clusterings, this indicates a similarity between the clusterings.
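The pair-counting idea above can be sketched as follows: for every point-pair, check whether the two clusterings treat it alike (together in both, or apart in both), and normalize the number of such agreements by the total number of point-pairs. This is a minimal Python illustration; representing each clustering as a list of per-point cluster labels, and the function name `pair_agreement`, are assumptions made for the example.

```python
from itertools import combinations


def pair_agreement(labels_y, labels_y2):
    """Share of point-pairs that Y and Y' treat alike (together in both or apart in both).

    labels_y[k] and labels_y2[k] are the cluster labels of point k under Y and Y'.
    """
    if len(labels_y) != len(labels_y2):
        raise ValueError("both clusterings must label the same points")
    if len(labels_y) < 2:
        raise ValueError("need at least two points to form a pair")
    agree = total = 0
    for a, b in combinations(range(len(labels_y)), 2):
        together_in_y = labels_y[a] == labels_y[b]
        together_in_y2 = labels_y2[a] == labels_y2[b]
        agree += together_in_y == together_in_y2  # similar assignment of this pair
        total += 1
    return agree / total


# Hypothetical example: Y = {0,1,2},{3,4,5} and Y' = {0,1},{2,3},{4,5}
print(pair_agreement([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

In this example, 10 of the 15 point-pairs agree (2 are together in both clusterings and 8 are apart in both), so the sketch returns 2/3. Enumerating every pair costs time quadratic in N, which is acceptable for small illustrative graphs.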
of similar assignments of point-pairs, normalized by the total number of point-pairs. To estimate the similarity, a mathematical form is established as follows.

Let N be the total number of vertices, and let $n_{ij}$ ($i = 1, 2, \ldots, n_1$; $j = 1, 2, \ldots, n_2$) be the number of vertices that lie simultaneously in the $i$th cluster of $Y$ and the $j$th cluster of $Y'$.

It can alternatively be read as the number of edges linking the vertices of the $i$th cluster with the vertices of the $j$th cluster.

Under these assumptions, the similarity between $Y$ and $Y'$ is defined as

$$C(Y, Y') = \frac{\binom{N}{2} - \left[ \frac{1}{2}\left\{ \sum_{i=1}^{n_1}\Big(\sum_{j=1}^{n_2} n_{ij}\Big)^2 + \sum_{j=1}^{n_2}\Big(\sum_{i=1}^{n_1} n_{ij}\Big)^2 \right\} - \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} n_{ij}^2 \right]}{\binom{N}{2}} \qquad (6)$$

other hand, for a large number of edges, there is null similarity.

V. STRENGTH OF GRAPH CLUSTERING

Graph clustering is a challenging and cumbersome task. During the last decades, many graph clustering algorithms have been proposed. Criterion-based approaches try to optimize clustering fitness functions by applying optimization techniques. The size of graph data has grown rapidly; in general, a social network graph contains a very large number of vertices and edges. Processing such data is challenging and time-consuming, since these graphs are also heterogeneous.
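Expression (6) has the form of Rand's (1971) index: the bracketed term counts the point-pairs on which $Y$ and $Y'$ disagree, so the whole ratio is the share of agreeing pairs, obtained from the counts $n_{ij}$ alone. The Python sketch below assumes $n_{ij}$ is read as the number of shared vertices (the first reading given above) and that each clustering is supplied as a list of per-point cluster labels. The function name `similarity` is hypothetical.

```python
from collections import Counter
from math import comb


def similarity(labels_y, labels_y2):
    """Expression (6): C(Y, Y') from n_ij = points shared by cluster i of Y and cluster j of Y'."""
    if len(labels_y) != len(labels_y2) or len(labels_y) < 2:
        raise ValueError("need two labelings of the same N >= 2 points")
    n = Counter(zip(labels_y, labels_y2))  # n[(i, j)] = n_ij
    row = Counter()                        # row[i] = sum over j of n_ij
    col = Counter()                        # col[j] = sum over i of n_ij
    for (i, j), n_ij in n.items():
        row[i] += n_ij
        col[j] += n_ij
    total_pairs = comb(len(labels_y), 2)   # binom(N, 2)
    # Bracketed term of (6): number of point-pairs on which Y and Y' disagree.
    # Each sum of squares has the same parity as N, so halving their sum is exact.
    squares = sum(r * r for r in row.values()) + sum(c * c for c in col.values())
    disagree = squares // 2 - sum(v * v for v in n.values())
    return (total_pairs - disagree) / total_pairs


# Hypothetical example: Y = {0,1,2},{3,4,5} and Y' = {0,1},{2,3},{4,5}
print(similarity([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

For the example, the row sums are 3 and 3, the column sums are 2, 2 and 2, and the squared cell counts add to 10, so the bracketed term is $\frac{1}{2}(18 + 12) - 10 = 5$ and $C(Y, Y') = (15 - 5)/15 = 2/3$. Working from the counts avoids enumerating all $\binom{N}{2}$ pairs.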
Expression (8) is useful for generating graphs with c clusters, each containing an equal number of vertices. Consider a pair of vertices that belong either to the same cluster or to different clusters. Let P(SC) and P(DC) be the probabilities that the given pair of vertices belongs to the same cluster and to different clusters, respectively.

To illustrate this model, consider a graph with three clusters of six vertices each, since constructing and analysing a graph with a large number of vertices and edges is very tedious. The number of edges from each vertex of a cluster to the vertices of other clusters is taken to be equal in each case. To estimate the strength of the cluster structure and the probabilities, this equal number of edges is taken as 3, 4, 5 and 6.

The strength of the cluster structure, computed using equation (7), and the probabilities from equations (8), (9) and (10) are presented in Table 2.

Table 2
Number of edges (q) | Strength | P(SC) | P(DC)

The computational results show that, as the number of edges increases, the strength of the cluster structure decreases, while the probability of edges connecting vertices of different clusters increases rapidly. On the other hand, the probability for edges connecting vertices of the same cluster remains the same.

VI. CONCLUSION AND DISCUSSION