Community Detection Using Statistically Significant Subgraph Mining
Introduction
Structured patterns in graphs are studied to understand the intrinsic
characteristics of the scientific data the graphs represent. Graph data has been
growing significantly in various commercial and scientific applications, which has
led to increasing research into patterns within graphs. For instance, frequent
pattern mining in drug development data reveals substructures in chemical
compounds that are medically effective, giving a better understanding of the
data [13]. Most of these applications depend on finding significant substructures
in the form of subgraphs, and statistically significant subgraph mining is precisely
the problem of extracting these significant substructures from graph data.
Considering the existing work on community detection and significant subgraph
mining, we explore the possibility of using statistically significant subgraph
mining [14] to find communities in a graph. We have analysed the results of a few
existing approaches and of our approach on several real-world datasets. We have
also explored labelling methods that can be used while finding significant
subgraphs, so that the subgraphs found can serve as candidate communities.
Motivation
Just as there is a wide variety of group organizations in society, online
communities and virtual groups have formed on the internet. Studies of social
communities have long provided an enormous amount of insight into the behaviour
of organizations and structures in general. In networked systems ranging from
biology, politics, and economics to computer science and engineering, similar
community structures exist.
Previous Work
The problem of detecting communities in various types of graphs has been studied
in computer science for quite a long time, and various algorithms using different
techniques have been proposed. Initially, attempts were made to solve the problem
using graph partitioning. Most variants of these algorithms do not run in
polynomial time; several algorithms achieve better complexity, but their
solutions are far from optimal (Pothen, 1997) [1]. One of the earliest but still
frequently used methods in the field is the spectral bisection method (Barnes,
1982) [17]. Several algorithms use maximum flow in graphs, such as the algorithm
proposed by Goldberg and Tarjan (Goldberg and Tarjan, 1988) [2], which takes
O(n³) time. Another variant was proposed by Flake et al. (Flake et al., 2002) [3].
There are also hierarchical clustering algorithms, but they are not scalable.
Some authors have proposed extensions of k-means clustering (Schenker et al.,
2003 [5]; Hlaoui and Wang, 2004 [4]). In partitional clustering, the number of
clusters must be specified in advance, which is generally not known. Shi and
Malik proposed a method based on unnormalized spectral clustering (CVPR 97) [6],
and Ng et al. proposed a normalized spectral clustering technique to solve the
community detection problem [7].
Girvan and Newman proposed a divisive algorithm (Girvan and Newman, 2002; Newman
and Girvan, 2004) [8]. The method is historically important because it opened the
field of community detection to physicists. The complete calculation requires
O(n³) time on a sparse graph. Pinney and Westhead extended the Girvan–Newman
algorithm to find overlapping communities in a graph [9].
Some algorithms detect overlapping communities, where a node can be part of more
than one community. The most popular work in this area is the Clique Percolation
Method (CPM) by Palla et al. (Palla et al., 2005) [10].
4 Our Work

4.1 Statistically Significant Subgraphs and Communities
If a node lies at an abnormal distance from the other nodes in a random sample
taken from a population, it is called an outlier. If we choose the node degree as
the labelling method, the outliers are the vertices with the smallest and the
largest degrees. After mean adjustment and normalization by the standard
deviation, the node labels follow a normal distribution.
In general, a graph has a higher concentration of vertices with small degrees
than with large degrees, so with respect to the standard normal distribution, the
vertex with the largest degree is an outlier. A contiguous region of outliers
with respect to the null hypothesis is defined as statistically significant [14].
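As a small illustration of this labelling, the sketch below computes degree
z-scores on a toy graph and flags the high-degree hub as an outlier. The graph
itself and the |z| > 2 cutoff are our own illustrative assumptions, not part of
the algorithm in [14]:

```python
from statistics import mean, pstdev

# Toy undirected graph as an adjacency list; the star-like hub 0 is a
# high-degree outlier. The structure is illustrative only.
graph = {
    0: [1, 2, 3, 4, 5, 6],
    1: [0, 2], 2: [0, 1], 3: [0, 4],
    4: [0, 3], 5: [0], 6: [0],
}

degrees = {v: len(nbrs) for v, nbrs in graph.items()}
mu = mean(degrees.values())
sigma = pstdev(degrees.values())

# Mean adjustment followed by standard-deviation normalization, as
# described above: labels become z-scores.
z = {v: (d - mu) / sigma for v, d in degrees.items()}

# Flag vertices that are extreme under the (assumed) normal null.
outliers = [v for v, s in z.items() if abs(s) > 2.0]
print(outliers)
```

Only the hub survives the cutoff; the many small-degree vertices sit well within
one standard deviation of the mean.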
There is no hard and fast rule or lemma establishing a direct relation between
statistically significant subgraphs and dense subgraphs. Intuitively, however,
the statistically significant subgraphs obtained can be close to the densest
subgraphs. As described above, the vertices with extreme degrees (the outliers)
appear in the statistically significant subgraphs, and the same vertices, the
ones with the largest degrees, are expected to appear in the densest subgraph.
Applying the same unfolding logic, the statistically significant subgraphs and
the densest subgraphs we obtain are highly likely to be close.
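For the densest-subgraph side of this comparison, Charikar's greedy peeling
algorithm [16] gives a 2-approximation to the subgraph maximizing edge density.
A minimal sketch on an adjacency-set graph (not the exact implementation we ran)
might look like:

```python
def densest_subgraph(graph):
    """Charikar's greedy 2-approximation: repeatedly remove the
    minimum-degree vertex and keep the prefix with the best density
    (edges / vertices)."""
    # Work on mutable copies of the adjacency sets.
    adj = {v: set(nbrs) for v, nbrs in graph.items()}
    edges = sum(len(n) for n in adj.values()) // 2
    best_density, best_set = 0.0, set(adj)
    while adj:
        density = edges / len(adj)
        if density >= best_density:
            best_density, best_set = density, set(adj)
        v = min(adj, key=lambda u: len(adj[u]))  # min-degree vertex
        edges -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return best_set, best_density
```

On a graph consisting of a 4-clique with one pendant vertex attached, the peeling
discards the pendant first and reports the clique, whose density of 6/4 beats the
whole graph's 7/5.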
Kumar et al. [15] defined web communities as dense bipartite subgraphs. Their
hypothesis suggested that any topically focused community on the web will most
likely contain a dense bipartite subgraph, and that almost every occurrence of
such a subgraph corresponds to a web community. This gives us sufficient reason
to try statistically significant subgraphs for finding communities.
4.2 Importance of Labelling
In most applications, the nature of the graph is such that every node must be
assigned a label [14]. For example, in a communication network, labelling schemes
represent the amount of traffic at a particular node. Similarly, in biological
networks, biochemical entities such as genes and molecules are represented by
labels on the nodes. There are many other applications where labelling is
important, including coding theory, circuit design, X-ray crystallography,
database management, radar, and astronomy.
4.3 Labelling Methods
The algorithm we used [14] works with discrete and continuous node labels to find
statistically significant subgraphs. We employed the following labelling methods:

- Degree of the node as a discrete label: the degree of each node is taken and
  then quantized into at most a threshold number of label values.
- PageRank of the node as a discrete label: the PageRank of each node is
  calculated and then quantized into at most a threshold number of label values.
- Degree of the node as a continuous label: the degree of each node is taken,
  mean-adjusted, and normalized by the standard deviation, giving a distribution
  with mean zero and standard deviation one.
- PageRank of the node as a continuous label: the PageRank of each node is
  calculated, mean-adjusted, and normalized by the standard deviation, giving a
  distribution with mean zero and standard deviation one.
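The four schemes above can be sketched in Python. The bin count, the
power-iteration PageRank, and the helper names below are illustrative
assumptions, not the exact implementation used in [14]:

```python
from statistics import mean, pstdev

def pagerank(graph, damping=0.85, iters=50):
    """Plain power-iteration PageRank on an adjacency-list graph.
    Assumes every node has at least one neighbour (no dangling mass)."""
    n = len(graph)
    pr = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in graph}
        for v, nbrs in graph.items():
            share = damping * pr[v] / len(nbrs) if nbrs else 0.0
            for u in nbrs:
                new[u] += share
        pr = new
    return pr

def discrete_labels(values, num_bins):
    """Quantize raw scores into at most num_bins integer labels."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / num_bins or 1.0  # guard against constant input
    return {v: min(int((x - lo) / width), num_bins - 1)
            for v, x in values.items()}

def continuous_labels(values):
    """Mean adjustment + standard-deviation normalization (z-scores).
    Assumes the values are not all equal."""
    mu, sigma = mean(values.values()), pstdev(values.values())
    return {v: (x - mu) / sigma for v, x in values.items()}
```

The same quantization and normalization helpers apply to either score, so the
choice of degree versus PageRank is just the choice of the `values` dictionary
passed in.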
4.4 Description of Work

5 Datasets

6 Results
As mentioned in the last section, we have primarily used three datasets for
testing and experimentation. On these three datasets, we compare the results of
the following approaches in various scenarios.
We compare the algorithm we have worked on with the densest subgraph algorithm
by checking how close the results of each are to the ground truth communities
available to us. To measure this closeness, we use the F-measure (computed from
precision and recall) and the Jaccard similarity coefficient.
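Both closeness measures operate on node sets; a minimal sketch (the set-based
signatures are our own convention):

```python
def f_score(pred, truth):
    """F-measure (harmonic mean of precision and recall) between a
    predicted community and a ground-truth community, both node sets."""
    tp = len(pred & truth)  # correctly recovered nodes
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def jaccard(pred, truth):
    """Jaccard similarity coefficient of two node sets."""
    return len(pred & truth) / len(pred | truth)
```

Note that a very large predicted community covering the whole ground truth scores
perfect recall but poor precision, and hence a low F-score, which is the pattern
we observe in our results.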
6.1
F-score              Facebook   Amazon   DBLP
Discrete Labeling    0.2285     0.1645   0.536

Table 2. Closeness of results to ground truth communities: F-score (NA: not
applicable, as communities are defined as the densest subgraph)
As is visible from the adjoining graph and the results table, the densest
subgraph is most similar to the ground truth communities in the case of the
Facebook dataset. For the other two datasets, we found on experimentation that
their ground truth communities are in fact the same as the densest subgraphs.
Jaccard coefficient   Discrete Labeling
Facebook              0.409
Amazon                0.896
DBLP                  0.275

F-score               Discrete Labeling
Facebook              0.786
Amazon                0.1645
DBLP                  0.536
6.2
As is visible from the adjoining graph and the results table, the results of our
algorithm with PageRank as continuous labels are the closest to the densest
subgraph, compared to degree as continuous labels or to discrete labelling. In
all cases, recall was very high, but the F-score was still low because of the
large communities produced.
One of the primary tasks of our project was to find out to what extent
statistically significant subgraphs and the ground truth communities are related
to each other. When we started experimenting with different datasets and
analyzing the results, one of our observations was that the algorithm for finding
statistically significant subgraphs had high recall but low precision. On further
analysis, we found that the reason was the large size of the communities produced
by the algorithm. Due to their large size, we were including all relevant nodes
of the communities (high recall), but we were also including many irrelevant
nodes along with the relevant ones (low precision).
The algorithm for finding the statistically significant subgraph [14] first
converts the given graph into a much smaller graph and then finds the most
statistically significant subgraph in this smaller graph. The reduction takes
place in two steps: the algorithm first creates a supergraph from the given graph
and then shrinks this supergraph until the number of nodes falls below a
threshold. The threshold determines how long the algorithm takes to find the most
significant subgraph: the higher the threshold, the larger the reduced graph and
the more time required. To report the community, we had to map these results (on
the reduced graph) back to the original graph. For the Facebook dataset, the
original graph has 221 nodes, the supergraph has 70 nodes, and the reduced graph
has 20 nodes. On this 20-node graph we found the 4-node subgraph that was most
significant among all subgraphs; mapping these 4 nodes back to the original graph
yields a community of 68 nodes. When the same algorithm is applied to very large
datasets (DBLP: 317080 nodes; Amazon: 334863 nodes), the number of nodes in the
resulting communities was very high (around 30,000), far larger than the ground
truth communities. We therefore conclude that this algorithm, in its current
form, cannot give useful results for very large datasets.
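The expansion step, mapping nodes of the reduced graph back to original-graph
nodes, amounts to transitively unfolding the containment recorded at each
shrinking pass. The two-level structure below is a hypothetical illustration of
this bookkeeping, not the paper's actual data structures:

```python
def expand(nodes, containment_levels):
    """Map reduced-graph nodes back to original-graph nodes.

    containment_levels: a list of dicts, one per reduction pass in the
    order the passes were applied; each dict maps a merged node to the
    set of lower-level nodes it absorbed. Unfolding walks the passes in
    reverse."""
    current = set(nodes)
    for level in reversed(containment_levels):
        current = set().union(*(level[v] for v in current))
    return current

# First pass merged original nodes 1,2 into supernode 's1' and 3 into
# 's2'; second pass merged both supernodes into reduced node 'r1'.
levels = [{'s1': {1, 2}, 's2': {3}}, {'r1': {'s1', 's2'}}]
print(expand({'r1'}, levels))
```

This also shows why small significant subgraphs on the reduced graph blow up
into large reported communities: every merged node drags all its constituents
back in.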
To evaluate the part of the algorithm that creates the supergraph, we built a
synthetic graph dataset: ten dense graphs connected by a small number of edges.
After running the supergraph-creation step, we found that about 17 of the 19
supernodes comprising more than one node drew all their constituent nodes from
the same community. However, on further observation, the number of constituent
nodes in each supernode was very small. This suggests that more work can be done
to improve the supergraph-creation algorithm as well. Ideally, the supergraph
would contain ten supernodes, each containing the vertices of one of the dense
subgraphs, which would then correspond to the ten communities.
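A benchmark of this shape can be generated as follows; the community count,
clique size, and inter-edge budget are illustrative parameters, not the exact
ones we used:

```python
import random

def synthetic_communities(num_comms=10, size=20, inter_edges=5, seed=0):
    """Build num_comms cliques of `size` nodes each (the planted dense
    communities), then connect them with a small number of random
    inter-community edges."""
    rng = random.Random(seed)
    graph = {}
    for c in range(num_comms):
        members = range(c * size, (c + 1) * size)
        for v in members:
            graph[v] = {u for u in members if u != v}
    # Sprinkle a few edges between different communities.
    for _ in range(inter_edges * num_comms):
        a, b = rng.sample(range(num_comms * size), 2)
        if a // size != b // size:  # skip intra-community pairs
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

Because node v belongs to community v // size by construction, checking whether
a supernode's constituents share a community is a one-line test on this dataset.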
Conclusions
We tried to eliminate the reduction part of the algorithm by cross-checking the
contents of the supernodes against the communities: if all the constituent nodes
of a supernode belong to the same community, there is no essential information
loss from the original graph to the supergraph. However, as we found, the number
of supernodes with this property was not very high either; in the Facebook
dataset, for instance, approximately 9 of 19 supernodes satisfied it. This led us
to two conclusions: either the labelling methods we used need to be improved, or
the supergraph-creation algorithm needs changes.
Future Work

In statistically significant subgraph mining, labelling techniques play a very
important role. We tried degree and PageRank as labels. Given the issues observed
with these labels, more work can be done on devising new labelling methods that
capture the essence of a community.
The algorithm for finding the significant subgraph requires the graph to be
shrunk to a small number of nodes, as the time taken on large graphs is very
high. If the algorithm itself can be made faster, the shrinking step, which led
to large communities and hence low precision alongside very high recall, could
be eliminated.
The supergraph-creation algorithm can be further improved so that it captures the
essence of the communities better, leading to a smaller number of supernodes
while ensuring that all constituent nodes of a supernode belong to the same
ground truth community.
References
1. Pothen, A., 1997, Graph Partitioning Algorithms with Applications to Scientific
Computing, Technical Report, Norfolk, VA, USA
2. Goldberg, A. V., and R. E. Tarjan, 1988, Journal of the ACM 35, 921.
3. Flake, G. W., S. Lawrence, C. Lee Giles, and F. M. Coetzee, 2002, IEEE Computer
35, 66.
4. Hlaoui, A., and S. Wang, 2004, in Neural Networks and Computational Intelligence,
pp. 158-163
5. Schenker, A., M. Last, H. Bunke, and A. Kandel, 2003, in IbPRIA 2003
6. Shi, J., and J. Malik, 1997, in CVPR 97: Proceedings of the 1997 Conference on
Computer Vision and Pattern Recognition
7. Ng, A. Y., M. I. Jordan, and Y. Weiss, 2001, in Advances in Neural Information
Processing Systems
8. Girvan, M., and M. E. J. Newman, 2002, Proc. Natl. Acad. Sci. USA 99 (12), 7821
9. Pinney, J. W., and D. R. Westhead, 2006, in Interdisciplinary Statistics and Bioinformatics (Leeds University Press, Leeds, UK), pp. 87-90
10. Palla, G., I. Derényi, I. Farkas, and T. Vicsek, 2005, Nature 435, 814.
11. J. Hu, X. Shen, Y. Shao, C. Bystroff, and M. J. Zaki. Mining protein contact
maps. In BIOKDD, 2002.
12. R. Sharan, S. Suthram, R. M. Kelley, T. Kuhn, S. McCuine, P. Uetz, T. Sittler,
R. M. Karp, and T. Ideker. Conserved patterns of protein interaction in multiple
species. In Proc Natl Acad Sci, 2005.
13. S. Kramer, L. D. Raedt, and C. Helma. Molecular feature mining in HIV data.
In KDD, 2001
14. A. Arora, M. Sachan, and A. Bhattacharya, Mining Statistically Significant
Connected Subgraphs in Vertex Labeled Graphs
15. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, Trawling the web
for emerging cyber-communities http://www8.org/w8-papers/4a-search-mining/
trawling/trawling.html
16. M. Charikar, Greedy Approximation Algorithms for Finding Dense Components
in a Graph http://link.springer.com/chapter/10.1007/3-540-44436-X_10
17. Barnes, E. R., 1982, SIAM J. Alg. Discr. Meth. 3, 541.