
TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 01/10 pp447-457
DOI: 10.26599/TST.2019.9010055
Volume 25, Number 4, August 2020

Complex Network Classification with Convolutional Neural Network

Ruyue Xin, Jiang Zhang, and Yitong Shao

Abstract: Classifying large-scale networks into several categories and distinguishing them according to their fine structures is of great importance to several real-life applications. However, most studies of complex networks focus on the properties of a single network and seldom on classification, clustering, and comparison between different networks, in which the network is treated as a whole. Conventional methods can hardly be applied to networks directly due to the non-Euclidean properties of the data. In this paper, we propose a novel framework, the Complex Network Classifier (CNC), which integrates network embedding and a convolutional neural network to tackle the problem of network classification. By training the classifier on synthetic complex network data, we show that the CNC can not only classify networks with high accuracy and robustness but can also extract the features of the networks automatically. We also compare our CNC with baseline methods on benchmark datasets, which shows that our method performs well on large-scale networks.

Key words: complex network; network classification; DeepWalk; Convolutional Neural Network (CNN)

Ruyue Xin and Jiang Zhang are with the School of Systems Science, Beijing Normal University, Beijing 100875, China. E-mail: [email protected]; [email protected].

Yitong Shao is with the School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China. E-mail: [email protected].

To whom correspondence should be addressed.

Manuscript received: 2019-01-18; revised: 2019-07-12; accepted: 2019-09-09.

© The author(s) 2020. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

A complex network is a highly simplified model of a complex system, and it has been widely used in many fields, such as sociology, economics, and biology[1]. Given that complex networks can describe the relationships between events, an increasing amount of research uses complex networks to model problems. For example, we can use a network to model compounds in chemical research, in which nodes represent molecules and edges represent the chemical bonds between them. The compound network can be used to identify substances with the same structural pattern as toxic compounds. Moreover, nowadays, more and more social data constitute large-scale social networks, in which nodes and links represent individuals and relationships. The analysis of social networks can be used to identify key people in society or reveal people's social circles. Therefore, studying complex networks is crucial.

Most studies on complex networks focus on the properties of a single complex network[2], such as classification and clustering of nodes and link prediction, while paying little attention to comparison, classification, and clustering between different complex networks. However, complex network classification is necessary and important in the study of complex networks. For example, the social network behind an online community impacts the development of the community because social ties between users can be treated as the backbone of the online community[3]. Online communities can therefore be diagnosed by comparing and distinguishing their connection modes, and the development of online communities can be predicted. As another example, we consider the product flows on the international trade network. We know that correct classification of products not only helps us understand the characteristics of products but also helps trading countries better count the trade volumes of products. However, classifying and labeling each

exchanged product in international trade is tedious and difficult. The conventional method classifies these products manually according to the attributes of the product (https://unstats.un.org/unsd/trade/sitcrev4.htm), which is subjective. However, if a trade network classifier is built, we can classify a new product exclusively according to its network structure, because previous studies point out that different products have completely different structures of international trade networks. In addition, complex network classification problems arise in many practical areas, such as predicting a country's economic development based on industrial networks or predicting a company's performance based on its interaction structure, all of which can be converted into a complex network classification task.

The complex network classification task refers to the use of an established network classification model that learns from the training data, a labeled network dataset, to find the connection between the label and the structure of the network, and finally uses the learned model to classify the test data, a network dataset whose labels are unknown. Thus, complex network classification is supervised learning, similar to traditional classification problems. However, because complex networks mainly contain unstructured data, such as the network structure itself, traditional classification algorithms cannot be applied directly. Therefore, new complex network classification algorithms need to be found.

At present, deep learning technology has achieved state-of-the-art results in the processing of Euclidean data. For example, Convolutional Neural Networks (CNNs)[4] can process image data, and Recurrent Neural Networks (RNNs)[5] can be used in natural language processing. However, deep learning technology is still under development for graph structure data, such as social networks, international trade networks, and protein structure data.

As for the complex network classification problem, related research has mainly studied graph structure data. For example, kernel methods were proposed to calculate the similarity between two graphs[6]. However, these methods can hardly be applied to large-scale complex networks due to their expensive computational complexity.

Network representation learning, which has been developed recently, is an important way to study graph structure data. Earlier works, such as local linear embedding[7] and IsoMAP[8], first constructed graphs based on feature vectors. More recently, shallow models such as DeepWalk[9], node2vec[10], and LINE[11] were proposed, which can embed nodes into high-dimensional space and empirically perform well. However, these methods can only be applied to tasks on nodes (classification, community detection, and link prediction) but not on whole networks. Some models, such as GNNs[12], GGSNNs[13], and GCNs[14], use deep learning techniques to deal with network data and learn representations of networks. Nevertheless, most of these methods also focus on tasks at the node level rather than the graph level. Another limitation of these techniques is the requirement for a fixed network structure. In this paper, we propose a new method called the Complex Network Classifier (CNC) to address the complex network classification problem by combining network embedding and CNN. We first embed a network into a high-dimensional space through the DeepWalk algorithm, which preserves the local structures of the network, and then convert the embedding into a two-dimensional image. Then, we input the image into a CNN for classification. Our model framework has the merits of small size, low computational complexity, scalability to different network sizes, and automatic feature extraction.

The rest of this paper is organized as follows. Section 2 introduces the related research. Section 3 presents the model framework and the experimental data. Section 4 shows the experiments and results. Section 5 gives the discussion and conclusion.

2 Related Work

2.1 Complex network

Complex network research focuses on the structure of individuals' interrelations in systems and is a way to understand the nature and function of complex systems. Studies of complex networks started from regular networks, such as the Euclidean grid or the nearest-neighbor network in the two-dimensional plane[15]. In 1959, Gilbert[16] proposed random network theory. In 1998, Watts and Strogatz[17] and Barabási and Albert[18] proposed the small-world and scale-free network models, respectively, which depict real-life networks better. Researchers have summarized the classic complex network models, which include regular networks, random networks, small-world networks, and
scale-free networks, and proposed network properties, such as average path length, clustering coefficient, and degree distribution. Recent studies mainly focus on network reconstruction and network synchronization, and few studies focus on the classification of complex networks.

2.2 Network classification

Classification of network data has important applications, such as analyzing protein-protein interactions, predicting the functionality of chemical compounds, diagnosing online communities, and classifying product trading networks. In the network classification problem, we are given a set of networks with labels, and the goal is to predict the labels of a new set of unlabeled networks. The kernel methods developed in previous research are based on the comparison of two networks and similarity calculation. The most common graph kernels are random walk kernels[19], shortest-path kernels[20], graphlet kernels[21], and Weisfeiler-Lehman graph kernels[22]. However, the main problem of graph kernels is that they can hardly be used on large-scale complex networks because of their expensive computational complexity.

2.3 Deep learning on graph structure data

CNN is the most successful model in the field of image processing. It has achieved good results in image classification[4], recognition[23], semantic segmentation[24], and machine translation[25], and can independently learn and extract features of images. However, it can be applied only to regular data, such as images with a fixed size. For graph structure data, researchers have recently been searching for effective and efficient deep learning methods. For example, to apply the convolutional operation on graphs, Ref. [26] proposed performing the convolution operation in the Fourier domain by computing the eigendecomposition of the graph Laplacian. Furthermore, Ref. [27] introduced a parameterization of the spectral filters. Reference [14] proposed an approximation of the spectral filters by Chebyshev expansion of the graph Laplacian. Reference [28] simplified the previous method by restricting the filters to operate in a one-step neighborhood around each node. However, in all the aforementioned spectral approaches, the learned filters based on the Laplacian eigenbasis depend on the graph structure. Thus, a model trained on a specific structure cannot be directly applied to a graph with a different structure. A complex network classification problem often includes many samples, each with its own network structure, so we cannot directly use a GCN to classify networks.

2.4 Network representation learning

Representation learning has long been an important topic in machine learning, and many works aim at learning representations for samples. Recent advances in deep neural networks have demonstrated their powerful representation abilities and shown that they can generate useful representations for many types of data. Network representation learning is an important way to preserve the structure and extract the features of a network through network embedding, which maps nodes into a high-dimensional vector space based on the graph structure. The vector representations of network nodes can then be used for classification and clustering tasks. Some shallow models were previously proposed for network representation learning. DeepWalk[9] combined random walk and skip-gram to learn network representations. LINE[11] designed two loss functions attempting to capture the local and global network structure. Node2vec[10] improved DeepWalk and proposed a second-order random walk to balance Depth First Search (DFS) and Breadth First Search (BFS). Reference [29] proposed an approach based on the open-flow network model to reveal the underlying flow structure and hidden metric space of different random walk strategies on networks.

The most important contribution of network representation learning is that it extracts network features, which provides a way to process network data. Thus, we consider using the features extracted by the embedding methods to solve the network classification problem. We regard DeepWalk as a classic and simple model that can represent the network structure and has high efficiency when dealing with large-scale networks. Moreover, the random walk process in DeepWalk, which obtains the sequences of networks, is adaptable to different networks. For example, we can easily change the random walk mechanism for the international trade network, which is directed and weighted. Therefore, we combine network representation learning and deep learning to develop our model, which performs well in the complex network classification task.
3 Methods of Network Classification

3.1 Model

Our strategy for classifying complex networks is to convert networks into images and use a standard CNN model to perform the network classification task. Given the development of network representation techniques, many algorithms can be used to embed a network into a high-dimensional Euclidean space. We select the DeepWalk algorithm[9], proposed by Perozzi et al., to obtain the network representation. The algorithm generates node sequences by performing large-scale random walks on the network. Afterwards, the sequences are fed into the skip-gram and negative sampling algorithms to obtain the Euclidean coordinate representation of each node. To increase the number of training samples, we can perform data augmentation by running the DeepWalk algorithm on a single network several times to obtain more sets of node representations. Obviously, a high-dimensional representation is hard to process. Thus, we use the Principal Components Analysis (PCA) algorithm to reduce the dimension of the node representations to two-dimensional space. PCA can simplify information, remove redundant information and noise, and retain the principal components of the data[30]. For the high-dimensional representation of a network, PCA can not only reduce the dimension but also retain the main information of the network structure. For example, the karate network is shown in Fig. 1a, and its two-dimensional representation after DeepWalk embedding and PCA dimension reduction is presented in Fig. 1b. Figure 1b shows that nodes of the same color are clustered, indicating that the network structure information is well preserved after embedding and dimension reduction, and thus suitable for the network classification task. However, the set of nodes is a point cloud that is still irregular and cannot be processed by a CNN. Thus, we rasterize the two-dimensional representation into an image. We divide the area covered by the two-dimensional scatter plot into a square grid of 48 x 48 cells, and then count the number of nodes in each cell as the pixel grayscale. Afterwards, a standard grayscale image is obtained. We actually change the size of the grid based on the network size during our experiments, reducing it for small networks. This method can also be applied to directed and weighted networks, such as international trade networks. By adjusting the probabilities of a random walk on a network according to the weight and direction of each edge, we can obtain an embedding image.

Fig. 1 Pipeline of the CNC algorithm.
The final step is to feed the networks' images into a CNN classifier to complete the classification task. Our CNN architecture includes two convolutional layers (each with one convolutional operation and one max-pooling operation), one fully connected layer, and one output layer. The whole architecture of our model is shown in Fig. 1. Figure 1a is the original input network. Figure 1b is the embedding of the network with the DeepWalk algorithm. In the DeepWalk algorithm, to obtain enough corpus, we set the number of walks to 10 000 and the sequence length to 10. We then embed the network into a 20-dimensional space and project it onto two-dimensional space by using the PCA algorithm. Figure 1c is the rasterized image from the 2D embedding representation of the network. Figure 1d is the CNN architecture of the CNC algorithm, which includes one input image, two convolutional pooling layers, one fully connected layer, and one output layer. The sizes of the convolutional filters and of the pooling operation are 5 x 5 and 2 x 2, respectively. The first layer has three convolutional filters, the second layer has five convolutional filters, and the fully connected layer has 50 units. In all complex network classification experiments, we set the learning rate to 0.01 and the mini-batch size to 100. The CNN architecture is selected to minimize the computational complexity and retain the classification accuracy.
3.2 Experimental data

A large amount of experimental data is needed to train and test the classifier. Thus, we use both synthetic networks generated by network models and empirical networks to test our model.

3.2.1 Synthetic data

The synthetic networks are generated by the well-known Barabási-Albert (BA) and Watts-Strogatz (WS) models. According to the evolutionary mechanism of the BA model, which iteratively adds nodes with m = 4 edges each, the added nodes preferentially link to existing nodes with higher degrees until n = 1000 nodes are generated; the average degree ⟨E⟩ of the generated network is about 8, which is close to the degree of real networks[31]. We then use the WS model (n = 1000, number of neighbors of each node k = 8, and probability of reconnecting edges p = 0.1) to generate a large number of small-world networks with the same average degree as the BA model. We then mix the generated 5600 BA networks and 5600 WS networks and separate the set of networks into a training set (with 8000 networks), a validation set (with 2000 networks), and a test set (with 1200 networks).
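The construction of this dataset can be sketched with networkx as follows; the helper name `make_dataset` and the interleaved 50/50 class balance are our own illustrative choices.

```python
# Illustrative construction of the mixed BA/WS dataset with networkx.
import networkx as nx

def make_dataset(num_per_class=5600, n=1000, m=4, k=8, p=0.1):
    """Return mixed BA (label 0) and WS (label 1) graphs."""
    graphs, labels = [], []
    for _ in range(num_per_class):
        graphs.append(nx.barabasi_albert_graph(n, m)); labels.append(0)
        graphs.append(nx.watts_strogatz_graph(n, k, p)); labels.append(1)
    return graphs, labels
```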
3.2.2 Empirical data

Product-specific international trade networks are adopted as the empirical data to test our classifier. The dataset is provided by the National Bureau of Economic Research (http://cid.econ.ucdavis.edu/nberus.html) and covers the trade volume between countries for more than 800 different kinds of products from 1962 to 2000, all encoded by SITC4 digits, a hierarchical classification system for products. The international trade network is a weighted directed network, in which the weighted directed edges represent the volumes of trading flows between two countries. Thus, the random walk in the DeepWalk algorithm should be based on the weights and directions of the edges. We train the CNC to distinguish food products and chemical products. Each product class contains about 10 000 networks obtained from the products and the product combinations within the category.

In addition, to test the efficiency of our model, we applied our framework to benchmark datasets ranging from bioinformatics to social networks (see Table 1 for summary statistics of these datasets).

Table 1 Properties of the empirical data.
Dataset   Size     Enriched size   Avg. number of nodes   Avg. number of edges   Number of classes
NCI1      4110     12 330          29.80                  32.30                  2
COLLAB    5000     15 000          74.49                  2457.78                3
RE B      2000     6000            429.61                 497.75                 2
RE 5K     4999     14 997          508.50                 594.87                 5
RE 12K    11 929   35 787          391.40                 456.89                 12

The bioinformatics dataset NCI1[32], made publicly available by the National Cancer Institute, is a subset of chemical compounds screened for their ability to suppress or inhibit the growth of a panel of human tumor cell lines.

Social network datasets include a scientific collaboration dataset and the Reddit datasets (Reddit is a popular content-aggregation website: http://reddit.com). The scientific collaboration dataset COLLAB is derived from three public collaboration datasets[33] representing three different research fields. The networks of different researchers were generated from each field, and each network was labeled with the field of the researcher. The task is to determine to which field the collaboration network of a researcher belongs. REDDIT-BINARY (RE B) is a dataset in which each network corresponds to an online discussion thread, nodes correspond to users, and an edge exists between two nodes if at least one of them responded to the other's comment. Top submissions
from four popular subreddits were chosen and divided into question/answer-based subreddits and discussion-based subreddits. The task is to identify whether a given network belongs to a question/answer-based community or a discussion-based community. REDDIT-MULTI-5K (RE 5K) is a dataset from five different subreddits, and we simply label each graph with its corresponding subreddit. REDDIT-MULTI-12K (RE 12K) is a larger variant of RE 5K, consisting of 12 different subreddits. The task in the two datasets is to predict to which subreddit a given discussion network belongs.

4 Experiments and Results

We conduct a large number of network classification experiments, and the results are presented in this section. On the synthetic networks, we not only show the classification results but also present how the CNC extracts the features of networks and examine the robustness of the classifier to network size. On the empirical networks, we show the results of applying our CNC to the trade flow networks, which are directed weighted networks. To compare our method with other existing methods on graph classification, we adopt the empirical networks listed in Table 1.

4.1 Classification on synthetic networks

4.1.1 BA and WS network classification

The first task is to apply the CNC to distinguish BA networks and WS networks. Although we know that the BA network is a scale-free network and the WS network is a small-world network with a high clustering coefficient, the machine does not know this. Thus, this series of experiments tests whether the CNC can extract the key features to distinguish the two kinds of networks. We generate 5600 BA networks with n = 1000 and m = 4, and 5600 WS networks with the same size (n = 1000, k = 8) and p = 0.1. We combine these networks to form the dataset, which is further randomly separated into a training set (with 8000 networks), a validation set (with 2000 networks), and a test set (with 1200 networks). Figure 2 shows the decay of the loss on the training set and the error rate on the validation set. Finally, we obtain an average error rate of 0.1% on the test set. Thus, we can say the model distinguishes the BA network and the WS network accurately.

Fig. 2 Loss and validation error rate of the classification task on (a) BA vs. WS models and (b) food vs. chemical products.

To understand what has been learned by our CNC model, we visualize the feature maps extracted from the network representations by the filters of the CNN in Fig. 3. Figure 3a shows the 2D representations and rasterized images of a BA network (upper) and a WS network (bottom), Fig. 3b is the visualization of the three filters of the first convolutional layer, and Fig. 3c is the visualization of the five filters (of size 5 x 5 x 3) of the second convolutional layer. However, reading meaningful information from the filters alone is difficult because the network structure cannot be matched directly to the images. To understand what the filters do, we need to combine the network structure and the feature maps. Therefore, we map the highlighted areas in the feature maps of each filter onto the node set of the network. That is, we ask which parts of the networks, and what kinds of local structures, are activated by the first convolutional layer filters. We compare the activation modes for the two model networks as input, and the results are shown in Fig. 4. By observing and comparing these figures, we find that the convolutional filters of the first layer have learned to extract the features of the network in different parts. As shown in Fig. 4, Filter 0 extracts the local clusters with a medium density of nodes and connections, Filter 1 tries to extract the local clusters
with sparse connections, and Filter 2 tries to extract the local clusters with dense nodes and connections.

Fig. 3 Feature maps visualization.

Fig. 4 Active nodes by the three kernels in the first layer, (a)-(c) BA network and (d)-(f) WS network.

By comparing the BA and WS model networks, we can observe that the locations and the patterns of the highlighted areas are different. The local areas with dense nodes and connections (Filter 0) are located at the central area of the network representation for both the BA network and the WS network. The local structures with sparse nodes and connections are located at the peripheral area, which is close to the edges of the image for the WS network but in the central area for the BA network. This combination of activation modes on the feature maps can help the higher-level filters and the fully connected layer distinguish the two kinds of networks.

4.1.2 Small-world network classification

One may think that distinguishing the BA and WS networks is trivial because they come from two different models. Our second experiment considers whether the classifier can distinguish networks generated by different parameters of the same model, which is harder than the previous task.

To verify the discriminative ability of the model in this task, we use the WS model to generate a large number of experimental networks by changing the edge rewiring probability p from 0 to 1 in steps of 0.1, and then we mix the networks with two distinct p values, e.g., p = 0.1 and p = 0.6. Then, we train the CNC on these networks and test its discriminative ability on the test sets. We systematically conduct this experiment for every pairwise combination of probabilities, and the results are shown in Fig. 5.
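This pairwise sweep could be organized as in the sketch below, where `train_and_test` is a hypothetical helper standing in for the full embed-rasterize-train-evaluate loop of Section 3, and the per-class sample size is illustrative.

```python
# Sketch of the pairwise small-world experiment: for every pair of
# rewiring probabilities (p1, p2), build a two-class WS dataset and
# record the test error. `train_and_test` is a hypothetical helper
# wrapping dataset embedding, CNC training, and evaluation.
import itertools
import networkx as nx

ps = [round(0.1 * i, 1) for i in range(11)]  # p = 0.0, 0.1, ..., 1.0
errors = {}
for p1, p2 in itertools.combinations(ps, 2):
    class0 = [nx.watts_strogatz_graph(1000, 8, p1) for _ in range(600)]
    class1 = [nx.watts_strogatz_graph(1000, 8, p2) for _ in range(600)]
    errors[(p1, p2)] = train_and_test(class0, class1)  # error rate
```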
Fig. 5 Classification results of each two small-world networks with different p values.

The networks generated with p values less than 0.3 and p values greater than or equal to 0.4 are easier to distinguish. Interestingly, the error rate changes suddenly at p = 0.4. The classifier cannot distinguish two networks that both have p > 0.4. This phenomenon may be due to the phase transition from the small-world network to the random network, because WS networks with p > 0.5 may be treated as random networks.

4.2 Robustness on network sizes

Our model has good classification performance on both synthetic and empirical data. Next, we want to test the robustness of the classification to different sizes (numbers of nodes and edges). All the networks in the classification experiments above have identical numbers of nodes and edges. Nonetheless, a good classifier should extract features that are independent of size. Therefore, we examine the robustness of the classifier on network sizes that differ from those of the training sets. In these experiments, we first apply the classifier trained on BA and WS networks with n = 1000 nodes and average degree ⟨E⟩ = 8 to new networks with different numbers of nodes and edges. We generate 600 mixed networks with parameters m ∈ {1, 2, 3, ..., 16} for the BA model and k ∈ {2, 4, 6, ..., 32} for the WS model as the test set, such that their average degrees are similar. We systematically compare how the numbers of nodes and edges in the test sets influence the error rates, as shown in Fig. 6. In Fig. 6a, on the test set, we set n (number of nodes) = {500, 600, 700, ..., 1500}; we also retrain classifiers for n = 800 and n = 1200 and test them on networks with different n. In Fig. 6b, on the test set, we set m (average number of edges) = {1, 2, 3, ..., 16}; we also retrain classifiers for m = 6 and m = 8 and test them on networks with different m. At first, we observe that the error rates are almost independent of small fluctuations in the number of nodes, although they increase as the size differences in the test data grow. This finding indicates that our classifiers are robust to the size of the networks.

Fig. 6 Dependence of the error rates on (a) the number of nodes and (b) the number of edges in the robustness experiments.

Nevertheless, sudden changes occur in the variation with the number of edges, which indicates that the number of edges has a larger impact on the network structure. We observe a sudden drop in error rates as m increases for the test set when m = 8 for the training set. We inspected the network embeddings and determined that this sudden change is due to the emergence of multiple centers in the representation space for the BA model. Therefore, the number of links can change the overall structure of the scale-free network, thereby causing our classifier to fail. Another interesting phenomenon is that the error rates remain small as the number of edges increases when m in the training set is set to 8. Therefore, a classifier trained on dense networks is more robust to variance in edge density.
4.3 Classification on trade flow networks

We want to verify the effectiveness of the model on empirical networks. We conduct a classification task on international trade networks with the dataset obtained from the National Bureau of Economic Research (http://cid.econ.ucdavis.edu/nberus.html). These data cover the trade volume and direction information between countries for more than 800 different kinds of products, all encoded by SITC4 digits, from 1962 to 2000. We select food and chemical products as the two labels for this experiment; their SITC4 codes start with 0 and 5, respectively. For example, 0371 is for prepared or preserved fish, and 5146 is for oxygen-function amino-compounds. Figure 7 shows the two-dimensional representations of 10 products from the two categories. After preprocessing, the total number of food trade networks is 10 705 (including products and product combinations with SITC4 digits starting with 0), and the number of chemical trade networks is 10 016 (including products and product combinations with SITC4 digits starting with 5). Then, we divide them into training, validation, and test sets according to the ratio 9:1:1. During training, we adjust the network parameters to 15 convolutional filters in the first layer, 30 convolutional filters in the second layer, and 300 units in the fully connected layer. Figure 2b shows that the classification error rate can be cut down to 5%.
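For these weighted directed networks, the uniform step of the random walk in Section 3.1 can be replaced by a step that follows outgoing edges with probability proportional to trade volume. The snippet below is our illustrative adaptation, not the authors' code:

```python
# One step of a weight- and direction-aware random walk on a
# networkx DiGraph with 'weight' edge attributes (illustrative).
import random

def weighted_step(g, node):
    nbrs = list(g.successors(node))   # follow edge directions only
    if not nbrs:
        return None                   # dead end: restart the walk
    w = [g[node][v].get("weight", 1.0) for v in nbrs]
    return random.choices(nbrs, weights=w, k=1)[0]
```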

Fig. 7 Network representations of 10 selected products in food (upper) and chemicals (bottom). The food products are whey; fish; flour, meal, and flakes; molasses; and cocoa powder. The chemical products are derivatives of hydrocarbons, amino-compounds, carbon, synthetic organic tanning substances, and glycosides.

4.4 Comparison with other methods

We compare our CNC model with baseline methods, including graph kernel methods and deep learning methods. Kernel methods mainly compute the similarity between two graphs. We chose the Graphlet Kernel[21] (GK) and the Weisfeiler-Lehman kernel[22] (WL), which are two state-of-the-art graph kernels. As for deep learning methods, we chose deep graph kernels[6], which achieve significant improvements in classification accuracy over state-of-the-art graph kernels on some datasets. We also chose PATCHY-SAN (PSCN, k = 10)[34], which is the best performing graph CNN model. We applied our model to the benchmark datasets and compared the classification accuracy of our model against the baseline methods (see Table 2).

Table 2 Comparison of classification accuracy (%, ± standard deviation).
Method         NCI1         COLLAB       RE B         RE 5K        RE 12K
GK             62.28±0.29   72.84±0.28   77.34±0.18   41.01±0.17   31.82±0.08
WL             80.22±0.51   77.82±1.45   78.52±2.01   50.77±2.02   34.57±1.32
Deep GK        62.48±0.25   73.09±0.25   78.04±0.39   41.27±0.18   32.22±0.10
PSCN, k = 10   70.00±1.98   72.60±2.15   86.30±1.58   49.10±0.70   41.32±0.42
CNC-tSNE       63.18±3.35   63.46±1.59   80.17±2.66   46.15±1.55   36.53±0.97
CNC            63.11±0.56   67.79±2.34   86.72±1.55   51.35±3.02   41.44±1.64

From the table, we can see that our framework performs well on the Reddit datasets, which are all large-scale networks with hundreds of nodes and edges. This result implies that our model can learn large-scale networks well because it can extract more meaningful information for these networks through DeepWalk. However, our model does not perform well on the NCI1 and COLLAB datasets, mainly because those networks are too small to produce density information in the rasterized images. We also tried using the t-SNE method to reduce the dimension of the node representations, but the effect is worse than that of our original PCA method because the nodes become more clustered
after the t-SNE algorithm is applied, leaving less local information when we map them into the image representation. Thus, CNC-tSNE has difficulty learning effective features.

5 Conclusion and Discussion

In this paper, we propose a model that incorporates DeepWalk and CNN to solve the network classification problem. With DeepWalk, we obtain an image for each network, and then we use a CNN to complete the classification task. Our method is independent of the number of network samples, which is a major limitation of the spectral methods for graph classification. We validate our model through experiments with synthetic data and empirical data, which show that our model performs well in classification tasks. To further understand the network features extracted by our model, we visualize the filters in the CNN and observe that the CNN can capture the differences between WS and BA networks. Furthermore, we test the robustness of our model by setting different sizes for training and testing. We also compare our model with baseline methods, and the results show that our model performs well on large-scale networks. The biggest advantage of our model is that it can deal with networks of different structures and sizes. In addition, our model has a small architecture and low computational complexity. Several potential improvements and extensions to our model could be addressed in future work. For example, we can develop more methods to deal with the network features in high-dimensional space. We also think that our model can be applied to more classification and forecasting tasks in various fields. Finally, we believe that extending our model to more graph structure data would allow us to address a larger variety of problems.

Acknowledgment

The work was supported by the National Natural Science Foundation of China (No. 61673070) and the Beijing Normal University Interdisciplinary Project. The authors would like to thank the referees for their valuable comments and helpful suggestions.

References

[1] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of networks, Advances in Physics, vol. 51, no. 4, pp. 1079–1187, 2002.
[2] R. Albert and A. L. Barabási, Statistical mechanics of complex networks, Reviews of Modern Physics, vol. 74, no. 1, p. 47, 2002.
[3] R. A. Hanneman and M. Riddle, Introduction to Social Network Methods. Riverside, CA, USA: University of California, 2005.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, in Proc. of Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 2012, pp. 1097–1105.
[5] A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, in Proc. of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, 2013, pp. 6645–6649.
[6] P. Yanardag and S. V. N. Vishwanathan, Deep graph kernels, in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2015, pp. 1365–1374.
[7] S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[8] J. B. Tenenbaum, V. De Silva, and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[9] B. Perozzi, R. Al-Rfou, and S. Skiena, Deepwalk: Online learning of social representations, in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2014, pp. 701–710.
[10] A. Grover and J. Leskovec, node2vec: Scalable feature learning for networks, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016, pp. 855–864.
[11] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, Line: Large-scale information network embedding, in Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 2015, pp. 1067–1077.
[12] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
[13] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, Gated graph sequence neural networks, arXiv preprint arXiv:1511.05493, 2015.
[14] M. Defferrard, X. Bresson, and P. Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, in Proc. of Advances in Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 3844–3852.
[15] S. H. Strogatz, Exploring complex networks, Nature, vol. 410, no. 6825, p. 268, 2001.
[16] E. N. Gilbert, Random graphs, The Annals of Mathematical Statistics, vol. 30, no. 4, pp. 1141–1144, 1959.
[17] D. J. Watts and S. H. Strogatz, Collective dynamics of "small-world" networks, Nature, vol. 393, no. 6684, p. 440, 1998.
[18] A. L. Barabási and R. Albert, Emergence of scaling in random networks, Science, vol. 286, no. 5439, pp. 509–512, 1999.
[19] H. Kashima, K. Tsuda, and A. Inokuchi, Marginalized kernels between labeled graphs, in Proceedings of the 20th International Conference on Machine Learning (ICML-03), Atlanta, GA, USA, 2003, pp. 321–328.
[20] K. M. Borgwardt and H. P. Kriegel, Shortest-path kernels on graphs, in Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM'05), Washington, DC, USA, 2005, p. 8.
[21] N. Shervashidze, S. V. N. Vishwanathan, T. H. Petri, K. Mehlhorn, and K. M. Borgwardt, Efficient graphlet kernels for large graph comparison, in Proc. of Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 2009, pp. 488–495.
[22] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, Weisfeiler-Lehman graph kernels, Journal of Machine Learning Research, vol. 12, no. Sep, pp. 2539–2561, 2011.
[23] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.
[24] J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 2015, pp. 3431–3440.
[25] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, A convolutional neural network for modelling sentences, arXiv preprint arXiv:1404.2188, 2014.
[26] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, Spectral networks and locally connected networks on graphs, arXiv preprint arXiv:1312.6203, 2013.
[27] M. Henaff, J. Bruna, and Y. LeCun, Deep convolutional networks on graph-structured data, arXiv preprint arXiv:1506.05163, 2015.
[28] T. N. Kipf and M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907, 2016.
[29] W. Gu, L. Gong, X. Lou, and J. Zhang, The hidden flow structure and metric space of network embedding algorithms based on random walks, Scientific Reports, vol. 7, no. 1, p. 13114, 2017.
[30] L. I. Smith, A tutorial on principal components analysis, http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf, 2002.
[31] X. F. Wang and G. Chen, Complex networks: Small-world, scale-free and beyond, IEEE Circuits and Systems Magazine, vol. 3, no. 1, pp. 6–20, 2003.
[32] N. Wale, I. A. Watson, and G. Karypis, Comparison of descriptor spaces for chemical compound retrieval and classification, Knowledge and Information Systems, vol. 14, no. 3, pp. 347–375, 2008.
[33] J. Leskovec, J. Kleinberg, and C. Faloutsos, Graphs over time: Densification laws, shrinking diameters and possible explanations, in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, USA, 2005, pp. 177–187.
[34] M. Niepert, M. Ahmed, and K. Kutzkov, Learning convolutional neural networks for graphs, in Proceedings of the International Conference on Machine Learning, New York, NY, USA, 2016, pp. 2014–2023.

Ruyue Xin received the master degree from Beijing Normal University in 2019. She is currently pursuing the PhD degree at the University of Amsterdam. Her research interests include graph theory and graph representation learning.

Jiang Zhang received the PhD degree from Beijing Jiaotong University, Beijing, China in 2006. From 2006 to 2008, he worked at the Chinese Academy of Sciences and finished his postdoctoral research. From 2008 to 2012, he was an assistant professor at Beijing Normal University, and from 2012 to 2018, an associate professor there. He is now a professor at the School of Systems Science, Beijing Normal University. Prof. Zhang has many publications in journals and conferences, such as Nature Communications, PLoS ONE, and Physica A. His research interests include network reconstruction, discrete optimization problems on graphs, and enterprise financial distress analysis.

Yitong Shao received the bachelor degree from Beijing Normal University in 2019. He is now a master student at Beijing Normal University. His research interests include Markov chains, especially branching processes.
