
Deep Anomaly Detection on Attributed Networks

Kaize Ding∗ Jundong Li∗ Rohit Bhanushali∗ Huan Liu∗

∗ Computer Science and Engineering, Arizona State University, Tempe, AZ, USA. {kding9, jundongl, rbhanush, huan.liu}@asu.edu

Abstract

Attributed networks are ubiquitous and form a critical component of modern information infrastructure, where additional node attributes complement the raw network structure in knowledge discovery. Recently, detecting anomalous nodes on attributed networks has attracted an increasing amount of research attention, with broad applications in various high-impact domains, such as cybersecurity, finance, and healthcare. Most of the existing attempts, however, tackle the problem with shallow learning mechanisms by ego-network or community analysis, or through subspace selection. Undoubtedly, these models cannot fully address the computational challenges on attributed networks. For example, they often suffer from the network sparsity and data nonlinearity issues, and fail to capture the complex interactions between different information modalities, thus negatively impacting the performance of anomaly detection. To tackle the aforementioned problems, in this paper, we study the anomaly detection problem on attributed networks by developing a novel deep model. In particular, our proposed deep model: (1) explicitly models the topological structure and nodal attributes seamlessly for node embedding learning with the prevalent graph convolutional network (GCN); and (2) is customized to address the anomaly detection problem by virtue of a deep autoencoder that leverages the learned embeddings to reconstruct the original data. The synergy between GCN and autoencoder enables us to spot anomalies by measuring the reconstruction errors of nodes from both the structure and the attribute perspectives. Extensive experiments on real-world attributed network datasets demonstrate the efficacy of our proposed algorithm.

Keywords: Anomaly Detection; Attributed Networks; Graph Convolutional Network; Deep Autoencoder

1 Introduction

Attributed networks provide a potent tool to handle the data heterogeneity that we are often confronted with in vast amounts of information systems. Apart from traditional plain networks in which only node-to-node interactions are observed, attributed networks also encode a rich set of features for each node [2, 13, 18]. They are increasingly used to model a wide range of complex systems, such as social media networks, critical infrastructure networks, and gene regulatory networks [2, 26]. For example, in social networks, users not only are connected with each other through various social activities but also are affiliated with rich profile information; in critical infrastructure networks, different power stations form grids and are also associated with additional attribute information (e.g., electricity capacity); in gene regulatory networks, genes interact with each other to control specific cell functions in addition to carrying rich gene sequence expressions. Studies from social science have shown that data often exhibits correlation among the attributes of connected individuals [20, 29], and such insights are helpful in distilling actionable knowledge from such networks.

Detecting anomalies from data (e.g., attribute-value data, networked data) is a vital research problem of pressing societal concern, with significant implications in many security-related applications, ranging from social spam detection and financial fraud detection to network intrusion detection [1]. Due to the strong modeling power of attributed networks in unifying information of different modalities, there is a surge of research interest in detecting anomalous nodes whose patterns deviate significantly from the majority of nodes on attributed networks. Generally, the abnormality of nodes on attributed networks is not only determined by their mutual interactions with others (w.r.t. topological structure), but also measured by their content dissonance (w.r.t. nodal attributes).

Due to the prohibitive cost of accessing ground-truth anomalies, existing efforts are mostly unsupervised. Among them, one family of methods studies the problem at the mesoscopic level with ego-network [24] or community analysis [10], and then identifies anomalies by measuring the abnormality of ego-networks or by comparing the current node with other nodes within the same community. Another family of methods relies heavily on subspace selection and attempts to find anomalies in a node feature subspace [28, 27, 21, 25]. Recently, residual analysis has emerged as another way to find anomalous nodes [17, 23], where anomalies are defined as the nodes that cannot be approximated from others. Despite their empirical success, the following challenges remain for anomaly detection on attributed networks: (1) Network sparsity - the network structure can be very sparse in real-world attributed networks; thus ego-network or community analysis is difficult to perform, as these analyses highly depend on the observed node interactions. (2) Data nonlinearity - the node interactions and nodal attributes are highly non-linear in nature, while existing subspace-selection-based anomaly detectors mainly model the attributed networks with linear mechanisms.

(3) Complex modality interactions - attributed networks are notoriously difficult to tackle due to the bewildering combination of two information sources, which necessitates a unified feature space to capture their complex interactions for anomaly detection.

To address the challenges above, we propose to model attributed networks with the graph convolutional network (GCN) [16]. GCN, which takes the topological structure and nodal attributes as input, is able to learn discriminative node embeddings by stacking multiple layers of linear units and non-linear activation functions. Even though GCN has emerged as a principled tool to model attributed networks and achieves state-of-the-art performance in the semi-supervised node classification task, it remains unclear how its power can be shifted to the anomaly detection problem. To bridge the gap, we propose a novel graph convolutional autoencoder framework called Dominant (Deep Anomaly Detection on Attributed Networks) to support anomaly detection on attributed networks. Specifically, Dominant first compresses the input attributed network into succinct low-dimensional embedding representations using a graph convolutional network as the encoder function; then we reconstruct both the topological structure and the nodal attributes with corresponding decoder functions. The reconstruction errors of nodes following the encoder and decoder phases are then leveraged for spotting anomalous nodes on attributed networks. The main contributions of this paper are as follows:

• We systematically analyze the limitations of existing shallow anomaly detection methods and show the significance of developing a novel deep-architectured anomaly detector on attributed networks.

• We develop a principled graph convolutional autoencoder, Dominant, which seamlessly models the attributed network and conducts anomaly detection in a joint framework. In particular, the proposed model can spot anomalies by analyzing the reconstruction errors of nodes from both the structure and the attribute perspectives.

• We evaluate our proposed model on various attributed networks from different domains. Empirical experimental results demonstrate the superior performance of our proposed framework.

The remainder of the paper is organized as follows. We formally introduce the problem definition in Section 2. In Section 3, we present the details of the proposed deep anomaly detection model. Experimental evaluations on multiple real-world datasets are shown in Section 4. Section 5 reviews the related work and Section 6 concludes the paper.

2 Problem Definition

Following commonly used notation, in this paper we use calligraphic fonts to denote sets (e.g., V), bold lowercase letters (e.g., x) to denote vectors, and bold uppercase letters for matrices (e.g., X). The ith row of a matrix X is denoted by x_i, and the (i, j)th element of matrix X is denoted by X_{i,j}. Besides, we represent the identity matrix as I, and the transpose of a matrix X is represented as X^T. The ℓ2-norm of a vector is denoted by ||·||_2. The Frobenius norm of a matrix is represented by ||·||_F. Accordingly, we define the attributed network as follows:

Definition 1. Attributed Network: An attributed network G = (V, E, X) consists of: (1) the set of nodes V = {v_1, v_2, ..., v_n}, where |V| = n; (2) the set of edges E, where |E| = m; and (3) the node attributes X ∈ R^{n×d}, where the ith row vector x_i ∈ R^d (i = 1, ..., n) is the attribute¹ information of the ith node.

The topological structure of an attributed network G can be represented by an adjacency matrix A, where A_{i,j} = 1 if there is a link between node v_i and node v_j, and A_{i,j} = 0 otherwise. We follow the setting of [17] and obtain the adjacency matrix as A = max(A, A^T) for directed networks. To make the results more interpretable, we formulate the task of anomaly detection on attributed networks as a ranking problem:

Problem 1. Anomaly Ranking on Attributed Networks: Given an attributed network G, with adjacency matrix A and attribute information matrix X of n node instances, the task is to rank all nodes according to their degree of abnormality, such that the nodes that differ singularly from the majority of reference nodes are ranked at high positions.

Next, we introduce our proposed deep framework, which models the network topological structure and nodal attributes coherently for detecting anomalies on attributed networks.

¹ In this paper, we use attribute and feature interchangeably.
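As a concrete illustration of the notation above, the following minimal NumPy sketch builds the symmetric adjacency matrix A = max(A, A^T) and the attribute matrix X for a toy directed network; the variable names and toy sizes are ours and are not taken from the paper.

import numpy as np

# Toy attributed network: n nodes, d attributes (illustrative values only).
n, d = 6, 4
directed_edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5)]

A = np.zeros((n, n))
for i, j in directed_edges:
    A[i, j] = 1.0
A = np.maximum(A, A.T)    # symmetrize: A = max(A, A^T), as done for directed networks

X = np.random.rand(n, d)  # node attribute matrix X in R^{n x d}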

3 The Proposed Model

In this section, we present the proposed framework of Dominant in detail. The architecture of the deep model is illustrated in Figure 1.

[Figure 1: architecture diagram. The attributed network encoder applies three graph convolutional layers (with ReLU activations) to the adjacency matrix A and attribute matrix X to produce embedding vectors Z; a structure reconstruction decoder σ(Z·Z^T) outputs the reconstructed adjacency matrix \hat{A}, and an attribute reconstruction decoder outputs the reconstructed attributes \hat{X}; the per-node reconstruction errors yield anomaly scores score(v_i) and an anomaly ranking list.]

Figure 1: The overall framework of our proposed Dominant for deep anomaly detection on attributed networks.

As can be observed, the fundamental building block of Dominant is the deep autoencoder [11], and it consists of three essential components: (i) attributed network encoder - which models the network structure and nodal attributes seamlessly in a joint framework for node embedding representation learning with GCN; (ii) structure reconstruction
decoder - which aims to reconstruct the original network topology with the learned node embeddings; and (iii) attribute reconstruction decoder - which attempts to reconstruct the observed nodal attributes with the obtained node embeddings. Afterwards, the reconstruction errors of nodes are leveraged to flag anomalies on attributed networks.

3.1 Preliminary - Deep Autoencoder As suggested by [32, 37, 17], the disparity between the original data and the estimated data (i.e., the reconstruction error) is a strong indicator of the abnormality of instances in a dataset. Specifically, data instances with large reconstruction errors are more likely to be anomalies, since their patterns deviate significantly from the majority and cannot be accurately reconstructed from the observed data. Among various reconstruction-based anomaly detection methods, the deep autoencoder achieves state-of-the-art performance. A deep autoencoder is a type of deep neural network that learns latent representations of data in an unsupervised manner by stacking multiple layers of encoding and decoding functions. It has achieved promising learning performance in various domains, such as computer vision, natural language processing, and speech recognition [11].

Given an input dataset X, the encoder Enc(·) is first applied to map the data into a latent low-dimensional feature space, and then the decoder Dec(·) tries to recover the original data from the latent representations. The learning process can be described as minimizing the following cost function:

(3.1)    min E[dist(X, Dec(Enc(X)))],

where dist(·, ·) is a predefined distance metric. In practice, we often choose the ℓ2-norm distance to measure the reconstruction errors. It should also be noted that the deep autoencoder is able to capture the highly non-linear information in high-dimensional input by applying multiple layers of linear units and nonlinear activation functions in the encoder and decoder phases, which is advantageous compared to conventional shallow learning models. Subsequently, in this study, we propose to solve the problem of anomaly detection on attributed networks with a deep autoencoder architecture.
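To make this preliminary concrete, the following minimal PyTorch sketch shows the generic reconstruction-error idea of Section 3.1: a plain autoencoder on i.i.d. attribute-value data that scores instances by their ℓ2 reconstruction error. This is only an illustration of the preliminary (not the proposed Dominant model), and the layer sizes are arbitrary.

import torch
import torch.nn as nn

class PlainAutoencoder(nn.Module):
    """Vanilla autoencoder: Enc maps rows of X to a low-dimensional space, Dec recovers them."""
    def __init__(self, d_in, d_hidden=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_hidden))
        self.dec = nn.Sequential(nn.Linear(d_hidden, 64), nn.ReLU(), nn.Linear(64, d_in))

    def forward(self, x):
        return self.dec(self.enc(x))

def reconstruction_scores(model, X):
    # Larger l2 reconstruction error => the instance deviates more from the majority pattern.
    with torch.no_grad():
        return torch.norm(X - model(X), dim=1)

Training such a model simply amounts to minimizing Eq. (3.1) with the ℓ2 distance over all instances.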

3.2 Attributed Network Encoder As a rich network representation, attributed networks encode not only the network structure but also abundant nodal attributes. However, conventional deep autoencoders can only handle i.i.d. attribute-value data [37, 35] and thus cannot be directly applied to our scenario. How to design an effective encoder that captures the underlying properties of attributed networks remains a daunting task, as we need to address the three challenges (i.e., network sparsity, data nonlinearity, and complex modality interactions) simultaneously. To this end, we propose a new type of attributed network encoder inspired by the graph convolutional network (GCN) model [16]. Specifically, GCN considers high-order node proximity when learning the embedding representations, and thus mitigates the network sparsity issue beyond the observed links among nodes. Meanwhile, through multiple layers of nonlinear transformations, it captures the nonlinearity of the data and the complex interactions of the two information modalities on attributed networks.

Mathematically, GCN extends the operation of convolution to networked data in the spectral domain and learns a layer-wise new latent representation through a spectral convolution function:

(3.2)    H^{(l+1)} = f(H^{(l)}, A | W^{(l)}),

where H^{(l)} is the input of convolution layer l, and H^{(l+1)} is the output after the convolution layer. We take the attribute matrix X ∈ R^{n×d} as the input of the first layer, which is equivalent to H^{(0)}. W^{(l)} is a layer-specific trainable weight matrix that we need to learn in the neural network. Each layer of the graph convolutional network can be expressed with the function f(H^{(l)}, A | W^{(l)}) as follows:

(3.3)    f(H^{(l)}, A | W^{(l)}) = σ(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}),

where \tilde{A} = A + I and \tilde{D} is the diagonal degree matrix of \tilde{A} with diagonal elements \tilde{D}_{i,i} = \sum_j \tilde{A}_{i,j}. Thus we can directly compute \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} as a pre-processing step. Note that σ(·) is a non-linear activation function, such as ReLU(x) = max(0, x). It is worth noting that the filter or feature map parameters W^{(l)} are shared by all nodes on the attributed network.

Given the attribute matrix X as input, the k-hop neighborhood of each node can be effectively captured by successively stacking k convolutional layers. Therefore, the embeddings Z not only encode the attribute information of each node but also involve the kth-order node proximity information. In this work, we propose to use three convolutional layers for constructing the attributed network encoder, but it should be noted that more layers can also be stacked to build a deeper network. The attributed network encoder can be formulated as:

(3.4)    H^{(1)} = f_ReLU(X, A | W^{(0)})
(3.5)    H^{(2)} = f_ReLU(H^{(1)}, A | W^{(1)})
(3.6)    Z = H^{(3)} = f_ReLU(H^{(2)}, A | W^{(2)}).

Here, W^{(0)} ∈ R^{d×h1} is the input-to-hidden weight matrix with h1 feature maps. Similarly, W^{(1)} ∈ R^{h1×h2} and W^{(2)} ∈ R^{h2×h3} are two hidden-to-hidden weight matrices. After applying three layers of convolution, the input attributed network is transformed into the h3-dimensional latent representations Z, which can capture the high non-linearity in both the topological network structure and the nodal attributes.

3.3 Structure Reconstruction Decoder In this subsection, we discuss how to reconstruct the original network structure from the learned latent representations Z produced by the aforementioned encoder module. Let \hat{A} denote the estimated adjacency matrix; then the structure reconstruction error R_S = A − \hat{A} can be exploited to determine structural anomalies on the network. Specifically, for a certain node, if its structure information can be approximated through the structure reconstruction decoder, it has a low probability of being anomalous. On the opposite side, if its connectivity patterns cannot be well reconstructed, this implies that its structure information does not conform to the patterns of the majority of normal nodes. Therefore, a larger norm of R_S(i, :) indicates that the ith node on the attributed network has a higher probability of being an anomaly from the network structure aspect. Specifically, the decoder takes the latent representations as input and predicts whether there is a link between each pair of nodes:

(3.7)    p(\hat{A}_{i,j} = 1 | z_i, z_j) = sigmoid(z_i z_j^T).

Accordingly, we train a link prediction layer based on the output of the attributed network encoder Z, which can be presented as follows:

(3.8)    \hat{A} = sigmoid(Z Z^T).

3.4 Attribute Reconstruction Decoder Similarly, to compute the reconstruction errors of nodal attributes, we propose an attribute reconstruction decoder that approximates the nodal attribute information from the encoded latent representations Z. Specifically, the attribute reconstruction decoder leverages another graph convolutional layer to predict the original nodal attributes as follows:

(3.9)    \hat{X} = f_ReLU(Z, A | W^{(3)}).

With the computed reconstruction errors R_A = X − \hat{X}, we can spot anomalies on the attributed networks from the attribute perspective.

3.5 Anomaly Detection Until now, we have introduced how to reconstruct the topological network structure and the nodal attributes using the structure reconstruction decoder and the attribute reconstruction decoder, respectively. To jointly learn the reconstruction errors, the objective function of our proposed deep graph convolutional autoencoder can be formulated as:

(3.10)    L = (1 − α) R_S + α R_A = (1 − α) ||A − \hat{A}||_F^2 + α ||X − \hat{X}||_F^2,

where α is an important controlling parameter which balances the impact of structure reconstruction and attribute reconstruction.

By minimizing the above objective function, our proposed deep graph convolutional autoencoder can iteratively approximate the input attributed network from the encoded latent representations until the objective function converges. The final reconstruction errors are then employed to assess the abnormality of nodes. Note that the weight matrices of the deep graph convolutional autoencoder are trained using gradient descent on the objective function. After a certain number of iterations, we can compute the anomaly score of each node v_i according to:

(3.11)    score(v_i) = (1 − α) ||a_i − \hat{a}_i||_2 + α ||x_i − \hat{x}_i||_2.

Specifically, instances with larger scores are more likely to be considered anomalies; thus we can compute the ranking of anomalies according to the corresponding anomaly scores.

3.6 Complexity Analysis The graph convolutional network is a computationally efficient model whose complexity is linear in the number of edges of the network. For a particular layer, the convolution operation is \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W, and its complexity is O(mdh) [16], as \tilde{A}X can be efficiently implemented using sparse-dense matrix multiplications, where m is the number of non-zero elements in matrix \tilde{A}, d is the number of feature dimensions on the attributed network, and h is the number of feature maps of the weight matrix. In addition to the convolutional layers, there is another link prediction layer in our model to reconstruct the original topological structure; thus the overall complexity is O(mdH + n^2), where H is the summation of the numbers of feature maps across the different layers.

4 Experiments

In this section, we perform empirical evaluations on real-world attributed networks to verify the effectiveness of the proposed Dominant framework.

4.1 Datasets In order to have a comprehensive evaluation, we adopt three real-world attributed network datasets that have been widely used in previous research [19, 14, 8] in our experiments:

• BlogCatalog: BlogCatalog is a blog sharing website. The bloggers in BlogCatalog can follow each other, forming a social network. Users are associated with a list of tags to describe themselves and their blogs, which are regarded as node attributes.

• Flickr: Flickr is an image hosting and sharing website. Similar to BlogCatalog, users can follow each other and form a social network. Node attributes of users are defined by their specified tags that reflect their interests.

• ACM: ACM is another attributed network, from the academic field. It is a citation network where each paper is regarded as a node on the network, and the links are the citation relations among different papers. The attributes of each paper are generated from the paper abstract.

As there is no ground truth of anomalies in the above datasets, we need to inject anomalies into the attributed networks for our empirical evaluation. In particular, we refer to two anomaly injection methods that have been used in previous research [8, 31] to generate a combined set of anomalies for each dataset by perturbing the topological structure and the nodal attributes, respectively. On one hand, to perturb the topological structure of an attributed network, we adopt the method introduced by [8] to generate small cliques. The intuition behind this method is that in many real-world scenarios, a small clique is a typical anomalous substructure in which a small set of nodes are much more closely linked to each other than average [30]. Therefore, after we specify the clique size as m, we randomly select m nodes from the network and make those nodes fully connected; all the m nodes in the clique are then regarded as anomalies. We iteratively repeat this process until n cliques are generated, and the total number of structural anomalies is m × n. In our experiments, we fix the clique size m to 15 and set n to 10, 15 and 20 for BlogCatalog, Flickr and ACM, respectively. In addition to the injection of structural anomalies, we adopt another attribute perturbation schema introduced by [31] to generate anomalies from the attribute perspective. To guarantee that an equal number of anomalies from the structural perspective and the attribute perspective are injected into the attributed network, we first randomly select another m × n nodes as attribute perturbation candidates. For each selected node i, we randomly pick another k nodes from the data and select the node j whose attributes deviate the most from node i among the k nodes by maximizing the Euclidean distance ||x_i − x_j||_2. Afterwards, we change the attributes x_i of node i to x_j. In our experiments, we set the value of k to 50. The details of these three attributed network datasets are shown in Table 1.

                BlogCatalog     Flickr        ACM
# nodes               5,196      7,575     16,484
# edges             171,743    239,738     71,980
# attributes          8,189     12,047      8,337
# anomalies             300        450        600

Table 1: Details of the three datasets
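A compact sketch of the two injection schemes described above (structural cliques following [8] and attribute perturbation following [31]) is given below. The function and variable names are ours, and sampling details such as excluding previously selected nodes are simplified.

import numpy as np

def inject_structural_anomalies(A, clique_size=15, num_cliques=10, seed=0):
    # Repeatedly pick clique_size random nodes and fully connect them; all members are anomalies.
    rng = np.random.default_rng(seed)
    anomalies = []
    for _ in range(num_cliques):
        members = rng.choice(A.shape[0], size=clique_size, replace=False)
        for i in members:
            for j in members:
                if i != j:
                    A[i, j] = 1.0
        anomalies.extend(members.tolist())
    return A, anomalies

def inject_attribute_anomalies(X, candidates, k=50, seed=0):
    # For each candidate i, sample k nodes and copy over the attributes of the farthest one.
    rng = np.random.default_rng(seed)
    for i in candidates:
        pool = rng.choice(X.shape[0], size=k, replace=False)
        j = pool[np.argmax(np.linalg.norm(X[pool] - X[i], axis=1))]
        X[i] = X[j]
    return X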

4.2 Experimental Settings In this section, we introduce the detailed experimental settings, including the compared baseline methods and the evaluation metrics.

Compared Methods. We compare the proposed Dominant framework with the following popular anomaly detection methods:

• LOF [4] detects anomalies at the contextual level and only considers nodal attributes.

• SCAN [34] is a structure-based detection method which detects anomalies at the structural level.

• AMEN [24] uses both attribute and network structure information to detect anomalous neighborhoods. Specifically, it analyzes the abnormality of each node from the ego-network point of view.

• Radar [17] is the state-of-the-art unsupervised anomaly detection framework for attributed networks. It detects anomalies whose behaviors are singularly different from the majority by characterizing the residuals of attribute information and its coherence with network information.

• ANOMALOUS [23] performs joint anomaly detection and attribute selection to detect anomalies on attributed networks based on the CUR decomposition and residual analysis.

Evaluation Metrics. In the experiments, the following evaluation metrics are used to measure the performance of the different anomaly detection algorithms:

• ROC-AUC: As a widely used evaluation metric in previous anomaly detection work [17, 23], the ROC curve is a plot of the true positive rate (an anomaly is recognized as an anomaly) against the false positive rate (a normal node is recognized as an anomaly) according to the ground truth and the detection results. The AUC value is the area under the ROC curve, representing the probability that a randomly chosen abnormal node is ranked higher than a normal node. If the AUC value approaches 1, the method is of high quality.

• Precision@K: As each anomaly detection method outputs a ranking list according to the anomaly scores of different nodes, we use Precision@K to measure the proportion of true anomalies among its top K ranked nodes.

• Recall@K: This metric measures the number of true anomalies that a specific detection method discovers in its top K ranked nodes, as a proportion of the total number of ground-truth anomalies.

Parameter Settings. In the experiments on the different datasets, we optimize the loss function with the Adam [15] algorithm and train the proposed model for 300 epochs for the performance evaluation. We set the learning rate to 0.005. In addition, the attributed network encoder is built with three convolutional layers (64 neurons, 32 neurons and 16 neurons, respectively). For the other baselines, we retain the settings described in the corresponding papers.

4.3 Experimental Results In the experiments, we evaluate the performance of our proposed model Dominant by comparing it with the aforementioned baselines. We first present the experimental results in terms of ROC-AUC on the three datasets in Figure 2. Then we present the results w.r.t. Precision@K and Recall@K for the other methods on all the attributed networks in Table 2. Note that we do not include the results of SCAN and AMEN in Table 2 as they are clustering-based methods that cannot provide a precise ranking list for all the nodes.

[Figure 2: ROC curves (true positive rate vs. false positive rate) on the three datasets, with per-method AUC scores - (a) BlogCatalog: LOF 0.4915, SCAN 0.2727, AMEN 0.6648, Radar 0.7104, Anomalous 0.7281, Dominant 0.7813; (b) Flickr: LOF 0.4881, SCAN 0.2686, AMEN 0.6047, Radar 0.7286, Anomalous 0.7159, Dominant 0.7490; (c) ACM: LOF 0.4738, SCAN 0.3599, AMEN 0.5337, Radar 0.6936, Anomalous 0.7185, Dominant 0.7494.]

Figure 2: ROC curves and AUC scores of all methods on different datasets.
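For reference, the ranking metrics defined above (Precision@K, Recall@K) and the AUC can be computed from a vector of anomaly scores and binary ground-truth labels (1 for injected anomalies) as in the following sketch; this is our own illustration, assuming scikit-learn is available for the AUC, and is not taken from the paper's code.

import numpy as np
from sklearn.metrics import roc_auc_score

def precision_at_k(labels, scores, k):
    top_k = np.argsort(scores)[::-1][:k]      # indices of the K highest anomaly scores
    return labels[top_k].sum() / k

def recall_at_k(labels, scores, k):
    top_k = np.argsort(scores)[::-1][:k]
    return labels[top_k].sum() / labels.sum()  # fraction of all ground-truth anomalies found

def auc(labels, scores):
    return roc_auc_score(labels, scores)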

Precision@K
                 BlogCatalog                  Flickr                       ACM
K             50    100    200    300     50    100    200    300     50    100    200    300
LOF        0.300  0.220  0.180  0.183  0.420  0.380  0.270  0.237  0.060  0.060  0.045  0.037
Radar      0.660  0.670  0.550  0.416  0.740  0.700  0.635  0.503  0.560  0.580  0.520  0.430
Anomalous  0.640  0.650  0.515  0.417  0.790  0.710  0.650  0.510  0.600  0.570  0.510  0.410
Dominant   0.760  0.710  0.590  0.470  0.770  0.730  0.685  0.593  0.620  0.590  0.540  0.497

Recall@K
                 BlogCatalog                  Flickr                       ACM
K             50    100    200    300     50    100    200    300     50    100    200    300
LOF        0.050  0.073  0.120  0.183  0.047  0.084  0.120  0.158  0.005  0.010  0.015  0.018
Radar      0.110  0.223  0.367  0.416  0.082  0.156  0.282  0.336  0.047  0.097  0.173  0.215
Anomalous  0.107  0.217  0.343  0.417  0.087  0.158  0.289  0.340  0.050  0.095  0.170  0.205
Dominant   0.127  0.237  0.393  0.470  0.084  0.162  0.304  0.396  0.052  0.098  0.180  0.248

Table 2: Performance of different anomaly detection methods w.r.t. Precision@K and Recall@K.

From the evaluation results, we make the following observations:

• The proposed deep model Dominant outperforms the other baseline methods on all three attributed networks. This verifies the effectiveness of performing anomaly detection on attributed networks with a deep architecture.

• LOF and SCAN cannot achieve satisfying results in our experiments as they merely consider the nodal attributes or the topological structure. Even though AMEN is designed for anomaly detection on attributed networks, it centers around finding anomalous neighborhoods rather than nodes, which also results in relatively poor performance.

• The residual analysis based models (Radar and Anomalous) are superior to the conventional anomaly detection methods (LOF, SCAN and AMEN). However, these models are still limited by their shallow mechanisms in handling the network sparsity, data nonlinearity, and complex modality interaction issues.

• Dominant shows a stronger ability to rank anomalies at high positions according to the results of Precision@K and Recall@K. It can achieve better detection performance when the objective is to find more true anomalies within a ranking list of limited length.

4.4 Parameter Analysis Next, we investigate the impact of the controlling parameter α in our proposed Dominant framework and report the performance variance results in Figure 3. Here we present the AUC values on the three attributed network datasets. The controlling parameter α balances the impact of the attribute reconstruction errors and the structure reconstruction errors on model training and on the anomaly score computation. In the two extreme cases, following Eqs. (3.10) and (3.11), Dominant only considers the attribute reconstruction errors when α is set to 1, and only the structure reconstruction errors when α is set to 0. The results indicate that it is necessary to find a balance between the structure reconstruction errors and the attribute reconstruction errors to achieve better performance. A reasonable choice of α is around 0.4 to 0.7 for the BlogCatalog and Flickr datasets, and 0.5 to 0.8 for the ACM dataset.

[Figure 3: AUC (y-axis, roughly 0.73 to 0.78) as a function of α (x-axis, 0.0 to 1.0) for BlogCatalog, Flickr and ACM.]

Figure 3: Impact of different α w.r.t. AUC values.

5 Related Work

In this section, we briefly review related work in two aspects: (1) anomaly detection on attributed networks; and (2) deep learning on network data.

5.1 Anomaly Detection on Attributed Networks As attributed networks are increasingly used to model a wide range of complex systems, studies of anomaly detection on attributed networks have attracted a lot of attention. Generally, the existing methodologies can be divided into three categories. The first category of anomaly detection methods aims to spot anomalies with community or ego-network analysis. For instance, CODA attempts to simultaneously find communities and spot community anomalies within a unified probabilistic model [10]. AMEN [24] considers the ego-network information of each node and discovers anomalous neighborhoods on attributed networks. Besides that, another family of methods focuses on spotting abnormal nodes in a node feature subspace [28, 27, 21]. For example, GOutRank [21] conducts anomaly ranking on attributed networks based on subspace cluster analysis. ConSub [28] takes a subspace selection algorithm as a pre-processing step before anomaly detection. FocusCO [25] focuses on community anomalies in a subspace predefined from user preferences. In addition to the methods mentioned above, residual analysis has emerged as another common way to measure the abnormality of nodes on attributed networks. In particular, Radar [17] characterizes the residuals of attribute information and its coherence with network information for anomaly detection. ANOMALOUS [23] further incorporates CUR decomposition into the residual analysis to alleviate the adverse impact of noisy features on anomaly detection. Despite their fruitful progress, these models are limited by their shallow mechanisms and are incapable of
handling the critical issues of attributed networks, such as network sparsity, data nonlinearity and complex modality interactions among different information sources.

5.2 Deep Learning on Networked Data With the growing research interest in deep learning, tremendous efforts have been devoted to developing deep neural networks on networked data for various learning tasks [6, 33, 22, 5, 36, 3]. As one of the first attempts, HNE [6] develops a heterogeneous deep model to embed heterogeneous network data into a unified latent feature space. Afterward, a surge of deep autoencoder based models [33, 5, 9] have been proposed for network representation learning, and they render state-of-the-art performance through their strong capability in capturing highly non-linear properties of data. Among them, SDNE [33] exploits the first- and second-order node proximity by extending the traditional autoencoder framework. TriDNR [22] captures the inherent correlations between structure, node content and label information via a tri-party autoencoder architecture. Meanwhile, recent research advances on graph convolutional networks (GCN) [16, 7, 12] demonstrate superior learning performance by considering neighbors of nodes that are multiple hops away. In particular, GCN [16] takes the structure and attribute information as input, and extends the operation of convolution to network data in the spectral domain for embedding representation learning. GraphSAGE [12] enables inductive representation learning on graph-structured data by learning a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Nevertheless, all these methods focus on learning embedded representations of nodes; it is still not clear how to perform anomaly detection on top of the deep neural networks. Even though the recently proposed NetWalk [35] combines network representation learning and anomaly detection in a joint framework, it is proposed to solve the problem of anomaly detection on dynamic networks and cannot be directly applied to our attributed network scenario.

6 Conclusion

In this paper, we make the first investigation of the research problem of anomaly detection on attributed networks by developing a carefully designed deep learning model. Specifically, we address the limitations of existing methods and model the attributed networks with a graph convolutional network (GCN). As GCN handles the high-order node interactions with multiple layers of nonlinear transformations, it alleviates the network sparsity issue and can capture the nonlinearity of data as well as the complex interactions between the two sources of information on attributed networks. To further enable the detection of anomalous nodes, we introduce a deep autoencoder framework to reconstruct the original attributed network with the learned node embeddings from GCN. The reconstruction errors of nodes are then employed to flag anomalies. The experimental results demonstrate the superiority of the proposed deep model over the state-of-the-art methods. Future work can be focused on two aspects: first, we will investigate whether the proposed deep model is vulnerable to data poisoning attacks, as intelligent attackers can inject malicious samples to avoid the anomalies being detected; second, we will study how to develop robust anomaly detectors in the presence of adversarial attacks.

7 Acknowledgement

This material is based upon work supported by, or in part by, the National Science Foundation (NSF) grant 1614576, and the Office of Naval Research (ONR) grant N00014-16-1-2257.

References

[1] Leman Akoglu, Hanghang Tong, and Danai Koutra. Graph based anomaly detection and description: A survey. DMKD, 29(3):626–688, 2015.
[2] Leman Akoglu, Hanghang Tong, Brendan Meeder, and Christos Faloutsos. PICS: Parameter-free identification of cohesive subgroups in large attributed graphs. In SDM, pages 439–450, 2012.
[3] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
[4] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. LOF: Identifying density-based local outliers. In ACM SIGMOD Record, volume 29, pages 93–104, 2000.
[5] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In AAAI, pages 1145–1152, 2016.
[6] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. Heterogeneous network embedding via deep architectures. In KDD, pages 119–128, 2015.
[7] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pages 3844–3852, 2016.
[8] Kaize Ding, Jundong Li, and Huan Liu. Interactive anomaly detection on attributed networks. In WSDM, 2019.
[9] Hongchang Gao and Heng Huang. Deep attributed network embedding. In IJCAI, pages 3364–3370, 2018.
[10] Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, and Jiawei Han. On community outliers and their efficient detection in information networks. In KDD, pages 813–822, 2010.
[11] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, volume 1. 2016.
[12] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1024–1034, 2017.
[13] Xiao Huang, Jundong Li, and Xia Hu. Label informed attributed network embedding. In WSDM, pages 731–739, 2017.
[14] Xiao Huang, Qingquan Song, Jundong Li, and Xia Hu. Exploring expert cognition for attributed network embedding. In WSDM, 2018.
[15] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[16] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2016.
[17] Jundong Li, Harsh Dani, Xia Hu, and Huan Liu. Radar: Residual analysis for anomaly detection in attributed networks. In IJCAI, pages 2152–2158, 2017.
[18] Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. Attributed network embedding for learning in a dynamic environment. In CIKM, pages 387–396, 2017.
[19] Jundong Li, Xia Hu, Jiliang Tang, and Huan Liu. Unsupervised streaming feature selection in social media. In CIKM, pages 1041–1050, 2015.
[20] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.
[21] Emmanuel Müller, Patricia Iglesias Sánchez, Yvonne Mülle, and Klemens Böhm. Ranking outlier nodes in subspaces of attributed graphs. In ICDE Workshop, pages 216–222, 2013.
[22] Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang, and Yang Wang. Tri-party deep network representation. Network, 11(9):12, 2016.
[23] Zhen Peng, Minnan Luo, Jundong Li, Huan Liu, and Qinghua Zheng. ANOMALOUS: A joint modeling approach for anomaly detection on attributed networks. In IJCAI, pages 3513–3519, 2018.
[24] Bryan Perozzi and Leman Akoglu. Scalable anomaly ranking of attributed neighborhoods. In SDM, pages 207–215, 2016.
[25] Bryan Perozzi, Leman Akoglu, Patricia Iglesias Sánchez, and Emmanuel Müller. Focused clustering and outlier detection in large attributed graphs. In KDD, pages 1346–1355, 2014.
[26] Joseph J Pfeiffer III, Sebastian Moreno, Timothy La Fond, Jennifer Neville, and Brian Gallagher. Attributed graph models: Modeling network structure with correlated attributes. In WWW, pages 831–842, 2014.
[27] Patricia Iglesias Sánchez, Emmanuel Müller, Oretta Irmler, and Klemens Böhm. Local context selection for outlier ranking in graphs with multiple numeric node attributes. In SSDBM, page 16, 2014.
[28] Patricia Iglesias Sánchez, Emmanuel Müller, Fabian Laforet, Fabian Keller, and Klemens Böhm. Statistical selection of congruent subspaces for mining attributed graphs. In ICDM, pages 647–656, 2013.
[29] Cosma Rohilla Shalizi and Andrew C Thomas. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2):211–239, 2011.
[30] David B Skillicorn. Detecting anomalies in graphs. In ISI, pages 209–216, 2007.
[31] Xiuyao Song, Mingxi Wu, Christopher Jermaine, and Sanjay Ranka. Conditional anomaly detection. TKDE, 19(5):631–645, 2007.
[32] Hanghang Tong and Ching-Yung Lin. Non-negative residual matrix factorization with application to graph anomaly detection. In SDM, pages 143–153, 2011.
[33] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In KDD, pages 1225–1234, 2016.
[34] Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas AJ Schweiger. SCAN: A structural clustering algorithm for networks. In KDD, pages 824–833, 2007.
[35] Wenchao Yu, Wei Cheng, Charu C Aggarwal, Kai Zhang, Haifeng Chen, and Wei Wang. NetWalk: A flexible deep embedding approach for anomaly detection in dynamic networks. In KDD, pages 2672–2681, 2018.
[36] Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, and Can Wang. ANRL: Attributed network representation learning via deep neural networks. In IJCAI, pages 3155–3161, 2018.
[37] Chong Zhou and Randy C Paffenroth. Anomaly detection with robust deep autoencoders. In KDD, pages 665–674, 2017.
