Deep Anomaly Detection On Attributed Networks
[Figure 1 graphic omitted; recoverable panel labels: attribute matrix X, ReLU layers, embedding vectors, anomaly ranking list.]
Figure 1: The overall framework of our proposed Dominant for deep anomaly detection on attributed networks.
decoder, which aims to reconstruct the original network topology with the learned node embeddings; and (iii) attribute reconstruction decoder, which attempts to reconstruct the observed nodal attributes with the obtained node embeddings. Afterwards, the reconstruction errors of nodes are leveraged to flag anomalies on attributed networks.

3.1 Preliminary - Deep Autoencoder As suggested by [32, 37, 17], the disparity between the original data and the estimated data (i.e., the reconstruction errors) is a strong indicator of the abnormality of instances in a dataset. Specifically, data instances with large reconstruction errors are more likely to be considered as anomalies, since their patterns deviate significantly from the majority and cannot be accurately reconstructed from the observed data. Among various reconstruction based anomaly detection methods, the deep autoencoder achieves state-of-the-art performance. A deep autoencoder is a type of deep neural network that learns latent representations of data in an unsupervised manner by stacking multiple layers of encoding and decoding functions. It has achieved promising learning performance in various domains, such as computer vision, natural language processing, and speech recognition [11].

Given an input dataset X, the encoder Enc(·) is first applied to map the data into a latent low-dimensional feature space, and the decoder Dec(·) then tries to recover the original data from the latent representations. The learning process can be described as minimizing the following cost function:

min dist(X, Dec(Enc(X))),

where dist(·, ·) is a predefined distance metric. In practice, we often choose the ℓ2-norm distance to measure the reconstruction errors. It should also be noted that the deep autoencoder is able to capture highly nonlinear information from high-dimensional input by applying multiple layers of linear units and nonlinear activation functions in the encoder and decoder phases, which is advantageous compared to conventional shallow learning models. Accordingly, in this study, we propose to solve the problem of anomaly detection on attributed networks with a deep autoencoder architecture.

3.2 Attributed Network Encoder As a rich network representation, attributed networks encode not only the network structure but also abundant nodal attributes. However, conventional deep autoencoders can only handle i.i.d. attribute-value data [37, 35] and cannot be directly applied to our scenario. How to design an effective encoder that captures the underlying properties of attributed networks remains a daunting task, as we need to address the three challenges (i.e., network sparsity, data nonlinearity, and complex modality interactions) simultaneously. To this end, we propose a new type of attributed network encoder inspired by the graph convolutional network (GCN) model [16]. Specifically, GCN considers high-order node proximity when learning the embedding representations, and thus mitigates the network sparsity issue beyond the observed links among nodes. Meanwhile, through multiple layers of nonlinear transformations, it captures the nonlinearity of the data and the complex interactions of the two information modalities on attributed networks.

Mathematically, GCN extends the operation of convolution to networked data in the spectral domain and learns a layer-wise new latent representation through a spectral convolution function:

H(l+1) = f(H(l), A | W(l)),

where H(l) is the input of convolution layer l, H(l+1) is the output of that layer, and A is the adjacency matrix of the network. We take the attribute matrix X ∈ R^{n×d} as the input of the first layer, which is equivalent to H(0), and W(l) is the trainable weight matrix of layer l.
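For illustration, the following minimal NumPy sketch (an illustrative toy, not the actual Dominant implementation) shows one ReLU-activated graph convolution layer built on a symmetrically normalized adjacency matrix with self-loops, together with how row-wise reconstruction errors of a toy decoder output can be turned into an anomaly ranking; the graph, weight matrices, and dimensions are made-up placeholders.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One spectral graph convolution layer: relu(A_hat @ H @ W),
    where A_hat is the symmetrically normalized adjacency with self-loops."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))        # D^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)         # ReLU nonlinearity

def reconstruction_scores(X, X_hat):
    """Per-node anomaly scores: row-wise l2 reconstruction error of the attributes."""
    return np.linalg.norm(X - X_hat, axis=1)

# Toy example with made-up sizes: n = 6 nodes, d = 8 attributes.
rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                    # symmetric adjacency, no self-loops
X = rng.random((6, 8))                            # attribute matrix (H^(0))

W0, W1 = rng.random((8, 4)), rng.random((4, 8))   # illustrative weight matrices
H1 = gcn_layer(A, X, W0)                          # latent node embeddings
X_hat = gcn_layer(A, H1, W1)                      # toy attribute decoder
print(np.argsort(-reconstruction_scores(X, X_hat)))  # ranking: largest error first
```

In the full framework, the decoder side reconstructs both the network structure and the nodal attributes, as described at the beginning of this section.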
Figure 2: ROC curves and AUC scores of all methods on different datasets.
• AMEN [24] uses both attribute and network structure information to detect anomalous neighborhoods. Specifically, it analyzes the abnormality of each node from the ego-network point of view.

• Radar [17] is the state-of-the-art unsupervised anomaly detection framework for attributed networks. It detects anomalies whose behaviors are singularly different from the majority by characterizing the residuals of attribute information and its coherence with network information.

• ANOMALOUS [23] performs joint anomaly detection and attribute selection to detect anomalies on attributed networks based on the CUR decomposition and residual analysis.

Evaluation Metrics In the experiments, the following evaluation metrics are used to measure the performance of different anomaly detection algorithms:

• ROC-AUC: As a widely used evaluation metric in previous anomaly detection methods [17, 23], the ROC curve plots the true positive rate (an anomaly is recognized as an anomaly) against the false positive rate (a normal node is recognized as an anomaly) according to the ground truth and the detection results. The AUC value is the area under the ROC curve, representing the probability that a randomly chosen abnormal node is ranked higher than a randomly chosen normal node. An AUC value close to 1 indicates a high-quality detection method.

• Precision@K: As each anomaly detection method outputs a ranking list according to the anomaly scores of different nodes, we use Precision@K to measure the proportion of true anomalies among its top K ranked nodes.

• Recall@K: This metric measures the proportion of ground-truth anomalies that a detection method discovers within its top K ranked nodes, i.e., the number of true anomalies in the top K divided by the total number of ground-truth anomalies (see the computation sketch after this list).
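Both ranking metrics can be computed directly from the score-sorted node list; the sketch below uses hypothetical scores and labels and assumes that a higher score means a node is ranked as more anomalous, with scikit-learn's roc_auc_score included for the ROC-AUC metric.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def precision_at_k(scores, labels, k):
    """Fraction of the top-k ranked nodes that are true anomalies."""
    top_k = np.argsort(-scores)[:k]              # indices of the k highest anomaly scores
    return labels[top_k].sum() / k

def recall_at_k(scores, labels, k):
    """Fraction of all ground-truth anomalies recovered within the top-k ranked nodes."""
    top_k = np.argsort(-scores)[:k]
    return labels[top_k].sum() / labels.sum()

# Hypothetical scores and labels (1 marks a ground-truth anomaly).
scores = np.array([0.9, 0.1, 0.7, 0.3, 0.8, 0.2])
labels = np.array([1, 0, 0, 0, 1, 1])
print(precision_at_k(scores, labels, k=3))       # 2 of the top-3 are anomalies -> 0.667
print(recall_at_k(scores, labels, k=3))          # 2 of 3 anomalies are found   -> 0.667
print(roc_auc_score(labels, scores))             # area under the ROC curve     -> about 0.78
```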
Parameter Settings In the experiments on different datasets, we optimize the loss function with the Adam [15] algorithm and train the proposed model for 300 epochs for the performance evaluation. We set the learning rate to 0.005. In addition, the attributed network encoder is built with three convolutional layers (64, 32, and 16 neurons, respectively). For the other baselines, we retain the settings described in the corresponding papers.
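For concreteness, the following PyTorch-style sketch mirrors the training configuration stated above (Adam, learning rate 0.005, 300 epochs, and three convolution layers with 64, 32, and 16 neurons); the module is a simplified stand-in for the attributed network encoder, and the adjacency matrix, input features, and loss are placeholders rather than the actual datasets and objective.

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    """Three-layer graph convolution encoder with 64-, 32-, and 16-dim outputs."""
    def __init__(self, in_dim):
        super().__init__()
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(d_in, d_out) * 0.01)
             for d_in, d_out in [(in_dim, 64), (64, 32), (32, 16)]])

    def forward(self, a_hat, x):
        h = x
        for w in self.weights:
            h = torch.relu(a_hat @ h @ w)       # convolution with normalized adjacency + ReLU
        return h

# Placeholder data: 6 nodes, 8 attributes; identity stands in for the normalized adjacency.
a_hat = torch.eye(6)
x = torch.rand(6, 8)
encoder = GCNEncoder(in_dim=8)
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.005)   # settings from the text

for epoch in range(300):                        # 300 training epochs
    optimizer.zero_grad()
    z = encoder(a_hat, x)
    loss = z.pow(2).mean()                      # dummy loss; Dominant uses reconstruction errors
    loss.backward()
    optimizer.step()
```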
4.3 Experimental Results In the experiments, we evaluate the performance of our proposed model Dominant by comparing it with the aforementioned baselines. We first present the experimental results in terms of ROC-AUC on the three datasets in Figure 2. Then we present the results w.r.t. Precision@K and Recall@K on all the attributed networks in Table 2. Note that we do not include the results of SCAN and AMEN in Table 2, as they are clustering based methods that cannot provide a precise ranking list for all the nodes.

Table 2: Performance of different anomaly detection methods w.r.t. precision@K and recall@K.

From the evaluation results, we make the following observations:

• The proposed deep model Dominant outperforms the other baseline methods on all three attributed networks. This verifies the effectiveness of performing anomaly detection on attributed networks with a deep architecture.

• LOF and SCAN cannot achieve satisfying results in our experiments, as they merely consider the nodal attributes or the topological structure, respectively. Even though AMEN is designed for anomaly detection on attributed networks, it centers around finding anomalous neighborhoods rather than nodes, which also results in relatively poor performance.

• The residual analysis based models (Radar and Anomalous) are superior to the conventional anomaly detection methods (LOF, SCAN and AMEN). However, these models are still limited by their shallow mechanisms in handling the network sparsity, data nonlinearity, and complex modality interaction issues.

• Dominant shows a stronger ability to rank anomalies at higher positions according to the results of precision@K and recall@K. It achieves better detection performance when the objective is to find more true anomalies within a ranking list of limited length.

4.4 Parameter Analysis Next, we investigate the impact of the controlling parameter α in our proposed Dominant framework and report the performance variation results in Figure 3. Here we present the AUC values on the three attributed network datasets. The controlling parameter α balances the impact of the attribute reconstruction errors and the structure reconstruction errors on model training and anomaly score computation. In the two extreme cases, Dominant only considers the structure reconstruction errors when α is set to 1, and only the attribute reconstruction errors when α is set to 0. The results indicate that it is necessary to balance the structure reconstruction errors and the attribute reconstruction errors to achieve better performance. A reasonable choice of α is around 0.4 to 0.7 for the BlogCatalog and Flickr datasets, and 0.5 to 0.8 for the ACM dataset.
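One simple way to realize the balance described above is to compute each node's anomaly score as a convex combination of its structure and attribute reconstruction errors, following the convention used in this paragraph (α = 1 keeps only structure errors, α = 0 only attribute errors); the reconstruction matrices below are hypothetical.

```python
import numpy as np

def anomaly_scores(A, A_hat, X, X_hat, alpha=0.6):
    """Per-node score: alpha * structure error + (1 - alpha) * attribute error,
    so alpha = 1 keeps only structure errors and alpha = 0 only attribute errors."""
    struct_err = np.linalg.norm(A - A_hat, axis=1)   # row-wise l2 error of the adjacency
    attr_err = np.linalg.norm(X - X_hat, axis=1)     # row-wise l2 error of the attributes
    return alpha * struct_err + (1.0 - alpha) * attr_err

# Hypothetical reconstructions for a 4-node toy network.
rng = np.random.default_rng(1)
A, X = rng.random((4, 4)), rng.random((4, 5))
A_hat = A + 0.1 * rng.standard_normal((4, 4))
X_hat = X + 0.1 * rng.standard_normal((4, 5))
ranking = np.argsort(-anomaly_scores(A, A_hat, X, X_hat, alpha=0.6))
print(ranking)   # nodes ordered from most to least anomalous
```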
5 Related Work
In this section, we briefly review related work in two aspects: (1) anomaly detection on attributed networks; and (2) deep learning on network data.

5.1 Anomaly Detection on Attributed Networks As attributed networks are increasingly used to model a wide range of complex systems, studies of anomaly detection on attributed networks have attracted a lot of attention. Generally, the existing methodologies can be divided into three categories. The first category of methods aims to spot anomalies with community or ego-network analysis. For instance, CODA attempts to simultaneously find communities and spot community anomalies within a unified probabilistic model [10]. AMEN [24] considers the ego-network information of each node and discovers anomalous neighborhoods on attributed networks. Besides that, another family of methods focuses on spotting abnormal nodes in a node feature subspace [28, 27, 21]. For example, GOutRank [21] conducts anomaly ranking on attributed networks based on subspace cluster analysis. ConSub [28] takes a subspace selection algorithm as a pre-processing step before anomaly detection. FocusCO [25] focuses on community anomalies in a predefined subspace derived from user preferences. In addition to the methods mentioned above, residual analysis has emerged as another common way to measure the abnormality of nodes on attributed networks. In particular, Radar [17] characterizes the residuals of attribute information and its coherence with network information for anomaly detection. ANOMALOUS [23] further incorporates CUR decomposition into the residual analysis to alleviate the adverse impact of noisy features. Despite their fruitful progress, these models are limited by their shallow mechanisms and are incapable of han-