23 - NeurIPS - Interpreting Unsupervised Anomaly Detection in Security Via Rule Extraction
Ruoyu Li§† , Qing Li∗† , Yu Zhang§ , Dan Zhao† , Yong Jiang♮† , Yong Yang‡
§ Tsinghua University, China; † Peng Cheng Laboratory, China
♮ Tsinghua Shenzhen International Graduate School, China
‡ Tencent Security Platform Department, China
{liry19,yu-zhang23}@mails.tsinghua.edu.cn; {liq,zhaod01}@pcl.ac.cn
[email protected]; [email protected]
Abstract
Many security applications require unsupervised anomaly detection, as malicious
data are extremely rare and often only unlabeled normal data are available for
training (i.e., zero-positive). However, security operators are concerned about the
high stakes of trusting black-box models due to their lack of interpretability. In this
paper, we propose a post-hoc method to globally explain a black-box unsupervised
anomaly detection model via rule extraction. First, we propose the concept of dis-
tribution decomposition rules that decompose the complex distribution of normal
data into multiple compositional distributions. To find such rules, we design an
unsupervised Interior Clustering Tree that incorporates the model prediction into
the splitting criteria. Then, we propose the Compositional Boundary Exploration
(CBE) algorithm to obtain the boundary inference rules that estimate the decision
boundary of the original model on each compositional distribution. By merging
these two types of rules into a rule set, we can present the inferential process
of the unsupervised black-box model in a human-understandable way, and build
a surrogate rule-based model for online deployment at the same time. We con-
duct comprehensive experiments on the explanation of four distinct unsupervised
anomaly detection models on various real-world datasets. The evaluation shows
that our method outperforms existing methods in terms of diverse metrics including
fidelity, correctness and robustness.
1 Introduction
In recent years, machine learning (ML) and deep learning (DL) have revolutionized many security
applications such as network intrusion detection [1–3] and malware identification [4, 5] that outper-
form traditional methods in terms of accuracy and generalization. Among these works, unsupervised
anomaly detection has become particularly promising: it detects malicious activities by their deviation
from normality. Compared to supervised methods, this type of method is more desirable in security
domains because 1) it hardly requires labeled attack/malicious data during training (i.e., zero-positive
learning), which are typically much sparser and more difficult to obtain than benign data;
and 2) it does not fit any known threats, enabling better detection of unforeseen anomalies.
Due to their black-box nature, ML/DL models are usually not directly interpretable
and understandable. Many local explanation methods [6–10] have attempted to interpret the models
by presenting feature importance of the decision for a single point. However, globally explaining
black-box models, especially using rule extraction to characterize the whole decision boundaries, is
particularly desirable in security systems since it can provide the following benefits:
∗ Corresponding author: Qing Li.
2 Related Work
To combat the persistent emergence of new attacks in cyberspace, recent security applications [1–3, 19–
21] make heavy use of unsupervised models to detect unknown anomalies, such as one-class classifiers
[22–24], Isolation Forests [25, 26], autoencoders and variational autoencoders [27]. Although many
unsupervised model-based approaches have achieved good detection rates, security operators are still
concerned about the semantic gap between black-box model prediction and human understanding,
considering the risks of the great cost incurred by bad decisions [10]. To resolve such concerns,
explainable AI (XAI) has been applied to anomaly detection [28–30]. For example, Kauffmann et al.
propose a decomposition method to explain anomalies of one-class SVMs [28]. Philipp et al. present
an explainable deep one-class classification method called Fully Convolutional Data Description
[30]. However, these methods are specific to a limited range of models and not versatile enough to
accommodate the vastly heterogeneous models of unsupervised anomaly detection.
Some prior work also incorporates popular model-agnostic explanation methods, such as LIME [6],
SHAP [7] and their variations [8], and applies them to explain unsupervised models [31–33] and
security applications [9, 34, 35, 10]. These methods typically use sparse linear models to interpret
predictions by estimating feature importance. Guo et al. propose a method named LEMNA that uses
a fused lasso to explain malware classification [9]. Sipple uses Integrated Gradients [36] to attribute
anomalies of IoT device failure [34]. Nonetheless, these methods can only interpret one data point at
a time (i.e., local explanation) but cannot reveal the complete decision-making process of a model.
To fully understand how black-box models work and safely deploy them, the most appropriate method
is model-agnostic global post-hoc explanation. It aims to match the predictions of any well-trained
models with an inherently interpretable explainer, such as decision trees [15, 16], symbolic rules [14],
sparse linear models [17] and decision lists [37]. In [15], the authors construct global explanations of
complex black-box models in the form of a decision tree approximating the original model. Jacobs
et al. propose a framework that takes an existing ML model and training dataset and generates tree
models to interpret security-related decisions [16]. However, most of these methods are only suitable
for interpreting supervised models that have labeled data of all classes, which are often unavailable.
Though work like [38] can extract rules from unsupervised anomaly detection models, it still assumes
that enough outliers exist in the training dataset judged by the black-box model so as to determine its
decision boundary. This assumption may not hold in practice if a model has great generalization and
can achieve a low false positive rate on normal data (e.g., [1]).
Some recent studies aggregate several local explanation models into near-global explanation [39–41].
However, this type of method is inherently computationally challenging when data volumes are
large and has to make trade-offs between fidelity and coverage. While techniques like knowledge
distillation can also realize model transformation to reduce complexity and promote interpretability
[42, 43], the fundamental purpose of these efforts is to compress models while ensuring accuracy
rather than explaining the original models with high fidelity.
3 Overview
3.1 Problem Definition
Let X ⊆ Rd be the variable space of d-dimensional features; x and xi denote a data sample and the
i-th dimension of the data sample. We give the following definitions for the rest of the paper:
Definition 1 (Unsupervised Anomaly Detection). Given unlabeled negative data (i.e., normal
data) X sampled from a stationary distribution D for training, an unsupervised model estimates
the probability density function f (x) ≈ PX ∼D (x), and detects an anomaly via a low probability
f (x) < φ, where φ > 0 is a threshold determined by the model itself or by humans.
It is noted that the threshold φ is a non-zero value, meaning that the model inevitably generates false
positives, which is a common setting in most of the works [1–3] even though the false positive rate
can be very low. Besides, the normal data may occasionally be contaminated or handled with errors.
We consider anomaly detection that is tolerant of such noisy data, but assume that their proportion in
the training dataset is small and that we have no ground-truth labels for the training data.
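As a minimal illustration of Definition 1 (not the authors' implementation), the following Python sketch fits a detector on unlabeled normal data only and flags a sample as anomalous when its normality score f(x) falls below a threshold φ calibrated on the normal data; the choice of Isolation Forest and the 1% calibration quantile are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 8))      # unlabeled normal (negative) data only

# f(x): a higher score stands in for a higher estimated normality in Definition 1
model = IsolationForest(random_state=0).fit(X_train)
f = model.score_samples

# Calibrate the threshold phi on normal data, e.g. tolerating roughly 1% false positives
phi = np.quantile(f(X_train), 0.01)

def is_anomaly(x):
    """Flag a sample as anomalous when its normality score falls below phi."""
    return bool(f(np.atleast_2d(x))[0] < phi)

print(is_anomaly(rng.normal(size=8)))     # likely False: close to the training distribution
print(is_anomaly(np.full(8, 10.0)))       # likely True: far from the training distribution
```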
Definition 2 (Global Explanation by Rule Extraction). Given a trained model f with its anomaly
threshold φ and the training set X, we obtain an in-distribution rule set C = {C1 , C2 , ...} that
explains how the model f profiles the distribution of normal data. A rule C = ... ∧ (xi ⊙ υi ) ∧ ... ∧
(xj ⊙ υj ) is a conjunction of several axis-aligned constraints on a subset of the feature space, where
υi is the bound for the i-th dimension and ⊙ ∈ {≤, >}.
Let x ∈ C indicate that a data sample satisfies a rule. From C, we can build a surrogate model hC (x),
whose inference is to regard a data sample that cannot match any of the extracted rules as anomalous:
hC (x) = ¬(x ∈ C1 ) ∧ ¬(x ∈ C2 ) ∧ ..., Ci ∈ C. (1)
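To make Equation (1) concrete, the sketch below encodes each rule as a conjunction of axis-aligned constraints and reports a sample as anomalous only if it matches none of the rules; the data structures and names are illustrative rather than the paper's implementation.

```python
from typing import List, Tuple

# A rule C is a conjunction of constraints (feature index, op, bound) with op in {"<=", ">"}
Rule = List[Tuple[int, str, float]]

def matches(x, rule: Rule) -> bool:
    """x ∈ C iff every axis-aligned constraint of the conjunction holds."""
    return all(x[i] <= v if op == "<=" else x[i] > v for i, op, v in rule)

def h_C(x, rule_set: List[Rule]) -> bool:
    """Surrogate model of Equation (1): anomalous iff x matches none of the extracted rules."""
    return not any(matches(x, rule) for rule in rule_set)

# Toy rule set with a single rule: (x0 <= 5.0) AND (x1 > 1.0)
rules = [[(0, "<=", 5.0), (1, ">", 1.0)]]
print(h_C([3.0, 2.0], rules))   # False: matches a rule, treated as normal
print(h_C([9.0, 0.0], rules))   # True: matches no rule, treated as anomalous
```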
Our Goal. We expect the extracted rules to have a high fidelity to the original model, that is, a similar
coverage of normal data (i.e., true negative rate) and a similar detection rate of anomalies (i.e., true
positive rate). To this end, we formulate our objective as follows:

arg min_C L_{X∼D}(C, f, φ) + L_{X≁D}(C, f, φ). (2)
Figure 1: A high-level illustration of our method: (a) the unlabeled data; (b) compositional distributions; (c) process of the CBE algorithm; (d) the final rule set. The small circles are unlabeled normal data. The dashed curves are the decision boundary of the black-box model. The vertical/horizontal lines in (b) and (c) are the distribution decomposition rules.
To minimize the first term in Equation (2), assuming the training data X can well represent the
distribution D, a straightforward approach is to find the bound of X as rules, such as using a
hypercube to enclose the data samples, which easily achieves the minimization of the partial loss
L_{x∈X}(C, f, φ) = 0. However, as D is not a prior distribution and we do not have labeled abnormal
samples, the second term L_{X≁D}(C, f, φ) is neither deterministic nor estimable unless we create
sufficient random samples and query f, which is challenging given the high-dimensional space of X.
As prior studies [21, 34] suggest, normal data are typically multimodal, i.e., the overall distribution is
formed by multiple compositional distributions. For example, a server supports multiple services
such as web, email and database. The representations of these services can be disparate and located in
different regions in feature space with little transition between the regions, making it infeasible to find
a uniform rule set to accurately estimate the original model. An example is illustrated in Figure 1a.
Based on this intuition, we propose a divide-and-conquer approach. First, we propose an Interior
Clustering Tree model (Section 4) to find the distribution decomposition rules, which cut the feature
space into subspaces so that each subspace contains data belonging to the same compositional
distribution, as shown in Figure 1b. Then, we design a Compositional Boundary Exploration
algorithm (Section 5) to explore the decision boundary on each compositional distribution, as
depicted in Figure 1c. Particularly, the algorithm starts from the minimal hypercube that encloses
all data of the distribution, and finds the boundary by recursively extending the boundary following
the optimal direction guided by a gradient approximation. Upon obtaining the decision boundary
of a distribution, the corresponding boundary inference rule can be extracted. Last, the rule set that
globally approximates the original model can be obtained by merging the distribution decomposition
rule and the boundary inference rule of each compositional distribution, as illustrated in Figure 1d.
We formally define the distribution decomposition rule and the boundary inference rule as follows.
Definition 3 (Distribution Decomposition Rule). A distribution decomposition rule C_k^I decomposes
the overall distribution of normal data D into K compositional distributions, i.e.,
P_{X∼D}(x) = Σ_{k=1}^{K} ϕ_k · P_{X∼D_k}(x | x ∈ C_k^I), where ϕ_k denotes the weight of each
compositional distribution, so that a data sample x ∼ D_k has a significantly small probability of
belonging to other distributions.
Definition 4 (Boundary Inference Rule). A boundary inference rule C_k^E estimates the decision boundary of
the original model for each distribution D_k, i.e., arg min_{C_k^E} L_{X∼D_k}(C_k^E, f, φ) + L_{X≁D_k}(C_k^E, f, φ).
With the definition of these two types of rules, we translate the objective in Equation (2) to the
following objective as our intuition indicates. We give a proof of this proposition in the appendix.
Proposition 1. The original objective can be estimated by finding the union of the conjunction of
distribution decomposition rules and boundary inference rules for each compositional distribution:

⋃_{k=1}^{K} arg min_{C_k} L_{X∼D_k}(C_k, f, φ) + L_{X≁D_k}(C_k, f, φ), where C_k = C_k^I ∧ C_k^E. (3)
An IC-Tree continues to split nodes until it satisfies one of the following conditions: i) the number of
data samples at the node |N| = 1; ii) for any two data samples at the node, ∀x^(i), x^(j) ∈ N,
|f(x^(i)) − f(x^(j))| < ϵ; iii) it reaches a maximum depth τ, which is a hyperparameter.
Distribution Decomposition Rule Extraction. A trained IC-Tree with K leaf nodes (K ≤ 2^τ)
represents K distributions separated from the overall distribution D. Suppose the k-th leaf node has
a depth of τ′. A distribution decomposition rule that describes the k-th compositional distribution
can be extracted as the conjunction of the splitting constraints from the root to the leaf node:

C_k^I = (x_i ⊙_1 b_i | s_1 = (i, b_i)) ∧ ... ∧ (x_j ⊙_{τ′} b_j | s_{τ′} = (j, b_j)), (6)

where ⊙ is "≤" if the decision path goes left or ">" if the decision path goes right.
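For intuition, the sketch below collects Equation (6)-style conjunctions along every root-to-leaf path of a fitted tree. A standard scikit-learn decision tree is used here as a stand-in for the IC-Tree, whose splitting criterion additionally incorporates the black-box model's predictions, so the tree construction itself is an assumption for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_rules(tree):
    """Return one conjunction of (feature, op, threshold) constraints per leaf,
    accumulated along the root-to-leaf decision path as in Equation (6)."""
    t = tree.tree_
    rules = {}

    def recurse(node, path):
        if t.children_left[node] == -1:          # leaf node: emit the accumulated conjunction
            rules[node] = path
            return
        i, b = int(t.feature[node]), float(t.threshold[node])
        recurse(t.children_left[node],  path + [(i, "<=", b)])   # left branch:  x_i <= b
        recurse(t.children_right[node], path + [(i, ">",  b)])   # right branch: x_i >  b

    recurse(0, [])
    return rules

# Stand-in tree; an IC-Tree would instead choose splits using the black-box model's outputs.
X = np.random.default_rng(0).normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
for leaf, conj in leaf_rules(DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)).items():
    print(leaf, conj)
```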
The CBE algorithm uses a gradient approximation (detailed below) to identify the optimal direction
in which to explore the decision boundary, which makes the estimation of the decision boundary more
efficient and accurate.
Starting from Hypercube (line 1). Let X k denote the training data falling into the k-th leaf node of
an IC-Tree that represents a compositional distribution. Recall the definition of boundary inference
rules that target min LX ∼Dk (CkE , f, φ) + LX ≁Dk (CkE , f, φ). We use the minimal hypercube Hk
as a starting point of boundary inference rules to bound every dimension of the data samples in
X k judged by the original model as normal, which obviously achieves Lx∈X k (Hk , f, φ) = 0. The
minimal hypercube is enclosed by 2 × d axis-aligned hyperplanes, which can be characterized by the
following rule:
H_k = (υ_1^- ≤ x_1 ≤ υ_1^+) ∧ ... ∧ (υ_d^- ≤ x_d ≤ υ_d^+), (7)

where υ_i^- = min{x_i | f(x) > φ, x ∈ X_k} and υ_i^+ = max{x_i | f(x) > φ, x ∈ X_k}.
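A minimal sketch of this starting hypercube, assuming f returns normality scores for a batch of samples and X_k is the array of samples in the k-th leaf (names are illustrative):

```python
import numpy as np

def minimal_hypercube(X_k, f, phi):
    """Equation (7): per-dimension bounds υ^- and υ^+ of the leaf samples that the
    original model f still judges as normal (f(x) > phi)."""
    normal = X_k[f(X_k) > phi]
    return normal.min(axis=0), normal.max(axis=0)

def in_hypercube(x, lower, upper):
    """Membership test for H_k."""
    return bool(np.all((lower <= x) & (x <= upper)))
```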
Explorer Sampling (line 4∼6). The CBE algorithm explores the decision boundary of the original
model by estimating the bound of one feature dimension at a time. For the i-th dimension, we uniformly
sample Ne data points on each hyperplane of the hypercube, i.e., e^(1), ..., e^(Ne) ∈ H_k ∧ (x_i = υ_i),
υ_i ∈ {υ_i^-, υ_i^+}, which are called the initial explorers for this hyperplane. For an initial explorer e,
we further sample Ns auxiliary explorers near it from a truncated multivariate Gaussian distribution
denoted by N(e, Σ, i). In particular, the center of sampling is the explorer e and the sampling radius
is constrained by the covariance matrix Σ = diag(ρ|υ_1^+ − υ_1^-|, ..., ρ|υ_d^+ − υ_d^-|), where ρ is
a hyperparameter; the sampling on the i-th dimension is half-truncated to keep only the distribution
outside the hypercube, as we desire to extend the boundary. With Ne × Ns auxiliary explorers in total,
we query the original model and use Beam Search to select Ne samples with the minimal probability
of being normal as the candidate explorers for the next iteration.
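A sketch of one sampling round on a single hyperplane of dimension i is shown below; realizing the half-truncated Gaussian by reflecting the i-th coordinate to the outward side of the hyperplane is one possible implementation and an assumption on our part.

```python
import numpy as np

def sample_auxiliary(e, i, lower, upper, rho, n_s, rng, side="+"):
    """Draw n_s auxiliary explorers around an initial explorer e lying on the hyperplane
    x_i = υ_i^+ (side "+") or x_i = υ_i^- (side "-"), keeping dimension i outside the hypercube."""
    sigma = np.maximum(rho * np.abs(upper - lower), 1e-12)   # diagonal of Σ
    aux = rng.normal(loc=e, scale=sigma, size=(n_s, e.shape[0]))
    if side == "+":   # half-truncation: reflect dimension i beyond the upper bound
        aux[:, i] = upper[i] + np.abs(aux[:, i] - upper[i])
    else:             # ... or below the lower bound
        aux[:, i] = lower[i] - np.abs(aux[:, i] - lower[i])
    return aux

def select_candidates(aux, f, n_e):
    """Beam Search step: keep the n_e auxiliary explorers with the lowest normality score."""
    return aux[np.argsort(f(aux))[:n_e]]
```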
Gradient Approximation (line 7∼9). Though we have obtained Ne candidate explorers in the
previous step, using them directly for the next iteration does not guarantee the optimal direction of
movement towards the decision boundary. To find the optimal direction, we utilize the Fast Gradient
Sign Method [45] that employs gradient ascent to find the direction of feature perturbation. However,
we do not know the loss function of the original model in black-box scenarios. To deal with it, given
a selected auxiliary explorer ê that is sampled around an initial explorer e on the i-th dimension
hyperplane, we approximate the i-th dimension of the model gradient (i.e., the partial derivative) by
the slope of a linear model across the two data points, and use the midpoint with its i-th dimension
minus the approximation as the new explorer for the next iteration:
e_{i,next} = (e_i + ê_i)/2 − η · sign(∇_i), where ∇_i = ∂f(x)/∂x_i ≈ (f(e) − f(ê)) / (e_i − ê_i), (8)
sign(·) is the sign function, and η is a hyperparameter to control the stride of one iteration. The
iteration stops when i) an auxiliary explorer êext that satisfies f (êext ) < φ is found, or ii) it reaches
the maximum number of iterations.
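A sketch of a single update of Equation (8) follows; how the coordinates other than the i-th are set (here, to the midpoint of e and ê) is our reading of the text and therefore an assumption.

```python
import numpy as np

def gradient_step(e, e_hat, i, f, eta):
    """Approximate ∂f/∂x_i by the slope between the initial explorer e and the selected
    auxiliary explorer ê, then step against its sign (Equation (8))."""
    denom = e[i] - e_hat[i]
    grad_i = (f(e[None, :])[0] - f(e_hat[None, :])[0]) / (denom if denom != 0 else 1e-12)
    e_next = (e + e_hat) / 2.0            # midpoint on every dimension
    e_next[i] -= eta * np.sign(grad_i)    # e_{i,next} = (e_i + ê_i)/2 − η · sign(∇_i)
    return e_next
```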
Rule Acquisition (line 12). If the iteration stops due to the first condition, we produce a boundary
constraint for each dimension using the coordinate of êext that extends the boundary of the hypercube,
i.e., ci = (xi ⊙ êext,i ), where ⊙ is “≤” if êext,i is greater than υi+ , or “>” if êext,i is less than υi− .
If the iteration stops due to the second condition, it means the algorithm encounters difficulties
in moving towards the decision boundary by perturbing this feature dimension. We calculate the
difference between the model prediction of the last auxiliary explorer and that of the initial explorers
on the hyperplane. If the difference is smaller than a threshold δ, we decide that this feature dimension
is a contour line, i.e., it has no significant correlation with the model prediction. In this case, we do
not produce any constraints for this dimension. If the difference is greater than the threshold, we
produce constraints in the same way as those produced under the first condition. The final boundary
inference rule is the disjunction of the hypercube and the constraints on each dimension.
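Putting the pieces together, a sketch of the resulting boundary inference rule C_k^E = H_k ∨ (c_1 ∧ ... ∧ c_d), using the same illustrative constraint representation as the earlier sketches:

```python
import numpy as np

def boundary_rule(lower, upper, constraints):
    """Algorithm 1, line 12: a sample satisfies C_k^E if it lies inside the minimal hypercube
    OR satisfies every extended constraint (i, op, bound); contour-line dimensions are absent."""
    def satisfied(x):
        inside = bool(np.all((lower <= x) & (x <= upper)))
        extended = all(x[i] <= v if op == "<=" else x[i] > v for i, op, v in constraints)
        return inside or extended
    return satisfied

# Example: a 2-D hypercube [0, 1]^2 extended along both dimensions by the explorers' coordinates.
rule = boundary_rule(np.array([0.0, 0.0]), np.array([1.0, 1.0]),
                     [(0, "<=", 1.3), (1, "<=", 1.1)])
print(rule(np.array([1.2, 0.5])), rule(np.array([2.0, 0.5])))   # True False
```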
6 Evaluation
6.1 Experimental Setup
Black-box Models and Datasets. We use four different types of unsupervised anomaly detection
models widely used in security applications as the original black-box models, including autoencoder
(AE, used by [1]), variational autoencoder (VAE, used by [46]), one-class SVM (OCSVM, used by
Algorithm 1: Compositional Boundary Exploration
Input: Data falling into the k-th leaf node X k , anomaly detector f and its threshold φ
Output: Boundary inference rule Ck on this leaf node such that Ck encapsulates normality
1 Hk ← MinimalHypercube(Xk );
2 for i-th dimension in X k do
3 e(1) , ..., e(Ne ) ← InitialExplorer(Hk ) on the i-th dimension;
4 while True do
5 ê(1) , ..., ê(Ns ) ← AuxiliaryExplorer(e) for each initial explorer e;
6 Beam Search for Ne candidate explorers from Ne × Ns auxiliary explorers that have the
minimal probability of being normal judged by f and φ;
7 e ← GradientApprox(ê) for each candidate explorer selected from auxiliary explorers;
8 if ending condition satisfied then
9 ci ← (xi ⊙ êi ) and break;
10 end while
11 end for
12 return CkE = Hk ∨ (c1 ∧ c2 ∧ ... ∧ cd );
[47]) and Isolation Forest (iForest, used by [48]). We employ three benchmark datasets for network
intrusion detection in the experiment, including CIC-IDS2017, CSE-CIC-IDS2018 [49] and TON-IoT
[50]. The representation of these datasets is tabular data, where each row is a network flow record
and each column is a statistical attribute, such as the mean of packet sizes and the inter-arrival time.
The datasets are randomly split by the ratio of 6:2:2 for training, validation and testing. We use
only normal data to train the anomaly detection models and calibrate their hyperparameters. The
description of the datasets and the AUC score of the models on the datasets are shown in Table 1.
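A minimal sketch of this setup with toy data standing in for the real flow records; the 6:2:2 split and the normal-only (zero-positive) training follow the text, while everything else (array shapes, seeds) is illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for a flow-record dataset: X = tabular features, y = 0 (normal) / 1 (attack).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))
y = (rng.random(1000) < 0.3).astype(int)

# Random 6:2:2 split into training, validation and test sets.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Zero-positive training: the detector and its hyperparameter calibration only see normal flows.
X_tr_normal = X_tr[y_tr == 0]
X_val_normal = X_val[y_val == 0]
```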
Baselines. We employ five prior explanation methods as baselines: 1) We use [38] that extracts rules
from unsupervised anomaly detection (UAD); 2) For other global methods, we use the estimated
greedy decision tree (EGDT) proposed by [15], and Trustee [16] that specifically explains security
applications; 3) We also consider one method LIME [6] that can use a Submodular Pick algorithm to
aggregate local explanations into global explanations, and a knowledge distillation (KD) method [43]
that globally converts a black-box model to a self-explained decision tree. These methods, like ours,
can only access normal data to extract explanations. More details about baselines are in the appendix.
Metrics. We refer to the metrics in [18] to evaluate the rule extraction. Due to limited space, we
demonstrate the following four metrics in this section and present other results in the appendix:
1) Fidelity (FD), i.e., the ratio of input samples on which the predictions of original models and
surrogate models agree over the total samples, which indicates the extent to which humans can trust
the explanations; 2) Robustness (RB), i.e., the persistence of the surrogate model to withstand small
perturbations of the input that do not change the prediction of the original model; 3) True positive
rate (TPR) and true negative rate (TNR), suggesting whether the detection capability meets the need
of online defense and whether the extracted rules generate noticeable false alarms that cause “alert
fatigue” [51] in highly unbalanced scenarios of most security applications, respectively.
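For concreteness, a sketch of how these metrics can be computed from model and surrogate predictions; the random-perturbation protocol used for robustness here is a simplification and an assumption.

```python
import numpy as np

def fidelity(y_black_box, y_surrogate):
    """FD: fraction of samples on which the surrogate and the original model agree."""
    return float(np.mean(np.asarray(y_black_box) == np.asarray(y_surrogate)))

def tpr_tnr(y_true, y_pred):
    """TPR / TNR with the convention 1 = anomaly, 0 = normal."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_pred[y_true == 1] == 1)), float(np.mean(y_pred[y_true == 0] == 0))

def robustness(predict, X, eps=1e-2, seed=0):
    """RB: share of samples whose prediction is unchanged under a small random perturbation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    return float(np.mean(predict(X) == predict(X + eps * rng.normal(size=X.shape))))
```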
We extract rules from the four unsupervised anomaly detection models using the five baseline methods
and our method, and test the performance of the extracted rules. The results on the three datasets
are in Table 2. We find that our method achieves the highest fidelity on all the detection models
and datasets, and half of the scores are even over 0.99. It shows our method can precisely match
the predictions of the black-box models, which ensures the correctness of its global interpretation.
Table 1: Summary of datasets for network intrusion detection and AUC of trained models.

No. | Dataset | #Classes | #Features | #Normal | #Attack | AUC (AE / VAE / OCSVM / iForest)
1 | CIC-IDS2017 | 6 attacks + 1 normal | 80 | 687,565 | 288,404 | 0.9921 / 0.9901 / 0.9967 / 0.9879
2 | CSE-CIC-IDS2018 | 14 attacks + 1 normal | 80 | 693,004 | 202,556 | 0.9906 / 0.9767 / 0.9901 / 0.9734
3 | TON-IoT | 9 attacks + 1 normal | 30 | 309,086 | 893,006 | 0.9998 / 0.9998 / 0.9993 / 0.9877
Table 2: Performance of rule extraction on different datasets.
Table 3: Fidelity of extracted rules under varying percentages of noisy training data.

Percentage | Random Noise (AE / VAE / OCSVM / iForest) | Mislabeled Noise (AE / VAE / OCSVM / iForest)
0% | 0.9829 / 0.9814 / 0.9729 / 0.9876 | 0.9997 / 0.9977 / 0.9975 / 0.9984
1% | 0.9829 / 0.9824 / 0.9148 / 0.9940 | 0.9991 / 0.9992 / 0.9952 / 0.9953
3% | 0.9876 / 0.9873 / 0.8960 / 0.9920 | 0.9991 / 0.9992 / 0.9952 / 0.9953
5% | 0.9855 / 0.9675 / 0.9511 / 0.7732 | 0.9914 / 0.9996 / 0.9992 / 0.9966
10% | 0.9739 / 0.9881 / 0.9600 / 0.5148 | 0.9987 / 0.9983 / 0.9996 / 0.9978
Moreover, our method achieves the highest TPR on all the detection models and datasets; specifically,
the TPR is equal to 1.00 for all the detection models on TON-IoT dataset. This result suggests that our
rules can accurately detect various anomalous data, making it possible to realize online deployment
and defense using these rules. Our method also reaches a high level of robustness (minimum 0.9890,
maximum 1.00) and true negative rate (minimum 0.9715, maximum 1.00). Therefore, it is concluded
that our method can obtain rules of high quality from different black-box unsupervised anomaly
detection models using only unlabeled one-class data.
Considering that obtaining a “clean” training set requires huge manual effort in reality [52], we also
assess the efficacy of our method under varying percentages of “noisy” data. We evaluate the fidelity
of extracted rules using two approaches for the injection of noisy data: 1) random noise; 2) mislabeled
data from other classes, i.e., attack data. The results are shown in Table 3. We find that the impact of
the noisy data proportion is not significant: 36 of 40 fidelity scores in the table remain above 0.95, and
the variation of fidelity scores is not obvious with the increase of noisy data for most of the models.
This shows that our rule extraction method can retain similar performance to the black-box model
that it extracts from. Nonetheless, the results of iForest also reveal that a sufficiently large proportion
of noisy data may cause a certain negative impact on the rule extraction for certain models.
To demonstrate that the rules obtained by our method are in line with human understanding, we use
the OCSVM as an example of black-box models to exhibit several explanations. We extract rules
from the well-trained model and use the rules to predict four typical types of attack data, including
Distributed Denial-of-Service (DDoS) attacks, scanning attacks, SQL injection, and backdoor attacks.
Table 4: Examples of explanation on four types of attacks.

DDoS — rules of normality (attack value; feature meaning):
  ps_mean > 101.68 (57.33; mean of IP packet sizes)
  iat_mean > 0.063 (0.00063; mean of packet inter-arrival time)
  dur > 12.61 (0.00126; duration of a connection)
  Human understanding: DDoS attacks use packets of small sizes to achieve asymmetric resource consumption on the victim side, and send packets at a high rate to flood the victim.

Scanning — rules of normality (attack value; feature meaning):
  count > 120 (1; IP packet count per connection)
  ps_var > 2355.20 (0.0; variance of IP packet sizes)
  Human understanding: Scanning attacks send a constant probe packet to a port, and the victim will not reply if the port is closed.

SQL Injection — rules of normality (attack value; feature meaning):
  ps_bwd_mean ≤ 415.58 (435.80; mean of backward IP packet sizes)
  dur > 1.64 (0.37; duration of a connection)
  Human understanding: Unauthorized access to additional data from websites; attackers usually establish short connections for one attack.

Backdoor — rules of normality (attack value; feature meaning):
  ps_max > 275.28 (48.0; maximum of IP packet sizes)
  ps_min > 49.41 (40.0; minimum of IP packet sizes)
  Human understanding: It persists in compromised hosts and sends stealthy keep-alive packets with no payload (thus very small).
Table 4 shows some features of the rules extracted from normal data that cannot be matched by the
attack data, and exhibits how humans can interpret the model decisions2 . For example, the data of
DDoS attacks cannot match the rules of three feature dimensions, including mean of packet sizes,
mean of packet inter-arrival time, and duration of a connection. It can be observed that the feature
values of attacks are markedly lower than the bound of the rules. Such results are easy to interpret.
Because the purpose of DDoS attacks is to overwhelm the resources of a victim, an attacker will
realize asymmetric resource consumption between the victim and himself (i.e., using small packets),
send packets at an extremely high rate (i.e., low inter-arrival time), and establish as many useless
connections as possible (i.e., short duration of connections). These explanations are in line with
how humans recognize the attack data. Hence, we can draw a conclusion that our method is able to
provide precise insights into black-box anomaly detection models in a human-understandable way.
To evaluate the contribution of each component in our method, including the IC-Tree and the CBE
algorithm, we conduct an ablation experiment by 1) replacing the IC-Tree with a clustering algorithm
K-Means, 2) using only the CBE algorithm, and 3) replacing the CBE algorithm with directly using
hypercubes as rules. In Table 5, we find that our method (IC-Tree + CBE) outperforms others in
terms of fidelity on both datasets. Though using K-Means can reach similar results, its clusters cannot
be expressed by axis-aligned rules with the high interpretability and deployability that the IC-Tree
achieves. In summary, both components are helpful for the quality of rule extraction.
We also evaluate the computational cost of our method with respect to training and prediction. Since
CIC-IDS2017 dataset has 80 features in total, we train the model using the first 20, 40, 60, and 80
features of 4000 samples to investigate the influence of feature sizes. The results are shown in Table 6,
which demonstrate the average training and prediction time of our method. It can be seen that the
training time is around 1 minute, which is acceptable and practical for large-scale training. Besides,
the training time increases basically linearly with the increase of feature sizes. This is because our
method adopts a feature-by-feature strategy to explore the decision boundary of the model. For
prediction time, our method is highly efficient, which only costs microsecond-level overhead for one
inference. It shows that as a rule-based approach, our method can achieve real-time execution for
online use. Note that the runtime is measured purely based on Python. In practice, the prediction
time of our method can be even less with more efficient code implementation.
2 Note that the "human understanding" was derived from the knowledge of the authors, and hence may be
subjective and may not reflect the wider population of security experts. We give more clarification on how the
content of Table 4 was obtained in the appendix, as well as potential reasons for disagreement between humans and models.
Table 6: Average training and prediction time per sample for different feature sizes.

Feature Size | Training Time (ms) | Prediction Time (ms)
20 | 5.40 ± 5.50×10^-4 | 5.48×10^-3 ± 2.51×10^-9
40 | 15.5 ± 6.80×10^-2 | 5.52×10^-3 ± 2.34×10^-9
60 | 14.7 ± 8.75×10^-5 | 6.99×10^-3 ± 3.56×10^-8
80 | 30.7 ± 3.08×10^-1 | 6.91×10^-3 ± 9.00×10^-8
Figure 2: Sensitivity experiments of hyperparameters: (a) maximum tree depth; (b) number of explorers; (c) sampling coefficient; (d) iteration stride.
We also theoretically analyze the time complexity of our algorithms. For training, the complexity of
the IC-Tree is identical to a CART: O(d · n log n), where d is the feature size and n is the sample
number; the complexity of the CBE algorithm is O(K · d · Ne · Ns ), where K is the number of leaf
nodes of the IC-Tree, and Ne and Ns are the number of initial explorers and auxiliary explorers.
Therefore, the training time is theoretically linear to the feature size, which is in line with the empirical
results. For execution, the time complexity is O(|C| · d), where |C| is the number of extracted rules.
6.6 Hyperparameter
We perform a sensitivity analysis of several hyperparameters on their influence on the rule extraction.
We present four major hyperparameters in Figure 2, including the maximum depth τ of an IC-Tree,
the number of explorers Ne, the sampling coefficient ρ, and the factor η that controls the stride of an
iteration. Due to limited space, the analysis of other hyperparameters is placed in the appendix.
Maximum tree depth. A deeper IC-Tree has more leaf nodes, and can accordingly decompose more
distributions that ease the difficulty of rule extraction. Meanwhile, excessively fine-grained splitting
might cause overfitting. We find that τ = 15 achieves the best performance.
Number of Explorers. It is essentially the number of selected nodes per iteration in Beam Search,
which considers multiple local optima to improve greedy algorithms. But selecting too many nodes
may also include more redundancy. Figure 2b shows that a value between 6 and 8 is recommended.
Coefficient of sampling. Figure 2c shows that a higher value of this hyperparameter achieves
better results. A large coefficient yields a large sampling radius for the multivariate Gaussian
distribution, which helps the CBE algorithm quickly find the decision boundary of the original model.
Factor of iteration stride. In Figure 2d, we find that a larger factor η can obtain rules of higher
quality. As it decides the stride of finding the explorers for the next iteration, a higher value of the
hyperparameter might help the convergence of the iteration process.
Acknowledgments and Disclosure of Funding
This work is supported by the National Key Research and Development Program of China under
grant No. 2022YFB3105000, the Major Key Project of PCL under grant No. PCL2023AS5-1, the
Shenzhen Key Lab of Software Defined Networking under grant No. ZDSYS20140509172959989,
and the research fund of Tsinghua University - Tencent Joint Laboratory for Internet Innovation
Technology.
References
[1] Y. Mirsky, T. Doitshman, Y. Elovici, and A. Shabtai, “Kitsune: An ensemble of autoencoders
for online network intrusion detection,” in Annual Network and Distributed System Security
Symposium (NDSS), 2018.
[2] R. Tang, Z. Yang, Z. Li, W. Meng, H. Wang, Q. Li, Y. Sun, D. Pei, T. Wei, Y. Xu, and Y. Liu,
“Zerowall: Detecting zero-day web attacks through encoder-decoder recurrent neural networks,”
in IEEE Conference on Computer Communications (INFOCOM), 2020.
[3] C. Fu, Q. Li, M. Shen, and K. Xu, “Realtime robust malicious traffic detection via frequency
domain analysis,” in ACM SIGSAC Conference on Computer and Communications Security
(CCS), 2021.
[4] R. Perdisci, W. Lee, and N. Feamster, “Behavioral clustering of http-based malware and
signature generation using malicious network traces,” in USENIX Symposium on Networked
Systems Design and Implementation (NSDI), 2010.
[5] E. C. R. Shin, D. Song, and R. Moazzezi, “Recognizing functions in binaries with neural
networks,” in USENIX Security Symposium, 2015.
[6] M. T. Ribeiro, S. Singh, and C. Guestrin, “"why should i trust you?": Explaining the predictions
of any classifier,” in ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD), 2016.
[7] S. M. Lundberg and S. Lee, “A unified approach to interpreting model predictions,” in Annual
Conference on Neural Information Processing Systems (NeurIPS), 2017.
[8] M. T. Ribeiro, S. Singh, and C. Guestrin, “Anchors: High-precision model-agnostic explana-
tions,” in AAAI Conference on Artificial Intelligence (AAAI), 2018.
[9] W. Guo, D. Mu, J. Xu, P. Su, G. Wang, and X. Xing, “Lemna: Explaining deep learning based
security applications,” in ACM SIGSAC Conference on Computer and Communications Security
(CCS), 2018.
[10] D. Han, Z. Wang, W. Chen, Y. Zhong, S. Wang, H. Zhang, J. Yang, X. Shi, and X. Yin, “Deepaid:
Interpreting and improving deep learning-based anomaly detection in security applications,” in
ACM SIGSAC Conference on Computer and Communications Security (CCS), 2021.
[11] netfilter project, “iptables,” https://www.netfilter.org/projects/iptables/index.html, 2023.
[12] snort, “Snort ids,” https://www.snort.org/, 2023.
[13] G. Xie, Q. Li, Y. Dong, G. Duan, Y. Jiang, and J. Duan, “Mousika: Enable general in-network
intelligence in programmable switches by knowledge distillation,” in IEEE Conference on
Computer Communications (INFOCOM), 2022.
[14] M. W. Craven and J. W. Shavlik, “Using sampling and queries to extract rules from trained
neural networks,” in International Conference on Machine Learning (ICML), 1994.
[15] O. Bastani, C. Kim, and H. Bastani, “Interpreting blackbox models via model extraction,” CoRR,
vol. abs/1705.08504, 2017.
[16] A. S. Jacobs, R. Beltiukov, W. Willinger, R. A. Ferreira, A. Gupta, and L. Z. Granville, “Ai/ml
for network security: The emperor has no clothes,” in ACM SIGSAC Conference on Computer
and Communications Security (CCS), 2022.
[17] J. H. Friedman and B. E. Popescu, “Predictive learning via rule ensembles,” The Annals of
Applied Statistics, vol. 2, no. 3, 2008.
[19] M. Du, F. Li, G. Zheng, and V. Srikumar, “Deeplog: Anomaly detection and diagnosis from
system logs through deep learning,” in ACM SIGSAC Conference on Computer and Communi-
cations Security (CCS), 2017.
[20] R. Li, Q. Li, J. Zhou, and Y. Jiang, “Adriot: An edge-assisted anomaly detection framework
against iot-based network attacks,” IEEE Internet of Things Journal, vol. 9, no. 13, pp. 10 576–
10 587, 2022.
[21] R. Li, Q. Li, Y. Huang, W. Zhang, P. Zhu, and Y. Jiang, “Iotensemble: Detection of botnet
attacks on internet of things,” in European Symposium on Research in Computer Security
(ESORICS), 2022.
[24] S. Itani, F. Lecron, and P. Fortemps, “A one-class classification decision tree based on kernel
density estimation,” Appl. Soft Comput., vol. 91, p. 106250, 2020.
[25] F. T. Liu, K. M. Ting, and Z. Zhou, “Isolation forest,” in IEEE International Conference on
Data Mining (ICDM), 2008.
[26] H. Xu, G. Pang, Y. Wang, and Y. Wang, “Deep isolation forest for anomaly detection,” CoRR,
vol. abs/2206.06602, 2022.
[28] J. Kauffmann, K.-R. Müller, and G. Montavon, “Towards explaining anomalies: A deep taylor
decomposition of one-class models,” Pattern Recognition, vol. 101, p. 107198, 2020.
[29] D. Kazhdan, B. Dimanov, M. Jamnik, and P. Liò, “MEME: generating RNN model explanations
via model extraction,” CoRR, vol. abs/2012.06954, 2020.
[32] J. Crabbé and M. van der Schaar, “Label-free explainability for unsupervised models,” in
International Conference on Machine Learning (ICML), 2022.
[33] O. Eberle, J. Büttner, F. Kräutli, K. Müller, M. Valleriani, and G. Montavon, “Building and
interpreting deep similarity models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 3, pp.
1149–1161, 2022.
[34] J. Sipple, “Interpretable, multidimensional, multimodal anomaly detection with negative sam-
pling for detection of device failure,” in International Conference on Machine Learning (ICML),
2020.
[35] L. Yang, W. Guo, Q. Hao, A. Ciptadi, A. Ahmadzadeh, X. Xing, and G. Wang, “CADE:
detecting and explaining concept drift samples for security applications,” in USENIX Security
Symposium, 2021.
[36] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Interna-
tional Conference on Machine Learning (ICML), 2017.
[37] B. Letham, C. Rudin, T. H. McCormick, and D. Madigan, “Interpretable classifiers using rules
and bayesian analysis: Building a better stroke prediction model,” CoRR, vol. abs/1511.01644,
2015.
[38] A. Barbado, O. Corcho, and R. Benjamins, “Rule extraction in unsupervised anomaly detection
for model explainability: Application to oneclass svm,” Expert Systems with Applications, vol.
189, p. 116100, 2022.
[39] I. van der Linden, H. Haned, and E. Kanoulas, “Global aggregations of local explanations for
black box models,” CoRR, vol. abs/1907.03039, 2019.
[40] M. Setzu, R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti, “Glocalx - from
local to global explanations of black box AI models,” Artif. Intell., vol. 294, p. 103457, 2021.
[41] Q. Li, R. Cummings, and Y. Mintz, “Optimal local explainer aggregation for interpretable
prediction,” in AAAI Conference on Artificial Intelligence (AAAI), 2022.
[42] N. Frosst and G. E. Hinton, “Distilling a neural network into a soft decision tree,” CoRR, vol.
abs/1711.09784, 2017.
[43] Y. Li, J. Bai, J. Li, X. Yang, Y. Jiang, and S. Xia, “Rectified decision trees: Exploring the
landscape of interpretable and effective machine learning,” CoRR, vol. abs/2008.09413, 2020.
[44] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees.
Wadsworth, 1984.
[45] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,”
in International Conference on Learning Representations (ICLR), 2015.
[46] X. Xu, J. Li, Y. Yang, and F. Shen, “Toward effective intrusion detection using log-cosh
conditional variational autoencoder,” IEEE Internet of Things Journal, vol. 8, no. 8, pp. 6187–
6196, 2021.
[47] A. Binbusayyis and T. Vaiyapuri, “Unsupervised deep learning approach for network intrusion
detection combining convolutional autoencoder and one-class SVM,” Appl. Intell., vol. 51,
no. 10, pp. 7094–7108, 2021.
[48] Y. Dong, Q. Li, K. Wu, R. Li, D. Zhao, G. Tyson, J. Peng, Y. Jiang, S. Xia, and M. Xu,
“Horuseye: Realtime iot malicious traffic detection framework with programmable switches,” in
USENIX Security Symposium, 2023.
[49] L. Liu, G. Engelen, T. M. Lynar, D. Essam, and W. Joosen, “Error prevalence in NIDS datasets: A
case study on CIC-IDS-2017 and CSE-CIC-IDS-2018,” in IEEE Conference on Communications
and Network Security (CNS), 2022.
[50] T. M. Booij, I. Chiscop, E. Meeuwissen, N. Moustafa, and F. T. H. den Hartog, “Ton_iot: The
role of heterogeneity and the need for standardization of features and attack types in iot network
intrusion data sets,” IEEE Internet of Things Journal, vol. 9, no. 1, pp. 485–496, 2022.
[51] B. A. AlAhmadi, L. Axon, and I. Martinovic, “99% false positives: A qualitative study of SOC
analysts’ perspectives on security alarms,” in USENIX Security Symposium, 2022.
[52] G. Apruzzese, P. Laskov, and A. Tastemirova, “Sok: The impact of unlabelled data in cyberthreat
detection,” in IEEE European Symposium on Security and Privacy (EuroS&P), 2022.
A Appendix
A.1 Proof of Proposition 1
Our Goal. We expect the extracted rules to have a high fidelity to the original model, that is, a similar
coverage of normal data (i.e., true negative rate), and a similar detection rate of anomalies (i.e., true
positive rate). To this end, we formulate our objective as follows:
arg min_C L_{X∼D}(C, f, φ) + L_{X≁D}(C, f, φ). (9)
Proposition 1. The original objective can be estimated by finding the union of the conjunction of
distribution decomposition rules and boundary inference rules for each compositional distribution:
⋃_{k=1}^{K} arg min_{C_k} L_{X∼D_k}(C_k, f, φ) + L_{X≁D_k}(C_k, f, φ), where C_k = C_k^I ∧ C_k^E. (10)
Proof. The sum of the minimum losses on each of the compositional distributions is calculated by an
iteratively cumulative process. Let Lj be the sum of the minimum losses on each of the compositional
distributions at the j-th iteration, i.e.,
L_j = Σ_{k=1}^{j} min(L_{X∼D_k}(C_k, f, φ) + L_{X≁D_k}(C_k, f, φ)).
Let X ∼ ⋃_{k=1}^{j} D_k represent a variable belonging to any of the compositional distributions D_1, ..., D_j.
We prove the Loop Invariant of Lj during the iteration, which always satisfies:
L_j = min(L_{X∼⋃_{k=1}^{j} D_k}(⋃_{k=1}^{j} C_k, f, φ) + L_{X≁⋃_{k=1}^{j} D_k}(⋃_{k=1}^{j} C_k, f, φ)) + ψ. (12)
1) For the first iteration, the equation L_1 = min(L_{X∼D_1}(C_1, f, φ) + L_{X≁D_1}(C_1, f, φ)) + ψ obviously
holds, where ψ = 0.
2) Suppose Equation (12) holds at the j-th iteration. For the (j + 1)-th iteration, we have the
following derivations:

L_{j+1} = Σ_{k=1}^{j+1} min(L_{X∼D_k}(C_k, f, φ) + L_{X≁D_k}(C_k, f, φ))

= min(L_{X∼⋃_{k=1}^{j} D_k}(⋃_{k=1}^{j} C_k, f, φ) + L_{X≁⋃_{k=1}^{j} D_k}(⋃_{k=1}^{j} C_k, f, φ)) + ψ
  + min(L_{X∼D_{j+1}}(C_{j+1}, f, φ) + L_{X≁D_{j+1}}(C_{j+1}, f, φ))

= min(L_{X∼⋃_{k=1}^{j+1} D_k}(⋃_{k=1}^{j+1} C_k, f, φ) + L_{X≁⋃_{k=1}^{j+1} D_k}(⋃_{k=1}^{j+1} C_k, f, φ)) + ψ
  + L_{X∼(⋃_{k=1}^{j} D_k)∩D_{j+1}}(⋃_{k=1}^{j+1} C_k, f, φ) + L_{X≁(⋃_{k=1}^{j} D_k)∩D_{j+1}}(⋃_{k=1}^{j+1} C_k, f, φ),
where (⋃_{k=1}^{j} D_k) ∩ D_{j+1} represents the overlap area between the union of the compositional
distributions ⋃_{k=1}^{j} D_k and the (j + 1)-th compositional distribution D_{j+1}. Recall the definition of
the compositional distributions: a data sample belonging to one compositional distribution has a
significantly small probability of belonging to other compositional distributions, meaning that the
overlap area between the compositional distributions is significantly small. Therefore, the loss with
respect to the overlap area is also significantly small, given that the data samples belonging to this area are
significantly rare. Let
ψ = ψ + L_{X∼(⋃_{k=1}^{j} D_k)∩D_{j+1}}(⋃_{k=1}^{j+1} C_k, f, φ) + L_{X≁(⋃_{k=1}^{j} D_k)∩D_{j+1}}(⋃_{k=1}^{j+1} C_k, f, φ),
and we can get the final result of Lj+1 :
L_{j+1} = min(L_{X∼⋃_{k=1}^{j+1} D_k}(⋃_{k=1}^{j+1} C_k, f, φ) + L_{X≁⋃_{k=1}^{j+1} D_k}(⋃_{k=1}^{j+1} C_k, f, φ)) + ψ,
which proves the loop invariant in Equation (12). At the last iteration, when j = K, the overall
distribution is equal to the union of the compositional distributions, i.e., D = ⋃_{k=1}^{K} D_k, so the
loop invariant coincides with the original objective up to the significantly small term ψ, which proves Proposition 1.
Our implementation is primarily based on PyTorch (version 1.12.1) for the deep learning models,
such as AE and VAE. Additionally, for data preprocessing, feature engineering, and model evaluation,
we employ the versatile machine learning library scikit-learn (version 1.1.3). Python (version 3.9.15)
serves as the programming language for our implementation, providing a rich ecosystem of libraries
and tools for data manipulation and experiment orchestration.
Our experiments were conducted on a server equipped with the Intel(R) Xeon(R) Gold 5218 CPU @
2.30GHz (128GB RAM) and the GeForce RTX 2080 Super (8GB VRAM). Note that GPU is only
used for the training of some DL-based anomaly detectors (e.g., AE, VAE), and our rule extraction
method only requires the use of CPU.
In this section, we delve into the details of the baseline methods used for evaluation in our experi-
ments, focusing on their implementation in the context of globally explaining unsupervised anomaly
detection. The five baseline methods include Rule extraction from UAD, Estimated Greedy Decision
Tree (EGDT), Trustee, Local Interpretable Model-agnostic Explanations (LIME), and a Knowledge
Distillation (KD) method.
model. Based on these training data, EGDT constructs the decision tree T ∗ of size k similar to CART,
in a greedy manner and is pruned to improve interpretability. The algorithm takes into account the
distribution of points that are routed to each leaf node in the decision tree to ensure that the label
assigned to each leaf node is accurate.
A.3.3 Trustee
The model-agnostic DT extraction method Trustee specifically focuses on interpreting security
applications. The core idea is to construct an interpretable decision tree by minimizing the difference
between the black-box model and the surrogate model. The algorithm creates several high-fidelity
decision trees by executing an outer loop S times and an inner loop N times. In an unsupervised
setting, we treat the predictions of the black-box model as pseudo-labels and employ the same process
as for the supervised case.
parameters unless specified otherwise. Hyperparameters were set based on the initial grid search and
then manually fine-tuned.
• Precision (PR) measures the proportion of correctly predicted positive instances out of the
total instances anticipated as positive. It indicates the model’s ability to avoid false positives.
Higher precision values indicate a lower rate of false positives.
• Correctness (CR) evaluates the precision of the explanations provided by the IC-Tree
method with respect to the underlying model. This can be quantified using the Jaccard
similarity index, defined as CR = r/N, where r is the number of correct outputs with respect
to the model and N is the total number of samples.
• Recall (RC), also known as sensitivity or true positive rate, quantifies the proportion of
correctly predicted positive instances out of all actual positive instances. It indicates the
model's ability to identify all positive instances without missing any. Higher recall values
indicate a lower rate of false negatives. It can be quantitatively assessed by the ratio
RC = TP / (TP + FN). (A computation sketch of these metrics follows this list.)
• F1 score (F1) is the harmonic mean of precision and recall combined into a single statistic,
which provides a balanced measure of both precision and recall, capturing the overall
performance of the model.
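As referenced above, a small sketch of computing these metrics with toy prediction vectors (the arrays are purely illustrative):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy vectors: ground-truth labels, black-box model predictions, and surrogate rule predictions.
y_true  = np.array([0, 0, 1, 1, 1, 0])
y_model = np.array([0, 0, 1, 1, 0, 0])
y_rules = np.array([0, 0, 1, 1, 0, 1])

pr = precision_score(y_true, y_rules)      # PR = TP / (TP + FP)
rc = recall_score(y_true, y_rules)         # RC = TP / (TP + FN)
f1 = f1_score(y_true, y_rules)             # harmonic mean of PR and RC
cr = float(np.mean(y_rules == y_model))    # CR = r / N, agreement with the black-box model
print(pr, rc, f1, cr)
```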
Besides the aforementioned metrics, in fact, we believe that the most appropriate way to assess
“human’s trust in anomaly detection models” should be employing a trial in which human security
experts are invited to inspect the rules of interpretation, measuring the consistency of understanding.
Yet this is not an easy task to conduct: to eliminate the subjective influence of each expert, enough
people must be invited to participate in the trial, which can be somewhat difficult to carry out. Despite
the intrinsic limitations of lacking such a metric, it is indeed a common practice to employ only those
commonly used metrics (e.g., fidelity, F1 score) even in research works accepted to top venues, which
may reveal an obvious gap between “research” and “practice”.
The specifics of the performance metrics of our proposed method in comparison to other baseline
methods, including UAD, EGDT, Trustee, LIME, and KD, are demonstrated in Table 7 and Table
8. In addition, we analyze the effects of several hyperparameters on our methodology, as shown in
Table 9. The performance metrics are categorized based on different anomaly detection algorithms:
Autoencoder (AE), Variational Autoencoder (VAE), One-Class SVM (OCSVM), and Isolation Forest
(iForest) are shown in Table 8. Precision (PR), Correctness (CR), Recall (RC), and F1 score (F1) are
evaluated in this paper. Notably, our proposed method achieves the highest performance in almost
all categories, with a few significant instances where it ties with the best performer. This indicates
the comprehensive superiority of our approach, as it consistently delivers exceptional performance
across a wide range of models and metrics.
Method | AE: PR / CR / RC / F1 | VAE: PR / CR / RC / F1 | OCSVM: PR / CR / RC / F1 | iForest: PR / CR / RC / F1
UAD | 0.5896 / 0.0659 / 0.1027 / 0.175 | 0.6104 / 0.0874 / 0.1126 / 0.1901 | 0.9912 / 0.067 / 0.999 / 0.9951 | 0.9993 / 0.067 / 0.9292 / 0.963
EGDT | 0.5729 / 0.4729 / 0.4729 / 0.8924 | 0.5613 / 0.873 / 0.6864 / 0.5462 | 0.9904 / 0.9244 / 0.8553 / 0.9179 | 0.9942 / 0.9402 / 0.9938 / 0.994
Trustee | 0.8651 / 0.4255 / 0.6482 / 0.4255 | 0.812 / 0.831 / 0.7655 / 0.828 | 0.856 / 0.5509 / 0.6664 / 0.7494 | 0.954 / 0.4842 / 0.8676 / 0.8952
LIME | 0.9998 / 0.7614 / 0.7569 / 0.8616 | 0.7091 / 0.7331 / 0.8641 / 0.9003 | 0.9892 / 0.741 / 0.9263 / 0.9567 | 0.8641 / 0.8305 / 0.8305 / 0.8751
KD | 0.5141 / 0.5141 / 0.6871 / 0.5835 | 0.5612 / 0.1618 / 0.4516 / 0.5623 | 0.9157 / 0.3564 / 0.3748 / 0.5319 | 0.7863 / 0.067 / 0.8465 / 0.8461
Ours | 0.9994 / 0.9488 / 0.9457 / 0.9718 | 0.9633 / 0.9633 / 0.9633 / 0.9645 | 0.9351 / 0.933 / 1 / 0.9653 | 0.933 / 0.9933 / 0.9354 / 0.9841
This extensive evaluation underlines the robustness and effectiveness of our method compared to
established baselines. The superior performance, across varying metrics and under different detection
models, signifies its potential as a versatile solution for rule extraction in the TON-IoT dataset. This
Table 8: Performance of rule extraction on TON-IoT dataset.

Method | AE: PR / CR / RC / F1 | VAE: PR / CR / RC / F1 | OCSVM: PR / CR / RC / F1 | iForest: PR / CR / RC / F1
UAD | 0.0058 / 0.085 / 0.1257 / 0.182 | 0.6153 / 0.216 / 0.3546 / 0.6845 | 0.9847 / 0.0501 / 0.0526 / 0.0998 | 0.9949 / 0.0501 / 0.1776 / 0.3014
EGDT | 0.9749 / 0.9749 / 0.9854 / 0.9749 | 0.7659 / 0.7659 / 0.7856 / 0.8621 | 0.9979 / 0.8136 / 0.651 / 0.788 | 0.9964 / 0.6408 / 0.7977 / 0.886
Trustee | 0.8475 / 0.4776 / 0.5465 / 0.4776 | 0.3809 / 0.3809 / 0.8416 / 0.3809 | 0.8482 / 0.7932 / 0.4147 / 0.557 | 0.9548 / 0.2338 / 0.6576 / 0.7525
LIME | 0.7651 / 0.7664 / 0.6845 / 0.7561 | 0.9035 / 0.9053 / 0.8486 / 0.9053 | 0.7896 / 0.0749 / 0.7815 / 0.7512 | 0.7987 / 0.8354 / 0.8861 / 0.8354
KD | 0.5765 / 0.0824 / 0.6548 / 0.3554 | 0.5648 / 0.0594 / 0.5486 / 0.5461 | 0.8966 / 0.0505 / 0.1004 / 0.1805 | 0.5154 / 0.0501 / 0.8456 / 0.8465
Ours | 0.9992 / 0.9992 / 0.9345 / 0.9801 | 0.9992 / 0.9992 / 0.9861 / 0.9814 | 0.9345 / 0.9499 / 1 / 0.9743 | 0.9933 / 0.9933 / 0.9456 / 0.9968
demonstrates the practical utility of our proposed method, especially in a diverse and dynamic domain
such as IoT, which could significantly benefit from such adaptable and high-performing solutions.
As for the baseline methods, while some of them can achieve acceptable fidelity for certain models,
they fail to maintain such results on other models, indicating that they cannot achieve qualified
model-agnostic global explanations for unsupervised models. Further, most of their recall scores
cannot meet the requirement of using their rules for online defense. It is mainly because these
methods either require labeled data to determine the boundary between normal and abnormal (e.g.,
EGDT and LIME), or need sufficient outliers in the training data (e.g., UAD and KD), which can be
unavailable in many security applications. In contrast, our method eliminates these requirements by
the IC-Tree and CBE algorithm that explores the decision boundary in an unsupervised manner, and
meanwhile realizes a high detection rate of anomalies.
Lastly, it is worth noting that while our method exhibits superior performance, we do not imply the
obsolescence of other methods. Every method has its strengths and use cases, thus the selection of a
method should always be context-dependent. Future work could involve fine-tuning our method to
further improve its performance or applying it to other domains.
5 D. Arp, E. Quiring, et al., "Dos and Don'ts of Machine Learning in Computer Security," in 31st USENIX
Security Symposium (USENIX Security), 2022.
training data, such as the DoS attack being launched from one separate host address while all other
normal traffic is from other host addresses.
Fidelity 0.9996 0.9996 0.9996 0.9995 0.9996 0.9996 0.9996 0.9995 0.9996 0.9998 0.9998 0.9998 0.9996 0.9996 0.9996 0.9995
Robustness 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
Correctness 0.9992 0.9992 0.9992 0.9992 0.9993 0.9992 0.9992 0.9992 0.9992 0.9995 0.9996 0.9996 0.9992 0.9992 0.9992 0.9992
Accuracy 0.9992 0.9992 0.9992 0.9992 0.9993 0.9992 0.9992 0.9992 0.9992 0.9995 0.9996 0.9996 0.9992 0.9992 0.9992 0.9992
TPR 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
TNR 0.9844 0.9845 0.9849 0.9842 0.9853 0.9846 0.9845 0.9841 0.9844 0.9909 0.9925 0.9920 0.9850 0.9847 0.9848 0.9841
A.5 Limitations
Though our method achieves global explanation with high fidelity, it still has several limitations. First,
our method works well with tabular data but might be inapplicable to raw image data. This is because
our method treats every dimension of the feature space as a semantic feature, such as the average
packet size of a network connection, and extracts rules from each dimension of the feature space. In
19
contrast, raw image data are tensors of pixels. Their high-level semantics cannot directly derive from
each of the pixels but usually need a deep model with spatial awareness (e.g., CNN, ViT) to extract
feature maps, which are inconsistent with our method. Hence, this issue may limit the transferability
of the proposed method to other domains. Nonetheless, due to the typical trust in expert knowledge
over deep models in security domains, most security applications still rely on sophisticated feature
engineering and use data representations with explicit semantics, suggesting that our method remains
general in the field of security.
Second, recall that the rules extracted by our method are axis-aligned, which can be interpreted
as a certain feature over/under a threshold and are human-understandable. Though this format of
rules significantly promotes interpretability, it may limit its degree of fitting to the decision boundary
of the original model, which can be of various shapes in the high-dimensional feature space for
different models. Though our IC-Tree has mitigated this issue by splitting the distribution of normal
data into multiple compositional distributions, which are more compact and more likely to be fitted
using axis-aligned rules, our method may still underfit if the
decision boundary of the original model is extremely irregular. It should be clarified that this is a
common limitation for all the global explanation methods that employ axis-aligned rules or decision
trees as the surrogate expression. To this end, we are also exploring other surrogate models and
algorithms that can further balance the interpretability and fitting ability.
Lastly, as we mentioned above, we believe that the experiments on “human understanding and
trust" can be significantly strengthened by inviting security practitioners to participate in a user
study, assessing whether the interpretation provided by the proposed method is consistent with their expert
knowledge of judging an anomaly. Currently, we are collaborating with the Tencent Security Platform
Department, aiming to rectify this limitation by accounting for the opinions of real practitioners.