
2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Real-Time Intrusion Detection and Prevention with Neural Network in Kernel using eBPF
Junyu Zhang∗, Pengfei Chen†, Zilong He∗, Hongyang Chen∗, and Xiaoyun Li∗
∗†School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
∗{zhangjy297, hezlong, chenhy95, lixy223}@mail2.sysu.edu.cn, †[email protected]

DOI 10.1109/DSN58291.2024.00048

Abstract—With the development of the public cloud, real-time intrusion detection is becoming necessary. Current methods neither address the overhead of real-time network data capturing nor effectively balance security level with performance. These issues can be addressed by offloading intrusion detection and prevention to the extended Berkeley Packet Filter (eBPF). However, current eBPF-based methods suffer from shortcomings in model performance or inference overhead. Moreover, they overlook the issues of eBPF in real-time scenarios, such as the maximum eBPF instruction limitation. In this paper, we redesign the neural network inference mechanism to address the limitations of eBPF. Then, we propose a thread-safe parameter hot-updating mechanism without an explicit spin lock. Evaluations indicate that our method achieves model performance comparable to the current best eBPF-based method while reducing memory overhead (5 KB) and inference time (3000-5000 ns per flow). Our method achieves F1-scores of 0.933 and 0.992 on the offline and online datasets, respectively.

Index Terms—Real-Time Intrusion Detection, eBPF, Deep Learning, Neural Network Quantization

I. INTRODUCTION

Due to the cost-effective advantages of cloud computing, an increasing number of companies are migrating their data and business to the cloud [1]. Tenants access cloud products via the internet, thus exposing cloud services to the threat of network intrusions [2]. Consequently, it is necessary to detect and prevent network intrusions in real time, before any potential impact on system performance and functionality [3].

On the one hand, current real-time intrusion detection methods [4]–[7] focus on the performance of detection models but overlook the overhead associated with network data capturing (§II-A). On the other hand, there are tools that automatically generate rules for active defense tools such as iptables [8] from offline detection models [9], but the performance of these tools rapidly degrades as the number of rules increases [10]. Conversely, inadequate rules may allow more intrusions to evade prevention [9]. Therefore, current approaches have not effectively integrated intrusion detection and prevention in real-time scenarios.

The extended Berkeley Packet Filter (eBPF) enables dynamic and sandboxed program execution entirely in the Linux kernel, without any changes to the kernel source code [11]. The Linux kernel hides diverse hardware architectures, which makes eBPF a suitable programmable network data plane. At present, there is insufficient research on how to use eBPF for real-time intrusion detection and prevention (§V-E). Bachl et al. [12] first used eBPF to implement a decision tree (DT) for real-time intrusion detection. However, storing all nodes of a DT brings exponential memory overhead to the kernel. A linear Support Vector Machine [13] is inadequate for fitting non-linearly separable intrusion detection problems. Neural networks (NN) based on int8 quantization [14] not only exhibit high inference complexity but also introduce significant errors. [12]–[14] do not consider many important issues in eBPF, such as race conditions and the maximum eBPF instruction limitation [15] (§II-B2). Additionally, these works lack a real-time evaluation of performance, effectiveness, and reliability.

Insights. In this paper, we implement a real-time intrusion detection and prevention prototype with eBPF (§IV-A). We redesign the NN inference mechanism to address the limitations of integer-only arithmetic (§IV-B) and the maximum instruction number (§IV-C) in eBPF, and reduce the memory overhead while maintaining performance. We propose a thread-safe parameter hot-updating mechanism that requires neither terminating the intrusion detection system nor an explicit eBPF spin lock [16] (§IV-D).

Recently, deep learning-based intrusion detection systems have shown outstanding performance on existing and unseen intrusions [9], while the mainstream of these deep learning-based systems relies on NN [17]–[19]. Compared with [14], our NN inference method not only reduces the inference complexity but also improves the performance. We decompose the NN inference process into several sequential stages, and then utilize chained eBPF Tail Calls [20] to implement them, which overcomes the maximum instruction limitation in a single eBPF program. We implement an integer-only NN inference algorithm in eBPF, reducing the required memory overhead while ensuring high performance.

Concept drift can lead to critical failures in deployed deep learning-based detection models, and thus it is essential to update the parameters of the models when concept drift occurs [21]. If the NN inference program is terminated for parameter updates, the system is exposed to unprotected risks during the update. Conversely, if the program is not terminated when updating parameters (parameter hot-updating), read-write race conditions can easily arise. We implement a thread-safe hot-updating algorithm that does not require explicit eBPF locks.

In this paper, we make the following contributions.
• Study: we identify overlooked issues associated with integrating real-time intrusion detection and prevention in existing methods.
• Framework: we redesign NN inference in the kernel using eBPF and propose a thread-safe parameter hot-updating mechanism.
• Evaluation: our method reduces memory overhead while maintaining detection performance and time overhead comparable to the existing methods.

Our code is available in the public repository¹.

¹https://github.com/IntelligentDDS/NN-eBPF
II. BACKGROUND & MOTIVATION

A. Overhead of Traditional Packet Capturing

Typically, tcpdump [22] is employed for packet capturing, followed by the utilization of tools like CICFlowMeter [23] for feature extraction [24]. However, without programmability, feature extraction cannot start until all network packets have been captured by tcpdump. The time overhead of feature extraction is significantly affected by the total size of the captured packets. Table I shows that, as network traffic increases, feature extraction time grows notably, surpassing the packet capturing time. Consequently, the serial execution mode of tcpdump and CICFlowMeter is unsuitable for real-time scenarios.

TABLE I: Impact of traffic size on feature extraction time

Traffic Size (MB)            100   200   300   400   500
Packet Capturing Time (s)    4.3   8.4   12.7  17.4  21.1
Feature Extraction Time (s)  21.3  41.9  78.3  93.6  145.4

libpcap [25] is the foundation of tcpdump and provides programmability during real-time data capturing [4]. After libpcap captures each packet, it executes a pre-registered callback function containing the intrusion detection algorithm, which avoids the problem that packet capturing and detection can only be executed serially.
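The callback pattern just described can be sketched as follows. This is our own minimal illustration against the standard libpcap API (pcap_open_live, pcap_loop), not the paper's measurement harness; the interface name eth0 is a placeholder.

```c
#include <pcap/pcap.h>

/* No-op callback: returning immediately isolates the pure capture cost;
 * a real IDS would run its detection algorithm here. */
static void on_packet(u_char *user, const struct pcap_pkthdr *hdr,
                      const u_char *bytes)
{
    (void)user; (void)hdr; (void)bytes;
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    /* device, snaplen, promiscuous, read timeout (ms), error buffer */
    pcap_t *handle = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
    if (!handle)
        return 1;
    pcap_loop(handle, -1, on_packet, NULL);  /* invokes on_packet per packet */
    pcap_close(handle);
    return 0;
}
```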
However, libpcap introduces significant context switch and CPU overhead. We use the interfaces [26] provided by libpcap to implement the callback function. In order to measure the overhead of real-time packet capturing alone, the callback function returns immediately without performing any operation when it is called. We evaluate libpcap and tcpdump on an 8-core virtual machine and use pidstat [27] to measure the overhead.

Fig. 1: Overhead of libpcap and tcpdump. (a) CPU Overhead (b) Context Switch

As shown in Figure 1, with the bandwidth increasing, the CPU overhead of libpcap grows from 1.53% to 5.00%, while the CPU overhead of tcpdump grows from 29.49% to 100%. Since tcpdump performs additional packet parsing, its CPU overhead is significantly greater than that of the libpcap program. Furthermore, when the bandwidth exceeds 1.2 Gbps, tcpdump begins to drop packets and its context switches decrease sharply to 0. Context switches of libpcap increase linearly with bandwidth and can eventually reach thousands of times per second.

The root cause of the overhead is that traditional packet capturing is based on user-space programs. Packets captured in the kernel require frequent context switches and memory copies before they can reach user space.

B. eBPF for real-time intrusion detection and prevention

1) Introduction of the eBPF technique: eBPF enables programmability in the Linux kernel [11]. A user-defined eBPF program is attached to a kernel hook point such as system calls, function entry/exit, and network events. When events on the corresponding hook point are triggered, the pre-defined eBPF program is run. For eBPF programs that run completely in the kernel, context switches per second can be almost non-existent.

There are two most common network hooks for packet filtering: XDP and TC [28]. eXpress Data Path (XDP) serves as the initial hook of the kernel network stack [29]. The XDP hook enables an eBPF program to process packets only in the RX direction and to decide whether the packets can be received. Traffic Control (TC) [30] is another hook after the execution of XDP. Different from the XDP hook, the TC hook can filter packets in both RX and TX directions. However, the TC hook is slightly worse than the XDP hook because it requires additional memory allocation or entering software socket queues before it is triggered [31]. In order to minimize the overhead caused by packet filtering, we choose XDP as the eBPF hook point.

2) Challenges of employing eBPF for real-time intrusion detection: The first challenge comes from bounding the feature extraction time. XDP conducts feature extraction for each packet upon receipt. If the time taken for feature extraction is less than the average time interval between two adjacent packets, then feature extraction is nearly imperceptible to the network flow to which the packet belongs. However, if the time equals or exceeds the average time interval, for protocols like TCP with acknowledgment mechanisms the transmission time of the network flow increases, and in the case of protocols like UDP without acknowledgment mechanisms, feature extraction errors may occur.

The second challenge arises from the selection of features. XDP offers programmability to implement feature extraction algorithms. Due to the constraints of the 512-byte eBPF stack size and the 1M instruction number limit [20], the number of features that can be extracted at the XDP layer is limited. Moreover, XDP is capable of inspecting packets only in the RX direction, meaning it can filter packets received by the host but is unable to observe packets sent by the host (TX direction). Therefore, minimizing the number of features extracted in XDP and determining the most important and effective features only in the RX direction present another challenge.
The third challenge is race conditions. Network interfaces maintain multiple RX queues, with each RX queue assigned to a specific CPU core. Upon receiving a packet, each RX queue executes the XDP program on its allocated CPU core. Hence, shared data structures within the XDP program give rise to race conditions [31]. Although eBPF provides a spin lock mechanism to address race conditions, using spin locks indiscriminately can introduce unpredictable latency overhead to the kernel, and eBPF does not allow any function calls before the lock is released. Effectively avoiding or resolving race conditions requires specific design techniques.
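To make the XDP processing model concrete, the sketch below shows the shape of an XDP program that inspects received packets and returns a verdict. This is our own illustration under standard libbpf conventions, not the paper's implementation; the blocked address is an arbitrary example.

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_inspect(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Bounds checks are mandatory: the verifier rejects unchecked access. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    /* RX-direction verdict: drop traffic from one example source. */
    if (ip->saddr == bpf_htonl(0xC0A80001))  /* 192.168.0.1 */
        return XDP_DROP;
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```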

III. THREAT MODEL

The system implemented in this paper is designed for real-time intrusion detection scenarios. Therefore, it is necessary to perform benign and intrusion behavior in real time in a local environment to validate the effectiveness, reliability, availability, and overhead of our system in real-time scenarios. Thus, we need to define what constitutes benign behavior and what constitutes intrusion behavior.

Benign Behavior [32]:
• Using ssh to successfully log in to the system.
• Safe execution of common Linux commands: for example, using ping to test reachability, using ps to list all processes, and using docker to manage containers.
• Normal HTTP requests and TCP traffic.
• Uploading and downloading files using FTP.

Intrusion Behavior [33]:
• Brute-force attacks on SSH and FTP with repeated password attempts.
• Port Scan: for example, multiple executions of nmap to scan all network ports.
• Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks.
• Brute-force and Cross-Site Scripting (XSS) attacks on HTTP applications.
IV. SYSTEM DESIGN & IMPLEMENTATION

We initially provide an overview of how intrusion detection and prevention are accomplished through NN in the kernel space. Subsequently, we elucidate the quantization of NN parameters and inference, and resolve challenges specific to the eBPF implementation. Then, we present the inference based on chained eBPF Tail Calls. Finally, we discuss strategies to mitigate race conditions during the hot update of parameters from user space to kernel space.

A. Overview

Figure 2 illustrates the overall architecture of the system. The following analysis delves into the functions of each module.

Fig. 2: Overview of our system

1) Training and Quantization in User Space: To reduce the overhead of training the NN in kernel space, we relocate the NN training process to user space, while keeping the inference in the kernel space. Initially, the NN is trained using historical data on PyTorch, TensorFlow, or MXNet. Subsequently, its parameters are quantized from floating-point numbers to integers for inference in the kernel. The basic idea of the quantization is to multiply by a coefficient in order to shift the decimal point to the right. Then, the quantized parameters are loaded into the eBPF program in the kernel through the module Parameter Hot-Updating.

2) Parameter Hot-Updating: Module Parameter Hot-Updating addresses the issue of race conditions when updating parameters from user space to kernel space. The NN, quantized for inference solely using integers, has been loaded into the kernel, and it utilizes the hot-updated parameters to conduct inference.

3) Host Filter: In the real-time scenario, packets are initially processed by the eBPF program attached to XDP. The first module of our eBPF program is Host Filter, which checks the source IP of each packet. If a packet originates from a previously identified host performing intrusions, it is discarded. Otherwise, if it has no detection record yet, it is passed to module Packet Feature Extraction. Each record is assigned the same validity period and is removed once it expires.

4) Packet Feature Extraction and Flow Feature Updating: Module Packet Feature Extraction extracts features from the packet, such as TCP Header Length, and updates the corresponding flow features stored in an eBPF map, such as Total Header Length. The selection of flow features is not
based on expert knowledge or arbitrary choices. Rather, it is derived from the intrusion detection task. We calculate the importance of each feature from historical data, choose the top 6 most important features (Fwd Packet Length Max, Fwd IAT Max, Fwd Packet Length Min, Destination Port, Fwd Header Length, Total Fwd Packets; details are shown in Table IV), and subsequently implement the corresponding feature extraction algorithms in eBPF.
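As an illustration of how such per-flow state can be kept in the kernel, the following sketch stores a few of the Top 6 features in an LRU hash map. The structure layout, key choice, and map name are our assumptions for exposition, not the paper's actual definitions.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct flow_feature {
    __u64 fwd_pkt_len_max;   /* Fwd Packet Length Max */
    __u64 fwd_pkt_len_min;   /* Fwd Packet Length Min */
    __u64 fwd_iat_max;       /* Fwd IAT Max */
    __u64 fwd_header_len;    /* Fwd Header Length (running total) */
    __u64 total_fwd_packets; /* Total Fwd Packets */
    __u64 dst_port;          /* Destination Port */
};

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65536);
    __type(key, __u64);                 /* simplified flow identifier */
    __type(value, struct flow_feature);
} flow_features SEC(".maps");

static __always_inline void update_flow(__u64 flow_id, __u64 pkt_len)
{
    struct flow_feature *f = bpf_map_lookup_elem(&flow_features, &flow_id);
    if (!f)
        return;                          /* insertion path omitted for brevity */
    if (pkt_len > f->fwd_pkt_len_max)
        f->fwd_pkt_len_max = pkt_len;
    if (pkt_len < f->fwd_pkt_len_min)
        f->fwd_pkt_len_min = pkt_len;
    f->total_fwd_packets += 1;
}
```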
5) Inference in Kernel Space using eBPF: We implement an integer-only NN inference algorithm in eBPF. The input to the Inference module is the normalized flow features stored in the eBPF map flow_feature, and the output is a binary classification result: whether the flow is intrusive or normal. To reduce the overhead of kernel-space inference, we perform binary classification without distinguishing between different intrusion types. Additionally, the inference is only conducted after the completion of a flow. Flow completion is indicated by the FIN and RST flags in the TCP header being set to 1.

6) Intrusion Host Record and Correction: Whenever the NN in the kernel detects an intrusion, the source IP of the corresponding flow is recorded in Host Filter, causing subsequent packets from that source to be discarded. To improve the recall of the model, we have the ability to update Host Filter by adding or removing IP addresses.
B. Parameters and NN Inference Quantization

In this paper, the term NN refers to a Multilayer Perceptron (MLP), and ReLU is used in all activation layers. The quantization of NN parameters is performed using a simple technique called the enlargement method. The core idea is to multiply the floating-point number by an integer s (the enlargement factor) and subsequently round it to the nearest integer stored in int32. To provide a comprehensive description of the method, some notations are defined in Table II.

TABLE II: Notations of the enlargement method

Notation                                       Meaning
x^(k) = [x_j^(k)] ∈ R^(n^(k-1))                Input of k-th linear layer
y^(k) = [y_i^(k)] ∈ R^(n^(k))                  Output of k-th linear layer
W^(k) = [w_ij^(k)] ∈ R^(n^(k) × n^(k-1))       Weight matrix of k-th linear layer
n^(k) (k ≥ 1)                                  Size of k-th linear layer
n^(0)                                          Number of input features
x = [x_j] ∈ R^(n^(0))                          Initial input of NN
μ = [μ_j] ∈ R^(n^(0))                          Mean of x
σ = [σ_j] ∈ R^(n^(0))                          Standard deviation of x
s = 2^b                                        Enlargement factor
relu(x)                                        Function ReLU
round(x) = ⌊x⌋                                 Rounding down x
ars(x, b)                                      Arithmetic right-shift x by b bits

1) Preprocessing (Standardization): If normalization is performed during the training process, it implies that normalization is also required before real-time inference in the kernel space.

We employ standard normalization (standardization) for preprocessing. However, the normalization process may involve signed division, which is not supported by eBPF. Moreover, if the data follow a normal distribution, the standardized data follow the standard normal distribution N(0, 1), with data concentrated around 0 according to the 3σ rule. Since eBPF only supports integer division, direct standardization in eBPF leads to significant precision loss.

To address the issue of precision loss, we incorporate the enlargement method into the standardization formula, resulting in the following expression:

x_j^{(1)} = \begin{cases} 0 & \sigma_j = 0 \\ -\mathrm{round}\!\left(\frac{s(\mu_j - x_j)}{\sigma_j}\right) & x_j < \mu_j,\ \sigma_j \neq 0 \\ \mathrm{round}\!\left(\frac{s(x_j - \mu_j)}{\sigma_j}\right) & x_j \geq \mu_j,\ \sigma_j \neq 0 \end{cases}   (1)

Here, \frac{s(x_j - \mu_j)}{\sigma_j} and s \cdot \frac{x_j - \mu_j}{\sigma_j} are two different computation methods. The latter involves division followed by multiplication, and since eBPF performs integer division, significant precision loss occurs in this case. The former, on the other hand, involves multiplication followed by division, which preserves precision. Moreover, the above formula first performs unsigned division and then converts the result into the corresponding signed number, thereby circumventing the lack of support for signed division in eBPF. Since eBPF performs integer operations exclusively, the above formula does not necessitate the use of the round operation.
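A minimal sketch of Eq. (1) in integer arithmetic follows; we assume μ_j and σ_j have been pre-rounded to integers when loaded into the kernel, and s = 2^16 as in the evaluation. Note the multiply-before-divide ordering and the unsigned-then-negate trick described above.

```c
#include <stdint.h>

#define B 16  /* enlargement factor s = 2^B */

/* Integer-only standardization from Eq. (1). */
static int32_t standardize(uint64_t x, uint64_t mu, uint64_t sigma)
{
    if (sigma == 0)
        return 0;
    if (x < mu)
        /* unsigned division first, then convert to a signed result */
        return -(int32_t)(((mu - x) << B) / sigma);
    return (int32_t)(((x - mu) << B) / sigma);
}
```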
2) Inference: Parameters of each linear layer need to be quantized before inference, and the formula is as follows:

W_E^{(k)} \triangleq \mathrm{round}(s \cdot W^{(k)}) = [\mathrm{round}(s \cdot w_{ij}^{(k)})]   (2)

For the k-th (k > 1) linear layer, the input tensor satisfies:

x^{(k)} = \mathrm{relu}(y^{(k-1)})   (3)

The output tensor of the k-th layer satisfies the following formula:

y^{(k)} = \frac{1}{s} \cdot W_E^{(k)} \cdot x^{(k)} = \frac{1}{s}\left[\sum_{j=1}^{n^{(k-1)}} w_{E,ij}^{(k)} \cdot x_j^{(k)}\right] = \mathrm{ars}\!\left(\sum_{j=1}^{n^{(k-1)}} w_{E,ij}^{(k)} \cdot x_j^{(k)},\ b\right)   (4)

In the above formula, since we have multiplied each element in W_E^{(k)} by s, and the elements in y^{(k)} are obtained by multiplying corresponding elements of W_E^{(k)} and x^{(k)} and then summing them up, we need to multiply by 1/s to eliminate s from the result. To reduce the time overhead associated with multiplication and division instructions, we utilize shift instructions to perform these operations. Specifically, let s = 2^b: multiplication by s is achieved by left-shifting by b bits, while division by s is achieved by right-shifting by b bits.
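Putting Eqs. (2)-(4) together, one layer of the integer-only inference reduces to multiply-accumulate plus an arithmetic right shift. The sketch below is our own illustration; the 64-bit accumulator is our precaution against overflow of intermediate sums and is not mandated by the paper.

```c
#include <stdint.h>

#define B 16  /* s = 2^B */

/* y = relu(ars(W_E * x, B)) for one linear layer, per Eq. (4).
 * w_e holds round(s * w) as int32, row-major [out_dim x in_dim]. */
static void linear_relu(const int32_t *w_e, const int32_t *x,
                        int32_t *y, int in_dim, int out_dim)
{
    for (int i = 0; i < out_dim; i++) {
        int64_t acc = 0;
        for (int j = 0; j < in_dim; j++)
            acc += (int64_t)w_e[i * in_dim + j] * x[j];
        acc >>= B;                           /* divide by s via right shift */
        y[i] = acc > 0 ? (int32_t)acc : 0;   /* ReLU, Eq. (3) */
    }
}
```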
XDP concurrently filters multiple network flows, and performing parallel inference on these flows may introduce race conditions. The primary concern lies in how to store the hidden layer outputs y^{(k)} in a parallelized manner. If y^{(k)} is stored using global variables, there inevitably is a race condition between read and write operations. To address this problem, we employ local variables for storing y^{(k)} of each flow. However, local variables may pose limitations as the maximum stack space is restricted.

3) Classification: We consider intrusion detection as a binary classification task, where the label for the Intrusion class is 1 and the label for the Benign class is 0. For the final linear layer output y^{(K)}, the decision criterion is as follows:

\mathrm{prediction} = \begin{cases} 1 & y_1^{(K)} > y_0^{(K)} \\ 0 & y_1^{(K)} \leq y_0^{(K)} \end{cases}   (5)
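In code, this decision rule is a single comparison of the two output logits, for example:

```c
#include <stdint.h>

/* Decision rule of Eq. (5): 1 = Intrusion, 0 = Benign. */
static int predict(const int32_t y_final[2])
{
    return y_final[1] > y_final[0];
}
```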
C. Inference based on the chained eBPF Tail Calls

The constraint of 1 million eBPF instructions poses a significant challenge to the implementation of NN in eBPF. In practice, we observe that even for an NN with small dimensions like [6, 32, 32, 2], the compiled number of instructions still exceeds 1 million, leading to rejection by the eBPF verifier during loading.

To address the instruction number limitation, we adopt the eBPF Tail Call mechanism for inference. As illustrated in Figure 3, the inference process is split into alternating eBPF programs for the Linear Layer and ReLU. Upon completion of one program, it uses bpf_tail_call to invoke the next adjacent program.

The advantage of using tail calls is that each program has its own instruction limit of 1 million, treating Linear Layer and ReLU as independent programs. For an individual Linear Layer or ReLU, the number of instructions does not exceed 1 million, satisfying the conditions of the eBPF verifier.

Fig. 3: Inference using three-layer MLP in eBPF

The stack size limit for the entire tail call chain is set to 256 bytes per subprogram, and the maximum call depth is 33 [20]. The outputs of the Linear Layer and ReLU layer are stored as int32, with each element occupying 4 bytes. Hence, the size limit for each layer is 64 (256/4 = 64) elements. Since Linear Layer and ReLU layers appear in pairs except for the last layer, the maximum usable depth is 17 ((33 − 1)/2 + 1 = 17).
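The chaining itself relies on the bpf_tail_call helper and a BPF_MAP_TYPE_PROG_ARRAY. The sketch below shows the shape of two adjacent stages; the map layout, indices, and program names are illustrative assumptions, not the paper's code.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 8);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} nn_stages SEC(".maps");

SEC("xdp")
int linear_layer_0(struct xdp_md *ctx)
{
    /* ... multiply-accumulate and right-shift for this layer ... */
    bpf_tail_call(ctx, &nn_stages, 1);  /* jump to the ReLU stage */
    return XDP_PASS;                    /* reached only if the tail call fails */
}

SEC("xdp")
int relu_0(struct xdp_md *ctx)
{
    /* ... clamp negative outputs to zero ... */
    bpf_tail_call(ctx, &nn_stages, 2);  /* jump to the next linear layer */
    return XDP_PASS;
}
```

Note that a successful bpf_tail_call never returns, so each stage is verified independently, which is what keeps every program under the instruction limit.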
D. Parameter Hot-Updating

As shown in Figure 4(a), we utilize an eBPF map named nn_parameters to store the NN used for inference in XDP. nn_parameters consists of two elements: one is named Running, representing the active NN parameters for inference, and the other is named Idle, reserved for subsequent updating. An eBPF map named nn_index is employed to store the index of Running in nn_parameters; thus, nn_index comprises only one element.

Fig. 4: Parameter hot-updating. (a) Inference (b) NN Updating (c) Index Updating

The pinned map nn_parameters can be found in the corresponding file located at /sys/fs/bpf. To access nn_parameters, user-space code can employ the helper function bpf_obj_get to load the file /sys/fs/bpf/nn_parameters. In Figure 4(b), the hot-updating process begins by loading the new NN parameters, named Ready, from user space into Idle in kernel space. The index of Idle within nn_parameters can be determined using the following formula:

idle_index = (running_index + 1) mod 2   (6)

In the above formula, the value of running_index is stored within the nn_index pinned map, which can be loaded and accessed in user space.

In the final step, as depicted in Figure 4(c), updating the value in nn_index with idle_index completes the parameter hot-updating process.

Although the helper function bpf_map_update_elem can be used to atomically update the NN parameters stored in the eBPF map, accessing the NN during inference is not atomic, which can result in potential race conditions between reads and writes. Hence, we choose the approach illustrated in Figure 4 to address the race condition without relying on spin locks.
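From user space, the double-buffer update of Figure 4 can be sketched with libbpf as below. The paths follow the pinning scheme described above; value sizing and error handling are elided, and the function is our illustration rather than the paper's implementation.

```c
#include <bpf/bpf.h>
#include <stdint.h>

/* Flip the idle/running buffers after writing new parameters (Fig. 4). */
int hot_update(const void *ready_params)
{
    int params_fd = bpf_obj_get("/sys/fs/bpf/nn_parameters");
    int index_fd  = bpf_obj_get("/sys/fs/bpf/nn_index");
    if (params_fd < 0 || index_fd < 0)
        return -1;

    uint32_t zero = 0, running_index = 0, idle_index;
    bpf_map_lookup_elem(index_fd, &zero, &running_index);
    idle_index = (running_index + 1) % 2;            /* Eq. (6) */

    /* 1) fill the idle slot; inference keeps reading the running slot */
    bpf_map_update_elem(params_fd, &idle_index, ready_params, BPF_ANY);
    /* 2) publish: a single index write switches readers over */
    return bpf_map_update_elem(index_fd, &zero, &idle_index, BPF_ANY);
}
```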
V. EVALUATION

In this section, our goal is to evaluate the performance of NN in eBPF and address the following research questions (RQs).
• RQ I: Compared to existing methods, what is the effectiveness of the proposed NN in accomplishing intrusion detection tasks (§V-B)?
• RQ II: When errors occur in real-time detection, to what extent can the proposed NN maintain the reliability and availability of the system (§V-C)?
• RQ III: How much overhead is caused to the system by extracting features from the flow in real time and performing NN inference (§V-D)?
• RQ IV: What are the overlooked issues in the existing methods (§V-E)?

A. Dataset & Environment Setup

CIC-IDS-2017 [33] is a commonly used benchmark for evaluating intrusion detection models, and we remove all the invalid data from the CIC-IDS-2017 dataset. To evaluate the real-time performance using eBPF, we reproduce intrusion datasets (the eBPF-Reproduction dataset) for both benign and intrusion traffic using the feature extraction algorithm in eBPF XDP. Below are the generation methods for each behavior:
• Benign: use ssh to log in to a shell and execute various common commands, including ping, ps, docker, and curl, among others. Additionally, utilize httperf [34] for simulating typical HTTP traffic and iperf [35] for simulating regular TCP traffic. Furthermore, take into account activities such as logging in, uploading, and downloading within an FTP application.
• PortScan: utilize the Linux nmap tool to conduct port scanning.
• DoS GoldenEye and Slowhttptest: implement two distinct Denial of Service (DoS) attack methods using GoldenEye [36] and Slowhttptest [37].
• FTP and SSH Patator: use patator [38] to perform dictionary-based brute-force attacks on SSH and FTP passwords.
• Web Brute Force and XSS: automate Brute Force and Cross-Site Scripting (XSS) attacks on Damn Vulnerable Web App (DVWA) [39] using selenium [40].

TABLE III: CIC-IDS-2017 and reproduction dataset

Type       Intrusion Type     CIC-IDS-2017  eBPF-Repro Train  eBPF-Repro Test
Benign     FTP Download       –             1287              990
           FTP Upload         –             1568              3136
           Http Traffic       2271320       1024              2048
           TCP Traffic        –             923               1846
           SSH                –             2160              2160
Intrusion  FTP-Patator        7935          1972              1913
           SSH-Patator        5897          972               1920
           Dos GoldenEye      10293         2807              6772
           PortScan           158804        896               896
           Dos Slowhttptest   5499          1100              2222
           Web Brute Force    967           800               1600
           Web XSS            1507          800               1600
Total                         2462222       16257             27103

Table III illustrates the datasets used in this study, namely the CIC-IDS-2017 and eBPF-Reproduction datasets. To minimize the influence of the local environment, the training and testing datasets of eBPF-Reproduction are collected on different dates. Some intrusion types from the CIC-IDS-2017 dataset are not reproduced due to three reasons: outdated intrusion types (Heartbleed), insufficient quantity of instances (Infiltration, Web SQL Injection), and similarities to the already reproduced intrusions (DoS Hulk, DDoS, DoS Slowloris).

We conduct the evaluation using two Linux virtual machines. Host A deploys the real-time intrusion detection system proposed in this paper, while host B is responsible for sending benign and intrusion network traffic to host A according to the generation methods. Each host is configured with 8 cores, 16 GB memory, 2,000 MHz CPU frequency, and kernel version 6.1.43.

B. Effectiveness

We use classical metrics for intrusion detection to evaluate the model performance, namely Accuracy = (TP + TN)/(TP + FP + TN + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1-score = (2 · Precision · Recall)/(Precision + Recall). True Positive (TP) represents the instances correctly identified as intrusion, False Positive (FP) represents the instances incorrectly identified as intrusion, True Negative (TN) represents the instances correctly identified as benign, and False Negative (FN) represents the instances incorrectly identified as benign. We regard intrusion detection as a binary classification task; that is, the type of intrusion is not distinguished.

We compare the effectiveness among Decision Tree (DT) [12], Support Vector Machine (SVM) [13], Neural Network using int8 quantization (NN-int8) [14], and our method (NN-int32). Table V demonstrates the evaluation results on the CIC-IDS-2017 and eBPF-Reproduction datasets.

1) Effectiveness on CIC-IDS-2017: Because CIC-IDS-2017 is a widely utilized intrusion dataset with a diverse range of categories, we first evaluate the effectiveness of each method on it. We set the max depth of the decision tree to 10, the same configuration as [12], and the neural network is a three-layer perceptron, with the sizes of each layer being 32, 32, and 2, respectively. The batch size is set to 512, the learning rate is set to 0.001, and an Nvidia Tesla V100 is used to iteratively train the NN for 32 iterations. The enlargement factor s is set to 2^16.

Due to the linear kernel used in the implementation of SVM [13], its non-linear fitting capability is limited. Consequently, SVM tends to classify all flows as intrusions, leading to high recall but low precision. We now focus on comparing DT, NN-int8, and NN-int32.

We initially train the models (WQ) using all features (ALL), and subsequently quantize them (Q). Both NN-int8 and NN-int32 employ the same model (WQ) but with different quantization methods. DT can achieve good performance even without preprocessing the input data using standardization (WQ+ALL). Moreover, many features are actually integers, for instance, Total Fwd Packets (Table IV). Additionally, apart from the threshold values at each node, the model parameters of DT are represented using integers [12]. Therefore, the performance remains consistent before and after quantization (Q+ALL).

The performance of NN before quantization is comparable to that of DT (WQ+ALL). However, the differences in quantization methods lead to variations in performance (Q+ALL).
NN-int8 leads to a decrease in recall, while the decline in other metrics is smaller. In other words, NN-int8 tends to classify flows as benign. This is because, compared to the unquantized model, NN-int8 loses too much information during the quantization process, as 8-bit integers are insufficient to represent the parameters and variables of each layer. Furthermore, NN-int8 requires both quantize and dequantize operations at each layer [14], both of which involve approximation, leading to further loss of precision. NN-int32 (our method) and NN-int8 use the same unquantized model, but the performance of NN-int32 remains consistent with that before quantization. This is because the parameters and inputs of each layer multiplied by the enlargement factor s do not overflow the representation range of 32-bit integers, while maintaining precision within the maximum range.

To further assess whether employing solely RX features is sufficient for intrusion detection, features associated with TX and the overall flow are removed from the CIC-IDS-2017 dataset, resulting in 24 RX-specific features detailed in Table IV.

TABLE IV: RX-specific features. fwd and forward indicate "in the forward direction", which refers to the RX.

Number  Feature                       Description
0       Destination Port              Destination Port
2       Total Fwd Packets             Total packets
4       Total Length of Fwd Packets   Total size of packet
6       Fwd Packet Length Max         Maximum size of packet
7       Fwd Packet Length Min         Minimum size of packet
8       Fwd Packet Length Std         Standard deviation size of packet
9       Fwd Packet Length Mean        Mean size of packet
20      Fwd IAT Total                 Total time between two packets
21      Fwd IAT Mean                  Mean time between two packets
22      Fwd IAT Std                   Standard deviation time between two packets
23      Fwd IAT Max                   Maximum time between two packets
24      fwd IAT Min                   Minimum time between two packets
30      Fwd PSH Flags                 Number of PSH flag
32      Fwd URG Flags                 Number of URG flag
34      Fwd Header Length             Total bytes used for headers
36      Fwd Packets/s                 Number of packets per second
53      Avg Fwd Segment Size          Average size observed
56      Fwd Avg Bytes/Bulk            Average number of bytes bulk rate
57      Fwd Avg Packets/Bulk          Average number of packets bulk rate
58      Fwd Avg Bulk Rate             Average number of bulk rate
62      Subflow Fwd Packets           The average number of packets in a sub flow
63      Subflow Fwd Bytes             The average number of bytes in a sub flow
68      act data pkt fwd              Count of packets with at least 1 byte
69      min seg size forward          Minimum segment size observed

Before quantization, compared to the model using all features (WQ+ALL), both DT and NN experience only a slight decline in performance when utilizing only RX-specific features (WQ+RX). This is attributed to the reduction in features related to TX and the overall flow, impacting the detection capability of the model. However, due to the strong correlation between RX- and TX-specific features, the decrease in performance is minimal. After quantization, both DT and NN-int32 (Q+RX) maintain consistent performance with the unquantized model (WQ+RX). However, the quantization error of NN-int8 amplifies the performance decline caused by the reduced features, resulting in a significant decrease in precision.

Not all features in Table IV contribute to the intrusion detection model. In other words, it is possible to achieve performance similar to using all features from Table IV by choosing the most important subset (Top K) of features. To provide interpretability to the feature importance, we estimate the importance of each feature based on the Gini gain of DT. Then, we compute the cumulative feature performance to investigate how many of the most important features are required to achieve optimal performance. We incorporate features based on their importance in descending order, train the NN, and evaluate the performance metrics. Cumulative results displayed in Figure 5 indicate that training the model with only features {6, 23, 7, 0, 34, 2} already achieves performance comparable to using all features. Consequently, the number of features can be reduced from 24 to 6.

Fig. 5: Cumulative feature performance

We select the Top 6 RX-specific features. Before quantization, compared to models using all RX-specific features (WQ+RX), both NN and DT experience a decrease in performance (WQ+Top K), primarily reflected in precision and recall. However, the maximum decrease is only 0.0047 (NN-int32 recall). After quantization, the performance of NN-int32 and DT remains nearly unchanged (Q+Top K), while the performance of NN-int8 decreases by almost 50%, indicating that the quantization error of NN-int8 is significantly influenced by the number of features.

Since CIC-IDS-2017 is imbalanced, we also use the Matthews correlation coefficient (MCC) [41] to evaluate the performance of models with quantization and the top 6 features (Q+Top K). MCC for DT and our method is 0.958 and 0.917, respectively, indicating optimal agreement between predicted and actual values. MCC for NN-int8 is 0.344, suggesting poor classification performance. MCC for SVM is 0.015, very close to a random-guess classifier. Results for PR-AUC are almost consistent with MCC. DT and our method perform the best, achieving 0.996 and 0.975, respectively, while SVM and NN-int8 are close, with values of 0.408 and 0.366, respectively.

2) Effectiveness on eBPF-Reproduction: In §V-B1, we evaluate the effectiveness utilizing the Top K RX-specific features (WQ/Q+Top K). Firstly, we implement the extraction algorithm for the Top 6 features in eBPF XDP. Subsequently, following the methodology outlined in §V-A, we generate training and testing data for benign and intrusion traffic, resulting in the dataset presented in Table III. Afterwards, we implement the NN inference algorithm with eBPF. NN parameters obtained from the training dataset are loaded after quantization. Results on the testing dataset are illustrated in Table V.

The performance of SVM (WQ/Q+Top K) is the worst among the models due to the non-linearity of the data. It is consistently biased towards classifying flows as benign, which leads to a high recall but a low precision.
TABLE V: Effectiveness evaluation. WQ denotes models implemented in the PyTorch and scikit-learn frameworks without any quantization, while Q denotes the models quantized from the WQ models and then implemented in eBPF. ALL, RX, and Top K respectively denote the evaluation results using all features, only the RX-specific features, and only the top K most important features from the RX-specific set.

                                      Offline Evaluation                  Real-Time Evaluation
Model                   Config        Accuracy Precision Recall F1-score  Accuracy Precision Recall F1-score
DT [12]                 WQ + ALL      0.997    0.995     0.989  0.992     –        –         –      –
                        WQ + RX       0.993    0.989     0.975  0.982     –        –         –      –
                        WQ + Top K    0.987    0.962     0.972  0.967     0.999    0.999     0.999  0.999
                        Q + ALL       0.988    0.995     0.946  0.970     –        –         –      –
                        Q + RX        0.993    0.988     0.975  0.981     –        –         –      –
                        Q + Top K     0.987    0.962     0.972  0.967     0.999    0.999     1.000  0.999
SVM [13]                WQ + ALL      0.227    0.201     0.984  0.334     –        –         –      –
                        WQ + RX       0.197    0.197     0.999  0.329     –        –         –      –
                        WQ + Top K    0.198    0.197     1.000  0.329     0.824    0.788     0.982  0.874
                        Q + ALL       0.339    0.144     0.477  0.221     –        –         –      –
                        Q + RX        0.282    0.158     0.611  0.251     –        –         –      –
                        Q + Top K     0.368    0.191     0.686  0.299     0.624    0.624     1.000  0.769
NN-int8 [14]            WQ + ALL      0.994    0.978     0.993  0.985     –        –         –      –
                        WQ + RX       0.988    0.952     0.988  0.970     –        –         –      –
                        WQ + Top K    0.974    0.927     0.941  0.934     0.991    0.988     0.998  0.993
                        Q + ALL       0.902    0.956     0.527  0.679     –        –         –      –
                        Q + RX        0.864    0.720     0.502  0.592     –        –         –      –
                        Q + Top K     0.838    0.622     0.444  0.518     0.671    0.663     0.961  0.785
NN-int32 (Our Method)   WQ + ALL      0.994    0.978     0.993  0.985     –        –         –      –
                        WQ + RX       0.988    0.952     0.988  0.970     –        –         –      –
                        WQ + Top K    0.974    0.927     0.941  0.934     0.991    0.988     0.998  0.993
                        Q + ALL       0.994    0.977     0.992  0.985     –        –         –      –
                        Q + RX        0.988    0.952     0.988  0.970     –        –         –      –
                        Q + Top K     0.974    0.926     0.941  0.933     0.994    0.986     0.999  0.992

NN and DT achieve excellent results before quantization (WQ+Top K), with the performance of NN slightly trailing behind DT, but the difference is no more than 0.011. After quantization, both NN-int32 and DT maintain comparable performance (Q+Top K), indicating that the quantization process largely preserves accuracy. However, NN-int8 exhibits a significant reduction in all metrics except recall after quantization, indicating a substantial decrease in the ability to detect intrusions. Moreover, when compared to the results on the CIC-IDS-2017 dataset (CIC-IDS-2017, Q+Top K), the performance shows a higher decline in recall, suggesting that NN-int8 is sensitive to the choice of the dataset.

We also calculate the MCC on the eBPF-Reproduction dataset. We find that both DT and our method still perform the best, with MCC reaching 0.998 and 0.981, respectively. However, MCC for NN-int8 and SVM degrades to −0.099 and 0, respectively. SVM predicts all test samples as intrusions, resulting in an MCC of 0. For PR-AUC, DT and our method achieve 0.999 and 0.966, respectively, while SVM and NN-int8 only reach 0.735 and 0.707, respectively.

3) Hyper-parameter Settings: To investigate the impact of the NN-int32 (Q+Top K) structure on performance, experiments are conducted with different depths and sizes of hidden layers on CIC-IDS-2017. Table VI suggests that modifications to the structure have minimal impact on performance. Therefore, to strike an optimal balance between model performance and complexity, a three-layer neural network with a hidden layer size of 32 is employed.

TABLE VI: Impact of depth and size of hidden layers

       Structure              Precision  Recall  F1-score
Size   [6,16,16,16,2]         0.990      0.926   0.957
       [6,32,32,32,2]         0.974      0.926   0.941
       [6,64,64,64,2]         0.995      0.937   0.965
       [6,128,128,128,2]      0.983      0.969   0.976
Depth  [6,32,32,2]            0.970      0.896   0.932
       [6,32,32,32,2]         0.974      0.926   0.941
       [6,32,32,32,32,2]      0.979      0.928   0.953
       [6,32,32,32,32,32,2]   0.993      0.898   0.943

The enlargement factor s affects the precision of model quantization. A larger factor leads to higher quantization precision, but it also comes with increased storage overhead. For instance, with s = 2^16, it may be necessary to use int32 for storage, while s = 2^8 might allow the use of int16, halving the storage cost. Figure 6(a) indicates that the model achieves optimal performance only when s = 2^16.

The reason for using a larger s is that the model parameters K are very small. Therefore, a larger s is needed to preserve higher precision during rounding. This heuristic provides a basis for searching for a suitable value of s, where the product of K and s should be greater than or equal to 1 to avoid becoming 0 during rounding. However, s cannot be too large, to avoid overflow. Therefore, the most suitable s* should minimize s under the condition that the probability of the product of K and s being less than 1 is below a threshold α. This can be expressed in the formula below:
s^{*} = \arg\min_{s} \Pr\!\left(|K| \leq \frac{1}{s}\right) \leq \alpha   (7)

By fitting the empirical cumulative distribution function (ECDF) of the absolute values of the model parameters |K| from data, the above formula can be expressed as:

s^{*} = \arg\min_{s} \mathrm{ECDF}(s^{-1}) \leq \alpha   (8)

The ECDF is shown in Figure 6(b). If the condition is set to α = 0.005, then s needs to be at least greater than 2^16. The error of NN-int32 also depends on the quantization of the input data. Given α = 0.005 and s = 2^16 introduced above, we expect that for each feature x_i, with its ECDF denoted as ECDF_i, the following formula is satisfied:

\mathrm{ECDF}_i(s^{-1}) \leq \alpha   (9)

When computing the ECDF for each feature, we observe that the ECDFs of all features adhere to the formula mentioned above. Therefore, we can achieve results with NN-int32 that closely match those of the unquantized NN.

Fig. 6: Impact of different s. (a) Performance under different s (b) ECDF of NN parameters

C. Availability & Reliability

The accuracy of inference is not guaranteed to be 100%. In real-time scenarios, dropping packets to terminate an intrusion flow based on incorrect inference results may have unexpected consequences for the system. Therefore, this section mainly analyzes the reliability and availability of our method. The dataset used for the analysis is the testing dataset shown in Table III.

First, we analyze the availability. In this context, availability refers to accuracy. Table VII demonstrates that out of the 5 benign categories, 2 have a perfect accuracy of 1.000, while among the 7 intrusion categories, 5 also achieve a perfect accuracy of 1.000. The remaining categories all have an accuracy of at least 0.950, with the intrusion categories consistently outperforming the benign categories. These findings indicate that our method exhibits high availability and is particularly adept at identifying intrusion instances as compared to benign ones.

TABLE VII: Effectiveness of real-time detection for different benign and intrusion behaviours

Benign:    FTP Download 0.973 | FTP Upload 1.000 | Http Traffic 0.950 | TCP Traffic 1.000 | SSH 0.964
Intrusion: FTP-Patator 1.000 | SSH-Patator 0.997 | Dos Goldeneye 1.000 | PortScan 1.000 | Dos Slowhttptest 1.000 | Web Brute Force 1.000 | Web XSS 0.983

Next, we analyze the reliability. We select the categories with accuracy less than 1.000 from Table VII, namely FTP Download, Http Traffic, SSH, SSH-Patator, and Web XSS. We then visualize the detection results for each flow in each category in chronological order and calculate the Mean Time To Failure (MTTF) for each category, which represents the number of flows between two consecutive incorrect detection results.

Fig. 7: Reliability in real-time detection

FTP Download, SSH-Patator, and Web XSS demonstrate robust detection accuracy with minimal errors, as shown in Figure 7. These errors are sparsely scattered. Conversely, Http Traffic and SSH exhibit relatively lower detection accuracy, resulting in a denser distribution of errors. However, when considering MTTF, Http Traffic and SSH achieve impressive MTTF values of 1050 and 1021, respectively, implying a high level of reliability despite the lower accuracy. While FTP Download, SSH-Patator, and Web XSS have lower MTTF values, their infrequent occurrence of errors contributes to their overall strong reliability.

D. Overhead

This section analyzes the overhead of our method in terms of time, space, and CPU.

1) Time: To assess the real-time feature extraction and NN inference overhead, we utilize iperf [35] to generate varying numbers of concurrent flows, ranging from 1 to 128. Each concurrency level is repeated for 8 iterations. Each flow transmits 10 MB of TCP traffic at the maximum sending rate, and all the flows are sent concurrently in one iteration.

As shown in Table VIII, with the increase in the number of concurrent flows, the average transmission time (WO) also increases. This is because the system has only 8 cores, and the network transmission rate is fixed. Therefore, with a constant amount of data to be transmitted, as the number of concurrent flows increases, the transmission time also increases. However, even with the addition of feature extraction and NN inference (W), the average transmission times for W and WO do not show a significant difference, with instances where one is smaller or larger than the other. This suggests that the implemented feature extraction and NN inference do not introduce a significant delay to the network.

Due to the latency caused by the network stack and transmission link between two packets, as long as the feature extraction time is less than the delay time, it has little impact on the network connection when performing feature extraction upon receiving a packet. In an ideal scenario where there is no significant delay in the transmission link, the latency of the network stack can be approximated using ping 127.0.0.1. The measured average latency of the network stack on the experimental machine is 46000 ns, significantly higher than the values of FEPP (average feature extraction time per packet) and IPF (average NN inference time per flow). Therefore, this explains the close results observed between W and WO.
IPF exhibits fluctuations in the range of 3000-5000 ns, while FEPP shows two stable phases, with a transition around 8 concurrent flows. This behavior is attributed to the fact that inference is performed only at the completion of each flow, causing IPF to be influenced by other concurrently processed flows when using eBPF tail calls. On the other hand, feature extraction occurs immediately upon receiving a packet, resulting in a more stable pattern. Given the limitation of 8 cores in the system, when the number of concurrent flows exceeds 8, all cores become fully occupied, leading to an increase in the execution time of FEPP.

TABLE VIII: Time overhead of feature extraction and NN inference. WO denotes average transmission time without feature extraction and NN inference. W denotes average transmission time with feature extraction and NN inference. FEPP denotes average feature extraction time per packet. IPF denotes average NN inference time per flow. The units of WO and W are seconds; the units of FEPP and IPF are nanoseconds.

# Flows  WO (s)  W (s)  FEPP (ns)  IPF (ns)
1        0.38    0.39   96.50      5189.88
2        0.80    0.78   90.69      4570.06
4        1.65    1.60   90.91      3691.25
8        3.33    3.31   134.81     3697.88
16       6.49    6.52   127.90     2945.18
32       13.27   13.19  142.96     3333.93
64       26.07   26.19  145.11     3872.96
128      50.30   50.59  148.28     4306.88

To compare the inference time overhead, we also measure the IPF of the existing methods, as shown in Figure 8. A linear SVM during inference is equivalent to a single-layer NN, hence the minimum IPF. Our method and NN-int8 have similar inference times, but we simplify the NN inference algorithm implementation. The decision paths of DT are uncertain and vary with the number of flows. Although DT is structurally simpler, its implementation is constrained by eBPF memory access, necessitating the use of eBPF Maps to store parameters. Accessing parameters through eBPF helper functions at each layer introduces additional memory access overhead, resulting in an increased IPF. Overall, our method reduces inference time overhead while maintaining performance.

Fig. 8: IPF of the existing methods.

2) Space: Storage overhead comprises two components. The first comes from storing the features of presently active flows. Each flow has six features stored as int64, resulting in a total overhead of 48n bytes, where n represents the current number of active flows. The second involves storing the NN parameters. With the sizes of the three layers of the NN being 6×32, 32×32, and 32×2, using int32 for storage, along with the hidden layer output uniformly represented by an int32 array of 32 elements, a total of 5248 bytes is needed.
128 50.30 50.59 148.28 4306.88 of 5248 bytes is needed.
3) CPU: eBPF program is triggered with the XDP hook,
To compare the inference time overhead, we also mea- so its impact on the CPU is mixed in the kernel process and
sure the IPF of the existing methods, as shown in Figure does not exist as a separate process. In order to isolate the
8. Linear SVM during inference is equivalent to a single- CPU overhead of the eBPF program, we use perf [42] to
layer NN, hence the minimum IPF. Our method and NN- instrument CPU performance. We use iperf to send at the
int8 have similar inference times, but we simplify the NN maximum rate for 60 seconds, with the number of concurrent
inference algorithm implementation. The decision paths of DT processes ranging from 1-128. During the first 0-30 seconds of
are uncertain and vary with the number of flows. Although the network transmission, we randomly start perf to samples
DT is structurally simpler, their implementation is constrained at 99Hz for 30 seconds, and then record the CPU overhead
by eBPF memory access, necessitating the use of eBPF caused by the eBPF program. Since NN inference starts after

425
each flow ends, the recorded result is the overhead of real-time feature extraction for each packet.

Fig. 9: CPU Overhead of Feature Extraction

As shown in Figure 9, the CPU overhead remains between 0.54% and 1.64%. If the number of concurrent flows does not exceed 8 (the total number of CPU cores), the CPU overhead increases as the number of flows grows because the total bandwidth becomes larger. After exceeding the limit, each flow not only has to compete for bandwidth but also for CPU cores, so the total bandwidth declines and the CPU overhead decreases even as the number of flows grows.

E. Overlooked Issues in Existing Methods

The worst-case spatial complexity of the DT used in [12] is O(2^n), where n represents the depth [43]. Although tree pruning can reduce nodes in the training process, the number of nodes in the trained DT varies even for the same dataset. We train 1024 DTs, where the training data comprise 80% randomly sampled data from each category of CIC-IDS-2017. As shown in Figure 10, the distribution of the number of nodes is mainly spread between 300 and 600. In order to hot-update parameters, the eBPF program has to consider the worst case, which introduces exponential memory overhead to the kernel.

Fig. 10: Distribution of the number of DT nodes.

In addition, the depth of a DT also determines its performance. As shown in Table IX, when using the Top 6 features, if the depth is less than 9, the performance of DT is inferior to our method (NN-int32, Q+Top K). The DT implemented in [12] requires at least 4 int64 arrays. If the depth is set to 10, then the worst-case memory would reach 4 × 8 × 2^10 = 32 KB. However, our method achieves performance close to that of DT while reducing memory overhead to 5 KB (§V-D).

TABLE IX: Impact of DT Depth [12]

Depth  Accuracy  Precision  Recall  F1-score
3      0.862     0.599      0.910   0.722
4      0.886     0.990      0.425   0.594
5      0.914     0.829      0.706   0.763
6      0.944     0.799      0.958   0.871
7      0.967     0.898      0.941   0.919
8      0.971     0.966      0.886   0.924
9      0.983     0.964      0.951   0.957
10     0.987     0.962      0.972   0.967

Although DT achieves great anomaly detection performance with a simple structure, it still suffers from the problem of concept drift. To illustrate this, we remove one type of intrusion from the training and testing datasets of CIC-IDS-2017 at a time, then train a DT with a depth of 10 and use the trained DT to detect the removed intrusions. We find that although the average F1-score of the DT on the testing dataset reaches 0.969, the accuracy on each removed intrusion type is low, as shown in Figure 11. We also find that NN generally performs better than DT on the removed intrusion types but still suffers from the problem of concept drift.

Fig. 11: Accuracy on the removed intrusions

When using non-linear SVM, we find that even with the same training dataset and Top 6 features, the training of SVM does not stop even after several hours and is much longer compared to DT and NN. DT and NN take 5 s and 71 s, respectively, to achieve the effectiveness shown in Table V. Even when we set the maximum number of iterations to 1000 during training, SVM training is still slow. As shown in Table X, results on the testing dataset indicate that non-linear kernels do not improve the performance of the model. Computations involving the exponential function e^x in the RBF and Sigmoid kernels also pose significant challenges for integer implementations in eBPF.

TABLE X: Performance of SVM with non-linear kernels

SVM Kernel  Accuracy  Precision  Recall  F1-score  Training Time (s)
Linear      0.262     0.210      0.999   0.348     1289
Polynomial  0.198     0.197      0.999   0.329     1414
RBF         0.222     0.202      0.999   0.336     2147
Sigmoid     0.421     0.196      0.625   0.298     2706

Compared to NN-int8 [14], our method (NN-int32) does not simply increase the length of quantized integers from 8 bits to 32 bits. Hara et al. [14] use int8 to store the model parameters, while each layer still uses int32 to store the input and output. Thus, for each layer, the input is first quantized into int8, then passes through the linear and ReLU layers, and is finally dequantized to obtain the output in int32, which serves as the input for the next layer. The quantization and dequantization steps in each layer add considerable computation overhead and error to the real-time inference. Furthermore, the two steps introduce
extra memory overhead beyond the model parameters [44]. excessive computational overhead, we choose to perform de-
However, our method essentially stores floating-point numbers tection only after the completion of a flow.
in the form of fixed-point numbers within int32, eliminating
the need for additional quantization and dequantization steps VIII. C ONCLUSION
during inference. We design our NN inference algorithm and We address the overlooked issues in eBPF such as the max-
implementation mechanism to accommodate the limitations imum instructions number, integer-only arithmetic operations,
of eBPF overlooked in [14]. We significantly improve per- and race conditions. Subsequently, we implement an efficient
formance while reducing the computational complexity. real-time intrusion detection and defense prototype within the
VI. RELATED WORK

Real-time intrusion detection and prevention can protect a system from potential impact on its functionality [3]. However, current methods [4]-[7] focus on algorithmic improvements while neglecting the CPU, memory, and context switch overhead of real-time packet capturing. Unlike these methods, we utilize eBPF to offload the intrusion detection model into the XDP hook, enabling real-time packet capturing and subsequent analysis within the kernel. This significantly reduces the kernel-user context switch overhead and remains compatible with a wide range of NIC architectures in a cost-effective way.

There are also similar methods that directly use eBPF for intrusion detection. [12] first implements a DT in eBPF, but the structure of the DT changes after each training run, presenting challenges in storing and updating it within eBPF. The linear Support Vector Machine employed by [13] may not be suitable for intrusion detection, which is not a straightforward linearly separable problem. An NN has a fixed structure and strong fitting capabilities, and is therefore widely used in intrusion detection tasks [9]. [14] implements an NN in eBPF based on int8 quantization. However, that quantization method introduces significant errors and unnecessarily complicates the implementation. Moreover, [12]-[14] neglect critical analyses such as feature importance, overlooked problems like race conditions and the limitations on implementing complex algorithms, and the reproduction of real-time intrusions for evaluation. We redesign a new NN inference mechanism based on int32 and implement it in eBPF through chained eBPF tail calls, as sketched below. We then propose a thread-safe parameter hot-updating mechanism. Through comprehensive evaluations, we demonstrate that our method achieves detection performance and inference overhead comparable to the existing methods while reducing memory overhead.
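As an illustration of this decomposition, the sketch below chains two XDP programs with bpf_tail_call() through a BPF_MAP_TYPE_PROG_ARRAY; the map name, slot indices, and two-layer split are illustrative assumptions, not the exact layout of our prototype:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Hypothetical program array: slot i holds the XDP program that
     * evaluates layer i. User space fills the slots after loading. */
    struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 4);
        __type(key, __u32);
        __type(value, __u32);
    } layer_progs SEC(".maps");

    SEC("xdp")
    int nn_layer0(struct xdp_md *ctx)
    {
        /* ...evaluate layer 0 and stash the activations in a map... */
        bpf_tail_call(ctx, &layer_progs, 1); /* jump to the layer-1 program */
        return XDP_PASS;                     /* reached only if the call fails */
    }

    SEC("xdp")
    int nn_layer1(struct xdp_md *ctx)
    {
        /* ...evaluate the final layer; the last program in the chain
         * returns XDP_DROP for flows classified as intrusions... */
        return XDP_PASS;
    }

    char LICENSE[] SEC("license") = "GPL";

Since a successful bpf_tail_call() does not return, the verifier checks each stage against the instruction limit independently, which is what makes in-XDP inference feasible for deeper networks.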
VII. LIMITATIONS & DISCUSSIONS

Since detection occurs only at the end of a flow, our current implementation is not suitable for detecting intrusions during persistent connections, i.e., connections that remain open for a long duration without being torn down. Detecting intrusions in persistent connections requires inspecting every packet, or extracting flow features within a specific time window. This limitation can be addressed by replacing the training dataset and conducting detection immediately after each packet. However, because current mainstream intrusion detection datasets are primarily based on features of the entire flow [33], and conducting detection for each received packet introduces excessive computational overhead, we choose to perform detection only after the completion of a flow, as sketched below.
and conducting detection for each received packet introduces [11] eBPF. Available at: https://ebpf.io/.

427
[12] Maximilian Bachl, Joachim Fabini, and Tanja Zseby. A flow-based ids using machine learning in ebpf. arXiv preprint arXiv:2102.09980, 2021.
[13] Nemalikanti Anand, MA Saifulla, and Pavan Kumar Aakula. High-performance intrusion detection system using ebpf with machine learning algorithms. 2023.
[14] Takanori Hara and Masahiro Sasabe. On practicality of kernel packet processing empowered by lightweight neural network and decision tree. In 2023 14th International Conference on Network of the Future (NoF), pages 89–97. IEEE, 2023.
[15] Linux. Bpf design q&a. Available at: https://www.kernel.org/doc/html/v5.2/bpf/bpf_design_QA.html.
[16] Jonathan Corbet. Concurrency management in bpf. https://lwn.net/
Articles/779120/.
[17] Ahmad Javaid, Quamar Niyaz, Weiqing Sun, and Mansoor Alam. A
deep learning approach for network intrusion detection system. In
Proceedings of the 9th EAI International Conference on Bio-inspired
Information and Communications Technologies (formerly BIONETICS),
pages 21–26, 2016.
[18] Jihyun Kim, Jaehyun Kim, Huong Le Thi Thu, and Howon Kim. Long
short term memory recurrent neural network classifier for intrusion
detection. In 2016 international conference on platform technology and
service (PlatCon), pages 1–5. IEEE, 2016.
[19] Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den
Hengel. Deep learning for anomaly detection: A review. ACM computing
surveys (CSUR), 54(2):1–38, 2021.
[20] Cillium. Bpf and xdp reference guide. Available at: https://docs.cilium.
io/en/latest/bpf/.
[21] Limin Yang, Wenbo Guo, Qingying Hao, Arridhana Ciptadi, Ali Ah-
madzadeh, Xinyu Xing, and Gang Wang. Cade: Detecting and explaining
concept drift samples for security applications. In 30th USENIX Security
Symposium (USENIX Security 21), pages 2327–2344, 2021.
[22] Tcpdump. https://www.tcpdump.org/manpages/tcpdump.1.html.
[23] Mohammad Saiful Islam Mamun, Arash Habibi Lashkari, Gerard Draper-
Gil, and Ali A. Ghorbani. cicflowmeter. Available at: https://www.unb.
ca/cic/research/applications.html.
[24] Dongzi Jin, Yiqin Lu, Jiancheng Qin, Zhe Cheng, and Zhongshu
Mao. Swiftids: Real-time intrusion detection system based on lightgbm
and parallel intrusion detection mechanism. Computers & Security,
97:101984, 2020.
[25] Libpcap. https://www.tcpdump.org/manpages/pcap-filter.7.html.
[26] Programming with pcap. Available at: https://www.tcpdump.org/pcap.
html.
[27] pidstat. Available at: https://man7.org/linux/man-pages/man1/pidstat.1.
html.
[28] Yang Zhou, Zezhou Wang, Sowmya Dharanipragada, and Minlan Yu.
Electrode: Accelerating distributed protocols with ebpf. In 20th USENIX
Symposium on Networked Systems Design and Implementation (NSDI
23), pages 1391–1407, 2023.
[29] Toke Høiland-Jørgensen, Jesper Dangaard Brouer, Daniel Borkmann,
John Fastabend, Tom Herbert, David Ahern, and David Miller. The
express data path: Fast programmable packet processing in the operating
system kernel. In Proceedings of the 14th international conference on
emerging networking experiments and technologies, pages 54–66, 2018.
[30] Tc-bpf. Available at: https://man7.org/linux/man-pages/man8/tc-bpf.8.
html.
[31] Sebastiano Miano, Matteo Bertrone, Fulvio Risso, Massimo Tumolo,
and Mauricio Vásquez Bernal. Creating complex network services with
ebpf: Experience and lessons learned. In 2018 IEEE 19th International
Conference on High Performance Switching and Routing (HPSR), pages
1–8. IEEE, 2018.
[32] John H Ring IV, Colin M Van Oort, Samson Durst, Vanessa White,
Joseph P Near, and Christian Skalka. Methods for host-based intrusion
detection with deep learning. Digital Threats: Research and Practice
(DTRAP), 2(4):1–29, 2021.
[33] Iman Sharafaldin, Arash Habibi Lashkari, and Ali A Ghorbani. To-
ward generating a new intrusion detection dataset and intrusion traffic
characterization. ICISSP, 1:108–116, 2018.
[34] httperf. Available at: https://github.com/httperf/httperf.
[35] iperf. Available at: https://iperf.fr.
[36] GoldenEye. Available at: https://github.com/jseidl/GoldenEye.
[37] Slowhttptest. Available at: https://github.com/shekyan/slowhttptest.
[38] patator. Available at: https://github.com/lanjelot/patator.
[39] DVWA. Damn vulnerable web app. Available at: https://github.com/
digininja/DVWA.

[40] selenium. Available at: https://selenium-python.readthedocs.io/.
[41] Brian W Matthews. Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure, 405(2):442–451, 1975.
[42] perf. Available at: https://perf.wiki.kernel.org/index.php/Tutorial.
[43] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2022.
[44] Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, and Paulius Micikevicius. Integer quantization for deep learning inference: Principles and empirical evaluation, 2020.