Analysing Performance Issues of Open-Source Intrusion Detection Systems in High-Speed Networks

Q. Hu, S.-Y. Yu, M.R. Asghar
Journal of Information Security and Applications 51 (2020) 102426, https://doi.org/10.1016/j.jisa.2019.102426
Article history: Available online 9 January 2020

Abstract

Driven by the growing data transfer needs, industry and research institutions are deploying 100 Gb/s networks. As such high-speed networks become prevalent, they also introduce significant technical challenges. In particular, an Intrusion Detection System (IDS) cannot process network activities at such a high rate when monitoring large and diverse traffic volumes, thus resulting in packet drops. Unfortunately, the high packet drop rate has a significant impact on detection accuracy. In this work, we investigate two popular open-source IDSs, Snort and Suricata, along with their comparative performance benchmarks to better understand drop rates and detection accuracy in 100 Gb/s networks. More specifically, we study vital factors (including system resource usage, packet processing speed, packet drop rate, and detection accuracy) that limit the applicability of IDSs to high-speed networks. Furthermore, we provide a comprehensive analysis to show the performance impact on IDSs of different configurations, traffic volumes, and flows. Finally, we identify challenges of using open-source IDSs in high-speed networks, provide suggestions to help network administrators address the identified issues, and give some recommendations for developing new IDSs that can be used in high-speed networks.

© 2020 Elsevier Ltd. All rights reserved.
1. Introduction

Intrusion Detection Systems (IDSs) have played a significant role in detecting malicious activities in a network and the hosts connected to it. IDSs such as Snort, Bro, and Suricata are used for identifying potential attacks on today's networks; however, there are performance limitations of IDSs with currently available high-speed networks. There have been several studies [1–4] that focus on two main aspects of IDS performance: the first is to find and reduce factors that affect IDS performance; the other is to improve the overall IDS performance.

Some studies [1–3] find that IDS performance can be influenced by various factors such as IDS configuration, the number of network flows,¹ and flow durations. For instance, Salah et al. [1] and Alhomoud et al. [2] discovered that different Operating Systems (OSs) and platforms could impact IDS performance. Salah et al. [1] found that Snort performs better in the Linux environment for handling 1 Gb/s traffic. Alhomoud et al. [2] measured the performance of Suricata on FreeBSD when monitoring unauthorised activities under a vast volume of background traffic. Hu et al. [3] showed that IDS performance can be affected by different flows with different durations.

Both Snort and Suricata use regular expressions to match attackers' patterns in network traffic. However, with a large traffic volume, matching packet data using regular expressions consumes a significant amount of system resources and becomes a performance bottleneck during the packet detection procedure. Antonatos et al. [5] discovered that in Snort, string pattern matching consumes 40–70% of the total processing time. For this reason, some existing studies [6–8] suggest improving the regular expression matching architecture in order to improve IDS performance. Yang et al. [7] proposed a novel Deterministic Finite Automata (DFA) accelerated architecture that improves the throughput of DFA matching while managing memory efficiently. Their solution leverages three Field Programmable Gate Array (FPGA)-based algorithms: Simple State Merge Tree (SSMT), Distribute Data in Round-Robin (DDRR), and Multi-path Speculation, which allow the serial DFA matching to be parallelised and pipelined. They tested this architecture in different production environments, and their results show that the new design improves the processing speed by 108 times.

∗ Corresponding author.
E-mail addresses: [email protected] (Q. Hu), [email protected] (S.-Y. Yu), [email protected] (M.R. Asghar).
¹ A network flow (a.k.a. flow hereafter) is a group of packets having the same (i) source and destination IP addresses, (ii) port numbers, and (iii) protocol.
Hu et al. [3] highlighted the challenges of using the default IDS packet capturing mechanism and packet detection mechanism in a high-speed network. For instance, they observed less than a 10% packet drop rate when they used a default IDS configuration for a 1 Gb/s single-flow test network. However, the default configuration gives an abysmal performance (80% packet drop rate, 99% CPU usage) when processing multiple flows on a 2 Gb/s network. After modifying the packet capturing mechanism and the packet detection mechanism, their results show a significant improvement: the packet drop rate reduced to 1% and the CPU usage to 11.5%. Campbell and Lee [4] introduced a hardware-based solution to reduce the packet detection volume for each IDS instance. Their solution is a hybrid approach that uses a set of Bro policy scripts on a load-balancing device to interact with the Bro instances using predefined Application Programming Interfaces (APIs). Their solution reduced the packet detection volume for each IDS instance, while maintaining IDS accuracy at a high level of effectiveness.

By reviewing existing studies [2–4,7], we discovered that the CPU usage, memory usage, and packet drop rates of an IDS could be affected by different environments, i.e., packet detection mechanisms, packet capturing mechanisms, the number of flows, and hardware specifications. Many studies [3,4,7,9] have been conducted to improve IDS performance on high-speed networks. However, existing studies focus on investigating IDS performance under 20 Gb/s networks or using driver-dependent modules such as PF_RING [10]. We found that the performance of open-source IDSs under 100 Gb/s throughput seems not to have received much attention in the literature.

The objective of this study is to understand the feasibility of popular open-source IDSs, including Snort and Suricata, in a 100 Gb/s network without relying on a new packet capturing mechanism or updating the existing hardware. We would like to highlight the challenges of these IDSs in modern high-speed networks and propose optimisations to improve their performance. We list our research questions to fulfil this objective: do we need a powerful server to run an open-source IDS tool for handling high-speed throughput? How much memory and how many CPU cores are required to support a high-performance IDS? What is the main challenge for running IDS instances and other applications in parallel? Answering these questions provides a performance baseline of the IDSs with different packet capturing mechanisms in a high-speed network. We also suggest optimisations for IDSs to maximise their performance in high-speed networks. For example, running IDSs with 60 Gb/s traffic requires a CPU with at least 12 cores to distribute the load from the IDSs and the Network Interface Card (NIC). If an IDS system does not have enough resources, it may lead to system overload and cause the IDS to miss malicious activities [3]. Also, we need to understand the challenges of using both IDSs in high-speed networks, such as whether the two common packet capturing mechanisms are able to handle 100 Gb/s traffic without missing any packets. Schaelicke et al. [11] and Ptacek et al. [12] reported that even a limited packet loss is critical to the accuracy of IDSs. Further, we would like to investigate the mechanisms used by the existing IDSs, i.e., whether the current packet capturing mechanisms and packet detection mechanisms can still maintain high efficiency and low packet drop rates under more complex network flows. Besides, both Snort and Suricata have released new versions with some performance improvements, so we want to assess if these new versions of the IDSs could be used directly under high-speed network traffic without any configuration changes.

Research Contributions. In this article, we evaluated the feasibility of using IDSs with 100 Gb/s traffic, highlighted the main challenges of running IDSs in high-speed networks, and proposed possible solutions. More importantly, we found that many factors affect IDS performance and packet drop rates in a high-speed network, such as different traffic volumes, different flow types, and system resource allocation. We summarise our research contributions as follows.

• We assess the performance and accuracy of two open-source IDSs under different network throughputs. Our results show that it is not possible to handle 100 Gb/s traffic using either IDS with the existing packet capturing mechanisms, including Libpcap and AF_PACKET. We noticed a CPU bottleneck with the default configuration of the IDSs, causing 99.9% of packets to be dropped when the incoming throughput reaches 40 Gb/s. We found that AF_PACKET raised this limit to 60 Gb/s, but both IDSs started to drop packets above 60 Gb/s. We also discovered that we can capture 100 Gb/s with eXpress Data Path (XDP) in Suricata, but only with a single detection rule; however, it is not possible to detect any malicious activity with just a single rule.
• We observed that not only does a larger volume of traffic affect the performance of IDSs, but the complexity of multiple flows also impacts both performance and accuracy. For instance, Snort 3.0 adopts a multithreaded architecture, which improves CPU usage and reduces packet drop rates compared to previous experimental results [3]. However, when dealing with a large volume of multiple flows, Snort 3.0 and Suricata 4.1 suffer from performance degradation. As a result, the CPU usage reaches 99% and the packet drop rate becomes as high as 99.9% when handling 33,000 flows per second. As expected, the accuracy of malicious flow detection decreases as the packet drop increases.
• Both Snort 3.0 and Suricata 4.1 have improved resource utilisation compared to their previous versions (including Snort 2.8 and Suricata 3.1.4) two years ago [3]. For example, Snort 3.0 adopted a multithreaded architecture, which optimises CPU usage and reduces packet drop rates compared to the previous results from Snort 2.8 [3]. Nevertheless, when dealing with a larger volume of multiple flows, Snort 3.0 and Suricata 4.1 suffered from competing for the system resources with other applications. The major challenge we found in our experiment is to balance resource allocation among IDS instances, Iperf3 instances, and Soft Interrupt Request (SoftIRQ) handling. Even though we specified different cores for handling SoftIRQ, the IDSs, and the Iperf3 processes, we found that the CPU usage from handling SoftIRQ caused Snort and Suricata to drop packets. When the CPU cannot handle SoftIRQ from the NIC, both IDSs begin to drop packets.

The rest of this article is organised as follows. We provide a brief overview of two popular IDSs in Section 2. Section 3 describes our methodology, testing environment, and use case scenarios. In Section 4, we discuss experimental results with different IDS configurations. Section 5 shows the significant impacts of using existing IDS configurations in high-speed networks; we also provide some solutions to improve IDS performance. Section 6 concludes this article, provides some recommendations for practitioners, and highlights research directions for future work.

2. Open-source intrusion detection tools

In this section, we discuss two open-source IDS tools: Snort and Suricata. Both tools are widely deployed by many organisations [13] to protect their networks. We begin with the design goals and then describe the architectures of both Snort and Suricata. This will help us understand why both tools perform differently even though both implement a multithreaded architecture. The other architectural components of an IDS, including the packet capturing mechanisms and the packet detection mechanisms, will be covered as well.
2.1. Snort

Snort enumerates the available network devices and then uses getifaddrs() to get their IP addresses and related information. All such network devices are saved in the pcapif list. The capture path further relies on:

• Berkeley Packet Filter (BPF) [14]. This provides a filter function for the sniffer so that it can forward only specific packets. The BPF is applied after the driver receives the packets from the network interface.
• Packet Processing Loop. Snort calls the pcap_dispatch() function from the libpcap library to read packets from the NIC. Snort then uses the PcapProcessPacket() function to process each captured packet based on its protocol type. The packet decoder passes decoded packets to the preprocessor module for further investigation.
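To make the capture path concrete, the following is a minimal sketch of the libpcap pattern described above — open a device, attach a BPF filter, and hand each packet to a callback — rather than Snort's actual source; the interface name and filter expression are placeholders.

/* Minimal libpcap capture loop illustrating the BPF filter and the
 * dispatch callback described above; compile with -lpcap. The
 * interface name and filter expression are placeholders. */
#include <pcap.h>
#include <stdio.h>
#include <stdlib.h>

/* Invoked once per captured packet, analogous to Snort handing each
 * packet to PcapProcessPacket() for decoding and inspection. */
static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                      const u_char *bytes)
{
    (void)user; (void)bytes;
    printf("captured %u bytes (on wire: %u)\n", h->caplen, h->len);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    struct bpf_program prog;

    /* Open the interface in promiscuous mode with a 64 KB snap length. */
    pcap_t *pc = pcap_open_live("eth0", 65536, 1, 1000, errbuf);
    if (pc == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return EXIT_FAILURE;
    }

    /* Compile and attach a BPF filter so that only matching packets
     * (here: TCP on port 80) are forwarded to the process. */
    if (pcap_compile(pc, &prog, "tcp port 80", 1, PCAP_NETMASK_UNKNOWN) == -1
        || pcap_setfilter(pc, &prog) == -1) {
        fprintf(stderr, "filter: %s\n", pcap_geterr(pc));
        return EXIT_FAILURE;
    }

    /* Process one batch of packets; a real tool would loop here or
     * call pcap_loop() instead. */
    pcap_dispatch(pc, -1, on_packet, NULL);
    pcap_close(pc);
    return EXIT_SUCCESS;
}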
AF_PACKET. AF_PACKET [15] is the Linux native network socket. Similar to libpcap, AF_PACKET enables network administrators to configure a memory buffer for captured packets. This means that the memory allocated for the buffer is shared with the capture process, so instead of the kernel sending packets to the capture process, the process can just read the packets from their original memory address. This method saves time and CPU resources.
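The following sketch shows this shared-memory path under stated assumptions (a TPACKET_V2 ring, illustrative sizes, most error handling omitted): the kernel fills an mmap'ed ring with frames, and the process reads them in place and flips a status bit to return each frame.

/* Sketch of an AF_PACKET memory-mapped receive ring (TPACKET_V2).
 * Requires root privileges; the ring geometry is illustrative only. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* Select TPACKET_V2 frame headers, then describe the ring:
     * 64 blocks of 4 KB, each block holding two 2 KB frames. */
    int ver = TPACKET_V2;
    setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));

    struct tpacket_req req = {
        .tp_block_size = 4096, .tp_block_nr = 64,
        .tp_frame_size = 2048, .tp_frame_nr = 128,
    };
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    /* Map the ring once; afterwards no per-packet copy into user
     * space is needed, matching the description above. */
    size_t len = (size_t)req.tp_block_size * req.tp_block_nr;
    unsigned char *ring = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);

    for (unsigned i = 0; ; i = (i + 1) % req.tp_frame_nr) {
        struct tpacket2_hdr *hdr =
            (struct tpacket2_hdr *)(ring + (size_t)i * req.tp_frame_size);

        /* Block until the kernel hands this frame to user space. */
        while (!(hdr->tp_status & TP_STATUS_USER)) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            poll(&pfd, 1, -1);
        }

        printf("frame %u: %u bytes\n", i, hdr->tp_len);

        /* Return the frame to the kernel for reuse. */
        hdr->tp_status = TP_STATUS_KERNEL;
    }
}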
PF_RING. PF_RING [16] is another high-performance Linux kernel module that optimises load balancing through its ring cluster design. In the packet capturing process, the application copies packets from the NIC to the PF_RING circular buffer. Then, the IDSs read the packets from this circular buffer. PF_RING can distribute incoming packets to multiple rings, and it allows multiple applications to process packets simultaneously.

2.5. Packet detection mechanisms

Traditionally, an IDS inspects packets deeply by scanning every byte of the packet; however, several improvements have been proposed in the last two decades [7,17–19]. By reviewing past studies [7,17–20], we found two packet detection mechanisms that have been used most widely in current IDS tools.
2.5.1. Aho-Corasick algorithm

Aho et al. [20] proposed a simple and efficient algorithm in 1975 that is used as the default pattern searching algorithm in existing IDS tools, including Snort and Suricata. In this approach, a pattern matching machine represents a predefined language as a set of strings; network administrators can test whether an input string matches any of the given strings. The pattern state machine processes an input text string and is composed of three functions: a [goto] function, a [failure] function, and an [output] function (a compact implementation sketch follows the list):

• A [goto] function constructs a goto graph; the goto graph starts with a root node that represents a state, 1. Each input keyword is entered into a subsequent node. A search starts from state 1, and a path through the graph spells out a keyword. If no failure is detected during the search, the matched keyword is passed to the output function.
• A [failure] function is triggered when the [goto] function reports failure. For example, if the current input character is not found in the current node or the sub-nodes on the same path, the pattern matching machine calls the [failure] function to search alternative paths for processing the character.
• An [output] function merges duplicated output states into a new output state.
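The listing below is a compact, self-contained sketch of this goto/failure/output machinery — not the implementation used by Snort or Suricata — with illustrative patterns and table sizes. Missing transitions are completed during construction, turning the machine into a DFA that advances exactly one state per input byte.

/* Compact Aho-Corasick sketch of the goto/failure/output functions
 * described above. Patterns and sizes are illustrative; a production
 * IDS builds a far larger automaton from its rule set. */
#include <stdio.h>
#include <string.h>

#define MAXS  256   /* maximum number of trie states */
#define ALPHA 256   /* input alphabet: all byte values */

static int go[MAXS][ALPHA];   /* goto function (completed into a DFA) */
static int fail[MAXS];        /* failure function */
static unsigned out[MAXS];    /* output function: bitmask of matched patterns */
static int nstates = 1;       /* state 0 is the root */

static void build(const char *pats[], int npats)
{
    memset(go, -1, sizeof(go));

    /* goto: insert every keyword as a path from the root. */
    for (int p = 0; p < npats; p++) {
        int s = 0;
        for (const char *c = pats[p]; *c; c++) {
            unsigned char ch = (unsigned char)*c;
            if (go[s][ch] == -1)
                go[s][ch] = nstates++;
            s = go[s][ch];
        }
        out[s] |= 1u << p;                 /* keyword ends here */
    }

    /* Missing root transitions loop back to the root. */
    for (int ch = 0; ch < ALPHA; ch++)
        if (go[0][ch] == -1)
            go[0][ch] = 0;

    /* failure: breadth-first over the trie, borrowing suffix links. */
    int queue[MAXS], head = 0, tail = 0;
    for (int ch = 0; ch < ALPHA; ch++)
        if (go[0][ch] != 0) {
            fail[go[0][ch]] = 0;
            queue[tail++] = go[0][ch];
        }
    while (head < tail) {
        int s = queue[head++];
        for (int ch = 0; ch < ALPHA; ch++) {
            int t = go[s][ch];
            if (t == -1) {
                go[s][ch] = go[fail[s]][ch];   /* complete into a DFA */
            } else {
                fail[t] = go[fail[s]][ch];
                out[t] |= out[fail[t]];        /* merge output states */
                queue[tail++] = t;
            }
        }
    }
}

int main(void)
{
    const char *pats[] = { "GET", "cmd.exe", "/etc/passwd" };
    build(pats, 3);

    const char *text = "GET /cgi-bin/..%2f../etc/passwd HTTP/1.0";
    int s = 0;
    for (const char *c = text; *c; c++) {
        s = go[s][(unsigned char)*c];          /* one transition per byte */
        for (int p = 0; p < 3; p++)
            if (out[s] & (1u << p))
                printf("match '%s' ending at offset %ld\n",
                       pats[p], (long)(c - text));
    }
    return 0;
}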
2.5.2. Regular expression signatures

A regular expression mechanism is another signature-matching algorithm; it uses character classes, unions, optional elements, and closures to enhance the flexibility of a signature-based IDS. Moreover, it improves search efficiency by adding effective schemes to perform pattern matching. A normal regular expression can be represented by a finite state automaton. In [21], Hopcroft et al. introduced two finite state automata: the Deterministic Finite Automaton (DFA) and the Non-deterministic Finite Automaton (NFA). The DFA takes an input symbol, and its transition function outputs a single next state. Instead of returning a single next state, the NFA returns a set of states. Existing studies [7,17,19] show that NFAs are compact but slow, whereas DFAs are fast but may require more memory while processing. In the last decade, most studies focused on making DFAs more efficient, such as [17], where Gong et al. reduced the construction time, memory, and matching time by using a multi-dimensional finite automaton in the original DFA model.
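To illustrate the single-next-state property that makes DFAs fast, here is a hand-built DFA for the toy expression ab+c — an illustrative sketch rather than an engine from either IDS: each input byte costs exactly one transition, whereas an NFA would have to track a set of states.

/* A hand-built DFA for the toy regular expression a b+ c. Each
 * (state, byte) pair indexes exactly one successor, so matching
 * costs one lookup per input byte. */
#include <stdio.h>

enum { START, SEEN_A, SEEN_AB, ACCEPT, DEAD };

static int next_state(int s, unsigned char c)
{
    switch (s) {
    case START:   return c == 'a' ? SEEN_A : DEAD;
    case SEEN_A:  return c == 'b' ? SEEN_AB : DEAD;
    case SEEN_AB: return c == 'b' ? SEEN_AB : (c == 'c' ? ACCEPT : DEAD);
    default:      return DEAD;   /* ACCEPT and DEAD are sinks here */
    }
}

static int matches(const char *s)
{
    int state = START;
    for (; *s; s++)
        state = next_state(state, (unsigned char)*s);
    return state == ACCEPT;
}

int main(void)
{
    const char *tests[] = { "abc", "abbbbc", "ac", "abcd" };
    for (int i = 0; i < 4; i++)
        printf("%-8s -> %s\n", tests[i],
               matches(tests[i]) ? "match" : "no match");
    return 0;
}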
3. Our methodology

In this section, we describe our methodology and test environment. For our experiments, we use three different methods to generate high-throughput traffic. In the first experiment, we verify the performance of the IDSs under a controlled environment and use Iperf3 to generate multiple TCP flows with a packet size of 1500 bytes. We measured the performance of each IDS with increasing throughput from 10 Gb/s to 100 Gb/s to test their capability of handling packets in a 100 Gb/s network. In the second experiment, we measure the detection accuracy under high-throughput traffic along with some malicious traffic. To achieve our goal, we introduce Pytbull, a flexible IDS and Intrusion Prevention System (IPS) testing framework that covers a broad scope of attacks. We run Iperf3 and Pytbull at the same time to verify whether each IDS can detect the attacks under a certain amount of background traffic. In the third experiment, we assess the accuracy of each IDS with real-world traffic. To this end, we use TRex to generate L4 to L7 traffic based on its real-world traffic templates. We extend the first and second experiments by testing each IDS and its packet capturing mechanism with real-world background traffic and some malicious traffic. To conduct our experiments, we set up a 100 Gb/s testbed with three machines. Our testbed uses two Dell PowerEdge R740XD servers: one as a sender and the other as a receiver. Also, we use a Dell Z9100 as the switch connecting the two. The hardware details are specified in Table 1.

Table 1
The role, the model, and hardware specifications of each tested device.

Role     | Model                  | CPU / ASIC                                        | Memory | NIC
Sender   | Dell PowerEdge R740XD  | 2 x Intel Xeon Gold 6126, 2.6 GHz, 12 cores per CPU | 196 GB | Mellanox ConnectX-5 100GE
Receiver | Dell PowerEdge R740XD  | 2 x Intel Xeon Gold 6126, 2.6 GHz, 12 cores per CPU | 196 GB | Mellanox ConnectX-5 100GE
Switch   | Dell Z9100             | MPC8541, Firebolt-3 ASIC                           | 2 GB   | —

3.1. First experiment: Performance checking

This experiment aims to evaluate the performance of Snort and Suricata while processing TCP flow throughput from 10 Gb/s to 100 Gb/s. The experiment was set up in a controllable environment, and there is no background traffic between the two machines. The logical network diagram of the experiment is shown in Fig. 2. Both IDSs were installed on the receiver server. We used Iperf3 to generate the TCP flows with a packet size of 1500 bytes, which is the most common Maximum Transmission Unit (MTU).
Fig. 2. Two test machines were configured for our 100 Gb/s test environment. A sender used Iperf3 to deliver large quantities of data. Two IDS tools were installed on the receiver side to measure IDS performance while monitoring 100 Gb/s TCP flows with a long time duration.

Fig. 4. A sender used TRex to generate different flows with different time durations. The throughput is configured between 1 Gb/s and 20 Gb/s. Pytbull is installed on the sender for checking IDS accuracy while we simulate different throughputs. An IDS is installed at the receiver side; it monitors all incoming traffic.

Fig. 5. Comparing the performance of the different versions of the IDSs: we set up a network environment with a 10 Gb/s TCP flow. The title of each entry is defined based on the following naming convention: IDS name, version number, test environment, and packet capturing mechanism. For instance, if we test Snort 3.0 in a 10 Gb/s network to assess IDS performance using Libpcap, then we call this test Snort3.0_10 Gb/s Libpcap.
We first ran a performance check against the default configuration. Furthermore, we adjust the packet capturing and packet detection mechanisms of the IDSs based on the packet drop rates. Schaelicke et al. [11] discovered a linear relationship between packet loss and precision loss in IDSs. A similar result was reported by Ptacek et al. [12], who showed that attacks can bypass detection by overloading the IDS, causing high drop rates and increasing the chances that a successful intrusion remains undetected. Their results indicate that any packet loss can directly degrade the effectiveness of the IDS. In this work, we discard an IDS with a specific packet capturing mechanism from further experiments once it starts losing packets.

The performance comparison results of the different versions of the IDSs are shown in Fig. 5. The results show that the newer versions of Snort have improved their performance in terms of CPU and memory usage as well as the packet drop rate. We notice that Snort 3.0's CPU usage was 11% lower than that of Snort 2.8, and its memory usage dropped from 2% to 0.1% while processing the same 10 Gb/s TCP flow. Compared to Snort, Suricata's changes are not as significant. Suricata 4.0 used 16 cores of the receiver server, and each core consumed 10% CPU for processing 10 Gb/s traffic. The collected performance data shows that Suricata 4.0's memory usage is less than that of Suricata 2.10, as illustrated in Fig. 5. Fortunately, the packet drop rate of Suricata 4.0 decreased from 5.9% to 0%.

For testing the performance of the newer versions of the IDSs in high-speed networks, we used Snort 3.0 and Suricata 4.0 with default configurations and generated first a single 20 Gb/s TCP flow and then two flows with 10 Gb/s throughput each. From our results, we observe that a single 20 Gb/s flow overloaded the receiver with too many interrupts. As a result, the IDSs dropped a lot of incoming traffic. To reduce the packet drop caused by interrupts, we used two 10 Gb/s flows and used Receive-Side Scaling (RSS) to spread the hardware interrupts. We further increased the throughput by increasing the number of 10 Gb/s flows.

We started the throughput from two 10 Gb/s flows and measured CPU, memory, and packet drop rate when using Libpcap and AF_PACKET. We removed an IDS with a specific packet capturing mechanism from further experiments when it started dropping packets, because even a small percentage of packet loss can cause IDSs to lose track of potential attacks [11]. We also want to compare the absolute performance of each IDS with different packet capturing mechanisms in terms of their capability of processing throughput without packet loss. When we tested Suricata with three 10 Gb/s flows, we discovered that Libpcap had a 1% drop rate, and we removed Suricata with Libpcap from the experiments with a larger number of flows. Fortunately, Suricata with AF_PACKET did not lose any packets until the throughput reached 60 Gb/s.

Based on Fig. 6, we find that Libpcap was not satisfactory when the throughput is over 40 Gb/s. Therefore, we decided to use AF_PACKET as the packet capturing mechanism for Snort and Suricata to monitor traffic over 40 Gb/s. After applying AF_PACKET, Suricata's average CPU usage increased to 60% while monitoring 50 Gb/s traffic, and the packet drop rate decreased to 0%. Using the same packet capturing mechanism in Snort 3.0, the CPU utilisation of Snort is higher than that of Suricata 4.0 while processing traffic under 50 Gb/s throughput: Snort's CPU consumption was 94% along with a 0% packet drop rate. We tried up to 60 Gb/s, where we observed some packets being dropped. However, we discovered that Snort and Suricata do not accurately reflect the packet drop rate from the network layer's perspective. For example, when we tested IDS performance using 60 Gb/s throughput, both Snort 3.0 and Suricata 4.0 showed that the packet drop rate was 0% with AF_PACKET. After we analysed in depth how many packets were sent from the sender, we found that the receiver side lost about 0.01% of the traffic.

All in all, our results statistically characterise the processing of TCP flows under different throughputs and the resource overheads. They clearly show that when the throughput starts to increase, an IDS consumes more resources to maintain a low packet drop rate.

IDS Accuracy Checking. In the second experiment, we explored how accurately Snort 3.0 and Suricata 4.0 can classify legitimate and malicious traffic under 60 Gb/s throughput. We used Pytbull, an open-source IDS test framework, to test specific attack scenarios. Each attack scenario assesses the default rule set in Snort 3.0 and Suricata 4.0 and targets the relevant alert. We ran both IDSs with the default rule set and configurations. The attack detection rates of both IDSs are shown in Fig. 7. The difference between both IDSs is minimal. Suricata detected all the malicious traffic using the default rule set, whereas Snort missed a few anomalous packets. This difference suggests that Snort needs to add the missed rules to its default rule set.
Fig. 6. Performance comparison between the new and old versions of the IDSs having 20 to 60 Gb/s TCP flows (10 Gb/s per flow) using different packet capturing mechanisms: the left side of the y-axis shows CPU usage, and the right side indicates the packet drop rate. Overall, Snort 3.0 and Suricata 4.1 show a high-performance result when AF_PACKET was used as the packet capturing mechanism. Memory utilisation throughout the experiments stays at 10%.
Moreover, from the performance aspect, the second experiment produced the same performance results as the first experiment for processing 60 Gb/s traffic. This result shows that the accuracy of the IDSs is not affected when there is less than 15% packet drop. In fact, the IDSs can maintain a good balance between accuracy and performance in a simple network environment.

IDS Performance and Accuracy Checking. The third experiment focused on testing performance and accuracy while processing multiple flows with different throughputs. We used TRex, an open-source traffic generator, which can generate Layer 4 (transport layer) to Layer 7 (application layer) traffic based on preprocessing and replaying a Libpcap file that contains real traffic. We evaluated the performance of Snort 3.0 and Suricata 4.0 under different throughputs (20 Gb/s to 60 Gb/s). To this end, we used TRex to generate flows using different packet sizes and different protocols, but with the same flow duration. In our existing study [3], we showed that there is a performance bottleneck when processing a large number of multiple flows using default configurations in Snort. So, we would like to see if this performance issue has been addressed by the multithreaded architecture in Snort. We observed that both Snort 3.0 and Suricata 4.0 show CPU and memory overheads along with high packet drop rates while processing more than 33,000 flows per second, where the flow duration was 40 milliseconds.

We started with the default configuration (the packet capturing mechanism is Libpcap, and the packet detection is Aho-Corasick (AC)) and observed that Suricata used 16 cores and consumed 99% CPU per core. The same behaviour was found in Snort 3.0; it consumed 100% of 16 cores to process the same amount of traffic. When TRex stopped sending traffic, we stopped the Snort and Suricata instances manually, and we found that the packet drop rate was close to 100% when Libpcap was used. To understand whether IDS performance can be improved by modifying the packet capturing mechanism and packet detection mechanism, we launched different experiments with different configurations. The best result from these experiments is with AF_PACKET in Suricata. Our results show that the packet drop rate of Suricata goes down to 68% after using AF_PACKET. As for Snort, no matter which combination we chose, Snort's performance showed no difference.

Fig. 7. Evaluating the accuracy of both IDSs under 60 Gb/s throughput using Pytbull to generate the same attacks for both Snort and Suricata.

5. Discussion

First, we explain the limitations of existing IDSs; then we discuss insights from the experiments. We found that a large volume of multiple small flows can impact the CPU and memory usage as well as lead to a higher packet drop rate. The packet detection mechanisms require more memory to process traffic with the predefined rule set in a high-speed network. The existing packet capturing mechanisms have a packet loss issue. To sum up, IDS performance can be affected by the number of multiple flows as well as by high throughput. As a result, system resources are completely exhausted, and there is no way to handle new requests. To address these problems, we provide some possible solutions. First, we suggest a load balancing mechanism, where multiple flows from a high-speed network can be distributed to a collection of IDS instances, each one processing 13,000 flows (about 2 Gb/s of network traffic) per second. Second, we suggest using efficient regular expression algorithms for reducing the cost of matching the packet payload against the predefined rule set. Moreover, we highlight the importance of enabling the Data Plane Development Kit (DPDK) as a new capturing mechanism; we provide some data to show that DPDK can significantly increase traffic throughput while incurring a lower performance cost in Section 5.1.

In our study, we did not consider Zeek³ due to the following reasons. First, Zeek only supports Libpcap and PF_RING; however, due to the limitation of our experimental environment, we do not have a PF_RING module installed on our test servers. Second, from previous studies [3,24], we learned that the multithreaded Suricata is better than a single-threaded Snort while processing a larger volume of traffic.

³ https://www.zeek.org.
However, with Snort 3.0, a multithreading framework has been enabled; therefore, we want to compare Snort and Suricata's performance again and check if Snort's performance improved with multiple threads. Third, after we finished the experiment, we found that Suricata released the latest version, 4.1.4. In this version, Suricata includes extended BSD Packet Filter (eBPF) and XDP support. With this new feature, Suricata can directly execute in the kernel context, before the kernel itself touches the packet data, which enables packet capture processing at the earliest possible point after a packet is received from the hardware. Leblond [25] finds a decrease in the packet drop rate after enabling eBPF and XDP in Suricata 4.1.4. These experimental results motivated us to use the new Suricata 4.1.4 and repeat our previous experiments. As we observed before, Suricata began to drop packets when there was not enough time for the CPU cores to process SoftIRQ from the NIC. By enabling eBPF and XDP in Suricata 4.1.4, we can reach 79.4 Gb/s throughput, but at the same time, we observe that Suricata dropped 0.81% of the total packets. Inspired by [25], we increased the number of Suricata threads, binding them to specific cores to avoid overloading the cores handling SoftIRQ from the NIC. As shown in Figs. 8 and 9, we found that the performance of Suricata is CPU intensive. Fig. 8 shows that Suricata processed 100 Gb/s traffic when we enabled only a single signature rule. In contrast, after using all the signatures in the configuration file, the throughput dropped to 89 Gb/s and 62% of packets were discarded. The reason for the high CPU usage and packet drop rate is that each Suricata instance is processing each packet against 300,000 different signatures. The existing optimisations cannot allow IDSs to handle 100 Gb/s networks: the CPU gets overloaded, and there are not enough CPU cycles to process new incoming packets. We also observed a significant packet drop when the IDS processes and SoftIRQ are handled by the same CPU core. On the receiver machine, we allocated half of the CPU cores (i.e., 12 cores) to the IDS processes and the other 12 cores to SoftIRQ. As shown in Fig. 9, when all detection rules were enabled, each core shows high CPU utilisation to load and analyse each rule. When there is not enough time for the CPU cores to process SoftIRQ, the packets from the NIC cannot be handled and are dropped before being delivered to the process. As a result, we observe a 62% packet drop rate with all detection rules enabled.

Fig. 8. Comparing Suricata's CPU usage with SoftIRQ's CPU usage when enabling one detection rule from the Suricata configuration file. Suricata can monitor 100 Gb/s traffic with a 0% drop rate; the CPU usage of Suricata was stable at 60% while processing a single rule file. Each coloured line represents the CPU usage of a core.

Fig. 9. Comparing Suricata's CPU usage with SoftIRQ's CPU usage when enabling all detection rules in the Suricata configuration file. When Suricata's CPU usage reached 80%, the network throughput dropped to 89 Gb/s along with 62% of the traffic dropped by Suricata. The CPU usage was unstable between Suricata and SoftIRQ, thus requiring Suricata to use more CPU resources to process all the rules. Each coloured line represents the CPU usage of a core.

5.1. Recommendations

Our objective in this study is to find out how to improve the performance of current open-source IDSs using recently developed techniques, including data processing approaches and packet capturing mechanisms. In this section, we make three recommendations that might help IDS developers as well as system administrators in deploying Suricata and Snort in high-speed networks.

Data distribution. We discovered that 5 Gb/s of multiple-flow traffic is too much for a single IDS instance to handle. To this end, we can divide such traffic into smaller pieces, each of which can be handled by a single IDS instance. In the case of using Software-Defined Networking (SDN) and an IDS cluster, network administrators can easily filter the traffic based on the network protocols, source and destination addresses, and port numbers, and then pass on traffic that can effectively be processed by a single IDS instance, such as 2 Gb/s of traffic for each instance. The IDS cluster contains dozens or hundreds of IDS instances, each instance analysing a fraction of the overall traffic volume. To achieve this, OpenFlow [26] can extract traffic based on a predefined network protocol, while at the same time, network administrators can scale the IDS instances based on the traffic volume. Another approach to reducing the volume of data could be checking particular flows, for example, only assessing TCP, UDP, or HTTP flows.
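As a sketch of how such flow-level distribution could be implemented in a dispatcher placed in front of the IDS cluster (the FNV-1a hash and the instance count are illustrative assumptions, not a mechanism from this paper), hashing the 5-tuple deterministically maps every packet of a flow to the same instance:

/* Hash a flow's 5-tuple to pick one of N IDS instances, so that all
 * packets of a flow are inspected by the same instance. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flow_key {             /* the 5-tuple that defines a flow */
    uint32_t src_ip, dst_ip;  /* IPv4 addresses */
    uint16_t src_port, dst_port;
    uint8_t  proto;           /* e.g., 6 = TCP, 17 = UDP */
};

/* FNV-1a: a simple, well-known byte-wise hash. */
static uint32_t fnv1a(const uint8_t *p, size_t len)
{
    uint32_t h = 2166136261u;
    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Serialise the fields explicitly to avoid hashing padding bytes. */
static unsigned pick_instance(const struct flow_key *k, unsigned n_instances)
{
    uint8_t buf[13];
    memcpy(buf + 0,  &k->src_ip,   4);
    memcpy(buf + 4,  &k->dst_ip,   4);
    memcpy(buf + 8,  &k->src_port, 2);
    memcpy(buf + 10, &k->dst_port, 2);
    buf[12] = k->proto;
    return fnv1a(buf, sizeof(buf)) % n_instances;
}

int main(void)
{
    /* e.g., split a 100 Gb/s link across 50 instances of ~2 Gb/s each. */
    struct flow_key k = { 0x0a000001, 0x0a000002, 49152, 443, 6 };
    printf("flow -> IDS instance %u of 50\n", pick_instance(&k, 50));
    return 0;
}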
Regular expression matching algorithm. For a signature-based IDS, a regular expression matching algorithm is widely used for identifying application protocols and detecting network attacks. However, a major bottleneck in the existing algorithms is that most IDSs inspect each byte of incoming packets, which causes high CPU usage and memory consumption. Based on our results, and in line with Becchi et al.'s [18] findings, current processors are not powerful enough to match regular expressions at 10 Gb/s or more. In order to capture network traffic on a 100 Gb/s link, several works [6–8,19] proposed hardware acceleration solutions. For instance, Matoušek et al. [19] used the multi-striding technique and pipelined finite state machines in hardware to allow the existing IDS architecture to handle hundreds of Gb/s. In their solution, the pipelined automata are directly mapped to Field Programmable Gate Arrays (FPGAs). Their results show that increasing the number of automata in the pipeline can improve the packet capture speed to 100 Gb/s. Besides, they used a single input packet buffer to reduce memory consumption. Another study [7] addressed the performance bottleneck by optimising the throughput of Deterministic Finite Automata (DFA). Yang et al. [7] proposed an ultrahigh-throughput DFA-accelerated architecture that brings together the advantages of three FPGA-based algorithms: Simple State Merge Tree (SSMT), Distribute Data in Round-Robin (DDRR), and multi-path speculation. In order to reduce the memory consumed by the DFA transition table, they used a classical compression algorithm to compress the table. The results showed that in most cases the memory usage of each rule set is less than 15% of the total resources, while improving Bro's processing speed to handle 100 Gb/s throughput. These studies [7,19] indicate that by changing the regular expression matching engines and combining them with accelerated hardware, existing IDSs can efficiently handle 100 Gb/s throughput while maintaining system resource efficiency.

Packet capturing mechanism. DPDK creates a set of data plane libraries and network interface controller drivers for providing efficient ways to handle packets in the user space.
DPDK allows userland applications to access packets directly from the NIC, avoiding the existing network protocol stacks in the OS. For packet processing applications that do not need to rely on the existing network stack, DPDK minimises the processing resources required to access packets. As Wu et al. [27] reported in their study, it is possible to accelerate packet processing to 100 Gb/s with almost no packet loss. The study shows that 1500-byte UDP packets can be processed at 8 Mpps (i.e., 90 Gb/s) without packet loss, with a maximum of 24 GB of memory and six 3 GHz CPU cores. Although this may throttle down to a 70% drop rate with packets smaller than 64 bytes, due to the increase in the number of packets and thus the processing overhead, it may still provide significant benefits to any packet processing system, including IDSs.
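A condensed sketch of this user-space receive path, based on the standard DPDK ethdev API (the port number, pool sizes, and burst size are illustrative; error handling is omitted for brevity):

/* Poll-mode receive loop with DPDK: initialise the EAL, set up one
 * RX queue backed by an mbuf pool, and fetch packets in bursts,
 * bypassing the kernel network stack entirely. */
#include <string.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

int main(int argc, char **argv)
{
    rte_eal_init(argc, argv);          /* hugepages, NIC probing, etc. */

    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "rx_pool", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    uint16_t port = 0;                 /* first DPDK-bound port */
    struct rte_eth_conf conf;
    memset(&conf, 0, sizeof(conf));    /* default port configuration */

    rte_eth_dev_configure(port, 1, 0, &conf);   /* 1 RX queue, no TX */
    rte_eth_rx_queue_setup(port, 0, 1024, rte_eth_dev_socket_id(port),
                           NULL, pool);
    rte_eth_dev_start(port);
    rte_eth_promiscuous_enable(port);

    struct rte_mbuf *pkts[BURST];
    for (;;) {
        /* Poll the NIC directly; no interrupts are involved. */
        uint16_t n = rte_eth_rx_burst(port, 0, pkts, BURST);
        for (uint16_t i = 0; i < n; i++) {
            /* an IDS would inspect rte_pktmbuf_mtod(pkts[i], ...) here */
            rte_pktmbuf_free(pkts[i]);
        }
    }
}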
Furthermore, XDP [28] can improve packet processing within the GNU/Linux kernel to up to 24 Mpps per 3.6 GHz CPU core.
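For completeness, this is a minimal sketch of the kind of XDP program such early processing builds on (written against libbpf conventions; the dropped port is a hypothetical placeholder). It runs in the kernel for every received frame, before the stack or any capture library sees the packet:

/* xdp_filter.c: compile with clang -O2 -target bpf -c xdp_filter.c.
 * Passes all traffic through except one hypothetical noisy TCP port,
 * which is dropped before it consumes any further CPU. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_early_filter(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Bounds checks are mandatory: the in-kernel verifier rejects
     * the program unless every access provably stays in the frame. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP)
        return XDP_PASS;

    struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > data_end)
        return XDP_PASS;

    /* Shed unwanted load at the earliest possible point. */
    if (tcp->dest == bpf_htons(9999))
        return XDP_DROP;

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";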
In an IDS, much more processing power is required to analyse packets with a traditional signature-based detection mechanism. There will be an overhead when using XDP, which has to copy packets to the user space and perform context switches, as the IDS runs in user space most of the time while XDP uses eBPF in kernel space. However, optimising the packet capturing mechanism will still provide an initial step towards significant performance improvement, and multithreading the packet processing will further improve IDS performance.

6. Conclusions and future work

In this work, we evaluated the feasibility of using IDSs in high-speed networks by analysing the performance of two popular IDSs using up to 100 Gb/s links. The experimental results from Section 4 show that the multithreaded architecture can significantly improve IDS performance as well as reduce the packet drop rate. Further, both IDSs show better performance when processing traffic under 60 Gb/s. We noticed that some packets were dropped when we configured the throughput to 60 Gb/s. In terms of accuracy, the IDSs show high accuracy even if some packets are dropped. Also, we found that the IDSs and the traffic receiver cannot run in parallel on the same server, because this causes the system's SoftIRQ handling to get overloaded. Once SoftIRQ is exhausted, the receiver side starts to drop packets; as a result, we cannot generate traffic up to 100 Gb/s. Furthermore, the performance becomes worse if we start to increase the number of flows per second. Our findings show that Snort and Suricata are not able to handle network throughput higher than 5 Gb/s when it consists of 30,000 flows per second; all packets are dropped when the resources are overloaded. We highlighted some solutions to optimise the resource overhead, reduce the packet drop rate, and improve the detection accuracy. For example, we suggest adding a load balancing mechanism to the existing IDS infrastructure, where multiple flows from a high-speed network can be distributed to a collection of IDS instances, each one monitoring 2 Gb/s of network traffic, which corresponds to 13,000 flows per second. Moreover, we highlight the importance of enabling DPDK as a new capturing mechanism; DPDK creates a set of data plane libraries and network interface controller drivers for providing efficient ways to handle packets in the user space. Wu et al. [27] reported in their study that it is possible to accelerate packet processing to 100 Gb/s with almost no packet loss.

With this study, we explore some new topics that can be investigated in the future. First, we would like to study the effect of flow duration on IDS performance. For example, we could configure a different number of long-lived flows or short-lived flows in our experiment and observe the changes in the IDSs from the performance perspective. Second, we can investigate SDN techniques, such as the use of SDN switches for distributing the traffic based on a predefined network protocol, which will help in better monitoring of each IDS instance and reduce overheads. Third, we will configure the network interface to use the PF_RING library and then repeat the same test scenarios. We can then compare the performance among Snort, Zeek, and Suricata and check for any performance improvement from using PF_RING. Last but not least, we will also try to find out why the IDS reports do not truly reflect the network packet drop status.

Declaration of Competing Interest

None.

References

[1] Salah K, Kahtani A. Performance evaluation comparison of Snort NIDS under Linux and Windows Server. J Netw Comput Appl 2010;33(1):6–15.
[2] Alhomoud A, Munir R, Disso JP, Awan I, Al-Dhelaan A. Performance evaluation study of intrusion detection systems. Procedia Comput Sci 2011;5:173–80.
[3] Hu Q, Asghar MR, Brownlee N. Evaluating network intrusion detection systems for high-speed networks. In: Telecommunication Networks and Applications Conference (ITNAC), 2017 27th International. IEEE; 2017. p. 1–6.
[4] Campbell S, Lee J. Intrusion detection at 100G. In: State of the Practice Reports. ACM; 2011. p. 14.
[5] Antonatos S, Anagnostakis KG, Markatos EP. Generating realistic workloads for network intrusion detection systems. ACM SIGSOFT Softw Eng Notes 2004;29(1):207–15.
[6] Sidhu R, Prasanna VK. Fast regular expression matching using FPGAs. In: Field-Programmable Custom Computing Machines, 2001. FCCM'01. The 9th Annual IEEE Symposium on. IEEE; 2001. p. 227–38.
[7] Yang J, Jiang L, Bai X, Peng H, Dai Q. A high-performance Round-Robin regular expression matching architecture based on FPGA. In: 2018 IEEE Symposium on Computers and Communications (ISCC). IEEE; 2018. p. 1–7.
[8] Clark CR, Schimmel DE. Efficient reconfigurable logic circuits for matching complex network intrusion detection patterns. In: International Conference on Field Programmable Logic and Applications. Springer; 2003. p. 956–9.
[9] Purzynski M, Manev P. Suricata extreme performance tuning. 2016. https://suricon.net/wp-content/uploads/2016/11/SuriCon2016_MichalPurzynski_PeterManev.pdf.
[10] Johnson J. Reproducible performance testing of Suricata on a budget with Trex. 2018. https://suricon.net/wp-content/uploads/2019/01/SuriCon2018_Johnson.pdf.
[11] Schaelicke L, Freeland JC. Characterizing sources and remedies for packet loss in network intrusion detection systems. In: Proceedings of the IEEE International Workload Characterization Symposium, 2005. IEEE; 2005. p. 188–96.
[12] Ptacek TH, Newsham TN. Insertion, evasion, and denial of service: eluding network intrusion detection. Tech. Rep. Secure Networks Inc., Calgary, Alberta; 1998.
[13] Samoshkin A. Certified Snort integrator program. 2018. https://www.snort.org/integrators.
[14] McCanne S, Jacobson V. The BSD packet filter: a new architecture for user-level packet capture. USENIX Winter, 46; 1993.
[15] Bezborodov S, et al. Intrusion detection systems and intrusion prevention system with Snort provided by Security Onion; 2016.
[16] MetaFlows. Open source IDS multiprocessing with PF_RING. 2016. https://www.metaflows.com/features/pf_ring/.
[17] Gong Y, Liu Q, Shao X, Pan C, Jiao H. A novel regular expression matching algorithm based on multi-dimensional finite automata. In: High Performance Switching and Routing (HPSR), 2014 IEEE 15th International Conference on. IEEE; 2014. p. 90–7.
[18] Becchi M, Wiseman C, Crowley P. Evaluating regular expression matching engines on network and general purpose processors. In: Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems. ACM; 2009. p. 30–9.
[19] Matoušek D, Kořenek J, Puš V. High-speed regular expression matching with pipelined automata. In: Field-Programmable Technology (FPT), 2016 International Conference on. IEEE; 2016. p. 93–100.
[20] Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search. Commun ACM 1975;18(6):333–40.
[21] Hopcroft JE, Motwani R, Ullman JD. Introduction to automata theory, languages, and computation. ACM SIGACT News 2001;32(1):60–5.
[22] White JS, Fitzsimmons T, Matthews JN. Quantitative analysis of intrusion detection systems: Snort and Suricata. In: Cyber Sensing 2013, 8757. International Society for Optics and Photonics; 2013. p. 875704.
[23] Stammler JH. Suricata performance white paper. 2011. https://redmine.openinfosecfoundation.org/attachments/download/763/suricata%20performance%20writup%20final.pdf.
[24] Day D, Burns B. A performance analysis of Snort and Suricata network intrusion detection and prevention engines. In: Fifth International Conference on Digital Society, Gosier, Guadeloupe; 2011. p. 187–92.
[25] Leblond E. Why eBPF and XDP in Suricata matters. 2018. https://suricon.net/wp-content/uploads/2019/01/SuriCon2018_Leblond.pdf.
[26] McKeown N, Anderson T, Balakrishnan H, Parulkar G, Peterson L, Rexford J, et al. OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Comput Commun Rev 2008;38(2):69–74.
[27] Wu X, Li P, Ran Y, Luo Y. Network measurement for 100 GbE network links using multicore processors. Future Gener Comput Syst 2018;79:180–9.
[28] Høiland-Jørgensen T, Brouer JD, Borkmann D, Fastabend J, Herbert T, Ahern D, et al. The eXpress Data Path: fast programmable packet processing in the operating system kernel. In: Proceedings of the 14th International Conference on Emerging Networking Experiments and Technologies. New York, NY, USA: ACM; 2018. p. 54–66. doi:10.1145/3281411.3281443.