
ISSN: 2277-3754

ISO 9001:2008 Certified


International Journal of Engineering and Innovative Technology (IJEIT)
Volume 2, Issue 1, July 2012

To Evade Deep Packet Inspection in NIDS Using Frequent Element Pattern Matching
Pallavi Dhade, T. J. Parvat

Abstract— Signature-based Network Intrusion Detection Systems (NIDS) use a set of rules to detect hostile traffic in network segments or packets. Because these rules are so important in detecting malicious and anomalous behavior such as known attacks, attackers look for new techniques to go unseen. Most of these techniques involve manipulating ambiguities in network protocols. Detection techniques have been developed against most of these evasive techniques by identifying and recognizing them, but the appearance of new evasive forms may render a NIDS ineffective. This paper presents a functional framework to perform modeling and detection over NIDS using frequent element pattern matching. The NIDS is modeled mainly through the Apriori algorithm. The underlying idea is that looking for evasions on a model is simpler and easier than directly trying to understand the behavior of the NIDS. We present a proof of concept showing how to perform deep packet inspection in NIDS using two publicly available datasets. This framework can be used for analyzing and modeling commercial NIDS and for detecting attacks after evasion.

Index Terms—Apriori Algorithm, Deep Packet Inspection, Network Intrusion Detection Systems, Frequent Elements Matching, High Speed Network

I. INTRODUCTION
Network Intrusion Detection Systems (NIDS) analyze network traffic captured on the network segment where they are installed. A NIDS may look for either anomalous activity (anomaly-based NIDS) or known hostile patterns (signature-based NIDS) on the network. Unlike firewalls, they do not normally block packets but raise an intrusion alarm. This work focuses on evasions over the signatures of these systems. Ambiguities in network protocols can create situations in which the NIDS and the endpoint system process the same packets differently, producing different representations of the data. If the representation of the data in the NIDS differs from that in the end system, the evasion is successful and the attack may go undetected. Thus, the appearance of new evasive techniques would be critical for systems that are supposed to be secure. This is the motivation of this work, which proposes a new approach to look for evasions over NIDS and gives a proof of concept showing how to perform deep packet inspection in NIDS using two publicly available datasets. This framework can be used for analyzing, modeling and detecting malicious behavior in commercial NIDS.

Computer security is defined as the protection of computing systems against threats to confidentiality, integrity, and availability [2]. Confidentiality (or secrecy) means that information is disclosed only according to policy; integrity means that information is not destroyed or corrupted and that the system performs correctly; availability means that system services are available when they are needed. A computing system refers to computers, computer networks, and the information they handle. Security threats come from different sources such as natural forces, accidents, failure of services (such as power) and people known as intruders. Intruders fall into two categories: external intruders, who are unauthorized users of the system they attack, and internal intruders, who have permission to access the system with some restrictions. Traditional prevention techniques such as user authentication, data encryption, avoiding programming errors and firewalls are used as the first line of defense for computer security. Organizations rely on Information Technology systems to manage huge amounts of personal, public and critical data, so these systems have become a critical component. Identifying anomalous behavior and hostile actions in order to protect those systems is one of the most important goals in security. Intrusion Detection Systems (IDS) are software or hardware tools that automatically examine and monitor events that take place in a computer or a network, looking for indications of intrusion [1].

II. STATE OF ART
Network protocols contain ambiguities that allow different systems to implement them in different ways. An evasion is successful when the DPI ignores packets that are going to be processed by the endpoint system, or vice versa. For example, if an ICMP packet contains an inaccurate or malicious field, the protocol does not specify what to do with the packet, and an implementation may ignore, accept or reject it. As shown in Figure 1, an evasion could be successful if the NIDS implementation of the ICMP protocol differs from the endpoint system implementation [1]. In the example of Figure 1, the DPI preprocessor accepts the packet containing a malicious field while the endpoint does not, so the final structure after the preprocessing phase will be different. Many techniques have been designed to prevent evasions. Most of them are based on network traffic

modification, to remove the ambiguities and establish a common understanding of the protocols for DPI and endpoints. Our goal is to first model the NIDS and then perform the evasion. AdaBoost is the algorithm used here for constructing a "strong" classifier as a linear combination

$$H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

of "simple" "weak" classifiers $h_t(x)$.

[Figure: a malicious packet made of the segments AI and BJ yields the DPI result AIBJ, while the end system result is AI.]
Fig. 1: Elusion example

III. WORKFLOW
Figure 2 shows a graphical description of the overall framework; our goal is met in its second half. The main objective is to look for new evasion techniques on a given NIDS and, after evasion, to detect those changes. We use the weka tool to obtain a model that classifies as similarly as possible to the NIDS. Due to the use of a simple syntax, the Adaboost algorithm has simpler semantics. Looking for evasive techniques over the model is easier than over the NIDS. If evasions succeed over the model, and given that this model may behave quite similarly to the original NIDS, it is likely that the evasions will also succeed over the NIDS. The work of our system is then to detect those evasions. Our framework is composed of a set of tasks described in the following sections.

A. Generate the Small Dataset
The Adaboost modeling process requires a labeled dataset that represents real traffic as well as possible. Because different traffic profiles must be generated, a controlled environment is required. Generated traffic should include normal traffic (simple web requests, remote connections, web navigation, etc.) and intrusive (malicious) traffic. The traffic is processed by means of data mining techniques to extract the most significant features. It also needs to be labeled in order to identify it as normal or hostile. The obtained traffic is then exposed to the NIDS, which analyzes the dataset looking for intrusive actions. The output given by the NIDS is appended to its corresponding processed frame [2]. Thus, the obtained dataset is composed of records of the form:
S1, S2, S3, ..., SN, L, O
where each Si is field i of the trace (for example, the source port, the flag bits, the amount of data exchanged, etc.), L is the label which indicates the nature of the data (normal or attack) and O is the output given by the NIDS (normal or intrusion). The overall dataset is then divided into smaller sets, one being the training subset and the remainder the testing subsets.

B. Model the NIDS
In our framework, the Adaboost and Apriori algorithms are used to model the behavior of the NIDS. First, values for some parameters are established. This can be done manually or automatically. The automatic technique consists of performing the Adaboost modeling phase several times with different combinations of parameters. Each training phase is performed with one fold, using the remainder to test the evolved model. The principal advantage of this technique is that several combinations of parameter values are explored over all the different subsets (folds) of the entire dataset, so the chosen values do not depend on an initial selection but on the complete dataset. Once the parameters are fixed, we obtain the NIDS models by training them with the entire training subset (which has to be considerably bigger than the remainder, used for testing). Then, we test the obtained models using the testing set. Results must be stored to be processed afterwards. Because the Adaboost search is heuristic, it is appropriate to perform the training phase several times with different random seeds, taking the results for the best individual (the one that has produced the best test results) and the average of the individuals. Using different random seeds covers a bigger search space. A manual optimization of the model is then performed. The tree model obtained normally has redundant branches or nodes, so a pruning phase can improve the efficiency of the model.

C. Analysis and Design of Evasive Techniques
Once the model is obtained, it is analyzed in order to discover some points of the internal structure of the NIDS and thus form an idea of its behavior. Mainly, the model indicates which fields the NIDS takes into account to classify traces. This information is used to perform a brute-force modification of those fields. The idea is to automate the process by changing the value of the fields that are present in the model, generating new modified traces. Before changing a value, it should be ensured that traces with the new value remain attacks and are still coherent with the protocols. For that purpose, a set of rules must be established and fulfilled, indicating which variables can be changed and which values can be assigned to them. New valid values are given for those fields in hostile traces which were previously detected by the NIDS (true positives), establishing a new dataset composed of old and new (modified) traces. Then, the NIDS is applied to those new modified traces. New false negatives indicate that the evasions performed have been successful. The process is repeated for each field that appears in the model, and multiple simultaneous changes (to more than one field at the same time) can also be made.
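The bookkeeping behind this evaluation can be illustrated with a short Python sketch. It assumes that each record follows the S1, ..., SN, L, O layout of Section III.A, with L encoded as "attack" or "normal" and O as "intrusion" or "normal". The `Record` type, the sample field values and the helper names are hypothetical and not part of the original framework. The detection rate and false alarm rate use the usual definitions, TP/(TP+FN) and FP/(FP+TN), which the paper does not spell out.

```python
from collections import Counter
from typing import NamedTuple, Sequence, Tuple


class Record(NamedTuple):
    fields: tuple   # S1..SN: trace fields (hypothetical sample values)
    label: str      # L: "attack" or "normal"
    output: str     # O: "intrusion" or "normal", as reported by the NIDS


def outcome(rec: Record) -> str:
    """Classify one record as TP, FN, FP or TN."""
    if rec.label == "attack":
        return "TP" if rec.output == "intrusion" else "FN"
    return "FP" if rec.output == "intrusion" else "TN"


def rates(records: Sequence[Record]) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate) for a set of records."""
    counts = Counter(outcome(r) for r in records)
    attacks = counts["TP"] + counts["FN"]
    normals = counts["FP"] + counts["TN"]
    detection = counts["TP"] / attacks if attacks else 0.0
    false_alarm = counts["FP"] / normals if normals else 0.0
    return detection, false_alarm


records = [
    Record((80, "SYN"), "attack", "intrusion"),   # detected attack
    Record((80, "SYN"), "attack", "normal"),      # missed attack
    Record((443, "ACK"), "normal", "normal"),     # correctly ignored
    Record((22, "ACK"), "normal", "intrusion"),   # false alarm
]
print([outcome(r) for r in records])  # ['TP', 'FN', 'TN', 'FP']
print(rates(records))                 # (0.5, 0.5)
```

In this encoding, a hostile trace that moves from TP to FN after its fields are changed is exactly the "new false negative" described above.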
[Figure: flowchart. Recoverable stage labels: Raw Traffic, NIDS, Data Mining, Labeled Traffic, Parameter Selection, Adaboost Algorithm, Elusion Search, Apriori Algorithm, Evaluation/Detection, Succeed (Yes/No), Apply Elusion.]

Fig. 2: Functional Framework
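The "Adaboost Algorithm" stage builds the weighted vote of weak classifiers given in Section II. The following Python sketch shows one way such a vote can be trained. It assumes one-feature threshold stumps as weak classifiers and labels in {-1, +1}; the paper does not state which weak learner is used, and the function names and toy data are hypothetical.

```python
import math


def train_adaboost(X, y, rounds):
    """Train AdaBoost with one-feature threshold stumps; y values are -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n          # uniform initial sample weights
    model = []                 # entries: (alpha, feature, threshold, sign)
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for sign in (1, -1):
                    pred = [sign if x[f] >= thr else -sign for x in X]
                    err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign, pred)
        err, f, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Raise the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the weighted vote sum_t alpha_t * h_t(x)."""
    score = sum(a * (s if x[f] >= thr else -s) for a, f, thr, s in model)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

Each round adds one weak classifier and its weight, so the number of rounds plays the role of T in the linear combination.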


D. Specific Proof Concept
The first of the two main objectives of the proof of concept presented is to find evasions over the NIDS by analyzing the corresponding model. For that purpose, we have created a basic NIDS based on the C4.5 algorithm [2,4]. This algorithm is a supervised learning classifier whose output is a tree. A simplification of the framework has been made to fulfill our goals. Instead of creating a specific dataset, in this work we have opted to use the only two publicly available datasets that are labeled (as normal or intrusive). Results showed that with an accuracy of 96%, the behavior of a self-built NIDS can be modeled by reducing its complexity. In this work, we improve the study by using a different dataset, the KDD-99, derived from raw traffic captured during the MIT/LL 1998 evaluation. The use of an extra dataset corroborates that the accuracy of using the Apriori algorithm to model a NIDS is not limited to one scenario and attack, but extends to another that involves several kinds of attacks. These datasets are both quite old, taking into account the fast growth of complexity in Information Technologies. However, they have been widely used in the literature and they provide a huge set of labeled traces. Thus, they are useful for providing insight into the problem at issue and for analyzing whether the idea is sound. After obtaining models for each dataset, we are challenged to find real evasions over the original C4.5 based NIDS. We look for evasions by modifying the value of one or more fields of the traces and exposing them to the original NIDS. We must choose fields and values in such a way that the traces remain coherent with the protocols and are still attacks (for example, if we change the bit of some TCP flag in a port scanning attack, we are not evading the NIDS and attacking the endpoint, but transforming the malicious trace into a normal one). For that purpose, we need to analyze the nature of the attacks we are working with. The traces to be modified must also be true positives. An evasion is considered successful if, after the modification of the trace, the NIDS does not detect it as an intrusion. The main aim of our system is then to perform detection over the evasion or change.

IV. EXPERIMENTAL WORK
Figure 3 shows a scheme of the modeling phase. First, the datasets are prepared. The KDD provides both normal and port scanning traffic, captured on various days at different hours. We use five different raw traffic files, processing them in order to keep just TCP traffic. Thus, we establish five datasets containing labeled traces of both malicious and normal nature. These traces are composed of the fields (Ri) of the TCP header. In the case of the KDD dataset, we have taken 10% of the original traces, preprocessed them in order to make the output binary (i.e.

normal or intrusion) and normalizing the non-numerical fields. We use the weka tool [8] to obtain the C4.5 based NIDS (step 1 in Figure 3). For that purpose, we randomly choose a subset of each dataset to perform the training phase, testing over the remainder. This testing phase provides, for each trace, the output given by the NIDS, i.e. whether it has properly classified the trace or not. This information is appended to each trace, obtaining the final dataset (step 2 in Figure 3). We perform another division of the dataset, in this case to obtain two new different subsets, one to be used in the training phase and another one to test the individuals (step 3 in Figure 3). Table I shows the performance of the NIDS created for both the NSL dataset and the KDD. As can be observed, in the case of the NSL, the NIDS tends to classify the traces as intrusive, so its detection rate and its false alarm rate are both very high. However, the NIDS built with the KDD has lower rates, which indicates that it is more likely to classify the traces as normal. So, given that the NIDS which are going to be modeled are very different in nature, the first goal of our proof of concept, which was to prove the feasibility of using the Apriori algorithm to model NIDS, goes a step further.

The models are created by first evolving them using a training subset (step 4 in Figure 3), using the remaining subsets to test whether the obtained models perform well with traffic different from the one used to evolve them (step 5 in Figure 3). A critical aspect of C4.5 is that it performs a heuristic search. Accordingly, seven different seeds have been used over each training subset, thus obtaining seven different evolved individuals. Then, a testing process is performed with each individual. In the following section the best and average result for each model is shown. Each individual represents one different NIDS model, and because they must be as simple as possible, a maximum depth of 4 is established.

As previously stated, one of the goals of this proof of concept is to corroborate that this is a good paradigm to use when modeling the NIDS. For comparison, we have obtained models using two other techniques. The first is the Naïve Bayes approach, a specific Bayesian classifier which assumes strong independence among fields [7] and whose output is not a tree but a probabilistic model. The second is the C4.5 algorithm, which is the one used to create the NIDS under study, but with its maximal tree depth limited to 4. The C4.5 algorithm would reach better results if its maximal depth were not limited to 4, because it is the algorithm used to obtain the original NIDS. However, this limitation is needed to ensure that the complexity of the models to be compared is similar. In order to evade these models, we are interested in changing traces corresponding to true positives. We analyze the models manually, looking for any field that, when changed, makes the NIDS fail in the detection. It is possible that no change causes the evasion of the NIDS. In this case, we should repeat the modeling process (by changing some field or the fitness function) in order to obtain another model over which to look for a new evasive method. At this point the evasion is done and the existing NIDS fails to detect the malicious behavior. By identifying the fields and parameters where the changes have been made, we can perform better detection with this system. Rules are generated using the Apriori algorithm; by providing support and confidence thresholds, we identify the parameters which are responsible for the evasion. The values of those parameters are then changed and detection is performed again, which detects the attacks that were ignored by the original NIDS after evasion. In this way we improve the detection rate and accuracy.
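The rule-generation step with support and confidence thresholds can be sketched in Python as follows. This is a minimal, unoptimized Apriori under stated assumptions: each trace is discretized into a set of "field=value" items, and the NIDS outcome is included as one more item. The item names, thresholds and sample traces are hypothetical and do not come from the datasets used in the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= s for s in sets) / n

    # Level 1: frequent single items.
    level = {frozenset([i]) for s in sets for i in s}
    level = {c for c in level if support(c) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update({c: support(c) for c in level})
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for each strong rule."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    yield ante, itemset - ante, conf


# Hypothetical discretized traces; "out=..." is the NIDS outcome.
transactions = [
    {"flag=SYN", "port=80", "out=missed"},
    {"flag=SYN", "port=80", "out=missed"},
    {"flag=SYN", "port=22", "out=detected"},
    {"flag=ACK", "port=80", "out=detected"},
]
frequent = apriori(transactions, min_support=0.5)
strong = list(rules(frequent, min_confidence=1.0))
for ante, cons, conf in strong:
    print(sorted(ante), "->", sorted(cons), conf)
```

Rules whose consequent is the missed outcome point to the field values that co-occur with undetected attacks, which is the information the detection step relies on.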
Table I. Performance of Self Built IDS Using C4.5

Dataset    Detection Rate    False Alarm Rate
KDD        86%               0.1%
NSL        99%               65%

V. CONCLUSION
Currently, NIDS are prepared to detect a huge variety of attacks. Some of them, like Snort, take into account the possibility of being evaded with such techniques. However, they are not prepared for new evasive forms that may appear. In this paper we present a new framework to look for evasions over a given NIDS. The core of the framework is to model the NIDS using the Adaboost algorithm to obtain an easier-to-understand individual which works as similarly as possible to the NIDS. This model allows understanding how the NIDS classifies network data. Once this model is obtained, we can look for some way of evading the NIDS detection by changing some of the fields of the packets. The final aim of using our framework is not to break the detection of the NIDS, but to analyze NIDS robustness and achieve a high detection rate and accuracy.

ACKNOWLEDGMENT
It is a pleasure for me to present this paper, in which guidance played an invaluable role and provided a concrete platform for its completion. I would also like to express my sincere thanks to my internal guide Prof. T. J. Parvat, Department of Computer Engineering, for his unfaltering encouragement and constant scrutiny, without which I would not have looked deeper into my work and realized both our shortcomings and our feats. This work would not have been possible without him.

[Figure: workflow in five numbered steps. Recoverable labels: LBNL, C4.5 training subsets, C4.5 Based DPI in NIDS (Decision tree), Labeled Datasets, Training labeled datasets, Testing labeled datasets, Apply Apriori algorithm, Statistics (in the form of graph).]
Fig 3. Detailed Designed For Experimental Work

REFERENCES
[1] Xu Kefu, Guo Li, Tan Jianlong, Liu Ping, "Traffic aware frequent element matching algorithm for Deep Packet Inspection", International Conference on Network Security, Wireless Communication & Trusted Computing, 2010.
[2] Sergio Pastrana, Agustin Orfila, Arturo Ribagorda, "A functional framework to evade NIDS", Hawaii International Conference on System Sciences, 2011.
[3] J. R. Koza, "Genetic Programming: On the Programming of Computers", International conference on security sciences, 2010.
[4] S. Pastrana, A. Orfila, and A. Ribagorda, "Modeling NIDS evasion with Genetic Programming", Proceedings of The 2010 International Conference on Security and Management, SAM 2010.
[5] L. Juan, C. Kreibich, C.-H. Lin, and V. Paxson, "A Tool for Offline and Live Testing of Evasion Resilience in Network Intrusion Detection Systems", 5th International Conference on Detection of Intrusions and Malware, and Vulnerability, 2008.
[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, H. Witten, "The WEKA Data Mining Software: An Update", An extensive empirical study of feature selection metrics for text classification, 2009.
[7] Po-Ching Lin, Ying-Dar Lin, Tsern-Huei Lee, Yuan-Cheng Lai, "Using String Matching for Deep Packet Inspection", Computer, vol. 41, no. 4, pp. 23-28, April 2008.
[8] Kun Huang, Dafang Zhang, "A Byte-Filtered String Matching Algorithm for Fast Deep Packet Inspection", The 9th International Conference for Computer science, 2008.

AUTHOR'S PROFILE
Pallavi Dhade, Assistant Professor, M.E. (Computer Engineering, pursuing), Department of Computer Engineering, Sinhgad Institute of Technology, Lonavala, Pune, Maharashtra state, India. Research area: network security.
Prof. T. J. Parvat, Professor, M.E., Ph.D. (pursuing), Department of Computer Engineering, Sinhgad Institute of Technology, Lonavala, Pune, Maharashtra state, India. Research area: network security.
