To Evade Deep Packet Inspection in NIDS Using Frequent Element Pattern Matching
ISSN: 2277-3754
ISO 9001:2008 Certified
International Journal of Engineering and Innovative Technology (IJEIT)
Volume 2, Issue 1, July 2012
modification, to remove the ambiguities and establish a common understanding of the protocols for DPI and endpoints. Our goal is first to model the NIDS and then to perform the evasion. AdaBoost is the algorithm used here for constructing a "strong" classifier as a linear combination

$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

of "simple" "weak" classifiers $h_t(x)$.

Fig. 1: Elusion example [figure not recoverable; surviving labels: "AI BJ AIBJ" (Malicious packet, DPI, Result); "AI BJ AI" (End system, Result)]

III. WORKFLOW

In this framework, our goal is addressed in the second half of Figure 2, which shows a graphical description of the overall framework. The main objective is to look for new evasion techniques on a given NIDS. After elusion, the main goal is to detect those changes. We use the Weka tool to obtain a model that classifies as similarly as possible to the NIDS. Due to the use of a simple syntax, the AdaBoost algorithm has simpler semantics. Looking for evasive techniques over the model is easier than over the NIDS. If evasions succeed over the model, and given that this model may behave quite similarly to the original NIDS, it is likely that the evasions will also succeed over the NIDS. At that point the work of our system starts, which is to detect those elusions. Our framework is composed of a set of tasks described in the following sections.

A. Generate the Small Dataset

The AdaBoost modeling process at issue requires a labeled dataset. This dataset must represent real traffic as well as possible. Because different traffic profiles have to be generated, a controlled environment is required. Generated traffic should include normal traffic (simple web requests, remote connections, web navigation, etc.) and intrusive (malicious) traffic. Traffic is processed by means of data mining techniques to extract the most significant features. It also needs to be labeled in order to identify it as normal or hostile. The obtained traffic should be exposed to the NIDS, which analyzes the dataset looking for intrusive actions. The output given by the NIDS is appended to its corresponding processed frame [2]. Thus, the obtained dataset is composed of registers of the form:

S1, S2, S3, ..., SN, L, O

where each Si is the field i of the trace (for example, the source port, the flag bits, the amount of data exchanged, etc.), L is the label which indicates the nature of the data (normal or attack) and O is the output given by the NIDS (normal or intrusion). The overall dataset is then divided into smaller sets, one being the training subset and the remainder the testing subsets.

B. Model the NIDS

In our framework, the AdaBoost and Apriori algorithms are used to model the behavior of the NIDS. First, values for some parameters are established. This process can be done manually or automatically. This technique consists of performing the AdaBoost modeling phase several times, using different combinations of parameters. Each training phase is performed with one fold, using the remainder to test the evolved model. The principal advantage of this technique is that several combinations of parameter values are explored, so we can ensure that optimal values are used: because the training phase is performed with all the different subsets (folds) of the entire dataset, the result does not depend on an initial selection but on the complete dataset. Once the parameters are fixed, we obtain the NIDS models by training them with the entire training subset (which has to be considerably bigger than the remainder, used for testing). Then, we test the obtained models using the testing set. Results must be stored to be processed afterwards. Because the AdaBoost search is heuristic, it is appropriate to perform the training phase several times, using different random seeds, and to take the results for the best individual (the one that has produced the best test results) and the average of the individuals. Using different random seeds covers a bigger search space. A manual optimization of the model is then performed. The tree model obtained normally has redundant branches or nodes, so a pruning phase can improve the efficiency of the model.

C. Analysis and Design of Evasive Techniques

Once the model is obtained, it is analyzed in order to discover some points of the internal structure of the NIDS, thus forming an idea of its behavior. Mainly, the model indicates which fields the NIDS takes into account to classify traces. This information is used to perform a brute-force modification of those fields. The idea is to automate the process by changing the value of the fields that are present in the model, generating new modified traces. Before changing a value, it must be ensured that traces with the new value remain attacks and are still coherent with the protocols. For that purpose, a set of rules must be established and fulfilled, indicating which variables can be changed and which values can be assigned to them. New valid values are given for those fields in hostile traces which were previously detected by the NIDS (true positives), establishing a new dataset composed of old and new (modified) traces. Then, the NIDS is applied to those new modified traces. New false negatives would indicate that the evasions performed have been successful. The process is repeated for each field that appears in the model, and multiple simultaneous changes (to more than one field at the same time) can also be made.
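The brute-force search over model fields described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: traces are represented as plain dictionaries, and `detector`, `allowed_values` and `is_valid` are hypothetical stand-ins for the NIDS (or its model), the candidate values for the fields that appear in the model, and the rules that keep a modified trace a coherent attack.

```python
from itertools import product

def find_evasions(traces, detector, allowed_values, is_valid):
    """Return modified true-positive traces that the detector no longer flags.

    All arguments are hypothetical stand-ins (see the lead-in paragraph).
    """
    fields = list(allowed_values)
    evasions = []
    for trace in traces:
        # Start only from hostile traces that are currently detected.
        if trace["label"] != "attack" or not detector(trace):
            continue
        # Each field may keep its original value or take an allowed one, so
        # single-field and simultaneous multi-field changes are both covered.
        options = [list(dict.fromkeys([trace[f], *allowed_values[f]]))
                   for f in fields]
        for combo in product(*options):
            candidate = {**trace, **dict(zip(fields, combo))}
            if candidate == trace or not is_valid(candidate):
                continue
            if not detector(candidate):  # a new false negative
                evasions.append(candidate)
    return evasions

# Toy data: field names, values and rules below are invented for illustration.
traces = [
    {"src_port": 4444, "flags": "S", "bytes": 900, "label": "attack"},
    {"src_port": 80, "flags": "A", "bytes": 120, "label": "normal"},
]
detector = lambda t: t["bytes"] > 500 and t["flags"] == "S"
allowed_values = {"flags": ["S", "SA"], "bytes": [400, 900]}
is_valid = lambda t: t["bytes"] >= 400 and t["flags"] in {"S", "SA"}

evasions = find_evasions(traces, detector, allowed_values, is_valid)
for e in evasions:
    print(e)
```

Each returned trace is still labeled as an attack but is no longer flagged by the toy detector, which corresponds to the new false negatives discussed above. Enumerating the full product grows quickly with the number of fields, so a real run would presumably restrict it to the fields present in the model.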
[Fig. 2: Overall framework (flowchart not recoverable; surviving labels: Raw Traffic, NIDS, Succeed, Yes, No, Apply elusion)]
normal or intrusion) and normalizing the non-numerical fields. We use the Weka tool [8] to obtain the C4.5 based NIDS (step 1 in Figure 3). For that purpose, we randomly choose a subset of each dataset to perform the training phase, testing over the remainder. This testing phase provides, for each trace, the output given by the NIDS, i.e. whether it has properly classified the trace or not. This information is appended to each trace, obtaining the final dataset (step 2 in Figure 3). We perform another division of the dataset, in this case to obtain two new subsets, one to be used in the training phase and another one to test the individuals (step 3 in Figure 3). Table I shows the performance of the NIDS created for both the NSL dataset and the KDD dataset. As can be observed, in the case of NSL the NIDS tends to classify the traces as intrusive, so its detection rate and its false alarm rate are both very high. However, the NIDS built with KDD has lower rates, which indicates that it is more likely to classify the traces as normal. So, given that the NIDS which are going to be modeled are very different in nature, the first goal of our proof of concept, which was to prove the feasibility of using the Apriori algorithm to model NIDS, goes a step further.

The models are created by first evolving them using a training subset (step 4 in Figure 3), using the remaining subsets to test whether the obtained models perform well with traffic different from the one used to evolve them (step 5 in Figure 3). A critical aspect of C4.5 is that it performs a heuristic search. Accordingly, seven different seeds have been used over each training subset, thus obtaining seven different evolved individuals. Then, a testing process is performed with each individual. In the following section the best and average results for each model are shown. Each individual represents one different NIDS model, and because they must be as simple as possible, a maximum depth of 4 is established.

As previously stated, one of the goals of this proof of concept is to corroborate that this is a good paradigm to use when modeling the NIDS. In order to compare with other techniques, we have obtained models using two different techniques. Concretely, we have used the Naïve Bayes approach, which is a specific Bayesian classifier that assumes strong independence among fields [7] and whose output is not a tree but a probabilistic model. The second method used is the C4.5 algorithm, which is the one used to create the NIDS under study, but limiting its maximal tree depth to 4. The C4.5 algorithm would reach better results if its maximal depth were not limited to 4, because it is the algorithm used to obtain the original NIDS. However, this limitation of the maximal depth is needed to ensure that the complexity of the models to be compared is similar. In order to evade these models, we are interested in changing traces corresponding to true positives. We analyze the models manually, looking for any field that, when changed, will make the NIDS fail in the detection. It is possible that no change causes the evasion of the NIDS. In this case, we should repeat the modeling process (by changing some field or the fitness function) in order to obtain another model over which we would look for a new evasive method. At this point the evasion has been performed and the existing NIDS fails to detect the malicious behavior. By identifying the fields and parameters where the changes have been made, we can perform better detection with this system. To do so, we generate rules using the Apriori algorithm: by providing support and confidence thresholds, we identify the parameters which are responsible for the evasion. Then the values of those parameters are changed and the detection is performed again, which detects the attacks that are ignored by the original NIDS after evasion. In this way we improve the detection rate and accuracy.
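The use of support and confidence to single out the parameters responsible for evasion can be illustrated with a small, self-contained Apriori sketch. The item encoding (`"field=value"` strings plus an `"evaded"` marker), the toy transactions and both thresholds are assumptions made for this example; the paper does not specify them, and the text indicates that the authors work with the Weka tool rather than hand-written code.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that have an
        # infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def rules_for(frequent, consequent, min_confidence):
    """Rules A -> consequent, with confidence = support(A and C) / support(A)."""
    rules = []
    for itemset, supp in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, supp, confidence))
    return rules

# Toy transactions: one per modified attack trace; "evaded" marks a trace
# that the original NIDS missed. All values are invented for illustration.
transactions = [
    {"flags=SA", "bytes=low", "evaded"},
    {"flags=SA", "bytes=high", "evaded"},
    {"flags=S", "bytes=low", "evaded"},
    {"flags=S", "bytes=high"},
    {"flags=S", "bytes=high"},
]
frequent = apriori(transactions, min_support=0.4)
rules = rules_for(frequent, "evaded", min_confidence=0.9)
for antecedent, supp, conf in rules:
    print(sorted(antecedent), "-> evaded", supp, conf)
```

Rules whose consequent is `evaded` point at the field values that co-occur with missed attacks; in the approach described above, those are the parameters whose values are then revisited so that detection can be performed again.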
Table I. Performance of Self-Built IDS Using C4.5

Dataset    Detection Rate    False Alarm Rate
KDD        86%               0.1%
NSL        99%               65%

V. CONCLUSION

Currently, NIDS are prepared to detect a huge variety of attacks. Some of them, like Snort, take into account the possibility of being evaded with known techniques. However, they are not prepared for new evasive forms that may appear. In this paper we present a new framework to look for evasions over a given NIDS. The core of the framework is to model the NIDS using the AdaBoost algorithm to obtain an easier-to-understand individual which behaves as similarly as possible to the NIDS. This model allows us to understand how the NIDS classifies network data. Once this model is obtained, we can look for some way of evading the NIDS detection by changing some of the fields of the packets. The final aim of using our framework is not to break the detection of the NIDS, but to analyze NIDS robustness with high detection rate accuracy.
[Fig. 3: figure not recoverable; surviving labels: LBNL, LBNL subsets, C4.5 training subset, C4.5 training, C4.5 Based DPI in NIDS (Decision tree), Labeled Datasets; step markers 1, 2, 3]

ACKNOWLEDGMENT

It is a pleasure for me to present this paper, where guidance plays an invaluable role and provides a concrete platform for completion of the paper. I would also like to express my sincere thanks to my internal guide, Prof. T. J. Parvat, Department of Computer Engineering, for his unfaltering encouragement and constant scrutiny, without which I wouldn't have looked deeper into my work and realized both our shortcomings and our feats. This work would not have been possible without him.

REFERENCES
[1] Xu Kefu, Guo Li, Tan Jianlong, Liu Ping, "Traffic aware frequent element matching algorithm for Deep Packet Inspection", International Conference on Network Security, Wireless Communication & Trusted Computing, 2010.
[2] Sergio Pastrana, Agustin Orfila, Arturo Ribagorda, "A functional framework to evade NIDS", Hawaii International Conference on System Sciences, 2011.
[3] J. R. Koza, "Genetic Programming: On the Programming of Computers", International conference on security sciences, 2010.
[4] S. Pastrana, A. Orfila, and A. Ribagorda, "Modeling NIDS evasion with Genetic Programming", Proceedings of The 2010 International Conference on Security and Management, SAM 2010.
[5] L. Juan, C. Kreibich, C.-H. Lin, and V. Paxson, "A Tool for Offline and Live Testing of Evasion Resilience in Network Intrusion Detection Systems", 5th International Conference on Detection of Intrusions and Malware, and Vulnerability, 2008.
[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, H. Witten, "The WEKA Data Mining Software: An Update", An extensive empirical study of feature selection metrics for text classification, 2009.
[7] Po-Ching Lin, Ying-Dar Lin, Tsern-Huei Lee, Yuan-Cheng Lai, "Using String Matching for Deep Packet Inspection", Computer, vol. 41, no. 4, pp. 23-28, April 2008.
[8] Kun Huang, Dafang Zhang, "A Byte-Filtered String Matching Algorithm for Fast Deep Packet Inspection", The 9th International Conference for Computer science, 2008.

AUTHOR'S PROFILE
Pallavi Dhade, Assistant Professor, M.E. (Computer Engineering, pursuing), Department of Computer Engineering, Sinhgad Institute of Technology, Lonavala, Pune, Maharashtra state, India; research area: network security.
Prof. T. J. Parvat, Professor, M.E., Ph.D. (pursuing), Department of Computer Engineering, Sinhgad Institute of Technology, Lonavala, Pune, Maharashtra state, India; research area: network security.