
RNN Based Intrusion Detection System

This document summarizes a research paper presented at an international conference on communication and signal processing in July 2020 in India. The paper proposes using a long short-term memory (LSTM) recurrent neural network to detect anomalies in network traffic and classify traffic as normal, distributed denial of service (DDoS) attacks, or other attacks. The model was trained and tested on the CICIDS2017 dataset containing various attacks. The results showed the LSTM model achieved a high degree of accuracy in detecting these attacks.


International Conference on Communication and Signal Processing, July 28 - 30, 2020, India

Recurrent Neural Network Based Intrusion Detection System
Sanchit Nayyar, Sneha Arora and Maninder Singh

Sanchit Nayyar, Sneha Arora and Maninder Singh are with the Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India (e-mail: [email protected], [email protected], [email protected]).

978-1-7281-4988-2/20/$31.00 ©2020 IEEE 0136

Authorized licensed use limited to: University of Prince Edward Island. Downloaded on September 05,2020 at 17:05:11 UTC from IEEE Xplore. Restrictions apply.

Abstract—An increase in connectivity through the internet and increased freedom of data access have led to numerous attempts to attack network servers. These attacks have become more sophisticated over time and are therefore difficult to detect with existing Intrusion Detection Systems. To date, research in the field of IDS has not been able to categorize DDoS attacks with a fair degree of accuracy, and it has been extremely difficult to strike a balance between accuracy and prediction time. This paper uses an LSTM-based machine learning model to detect anomalies in network traffic and to redirect all malicious requests to a honeypot-based black hole server. The algorithm was trained and tested on the CICIDS2017 dataset, which contains numerous attacks such as Patator-based attacks, web-based brute force, DoS Slowloris, DoS slowhttptest, DoS Hulk, DoS GoldenEye, DDoS LOIT and other categorical attacks. The results exhibit a high degree of accuracy in detecting these attacks, and the model is suitable for use in existing distributed servers.

Index Terms—Distributed Denial of Service, Intrusion Detection System, Intrusion Prevention System, Long Short Term Memory

I. INTRODUCTION

WITH the great amount of information floating around the internet, network security and information preservation have become a huge challenge [1]. Attackers are continuously developing new ways to bypass security systems. When the resources of a website are flooded with a dramatic increase in packets that the service provider is incapable of handling, legitimate users are deprived of those resources. This is called a Distributed Denial of Service (DDoS) flooding attack. There are two methods to generate it, namely direct and indirect [2]. In a DDoS attack, the malicious packets, generated by bots, have characteristics similar to those of legitimate traffic, such as destination address and port, packet size and rate.

As firewalls alone have not proven sufficient for preventing security attacks, a second layer called the Intrusion Detection System (IDS) has emerged as an indispensable part of the security system. IDS are significant defense mechanisms in an ever-growing world of security breaches and intrusive activities, and research on IDS has flourished over the years in pursuit of better systems. Since absolute prevention is not possible with existing intrusion prevention systems, the problem was segregated into detection and prevention. IDS identify intrusions with the help of classification engines, after which Intrusion Prevention Systems (IPS) are applied.

In this paper we focus on an IDS organized into four layers, namely data collection, feature identification, model training and execution of the classification model, as shown in Fig. 1. The dataset used is CICIDS2017 [3]. It is completely labeled, and around 80 features of network traffic are extracted for benign and intrusive flows. Table I lists the attacks in the dataset and the number of rows for each.

Fig. 1. Components of Intrusion Detection System

TABLE I
ATTACKS IN THE DATASET

Attack                           Number of Rows
Botnet ARES                      72386
Brute Force FTP Patator          89479
Brute Force SSH Patator          73669
DDoS LOIT                        180116
DoS GoldenEye                    22792
DoS Hulk                         294417
DoS slowhttptest                 32450
DoS Slowloris                    31259
Heartbleed Port 444              32814
Infiltration Cool disk MAC       11680
Infiltration Dropbox Download    127676
Port Scan                        26140
Web Brute Force                  46192
Web SQL Injection                3065
Web XSS                          15156
Regular Traffic                  2046876

The methodology used is supervised machine learning. We classify traffic into three categories: normal flow, DDoS flooding attack and any other flow (known and unknown). The classification engine uses an anomaly-based detection technique, which is dynamic, able to detect new attacks, flexible, scalable and robust. Since conventional machine learning requires a long training time on a dataset this large, a deep learning based LSTM neural network model was used for training. Deep learning [4] is an effective approach that makes the use of LSTMs possible. An LSTM is a specialized form of Recurrent Neural Network (RNN) in which each neuron is replaced with a memory cell. The memory cell learns and holds information to capture long-term dependencies, thereby overcoming the vanishing and exploding gradient problems of RNNs. The decrease in training time and increase in accuracy of the proposed system, shown in Fig. 2, are due to simultaneous parallel solution technologies.
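The three-way labeling described above (normal flow, DDoS flooding attack, any other flow) can be sketched as a mapping from each flow's raw label to one class, plus matching indicator columns named after the Normal, LOIT and Others columns of Section III. This is a minimal sketch, not the authors' code: the raw label strings `BENIGN` and `DDoS` are assumptions about how regular traffic and DDoS LOIT flows are labeled in the CSV files, and the function names are hypothetical.

```python
from typing import List

# Assumed raw label strings (hypothetical, not taken from the paper).
NORMAL_LABEL = "BENIGN"
LOIT_LABEL = "DDoS"

# Indicator column order, following the Normal / LOIT / Others columns.
CLASSES = ("Normal", "LOIT", "Others")


def to_three_class(label: str) -> str:
    """Collapse a raw per-flow label into Normal, LOIT or Others."""
    label = label.strip()
    if label == NORMAL_LABEL:
        return "Normal"
    if label == LOIT_LABEL:
        return "LOIT"
    return "Others"  # any other known or unknown attack


def one_hot(label: str) -> List[int]:
    """Encode a raw label as the three indicator columns."""
    cls = to_three_class(label)
    return [1 if cls == name else 0 for name in CLASSES]


if __name__ == "__main__":
    for raw in ["BENIGN", "DDoS", "PortScan", "DoS Hulk"]:
        print(raw, "->", to_three_class(raw), one_hot(raw))
```

Any label the mapping does not recognize falls into Others, which mirrors the "any other flow (known and unknown)" category.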
Fig. 2. The Proposed Network

The rest of the paper is organized as follows. Section II summarizes existing work on IDS. Section III presents the data preprocessing algorithms. Section IV explains the network model in detail. Results and discussions are presented in Section V. Finally, conclusions are drawn in Section VI.

II. EXISTING WORK

There have been numerous attempts to develop intrusion detection systems in the past, the most prominent of which are based on Hidden Markov Model (HMM) and Long Short-Term Memory (LSTM) architectures from the machine learning domain. The use of a Multi-Layer HMM (LHMM) for intrusion detection has been studied in detail [5], but attempts to demonstrate a practical application of the concept have not been very successful.

Different algorithms exhibit different accuracy rates [6], ranging from 84.88% to 99.55%. The highest accuracy was achieved by a KNN algorithm, which has significantly high training and prediction times, rendering it useless for practical purposes. The highest accuracy among usable algorithms, 97.05%, was obtained with the Multi-Layer Perceptron (MLP) algorithm, which provided the best balance between performance and prediction time; all predictions required 1.1 s with this algorithm.

This paper reduces the prediction time to less than 10 ms while maintaining a considerable accuracy rate (higher than that obtained by 5 of the 7 algorithms tested in [6]).

LSTM has been used to obtain an accuracy of 87% [7] even when the port number is known. This result was generated on the CIC IDS dataset [3], which was created in a simulated environment. Moreover, those algorithms were implemented directly on the Packet Capture (PCAP) data files rather than on extracted properties, which makes them comparatively heavy on host hardware. Algorithms like the one presented in this paper, which run on feature sets in CSV format, are much more hardware friendly.

There is a large body of existing work in the field of Intrusion Detection Systems, but most of it can be categorized into three types of systems [8], as listed below.
• Network Based IDS (NIDS)
• Host Based IDS (HIDS)
• Hybrid IDS

[9] is one of the most prevalent IDS works in cyber security, owing to its detailed analysis of request patterns and efficient detection. It has acted as the IDS benchmark for years and lists 16 different characteristics of a good IDS, against which the model suggested in any paper is evaluated.

[8] calls [10] a robust and fast Intrusion Detection System, although the best-case accuracy of the proposed model is around 90%, with the exact number depending on the buffer size. Only for wormhole attacks does this model achieve an excellent hit rate of 100%, which makes it one of the best models in existence.

[11] monitors groups of nodes and uses routing tables for anomaly detection, a technique quite similar to the one used in this paper. The accuracy of this model was 95% at 100 nodes for the Isolation Table Intrusion Detection System (ITIDS) module, whereas it was approximately 80% for the Routing Table Intrusion Detection (RTID) system. The physical limitation of hosting 100 nodes at the server end makes this model less effective for real-time IDS.

[12] combines existing models and approaches to build a complete IDS that detects numerous attacks. The proposed model was not tried on any real-world or simulated traffic, so it has not been possible to predict whether it will be effective in live environments.

[13] detects sinkhole-based attacks with an extremely high accuracy rate of nearly 100% even when 50% of the nodes are malicious; that is, when normal and abnormal traffic are equal, the model does not fail. Its only limitation is that it detects only one specific type of attack.

[14] was an attempt at anomaly detection while keeping communication overheads to a minimum. In doing so, the detection rate of attacks (true positive rate) of the model dropped from 97% to 30%, showing that overhead costs cannot be minimized under the current system architecture. SVM has lower energy consumption but, at the same time, a lower success rate. That paper explained the need to strike a balance between the two properties in order to generate a successful model, and its findings were taken into account while developing this paper.

III. DATA PREPROCESSING

The dataset [3] provides approximately 51 GB of network traffic data, collected over a week of packet sniffing of the abstract behavior of 25 users from 3 July 2017 to 7 July 2017. This data consists of 82 properties and nearly 1 million rows, which have been extracted into CSV files for training using CICFlowMeter [15].

The first step was to merge these 8 CSV files, which total 1.2 GB of data, into a single CSV file:
• Create a new file in write mode.
• For each attack, copy its data to the new file.
• Create new categorical entries for the newly created columns.

Doing so added a few columns to the dataset:
• Normal: for normal requests.
• LOIT: for DDoS LOIT requests.
• Others: for the other attacks included in the dataset [3].

The second step was the conversion of non-integer columns, such as the IP (Internet Protocol) address, into integer values that the network can process. It was also necessary to select the relevant features from the given set of features:
• Discrete IP address values are not required, only the past trends of each IP address. Hence, both source and destination IP addresses were replaced by the number of times that address appeared in the past 5 seconds of logs.
• Absolute time values were replaced by the number of seconds elapsed since the last entry.
• Port values were converted to categorical columns.

This processing reduced the dataset to 400 MB: 1 million rows of 95 properties. This final file was used for training and testing.

IV. NETWORK MODEL

A. Long Short Term Memory

Recurrent back propagation [16] takes a long time to learn to store information over extended time intervals, because the error signals flowing backwards in time either explode or vanish. Long Short-Term Memory [17] (LSTM), a special form of Recurrent Neural Network, is an efficient gradient-based algorithm that bridges minimal time lags in excess of 1000 discrete-time steps. Its gates are computed as:

$$ i_t = \sigma\left(w_i [h_{t-1}, x_t] + b_i\right) \tag{1} $$
$$ f_t = \sigma\left(w_f [h_{t-1}, x_t] + b_f\right) \tag{2} $$
$$ o_t = \sigma\left(w_o [h_{t-1}, x_t] + b_o\right) \tag{3} $$

where
$i_t$ is the input gate
$f_t$ is the forget gate
$o_t$ is the output gate
$\sigma$ is the sigmoid function
$w_x$ is the weight of the respective gate (x) neurons
$h$ is the output of the LSTM block
$x$ is the input
$b_x$ is the bias of the respective gate

B. Fully Connected Layers

Fully connected layers are the basic building blocks of any Artificial Neural Network [16]. They are layers of the network graph in which every pair of nodes in adjacent layers is connected through a weighted edge. With numerous fully connected layers, a delay in updating the weights is introduced, which causes the network to either overfit or be undertrained. To tackle this problem, the initial layers of the model were made LSTM layers.

In fully connected layers, the weights are updated after every batch (the batch size is set to 30 in this network), causing the accuracy to improve over time:

$$ w_{i+1} = w_i - \alpha \, f'(x) \tag{4} $$

where $w$ is the weight, $f(x)$ is the loss function and $\alpha$ is the learning rate.

The learning rate is set by the optimization function and is usually a value between 0 and 1 that controls the rate at which the weights are updated. Reducing the learning rate helps prevent overfitting in a model with a high number of layers (14 in this case).

C. Loss Functions

The loss function, along with the learning rate, defines the rate at which the weights of a neural network are updated. Binary cross entropy is a loss function used for categorical binary output values:

$$ H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\left(p(y_i)\right) + (1 - y_i) \log\left(1 - p(y_i)\right) \right] \tag{5} $$

where $p(x)$ is the probability, $y_i$ is the output value at row $i$, $H_p(q)$ is the binary cross entropy loss and $N$ is the number of rows.

Fig. 3. Accuracy Trends

D. Optimization Functions

The optimization function used to construct the neural network in this paper is Nadam [18]. The Nadam optimization properties used for training are:
• Learning rate = 0.002
• β1 = 0.9
• β2 = 0.999
• ε = 10⁻⁷
• Schedule decay = 0.004

$$ \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \left( \beta_1 \hat{m}_t + \frac{(1 - \beta_1)\, g_t}{1 - \beta_1^{t}} \right) \tag{6} $$

where $\theta_t$ denotes the network weights at step $t$, $\eta$ is the step size, $\hat{m}_t$ is the Nesterov momentum mean estimate, $\hat{v}_t$ is the Nesterov momentum uncentered variance estimate, $\epsilon$ is the fuzz factor and $g_t$ is the gradient.
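The feature transformations listed in Section III (IP addresses replaced by their 5-second appearance counts, absolute times replaced by gaps, ports made categorical) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: rows are assumed to be sorted by timestamp in seconds, an address's count excludes the current row, an entry exactly 5 seconds old still counts, and all function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence, Tuple

WINDOW_SECONDS = 5.0  # look-back window for the IP-frequency feature


def ip_recent_counts(rows: Sequence[Tuple[float, str]]) -> List[int]:
    """For each (timestamp, ip) row, sorted by time, count how often that IP
    appeared in the preceding WINDOW_SECONDS (the current row is excluded)."""
    history: Dict[str, Deque[float]] = defaultdict(deque)
    counts: List[int] = []
    for ts, ip in rows:
        seen = history[ip]
        # Drop this IP's timestamps that are older than the window.
        while seen and ts - seen[0] > WINDOW_SECONDS:
            seen.popleft()
        counts.append(len(seen))
        seen.append(ts)
    return counts


def seconds_since_last(timestamps: Sequence[float]) -> List[float]:
    """Replace absolute times by the gap to the previous entry (0.0 for the first)."""
    if not timestamps:
        return []
    return [0.0] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def one_hot_ports(ports: Sequence[int]) -> Tuple[List[int], List[List[int]]]:
    """Convert port numbers into categorical indicator columns.
    Returns the sorted port vocabulary and one indicator row per entry."""
    vocab = sorted(set(ports))
    return vocab, [[1 if p == v else 0 for v in vocab] for p in ports]
```

The IP function would be applied once to the source-address column and once to the destination-address column; the deque per address keeps each lookup proportional to the number of recent hits rather than the whole log.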
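Equations (1)-(3) give only the three gates. The sketch below computes one LSTM time step in pure Python so the role of the concatenated input $[h_{t-1}, x_t]$ is explicit. The candidate memory $\tilde{c}_t$, the cell state $c_t$ and the hidden output $h_t$ are filled in with the standard LSTM formulation, which the paper does not spell out, so treat those lines as an assumption; the parameter layout (a weight matrix and bias per gate, keyed `"i"`, `"f"`, `"o"`, `"c"`) is hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Vector]]  # gate name -> (weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(w: Matrix, v: Sequence[float], b: Sequence[float]) -> Vector:
    """Return w @ v + b for a row-major weight matrix w."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def lstm_step(x_t: Sequence[float], h_prev: Sequence[float],
              c_prev: Sequence[float], params: Params) -> Tuple[Vector, Vector]:
    """One LSTM time step; every weight row acts on [h_{t-1}, x_t]."""
    hx = list(h_prev) + list(x_t)  # the concatenation [h_{t-1}, x_t]

    def gate(name: str, act: Callable[[float], float]) -> Vector:
        w, b = params[name]
        return [act(z) for z in affine(w, hx, b)]

    i_t = gate("i", sigmoid)  # eq. (1), input gate
    f_t = gate("f", sigmoid)  # eq. (2), forget gate
    o_t = gate("o", sigmoid)  # eq. (3), output gate
    # Standard LSTM candidate, cell and output updates (not given in the paper).
    cand = gate("c", math.tanh)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the cell keeps exactly half of its previous state; that makes a quick sanity check on the wiring.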
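Equations (4) and (5) work together in every training step: the loss of Eq. (5) is differentiated, and each weight moves against that derivative, scaled by $\alpha$. The sketch below shows this for a single sigmoid output neuron in pure Python. It uses plain full-batch gradient descent rather than the Nadam optimizer of Section IV.D, and the function names are hypothetical. For a sigmoid output trained with binary cross entropy, the derivative of the loss with respect to the neuron's pre-activation simplifies to $p - y$, which is what the code accumulates.

```python
import math
from typing import List, Sequence, Tuple


def bce(y_true: Sequence[int], p: Sequence[float], eps: float = 1e-12) -> float:
    """Binary cross entropy, eq. (5), averaged over the N rows.
    eps clips probabilities away from 0 and 1 so log() stays finite."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1.0 - eps)
        total += y * math.log(q) + (1 - y) * math.log(1.0 - q)
    return -total / len(y_true)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Sigmoid output of a single neuron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gd_step(w: List[float], b: float, xs: Sequence[Sequence[float]],
            ys: Sequence[int], alpha: float) -> Tuple[List[float], float]:
    """One update w <- w - alpha * dL/dw, eq. (4), with L the BCE of eq. (5)."""
    n = len(ys)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        err = predict(w, b, x) - y  # dL/dz for sigmoid + BCE, per row
        for j, xj in enumerate(x):
            grad_w[j] += err * xj / n
        grad_b += err / n
    return [wi - alpha * g for wi, g in zip(w, grad_w)], b - alpha * grad_b
```

A larger `alpha` moves the weights further per step, which is the sense in which the learning rate "controls the rate of update of weights".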
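Equation (6) can be traced with the hyperparameters listed in Section IV.D. The sketch below is a minimal pure-Python Nadam that follows Eq. (6) with bias-corrected moment estimates; it is an illustration, not any particular library's implementation. It leaves out the 0.004 schedule decay (a momentum schedule in the original Nadam formulation), so $\beta_1$ stays constant here, and the class name is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Nadam:
    """Minimal Nadam following eq. (6); hyperparameter defaults from Section IV.D.
    The 0.004 schedule decay is NOT modeled: beta1 is held constant."""
    lr: float = 0.002      # eta, the step size
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-7      # fuzz factor
    t: int = 0
    m: List[float] = field(default_factory=list)  # first-moment estimates
    v: List[float] = field(default_factory=list)  # uncentered second moments

    def step(self, theta: List[float], grad: List[float]) -> List[float]:
        """Return updated parameters for one step given the gradient g_t."""
        if not self.m:
            self.m = [0.0] * len(theta)
            self.v = [0.0] * len(theta)
        self.t += 1
        b1, b2, t = self.beta1, self.beta2, self.t
        new_theta = []
        for k, (th, g) in enumerate(zip(theta, grad)):
            self.m[k] = b1 * self.m[k] + (1 - b1) * g
            self.v[k] = b2 * self.v[k] + (1 - b2) * g * g
            m_hat = self.m[k] / (1 - b1 ** t)  # bias-corrected mean
            v_hat = self.v[k] / (1 - b2 ** t)  # bias-corrected variance
            # Nesterov look-ahead: mix m_hat with the bias-corrected current gradient.
            direction = b1 * m_hat + (1 - b1) * g / (1 - b1 ** t)
            new_theta.append(th - self.lr * direction / (math.sqrt(v_hat) + self.eps))
        return new_theta
```

Calling `step` repeatedly with the gradient of $f(\theta) = \theta^2$, that is `[2 * theta[0]]`, drives $\theta$ toward 0; because the update is divided by $\sqrt{\hat{v}_t}$, each step is roughly on the order of the 0.002 learning rate regardless of the gradient's scale.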
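The layer arithmetic of Section IV.F (one LSTM input layer, 12 hidden layers split into 5 LSTM and 7 fully connected, and one output layer, 14 in total) can be written down as a plain layer specification. This is a hypothetical sketch: only the 77-unit tanh input layer, the 2-unit sigmoid output and the activation choices of Section IV.E come from the paper. The hidden-layer width is a placeholder, and placing all LSTM layers before the fully connected ones is an assumption based on Section IV.B's statement that the initial layers are LSTM layers.

```python
from dataclasses import dataclass
from typing import List

HIDDEN_UNITS = 64  # placeholder width; the paper does not give hidden-layer sizes


@dataclass(frozen=True)
class Layer:
    kind: str        # "LSTM" or "Dense" (fully connected)
    units: int
    activation: str


def build_spec() -> List[Layer]:
    """Layer stack described in Section IV.F (hidden widths are hypothetical)."""
    layers = [Layer("LSTM", 77, "tanh")]                                # input layer
    layers += [Layer("LSTM", HIDDEN_UNITS, "tanh") for _ in range(5)]   # hidden LSTM
    layers += [Layer("Dense", HIDDEN_UNITS, "relu") for _ in range(7)]  # hidden FC
    layers.append(Layer("Dense", 2, "sigmoid"))                         # output layer
    return layers


if __name__ == "__main__":
    for n, layer in enumerate(build_spec(), start=1):
        print(n, layer.kind, layer.units, layer.activation)
```

In a framework such as Keras, each entry would map to one layer; every LSTM layer except the last would need `return_sequences=True` so that the next LSTM layer receives a full sequence.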
E. Activation Functions

Activation is the process of normalizing values in a neural network so that differences in range between columns do not bias the weights. Different activation functions [19] are used on different layers, and our model uses:
• tanh activation for LSTM layers
• ReLU activation for fully connected layers
• Sigmoid (S) activation for the output layer

$$ \tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1} \tag{7} $$
$$ \mathrm{ReLU}(x) = \max(0.0, x) \tag{8} $$
$$ S(x) = \frac{e^{x}}{1 + e^{x}} \tag{9} $$

F. Final Model

The final neural network model used for the proposed IDS has one LSTM-based input layer with 77 neurons and tanh activation, followed by 12 hidden layers, 5 of which are LSTM based and 7 of which are fully connected. The output is a 2-neuron fully connected layer with S(x) activation. This 14-layer Recurrent Neural Network model provides an accuracy of 96.25% and a prediction time of merely 7 μs, which makes it practical for real-time IDS applications.

V. RESULTS AND DISCUSSIONS

A. Performance Measures

The generated model was evaluated against the training and testing data and against the DDoS Attack 2007 dataset by CAIDA [20].

TABLE II
PERFORMANCE OF MODEL

Dataset                   Accuracy    Loss
[3] Training Sample       96.2518%    0.15927
[3] Testing Sample        96.2461%    0.15943
[20] Complete Dataset     96.7030%    1.567

The binary cross entropy loss function was used to calculate the loss values in Table II. Although the loss value is significantly higher on the unseen data of [20], there is no considerable change in accuracy.

Fig. 3 shows the accuracy of the network over the training epochs; the trend is clearly not stagnant and fluctuates unpredictably. For this reason, the number of training epochs was chosen based on testing accuracy rather than training accuracy.

B. Analysis of Performance

The model performs with nearly the same accuracy on both seen and unseen data, which indicates that it is not overfitting. With that established, the model was tested on the CAIDA DDoS 2007 dataset [20], where it achieved an accuracy of 96.7%, demonstrating its efficiency. These results showcase a state-of-the-art Intrusion Detection System. Training took 180 minutes on a computer with the following specifications:
• 2.4 GHz Intel i5 7300HQ
• 6 MB cache
• 4 cores
• 8 GB DDR3 RAM
• NVIDIA GeForce GTX 1050M with 4 GB dedicated Graphics Processing Unit (GPU) memory
• Manjaro 18.04 operating system

The testing time on the same machine was 30 microseconds per row, which is significantly low. Operating at this prediction time on an end-user machine such as the one described above, LOIT attacks can be identified with negligible time lag in request serve time.

The only visible drawback of this model is its requirement for high-precision floating point arithmetic, which entry-level CPUs are incapable of providing. Other than that, the model is extremely efficient and accurate.

In terms of accuracy alone, this model surpasses over 70% of existing models in the community, while maintaining a significantly low prediction time.

C. Future Scope

The proposed model in this paper uses a classification algorithm to distinguish DDoS LOIT attacks from normal traffic

and other attacks. DDoS LOIT is used to flood target systems with TCP, UDP and HTTP GET requests. Minor LOIT attacks can be detected and prevented by firewalls; for major ones, however, highly accurate and efficient algorithms are needed. Since the security of the internet is a prime concern, our efforts must focus on tackling these attacks before actual damage has occurred. The algorithm can be extended to classify other attacks as well: in the future, this model can be expanded to flag different attacks categorically and take the required preventive measures, thus converting it into an Intrusion Prevention System (IPS).

Our model achieves fairly high accuracy on the training and testing datasets. It still needs to be tested in a live environment to gain better insight into its effectiveness there; such testing has not yet been performed on this algorithm.

VI. CONCLUSION

In this paper, we present an efficient method to detect DDoS attacks using anomaly detection with an LSTM-based neural network, achieving an accuracy of 96%. Although there has been much research on the detection of DDoS attacks, it has mostly focused on signature-based classification systems or on the traffic generated during the attack period. Our method uses deep learning, which reduces the time the model requires to make a prediction. With the increased use of the internet and new software being developed every day, security threats loom on the horizon. Many attacks are carried out by authorized company users, which makes them hard to detect and control. Attackers who share their techniques and code for cracking the hardest security systems are growing more powerful. Researchers must come together to counter them and bring their powerful networks down, and the method used in this paper can help in that effort. The CICIDS2017 dataset provides a large number of attacks, and its features contributed well to the detection mechanism. However, the dataset was divided into three categories so as to detect only DDoS attacks. When tested on the CAIDA DDoS 2007 dataset, the model achieved an accuracy of 96%. We hope that our contribution on using deep learning for IDS will be useful to the research community. It would also be worthwhile to apply the same model to different datasets and to try alternate combinations of LSTM and fully connected layers to generate better results.

REFERENCES

[1] G. Karatas, O. Demir, and O. Koray Sahingoz, “Deep Learning in Intrusion Detection Systems,” in 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), 2018.
[2] X. Jing, Z. Yan, X. Jiang, and W. Pedrycz, “Network traffic fusion and analysis against DDoS flooding attacks with a novel reversible sketch,” Information Fusion, vol. 51, pp. 100–113, Nov. 2019.
[3] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization,” in 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal, Jan. 2018.
[4] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[5] W. Zegeye, R. Dean, and F. Moazzami, “Multi-Layer Hidden Markov Model Based Intrusion Detection System,” Machine Learning and Knowledge Extraction, vol. 1, no. 1, pp. 265–286, Dec. 2018.
[6] M. Alrowaily, F. Alenezi, and Z. Lu, “Effectiveness of Machine Learning Based Intrusion Detection Systems,” in Security, Privacy, and Anonymity in Computation, Communication, and Storage, Springer International Publishing, 2019, pp. 277–288.
[7] B. J. Radford, B. D. Richardson, and S. E. Davis, “Sequence Aggregation Rules for Anomaly Detection in Computer Network Traffic,” 2018, arXiv:1805.03735.
[8] T. F. Lunt, “A survey of intrusion detection techniques,” Computers & Security, vol. 12, no. 4, pp. 405–418, Jun. 1993.
[9] T. S. Sobh, “Wired and wireless intrusion detection system: Classifications, good characteristics and state-of-the-art,” Elsevier J. Computer Standards and Interfaces, vol. 28, no. 6, pp. 670–694, 2006.
[10] A. P. da Silva, M. Martins, B. Rocha, A. Loureiro, L. Ruiz, and H. C. Wong, “Decentralized Intrusion Detection in Wireless Sensor Networks,” in Proc. 1st ACM International Workshop on Quality of Service and Security in Wireless and Mobile Networks (Q2SWinet ’05), ACM Press, Oct. 2005, pp. 16–23.
[11] R. C. Chen, C. F. Hsieh, and Y. F. Huang, “A New Method for Intrusion Detection on Hierarchical Wireless Sensor Networks,” in Proc. ACM ICUIMC-09, 2009.
[12] A. A. Strikos, “A full approach for intrusion detection in wireless sensor networks,” School of Information and Communication Technology, 2007.
[13] E. Ngai, J. Liu, and M. Lyu, “On the Intruder Detection for Sinkhole Attack in Wireless Sensor Networks,” in ICC ’06, Istanbul, Turkey, Jun. 2006.
[14] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Quarter Sphere Based Distributed Anomaly Detection in Wireless Sensor Networks,” in IEEE ICC ’07, Glasgow, U.K., Jun. 2007.
[15] CICFlowMeter (2017). Canadian Institute for Cybersecurity (CIC).
[16] K. Shiruru, “An Introduction to Artificial Neural Network,” International Journal of Advance Research and Innovative Ideas in Education, vol. 1, pp. 27–30, 2016.
[17] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[18] T. Dozat, “Incorporating Nesterov momentum into Adam,” 2016.
[19] D. F. Specht, “Probabilistic neural networks,” Neural Networks, vol. 3, no. 1, pp. 109–118, Jan. 1990.
[20] The CAIDA UCSD “DDoS Attack 2007” Dataset.
