Cyber Intrusion Prediction and Taxonomy System Using Deep Learning and Distributed Big Data Processing

Hamzah Al Najada, Imad Mahgoub, Imran Mohammed
Computer & Electrical Engineering & Computer Science Department
Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431 USA
Email: {halnajada2014, mahgoubi, imohamme}@fau.edu

Abstract: The issue of cybersecurity is becoming more serious every day at all levels and in all domains. Cyber-attacks threaten the national security of every country and nation. Furthermore, cyber-attacks can significantly harm the economy. With the rapid and continuous growth of the cyber-universe, more software is being created, more data is being generated, and cybersecurity breaches and defense strategies are getting more complex. Given the size and complexity of the cyber-universe, big data mining techniques and advanced machine learning solutions are the most suitable for predicting brand-new attacks, because traditional machine learning methods cannot cope with today's cybersecurity issues. Anomaly-based intrusion detection systems are receiving tremendous attention nowadays because of the vast improvement and development in big data solutions. This paper utilizes highly imbalanced real-life benchmark network traffic datasets of multiple types of attacks. After resolving the class imbalance issue in our datasets by applying an oversampling approach, our study becomes twofold. First, we build prediction models for each type of attack separately and optimize the model with the highest accuracy. Then, we build a prediction model for all attacks together using deep learning with the smallest number of features, and we optimize the model to achieve the highest accuracy. Our developed model can accurately predict the threat and the type of attack.

Index Terms: Cybersecurity, Big Data, Stream Mining, Real-Time, Intrusion Detection, Deep Learning, High-Performance Computing Clusters (HPCC)

I. INTRODUCTION

Today's world is more interconnected than ever before. Yet, for all its advantages, increased connectivity brings an increased risk of theft, fraud, and abuse. As Americans become more reliant on modern technology, they become more vulnerable to cyber-attacks such as corporate security breaches, spear phishing, and social media fraud. Complementary cybersecurity and law enforcement capabilities are critical to safeguarding and securing cyberspace. The issue of cybersecurity is becoming more serious every day at all levels and in all domains. Cyber-attacks threaten the national security of every country and nation. In this work, we tackle the very important and serious issue of cybersecurity. Cybersecurity has become an urgent need for everybody all around the world, even for those who don't deal with technology or cyberspace. All our information is stored all over cyberspace, whether it is held by federal, non-federal, or financial entities. Intrusion detection systems (IDSs) are an essential component of any cyber or physical computer system. An IDS can help prevent the loss of information through security breaches in any organization by detecting them and raising alarms. This is done by monitoring and analyzing the incoming traffic data. IDSs, in general, are categorized into two main classes: signature-based intrusion detection systems (SIDS) and anomaly-based intrusion detection systems (AIDS). SIDS are capable of giving accurate results in a timely manner if the attempted attack or intrusion has its signature or pattern stored in the SIDS library. The limitation of this method is the inability to detect newly emerging attacks. Traditional IDSs and techniques such as signature-based detection systems are good at detecting known attacks, but new attacks will not be detected by these systems. It is not surprising that thousands of attacks that cannot be detected using SIDS, such as zero-day attacks, occur on a daily basis [1]. Hackers are working day and night, either individually or in teams, to generate new attacks that cannot be detected, in order to destroy or steal other people's information. Discovering hidden misbehaving patterns is the core of AIDS. The more accurate the discovery is, the fewer the security breaches. Designing and developing an AIDS requires data that represents intrusions or attacks well, which can then be used to train a machine learning model that learns the behavior of those attacks and predicts future attacks. Traditional machine learning and data mining techniques are the core of designing AIDS. However, with the tremendous growth in the amount of data flowing through networks (which can reach gigabytes per minute), these traditional methods and tools cannot handle this amount of data. Big data techniques and algorithms are the solution for designing an effective and efficient IDS for current cybersecurity based on anomaly detection.

Deep learning (DL) has achieved a real breakthrough due to the evolution in computing power. At first, deep learning was extensively used for image and voice recognition, since it is very capable of finding hidden patterns and difficult correlations in the data. Deep learning is the best technique to

978-1-5386-9276-9/18/$31.00 ©2018 IEEE
efficiently detect zero-day attacks. With the rapid development of computation techniques, a powerful framework has been provided by Artificial Neural Networks (ANNs) with deep architectures for supervised learning. Generally speaking, a deep learning algorithm consists of a hierarchical architecture with many layers, each of which constitutes a non-linear information processing unit. In this paper, we only discuss deep architectures in Neural Networks (NNs). Deep neural networks (DNNs), which employ deep architectures in NNs, can represent functions with higher complexity if the number of layers and units in a single layer are increased. Given enough labeled training data and suitable models, deep learning approaches can help us understand complex problems. In this paper, we focus on four main deep learning architectures. Other methods, like sparse coding, are also briefly discussed. Additionally, some recent advances in the field of deep learning are described [2].

The remainder of this paper is organized as follows. Section II presents the most related work. Section III presents a brief description of intrusion detection systems. Section IV discusses the concept and theory of deep learning. Section V describes the experiments done in this research and the obtained results. Section VI contains the analysis of the obtained results. Finally, conclusions and future work are presented in Section VII.

II. RELATED WORK

Improving cybersecurity by developing efficient IDSs has been discussed and researched since the emergence of the Internet. Nowadays, many federal and private organizations and companies are working collaboratively and individually on developing better IDSs to enhance cybersecurity against all kinds of attacks.

Modi et al. [3] surveyed intrusion detection in the cloud, where most of our data resides today. In introducing the most common attacks that threaten cyberspace, the first attack they discussed is the insider attack, in which the threat is initiated by authorized users; this cannot be prevented by a firewall and will not be detected by a SIDS. SIDS are only applicable to known attacks, and this is the main limitation of this detection technique. AIDS are highly recommended by the authors and their related work for application at all levels of the cloud (on a distributed architecture), since they are the best guard against unforeseen attacks. In the Internet of Things (IoT), cybersecurity is still the main challenge other than the connectivity issues.

Diro et al. [1] proposed a DL-based distributed attack detection method for IoT using the fog ecosystem and showed the effectiveness of deep learning for IoT. The authors compared the performance of distributed and parallel IDSs. They proposed parallel training of local nodes and detection of attacks in a distributed manner. However, their proposal lacks real-time intrusion detection for large amounts of data. Traditional machine learning techniques have given comparable results in other research using the same data, such as [4]–[6].

Buczak and Guven [7] surveyed the Machine Learning (ML) and Data Mining (DM) methods used for cybersecurity intrusion detection. They specifically looked at papers that described the use of different ML and DM techniques in the cyber domain, both for misuse and anomaly detection. They concluded that the methods that have been established for cyber applications are not the most effective. Given the richness and complexity of the methods, it is impossible to make a single recommendation, because the choice depends on several criteria such as accuracy, complexity, and the time needed to classify an unknown instance with a simple classical prediction model. Also, it is difficult and time consuming to obtain representative data. Depending on the particular IDS, some ML techniques might be more important than others.

Rege et al. [8] used four different neural network models to make temporal predictions of how adversaries progress through cyber-attacks: nonlinear autoregressive (NAR) neural network, NAR neural network with exogenous input (NARX), NAR neural network for multi-steps-ahead prediction, and autoregressive integrated moving average (ARIMA). The models were built on data collected from two RTBTE (Red Team/Blue Team exercises) research sites. The authors attempted to build a framework for dynamic prediction of adversarial movement across the cyber-intrusion chain. They concluded that their analysis is inadequate since it did not account for many permutations and combinations of attack scenarios, as well as different adversary types and motivations, objectives, and organizational dynamics. Hence, more data is needed to make a reliable mechanism for intrusion chain analysis.

Hindy et al. [9] presented a taxonomy of network threats for intrusion detection systems. The taxonomy is divided into three control stages (Reconnaissance, Scanning, and Attacks) in order to describe more complex attack processes. The authors attempted to create a taxonomy with the ability to inform researchers developing both intrusion detection systems and training datasets, in order to increase the detection accuracy and decrease the false positive rate. With the increasing number of connected systems and networks, the taxonomy aims at facilitating the design of future defense mechanisms as well as robust systems.

AlEroud and Karabatis [10] introduced a context-domain knowledge-driven framework that has been implemented and applied in the discovery of cyber-attacks. The proposed framework is intended to address the limitations of knowledge-based IDSs, such as the lack of contextual information and domain knowledge used to detect attacks. This framework consists of several attack prediction models that are utilized in conjunction with IDSs to detect cyber-attacks. After a comprehensive research review of contextual information, they found a common classification of the contextual aspects that should be considered in IDSs to make them aware of the current context. The authors' approach introduces domain knowledge extracted from taxonomies as a foundation for context-based reasoning in cybersecurity.

Loukas et al. [11] have shown experimentally that utilizing

632 IEEE Symposium Series on Computational Intelligence SSCI 2018


RNN-based deep learning enhanced by LSTM can considerably increase intrusion detection accuracy for a robotic vehicle. Compared against standard machine learning classifiers or MLP-based deep learning, RNN-based deep learning can take into account the temporal elements of a cyber-attack. The authors have also shown that the key disadvantage of a deep learning based approach is detection latency due to increased processing demands, which can be addressed through cloud-based computational offloading. For this, they produced a practical implementation and have also presented and experimentally validated a mathematical evaluation model that can be used to determine when offloading is practical from the detection latency perspective.

III. INTRUSION DETECTION SYSTEMS (IDSs)

Effective stream data processing in real-time is a challenging task since data streams are dynamic and constantly changing. A second issue is the large volume of data streams; real-time processing therefore requires quick responses. Almost everything around us that we use and deal with on a daily basis is stream data, such as web clickstreams and network traffic [12]. Effective analysis and management of stream data is a huge challenge since stream data is generally not stored in any type of data repository. The continuous query model is a typical query model in a stream data management system, where predefined queries evaluate incoming streams constantly, collect aggregate data, respond to the changes of data streams, and report their status. Stream data mining involves dynamic changes and efficient discovery of general patterns within the stream data. People are interested in identifying intrusions based on anomalies in the message flow, which can be discovered by dynamically constructing stream models and clustering stream data, or by comparing the current frequent patterns with those at specific previous times. Most stream data resides at a low level of abstraction, but analysts are usually more interested in higher as well as multiple levels of abstraction. Therefore, multidimensional and multi-level online analysis and mining should be conducted on stream data as well [13].

The main problems associated with stream data mining are concept evolution, concept drift, and infinite length. Concept evolution is defined as the development of novel classes, while concept drift means that the data changes with time. Infinite length means that stream data requires unbounded storage and training time [14]. Concept drift in the learning model is introduced because of the velocity component of big stream data; specifically, concept drift indicates that the statistical properties of the target variable predicted by a model change with time in an unforeseen manner. This is a major problem since the prediction will become less accurate with time [15]. Real-time intrusion detection is a tedious task because of the large volume of data involved. Data imbalance is also a major hurdle [16]. If the imbalance level in the data is high, classifiers will be lower in accuracy and reliability. Data imbalance is an inevitable problem in real-time data due to the large size and low frequency of certain transactions. Sampling techniques are common approaches to reducing the impact of imbalance on classifiers [17].

Malicious attacks and intrusions are dynamic, and intrusion detection has to be performed in real-time on data streams. An event may be normal on its own but malicious if it is considered as part of a sequence of events. Stream data analysis is used to help identify intrusions in this kind of situation. It can be very useful in identifying sequences of events that frequently occur together, discovering sequential patterns, and recognizing outliers or anomalies [18]. There are three main types of anomalies [19]: 1) point anomalies: data samples that are detected as anomalies with respect to the rest of a dataset; 2) collective anomalies: collections of data samples that are anomalous altogether; and 3) contextual (conditional) anomalies: data samples that are anomalous only in certain contexts.

In this work, we focus on predicting five attack types and specifying the exact type of the attack to facilitate and expedite the mitigation process. The attacks we experiment with in this work are briefly described below.

1) Denial of Service (DoS): in this attack, the target computer is overwhelmed by requests from the attacker, so the victim's computer runs out of resources (such as processing time, memory, or bandwidth), which prevents it from doing its legitimate jobs.
2) Distributed Denial of Service Attack (DDoS): it follows the same concept as the DoS attack, but this time the requests come to the victim's machine from many machines on the network, which makes it difficult to identify and block them.
3) Brute Force Attack (BFA): this kind of attack tries all different possibilities and combinations to determine the victim's password or identity.
4) SSH Brute Force Attack (BFASSH): it is one of the most common attacks across cyberspace. It happens when BFA is applied to secure shell connections.
5) Internal Network Infiltration: a compromised computer in the network plays the role of a trusted computer but is in fact being used by others to leak whatever information it can get from the secure network to the attacker.

IV. DEEP LEARNING (DL)

Deep learning has emerged as an efficient machine learning technique for handling big data. It allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. To discuss DL, we first have to know what a neural network (NN) is. An NN is a biologically-inspired programming paradigm that enables a computer to learn from observational data [20]. It is a special form of an acyclic computational graph (ACG) that consists of one or more input layers, one or more hidden layers, and an output layer [20]. Each layer consists of a number of cells (nodes) that are connected with the cells in the previous and following layer.



Those connections are weighted, and each output is a function of the weighted sum of the inputs; this function is called the activation function. The simplest network architecture, with a sigmoid activation function and one hidden layer, is demonstrated in the following equation [20]:

$$a_j^{l} = \sigma\left(\sum_i w_{j,i}^{l}\, a_i^{l-1} + b_j^{l}\right), \quad \text{where } \sigma(x) = \frac{1}{1 + \exp(-x)} \tag{1}$$

where $a_j^{l}$ is the output of the activation function of node $j$ located in layer $l$, $w_{j,i}^{l}$ is the connection weight from node $i$ on layer $l-1$ to node $j$ on layer $l$, $a_i^{l-1}$ is the output of node $i$ on layer $l-1$, and $b_j^{l}$ is the bias of node $j$ on layer $l$.

A neural network is a supervised machine learning technique where the output of the neural network is compared to a target by means of a loss function. The type of output layer node and the loss function are often chosen jointly for their computational properties with respect to backpropagation [21]. A typical combination for categorical targets is a softmax layer with a cross-entropy loss function $H$:

$$y_i = \mathrm{softmax}(a_i) = \frac{\exp(a_i)}{\sum_j \exp(a_j)}, \qquad H_{y'}(y) = -\sum_i y'_i \log(y_i) \tag{2}$$

where $y'$ denotes the target values and $y$ the network outputs, computed in turn from the output activations $a_i$ of the next-to-last network layer. Gradients of the loss function with respect to the parameters $w_{j,i}^{l}$ and $b_j^{l}$ are computed using backpropagation, and the parameters are adjusted using variants of gradient descent algorithms [20]. Deep learning is a new area of machine learning. DL learns a high-dimensional function via a sequence of semi-affine non-linear transformations. DL is a powerful set of techniques for learning in neural networks. The core architecture of DL is organized as a graph. The nodes of the graph are units, connected by links that propagate the activation calculated at the origin to the destination units [20]. As in an NN, each link has a weight that determines the relative strength and sign of the connection. Each unit applies an activation function to the weighted sum of its incoming activations. The activation function is either a sigmoid function or a tanh. A particular class of deep learning models uses a DAG structure, which is the feed-forward neural network structure [22]. DL allows for efficient modeling of linear and nonlinear functions. Hidden layers in DL support high-dimensional input variables.

DL is advancing in solving problems that basic NNs were not able to solve for many years. It is very good at discovering complex relationships and structures in high-dimensional data. This has motivated us to use DL for modeling our cyber-intrusion prediction system, in order to obtain results that are as accurate as possible compared to the previously used models [23]. To design our model, we used a two-layer network with 200 nodes in each layer, 10 epochs, and the Rectifier activation function, and we applied 10-fold cross-validation to this model.

V. DATA, EXPERIMENTS, AND RESULTS

A. Intrusion Detection Data

The amount of information available to detection mechanisms is of vital importance, as this provides the means to detect anomalous behaviors. In other words, this information is essential for post-evaluation and the correct interpretation of the results. Thus, it is deemed a major requirement for a dataset to include all network traffic and interactions. A labeled dataset is of immense importance in the evaluation of various detection mechanisms [24].

In network intrusion detection, AIDS particularly suffer from difficulties in comparison, accurate evaluation, and deployment, which originate from the scarcity of adequate datasets. Many such datasets are internal and cannot be shared due to privacy issues, others are heavily anonymized and do not reflect current trends, or they lack certain statistical characteristics. These deficiencies are the primary reasons why a perfect dataset is yet to exist. Thus, researchers must resort to the datasets they can obtain, which are often suboptimal [25].

As network behaviors and patterns change and intrusions evolve, it has become necessary to move away from static and one-time datasets toward more dynamically generated datasets, which not only reflect the traffic compositions and intrusions of that time but are also modifiable, extensible, and reproducible.

In this work, we have used the UNB ISCX IDS 2012 dataset, which consists of labeled network traces, including full packet payloads in pcap format, which along with the relevant profiles are publicly available for researchers. The underlying notion of the ISCX IDS dataset is based on the concept of profiles, which contain detailed descriptions of intrusions and abstract distribution models for applications, protocols, or lower-level network entities. Real-life traces are analyzed to create profiles for agents that generate real traffic for HTTP, SMTP, SSH, IMAP, POP3, and FTP. In this regard, a set of guidelines was established to outline valid datasets, which set the basis for generating profiles. These guidelines are vital for the effectiveness of the dataset in terms of realism, evaluation capabilities, total capture, completeness, and malicious activity.

The profiles are then employed in an experiment to generate the desirable dataset in a testbed environment. Various multi-stage attack scenarios were subsequently carried out to supply the anomalous portion of the dataset [24]. The intent of this dataset is to assist researchers in acquiring datasets of this kind for testing, evaluation, and comparison purposes, through sharing the generated datasets and profiles. The UNB ISCX 2012 intrusion detection evaluation dataset consists of 7 days of network activity, where each network activity is classified as either normal or malicious.

The typical approach for performing anomaly detection using the ISCX dataset is to apply a customized machine learning algorithm to learn the general behavior of the dataset in order to be able to differentiate between normal and malicious activities. The main issue with the ISCX data is the



class imbalance problem. The normal cases in the network traffic far outnumber the malicious cases.

For this purpose, we use one day of network traffic for each attack type. We experiment with and classify five types of cyber-attacks, namely: HTTP Denial of Service Attack (DoSHTTP), Distributed Denial of Service Attack (DDoS), Internal Network Infiltration (NetInf), Brute Force Attack (BFA), and SSH Brute Force Attack (BFASSH). Each type of attack has 21 features, which makes the dataset very high dimensional, but we apply feature selection to the datasets to get more accurate results and efficient processing time. Table I shows the size of the datasets used in this work.

B. Cyber Data Class Imbalance

Mining cybersecurity data to develop cyber IDSs faces the hurdle of class imbalance (CI). This happens naturally due to the fact that most of the network traffic packets are legitimate and non-malicious. In this research paper, our positive examples (malicious) make up a small portion of our dataset, while the majority belongs to the negative class (normal). Learning from imbalanced data will cause the overfitting problem, since the learners will be biased towards the majority class and all of the majority class examples will be classified correctly [16]. This problem can be solved in different ways. The basic solutions are undersampling and oversampling. Undersampling decreases the number of majority class examples in the training dataset. Oversampling does the opposite by increasing the number of minority class examples, either randomly or systematically. Undersampling has proven its suitability and effectiveness in many previous research studies [16], [26], and [27].

To tackle this problem, we use three approaches and compare their results in a classification setting in order to use the best method. First, we use our previous approach, which is a bagging-based approach for building a balanced dataset [16]. For the second approach, we use oversampling, and in the last one, we use deep neural networks to create more attack examples by learning from this imbalanced dataset. Streaming data is of high volume, so we have to build prediction models for streamed data that is huge in size and speed. Table I shows the class imbalance percentages for the ISCX attack datasets that we use in this research paper.

C. Experiments and Results

For our preliminary study, the first phase was building distributed random forest classification models for each attack dataset individually using 5-fold cross-validation. Then we aggregated the data to have one huge dataset holding the five types of attacks. The second phase of our experiment is resolving the problem of class imbalance and then building distributed random forest and deep learning models for the aggregated dataset. Table II shows the most important features for each dataset individually, and then the most important features with their relevancy for the aggregated dataset. The smaller the number, the higher the relevance of the feature. We can see from this preliminary experiment that each type of attack has different relevant features, so we cannot easily generalize a classification model for all types of attack by only building a model for one type. Classifying the type of the attack at the beginning of the attack, or even before it takes place, helps mitigate the damage and restore lost information quickly. The accuracy results for those preliminary experiments do not represent the true accuracy since those datasets are highly imbalanced; the accuracy results are too good to be true because learning is biased towards the majority class.

In order to solve the class imbalance issue, we apply an oversampling technique to the aggregated dataset, which has 1,816,609 normal examples and 68,910 malicious examples, a class imbalance of 3.65% (Table I), which is considered very highly imbalanced. After applying the oversampling, we build a random forest prediction model and a deep learning model and then compare their results. Figure 1 shows the most important features for the random forest and deep learning models after applying the oversampling. The performance of the models is measured using two performance indexes, the mean squared error (MSE) and the root mean squared error (RMSE). The performance results of these models are shown in Table IV. Those performance indexes are defined as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left| f_i - \hat{f}_i \right|^2 \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left| f_i - \hat{f}_i \right|^2} \tag{4}$$

where $f_i$ is the actual class and $\hat{f}_i$ is the predicted class.

The deep learning model we built here for the aggregated dataset of the five types of attacks is our main focus in this research paper. We build a deep learning model of four hidden layers, where each layer has 200 nodes. Since our proposed solution is designed for network data streams, we stream the data iteratively 10 times to pass through the input layer into the hidden layers and then into the Softmax regression function, since we have a multi-class classification. Table III shows the confusion matrix generated for the multi-class classification model developed using deep learning. The row labels are the actual classes and the column labels are the predicted classes.

VI. ANALYSIS AND DISCUSSION

From our experiments, we found that each type of cyber-attack has its own important features. Although some of the features are common across attack types, they still have different importance scores. After we aggregated the data in one dataset, we ended up with a multi-class dataset that has five malicious classes (DoSHTTP, DDoS, Internal NetwInf, BFA, and BFASSH) and one normal class. Figure 2 shows the results of our deep learning model that we ran on a



TABLE I
CLASS IMBALANCE PERCENTAGES OF THE CYBER-ATTACKS DATASETS

Dataset               Normal    Malicious   Class Imbalance (%)
DoSHTTP               167604    3776        2.20
DDoS                  534238    37460       6.56
NetInf                255170    20358       7.39
BFA                   653359    2097        0.32
BFASSH                206238    5219        2.47
The Aggregated Data   1816609   68910       3.65
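
The percentages in Table I can be cross-checked with a short sketch. It assumes that the class imbalance percentage is the malicious share of all examples, malicious / (normal + malicious); the paper does not state the formula, and the names `TABLE_I`, `imbalance_percent`, and `aggregate` are illustrative only. Under that assumption the counts reproduce the tabulated values to within 0.01, and the five per-attack rows sum to the aggregated row.

```python
# Counts copied from Table I: dataset -> (normal, malicious).
TABLE_I = {
    "DoSHTTP": (167604, 3776),
    "DDoS": (534238, 37460),
    "NetInf": (255170, 20358),
    "BFA": (653359, 2097),
    "BFASSH": (206238, 5219),
}


def imbalance_percent(normal, malicious):
    """Malicious share of all examples in percent (assumed definition)."""
    return 100.0 * malicious / (normal + malicious)


def aggregate(table):
    """Sum the normal and malicious counts over all per-attack datasets."""
    normal = sum(n for n, _ in table.values())
    malicious = sum(m for _, m in table.values())
    return normal, malicious


for name, (n, m) in TABLE_I.items():
    print(f"{name:8s} {imbalance_percent(n, m):5.2f}%")

total_normal, total_malicious = aggregate(TABLE_I)
print("Aggregated:", total_normal, total_malicious,
      f"{imbalance_percent(total_normal, total_malicious):.2f}%")
```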

TABLE II
THE MOST RELEVANT FEATURES COMPUTED IN THE PRELIMINARY ANALYSIS

Dataset               1                2                     3                     4                     5
DoSHTTP               sourceTCPFlags   destinationTCPFlags   appName               stopDateTime          direction
DDoS                  direction        destination           startDateTime         source                appName
NetInf                appName          sourceTCPFlags        destinationTCPFlags   destinationPort       source
BFA                   source           appName               direction             destinationTCPFlags   sourcePort
BFASSH                source           direction             appName               destinationTCPFlags   destinationPort
The Aggregated Data   direction        appName               sourceTCPFlags        destinationTCPFlags   sourcePort

TABLE III
THE CONFUSION MATRIX OF THE BUILT DEEP LEARNING MODEL FOR CYBER-INTRUSION DETECTION ALONG WITH THE ERROR METRICS

Actual \ Predicted   BFA    BFSSH   DDoS   DoSHTTP   NetwInf   Normal   Error
BFA                  1666   1       0      0         0         0        0.0006
BFSSH                6      1653    0      0         0         0        0.0036
DDoS                 1      0       224    9         0         1439     0.8661
DoSHTTP              2      0       0      1676      0         2        0.0024
NetwInf              0      0       0      1390      261       48       0.8464
Normal               0      0       0      51        2         1589     0.0323
Total                1675   1654    224    3126      263       3078     0.2945
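
The Error column of Table III follows directly from the counts: for each actual class it is the fraction of that row that falls off the diagonal, and the Total entry is the fraction of all examples that are misclassified. The sketch below recomputes both; the names `CONFUSION`, `per_class_error`, and `overall_error` are illustrative and not taken from the paper.

```python
LABELS = ["BFA", "BFSSH", "DDoS", "DoSHTTP", "NetwInf", "Normal"]

# Rows are actual classes, columns are predicted classes (copied from Table III).
CONFUSION = [
    [1666, 1, 0, 0, 0, 0],
    [6, 1653, 0, 0, 0, 0],
    [1, 0, 224, 9, 0, 1439],
    [2, 0, 0, 1676, 0, 2],
    [0, 0, 0, 1390, 261, 48],
    [0, 0, 0, 51, 2, 1589],
]


def per_class_error(matrix):
    """Fraction of each actual class (row) that was predicted as another class."""
    errors = []
    for i, row in enumerate(matrix):
        total = sum(row)
        errors.append((total - row[i]) / total)
    return errors


def overall_error(matrix):
    """Fraction of all examples that lie off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


for label, err in zip(LABELS, per_class_error(CONFUSION)):
    print(f"{label:8s} {err:.4f}")
print(f"Overall  {overall_error(CONFUSION):.4f}")
```

Read this way, the two large error rates come from specific confusions: 1439 of the 1673 DDoS examples are predicted as Normal, and 1390 of the 1699 NetwInf examples are predicted as DoSHTTP.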

Fig. 1. The most important features of the aggregated dataset after applying the oversampling with distributed random forest and deep learning algorithms
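
The paper does not say which oversampling variant was applied. As a minimal sketch of the random variant only, assuming minority-class rows are duplicated with replacement until every class matches the largest one, it could look as follows; `random_oversample` and its `seed` parameter are hypothetical names, not the authors' implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many examples as the largest class (assumed strategy)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy usage: 8 "normal" rows and 2 "attack" rows become 8 and 8.
rows = list(range(10))
labels = ["normal"] * 8 + ["attack"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, oversampling is normally applied to the training split only, so that duplicated examples do not leak into the evaluation data.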

TABLE IV
EFFECTIVENESS RESULTS COMPARISON OF THE DEVELOPED MODELS

                Before Oversampling    After Oversampling
Model           MSE       RMSE         MSE      RMSE
DRF             0.0034%   0.0584%      0.21%    0.45%
Deep Learning   0.012%    0.11%        0.23%    0.48%



High-Performance Computing Clusters (HPCC) at FAU, which
are distributed clusters. Our models use a rectifier activation
function for the node outputs, four hidden layers of 200 nodes
each, and a Softmax regression function at the output layer,
and are trained for 10 epochs. The experiment uses distributed
streams of network data, since our dataset is stored on
distributed clusters. The model was designed for distributed
intrusion prediction: it receives and processes streams of
network data and sends its answer (normal or malicious) back
to the distributed clusters.

We can see that the source port is the most important feature for
determining whether a cyber-attack is present, as well as the attack
type. The size and number of packets sent and received over the
network are of great importance for discovering and predicting
potential attacks, especially DoS and DDoS attacks, where the goal
is to overwhelm the destination with large, useless packets. As
mentioned earlier, the appName feature, which records the
applications and protocols in use, is very important for
discovering a cyber-attack. Our deep learning model confirmed this
and found that most of the attacks come through the SSH, POP, HTTP
image transfer, IMAP, IRC, and ICMP protocols.

The flow direction feature has four directions for the network flow:

• Local to Local (L2L)
• Local to Remote (L2R)
• Remote to Remote (R2R)
• Remote to Local (R2L)

In this paper, we found that flows originating from local
machines are riskier than those originating from remote
machines, and that they target both remote and local machines
extensively.

Transmission Control Protocol (TCP) is an essential
protocol in computer networks. It defines how network
connections and conversations are established, so that
data can be exchanged according to agreed rules and
standards. TCP has a set of flags, each a single bit
(0 or 1), that indicate the connection state at a given
time. Those flags are [28]:

• CWR- Congestion Window Reduced (CWR), a flag set by the
  sending host to indicate that it received a TCP segment with
  the ECE flag set.
• ECE- ECN-Echo (ECE), a flag that indicates during the 3-way
  handshake whether the TCP peer is ECN capable.
• URG- a flag indicating that the URGent pointer field is
  significant.
• ACK- a flag indicating that the ACKnowledgment field is
  significant.
• PSH- used to trigger the push function.
• RST- used to reset the connection (seen on rejected
  connections).
• SYN- used to synchronize sequence numbers (seen on new
  connections).
• FIN- indicates that the sender has no more data to send
  (seen when a connection is closed).

Our findings from the TCP flags show that they are of great
importance for determining both the presence of an attack and
its type, on both the source and the destination side. The SYN
and ACK flags are the most important flags for the source
machine. For the destination, when four flags are set together
(FIN, SYN, PSH, and ACK), the likelihood of a cyber-attack
increases.

VII. CONCLUSION AND FUTURE WORK

Cyberspace and its underlying infrastructure are vulnerable
to a wide range of risks stemming from both physical and cyber
threats and hazards. Of growing concern is the cyber threat to
critical infrastructure, which is increasingly subject to
sophisticated cyber-intrusions that pose new risks. In light of
the risk and potential consequences of cyber events,
strengthening the security and resilience of cyberspace has
become an important national mission. In this digital era, we
can expect all kinds of cyber-attacks. The economy, commerce,
intelligence information, health information, spaceship
control, autonomous vehicles, and many more could be destroyed
or lost through attacks. Data analytics and machine learning
solutions will certainly be able to prevent those attacks. AIDS
are receiving tremendous attention nowadays because of the vast
improvement and development in big data solutions. In this
research, we utilized real-life benchmark network traffic
datasets covering multiple types of attacks, which are highly
imbalanced. After we resolved the class imbalance issue using
an oversampling approach, our study became twofold. First, we
built specific prediction models for each type of attack
separately and optimized the best models with the highest
accuracy. Then, we built prediction models for all attacks
together using distributed random forest and deep learning with
the most relevant features, and optimized those models to
achieve the highest reasonable accuracy. Our developed model
can accurately predict the threat and the type of attack,
regardless of the class imbalance distribution of any cyber
dataset. For our future work, we plan to experiment with newer
and more recent types of attacks.

ACKNOWLEDGMENT

This work is part of the Smart Mobile Computing initiative
at the Tecore Networks Lab at Florida Atlantic University.

REFERENCES

[1] A. A. Diro and N. Chilamkurti, “Distributed attack detection scheme using deep learning approach for internet of things,” Future Generation Computer Systems, 2017.
[2] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, “A survey of deep neural network architectures and their applications,” Neurocomputing, vol. 234, pp. 11–26, 2017.
[3] C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Patel, and M. Rajarajan, “A survey of intrusion detection techniques in cloud,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 42–57, 2013.
[4] N. A. Noureldien and I. M. Yousif, “Accuracy of machine learning algorithms in detecting DoS attacks types,” Science and Technology, vol. 6, no. 4, pp. 89–92, 2016.
[5] A. M. M. Ghaleb and S. A. Talab, “Assembly classifier approach to analyze intrusion detection dataset in networks by using data mining techniques.”
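
To make the layer structure described for the deep learning model concrete, here is a minimal forward-pass sketch in plain Python: four hidden layers of 200 rectifier units followed by a softmax output over the six classes. It is not the authors' implementation. The input width (`NUM_FEATURES`), the weight initialisation, and all function names are assumptions made for illustration, and the network is untrained, so the probabilities it prints are meaningless beyond showing the shapes involved.

```python
import math
import random

HIDDEN_LAYERS = [200, 200, 200, 200]  # four hidden layers of 200 nodes (from the text)
NUM_CLASSES = 6                       # five attack classes plus Normal (from the text)
NUM_FEATURES = 5                      # assumption: illustrative input width


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative init)."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output node."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def rectifier(z):
    """Rectifier (ReLU) activation applied element-wise."""
    return [max(0.0, v) for v in z]


def softmax(z):
    """Numerically stable softmax that turns scores into class probabilities."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(rng):
    """Create the hidden layers and the output layer with random weights."""
    sizes = [NUM_FEATURES] + HIDDEN_LAYERS + [NUM_CLASSES]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(network, x):
    """Forward pass: rectifier on every hidden layer, softmax on the output."""
    for layer in network[:-1]:
        x = rectifier(dense(x, layer))
    return softmax(dense(x, network[-1]))


network = build_network(random.Random(0))
probs = predict_proba(network, [0.1, 0.5, 0.0, 1.0, 0.3])

if __name__ == "__main__":
    print([round(p, 3) for p in probs])
```

Training for the 10 epochs mentioned in the text, and distributing the computation across a cluster, are deliberately left out of this sketch.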
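
As an illustration of the four flow directions, the sketch below labels a flow from its source and destination addresses. How the dataset actually derives the direction feature is not described here, so the rule used (an address is "local" if it falls inside a configured subnet) and the `LOCAL_NETWORKS` value are assumptions.

```python
import ipaddress

# Assumption: hypothetical local address space of the monitored network.
LOCAL_NETWORKS = [ipaddress.ip_network("192.168.0.0/16")]


def is_local(address):
    """Return True if the address belongs to one of the local subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in LOCAL_NETWORKS)


def flow_direction(source, destination):
    """Label a flow as L2L, L2R, R2L, or R2R from its two endpoints."""
    src = "L" if is_local(source) else "R"
    dst = "L" if is_local(destination) else "R"
    return f"{src}2{dst}"


if __name__ == "__main__":
    print(flow_direction("192.168.1.10", "192.168.2.20"))  # both endpoints local
    print(flow_direction("192.168.1.10", "8.8.8.8"))       # local source, remote destination
```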
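
The eight flags listed for TCP occupy one bit each in the TCP header's flags byte, so a flow's flag set can be decoded with simple bit masks. The sketch below uses the standard bit positions from the TCP specification; the function names are illustrative, and how the dataset itself encodes its TCP flag features is not assumed here.

```python
# Standard bit positions of the TCP control flags in the header's flags byte.
TCP_FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
    "ECE": 0x40,
    "CWR": 0x80,
}


def decode_flags(flag_byte):
    """Return the names of all flags set in the byte, lowest bit first."""
    return [name for name, bit in TCP_FLAG_BITS.items() if flag_byte & bit]


def encode_flags(names):
    """Combine flag names into a single flags byte."""
    value = 0
    for name in names:
        value |= TCP_FLAG_BITS[name]
    return value


def all_set(flag_byte, names):
    """Return True if every named flag is set in the byte."""
    mask = encode_flags(names)
    return flag_byte & mask == mask


if __name__ == "__main__":
    print(decode_flags(0x12))  # the SYN+ACK reply of a normal handshake
    suspicious = ["FIN", "SYN", "PSH", "ACK"]
    print(all_set(encode_flags(suspicious), suspicious))
```

The `all_set` helper checks for a specific combination, such as FIN, SYN, PSH, and ACK together, which the paper singles out on the destination side.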



Fig. 2. The most important features for the aggregated dataset after applying the oversampling in deep learning

[6] B. A. Tama, A. S. Patil, and K.-H. Rhee, “An improved model of anomaly detection using two-level classifier ensemble,” in Information Security (AsiaJCIS), 2017 12th Asia Joint Conference on. IEEE, 2017, pp. 1–4.
[7] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
[8] A. Rege, Z. Obradovic, N. Asadi, E. Parker, R. Pandit, N. Masceri, and B. Singer, “Predicting adversarial cyber-intrusion stages using autoregressive neural networks,” IEEE Intelligent Systems, no. 2, pp. 29–39, 2018.
[9] H. Hindy, E. Hodo, E. Bayne, A. Seeam, R. Atkinson, and X. Bellekens, “A taxonomy of malicious traffic for intrusion detection systems,” arXiv preprint arXiv:1806.03516, 2018.
[10] A. AlEroud and G. Karabatis, “Methods and techniques to identify security incidents using domain knowledge and contextual information,” in Integrated Network and Service Management (IM), 2017 IFIP/IEEE Symposium on. IEEE, 2017, pp. 1040–1045.
[11] G. Loukas, T. Vuong, R. Heartfield, G. Sakellari, Y. Yoon, and D. Gan, “Cloud-based cyber-physical intrusion detection for vehicles using deep learning,” IEEE Access, vol. 6, pp. 3491–3508, 2018.
[12] L. Wang and R. Jones, “Big data analytics for network intrusion detection: A survey,” International Journal of Networks and Communications, vol. 7, no. 1, pp. 24–31, 2017.
[13] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Elsevier, 2011.
[14] D. Parikh and P. Tirkha, “Data mining & data stream mining - open source tools,” International Journal of Innovative Research in Science, Engineering and Technology, vol. 2, no. 10, pp. 5234–5239, 2013.
[15] Z. Najafian, V. Aghazarian, and A. Hedayati, “Signature-based method and stream data mining technique performance evaluation for security and intrusion detection in advanced metering infrastructures (AMI),” International Journal of Computer and Electrical Engineering, vol. 7, no. 2, p. 128, 2015.
[16] H. Al Najada and X. Zhu, “iSRD: Spam review detection with imbalanced data distributions,” in Information Reuse and Integration (IRI), 2014 IEEE 15th International Conference on. IEEE, 2014, pp. 553–560.
[17] R. Balasubramanian and S. Joseph, “Intrusion detection on highly imbalance big data using tree based real time intrusion detection system: effects and solutions,” Int. J. Adv. Res. Comput. Commun. Eng, vol. 5, no. 2, pp. 27–32, 2016.
[18] L. Zhang and G. B. White, “An approach to detect executable content for anomaly based network intrusion detection,” in Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International. IEEE, 2007, pp. 1–8.
[19] B. Kicanaoglu, “Unsupervised anomaly detection in unstructured log-data for root-cause-analysis,” 2015.
[20] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
[21] J. Evermann, J.-R. Rehse, and P. Fettke, “Predicting process behaviour using deep learning,” Decision Support Systems, 2017.
[22] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[23] H. Al-Najada and I. Mahgoub, “Real-time incident clearance time prediction using traffic data from internet of mobility sensors,” in Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence & Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2017 IEEE 15th Intl. IEEE, 2017, pp. 728–735.
[24] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Toward developing a systematic approach to generate benchmark datasets for intrusion detection,” Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[25] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in ICISSP, 2018, pp. 108–116.
[26] E. Ramentol, Y. Caballero, R. Bello, and F. Herrera, “SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory,” Knowledge and Information Systems, vol. 33, no. 2, pp. 245–265, 2012.
[27] A. Estabrooks, T. Jo, and N. Japkowicz, “A multiple resampling method for learning from imbalanced data sets,” Computational Intelligence, vol. 20, no. 1, pp. 18–36, 2004.
[28] V. Srinivasan, S. Suri, and G. Varghese, “Packet classification using tuple space search,” in ACM SIGCOMM Computer Communication Review, vol. 29, no. 4. ACM, 1999, pp. 135–146.
