
Proc. of the International Conference on Electrical, Computer and Energy Technologies (ICECET 2022)
20-22 July 2022, Prague-Czech Republic
2022 International Conference on Electrical, Computer and Energy Technologies (ICECET) | 978-1-6654-7087-2/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICECET55527.2022.9873072

A CNN-based Attack Classification versus an AE-based Unsupervised Anomaly Detection for Intrusion Detection Systems

Jean Claude Joseph Badji
Lab. d’Algèbre, Cryptographie, Codes et Applications
Lab. LACCA, Dept. Informatique, UFR SAT
Université Gaston Berger, BP 234, Saint-Louis, Sénégal
E-mail: [email protected]

Cherif Diallo
Lab. d’Algèbre, Cryptographie, Codes et Applications
Lab. LACCA, Dept. Informatique, UFR SAT
Université Gaston Berger, BP 234, Saint-Louis, Sénégal
E-mail: [email protected]

Abstract: As the cyber threat landscape expands, attacks are becoming stealthier, faster and smarter. Traditional security techniques therefore become ineffective against polymorphic threats and zero-day attacks. Thus, research is increasingly oriented towards AI. Machine Learning (ML) quickly showed its limits due to the amount of data and the high dimensionality imposed by the Big Data era, and the workload of manual feature extraction. IDS based on ML have thus shown poor performance, and deep learning could therefore be a possible solution. In this paper, we propose traffic classification by a one-dimensional CNN and anomaly detection by a deep/stacked autoencoder (DAE). The evaluation of the proposed models shows that the false alarm rate (FAR) and the false negative rate (FNR) are very low. Additionally, the DAE model works well against almost any attack. Finally, both models show high performance.

Index Terms: intrusion detection system; anomaly detection; deep learning; convolutional neural network; auto-encoder

I. INTRODUCTION

Nowadays, because of the Internet and related technologies, organizations are exposed to threats, intrusions, and attacks, and cybersecurity has become the main concern. To address this exposure, security is built, with defined security policies, by deploying firewalls at the perimeter and intrusion detection systems (IDS) inside the network. Intrusion detection is a set of mechanisms and practices used for detecting errors that may lead to security failures and for diagnosing intrusions and attacks; an IDS is its implementation. IDS have been using attack signatures and anomaly-based techniques for detection [1], [2]. A signature-based IDS compares the traffic data to a signature database of known attacks. More recent IDS use anomaly detection based on different kinds of profiles to detect deviations from these defined profiles.

In practice, despite their utility, IDS suffer from many problems. The first problem is the rate of false negatives (FN), where attacks are flagged as normal traffic, due to the inability of the IDS to detect unknown, zero-day attacks. The second problem, mainly seen in anomaly-based IDS, is the high rate of false positives (FP). In addition, an IDS has a limited capacity for analysis in terms of throughput. To address these drawbacks, research has turned to machine learning (ML) techniques to improve the detection rate of both known and zero-day attacks and to reduce the FP and FN. But due to the amount and the high dimensionality of data, some ML models such as SVM [2] showed their limits. Deep Learning (DL) remains the ideal candidate to:
• handle the high dimensionality and the amount of data;
• learn and extract by itself the features of network traffic and their correlation;
• and infer accordingly, based on labels or on a threshold.

Some recent studies have shown that DL could be a reliable solution for IDS. Most of them proposed various interesting DL models but used datasets such as KDD and NSL-KDD that are outdated. The Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC), in a collaborative project, provide the most up-to-date dataset. The CSE-CIC-IDS2018 is a big data dataset with 15 traffic classes (one benign and 14 different attacks). There are only a few studies with the CSE-CIC-IDS2018, and some of them have shown poor detection performance because of imbalanced classes in the available datasets.

In this study, we propose an unsupervised anomaly detection by a deep auto-encoder (DAE) and a multi-class traffic classification by a convolutional neural network (CNN). The CSE-CIC-IDS2018 dataset is studied in more depth through an exploratory data analysis, and a general pre-processing technique is used to clean data before training and testing the models. To address the data imbalance problem, a random oversampling technique is used for multi-class traffic classification. The models are trained with much more data than in similar previous studies.

The outline of this paper is as follows. In Section II, we investigate DL-based IDS, specifically the CNN-based and the DAE-based studies with CSE-CIC-IDS2018. In

978-1-6654-7087-2/22/$31.00 © 2022 IEEE


Authorized licensed use limited to: WUHAN UNIVERSITY OF TECHNOLOGY. Downloaded on March 19,2024 at 06:58:32 UTC from IEEE Xplore. Restrictions apply.
Section III, we give details about the proposed methods. In Section IV, the exploratory data analysis, the pre-processing for each method and the results of the experiments are detailed. The results are discussed in Section V. Section VI concludes this paper.

II. RELATED WORK

We found in the literature numerous recent studies on DL-based intrusion detection, zero-day detection, anomaly detection, multi-class classification, etc. To bound the scope of this study, we focus on studies that used the most recent dataset (CSE-CIC-IDS2018).

In [3], authors proposed a CNN-based model for intrusion detection. They trained a 2-layer CNN with 79 data features from the CSE-CIC-IDS2018 dataset subdivided into 10 sub-datasets (subdivision per day). They trained and validated their model with each sub-dataset in image format. Before pre-processing, their model performs well in classifying most attack traffic. However, the model presents very poor performance on web attacks, infiltration, DoS GoldenEye and DoS Slowloris. The proposed model clearly struggles with class imbalance. They also present results after pre-processing, which showed improvements.

Authors of [4] proposed a CNN model based on the inception model of the Google team. The 9-layer CNN is evaluated with the KDDCUP99 dataset, with data encoded and scaled. They obtain 94.11%, 2.18% and 93.21% for Accuracy, fall-out and detection rate respectively. Their CNN needs to be evaluated on more up-to-date datasets. As they note themselves, the model also needs improvement in terms of false alarm rate.

In [5], authors trained a 3-layer 2D-CNN with the KDD99 dataset. They used little data, i.e., just 10% of the dataset, and Accuracy as the only metric. The model achieved 99.23%, but after 800 epochs. The authors did not give details about FP and FN.

Authors of [6] fed 1D-CNNs of 1 to 3 layers with the UNSW NB15 dataset. They also combined a 1D-CNN and an LSTM network. They chose 43 features from this unbalanced dataset to feed their models. They obtained Precision values of 76.76%, 77.99%, 80.33% and 79.57% for the 1D-CNN (1 to 3 layers) and the 1D-CNN+LSTM respectively, and exceptional Recall percentages between 98% and 99.61% for their different configurations. We note a high rate of FP and a very low rate of FN. The authors used a random oversampling technique to balance the data and repeated the experiment with the same models. The experiments with balanced data show improvements in terms of Precision (about 87.53% for their best model), reducing the false alarm rate. But the false negative rate increases: the Recall remains very good but decreases by about 2%. This study illustrates well the problem of data imbalance for deep learning.

In [7], authors proposed a CNN-based network IDS (NIDS) against DoS attacks. For their study, they used the KDD and CSE-CIC-IDS2018 datasets reshaped into RGB color and grayscale image format to fit 18 different configurations of CNN. They obtain interesting results, with a high Accuracy of 99% on both datasets. However, their evaluation process is not homogeneous, as they did not use the same metrics for binary classification (F1-score) and multi-class classification (only Accuracy, which is not enough). They also do not give detailed results of the experiments with CSE-CIC-IDS2018.

In [8], authors proposed the same CNN model as the previous authors but with a different pre-processing method, as they discretize features before converting data into images. They compared their proposal with a 1D-CNN and showed that discretization reduces the training time of models. The authors used the NSL-KDD and the CSE-CIC-IDS2018 datasets. On the CSE-CIC-IDS2018 dataset, their results average 98% (Accuracy, Precision, Recall) when they discretize only continuous features and 97% when they discretize both continuous and discrete features. However, their proposed framework showed poor performance on the NSL-KDD dataset.

In [9], authors proposed an intrusion detection model based on auto-encoder and XGBoost. Their model structure has a pre-processing module with data cleansing, oversampling, feature selection by Random Forest and feature grouping, and an auto-encoder module trained with normal data and tested with normal and abnormal data. The threshold and the reconstruction error obtained are used by the XGBoost classifier for detection. Their proposed model has very good performance, including the Recall. However, the model struggles to detect infiltration and SQL injection attacks launched from inside the network.

Authors in [10] proposed ZED-IDS and its improvement 2L-ZED-IDS [11], based on a stacked auto-encoder (SAE). ZED-IDS is a single-layer SAE that performs well on the two most recent datasets but raises many false alarms (4.9% as the highest rate, for DoS attacks). To decrease the false alarm rate (FAR), the authors proposed a 2-layer 2L-ZED-IDS. In this last model, they use the output of the first SAE in the first layer to train and evaluate two other SAE in the second layer. One is trained with the benign traffic and the FN, the other with the attacks and the FP, as classified by the first layer. Their results show that the FAR dropped by more than 50%. The other performance metrics also increased, but slightly.

III. METHODOLOGY

This paper proposes a study of the traffic classification performance of a 1D CNN on the CSE-CIC-IDS2018 dataset. In parallel, we propose a study of anomaly detection on the same dataset by a DAE. In this section, we describe the approaches used for data pre-processing to train and evaluate the proposed models. We begin with an exploratory data analysis (EDA) that helps us with data pre-processing. We bring the data to an appropriate shape to train and evaluate each model. After the description of the EDA and pre-processing, more details about the proposed models are given.

A. EDA and pre-processing

An EDA is the starting point and an important step in similar studies. An EDA on CSE-CIC-IDS2018 gives us a better understanding of the data, in order to know how to exploit them well. This step is often neglected but plays an essential role in driving a good data pre-processing. The EDA goes through shape analysis, content analysis, traffic distribution analysis, feature analysis, and correlation.

[Fig. 1. 1D CNN architecture]

In the AI field, pre-processing largely determines model performance; [3] illustrates well its impact on the performance of their CNN model. Before we feed models with data, the pre-processing goes through: finding NaN, duplicated and infinite values to eliminate or to impute; encoding labels; separating data per label and random oversampling in the case of attack classification; splitting the dataset into train set, validation set, and test set; eliminating useless features; and normalizing data.
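
The pre-processing steps listed above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' pipeline (the paper reports using pandas and scikit-learn): the row format `(features, label)`, every function name, the split ratios and the min-max scaling to [0, 1] are assumptions made for the example, and feature elimination is omitted.

```python
import math
import random

def clean(rows):
    """Drop rows holding NaN or infinite features, then drop exact duplicates."""
    seen, kept = set(), []
    for features, label in rows:
        if not all(math.isfinite(v) for v in features):
            continue  # math.isfinite rejects NaN as well as +/-inf
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            kept.append((list(features), label))
    return kept

def encode_labels(rows):
    """Map each label string to an integer id (sorted for determinism)."""
    mapping = {name: i for i, name in enumerate(sorted({lab for _, lab in rows}))}
    return [(f, mapping[lab]) for f, lab in rows], mapping

def random_oversample(rows, seed=0):
    """Resample minority classes with replacement up to the majority size."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def split(rows, train=0.6, val=0.2, seed=0):
    """Shuffle and cut into train/validation/test lists (ratios are assumed)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    a = int(len(rows) * train)
    b = a + int(len(rows) * val)
    return rows[:a], rows[a:b], rows[b:]

def fit_min_max(train_rows):
    """Learn the per-feature minimum and maximum on the training set only."""
    columns = list(zip(*(f for f, _ in train_rows)))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, lo, hi):
    """Scale features with the learned bounds; constant features map to 0."""
    scaled = []
    for features, label in rows:
        values = [(v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(features, lo, hi)]
        scaled.append((values, label))
    return scaled
```

The order of the calls mirrors the order of the steps in the text: `clean`, `encode_labels`, `random_oversample` (attack classification only), `split`, then `fit_min_max` on the training part and `apply_min_max` on all three parts.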

B. Proposed Models

For attack classification, the proposed CNN, adapted from [12], has a three-level stack of Conv1D and MaxPooling1D layers. Each level of the stack has:
• A Conv1D with 64 filters, a kernel size of 6 and an input shape of (No. of features, 1). The convolution layer extracts the most complex features of patterns from data;
• A Batch Normalization which helps the network to converge faster;
• A MaxPooling1D with a pool size of 3 and a stride of 2, which selects the max value of the region where the filter is applied. The output of the Conv1D is compressed to a smaller size by the max-pooling layer.

Then comes the flatten layer, which reshapes data from multiple dimensions to one dimension, followed by fully connected layers with 64 units and an output layer with as many units as there are labels (figure 1). The classification is made by the fully connected layers. We use the SoftMax activation function in the output layer and ReLU in all other layers. The batch normalization layer in each convolutional level is also included to prevent overfitting. Since Conv1D and MaxPooling1D let us feed the model with 1D-shaped data, we choose not to add a supplementary pre-processing step that reshapes data into image format and increases processing time.

For anomaly detection, we use a DAE in unsupervised learning. In the encoding phase, by reducing data dimensionality, the DAE learns the most relevant features of the train set at the bottleneck. In the decoding phase, with the representation from that layer, the DAE tries to reconstruct data while minimizing the loss, or reconstruction error (RE), at each epoch. The RE is calculated with the Mean Squared Error (MSE) or the Mean Absolute Error (MAE). In summary, the DAE compresses data and builds its own latent representation, then reconstructs data to its original representation with minimal error [13]. The DAE then defines a threshold between normal and abnormal data. The threshold is calculated with equation (1).

\[ \text{threshold} = \mu(L) + n\,\sigma(L) \quad \text{or} \quad \text{threshold} = \max(L) \tag{1} \]

where µ(L) is the mean of the training loss L, σ(L) is the standard deviation of L, and n = 1, 2 or 3.

The DAE, adapted from [14], has an encoder with 5 hidden layers, a bottleneck, and a decoder with 5 hidden layers. We insert two batch normalization layers between layers in the encoder and two in the decoder to accelerate convergence and prevent overfitting. The input and output dimensions are equal to the number of features (figure 2).

[Fig. 2. Deep Autoencoder Architecture]

We use Adam as the optimizer in the two models, categorical cross-entropy as the loss function for CNN attack classification, and MSE for DAE anomaly detection.

IV. EXPERIMENTS

We run experiments with the two models (figures 3 and 4) using almost all the data in the dataset. We train and test the CNN model against each category of attack separately. In the case of anomaly detection, we train and validate the DAE with the normal traffic before testing it with abnormal traffic. This section describes the different steps we follow to train and test the models and presents the results of the experiments.

A. EDA

We conduct an EDA to explore the dataset in terms of shape and content. The main motivation for this analysis is to answer the following questions: (i) What is the shape of the data? (ii) What type of data do we find in the dataset? (iii)
How are they distributed? (iv) What are the features? (v) How are they correlated?

In the CSE-CIC setup, 50 machines attack a victim organization which has 5 departments and includes 420 machines and 30 servers. The dataset, of more than 450 GB, is made of network traffic and system log captures of each machine [15]. Alongside the raw data, CICFlowMeter-V3 is used to extract 80 features from the captured traffic. We thus obtain a set of 10 subsets, captured on 10 different days, made of 16,232,943 samples and 84 features. Among these 84 features, only 80 are common to all 10 subsets, which is why proposed pre-processing methods commonly remove the remaining 4 features. In this dataset, we find features of benign network traffic and of 14 different recent attacks. Information about features is provided in Table I.

TABLE I: CSE-CIC-IDS2018 FEATURES INFORMATION

Features     Features information
0-3, 82      Network connections
4-15         Networks packets
16-21        Network flows
22-44        Statistic Network flow
45-62        Packets content
63-66        Subflow packets
67-78        Traffic features
79           Label
80-83        Flow ID, source and dest IP addresses

The full description of each feature is listed in [15]. Table II shows details about the distribution of the different attacks in the dataset. It is a very unbalanced dataset, with 13,484,708 benign samples, i.e., about 83%. This data imbalance could lead to a generalization problem.

TABLE II: THE NUMBER OF FLOWS PER ATTACK TYPE (CSE-CIC-IDS2018)
(The rate is given once per attack family and covers the consecutive rows of that family.)

Traffic                      Number of samples   Rate
Benign                       13484708            83.07%
DDOS attack-HOIC             686012              7.79%
DDoS attacks-LOIC-HTTP       576191
DDoS attacks-LOIC-UDP        1730
DoS attacks-Hulk             461912              4.03%
Dos attacks-SlowHTTPTest     139890
Dos attacks-GoldenEye        41508
Dos attacks-Slowloris        10990
Botnet                       286191              1.76%
FTP-BruteForce               193360              2.35%
SSH-BruteForce               187589
Infiltration                 161934              0.99%
Brute Force -Web             611                 0.006%
Brute Force -XSS             230
SQL Injection                87
Total                        16232943

The feature names and information clearly suggest that there might be a correlation between features. For example, the minimal packet length depends on the protocol, and the packet header length depends on the total packet length and the protocol. The correlation analysis is done to find the redundant features and eliminate them. A deeper analysis is done in [16] on the dataset to determine which features most influence the training and evaluation processes.

B. Pre-processing

In the EDA step, we also check the percentage of NaN and duplicated values. The dataset contains less than 1% of NaN and 3.48% of duplicated values. We begin the pre-processing by removing NaN and duplicated values, as they represent a very small portion; then we replace the infinite values with NaN before removing them. We encode labels using the label encoder function and separate data per label. At this stage, the pre-processing differs for each model.
• For the DAE, we isolate the attack traffic for test purposes and split the benign traffic into 50% for the train/validation sets and 50% for the test set. The targets are not used here because we perform unsupervised anomaly detection.
• For the CNN model, we perform a random oversampling to obtain an equal distribution for all classes, as DL models are known to be vulnerable to class imbalance. We split data into train, validation, and test sets while extracting targets and converting them to categorical targets.

[Fig. 3. Anomaly detection experiment workflow (traffic; pre-processing: data cleansing, data separation; train/val set and test set; DAE model; inference)]

After these steps, we create a pipeline to normalize data with the Normalizer function and scale them with the MinMaxScaler function. The utility of normalization is explained well in [17]. We finally reshape data to (data, 1) format after data normalization for the CNN model. We train the models with 78 of the 80 features. Label and timestamp are removed for good generalization.

C. Results

We performed the experiments using TensorFlow with Keras, NumPy, pandas, scikit-learn, and the Python 3 language. The environment is an HP EliteBook 840 G1 with a Core™ i5-4310U processor (4x 2.6 GHz), RAM 12 GB + 60 GB fixedly

allocated from the hard disk as a paging file, and a 160 GB SSD. We performed experiments of 50 iterations, with an EarlyStopping mechanism with a patience of 5 or 15 to stop training in case of overfitting, and with batch sizes of 32, 128, 256, and 512. We searched for the best hyperparameters this way because the resources are insufficient to find them automatically with Keras Tuner [18].

[Fig. 4. Attack classification experiment workflow (CSE-CIC-IDS2018; separation per attack type; for each attack type: data cleansing, random equal oversampling, train/val/test set; 1D CNN; classification)]

The objective of this study is to find a way to minimize the FAR and increase the detection rate by reducing the FNR. In the traffic classification experiments, we use metrics like Precision, Recall, and F1-score, which give more details about FP and FN than Accuracy. The Precision is the ratio of the number of correctly classified attacks to the total number of samples classified as attacks. The Recall measures the model's ability to correctly identify the true positives. The F1-score is the harmonic mean of the Precision and the Recall.

\[ F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{2} \]

\[ Precision = \frac{TruePositives}{TruePositives + FalsePositives} \tag{3} \]

\[ Recall = \frac{TruePositives}{TruePositives + FalseNegatives} \tag{4} \]

1) DAE model vs. attacks - an anomaly detection: We run a regression experiment with the DAE, as it determines a threshold between normal data and abnormal data. Figure 5 shows the learning curves of the DAE. The prediction function calculates the reconstruction error and compares it with the threshold: if the error of a sample is above the threshold, the sample is detected as an anomaly. We compute the threshold with a variable parameter “n” according to (1). We obtain DAE anomaly detection results for 4 different thresholds, as reported in Table IV. The DAE detects almost all attacks as anomalies with an acceptable to very good percentage, except DDoS HOIC and DDoS-LOIC-UDP, with respectively 0% and 1.098% of detected anomalies. The detected anomalies of DDoS HOIC increase to 75.503% and 100% with the other thresholds, but the DDoS-LOIC-UDP attack remains undetectable to the DAE. The DoS Slowloris detection also presents a low percentage of detected anomalies (max 28.128%). The DAE also detects 30.572% anomalies in the normal traffic used in the testing process, as shown in Table IV.

[Fig. 5. Learning Curves of DAE]

2) CNN model vs. attacks - a traffic classification: Rather than training the CNN model with all the data, we found it more interesting to study the behavior of the CNN model by type of attack. The results show that the CNN model is very effective, with very good performance against botnets, DDoS attacks, DoS attacks, and FTP and SSH brute force attacks. Against these attacks, the results show very low FAR and FNR. On the other hand, this CNN has average to good performance (92.75% of Precision, 92.25% of Recall) against web attacks, with a FAR of 1.225% but an acceptable FNR. The model especially struggles against infiltration attacks, with a very low Recall of 28%. More details about the results are given in Table III.

TABLE III: CLASSIFICATION REPORT WHEN CNN TRAINED PER ATTACK TYPE
(FAR and FNR are reported once per experiment, i.e., per block of rows.)

Traffic              Precision   Recall   F1-score   FAR        FNR
Benign               100%        100%     100%       0.018%     0.122%
Bot                  100%        100%     100%

Benign               100%        99%      99%        1.225%     0.081%
Brute Force Web      86%         86%      86%
Brute Force XSS      97%         85%      90%
SQL injection        88%         99%      93%

Benign               100%        100%     100%       0.336%     0.005%
DDoS HOIC            100%        100%     100%
DDoS LOIC UDP        100%        100%     100%
DDoS LOIC HTTP       100%        100%     100%

Benign               100%        100%     100%       0.018%     0%
DoS GoldenEye        100%        100%     100%
DoS Hulk             100%        100%     100%
DoS SlowHTTPTest     100%        100%     100%
DoS Slowloris        100%        100%     100%

Benign               100%        100%     100%       0.022%     0.003%
FTP Brute Force      100%        100%     100%
SSH Brute Force      100%        100%     100%

Benign               54%         87%      67%        13.070%    72.433%
Infiltration         68%         28%      39%
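
To make the metrics of equations (2) to (4) concrete, the sketch below computes them from raw confusion counts, together with FAR and FNR. The paper does not spell out its FAR and FNR formulas, so the definitions used here (FAR = FP / (FP + TN), FNR = FN / (FN + TP)) are the common ones and are an assumption; the function names and the example counts are also hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def detection_metrics(tp, fp, tn, fn):
    """Compute the report metrics from binary confusion counts.

    "Positive" means attack. Precision, Recall and F1 follow equations
    (2) to (4); FAR and FNR use assumed, commonly used definitions.
    """
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    far = safe_div(fp, fp + tn)  # assumed: share of benign flows raising an alarm
    fnr = safe_div(fn, fn + tp)  # assumed: share of attack flows that are missed
    return {"precision": precision, "recall": recall, "f1": f1,
            "far": far, "fnr": fnr}

# Hypothetical counts, not taken from the paper's experiments.
metrics = detection_metrics(tp=90, fp=10, tn=880, fn=20)
```

With these definitions FNR equals 1 - Recall, which is why a low Recall, as for infiltration in Table III, goes together with a high FNR.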

TABLE IV: PERCENTAGE OF DETECTED ANOMALIES PER THRESHOLD

Traffic              Threshold (n=1)   Threshold (n=2)   Threshold (n=3)   Threshold (max loss)
Benign test          32.572%           32.572%           32.572%           100%
Bot                  99.998%           100%              100%              100%
Brute Force Web      99.836%           99.836%           100%              100%
Brute Force XSS      100%              100%              100%              100%
DDoS HOIC            0%                75.503%           75.503%           100%
DDoS LOIC UDP        1.098%            1.156%            1.156%            8.035%
DDoS LOIC HTTP       99.876%           99.878%           99.878%           99.894%
DoS GoldenEye        98.280%           98.664%           98.794%           100%
DoS Hulk             97.434%           97.447%           99.474%           100%
DoS SlowHTTPTest     68.970%           83.121%           97.276%           100%
DoS Slowloris        25.318%           26.991%           28.128%           100%
FTP brute Force      58.265%           70.386%           85.193%           100%
Infiltration         97.110%           97.841%           98.241%           100%
SQL injection        94.252%           94.253%           100%              100%
SSH brute Force      79.974%           79.975%           99.818%           100%
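
The percentages in Table IV come from comparing per-sample reconstruction errors with a threshold derived from the training loss, as in equation (1). The sketch below shows one way such thresholds and detection percentages could be computed. It is an illustration only: the function names, the use of the population standard deviation and the toy numbers are assumptions, not values or code from the paper.

```python
from statistics import mean, pstdev

def loss_thresholds(train_losses, ns=(1, 2, 3)):
    """Return the candidate thresholds of equation (1).

    mu + n * sigma for each n, plus the maximum training loss.
    The population standard deviation is an assumption; the paper
    does not say which estimator it uses.
    """
    mu = mean(train_losses)
    sigma = pstdev(train_losses)
    thresholds = {f"n={n}": mu + n * sigma for n in ns}
    thresholds["max"] = max(train_losses)
    return thresholds

def detected_percentage(errors, threshold):
    """Percentage of samples whose reconstruction error exceeds the threshold."""
    flagged = sum(1 for e in errors if e > threshold)
    return 100.0 * flagged / len(errors)

# Toy values: per-sample reconstruction errors (e.g. MSE) on benign training
# data and on one attack class. They are made up for the example.
train_losses = [0.1, 0.2, 0.3, 0.4]
attack_errors = [0.05, 0.5, 0.9, 0.3]
thresholds = loss_thresholds(train_losses)
rates = {name: detected_percentage(attack_errors, t)
         for name, t in thresholds.items()}
```

Applying `detected_percentage` to the benign test set instead of an attack class gives the share of false alarms for the same threshold, which corresponds to the "Benign test" row of Table IV.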

V. DISCUSSION

In the field of cybersecurity, both Recall and Precision are important: we need a Recall as high as possible, but we also need a high Precision, as we do not want a huge workload spent chasing ghost attacks. This is why cybersecurity researchers look for a good F1-score in AI-based IDS. In addition to the FAR, the traffic classification experiment described above shows a few attack misclassifications, where the model predicts one attack as another. This kind of mistake is not severe, as it leads to a justified alarm anyway: an attack is an attack. These experiments allowed us to see that CNNs are more sensitive to certain attacks than to others (see Table III). Experiments conducted in this way are closer to reality than taking all data to train the model. In a specific attack scenario, benign traffic is correlated to attack traffic, and it would be rare for an attacker to run many different types of attacks at the same time on the same victim. This is why we found it more interesting to train models by attack type and to implement, in an IDS, models trained per attack type. A CNN trained with a specific attack type is more accurate. To illustrate that, we trained the same CNN model with the whole dataset. Results given in Table V show that the model trained with the whole dataset presents poorer performance in terms of F1-score and FAR. Attacks like infiltration blur the model, which makes more mistakes in classifying benign traffic than the previous models. This also affects the classification of DoS attacks, as shown in figure 6. In this confusion matrix, the number of FP is the sum of the row of benign traffic in true classes, except for the diagonal value, and the number of FN is the sum of the column of benign traffic in predicted classes, except for the diagonal value.

[Fig. 6. Confusion Matrix for CNN vs. All Attacks]

Anomaly detection is all about the threshold, as we can see in the results in Table IV. The model needs to define the right threshold to prevent a high FAR and FNR. We run multiple experiments by varying n from 1 to 3 and by setting the threshold to the max of the training loss. The percentage of detected anomalies in attack traffic effectively increases. Thus, the thresholds with n = 2 and n = 3 appear to be better suited for detecting attacks. The percentages of detected anomalies for DDoS LOIC UDP and DoS Slowloris remain low with the different thresholds. With the last threshold, the DAE effectively detects all anomalies except DDoS LOIC UDP, but raises many alarms, as it detects all the benign traffic in the test set as anomalies. DoS Slowloris uses slow, partial requests to use up server resources, as the server will never be able to release any of the open partial connections. This attack tries to mimic normal traffic, and this is why the DAE struggles to detect it. The DDoS LOIC UDP attack is a flood of UDP packets sent by thousands of coordinated users (e.g., a botnet) to the same victim. Since the traffic from a single attacker in this attack is ‘normal’, that would explain why the DAE cannot detect it.

Both models can handle unknown data as they present

promising performance in validation and prediction. Zero-day attacks could be detected. Compared to the DAE, the CNN model gives more details about the classification and can make the life of network administrators easier, as the DAE only raises alarms when anomalies are detected. In comparison, the CNN easily detects some attacks like DDoS-LOIC UDP and Slowloris, while the DAE really struggles against them. On the other hand, the DAE effectively detects web attacks and infiltration, while the CNN struggles. Each DL model has its own way to represent traffic features. A combination of these two models could be interesting.

TABLE V: CLASSIFICATION REPORT WHEN CNN TRAINED WITH THE WHOLE DATA
(FAR and FNR are reported once for the whole model.)

Traffic              Precision   Recall   F1-score
Benign               75%         43%      55%
Bot                  100%        100%     100%
Brute Force Web      85%         81%      83%
Brute Force XSS      89%         93%      91%
DDoS HOIC            100%        100%     100%
DDoS LOIC UDP        100%        100%     100%
DDoS LOIC HTTP       84%         100%     91%
DoS GoldenEye        100%        84%      91%
DoS Hulk             99%         100%     99%
DoS SlowHTTPTest     68%         71%      70%
DoS Slowloris        99%         100%     100%
FTP Brute Force      70%         67%      68%
Infiltration         62%         87%      73%
SQL injection        90%         88%      89%
SSH Brute Force      100%        100%     100%
FAR: 79.818%         FNR: 1.111%

VI. CONCLUSION

We proposed in this paper two models: traffic classification by a CNN and anomaly detection by a DAE. We explored the processed part of CSE-CIC-IDS2018 and extracted information about content, features, and correlation. This helped us to exploit the data well when pre-processing the dataset to train the models. Both models perform quite well on CSE-CIC-IDS2018: the results showed that the DAE can detect almost all attacks effectively. The CNN also showed exceptional results against almost all attacks. By performing well on the validation set and the test set, they show abilities to detect zero-day attacks, with some exceptions. Indeed, some attacks such as infiltration and DDoS-LOIC UDP remain stealthy for the CNN and the DAE respectively, because they have benign traffic properties. The CNN trained per attack type presented greater performance than the CNN trained with all data. The results of the CNN per attack type showed improvement in terms of FAR and FNR. We wanted to illustrate in this study that a single model is not effective in detecting all the attacks and cannot be implemented alone in an IDS. Instead of using a big model with millions of parameters, we can use multiple small models, each for a specific purpose. The real challenge of the DAE model is reducing its high FAR. This problem will be addressed in future work.

REFERENCES

[1] L. H. Yeo, X. Che, and S. Lakkaraju, “Modern intrusion detection systems,” CoRR, vol. abs/1708.07174, 2017.
[2] K. Ansam, G. Iqbal, V. Peter, and K. Joarder, “Survey of intrusion detection systems: techniques, datasets and challenges,” Cybersecurity, vol. 2, p. 20, jul 2019.
[3] J. Kim, Y. Shin, and E. Choi, “An intrusion detection model based on a convolutional neural network,” Journal of Multimedia Information System, vol. 6, no. 4, pp. 165–172, 2019.
[4] L. Yong and Z. Bo, “An intrusion detection model based on multi-scale cnn,” in 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), pp. 214–218, 2019.
[5] R. U. Khan, X. Zhang, M. Alazab, and R. Kumar, “An improved convolutional neural network model for intrusion detection in networks,” in 2019 Cybersecurity and Cyberforensics Conference (CCC), pp. 74–77, 2019.
[6] M. Azizjon, A. Jumabek, and W. Kim, “1d cnn based network intrusion detection with normalization on imbalanced data,” in 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 218–224, 2020.
[7] J. Kim, J. Kim, H. Kim, M. Shim, and E. Choi, “Cnn-based network intrusion detection against denial-of-service attacks,” Electronics, vol. 9, no. 6, 2020.
[8] J. Yoo, B. Min, S. Kim, D. Shin, and D. Shin, “Study on network intrusion detection method using discrete pre-processing method and convolution neural network,” IEEE Access, vol. 9, pp. 142348–142361, 2021.
[9] Y. Kang, M. Tan, D. Lin, and Z. Zhao, “Intrusion detection model based on autoencoder and XGBoost,” Journal of Physics: Conference Series, vol. 2171, p. 012053, jan 2022.
[10] M. Catillo, M. Rak, and U. Villano, “Discovery of dos attacks by the ZED-IDS anomaly detector,” J. High Speed Networks, vol. 25, no. 4, pp. 349–365, 2019.
[11] R. M. Catillo Marta and V. Umberto, “2l-zed-ids: A two-level anomaly detector for multiple attack classes,” in Web, Artificial Intelligence and Network Applications (L. Barolli, F. Amato, F. Moscato, T. Enokido, and M. Takizawa, eds.), (Cham), pp. 687–696, Springer International Publishing, 2020.
[12] TensorFlow, “Tensorflow 2 quickstart for experts - cnn.” https://www.tensorflow.org/tutorials/quickstart/advanced, mar 2018.
[13] J. Jordan, “Introduction to autoencoders.” https://www.jeremyjordan.me/autoencoders/, mar 2018.
[14] TensorFlow, “Intro to autoencoders - third example: Anomaly detection.” https://www.tensorflow.org/tutorials/generative/autoencoder, apr 2022.
[15] “Cse-cic-ids2018 on aws.” https://www.unb.ca/cic/datasets/ids-2018.html.
[16] https://github.com/cstub/ml-ids/tree/master/notebooks/02 exploratory-data-analysis.
[17] “Normalizing your data (specifically, input and batch normalization).” https://www.jeremyjordan.me/batch-normalization/.
[18] T. O’Malley, E. Bursztein, J. Long, F. Chollet, H. Jin, L. Invernizzi, et al., “Kerastuner.” https://github.com/keras-team/keras-tuner, 2019.
