A CNN-based Attack Classification Versus An AE-based Unsupervised Anomaly Detection For Intrusion Detection Systems
of the International Conference on Electrical, Computer and Energy Technologies (ICECET 2022)
20-22 July 2022, Prague-Czech Republic
Abstract—As the cyber threat landscape expands, attacks are becoming stealthier, faster and smarter. Traditional security techniques therefore become ineffective against polymorphic threats and zero-day attacks. Thus, research is increasingly oriented towards AI. Machine Learning (ML) quickly showed its limits due to the amount of data and the high dimensionality imposed by the Big Data era, and the workload of manual feature extraction. IDS based on ML have thus shown poor performance, and Deep Learning could therefore be a possible solution. In this paper, we propose traffic classification by a one-dimensional CNN and anomaly detection by a deep/stacked autoencoder (DAE). The evaluation of the proposed models shows that the false alarm rate (FAR) and the false negative rate (FNR) are very low. Additionally, the DAE model works well against almost any attack. Finally, both models show high performance.
Index Terms—intrusion detection system; anomaly detection; deep learning; convolutional neural network; auto-encoder;

I. INTRODUCTION
Nowadays, organizations are exposed to threats, intrusions, and attacks because of the Internet and related technologies, and cybersecurity has therefore become a main concern. To address this exposure, security is built on defined security policies by deploying firewalls at the perimeter and intrusion detection systems (IDS) inside the network. Intrusion detection is a set of mechanisms and practices used to detect errors that may lead to security failures and to diagnose intrusions and attacks; an IDS is its implementation. IDS have been using attack signatures and anomaly-based techniques for detection [1], [2]. A signature-based IDS compares the traffic data to a signature database of known attacks. More recent IDS use anomaly detection based on different kinds of profiles to detect deviations from these defined profiles.
In practice, despite their utility, IDS suffer from many problems. The first problem is the rate of false negatives (FN), where attacks are flagged as normal traffic, due to the inability of the IDS to detect unknown, zero-day attacks. The second problem, mainly seen in anomaly-based IDS, is the high rate of false positives (FP). Added to this, the IDS has a limited capacity for analysis in terms of throughput. To address these drawbacks, research has been oriented towards machine learning (ML) techniques to improve the detection rate of both known and zero-day attacks and to reduce the FP and FN. But due to the amount and the high dimensionality of data, some ML models like SVM [2] showed their limits. Deep Learning (DL) remains the ideal candidate to:
• handle the high dimensionality and the amount of data;
• learn and extract by itself the features of network traffic and their correlation;
• and infer accordingly, based on labels or a threshold.
Some recent studies have shown that DL could be a reliable solution for IDS. Most of them proposed various interesting DL models but used outdated datasets such as KDD and NSL-KDD. The Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC), in a collaborative project, provide the most up-to-date dataset. The CSE-CIC-IDS2018 is a big data dataset with 15 traffic classes (one benign and 14 different attacks). There are only a few studies with the CSE-CIC-IDS2018, and some of them have shown poor detection performance because of imbalanced classes in the available datasets.
In this study, we propose an unsupervised anomaly detection by a deep auto-encoder (DAE) and a multi-class traffic classification by a convolutional neural network (CNN). The CSE-CIC-IDS2018 dataset is studied in more depth through an exploratory data analysis, and a general pre-processing technique is used to clean data before training and testing the models. To address the data imbalance problem, a random oversampling technique is used for multi-class traffic classification. The models are trained with much more data than in similar previous studies.
This paper is organized as follows. In Section II, we investigate the DL-based IDS, specifically the CNN-based and the DAE-based studies with CSE-CIC-IDS2018.
Authorized licensed use limited to: WUHAN UNIVERSITY OF TECHNOLOGY. Downloaded on March 19,2024 at 06:58:32 UTC from IEEE Xplore. Restrictions apply.
A. EDA and pre-processing
B. Proposed Models
How are they distributed? (iv) What are the features? (v) How
are they correlated?
The CSE-CIC uses 50 machines to attack a victim organization which has 5 departments and includes 420 machines and 30 servers. The dataset of over 450 GB is made of network traffic and system log captures of each machine [15]. Alongside raw data, CICFlowMeter-V3 is then used to extract 80 features [...] most recent attacks traffic data. Information about features is provided in Table I.
The full description of each feature is listed in [15]. Table II shows details about the distribution of the different attacks in the dataset. It is a very unbalanced dataset, with 13 484 708 benign samples, i.e., about 83%. This data imbalance could lead to a generalization problem.
The feature names and information clearly show that there might be a correlation between features. Obviously, the minimal packet length depends on the protocol, the packet header length depends on the total packet length and the protocol, etc. The correlation analysis is done to check for redundant features and eliminate them. A deeper analysis is done in [16] on the dataset to determine which features most influence the training and evaluation processes.

B. Pre-processing
In the EDA step, we also check the percentage of NaN and duplicated values. The dataset contains less than 1% of NaN and 3.48% of duplicated values. We begin the pre-processing by removing NaN and duplicated values, as they represent a very small portion; then we replace the infinite values with NaN before removing them too. We encode labels using the label encoder function and separate data per label. At this stage, we have different pre-processing for each model.
• For the DAE, we isolate the attack traffic for test purposes and split the benign traffic into 50% for the train/validation sets and 50% for the test set. The targets aren't used here because we go for unsupervised anomaly detection.
• For the CNN model, we perform a random oversampling to obtain an equal distribution for all classes, as DL models are known to be vulnerable to class imbalance. We split data into train, validation, and test sets while extracting targets and converting them to categorical targets.
After these steps, we create a pipeline to normalize data with the Normalizer function and scale them with the MinMaxScaler function. The utility of normalization is explained well in [17]. We finally reshape data to (data, 1) format after data normalization for the CNN model. We train the models with 78 of the 80 features. Label and timestamp are removed for good generalization.

Fig. 3. Anomaly detection experiment workflow (diagram labels: TEST SET, TRAFFIC, DAE MODEL, INFERENCE)

C. Results
We performed the experiments using TensorFlow with Keras, NumPy, Pandas, scikit-learn, and the Python 3 language. The environment is an HP EliteBook 840 G1 with a Core™ i5-4310U processor (4x 2.6 GHz), RAM 12 GB + 60 GB fixedly

TABLE I:
CSE-CIC-IDS2018 FEATURES INFORMATION
Features     Features information
0 – 3, 82    Network connections
4 – 15       Networks packets
16 – 21      Network flows
22 – 44      Statistic Network flow
45 – 62      Packets content
63 – 66      Subflow packets
67 – 78      Traffic features
79           Label
80 – 83      Flow ID, source and dest IP addresses

TABLE II:
THE NUMBER OF FLOWS PER ATTACK TYPE (CSE-CIC-IDS2018)
Traffic                     Number of samples   Rate
Benign                      13484708            83.07%
DDOS attack-HOIC            686012              7.79% (three DDoS rows combined)
DDoS attacks-LOIC-HTTP      576191
DDoS attacks-LOIC-UDP       1730
DoS attacks-Hulk            461912              4.03% (four DoS rows combined)
Dos attacks-SlowHTTPTest    139890
Dos attacks-GoldenEye       41508
Dos attacks-Slowloris       10990
Botnet                      286191              1.76%
FTP-BruteForce              193360              2.35% (two brute-force rows combined)
SSH-BruteForce              187589
Infiltration                161934              0.99%
Brute Force -Web            611                 0.006% (three web attack rows combined)
Brute Force -XSS            230
SQL Injection               87
Total                       16 232 943
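The pre-processing steps of Section B can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy DataFrame, the column names `f1`, `f2` and `Label`, the helper names `clean` and `oversample_equal`, the 80/20 split ratio and the random seeds are all hypothetical. The "(data, 1)" format is read here as a trailing channel axis, i.e. shape (samples, features, 1), and the validation split is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, Normalizer


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop NaN and duplicated rows, then turn +/-inf into NaN and drop those rows too."""
    df = df.dropna().drop_duplicates()
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    return df.reset_index(drop=True)


def oversample_equal(df: pd.DataFrame, label_col: str, seed: int = 0) -> pd.DataFrame:
    """Random oversampling: top up every class, with replacement, to the majority size."""
    target = df[label_col].value_counts().max()
    parts = []
    for _, group in df.groupby(label_col):
        missing = target - len(group)
        if missing > 0:
            extra = group.sample(n=missing, replace=True, random_state=seed)
            group = pd.concat([group, extra])
        parts.append(group)
    return pd.concat(parts, ignore_index=True)


# Toy frame standing in for the CICFlowMeter output (hypothetical columns).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "f1": rng.random(40),
    "f2": rng.random(40) * 100,
    "Label": ["Benign"] * 30 + ["Bot"] * 10,
})
raw.loc[0, "f1"] = np.nan   # one NaN row
raw.loc[1, "f2"] = np.inf   # one infinite row

df = clean(raw)
df["Label"] = LabelEncoder().fit_transform(df["Label"])  # Benign -> 0, Bot -> 1

# DAE branch: attacks are kept for testing only; benign traffic is split 50/50.
benign = df[df["Label"] == 0].drop(columns="Label")
attacks = df[df["Label"] != 0].drop(columns="Label")
dae_train, dae_test_benign = train_test_split(benign, test_size=0.5, random_state=0)

# CNN branch: balance the classes, split, one-hot encode, normalize, scale, reshape.
balanced = oversample_equal(df, "Label")
X = balanced.drop(columns="Label").to_numpy()
y = balanced["Label"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
y_train_cat = np.eye(2)[y_train]                     # categorical (one-hot) targets
scaler = make_pipeline(Normalizer(), MinMaxScaler())
X_train = scaler.fit_transform(X_train)[..., np.newaxis]
X_test = scaler.transform(X_test)[..., np.newaxis]
```

The scaler is fitted on the training split only and then reused on the test split, so no test statistics enter the scaling step.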
[Figure: CSE-CIC-IDS2018; separation per attack type; for each attack type (#1, #2, ..., #5): data cleansing, random equal oversampling, train/val/test set, 1D CNN, classification]

TABLE III:
CLASSIFICATION REPORT WHEN CNN TRAINED PER ATTACK TYPE
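The columns of the classification reports (Precision, Recall, F1-score, FAR, FNR) can all be derived from a confusion matrix. The sketch below uses the convention given in the Discussion: rows are true classes, columns are predicted classes, FP is the benign row without its diagonal entry, and FN is the benign column without its diagonal entry. The formulas FAR = FP / (FP + TN) and FNR = FN / (FN + TP) are the usual definitions and are an assumption here, since the paper's own formulas do not appear in this text. The function names and the example matrix are hypothetical.

```python
import numpy as np


def ids_rates(cm, benign: int = 0) -> dict:
    """Benign-vs-attack error rates from a multi-class confusion matrix.

    cm[i, j] is the number of flows of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tn = cm[benign, benign]            # benign flows correctly left alone
    fp = cm[benign, :].sum() - tn      # benign flows flagged as some attack
    fn = cm[:, benign].sum() - tn      # attack flows predicted as benign
    tp = cm.sum() - tn - fp - fn       # attack flows flagged as an attack of any type
    return {"FAR": fp / (fp + tn), "FNR": fn / (fn + tp)}


def per_class_report(cm):
    """Per-class precision, recall and F1-score (one value per class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)    # column sums: everything predicted as the class
    recall = tp / cm.sum(axis=1)       # row sums: everything that truly is the class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up 3-class example: benign plus two attack types.
cm = np.array([
    [90,  6,  4],   # true benign
    [ 2, 45,  3],   # true attack A
    [ 1,  5, 44],   # true attack B
])
rates = ids_rates(cm)
precision, recall, f1 = per_class_report(cm)
```

Under these definitions an attack predicted as a different attack still counts as a true positive, which matches the remark in the Discussion that such a mistake leads to the right alarm anyway.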
TABLE IV:
PERCENTAGE OF DETECTED ANOMALIES PER THRESHOLD
Traffic             Threshold (n=1)   Threshold (n=2)   Threshold (n=3)   Threshold (max loss)
Benign test         32.572%           32.572%           32.572%           100%
Bot                 99.998%           100%              100%              100%
Brute Force Web     99.836%           99.836%           100%              100%
Brute Force XSS     100%              100%              100%              100%
DDoS HOIC           0%                75.503%           75.503%           100%
DDoS LOIC UDP       1.098%            1.156%            1.156%            8.035%
DDoS LOIC HTTP      99.876%           99.878%           99.878%           99.894%
DoS GoldenEye       98.280%           98.664%           98.794%           100%
DoS Hulk            97.434%           97.447%           99.474%           100%
DoS SlowHTTPTest    68.970%           83.121%           97.276%           100%
DoS Slowloris       25.318%           26.991%           28.128%           100%
FTP brute Force     58.265%           70.386%           85.193%           100%
Infiltration        97.110%           97.841%           98.241%           100%
SQL injection       94.252%           94.253%           100%              100%
SSH brute Force     79.974%           79.975%           99.818%           100%
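How a reconstruction-loss threshold turns into a "percentage of detected anomalies" can be sketched as below. The text above does not define how n enters the threshold, so this sketch assumes the common rule threshold = mean + n * std of the training reconstruction loss (the rule used with n = 1 in the TensorFlow tutorial [14]) and flags a flow when its loss exceeds the threshold. The paper's exact rule may differ, so the output is not expected to reproduce Table IV. The losses are synthetic and the function names are hypothetical.

```python
import numpy as np


def detected_percentage(losses, threshold: float) -> float:
    """Percentage of flows whose reconstruction loss is strictly above the threshold."""
    return 100.0 * float(np.mean(np.asarray(losses) > threshold))


def candidate_thresholds(train_losses: np.ndarray) -> dict:
    """Assumed candidates: mean + n*std for n = 1..3, plus the max training loss."""
    mean, std = train_losses.mean(), train_losses.std()
    thresholds = {f"n={n}": float(mean + n * std) for n in (1, 2, 3)}
    thresholds["max loss"] = float(train_losses.max())
    return thresholds


# Synthetic reconstruction losses standing in for the DAE outputs.
rng = np.random.default_rng(0)
train_losses = rng.normal(0.02, 0.005, 10_000).clip(min=0)   # benign training traffic
attack_losses = rng.normal(0.08, 0.02, 1_000).clip(min=0)    # one attack type

thresholds = candidate_thresholds(train_losses)
for name, value in thresholds.items():
    pct = detected_percentage(attack_losses, value)
    print(f"{name}: threshold={value:.4f}, detected={pct:.3f}%")
```

Running the same function on the benign test losses gives the false alarm side of the trade-off, which is the quantity the first row of a table like Table IV reports.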
V. DISCUSSION
In the field of cybersecurity, both Recall and Precision are important: we need a Recall as high as possible, but we also need high Precision, as we don't want a huge workload trying to find ghost attacks. This is why cybersecurity researchers look for a good F1 score in AI-based IDS. In addition to FAR, the traffic classification experiment described above shows a few attack misclassifications, where the model mistakes one attack for another. This kind of mistake is not severe, as it could lead to the right alarm anyway. An attack is an attack. These experiments allowed us to see that CNN networks are more sensitive to certain attacks than to others (see Table III). Experiments conducted in that way are closer to reality than taking all data to train the model. In a specific attack scenario, benign traffic is correlated to attack traffic, and it would be rare for an attacker to run many different types of attacks at the same time on the same victim. This is why we found it more interesting to train models by attack type and to implement these per-attack-type models in the IDS. A CNN trained with a specific attack type is more accurate. To illustrate that, we decided to train the same CNN model with the whole dataset. Results given in Table V show that models trained with the whole dataset present poorer performance in terms of F1-score and FAR. Attacks like infiltration blur the model, which makes more mistakes in classifying benign traffic than the previous models. This also affects the classification of DoS attacks, as shown in Fig. 6. In this confusion matrix, the number of FP is the sum of the row of benign traffic in true classes except for the diagonal value, and the number of FN is the sum of the column of benign traffic in predicted classes except for the diagonal value.

Fig. 6. Confusion Matrix for CNN vs. All Attacks

Anomaly detection is all about the threshold, as we can see in the results in Table IV. The model needs to define the right threshold to prevent a high FAR and FNR. We ran multiple experiments by varying the n values from 1 to 3 and by setting the threshold to the max of the training loss. The percentage of detected anomalies of attack traffic effectively increases. Thus, the thresholds with n = 2 and n = 3 appear to be better suited for detecting attacks. The percentages of detected anomalies for DDoS LOIC UDP and DoS Slowloris remain low with the different thresholds. With the last threshold, the DAE effectively detects all anomalies except DDoS LOIC UDP, but raises many alarms, as it detects all the benign traffic in the test set as anomalies. DoS Slowloris uses slow, partial requests to use up server resources, as the server will never be able to release any of the open partial connections. This attack tries to mimic normal traffic, and this is why the DAE struggles to detect it. A DDoS LOIC UDP attack is a flood of UDP packets sent by thousands of coordinated users (e.g. a botnet) to the same victim. Since traffic from a single attacker in this attack is 'normal', that would explain why the DAE cannot detect it.
Both models can handle unknown data as they present
promising performance in validation and prediction. Zero-day attacks could be detected. Compared to the DAE, the CNN model gives more details about classification and can make the life of network administrators easier, as the DAE only raises alarms when anomalies are detected. In comparison, the CNN easily detects some attacks like DDoS-LOIC UDP and Slowloris, while the DAE really struggles against them. On the other hand, the DAE effectively detects web attacks and infiltration, while the CNN struggles. Each DL model has its own way to represent traffic features. A combination of these two models could be interesting.

TABLE V:
CLASSIFICATION REPORT WHEN CNN TRAINED WITH THE WHOLE DATA
Traffic             Precision   Recall   F1-score
Benign              75%         43%      55%
Bot                 100%        100%     100%
Brute Force Web     85%         81%      83%
Brute Force XSS     89%         93%      91%
DDoS HOIC           100%        100%     100%
DDoS LOIC UDP       100%        100%     100%
DDoS LOIC HTTP      84%         100%     91%
DoS GoldenEye       100%        84%      91%
DoS Hulk            99%         100%     99%
DoS SlowHTTPTest    68%         71%      70%
DoS Slowloris       99%         100%     100%
FTP Brute Force     70%         67%      68%
Infiltration        62%         87%      73%
SQL injection       90%         88%      89%
SSH Brute Force     100%        100%     100%
FAR: 79.818%   FNR: 1.111%   (one value each for the whole report)

VI. CONCLUSION
We proposed in this paper two models: traffic classification by a CNN and anomaly detection by a DAE. We explored the processed part of CSE-CIC-IDS2018 and extracted information about content, features, and correlation. This helped us to exploit the data well when pre-processing the dataset to train the models. Both models perform quite well on the CSE-CIC-IDS2018: the results showed that the DAE can detect almost all attacks effectively. The CNN also showed exceptional results against almost all attacks. By performing well on the validation set and test set, they show abilities to detect zero-day attacks, with some exceptions. Indeed, some attacks such as infiltration and DDoS-LOIC UDP remain stealthy for the CNN and the DAE respectively, because they have benign traffic properties. The CNN trained per attack type presented greater performance than the CNN trained with all data. The CNN per attack type results showed improvement in terms of FAR and FNR. We wanted to illustrate in this study that a single model is not effective in detecting all the attacks and can't be implemented alone in an IDS. Instead of using a big model with millions of parameters, we can use multiple little models, each for a specific purpose. The real challenge of the DAE model is to reduce its high FAR. This problem will be addressed in future work.

REFERENCES
[1] L. H. Yeo, X. Che, and S. Lakkaraju, “Modern intrusion detection systems,” CoRR, vol. abs/1708.07174, 2017.
[2] K. Ansam, G. Iqbal, V. Peter, and K. Joarder, “Survey of intrusion detection systems: techniques, datasets and challenges,” Cybersecurity, vol. 2, p. 20, Jul. 2019.
[3] J. Kim, Y. Shin, and E. Choi, “An intrusion detection model based on a convolutional neural network,” Journal of Multimedia Information System, vol. 6, no. 4, pp. 165–172, 2019.
[4] L. Yong and Z. Bo, “An intrusion detection model based on multi-scale CNN,” in 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), pp. 214–218, 2019.
[5] R. U. Khan, X. Zhang, M. Alazab, and R. Kumar, “An improved convolutional neural network model for intrusion detection in networks,” in 2019 Cybersecurity and Cyberforensics Conference (CCC), pp. 74–77, 2019.
[6] M. Azizjon, A. Jumabek, and W. Kim, “1D CNN based network intrusion detection with normalization on imbalanced data,” in 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 218–224, 2020.
[7] J. Kim, J. Kim, H. Kim, M. Shim, and E. Choi, “CNN-based network intrusion detection against denial-of-service attacks,” Electronics, vol. 9, no. 6, 2020.
[8] J. Yoo, B. Min, S. Kim, D. Shin, and D. Shin, “Study on network intrusion detection method using discrete pre-processing method and convolution neural network,” IEEE Access, vol. 9, pp. 142348–142361, 2021.
[9] Y. Kang, M. Tan, D. Lin, and Z. Zhao, “Intrusion detection model based on autoencoder and XGBoost,” Journal of Physics: Conference Series, vol. 2171, p. 012053, Jan. 2022.
[10] M. Catillo, M. Rak, and U. Villano, “Discovery of DoS attacks by the ZED-IDS anomaly detector,” J. High Speed Networks, vol. 25, no. 4, pp. 349–365, 2019.
[11] R. M. Catillo Marta and V. Umberto, “2L-ZED-IDS: A two-level anomaly detector for multiple attack classes,” in Web, Artificial Intelligence and Network Applications (L. Barolli, F. Amato, F. Moscato, T. Enokido, and M. Takizawa, eds.), (Cham), pp. 687–696, Springer International Publishing, 2020.
[12] TensorFlow, “TensorFlow 2 quickstart for experts - CNN.” https://www.tensorflow.org/tutorials/quickstart/advanced, Mar. 2018.
[13] J. Jordan, “Introduction to autoencoders.” https://www.jeremyjordan.me/autoencoders/, Mar. 2018.
[14] TensorFlow, “Intro to autoencoders - third example: Anomaly detection.” https://www.tensorflow.org/tutorials/generative/autoencoder, Apr. 2022.
[15] “CSE-CIC-IDS2018 on AWS.” https://www.unb.ca/cic/datasets/ids-2018.html.
[16] https://github.com/cstub/ml-ids/tree/master/notebooks/02 exploratory-data-analysis.
[17] “Normalizing your data (specifically, input and batch normalization).” https://www.jeremyjordan.me/batch-normalization/.
[18] T. O’Malley, E. Bursztein, J. Long, F. Chollet, H. Jin, L. Invernizzi, et al., “KerasTuner.” https://github.com/keras-team/keras-tuner, 2019.