Detecting IoT Botnet Attacks Using Machine Learning Methods
Abstract--Today, with technological developments, the use of internet-connected devices is increasing. Life has become easier with the "Internet of Things" (IoT), which lets these devices operate simultaneously with each other. IoT is a technology that plans and carries out, within a program, the things people need to do, and increases the comfort of the user. The advantages of IoT devices hold only as long as the devices work correctly and securely. When these devices do not work properly and securely, or are abused by someone, their disadvantages emerge alongside their advantages. The best example of this is the IoT-based botnet attacks of 2016. Machine learning methods are used to prevent IoT-based attacks and planned attacks. The aim of this study is to distinguish normal network traffic from attack traffic with high accuracy by using machine learning methods. The data set used is the N-BaIoT Provision 737E security camera data set, which includes normal and attack network traffic and has been used in the literature. Machine learning was carried out on this data set in two ways, with and without supervision. The EM (Expectation Maximization) algorithm was used for unsupervised learning and achieved 76.73% accuracy. For supervised learning, the decision tree (J48) algorithm was used and achieved 99.95% accuracy. The application was carried out with the Weka 3.8 program.

Keywords--IoT Botnets, N-BaIoT DDOS Attacks, Machine Learning, Cyber Security

I. INTRODUCTION

With the increasing use of IoT devices in health, military, industrial, commercial and everyday settings, the number of devices connected to the internet is growing day by day. According to research, the number of devices connected to the internet is expected to reach 24 billion in 2020 [1]. Undoubtedly, IoT devices have a large share in this increase. Because most of these devices are small, they do not usually give the user the impression of a computer. However, in the cyber world, every device with an IP address is viewed as a computer. IoT devices have the features of a computer in that they generate packets, have IP addresses and run an operating system. Users forget that these devices are connected to the internet and see them as devices that serve them, are harmless and make life easier. However, each of them is a computer that watches and follows the lives of its users. From this point of view, it is easy to see that the devices in question are not so innocent with respect to security and privacy. Each IoT device is produced for a different purpose. While performance and cost come first in the production of IoT devices, security comes second. In addition, IoT devices are sometimes produced with limited hardware and software according to their intended use, and these limited resources leave little room for security. Since security standards vary between countries and brands, a common security concept cannot be developed worldwide. This situation leads to deficiencies and weaknesses in the security of IoT devices and makes them attractive targets for cyber attacks. In attacks, IoT devices are both targeted directly and used as a tool for other attacks. Botnet attacks are the leading attacks that use IoT devices.

In this study, the detection of IoT botnet attacks with machine learning methods is explained. Botnet attacks are presented in the second section and learning models in the third section. The fourth section describes the development of the application. The comparison results are given in the fifth section and the conclusions of the study in the last section.

II. BOTNET ATTACKS

While IoT devices make our lives easier, they work simultaneously with other devices over the internet. This increases the number of devices connected to the internet day by day and attracts the attention of cyber attackers. With some changes made by the attackers, the synchronous operation of the devices is put to other uses. This leads to botnet attacks. Botnet attacks have increased in popularity with the IoT. The best example of this is the Mirai botnet attack in 2016 [2]. Cyber attacks on IoT devices can be listed as making the device inaccessible, stealing the user's information and capturing the device for other activities. After IoT devices are captured, they are assigned tasks in DDOS attacks. The number of devices used in a DDOS attack is important, because it determines the effect of the attack and the result to be obtained. That is why
Authorized licensed use limited to: b-on: Instituto Politecnico de Beja. Downloaded on November 19,2022 at 23:28:38 UTC from IEEE Xplore. Restrictions apply.
attackers want to reach more internet-connected devices and add them to their botnet. These devices mostly belong to unaware users or are IoT devices that are used incorrectly. When taking over IoT devices, attackers usually either use the factory settings of the devices or, where the settings have been changed, take advantage of weak, easily broken passwords. In short, attackers capture the target device through factory settings information, user vulnerabilities or brute-force attacks. Captured devices become members of the organization established by the attacker, the botnet. During an attack, each member device acts simultaneously with the other devices on the instruction of the administrator. The botnet life cycle consists of four stages: the formation stage, the command and control stage, the attack stage and the post-attack stage [3]. If the network behaviors of IoT devices that belong to a botnet are detected and characterized, the network can be protected from these devices, and normal network traffic continues to be transmitted unaffected.

Several approaches have been developed in the literature to prevent such attacks using machine learning. These approaches can be listed as signature-based, anomaly-based and hybrid, where the first two are used in different combinations [4]. The first approach detects malicious traffic by comparing the signature of the incoming traffic with the malicious traffic signatures in a database. The signature is obtained by passing the information of the incoming traffic through a hash algorithm. The second approach is built on detecting abnormal events in network traffic. The third approach uses the two together according to the needs of the network. The working logic of firewalls and other programs used in information systems is based on these approaches. With these approaches, the way firewalls work is divided into two: intrusion detection (IDS) and intrusion prevention (IPS) [4]. The model developed in this study is in the field of intrusion detection.

III. MACHINE LEARNING MODELS

The daily increase in IoT devices causes an increase in network traffic. This traffic can sometimes contain a large number and variety of attacks. This disadvantage can be turned into an advantage by analyzing network traffic data sets and learning from them. What is learned can then be used, through an algorithm, in applications and devices. Machine learning takes place in four steps. Network data is collected first. Collected data cannot be used directly in machine learning algorithms; network traffic packets are made available for learning after the necessary analysis, transformation and simplification. A machine learning model is then chosen for the prepared data, from either the supervised or the unsupervised learning models, and training is carried out with that model. The trained model is tested in the next stage, using either the training data or suitable new data. In the last stage, the success of the learning is evaluated according to the accuracy metric. After learning, the model training is complete and the model is ready to use with other data. Choosing the appropriate model is important in machine learning. As mentioned above, machine learning models are divided into two groups, supervised and unsupervised.

Supervised learning assigns the items of a data set to different classes using pre-labeled data. The classification algorithm learns the classes from the labeled training set, so it is determined in advance which data belongs to which class. As it learns, the machine interprets and classifies subsequent, unclassified data. In other words, the machine is trained on data whose classes are known, and new data is then given to the trained model to be classified. This type of learning model is called a supervised learning model. Logistic Regression, Decision Trees, Linear Regression, Support Vector Machines and Nearest Neighbors are examples of supervised learning models. In this study, the decision tree algorithm is used for supervised learning. The decision tree algorithm, named C4.5 in its original version, is called J48 in the modified version implemented in Weka [5]. The decision tree algorithm classifies by splitting the data from the upper levels of the tree down to the lower levels. In decision tree learning, a tree structure is built in which the leaves represent class labels and the branches leading from the root to those leaves represent tests on the features [6].

In the unsupervised learning model, data sets are divided into clusters. The basis of this clustering is that data in the same cluster is as similar as possible, while the similarity between clusters is as low as possible. This method is therefore unsupervised, unlike classification. The clustering process is left completely to the machine, which applies it to the data using various algorithms while following the principle described above. Clustering, EM and PCA are examples of unsupervised learning models. The Expectation-Maximization (EM) method, one of the unsupervised learning methods, was used in this study. EM is an iterative search method used to find the maximum likelihood or maximum a posteriori estimates of the parameters of statistical models that depend on unobserved variables [7].

IV. IMPLEMENTATION OF THE APPLICATION

In this study, the network data set generated by Provision 737E model security cameras, one of the N-BaIoT data sets, was used [5]. Since the data set had also been used in previous studies and was already pre-processed, no preprocessing other than feature reduction was applied to it in this study. The data set was obtained by simulating normal and abnormal network traffic behaviors in a laboratory environment [8]. It consists of benign network traffic and attack traffic generated by IoT devices.
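As a rough illustration of the EM method named in Section III, the following is a minimal sketch of EM fitting a two-component, one-dimensional Gaussian mixture to synthetic data. It is not the Weka EM implementation used in this study. The function names (`gaussian_pdf`, `em_gmm_1d`), the synthetic data, the quantile-based initialisation and the fixed iteration count are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k=2, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    # Crude initialisation (an assumption): take means from evenly spaced quantiles.
    ordered = sorted(data)
    means = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []

    for _ in range(iterations):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([p / total for p in joint])

        # M-step: re-estimate weights, means and variances from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # keep the variance strictly positive

    # Hard cluster label: the component with the highest posterior probability.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels


# Synthetic stand-in for two traffic groups (hypothetical data, not N-BaIoT).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(10.0, 1.0) for _ in range(200)]

means, variances, weights, labels = em_gmm_1d(data, k=2, iterations=50)
print("estimated means:", sorted(means))
```

The E-step computes soft cluster memberships and the M-step updates the parameters from them. Cluster labels are read off only at the end, which is why a clustering result can be compared with known classes afterwards.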
Alongside the normal network traffic data generated by the Provision 737E security camera, network traffic data captured during DDOS attacks (in packet types such as UDP and TCP) was obtained, and supervised and unsupervised learning were performed with the Weka program. Weka can preprocess the data. It also offers ready-to-use algorithms under both learning models, as well as options such as feature extraction and ranking by feature weight. The data set is 876 MB in size and consists of 828260 rows. The data, which was spread over files for different types of network traffic packets (TCP, UDP), was merged into a single file using the Python programming language. The combined data was saved as a CSV file so that it could be processed in Weka. The data set examined in this study has 115 features. Using all 115 features causes both software and hardware problems [8], which prevents the data set from being evaluated properly with the algorithms in Weka. Feature reduction was carried out with the OneR attribute evaluator in Weka, which ranked the features by weight. The 10 most important features were selected and the study was carried out with them. These features, listed in Table I, are MI (L1, L3, L5, L0.1, L0.01) and H (L1, L3, L5, L0.1, L0.01).

TABLE I. FEATURES USED IN THE STUDY
Number  10 Features Selected from 115 Features
1  MI_dir_L0.1_Mean
2  H_L0.1_Mean
3  MI_dir_L0.01_mean
4  MI_dir_L1_mean
5  H_L1_Mean
6  H_L0.01_Mean
7  H_L5_Mean
8  MI_dir_L3_Mean
9  H_L3_Mean
10  MI_dir_L5_Mean

[...] classification process was successfully performed with this method and the accuracy rate was calculated as 99.95%. ROC is a probability curve used in machine learning to evaluate performance after training. The area under the ROC curve is called AUC (Area Under the Curve) and provides information about the performance of the developed model: the larger the area, the more successful the learning [10]. The ROC curves for the classes in the data set (Benign, Bashlite, Mirai) are shown in Figure 1, Figure 2 and Figure 3, respectively.

Figure 1. Benign ROC chart

Figure 1 shows the ROC curve of normal network traffic. The size of the area under the curve shows the success of the classification.
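ROC curves plot the true-positive rate against the false-positive rate for each class. As a small worked example, the sketch below computes the overall accuracy and the one-vs-rest true-positive and false-positive rates from the counts printed in Table II. Reading Table II as a confusion matrix, with rows as actual classes and columns as predicted classes, is an assumption; the source does not state the orientation. The helper names (`accuracy`, `one_vs_rest_rates`) are hypothetical. Under this reading, the counts sum to the 828260 rows reported for the data set and the diagonal share rounds to the reported 99.95%.

```python
CLASSES = ["Benign", "Bashlite", "Mirai"]

# Counts copied from Table II.
# Assumption: rows are actual classes, columns are predicted classes.
TABLE_II = [
    [61928, 219, 7],
    [139, 329948, 9],
    [6, 23, 435981],
]


def accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def one_vs_rest_rates(matrix, k):
    """True-positive and false-positive rate for class k against all others."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # class k predicted as another class
    fp = sum(row[k] for row in matrix) - tp  # another class predicted as class k
    tn = total - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)


total_rows = sum(sum(row) for row in TABLE_II)
overall = accuracy(TABLE_II)
rates = {name: one_vs_rest_rates(TABLE_II, k) for k, name in enumerate(CLASSES)}

print("rows:", total_rows)
print("accuracy (%):", round(overall * 100, 2))
for name, (tpr, fpr) in rates.items():
    print(name, "TPR:", tpr, "FPR:", fpr)
```

Each (FPR, TPR) pair is a single operating point on the corresponding class's ROC curve. Tracing the full curve would additionally require the classifier's per-sample scores, which are not given in the source.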
Figure 3. Mirai ROC chart

Figure 4. Benign, Bashlite, Mirai data scatter plot

When Figure 4 is examined, the traffic is seen to gather in three groups. Some colors are mixed with each other in the plot. This shows the deviations, that is, a small amount of incorrect clustering resulting from learning.

TABLE II. DISTRIBUTION OF DATA SET
Benign  Bashlite  Mirai
61928  219  7
139  329948  9
6  23  435981

In addition, the numerical distribution of the packets across the groups after learning is shown in Table II.

V. COMPARISON RESULTS

Ten studies found in the literature review were examined and their success rates were compared. Looking at Table III, it is striking that the accuracy percentages are quite high, which shows that the learning was successful. Similarly, in this study, the accuracy was 99.95% in supervised learning and 76.73% in unsupervised learning. Feature reduction, as performed in this study, was performed in only one of the reviewed studies. The decision tree algorithm was used because it gives a higher accuracy rate than the other supervised learning models. The EM algorithm was used because, to our knowledge, no other study has applied the EM algorithm directly to the N-BaIoT data set.

When the studies in the literature are examined, in addition to the data set examined in this study, data sets containing attack traffic obtained both in laboratory environments and in real environments have also been examined. Most of the studies reviewed in this part used the N-BaIoT data set, which was also used in this study. Some of the studies used only supervised learning models, whilst others, like this study, used both supervised and unsupervised models. The Weka program, which includes ready-made tools, was used in the studies according to the researchers' preference. In terms of capacity, Weka gives proper results with data sets of limited size. As in some of the studies, feature reduction was also performed. All of the studies reported a success percentage. Accuracy, F1 score, precision and recall values were used as success criteria. In this study, the decision tree model was used. Another study in the literature also used this model on the same data set, but its accuracy was lower than in this study. The N-BaIoT data set was examined for the first time with the EM algorithm.

In [11], hierarchical clustering, X-Means clustering and rule-based classification were used to achieve fast and accurate recognition of botnet attacks. While the X-Means algorithm led to the highest cohesion inside the clusters and the maximum distance between clusters by choosing an optimal K, each cluster of similar flows was placed in a bot cluster, a semi-bot cluster or a normal cluster with the help of rule-based classification. Sets of botnets were evaluated through network traffic flow analysis with the proposed method, and the results indicated more than 95% accuracy in detection. In another study [12], a framework for IoT device identification and malicious traffic detection was proposed. The framework extracts features per network flow to identify the source and the type of the generated traffic, and to detect network attacks by pushing the intelligence to the network edge. Various machine learning algorithms were compared, and random forest gave the best results: up to 94.5% accuracy for device-type identification, up to 93.5% accuracy for traffic-type classification, and up to 97% accuracy for abnormal traffic detection. [13] used three sets of experiments to reveal the effectiveness of classification methods applied to the problem of network-based botnet detection, and to offer opportunities for improvement based on careful selection of a small subset of attributes. While the first set of experiments demonstrated very high accuracy in classifying network
TABLE III. LITERATURE STUDIES
Study | Learning type | Algorithm | Success criterion | Data set | Cross-validation | Result
[11] | Unsupervised | Hierarchical Clustering, X-means Clustering, and Rule-Based Classification | Accuracy | Virut, Agobot, Rbot, Zeus, and Njrat 2013 | NA | 95%
[5] | Supervised | Decision Tree | Classification | N-BaIoT | Cross Validation, 10 Fold | 99.87%
[17] | Supervised/Unsupervised | RandomForest, k-NN, Gaussian Naive Bayes | F1 Score | Experimental setup | NA | 96%
[18] | Supervised | Ensamble-KNN, DT, MLP | Recall, Precision, Accuracy, F1-Score | N-BaIoT | NA | 99%, 99%, 99%, 49%
activity into the 11 classes as well as significant redundancy in the 155 attributes, the second set of experiments focused on identifying the attributes that are most beneficial for network-based botnet detection and on quantifying this benefit. In contrast to the first set of experiments, the third set of experiments chose attributes in order of merit scores. The results demonstrate the benefit of this method by exhibiting high accuracy with 20 to 30 attributes. Consequently, a deployment using fewer attributes is likely to ease the load on the monitoring infrastructure, suggesting that such trimming need not incur a penalty in accuracy. The focus of [14] is that a feature selection procedure can reduce the number of features required in an unsupervised learning model providing anomaly-based detection in IoT networks. The reduced feature set (from 115 to 10 features) consumes fewer computational resources and gives more interpretable results. To contribute to the development of timely detection of DDoS traffic generated in environments such as the smart home, [15] seeks to establish how the traffic generated by IoT devices in such environments differs from the traffic generated through human-type communication. The results of the study are expected to form a basis for the future development of new models aimed at detecting this specific DDoS traffic type. In [5], the researchers propose and empirically evaluate a novel network-based anomaly detection method which extracts behavior snapshots of the network and uses deep autoencoders to detect anomalous network traffic emanating from compromised IoT devices. To evaluate their method, they infected nine IoT devices with Mirai and BASHLITE. The results revealed that their method is capable of detecting the attacks accurately and instantly. Last but not least, [16] proposed another model to identify IoT botnet attacks from compromised IoT devices by exploiting the efficiency of a recent swarm intelligence algorithm, the Grey Wolf Optimization (GWO) algorithm, to optimize the hyperparameters of the OCSVM and to find the features that best describe the IoT botnet problem. To show the efficiency of the method, its performance was evaluated using typical anomaly detection evaluation measures over a new version of a real benchmark dataset. The results showed that the method proposed in [16] performed better than all other algorithms regarding true positive rate, false positive rate, and G-mean for all IoT device types. Furthermore, the method achieved the lowest detection time and reduced the number of selected features. In [17], network traffic classification was carried out on data from the 2016 Mirai botnet attack with a technique called EDIMA. The classification was done with the Naive Bayes, KNN and Random Forest machine learning models. Information about the results is presented in Table III. The study was carried out in an experimental environment. Three different machine learning models were used to classify normal and harmful POST and GET traffic packets correctly. The highest score, 96%, was obtained with the KNN algorithm. With the aim of highlighting limitations and particularities of individual algorithms for network traffic classification, [20] presents a comparative analysis among meta-learning approaches and individual classifiers to classify network traffic. Investigating and evaluating a range of meta-learning techniques such as Voting, Stacking, Bagging and Boosting, the study proposes a new experimental analysis of different meta-learning techniques and compares them with their own base classifiers when used individually. Then, given the emerging popularity of Neural Networks, the study analyzed this scenario using the Multi-layer Perceptron classifier. Data provided by the UCI Machine Learning Repository was used in the experiments. The best performance was obtained by an ensemble technique (Bagging), which achieved an accuracy of 99.972% and a false positive rate of 0.00018%.

As can be seen in Table III, some of the studies were conducted with supervision and some without. In addition, cross-validation with different numbers of folds was used in the studies. Some of the studies used the data set used in this study. Percentage of accuracy is shown as the evaluation metric for all studies.

VI. CONCLUSION

In this study, machine learning was carried out using the network traffic data of Provision 737E security cameras, one of the N-BaIoT data sets. The aim of the study is to distinguish between normal traffic and attack traffic in a network with high accuracy through machine learning. Training was carried out with the machine learning methods applied in the study, and the trained machine was then expected to distinguish between normal traffic and attack traffic. The result was evaluated according to the accuracy percentage, as in other machine learning studies. Since smooth and accurate feature extraction is important in machine learning, the 115-feature data set was reduced to 10 features in this study. The feature extraction facility of the Weka program was used for this feature reduction. In this way, computational complexity was reduced. In the next step, the data set was trained with two different learning methods. In the supervised learning model, training was carried out using the Decision Tree (J48) algorithm with the 10-fold option selected in the cross-validation section. The accuracy in supervised learning was calculated as 99.95%. The EM algorithm was used as the unsupervised learning method, and its accuracy was calculated as 76.73%. Work with the two different learning models was completed successfully.

REFERENCES
[1] Ş. Kılınç, "2020 Yılında İnternete Bağlı 24 Milyar Cihaz Olacak!", Webtekno, 2020. [Online]. Available: https://www.webtekno.com/2020-yilinda-internete-bagli-24-milyar-cihaz-kullanimda-olacakh30300.html. [Accessed: 09-Nov-2020].
[2] C. Kolias, G. Kambourakis, A. Stavrou and J. Voas, "DDoS in the IoT: Mirai and Other Botnets," Computer, vol. 50, no. 7, pp. 80-84, 2017, doi: 10.1109/MC.2017.201.
[3] J. Leonard, S. Xu and R. Sandhu, "A Framework for Understanding Botnets," 2009 International Conference on Availability, Reliability and Security, Fukuoka, 2009, pp. 917-922, doi: 10.1109/ARES.2009.65.
[4] S. Dua and X. Du, Data Mining and Machine Learning in Cybersecurity. New York: CRC Press, 2011.
[5] Y. Meidan et al., "N-BaIoT—Network-Based Detection of IoT Botnet Attacks Using Deep Autoencoders," IEEE Pervasive Computing, vol. 17, no. 3, pp. 12-22, Jul.-Sep. 2018, doi: 10.1109/MPRV.2018.03367731.
[6] S. Patil and U. Kulkarni, "Accuracy Prediction for Distributed Decision Tree using Machine Learning approach," 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2019, pp. 1365-1371, doi: 10.1109/ICOEI.2019.8862580.
[7] B. Karaçalı, "Improved quasi-supervised learning by expectation-maximization," 2013 21st Signal Processing and Communications Applications Conference (SIU), Haspolat, 2013, pp. 1-4, doi: 10.1109/SIU.2013.6531366.
[8] H. Bahşi, S. Nõmm and F. B. La Torre, "Dimensionality Reduction for Machine Learning Based IoT Botnet Detection," 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 2018, pp. 1857-1862, doi: 10.1109/ICARCV.2018.8581205.
[9] M. B. Durna, "Cross validation nedir? Nasıl çalışır?", 2020. [Online]. Available: https://medium.com/bilişim-hareketi/cross-validation-nedir-nasıl-çalışır-4ec4736e5142. [Accessed: 09-Nov-2020].
[10] D. K. Mclish, "Analysing a Portion of the ROC Curve," Soc of Medical Decision Making, vol. 9, pp. 190-195, 1989.
[11] P. Amini, R. Azmi and M. A. Araghizadeh, "Analysis of network traffic flows for centralized botnet detection," Jour of Telecomm., Elec. And Elec. Engineering, vol. 11, pp. 7-19, 2019.
[12] O. Salman, I. Elhajj, A. Chehab and A. Kayssi, "A machine learning based framework for IoT device identification and abnormal traffic detection," Trans on Emerging Telecomm. Techn, 2019, doi: 10.1002/ett.3743.
[13] S. S. Chawathe, "Monitoring IoT Networks for Botnet Activity," IEEE 17th International Symposium on Network Computing and Applications (NCA), Cambridge, MA, 2018, pp. 1-8, doi: 10.1109/NCA.2018.8548330.
[14] S. Nõmm and H. Bahşi, "Unsupervised Anomaly Based Botnet Detection in IoT Networks," 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, 2018, pp. 1048-1053.
[15] I. Cvitic, D. Prekovic, M. Perisa and M. Botica, "Smart home IoT traffic characteristics as a basis for DDoS traffic detection," 3rd EAI International Conference on Management of Manufacturing Systems, Dubrovnik, November 06-08, 2018.
[16] A. Al Shorman, H. Faris and I. Aljarah, "Unsupervised intelligent system based on one class support vector machine and Grey Wolf optimization for IoT botnet detection," J Ambient Intell Human Comput, vol. 11, pp. 2809-2825, 2020. Available: https://doi.org/10.1007/s12652-019-01387-y.
[17] A. Kumar and T. J. Lim, "EDIMA: Early Detection of IoT Malware Network Activity Using Machine Learning Techniques," IEEE 5th World Forum on Internet of Things, 2019.
[18] I. P. Possebon et al., "Improved Network Traffic Classification Using Ensemble," IEEE Symposium on Computers and Communications (ISCC), 2019.