Main
Abstract: Critical Infrastructures (CIs) use Supervisory Control And Data Acquisition (SCADA) systems for remote control and monitoring. For a long time, operators of CIs applied the air gap principle, a security strategy that physically isolates the control network from other communication channels. True isolation, however, is difficult nowadays due to the massive spread of connectivity: using open protocols and more connectivity opens the door to new network attacks against CIs. To cope with this dilemma, sophisticated security measures are needed to address malicious intrusions, which are steadily increasing in number and variety. Traditional Intrusion Detection Systems (IDSs) cannot detect attacks that are not already present in their databases. In this paper, we assess Machine Learning (ML) for intrusion detection in SCADA systems using a real data set collected from a gas pipeline system and provided by the Mississippi State University (MSU). The contribution of this paper is two-fold: 1) the evaluation of four techniques for missing data estimation and two techniques for data normalization; 2) the assessment of Support Vector Machine (SVM), Random Forest (RF), and Bidirectional Long Short Term Memory (BLSTM) in terms of accuracy, precision, recall and F1 score for intrusion detection. Two cases are differentiated: binary and categorical classifications. Our experiments reveal that RF and BLSTM detect intrusions effectively, with F1 scores above 99% and 96%, respectively.

I. INTRODUCTION AND PROBLEM STATEMENT

Supervisory Control And Data Acquisition (SCADA) systems are commonly used by Critical Infrastructures (CIs) or industries which are vital to citizens' daily lives and countries' economies. These include oil pipelines, water treatment, and chemical manufacturing plants, to name but a few. Typically, SCADA systems consist of (1) field instrument devices for sensing conditions of the CI (power level, pressure, throughput, etc.); (2) operating equipment such as valves, pumps, etc. controlled by actuators; (3) field local processors such as Programmable Logic Controllers (PLCs) and Remote Terminal Units (RTUs) that communicate with field instrument devices and operating equipment; and finally (4) the Human Machine Interface (HMI) that acts as a central controller and monitoring host. To operate properly in a synchronized manner, these different components must communicate. While short-range communications are used to establish links between local processors, instrument devices and operating equipment, long-range communications are used to connect PLCs and RTUs with the HMI or the Master Terminal Unit (MTU).

Historically, SCADA systems implemented a security principle known as air gap, a strategy that physically isolates the control network from the rest of the network, including the Internet. True isolation, however, is difficult in a real-world environment. First, true isolation may lead to outdated software [1], [2]. Without connectivity to the Internet, the software cannot easily receive security updates from the vendor. Second, true isolation is hard to implement since CI is often geographically distributed. To avoid the high costs of laying direct fiber cable to substations, CI operators make use of radio, Asymmetric Digital Subscriber Line (ADSL), General Packet Radio Service (GPRS), or leased lines. Moreover, malware like Stuxnet [3] or Flame [4] has shown us that even a USB flash drive can provide connectivity to the outside world. Besides the air gap principle, SCADA systems have made use of proprietary software, hardware, and communication protocols which have provided a false sense of security through obscurity [1].

Nowadays, the use of standardized communications protocols has enabled the integration of SCADA systems with the Internet and corporate networks. Given this new context, SCADA systems are prone to numerous threats due to their large deployment areas, distributed operating mode and growing interconnectivity [5]. Indeed, the widespread use of the TCP/IP stack has led to its adoption in SCADA systems. Modicon Communication Bus (Modbus) TCP, Distributed Network Protocol (DNP3) [6], and IEC 60870-5-104 are the main communication protocols used. These protocols were designed over twenty years ago and are known to be highly vulnerable to simple network attacks [7]–[10]. Mirian et al. [11], using Internet-wide scanners such as ZMap [12], identified 60,000 vulnerable SCADA devices connected to the Internet. Clearly, these protocol stacks are subject to increasing risks. This can also be seen in the cyberattacks against the Ukrainian power grid in 2015, where 225,000 Ukrainian people were left without electricity. These attacks were the first that resulted in a power outage [13].

Our contributions: In this paper, we focus on assessing the performances of Machine Learning (ML) techniques such as Support Vector Machine (SVM), Random Forest (RF), and Bidirectional Long Short Term Memory (BLSTM) in detecting intrusion in SCADA systems. Section II lays out the foundation of the SCADA architecture and the ML algorithms used. We analyze SCADA protocols from monthly Internet-wide scans and see an increasing number of SCADA services reachable and attackable over the Internet. Section III describes the data set and the experimental setup in detail. In Section IV, we analyze four missing data strategies and
two data normalization techniques, characterizing the performances of the ML algorithms in terms of accuracy, precision, recall and F1 score for binary and categorical classification. We describe related works in Section V and compare them with our approach. Finally, we conclude in Section VI and give directions for future research.

II. TECHNICAL BACKGROUND

In this section, we provide a brief overview of the SCADA architecture, its network protocols, and the ML algorithms that we have used in this work. While discussing the technical background, we also highlight the vulnerabilities that exist in SCADA protocols.

A. Attack Vectors on SCADA

As described in Section I, adversaries can often reach the control system from the Internet, because the air gap principle is no longer applicable in modern SCADA networks [1]. Most of these networks are geographically distributed. Hence, they need to be connected to the HMI, either via ADSL, GPRS, or leased lines. All of these connections can be used to gain access to the control system.

After an attacker has gained access to the network, there are three attack vectors against a SCADA protocol: first, by exploiting vendor-specific implementation faults like memory-corruption bugs; second, by exploiting weaknesses in the infrastructure like missing or inadequate firewall rules; and third, by exploiting protocol-specific weaknesses in the specification. In this paper, we focus on the third attack vector. An attacker wanting to exploit SCADA protocol weaknesses has four general attacks to choose from [7], as shown in Figure 1:
1) Interception: An attacker is able to analyse the network traffic and gather information about the network infrastructure;
2) Interruption: An attacker intercepts packets and does not forward them to the next node;
3) Modification: The attacker is a man-in-the-middle (MitM) modifying packets in a network stream;
4) Fabrication: An attacker is able to inject packets into the network.

Figure 1 depicts a simplified SCADA architecture in which an attacker (red square) has gained access to the network. All four attacks can target the HMI (a), the network infrastructure itself (b), or the RTU/PLC (c). The field devices shown in Figure 1 are sensors and actuators. A sensor monitors the environment, e.g. the pressure of a gas pipeline, and sends the information to the next higher level; an actuator, in contrast, receives commands to control the environment, e.g. opening and closing a valve. The RTU or PLC controls and monitors the field devices, building a substation. One advantage of the SCADA architecture is that substations can be geographically distributed; this is often a necessity for a CI. The control centre is located in a different physical location and contains the HMI which monitors and controls the RTUs and PLCs. The RTUs/PLCs are connected to the HMI via communication links such as radio, fibre-optics, or dial-up lines. All information converges to the HMI or SCADA master, which is monitored and controlled by an employee.

[Figure 1: diagram with an attacker (A) and the attacks 1: Intercept, 2: Interrupt, 3: Modify, 4: Fabricate; targets a: Human Machine Interface, b: Communication Network, c: Substation with three RTU/PLC units, each with Sensors and Actuators.]
Fig. 1. Attack model demonstrating four network attacks, denoted as (1–4), against a simplified SCADA architecture with three attack targets (a–c) based on [7].

III. ANOMALY DETECTION IN SCADA SYSTEMS: DATA SET AND METHODOLOGY

To investigate the merits of the ML-based techniques for anomaly detection in SCADA systems, a real-world gas pipeline data set is used in our experiments. We now describe the data set in detail, as well as the different steps of our methodology for anomaly detection.

A. The Gas Pipeline Data Set

The SCADA data set used in this work is hosted on the Industrial Control System (ICS) Cyber Attack Data Sets [26] website. The real-world raw data was generated using a gas pipeline system provided by the Mississippi State University (MSU)'s in-house SCADA lab. It contains a total of 274,628 instances.

The methodology for the data set collection is described in the study carried out by Turnipseed [27]. The data set, provided in the Attribute-Relation File Format (ARFF), is used to create ML models once it has been pre-processed. It contains 20 features from Modbus RTU packets, three different types of labels and also pure raw data, which is provided to aid in the pre-processing stage. Table I lists the features and their corresponding types.

The address feature is a unique eight-bit value used for device identification. It is assigned to each master and slave device, allowing them to recognize each other while establishing a communication. This feature is used to overcome scan attacks which broadcast commands to all possible station addresses to determine which addresses are in use. The second feature is the function code. Some function codes can be used

TABLE I
LIST OF FEATURES FROM THE GAS PIPELINE DATA SET.

on the attacks that SCADA systems may suffer.
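As a rough illustration of how a file in the ARFF format mentioned in Section III-A could be read before pre-processing, the following sketch parses a small dense ARFF stream with the Python standard library only. The function name `read_arff`, the three attribute names and the two sample rows are hypothetical and are not taken from the gas pipeline data set; only numeric attributes are handled, and ARFF's `?` marker for a missing value is mapped to `None`.

```python
import io

def read_arff(stream):
    """Parse a simple dense ARFF stream into (attribute_names, rows).

    Missing values, written as '?' in ARFF, are returned as None.
    Only numeric attributes are handled in this sketch.
    """
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                names.append(line.split()[1])  # second token is the name
            elif lower.startswith("@data"):
                in_data = True                 # everything below is data
            continue
        values = [v.strip() for v in line.split(",")]
        rows.append([None if v == "?" else float(v) for v in values])
    return names, rows

# Hypothetical three-attribute sample, NOT real rows from the data set.
SAMPLE = """@relation gas_sample
@attribute address numeric
@attribute function numeric
@attribute pressure numeric
@data
4,3,?
4,16,0.52
"""

names, rows = read_arff(io.StringIO(SAMPLE))
print(names)
print(rows)
```

Keeping missing entries as `None` at this stage leaves the choice of imputation strategy to the pre-processing step.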
TABLE III
EXAMPLES OF THE MISSING VALUES IN THE GAS PIPELINE DATA SET.

[Figure 2: flow chart with the boxes Data Set; Training Set; Training + Val Sets; Pre-Processing (Data Cleaning, Statistics, Data Transf.); Val Set; Test Set; Hyper Params Search; SVM Model, RF Model, BLSTM Model; Classification.]
Fig. 2. Flow chart diagram illustrating the steps of our work pipeline.

samples and a pre-assigned centroid point, assigning them to a certain cluster and updating the centroids of the clusters until convergence on the best separation of the data.

In both GMM and K-means techniques, the first payload type was considered as cluster k = 0, and the second and third payload types were assigned to k clusters defined by the elbow method. This method determined the best number of clusters based on the cost function or distortion:

\[ J = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2 , \tag{1} \]

where C_k is the set of samples assigned to cluster k and \mu_k is its centroid. Lower values of J determine a preferable number k of clusters and thus, better data separation. With this strategy, payloads are classified into k clusters, which are represented in the pre-processed data as a one-hot encoded notation. One-hot encoding is a process of converting categorical variables into a form more suitable for ML algorithms.

Zero imputation & indicators is a technique in which we substituted missing values with 0 and indicated their positions by adding corresponding indicators with 1 values to the payload feature. If the feature value existed in the payload then the value was kept and the indicator set to 0 [29].

Keep prior value, also known as forward-filling, deals with the non-existent values by replacing them with the immediately preceding existing feature value. In the case where forward-filling is not possible due to a lack of existing prior feature values, backward-filling is conducted. The intuition behind this technique is that the missing values are not due to data loss but simply cannot exist, since the type of the packet does not support these features. Therefore, they appear in the data as non-existent values and they may be inferred from previously seen feature values.

2) Data Transformation: This step was conducted by performing, first, the mean-standard deviation and then min-max methods. The mean-standard deviation method consists of subtracting the calculated overall mean and dividing by the calculated overall standard deviation for each of the values within a certain feature. Thus,

\[ z_i = \frac{x_i - \mu}{\sigma} , \tag{2} \]

where x_i is a feature value, \mu is the mean, and \sigma is the standard deviation. This strategy centres each feature at zero and scales it to unit standard deviation. The second method is the min-max approach, which consists of finding the minimum and maximum value from a given feature and normalizing the feature values between 0 and 1. Hence,

\[ z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} , \tag{3} \]

where x_i is a feature value, and min(x) and max(x) are the minimum and maximum values calculated from the overall feature values.

3) Hyperparameter Search: In an SVM, the hyperparameters C and γ must be correctly set for each of the sixteen data sets. Hence, we performed a random search to determine the best hyperparameters for our models. Although grid search and manual search are the most widely used techniques for hyperparameter optimization, it has been empirically and theoretically demonstrated that randomly chosen trials are more efficient [30].

For each of the sixteen pre-processed data sets, we ran thirty different prediction trials over the corresponding validation set during the hyperparameter search. The seven most notable results are analyzed to investigate how the algorithms converge to a good result after the best hyperparameters are found. Due
to the long training time of SVMs, we used only 25% of the entire data set.

In RF, the hyperparameters number of estimators and maximum depth of the trees must be correctly set for each of the sixteen data sets. Once again, we performed a random search, through thirty different prediction trials, to define the best hyperparameters for these models.

In BLSTM, the hyperparameters learning rate, batch size, sequence length, dropout and hidden layer size must be correctly set for each of the sixteen data sets. Again, we conducted a random search, running each trial for fifty epochs (itself a BLSTM training parameter), to define the best hyperparameters for these models. For each data set, we ran thirty different predictions over the corresponding validation set during the hyperparameter search. The seven most significant results are used in this study to show how the algorithm converges once the best hyperparameters are found.

4) Classification: In this step, models are created with the aim of classifying novel observations on a set of predefined classes. If only two possible classes exist, then it is called binary classification. In contrast, if more than two classes are differentiated, it is called multi-class classification. In the context of this work, a classification task is performed to correctly classify benign and malicious packets. The trained

both cases of data normalization (MEAN and MIN-MAX) using the Keep prior value. Indeed, the lowest F1 score for binary classification is 92.04% (see Figure 5(g)) while for CAT classification this value drops to 88.45% (see Figure 5(j)). The worst performance in terms of F1 score for both classifiers was obtained by GMM and K-means algorithms. The Zeros & Indicators method performs better than GMM but worse than Keep prior value. For both binary and categorical classifiers, the MEAN normalization strategy outperforms MIN-MAX normalization. Table IV summarizes the results, highlighting the best for BIN and CAT SVM classifiers employing the split criterion of 80% for the training set and 20% for the test set, and using the hyperparameters that gave us the best performance. We obtained an F1 score of 94.34% for BIN and an F1 score of 92.50% for the CAT classifier. Both were achieved using MEAN normalization and the Keep prior value strategy to deal with missing values.

TABLE IV
BEST BINARY AND CATEGORICAL CLASSIFIERS MODELED WITH SVM.

                          Hyper-parameters     Measurements
Test sets                 C        gamma       Acc      Prec     Recall   F1-score
binary-mean-keep          346.219  0.3975      94.36 %  94.33 %  94.36 %  94.34 %
binary-minmax-keep        579.161  0.6270      92.78 %  92.91 %  92.78 %  92.83 %
categorical-mean-keep     107.411  0.2689      92.56 %  92.47 %  92.56 %  92.50 %
categorical-minmax-keep   536.672  0.7150      89.70 %  90.50 %  89.70 %  89.97 %
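The two missing-data strategies named Keep prior value and Zero imputation & indicators, together with the normalizations of equations (2) and (3), can be sketched on a single feature column as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names and the sample column are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and the column is assumed not to be constant.

```python
from statistics import mean, pstdev

def keep_prior_value(column):
    """Forward-fill None entries; back-fill any leading gap."""
    filled, last = [], None
    for value in column:
        if value is not None:
            last = value
        filled.append(last)
    # Leading gaps have no prior value: use the first observed one instead.
    first = next((v for v in filled if v is not None), None)
    return [first if v is None else v for v in filled]

def zero_impute_with_indicator(column):
    """Replace None with 0 and return a parallel 0/1 missing indicator."""
    values = [0.0 if v is None else v for v in column]
    indicator = [1 if v is None else 0 for v in column]
    return values, indicator

def mean_std_normalize(column):
    """Equation (2): z_i = (x_i - mu) / sigma (population sigma assumed)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

def min_max_normalize(column):
    """Equation (3): z_i = (x_i - min(x)) / (max(x) - min(x))."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

# Hypothetical feature column with missing entries (None).
pressure = [None, 2.0, None, 4.0, 6.0]

filled = keep_prior_value(pressure)
zeros, indicator = zero_impute_with_indicator(pressure)
z_scores = mean_std_normalize(filled)
scaled = min_max_normalize(filled)
print(filled, zeros, indicator, z_scores, scaled, sep="\n")
```

In practice the mean, standard deviation, minimum and maximum would normally be computed on the training split and reused for the validation and test splits; the sketch computes them on the column it is given.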
[Figure 3: bar charts in four rows of three panels (SVM, RF, BLSTM), starting with SVM BIN-MEAN (a), RF BIN-MEAN (b), BLSTM BIN-MEAN (c); x axis: configurations cfg1 to cfg7; y axis: F1.]
Fig. 3. Results for the hyperparameter search for SVM, RF, and BLSTM. The first row shows the results for the binary classification (BIN) using the mean-standard deviation normalization strategy (MEAN), the second row for the categorical classification (CAT) using MEAN. The third row shows the results for BIN using the min-max normalization strategy (MIN-MAX) and finally the fourth row for CAT using MIN-MAX. The x axis shows the different hyperparameter configurations and the y axis shows the F1 score.
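The random search whose results Fig. 3 reports (thirty trials per data set, of which the seven best are shown) can be sketched in a few lines. The code below is an assumption-laden illustration rather than the published implementation: the sampling ranges for C and gamma are invented for the example, and `toy_validation_f1` is a hypothetical stand-in for training a model and scoring it on the validation set.

```python
import random

def random_search(evaluate, sample_config, n_trials=30, seed=0):
    """Draw n_trials random configurations and rank them by score."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((evaluate(config), config))
    # Best score first; the key avoids comparing the config dictionaries.
    trials.sort(key=lambda trial: trial[0], reverse=True)
    return trials

def sample_svm_config(rng):
    # Illustrative ranges only, NOT the ranges used in the paper:
    # C is drawn log-uniformly from [1, 1000], gamma uniformly from [0.01, 1].
    return {"C": 10 ** rng.uniform(0, 3), "gamma": rng.uniform(0.01, 1.0)}

def toy_validation_f1(config):
    # Hypothetical objective standing in for "train, then score on the
    # validation set"; it simply peaks near C = 300 and gamma = 0.4.
    return 1.0 - abs(config["gamma"] - 0.4) - abs(config["C"] - 300) / 1000

results = random_search(toy_validation_f1, sample_svm_config)
best_score, best_config = results[0]
top_seven = results[:7]  # analogous to cfg1..cfg7 in Fig. 3
print(best_score, best_config)
```

Sampling C on a logarithmic scale is a common design choice when a parameter spans several orders of magnitude; whether the original study did so is not stated in the text.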
[29]. Table VI summarizes the results for BIN and CAT BLSTM classifiers, running three hundred epochs with the best hyperparameters and using the 80%–20% split criterion. Bidirectional Long Short Term Memory outperforms SVM. We obtained an F1 score of 98.39% for BIN and an F1 score of 97.68% for CAT. As shown in Table VI, for both binary and categorical classifiers, MEAN is better than MIN-MAX. The difference between these two normalization strategies is 0.77% for BIN and 1.2% for CAT classification.

TABLE VI
BEST BINARY AND CATEGORICAL CLASSIFIERS MODELLED WITH BLSTM ALGORITHM; LR, BATCH, SEQ, DROP AND H LAYER CORRESPOND TO LEARNING RATE, BATCH SIZE, SEQUENCE LENGTH, DROPOUT AND HIDDEN LAYER SIZE.

                          Hyper-parameters                            Measurements
Test sets                 lr        batch  seq  drop      h layer     Acc      Prec     Recall   F1-score
binary-mean-indi          0.008308  67     4    0.019025  110         98.40 %  98.40 %  98.40 %  98.39 %
binary-minmax-indi        0.011490  121    4    0.027915  218         97.64 %  97.64 %  97.65 %  97.62 %
categorical-mean-indi     0.009908  138    4    0.032404  136         97.71 %  97.69 %  97.71 %  97.68 %
categorical-minmax-indi   0.013236  138    4    0.039841  254         96.57 %  96.53 %  96.57 %  96.48 %

4) Results Analysis: Although BLSTM models are widely used for time-dependent problems given their capabilities of

TABLE VII
CLASSIFICATION REPORT OF THE RF ALGORITHM.

Random Forest: accuracy on test data = 99.41 %

Type of Data   precision  recall    f1-score  support
Normal         99.48 %    99.90 %   99.69 %   42953
NMRI           98.14 %    96.99 %   97.56 %   1526
CMRI           98.84 %    96.40 %   97.60 %   2641
MSCI           99.28 %    98.63 %   98.96 %   1538
MPCI           99.90 %    98.00 %   98.94 %   4101
MFCI           98.77 %    100 %     99.38 %   967
DoS            97.54 %    95.42 %   96.47 %   415
Recon          99.61 %    97.96 %   98.78 %   786
avg / total    99.41 %    99.41 %   99.41 %   54927

TABLE VIII
CONFUSION MATRIX OF THE RF ALGORITHM.

          Normal  NMRI  CMRI  MSCI  MPCI  MFCI  DoS  Recon
Normal    42908   12    9     9     4     0     8    3
NMRI      25      1480  21    0     0     0     0    0
CMRI      79      16    2546  0     0     0     0    0
MSCI      20      0     0     1517  0     0     1    0
MPCI      79      0     0     2     4019  0     1    0
MFCI      0       0     0     0     0     967   0    0
DoS       19      0     0     0     0     0     396  0
Recon     4       0     0     0     0     12    0    770
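The per-class figures in Table VII can be recomputed from the confusion matrix in Table VIII, which makes the relationship between the two tables explicit. The sketch below copies the counts from Table VIII and assumes that rows are the actual class and columns the predicted class; this reading is consistent with the row sums matching the support column of Table VII. The function name `per_class_metrics` is hypothetical.

```python
LABELS = ["Normal", "NMRI", "CMRI", "MSCI", "MPCI", "MFCI", "DoS", "Recon"]

# Counts copied from Table VIII (assumed: rows = actual, columns = predicted).
MATRIX = [
    [42908, 12, 9, 9, 4, 0, 8, 3],
    [25, 1480, 21, 0, 0, 0, 0, 0],
    [79, 16, 2546, 0, 0, 0, 0, 0],
    [20, 0, 0, 1517, 0, 0, 1, 0],
    [79, 0, 0, 2, 4019, 0, 1, 0],
    [0, 0, 0, 0, 0, 967, 0, 0],
    [19, 0, 0, 0, 0, 0, 396, 0],
    [4, 0, 0, 0, 0, 12, 0, 770],
]

def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1, support)} and overall accuracy."""
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        predicted = sum(matrix[r][k] for r in range(n))  # column sum
        actual = sum(matrix[k])                          # row sum = support
        precision = true_pos / predicted
        recall = true_pos / actual
        f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = (precision, recall, f1, actual)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total
    return metrics, accuracy

metrics, accuracy = per_class_metrics(MATRIX, LABELS)
for label, (p, r, f1, support) in metrics.items():
    print(f"{label:7s} {100 * p:6.2f} {100 * r:6.2f} {100 * f1:6.2f} {support}")
print(f"accuracy {100 * accuracy:.2f}")
```

Reading the matrix this way also shows where the errors concentrate: most misclassified attack packets are predicted as Normal, i.e. they are missed detections rather than confusions between attack types.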
variables of the control devices. Two different approaches of one-class classification, the Support Vector Data Description (SVDD) and the Kernel Principal Component Analysis (KPCA), were proposed as well in [44]. Lp-norms are studied in Radial Basis Function (RBF) kernels for intrusion detection. An IDS that detects SCADA attacks based on the network traffic behaviour was proposed in [45]. The IDS extracts the time correlation between different network packets and then monitors the system to determine if it is behaving normally or not. An alarm is raised when anomalies are detected.

The authors of [46] presented an IDS using a Neural Network based Modelling (IDS-NNM) algorithm following the supervised learning approach. They adopted a specific window-based attribute extraction approach to capture the time series nature of the network packet stream. More recently, a Recurrent Neural Network (RNN) with unidirectional Long Short Term Memory (LSTM) architecture was proposed in [47] to detect industrial control system anomalies.

VI. CONCLUSION AND FUTURE WORK

Until not too long ago, the most common security strategy for SCADA systems was the air gap principle: an operator of SCADA networks segregated the control network from other networks. Hence, attackers could not access them. The attacker had to be physically close to the SCADA system to access the communication channel, inject malicious data or even interfere with the protocol. Nowadays, with growing demands for connectivity between the SCADA control network and the corporate network, novel network attacks have appeared as PLC or RTU devices are managed over IP communication protocols. This increased interconnectivity results in the de-isolation of SCADA systems, making them more vulnerable. Attackers no longer need to gain physical access to on-site circuits to perform a hostile action; instead, malicious network packets can reach the field devices from anywhere.

In this paper, we have shown that ML techniques can detect network attacks against SCADA systems. We used a SCADA data set provided by MSU's in-house SCADA lab. It was generated using a gas pipeline SCADA system hosted in their laboratory. We used SVM, RF, and BLSTM to implement diverse IDS classifiers. We provided a complete comparison between these algorithms along with the random hyper-parameter search results. We published our source code on GitHub [31] to help other researchers to verify, compare, and/or extend their studies. In contrast to the state-of-the-art studies, the use of the test set accuracy, precision, recall and F1 score allowed us to assess their performance correctly and comprehensively. The RF algorithm gives the best performance by detecting 99.90% of benign data and 98.46% of attacks, with an overall detection rate (recall) of 99.58%.

Our approach can be applied to different SCADA environments, because SCADA is based on a well-defined architecture (see Section II). The data set used was generated in a real gas pipeline following a typical SCADA architecture. Although the data set contains only Modbus RTU traffic, other SCADA protocols (e.g. DNP3 or IEC 60870-5-104) have similar messages to monitor (read) and control (write) sensors and actuators. In addition, these protocols can be the victim of the attacks that we have highlighted in Figure 1.

An interesting future investigation would be the extraction of rules from RF algorithms to integrate them with signature-based NIDSs such as Snort.

ACKNOWLEDGMENT

This work was partially funded by ATENA H2020 EU Project (H2020-DS-2015-1 Project 700581). We thank Dominic Dunlop for his review and comments that greatly improved the manuscript.

REFERENCES

[1] E. Byres, “The Air Gap: SCADA’s Enduring Security Myth,” Communications of the ACM, vol. 56, no. 8, pp. 29–31, Aug. 2013. [Online]. Available: http://doi.acm.org/10.1145/2492007.2492018
[2] C. S. Wright. (2011, September) SCADA: Air Gaps Do Not Exist. Accessed: 2017-12-04. [Online]. Available: http://infosecisland.com/blogview/16770-SCADA-Air-Gaps-Do-Not-Exist.html
[3] R. Langner, “Stuxnet: Dissecting a Cyberwarfare Weapon,” IEEE Security Privacy, vol. 9, no. 3, pp. 49–51, May 2011.
[4] K. Zetter. (2012, May) Meet ’Flame,’ The Massive Spy Malware Infiltrating Iranian Computers. Accessed: 2017-12-04. [Online]. Available: https://www.wired.com/2012/05/flame/
[5] V. M. Igure, S. A. Laughter, and R. D. Williams, “Security Issues in SCADA Networks,” Elsevier Computers & Security, vol. 25, no. 7, pp. 498–506, 2006.
[6] “IEEE Standard for Electric Power Systems Communications-Distributed Network Protocol (DNP3),” IEEE Std 1815-2012 (Revision of IEEE Std 1815-2010), pp. 1–821, Oct 2012.
[7] S. East, J. Butts, M. Papa, and S. Shenoi, “A Taxonomy of Attacks on the DNP3 Protocol,” in International Conference on Critical Infrastructure Protection. Springer Berlin Heidelberg, 2009, pp. 67–81.
[8] N. R. Rodofile, K. Radke, and E. Foo, “Real-Time and Interactive Attacks on DNP3 Critical Infrastructure Using Scapy,” in Proceedings of the 13th Australasian Information Security Conference (AISC 2015), 2015, pp. 67–70.
[9] P. Huitsing, R. Chandia, M. Papa, and S. Shenoi, “Attack Taxonomies for the Modbus Protocols,” International Journal of Critical Infrastructure Protection, vol. 1, pp. 37–44, 2008.
[10] P. Maynard, K. McLaughlin, and B. Haberler, “Towards Understanding Man-In-The-Middle Attacks on IEC 60870-5-104 SCADA Networks,” in Proceedings of the 2nd International Symposium for ICS & SCADA Cyber Security Research 2014 (ICS-CSR 2014), Sep. 2014. [Online]. Available: http://ewic.bcs.org/content/ConWebDoc/53228
[11] A. Mirian, Z. Ma, D. Adrian, M. Tischer, T. Chuenchujit, T. Yardley, R. Berthier, J. Mason, Z. Durumeric, A. J. Halderman, and M. Bailey, “An Internet-wide view of ICS devices,” in 14th IEEE Privacy, Security, and Trust Conference (PST’16), 2016.
[12] Z. Durumeric, E. Wustrow, and J. A. Halderman, “ZMap: Fast Internet-wide Scanning and Its Security Applications,” in Proceedings of the 22nd USENIX Conference on Security. USENIX Association, 2013, pp. 605–620. [Online]. Available: http://dl.acm.org/citation.cfm?id=2534766.2534818
[13] R. M. Lee, M. J. Assante, and T. Conway, “TLP: White Analysis of the Cyber Attack on the Ukrainian Power Grid,” E-ISAC, Tech. Rep., Mar. 2016.
[14] S. W. A.-H. Baddar, A. Merlo, and M. Migliardi, “Anomaly detection in computer networks: A state-of-the-art review,” JoWUA, vol. 5, no. 4, pp. 29–64, 2014.
[15] Z. Durumeric, D. Adrian, A. Mirian, M. Bailey, and J. A. Halderman, “A Search Engine Backed by Internet-Wide Scanning,” in Proceedings of the 22nd ACM Conference on Computer and Communications Security, Oct. 2015.
[16] “ModBus Application Protocol Specification V1.1b 3,” http://www.modbus.org/docs/Modbus_Application_Protocol_V1_1b3.pdf, 2012, accessed: 2017-11-24.
[17] C. C. Aggarwal, Data Mining: The Textbook. Springer, 2015.
[18] C. Cortes and V. Vapnik, “Support-Vector Networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995. [Online]. Available: https://doi.org/10.1023/A:1022627411411
[19] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[20] C. Olah. (2015, Aug) Understanding LSTM Networks. Accessed: 2017-12-04. [Online]. Available: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
[21] A. Karpathy. (2015, May) The Unreasonable Effectiveness of Recurrent Neural Networks. Accessed: 2017-12-04. [Online]. Available: https://karpathy.github.io/2015/05/21/rnn-effectiveness/
[22] M. Schuster and K. K. Paliwal, “Bidirectional Recurrent Neural Networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[23] A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional lstm networks for improved phoneme classification and recognition,” in Artificial Neural Networks: Formal Models and Their Applications – ICANN 2005, W. Duch, J. Kacprzyk, E. Oja, and S. Zadrożny, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 799–804.
[24] S. Latif, M. Usman, and J. Q. R. Rana, “Abnormal heartbeat detection using recurrent neural networks,” arXiv preprint arXiv:1801.08322, 2018.
[25] X. Zhang, W. Kou, E. I. Chang, H. Gao, Y. Fan, Y. Xu et al., “Sleep stage classification based on multi-level feature learning and recurrent neural networks via wearable device,” arXiv preprint arXiv:1711.00629, 2017.
[26] “Industrial Control System (ICS) Cyber Attack Datasets,” https://sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets, accessed: 2017-12-04.
[27] I. Turnipseed, “A New Scada Dataset For Intrusion Detection Research,” M. Sc., Mississippi State University, August 2015.
[28] T. Morris and W. Gao, “Industrial Control System Traffic Data Sets for Intrusion Detection Research,” Advances in Information and Communication Technology Critical Infrastructure Protection VIII, pp. 65–78, 2014.
[29] Z. C. Lipton, D. C. Kale, and R. Wetzel, “Directly Modeling Missing Data in Sequences with RNNs: Improved Classification of Clinical Time Series,” in Proceedings of Machine Learning for Healthcare 2016, 2016, pp. 253–270.
[30] J. Bergstra and Y. Bengio, “Random Search for Hyper-Parameter Optimization,” The Journal of Machine Learning Research, vol. 13, no. Feb, pp. 281–305, 2012.
[31] “Machine learning techniques for Intrusion Detection in SCADA Systems,” https://github.com/Rocionightwater/ML-NIDS-for-SCADA.git.
[32] L. Talavera, “Dynamic Feature Selection in Incremental Hierarchical Clustering,” in Proceedings of the European Conference on Machine Learning. Springer, 2000.
[33] F. J. Valverde-Albacete and C. Peláez-Moreno, “100% Classification Accuracy Considered Harmful: The Normalized Information Transfer Factor Explains the Accuracy Paradox,” PloS one, vol. 9, no. 1, 2014.
[34] X. Zhu, Knowledge Discovery and Data Mining: Challenges and Realities. Igi Global, 2007.
[35] B. Zhu and S. Sastry, “Scada-specific intrusion detection/prevention systems: a survey and taxonomy,” in Proceedings of the 1st Workshop on Secure Control Systems (SCS), vol. 11, 2010.
[36] A. George, “Anomaly Detection Based on Machine Learning: Dimensionality Reduction using PCA and Classification using SVM,” International Journal of Computer Applications, vol. 47, no. 21, 2012.
[37] G. Wang, J. Hao, J. Ma, and L. Huang, “A new Approach to Intrusion Detection using Artificial Neural Networks and Fuzzy Clustering,” An International Journal of Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.
[38] J. Zhang and M. Zulkernine, “A Hybrid Network Intrusion Detection Technique using Random Forests,” in The First International Conference on Availability, Reliability and Security. IEEE, 2006, pp. 8–pp.
[39] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, “Long Short Term Memory Recurrent Neural Network Classifier for Intrusion Detection,” in Proceedings of the International Conference on Platform Technology and Service (PlatCon). IEEE, 2016, pp. 1–5.
[40] Y. Yang, K. McLaughlin, T. Littler, S. Sezer, B. Pranggono, and H. Wang, “Intrusion detection system for iec 60870-5-104 based scada networks,” in Power and Energy Society General Meeting (PES), 2013 IEEE. IEEE, 2013, pp. 1–5.
[41] S. Cheung, B. Dutertre, M. Fong, U. Lindqvist, K. Skinner, and A. Valdes, “Using model-based intrusion detection for scada networks,” in Proceedings of the SCADA security scientific symposium, vol. 46. Citeseer, 2007, pp. 1–12.
[42] L. A. Maglaras and J. Jiang, “Intrusion Detection In SCADA Systems using Machine Learning Techniques,” in Science and Information Conference (SAI), 2014, 2014, pp. 626–631.
[43] A. F. S. Prisco and M. J. F. Duitama, “Intrusion detection system for scada platforms through machine learning algorithms,” in Communications and Computing (COLCOM), 2017 IEEE Colombian Conference on. IEEE, 2017, pp. 1–6.
[44] P. Nader, P. Honeine, and P. Beauseroy, “lp-norms in one-class classification for intrusion detection in scada systems,” IEEE Transactions on Industrial Informatics, vol. 10, no. 4, pp. 2308–2317, 2014.
[45] N. Sayegh, I. H. Elhajj, A. Kayssi, and A. Chehab, “Scada intrusion detection system based on temporal behavior of frequent patterns,” in Electrotechnical Conference (MELECON), 2014 17th IEEE Mediterranean. IEEE, 2014, pp. 432–438.
[46] O. Linda, T. Vollmer, and M. Manic, “Neural network based intrusion detection system for critical infrastructures,” in Neural Networks, 2009. IJCNN 2009. International Joint Conference on. IEEE, 2009, pp. 1827–1834.
[47] C. Feng, T. Li, and D. Chana, “Multi-level Anomaly Detection in Industrial Control Systems via Package Signatures and LSTM Networks,” in Proceedings of the 47th IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 2017, pp. 261–272.