Editor-in-Chief
A. Joe Turner, Seneca, SC, USA
Editorial Board
Foundations of Computer Science
Jacques Sakarovitch, Télécom ParisTech, France
Software: Theory and Practice
Michael Goedicke, University of Duisburg-Essen, Germany
Education
Arthur Tatnall, Victoria University, Melbourne, Australia
Information Technology Applications
Erich J. Neuhold, University of Vienna, Austria
Communication Systems
Aiko Pras, University of Twente, Enschede, The Netherlands
System Modeling and Optimization
Fredi Tröltzsch, TU Berlin, Germany
Information Systems
Jan Pries-Heje, Roskilde University, Denmark
ICT and Society
Diane Whitehouse, The Castlegate Consultancy, Malton, UK
Computer Systems Technology
Ricardo Reis, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Security and Privacy Protection in Information Processing Systems
Yuko Murayama, Iwate Prefectural University, Japan
Artificial Intelligence
Tharam Dillon, Curtin University, Bentley, Australia
Human-Computer Interaction
Jan Gulliksen, KTH Royal Institute of Technology, Stockholm, Sweden
Entertainment Computing
Matthias Rauterberg, Eindhoven University of Technology, The Netherlands
IFIP – The International Federation for Information Processing
IFIP was founded in 1960 under the auspices of UNESCO, following the First
World Computer Congress held in Paris the previous year. An umbrella organi-
zation for societies working in information processing, IFIP’s aim is two-fold:
to support information processing within its member countries and to encourage
technology transfer to developing nations. As its mission statement clearly states, IFIP is the global non-profit federation of societies of ICT professionals that aims at achieving a worldwide professional and socially responsible development and application of information and communication technologies. IFIP's events range from large international open conferences to working conferences and local seminars.
The flagship event is the IFIP World Computer Congress, at which both invited
and contributed papers are presented. Contributed papers are rigorously refereed
and the rejection rate is high.
As with the Congress, participation in the open conferences is open to all and
papers may be invited or submitted. Again, submitted papers are stringently ref-
ereed.
The working conferences are structured differently. They are usually run by a
working group and attendance is small and by invitation only. Their purpose is
to create an atmosphere conducive to innovation and development. Refereeing is
also rigorous and papers are subjected to extensive group discussion.
Publications arising from IFIP events vary. The papers presented at the IFIP
World Computer Congress and at open conferences are published as conference
proceedings, while the results of the working conferences are often published as
collections of selected and edited papers.
Any national society whose primary activity is in information processing may
apply to become a full member of IFIP, although full membership is restricted to
one society per country. Full members are entitled to vote at the annual General
Assembly. National societies preferring a less committed involvement may apply
for associate or corresponding membership. Associate members enjoy the same
benefits as full members, but without voting rights. Corresponding members are
not represented in IFIP bodies. Affiliated membership is open to non-national
societies, and individual and honorary membership schemes are also offered.
Jonathan Butts Sujeet Shenoi (Eds.)
Critical
Infrastructure
Protection VIII
8th IFIP WG 11.10 International Conference, ICCIP 2014
Arlington, VA, USA, March 17-19, 2014
Revised Selected Papers
Volume Editors
Jonathan Butts
Air Force Institute of Technology
Wright-Patterson Air Force Base
Dayton, OH 45433-7765, USA
E-mail: [email protected]
Sujeet Shenoi
University of Tulsa
Tulsa, OK 74104-3189, USA
E-mail: [email protected]
Contributing Authors ix
Preface xvii
2 Detecting Malicious Software Execution in Programmable Logic Controllers Using Power Fingerprinting 15
Carlos Aguayo Gonzalez and Alan Hinton
3 Timing of Cyber-Physical Attacks on Process Control Systems 29
Marina Krotofil, Alvaro Cardenas, and Kishore Angrishi
4 Recovery of Structural Controllability for Control Systems 47
Cristina Alcaraz and Stephen Wolthusen
5 Industrial Control System Traffic Data Sets for Intrusion Detection Research 65
Thomas Morris and Wei Gao
6 An Industrial Control System Testbed Based on Emulation, Physical Devices and Simulation 79
Haihui Gao, Yong Peng, Zhonghua Dai, Ting Wang, Xuefeng Han, and Hanjing Li
8 An Automated Dialog System for Conducting Security Interviews for Access Control 111
Mohammad Ababneh, Malek Athamnah, Duminda Wijesekera and Paulo Costa
9 A Survey of Critical Infrastructure Security 127
William Hurst, Madjid Merabti, and Paul Fergus
11 Reinforcement Learning Using Monte Carlo Policy Estimation for Disaster Mitigation 155
Mohammed Talat Khouj, Sarbjit Sarkaria, Cesar Lopez, and Jose Marti
12 Accuracy of Service Area Estimation Methods Used for Critical Infrastructure Recovery 173
Okan Pala, David Wilson, Russell Bent, Steve Linger, and James Arnold
15 Assessing Potential Casualties in Critical Events 231
Simona Cavallini, Fabio Bisogni, Marco Bardoscia, and Roberto Bellotti
17 Asynchronous Binary Byzantine Consensus over Graphs with Power-Law Degree Sequence 263
Goitom Weldehawaryat and Stephen Wolthusen
Contributing Authors
James Arnold received his M.S. degree in Geography from the University
of Utah, Salt Lake City, Utah. His research interests include spatial analysis,
geographic information systems and remote sensing.
Simona Cavallini is the Head of the Research and Innovation Area at the
FORMIT Foundation, Rome, Italy. Her research interests include critical in-
frastructure protection, interdependency analysis, economics of security and
macroeconomics modeling.
Madjid Merabti is the Director and Head of Research at the School of Com-
puting and Mathematical Sciences, Liverpool John Moores University, Liver-
pool, United Kingdom. His research interests include distributed multimedia
systems, computer networks, operating systems and computer security.
Okan Pala is a Ph.D. student in Software and Information Systems at the Uni-
versity of North Carolina at Charlotte, Charlotte, North Carolina. His research
interests include intelligent software systems, spatial decision support systems,
geographic information systems, critical infrastructure protection, computa-
tional geometry and accuracy assessment.
1. Introduction
On June 26, 1996, an oil pipeline operator in Fork Shoals, South Carolina
acted on erroneous data that conflicted with the true state of the pipeline
system [5]. To relieve pressure in the pipeline, the operator sent a remote
signal to start a pump. Although the operator’s console revealed that the
pump had started, it was a faulty indication and the pump had not been
activated. As the pressure readings continued to increase, the operator was
confused by the anomaly and took actions that exacerbated the problem. The
pipeline ultimately ruptured, spilling 957,600 gallons of oil into a nearby river
and surrounding areas, and causing more than 20 million dollars in damage.
Industrial control systems monitor and control infrastructure assets that are
vital to society – the electric power grid, oil and gas pipelines, transportation
systems and water treatment and supply facilities. Attacks that impact the
operations of these critical assets can have devastating consequences. The com-
plexity and interconnectivity of control systems have introduced vulnerabilities
and attack surfaces that previously did not exist, resulting in a significant in-
crease in security incidents during the past few years [6, 7]. Indeed, researchers
have demonstrated that a number of critical infrastructure systems have been
exposed to malicious process manipulation [1, 8].
Industrial control devices inherently trust system inputs for proper opera-
tion [4]. Few, if any, advanced decision support systems are available to assist
operators in identifying anomalous data and determining the best course of
action in the presence of conflicting information about process systems. As a
result, accidental or malicious manipulations of system parameters can cascade
to produce incorrect functionality and possibly induce system failures.
The Byzantine Generals Problem (BGP) [2] is a classic problem in dis-
tributed computing that seeks to determine the appropriate course of action
when there is no consensus among the actors. Indeed, this problem is relevant
to industrial control systems where operators often have to make important
process control and management decisions in the presence of bad data. This
paper considers a formulation of the Byzantine Generals Problem in the context
of industrial control systems. The goal is to draw inferences from the physical
state of a system to help determine integrity compromises.
when there are only three generals, no solution exists in the presence of even a
single traitor. Lamport, et al. [2] proved that consensus can be reached when
there are at least 3m + 1 generals in the presence of at most m traitors. More
generally, with 3m + 1 total nodes, at most m nodes can suffer from Byzantine
faults. That is, for m = 1, only one of the four nodes can be malicious for the
solution to be valid.
Figure 1 shows a traditional Byzantine Generals Problem scenario where the
commander C1 sends a consistent value (message v) to three lieutenants, L1, L2
and L3, where L3 is a traitor. L2 receives conflicting data from the commander
and the other two lieutenants, C1, L1 and L3, and evaluates the values provided
by C1, L1 and L3. Specifically, L2 identifies the inconsistency using a majority
function that considers the three inputs (v, v, x). The inconsistent data source
is identified and, in this case, L3 is identified as the traitor based on the set
of three messages. Note that the majority function is the basis for conflict
resolution in the Byzantine Generals Problem [2].
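As a small illustration of this resolution step (a sketch, not taken from the paper; the function and sender names are illustrative), a lieutenant can apply the majority function to the values it receives and flag the dissenting sender:

    def lieutenant_decision(messages):
        # messages maps each sender to the value it reported to this lieutenant
        values = list(messages.values())
        majority_value = max(set(values), key=values.count)
        suspects = [sender for sender, value in messages.items() if value != majority_value]
        return majority_value, suspects

    # L2's view in Figure 1: C1 and L1 report v, the traitorous L3 reports x
    value, suspects = lieutenant_decision({'C1': 'v', 'L1': 'v', 'L3': 'x'})
    # value == 'v' and suspects == ['L3']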
ponent node. Note that only PLC devices are considered in this paper.
However, the approach is applicable to any type of industrial control de-
vice that reports status information to a central authority.
Sensor/Actuator Node: This corresponds to a physical layer node.
Multiple sensor (Si ) and actuator (Ai ) nodes are present to monitor the
process system and perform control actions on the process system, respec-
tively. In an industrial control environment, the physical layer provides
ground truth of the system state.
4. Algorithm
In order to detect integrity errors, the decision authority in an industrial
control system executes an algorithm after it receives local state values from
the contributing nodes. A total of (l + m) state values exist, one from each of
the l loyal contributing nodes and the m malicious contributing nodes. By using
inferred data, the algorithm identifies the m malicious nodes, where m ≥ 1 and
the total number of nodes n ≥ 4, as long as there are a majority of l > m
loyal nodes. To assist in identifying loyal and malicious nodes, the function
CONSISTENT is invoked. This function evaluates the state values to identify
the inconsistent nodes. After the inconsistent nodes are identified, the integrity
of the entire system is evaluated by comparing the number of malicious nodes
with the number of loyal nodes.
Each contributing node receives an input from a physical device that repre-
sents ground truth; however, the contributing nodes can be loyal or malicious.
It is assumed that physical devices work properly and have no faults, and all
the inconsistencies are due to the malicious nodes. It is also assumed that
the decision node has complete information about the design of the physical
system. Specifically, it can determine if the contributing nodes are reporting
consistent values or inconsistent values.
The algorithm uses two primary functions. After all the state values are
collected, the function CONSISTENT is executed for each pair of contributing
nodes to determine consistency. After all the nodes are analyzed for consistency,
the function MAJORITY is executed to determine if the majority of nodes are
consistent or inconsistent.
The input to the CONSISTENT function is (si , si+1 ) where si and si+1 are
state values for PLCs Pi and Pi+1 , respectively. There are two possible return
values for the CONSISTENT function, True or False, which reflect whether the
state values are consistent or inconsistent, respectively. If both the state values
are consistent, then the state determination ti is assigned the consistent value
C; if the values are inconsistent, the inconsistent value I is assigned. Note
that, if the values are consistent, then the nodes are either both loyal or both
malicious. If the values are inconsistent, then one of the nodes is malicious.
The MAJORITY function evaluates the results generated by the CONSIS-
TENT function. The function returns an overall state for the system. The
system is consistent if, over all the ti, the number of C values is greater than
the number of I values; otherwise, the system is inconsistent.
4.1 Evaluation
The algorithm, shown in Figure 5, begins by acquiring local state values from
all the l + m contributing nodes. If there are at least three contributing nodes,
the first value is labeled as consistent; otherwise, the algorithm terminates be-
cause of the lack of a sufficient number of loyal nodes required to evaluate
system state. Next, pairs of state values are evaluated for consistency in se-
quential order. Nodes are labeled for consistency based on their relationship
to previous evaluations. After all the nodes are evaluated for consistency, the
majority operation is performed. If the majority of the nodes are consistent,
then the first node is loyal. As a result, all the consistent nodes are labeled as
loyal and all the inconsistent nodes are labeled as malicious. Alternatively, if
the majority of the nodes are inconsistent, the first node is malicious. These
results hold as long as there are more loyal nodes than malicious nodes (i.e.,
l > m).
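The labeling and majority steps described above can be sketched in Python as follows. This is an illustrative reconstruction only (Figure 5 is not reproduced in this excerpt); the consistent() predicate is a placeholder for the physics-based comparison performed by the decision authority, and plain equality is used here just to keep the sketch self-contained.

    def consistent(s_i, s_j):
        # Placeholder for CONSISTENT: a real deployment encodes the physical
        # relationships between the values that the contributing nodes report
        return s_i == s_j

    def evaluate_system(states):
        # states: local state values from the l + m contributing nodes, in order
        n = len(states)
        if n < 3:
            return None                      # too few nodes to evaluate system state
        labels = ['C']                       # first node provisionally labeled consistent
        for i in range(n - 1):
            if consistent(states[i], states[i + 1]):
                labels.append(labels[i])     # pair agrees: same label as the previous node
            else:
                labels.append('I' if labels[i] == 'C' else 'C')
        majority_is_consistent = labels.count('C') > labels.count('I')
        loyal_label = 'C' if majority_is_consistent else 'I'
        return ['loyal' if t == loyal_label else 'malicious' for t in labels]

In the worked example below, CONSISTENT(s1, s2) = True and CONSISTENT(s2, s3) = False, so the labels become C, C, I and the majority step marks the third node as malicious.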
Figure 6 shows a three-node system with P 1 as a malicious node and flagged
with an integrity error t1 = I. In the example, decision authority D1 receives
inputs from PLC field devices P 1 and P 2. P 1 reports a local state value
s1 = b, where b is the value of a system parameter. Meanwhile, P 2 reports
a local state value s2 = d. D1 can infer, based on the state value reported
by P 1, that the state value reported by P 2 should correlate with the state
Figure 6. Impossibility of a solution for a system with less than four nodes.
Consider the case where P 3 is flagged for an integrity problem. In the first
step, D1 receives the state values: s1 = b from P 1, s2 = c from P 2 and s3 = f
from P 3. In the second step, since there are least three contributing nodes (P 1,
P 2 and P 3), the first node is labeled consistent (t1 = C).
Next, the state values from P 1 and P 2 are compared for consistency. Specif-
ically, CONSISTENT(s1 , s2 ) = True. As a result, the integrity flag of P 2 is set
to the same value as P 1 (t2 = C). Next, CONSISTENT(s2 , s3 ) = False. As a
result, the integrity flag of P 3 is set appropriately (t3 = I).
Finally, the MAJORITY function is executed. Since MAJORITY(C, C, I)
= C, the first node is loyal. As a result, all the nodes with ti = C are identified
as loyal and all the nodes with ti = I are identified as malicious.
At this stage, using visual observations only, a control system operator may
fail to identify the node with the integrity problem and could act on the invalid
data, as in the case of the Fork Shoals pipeline rupture. Implementing this
algorithm would have identified the faulty alert that led the pipeline operator
to believe that the pump had started, when, in fact, it had not.
4.2 Application
The algorithm identifies integrity problems and provides a means for evaluating
conflicting data. From a cyber security perspective, the integrity of field devices
can be manipulated when components are networked to the Internet and targeted
compromises or accidental manipulations occur. A compromised field device can
then report false data, undermining the integrity of the information presented to
the operator. The following example highlights a scenario where devices provide
inconsistent data, but the malicious device can be identified.
Figure 7 represents a notional oil pipeline with its associated connectivity
and interdependencies. The physical layer components are several miles apart
and the components are managed by multiple PLCs that report to a single
decision component. The distribution of PLCs makes it difficult for an operator
to manually or visually verify the current state of every device. Nonetheless,
the operator must rely on the system for situational awareness prior to making
decisions or taking actions. Previous examples have demonstrated the negative
effects of conflicting data.
In the example, a field device P 1 monitors pressure, a field device P 2 mon-
itors a control valve and a field device P 3 monitors flow. A change in state at
P 1 changes the physical layer and the corresponding states of the subsequent
field devices P 2 and P 3. This is an important property, which enables the deci-
sion authority to infer the state of the subsequent field devices. For simplicity,
pressure can be High or Low, the valve position can be Open or Closed, and
the flow sensor shows a flow state of Yes or No. Each field device reports either
the accurate local state or the false local state.
Data reported by each field device is evaluated for consistency to allow the
decision component D1 to make decisions. Integrity problems are present if
the field devices P 1, P 2 and P 3 report conflicting state values. Table 1 lists
various combinations of sensor readings that represent consistent and inconsis-
tent states in the notional example. The table values are used for evaluating
the CONSISTENT function and enabling inference.
When the pressure at P 1 is Low, the valve position at P 2 should be Open
and the flow rate at P 3 should be Yes. When the valve position at P 2 is Closed,
the pressure at P 1 is High and the flow rate at P 3 is No. Inconsistent sequences
of the reported state values are detected by the algorithm.
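One possible encoding of these inference rules is sketched below (illustrative Python; the mapping and names are not from the paper): each reported value is mapped to the pressure state it implies, and two readings are consistent when they imply the same pressure state.

    # Each reported value implies an underlying pressure state (after Table 1)
    IMPLIED_PRESSURE = {
        ('pressure', 'Low'):  'Low',   ('pressure', 'High'): 'High',
        ('valve', 'Open'):    'Low',   ('valve', 'Closed'):  'High',
        ('flow', 'Yes'):      'Low',   ('flow', 'No'):       'High',
    }

    def consistent_pair(reading_a, reading_b):
        # Two readings are consistent when they imply the same pressure state
        return IMPLIED_PRESSURE[reading_a] == IMPLIED_PRESSURE[reading_b]

    # Second row of Table 1: P1 reports High pressure, P2 an Open valve, P3 flow Yes
    consistent_pair(('pressure', 'High'), ('valve', 'Open'))   # False, so t2 = I
    consistent_pair(('pressure', 'High'), ('flow', 'Yes'))     # False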
For example, consider the second row in Table 1. In the first step, the
decision component receives state values from field devices P 1, P 2 and P 3. P 1
reports the pressure as High, P 2 reports the valve as Open and P 3 reports
the flow rate as Yes. In the second step, P 1 is labeled as consistent (t1 =
C). In the third step, values from P 1 and P 2 are compared for consistency.
CONSISTENT(s1 , s2 ) = False, meaning that High pressure at P 1 does not
infer an Open valve position at P 2. Since P 1 and P 2 are inconsistent, P 2 is
labeled as inconsistent (t2 = I). Next, values from P 1 and P 3 are compared
5. Conclusions
Human operators and automated decision components in industrial control
environments often must make rapid decisions to react to system integrity er-
rors. The application of the Byzantine Generals Problem to industrial control
systems provides a formal mechanism for recognizing the presence of anomalous
data and potentially identifying its sources. Using physical system properties,
the resulting algorithm enables a decision authority to infer the system state
and identify integrity compromises. A key constraint is that, when more than
three field devices report the physical state of a system and when there are
more trusted devices than compromised devices, it is possible to identify the
specific devices that are compromised. The oil pipeline example demonstrates
how the algorithm can identify and resolve conflicting data. As demonstrated,
solutions to the Byzantine Generals Problem in the context of industrial control
environments facilitate the resolution of inconsistent data that can result from
cyber attacks against field devices and communications links.
References
[1] J. Finkle, “Irrational” hackers are growing U.S. security fear, Reuters, May
22, 2013.
[2] L. Lamport, R. Shostak and M. Pease, The Byzantine Generals Problem,
ACM Transactions on Programming Languages and Systems, vol. 4(3), pp.
382–401, 1982.
[3] Y. Lindell, A. Lysyanskaya and T. Rabin, On the composition of authenti-
cated Byzantine agreement, Journal of the ACM, vol. 56(6), pp. 881–917,
2006.
[4] T. Macaulay and B. Singer, Cyber Security for Industrial Control Systems:
SCADA, DCS, PLC, HMI and SIS, CRC Press, Boca Raton, Florida, 2012.
[5] National Transportation Safety Board, Pipeline Accident Report, Pipeline
Rupture and Release of Fuel Oil into the Reedy River at Fork Shoals, South
Carolina, Report PB98-916502, NTSB/PAR-98-01, Washington, DC, 1996.
[6] F. Rashid, ICS-CERT: Response to cyber “incidents” against critical in-
frastructure jumped 52 percent in 2012, Security Week, January 10, 2013.
[7] U.S. Department of Homeland Security, ICS-CERT Incident Response
Summary Report: 2009–2011, Washington, DC, 2012.
[8] Z. Zorz, Company’s industrial heating system hacked via backdoor, Help
Net Security, Kastav, Croatia, December 12, 2012.
Chapter 2
1. Introduction
Industrial control systems are computer-based systems that monitor and
control process systems in critical infrastructure assets such as water treat-
ment and distribution facilities, transportation systems, oil and gas pipelines,
2. Power Fingerprinting
Power fingerprinting analyzes a processor side channel, such as power con-
sumption or electromagnetic emissions, to determine whether or not it deviates
from expected operation. A power fingerprinting monitor, shown in Figure 1,
uses a physical sensor to capture electromagnetic signals containing small pat-
terns that emerge during the transition from one instruction to another. In
power fingerprinting, captured power traces are processed by an external device
that implements signal detection and classification techniques. The observed
traces are compared against baseline references to assess whether or not exe-
cution has deviated from its expected behavior, such as when malware alters
normal operation.
2.2 Characterization
The baseline references contain the expected side channel signals and in-
dicate the acceptable tolerance variation. Power fingerprinting baselines are
purposes. The captured signals were transferred via a USB drive and processed
by the power fingerprinting host using custom software tools and scripts.
Figure: water tank with high (H) and low (L) level sensors connected to the control logic, which drives the pump (P) and alarm (A) outputs.
Figure 4 shows a simplified model of the control logic. The sensors were
configured to provide a logical one when the tank water level was at or above
the sensor level and a logical zero when the water level was below the sensor
level.
The control system logic has four execution paths. An execution path is
selected based on the combination of input values at the beginning of the logic
cycle. To facilitate synchronization, the logic incorporates a physical trigger,
an electric signal sent to the digitizer via the output port of the programmable
logic controller to indicate when the logic cycle is started.
Original control logic:
    // PFP Trigger
    if L = 0 && H = 0 then
        pump = On
        alarm = Off
    else if L = 1 && H = 1 then
        pump = Off
        alarm = Off
    else if L = 0 && H = 1 then
        alarm = On !!!
        pump = Off
        increase alarm counter
    else
        outputs unchanged
    end

Tampered control logic:
    // PFP Trigger
    Call FC1
    if trap = 1 then
        pump = On !!
        alarm = Off !!
    end
Figure 6 shows how the original logic block is moved in the tampered ver-
sion. After the original logic is executed, the tampered block post-processes the
results to change the system behavior. The most important element of the tam-
pering, however, is the fact that behavioral modifications only take place under
specific conditions. Similar to Stuxnet, the attack remains dormant and the
system exhibits normal behavior until the triggering condition is encountered.
The triggering condition is induced by another digital input pin that controls
the sabotage routine. Note that the triggering mechanism is arbitrary; selecting
Figure: the same tank control logic (level sensors H and L, pump P, alarm A) with an additional malware trigger input.
Table 2 shows the tampered control system logic. When the triggering con-
dition is induced, the programmable logic controller turns the pump on regard-
less of the sensor inputs, causing the water in the tank to overflow. When the
triggering condition is absent, the observed behavior matches the original logic.
4. Experimental Results
After characterizing the original control logic and extracting the power fin-
gerprinting references for all the execution paths, the power fingerprinting mon-
itor was able to effectively monitor the integrity of the Siemens S7-1200 pro-
grammable logic controller. Furthermore, power fingerprinting successfully de-
tected malicious software execution even when the triggering condition was
absent.
execution paths during the characterization process. Training traces were cap-
tured in a controlled environment in which input vectors were provided to
exhaustively exercise all the possible execution paths.
A total of 100 training traces were captured for each execution path and
processed using a spectral periodogram (spectrogram) to extract the frequency
components of each training trace at different time segments. The spectrogram,
which corresponds to the squared magnitude of the discrete-time short-time
Fourier transform X(τ, ω), is given by:

S(τ, ω) = |X(τ, ω)|² = | Σ_n x[n] w[n − τ] e^(−jωn) |²

Note that x[n] is the captured power fingerprinting trace and w[n] is a Gaussian
window. The power fingerprinting references were constructed by averag-
ing the spectrograms of the 100 training traces for each execution path. For
Path 0, the power fingerprinting reference is denoted by S0 ; for Path 1, the
power fingerprinting reference is denoted by S1 ; and so on.
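The reference construction can be sketched along the following lines in Python with SciPy; the segment length, overlap and Gaussian window spread are illustrative choices rather than values reported by the authors.

    import numpy as np
    from scipy.signal import spectrogram

    def pfp_spectrogram(trace, fs, nperseg=256, noverlap=192, gauss_std=32.0):
        # Squared magnitude of the short-time Fourier transform with a Gaussian window
        f, t, s = spectrogram(trace, fs=fs, window=('gaussian', gauss_std),
                              nperseg=nperseg, noverlap=noverlap, mode='magnitude')
        return s ** 2

    def build_reference(training_traces, fs):
        # Average the spectrograms of the training traces captured for one execution path
        return np.mean([pfp_spectrogram(x, fs) for x in training_traces], axis=0)

    # references = {path: build_reference(traces_for[path], fs) for path in traces_for}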
After the references for each execution path were computed, the power fin-
gerprinting monitor captured a new runtime test trace r[n], and compared it
against the references to determine if r[n] corresponded to an authorized exe-
cution path or if it should be flagged as an anomaly. In order to match r[n]
to a specific path reference Si , the spectrogram of r[n] was computed and sub-
tracted from each baseline reference over selected time segments and frequency
bands. The difference was then smoothed and summed across the selected
time segments and frequency bands to determine the final distance for each
path reference yi .
The reference that produced the minimum distance from the test trace,
yf = min_i {yi}, was selected as the likely execution path that generated the test
trace r[n]. If yf is within the normal range as determined during the charac-
terization, the power fingerprinting monitor classifies the trace as belonging to
the corresponding execution path. If the test trace does not match any ref-
erence within the predefined tolerance, then the power fingerprinting monitor
determines that an anomaly exists and raises an alarm.
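The matching and anomaly decision can be sketched as follows, reusing pfp_spectrogram from the previous sketch; the per-path thresholds stand in for the tolerances obtained during characterization, and the plain absolute-difference distance omits the time-segment and frequency-band selection and smoothing described above.

    import numpy as np

    def path_distances(test_trace, references, fs):
        s_test = pfp_spectrogram(test_trace, fs)
        # Distance to each path reference, summed over time segments and frequencies
        return {path: float(np.abs(ref - s_test).sum()) for path, ref in references.items()}

    def classify(test_trace, references, thresholds, fs):
        distances = path_distances(test_trace, references, fs)
        best_path = min(distances, key=distances.get)        # yf = min over the yi
        if distances[best_path] <= thresholds[best_path]:    # within the characterized tolerance
            return best_path                                  # authorized execution path
        return 'ANOMALY'                                      # raise an alarm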
the tampered control system logic. Note that the closer yf is to zero, the
more similar the tampered execution trace is to the baseline reference trace.
A clear separation can be seen between the distributions, which demonstrates
the ability of power fingerprinting to detect malicious software execution.
Similar results were obtained for the other execution paths. Figure 9 presents
a boxplot of an aggregated view of the execution paths. The boxplot shows that
the separation between the original and tampered distributions is maintained
for all possible execution paths. The results demonstrate the ability of power
fingerprinting to detect malicious software in an industrial control system by
directly monitoring programmable logic controller execution.
5. Conclusions
Power fingerprinting is a novel technique for directly monitoring the exe-
cution of systems with constrained resources. The technique, which has been
successfully demonstrated on a variety of platforms, does not require software
artifacts to be loaded on the target platforms.
The experimental results demonstrate that power fingerprinting can directly
monitor programmable logic controller execution and detect the presence of
malware. Because of its zero-day detection capability and negligible overhead,
Figure 9. Anomaly detection performance for execution paths in the original logic.
Chapter 3
Abstract This paper introduces a new problem formulation for assessing the vul-
nerabilities of process control systems. In particular, it considers an
adversary who has compromised sensor signals and has to decide on the
best time to launch an attack. The task of selecting the best time to
attack is formulated as an optimal stopping problem that the adversary
has to solve in real time. The theory underlying the best choice problem
is used to identify an optimal stopping criterion, and a low-pass filter
is subsequently used to identify when the time series of a process vari-
able has reached the state desired by the attacker (i.e., its peak). The
complexities associated with the problem are also discussed, along with
directions for future research.
1. Introduction
One of the growing research areas related to cyber-physical system security
is developing threat models that consider an adversary who can manipulate
sensor or actuator signals in order to drive a physical process to an undesired
state. While many researchers have focused on the implications of manipulating
signals, little work has attempted to understand the complexity and uncertain-
ties associated with launching successful attacks and, in particular, finding the
“best time” to launch an attack.
Attempting to disrupt a physical process without clearly understanding the
consequences of the attack actions on the process is likely to result in a minor
nuisance instead of an actual disruption – after all, breaking into a system is
not the same as breaking a system.
This paper considers an attacker who can read a sensor signal for a given
process variable and has to decide on a time to launch a denial-of-service (DoS)
attack in order to “freeze” a certain process value above or below the setpoint
stored in controller memory [5]. In doing so, the attacker deceives the controller
about the current state of the process and evokes compensating reactions that
could bring the process into the state desired by the attacker (e.g., unsafe state).
In order to achieve the attack goal faster, the attacker may opt to freeze one of
the peak values of a process variable (low or high) to expedite process dynamics.
Typical sensor signals in a process control environment fluctuate around the
setpoint or track dynamic changes in the process. In both cases, the process
variable exhibits a time series of low and high peaks. The attacker knows neither
how high nor how low the process variable can range, nor which of the peak
values should be chosen from among all the possible boundary states.
This paper formulates the challenge as an optimal stopping time problem
for the attacker. In particular, it is formulated as a best choice problem (also
known as the secretary problem), in which the adversary is presented with a
time series of system states provided by sensor measurements and has to decide
on the optimal time to attack. Because the best choice problem assumes non-
correlated time measurements, it is necessary to discern upward or downward
trends in process measurements (time correlations) and then identify when a
local optimum has been reached. This is a non-trivial task in many real-world
environments because sensor measurements can be noisy and can have sudden
fluctuations.
physical substances and the manufacturing of end products. Over the past few
decades, industrial plants have undergone tremendous modernization. Tech-
nology has become an enabler of efficiency as well as a source of problems.
Panels of relays are now embedded computers and simple analog sensors are
now IP-enabled smart transmitters [8] with multiple wired and wireless commu-
nications modes, numerous configuration modes and even web-servers, so that
maintenance staff can calibrate and manage the devices from remote locations.
Thus, the possibility of remote exploitation of industrial control systems and
the physical processes they manage has become a reality.
The attacker faces the following problem: given a time series that exhibits
a sequence of peaks and valleys of different amplitudes, select one of the peaks
to launch a DoS attack in real time. If the attacker strikes too soon, the
opportunity to have a greater impact on the system is lost (compared with if
the attacker waits until the process variable reaches a higher (or lower) value).
However, if the attacker waits too long, the process variable may not reach a
more vulnerable state than previously observed and the attacker could miss
the opportunity to cause maximal damage and even have the implanted attack
tools (e.g., communications jammers and sensor malware) detected before the
attack is launched.
The problem of selecting an opportune time to attack can be framed as
an optimal stopping problem. This problem focuses on choosing the time to
take a particular action based on sequentially-observed random variables in
order to maximize an expected payoff. The optimal stopping decision task, in
which the binary decision to stop or continue the search depends only on the
relative ranks, is modeled as the best choice problem, which is also known as
the secretary problem [2].
to search at sample Xi is determined not only by the aspiration value but also
by the difference between the stopping value and the continuation value Xi+1 .
The problem of identifying a signal peak is exacerbated by the fact that pro-
cess variables are noisy and, therefore, an upward trend might be followed by
a quick drop, followed again by an even higher gain.
To solve this problem, a low-pass filter is incorporated to smooth out short-
term signal fluctuations and highlight the longer-term trends. This enables
a peak to be identified as soon as a downward trend in a smoothed signal is
detected (e.g., three consecutive measurement drops).
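An online version of this detector can be sketched as follows (illustrative Python; the moving-average filter and the three-drop rule follow the description above, and the window length and drop count are parameters an attacker would have to tune):

    from collections import deque

    def attack_instant(stream, mu=50, drops=3):
        # Return the index at which to launch the DoS attack: the first sample where
        # the mu-sample moving average has decreased for `drops` consecutive samples,
        # i.e. a local peak of the smoothed signal has just passed
        window = deque(maxlen=mu)
        previous_average = None
        falling = 0
        for i, x in enumerate(stream):
            window.append(x)
            if len(window) < mu:
                continue                      # not enough samples to smooth yet
            average = sum(window) / mu
            if previous_average is not None and average < previous_average:
                falling += 1
                if falling >= drops:
                    return i                  # strike: the peak passed a few samples ago
            else:
                falling = 0
            previous_average = average
        return None                           # no peak detected before the stream ended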
4. Simulation Setup
The empirical analysis employed a Matlab model of the Tennessee Eastman
challenge process [1] developed by Ricker [9]. It is implemented as a C-based
MEX S-function with a Simulink model.
Figure: Tennessee Eastman process model (setpoints SP1–SP9).
In order to obtain statistically significant results, the original code was modi-
fied by generating a new seed for the random number generator for each run. In
addition, higher sampling rates for the process variables – 2,000 sensor samples
per hour (per sensor) – were incorporated in the Matlab workspace.
immediately launch a DoS attack to freeze the peak value from the previous
control loop cycle in the controller memory.
5. Experimental Results
The experiments assume the presence of an attacker whose goal is to force
the physical process to shut down. The result of such an attack is evaluated
using the shutdown time (SDT), the time that the process is able to run be-
fore being shut down because it has exceeded the safety constraints. First, the
Figure: raw and smoothed A feed signal (kscmh) for smoothing windows µ = 50 and µ = 250.
shortest SDT that can be achieved using a DoS attack on each sensor signal is
determined. Following this, to justify the importance of the strategic selection
of the attack time, evidence of the ineffectiveness of DoS attacks conducted
at random times is provided. In particular, it is shown that random selection
not only significantly increases the time required to bring the process to the
critical state, but in some cases, it could be completely ineffective. Also, the
experiments evaluate the effects of the length of the learning phase and param-
eter smoothing on the attacker’s prospects of selecting the highest (or lowest)
possible process value in real time.
Figure: raw and smoothed sensor signals (in kscmh, kPa gauge and kg/h) over roughly 70 hours of operation.
The mean times to shutdown for the attacks on different sensors were deter-
mined based on the results of 50 simulations. Table 1 summarizes the results.
The 95% confidence intervals are calculated using the Student’s t-distribution.
The table does not include results for XMEAS{10; 11} because no attack on
these sensors drives the system to an unsafe state.
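The intervals can be reproduced with a standard Student-t computation; the sketch below (illustrative, using SciPy) returns the mean SDT and its 95% confidence interval from the per-run shutdown times.

    import numpy as np
    from scipy import stats

    def mean_sdt_with_ci(shutdown_times, confidence=0.95):
        x = np.asarray(shutdown_times, dtype=float)
        n = x.size
        mean = x.mean()
        sem = x.std(ddof=1) / np.sqrt(n)                      # standard error of the mean
        half_width = stats.t.ppf((1 + confidence) / 2, df=n - 1) * sem
        return mean, (mean - half_width, mean + half_width)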
Due to the variability of process measurement noise, the process is never
in the same state. However, as the results indicate, the Tennessee Eastman
process is, in general, resilient to noise variations and the SDT does not exhibit
significant variations, with the exception of the attacks F_recycle^max and F_A^min.
Attack F_A^min on the A-feed is of special interest. Not all attack instances
trigger process shutdowns. Thus, the result for the F_A^min attack is based on 43
out of 50 cases where the process reaches an unsafe state. At the same time,
attacks P_pressure^max and F_A^max do not drive the process to an unsafe state. This
means that an attacker who intends to launch an attack on reactor pressure
should only strike at the minimum peaks.
Figure: success probability versus time (hours), two panels.

Figure: C feed signal and its mean over the first 25 hours.
6. Conclusions
This paper demonstrates that sensor signal characteristics must be con-
sidered carefully when developing attacks that target process measurements.
Moreover, finding appropriate values for parameters such as the signal smoothing
window (μ) and the stopping decision (r) is not straightforward, and these
parameters are best determined experimentally.
An attacker may do extensive homework and proactively design portions of
attacks, but the attacks would have to be tuned through reconnaissance activi-
ties such as changing configuration parameters, manipulating process variables
and turning components on and off while observing the effects on the process
system. From the defensive perspective, short-term process deviations aris-
ing from such “testing” can be detected by process-aware anomaly detection
methods. Furthermore, in order to hinder the attacker’s ability to disrupt a
process system, plant administrators should strategically place misleading or
false technical documentation to influence the attacker’s strategy selection.
Overall, a better understanding of the complexities and uncertainties faced
by an attacker when designing targeted cyber-physical attacks in the physi-
cal domain allows for better judgment regarding the efforts required to design
Table 2. Simulation results for the educated guess approach for XMEAS1 (columns correspond to r = 0, r = 1, r = 2 and r = 3; three values are reported per cell).

μ = 50 (N, e):
• 1.06 (0.82; 1.32) | • 0.7 (0.45; 0.96) | • 0.62 (0.25; 0.98) | • 0.82 (0.53; 1.11)
• 49.46 (31.8; 67.11) | • 44.86 (29.91; 59.81) | • 35.78 (20.36; 51.20) | • 54.08 (36.67; 73.04)
• 7 (79.07) | • 4 (96.17) | • 6 (98.46) | • 8 (47.87)
μ = 150 (24h):
• 1.35 (1.03; 1.67) | • 0.74 (0.48; 1.00) | • 0.84 (0.55; 1.14) | • 0.75 (0.35; 1.16)
• 33.98 (11.14; 56.83) | • 26.56 (8.06; 45.50) | • 22.47 (9.73; 35.20) | • 21.39 (7.01; 35.76)
• 8 (112.56) | • 5 (61.35) | • 5 (35.39) | • 5 (89.16)
μ = 250:
• 1.42 (0.78; 2.07) | • 0.73 (0.18; 1.29) | • 0.80 (0.22; 1.38) | • 0.69 (0.40; 0.98)
• 26.87 (7.59; 46.04) | • 8.56 (0.34; 16.78) | • 6.33 (0.18; 12.49) | • 27.96 (6.88; 49.03)
• 7 (60.87) | • 3 (93.19) | • 7 (90.38) | • 5 (32.93)
μ = 50 (N, log(N)):
• 1.31 (0.98; 1.69) | • 0.75 (0.46; 1.04) | • 0.84 (0.58; 1.09) | • 0.69 (0.47; 0.84)
• 72.63 (57.62; 87.66) | • 65.48 (51.79; 79.16) | • 66.16 (50.76; 81.55) | • 74.56 (59.88; 89.23)
• 0 | • 0 | • 0 | • 1 (91.32)
μ = 150 (24h):
• 1.04 (1.21; 1.59) | • 1.03 (0.62; 1.45) | • 1.00 (0.52; 1.47) | • 0.96 (0.51; 1.43)
• 49.73 (32.81; 66.65) | • 48.26 (34.72; 61.35) | • 33.6 (18.77; 48.42) | • 27.85 (12.22; 43.49)
• 0 | • 1 (164.46) | • 0 | • 0
μ = 250:
• 1.49 (1.01; 1.98) | • 0.71 (0.27; 1.15) | • 0.78 (0.43; 1.12) | • 0.84 (0.52; 1.16)
• 37.57 (20.98; 54.7) | • 40.44 (23.00; 57.86) | • 28.33 (13.42; 43.25) | • 46.74 (32.18; 61.29)
• 0 | • 1 (53.81) | • 0 | • 0
Figure: histograms of the number of simulations versus percentage for the A feed, D feed and other sensor signals.
and conduct cyber-physical attacks with surgical precision (as in the case of
Stuxnet). Clearly, developing sophisticated and effective cyber-physical attacks
requires extensive experimentation with the same specialized industrial equip-
ment as that installed at the targeted site.
References
[1] J. Downs and E. Vogel, A plant-wide industrial process control problem,
Computers and Chemical Engineering, vol. 17(3), pp. 245–255, 1993.
[2] P. Freeman, The secretary problem and its extensions: A review, Revue
Internationale de Statistique, vol. 51(2), pp. 189–206, 1983.
[3] J. Gilbert and F. Mosteller, Recognizing the maximum of a sequence,
Journal of the American Statistical Association, vol. 61(313), pp. 35–73,
1966.
[4] Y. Huang, A. Cardenas, S. Amin, Z. Lin, H. Tsai and S. Sastry, Under-
standing the physical and economic consequences of attacks on control
systems, International Journal of Critical Infrastructure Protection, vol.
2(3), pp. 73–83, 2009.
RECOVERY OF STRUCTURAL
CONTROLLABILITY FOR
CONTROL SYSTEMS
1. Introduction
Domination, a central topic in graph theory, is a relevant theme in the
design and analysis of control systems because it is equivalent to the problem
of (Kalman) controllability. The motivation comes from the concept
However, this formulation is quite restrictive for large networks (e.g., power
networks or similarly large control systems), where the number of input values
grows exponentially with the number of nodes. This is the main reason that
our investigations concentrate on structural controllability, where matrix A in
Equation (1) represents the network topology and matrix B contains the set of
nodes with the capacity to drive control [16].
Lin [15] defines G(A, B) = (V, E) as a digraph where V = VA ∪ VB is the set
of vertices and E = EA ∪ EB is the set of edges. In this representation,
VB comprises the nodes capable of injecting control signals into the entire
network, also known as driver nodes (denoted as nd ) corresponding to input
vector u in Equation (1). The identification of these nodes has so far been
studied in relation to general networks. This paper concentrates on power-law
networks, most pertinent to a number of large-scale infrastructure networks. To
identify the minimum driver node subsets ND , we follow the approach based
on the power dominating set (PDS) problem, which is described in more detail
in [1, 2]. This interest is primarily because PDS-based networks have similar
logical structures as real-world monitoring systems, where driver nodes can
represent, for example, remote terminal units that control industrial sensors and
actuators. In fact, the PDS problem was originally introduced as an extension
of the dominating set (DS) by Haynes, et al. [12], mainly motivated by the
structure of electric power networks and the need to efficiently monitor the
networks.
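For readers unfamiliar with power domination, the following sketch (illustrative Python using networkx, treating the control graph as undirected) checks whether a candidate driver node set power-dominates a graph using the two observation rules introduced by Haynes, et al. [12]: a driver node observes itself and its neighbors, and an observed vertex with exactly one unobserved neighbor also observes that neighbor.

    import networkx as nx

    def observed_set(graph, driver_nodes):
        observed = set(driver_nodes)
        for d in driver_nodes:                         # rule 1: observe self and neighbors
            observed.update(graph.neighbors(d))
        changed = True
        while changed:                                 # rule 2: propagate to a fixpoint
            changed = False
            for v in list(observed):
                unobserved = [u for u in graph.neighbors(v) if u not in observed]
                if len(unobserved) == 1:
                    observed.add(unobserved[0])
                    changed = True
        return observed

    def is_power_dominating_set(graph, driver_nodes):
        return observed_set(graph, driver_nodes) == set(graph.nodes)

    # Example: the path graph 0-1-2-3-4 is power-dominated by the single node {1}
    # is_power_dominating_set(nx.path_graph(5), {1})  -> True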
Building on previous work [1, 2], this paper proposes several restoration
strategies for controlling a network after ND has been perturbed. Different
attack patterns that compromise nodes and the effects of the attacks have been
considered extensively in [1, 2], in particular, the analysis and evaluation of
interactive and non-interactive attacks, including multiple rounds between at-
tackers and defenders, respectively. However, it is clearly undesirable to restore
overall controllability through complete re-computation if the PDS properties
are only partially violated – where this is possible given the constraints im-
posed by compromised nodes – because the PDS problem is known to be N P-
complete for general graphs as well as for bipartite and chordal graphs as shown
by Haynes, et al. [12].
Subsequent research by Guo, et al. [11] extended N P-completeness proofs
to planar, circle and split graphs, with the exception of partial k-tree graphs
with k ≥ 1 and parameterized using ND , in which the DS and PDS problems
can become tractable in linear time, while the parameterized intractability can
result in W[2]-hardness [8]. Pai, et al. [17] have provided results for grid graphs
while Atkins, et al. [3] have studied block graphs. There are other approaches
that address PDS for specific cases [5, 6], but none of them focus on efficient
solutions for the restoration of the PDS problem following perturbations, i.e.,
where a PDS of the original graph G is known along with the changes induced
on G.
The restoration strategies defined in this paper center on general power-law
and scale-free distributions, which offer characteristics similar to those of real
power networks. In particular, three strategies are defined to determine the complexity
of restoration. To evaluate the complexity, this paper considers: (i) a strategy
without any type of constraint for restoration; (ii) a strategy based on the graph
diameter to minimize the intrinsic problem of the non-locality of PDS; and (iii)
a strategy based on backup instances of driver nodes. The paper shows that
this offers a gain in efficiency over re-computation while resulting in acceptable
deviations from an optimal (i.e., minimal |ND |) PDS. Because many critical
infrastructures require timely or even real-time bounded restoration to ensure
resilience and continued operation, the ability to restore controllability rapidly
is essential and, of course, highly desirable.
Note that the omission of OR2 already results in the N P-complete DS prob-
lem with a polynomial-time approximation factor of Θ(log n) [9]. The following
condition is that the construction of ND is arbitrary and depends on the selec-
tion of vertices satisfying OR1, allowing the customizable selection of controlla-
bility generation strategies as specified in [1]. After ND has been obtained, we
evaluate two different scenarios concentrating on attacks against either node or
edge (communications link) availability [1, 2]:
SCN-1: Randomly remove some (not all) edges of one or several ver-
tices, which may compromise the controllability of dependent nodes or
disconnect parts of the control graph and underlying network.
None of the new restoration links must violate the out-degree distribution
of a power-law network and must not introduce cycles.
At the end of the algorithm, the restored set ND can increase the initial
number of driver nodes such that U = ∅ (note that |ND| = |V| in degenerate
cases). However, and unfortunately, we must also consider the handicap of
non-locality of PDS and the N P-complete property demonstrated by Haynes,
et al. [12].
Our heuristic approach is based on ensuring that the hard constraints, i.e.,
observation rules, are satisfied primarily and that, as a secondary constraint,
the out-degree distribution property of the underlying power-law network re-
mains unaltered. This strategy also depends on the approaches taken for each
restoration strategy defined in the remainder of the paper. In this case, the
study is based on two main approaches:
This section develops and analyzes the three associated algorithms, consid-
ering in addition the parameters and functions described above.
The tree width corresponds to the minimum width w over all tree decompositions
of G(V, E), where w = max_{i ∈ I} |bagi| − 1 (with bagi ∈ T) and w ≥ 1. This means
that a tree decomposition T of width w with | ND | driver nodes can be turned
into a nice tree decomposition of width w, but subject to the diameter associ-
ated with each driver node within the network [7]. In this way, bags containing
driver nodes with smaller diameters are the leaves of T while driver nodes with
higher diameters are located closer to the root.
For transformation to a nice tree decomposition, each node i in the tree T
has at most two children (j, z) complying with two additional conditions: (i)
nodes with two children bagj and bagz , bagi = bagj = bagz (bagi as a join node);
and (ii) nodes with a single child bagj such that bagi = bagj ∪ {nd } (bagi as an
introduce node) or bagi = bagj −{nd } (bagi as a forget node). In practice, these
trees are constructed using tables with at least three columns (i, j, z), where
each entry i contains those subsets of nd in relation to i. However, this data
structure also takes into account the maximum diameter associated with each
bag because the approach does not focus on re-linking (APPR-1), the value
of which remains constant throughout the restoration process. Therefore, the
spatial overhead for such a table may become 3 × 2^(w+1) = O(2^(w+1)) entries [10].
Algorithm 5 describes the behavior of the restoration strategy with one or
several nice tree decompositions Tbk as the main input parameter, with a storage
cost of O(Σ_{bk=1..M} 2^(w+1)). The idea is to process this parameter in a bottom-up
fashion to find the driver nodes with minimum diameter that ensure the fulfill-
ment of RR3 specified in Section 2.2. The inductive proof begins by defining
the initial and final conditions, and the base cases:
Depending on SCN-x and the targeted node TG-x, the computational com-
plexity of Algorithm 3 can become variable as described in Section 3.1:
Threat Scenarios
         SCN-1 – TG-x                                    SCN-2 – TG-x
         Time                                   ND       Time                                   ND
STG-1    O(kn^2)                                nd + 2   O(kn^2)                                nd + 1
STG-2    O(kn^2)                                nd + 2   O(kn^2)                                nd + 1
STG-3    O(k(Σ_{bk=1..M} 2^(w+1)(b + e)))       nd + 2   O(k(Σ_{bk=1..M} 2^(w+1)(b + e)))       nd + 1
TG-x in SCN-1: O(k(Σ_{bk=1..M} 2^(w+1)(b + e))) + O((a + e) + n^2), resulting
in O(k(Σ_{bk=1..M} 2^(w+1)(b + e))), where ND increases its value by at least two
nodes in the worst case.

TG-x in SCN-2: O(k(Σ_{bk=1..M} 2^(w+1)(b + e))) + O(n^2) = O(k(Σ_{bk=1..M} 2^(w+1)(b + e))),
where ND increases its value by at least one node.
SCN-1: One Target
n        ND^bef    ND^STG-1    ND^STG-2    ND^STG-3
100      92        =           =           =
1,100    1,073     =           =           =
2,100    2,036     =           =           =
3,100    3,000     =           =           =

SCN-1: n/2 Random Targets
n        ND^bef    ND^STG-1    ND^STG-2    ND^STG-3
100      95        =           =           =
1,100    1,072     1,074       =           1,073
2,100    2,029     2,034       2,030       2,030
3,100    3,022     3,026       =           3,030

SCN-2: One Target
n        ND^bef    ND^STG-1    ND^STG-2    ND^STG-3
100      94        =           =           =
1,100    1,066     =           =           =
2,100    2,010     =           =           =
3,100    3,000     =           =           =

SCN-2: n/2 Random Targets
n        ND^bef    ND^STG-1    ND^STG-2    ND^STG-3
the efficiency of the three strategies with regard to the changes caused to the
size of ND after perturbation. It can be deduced from the table that the
variation in the set of driver nodes does not become significant with respect to
the number of attacked nodes. In addition, it is important to note that 99%
of the observation rate (for U-1 and U-2 nodes) was completely lost in all
cases after perturbation. Despite this, we also observed that the networks were
able to regain 100% of the control after recovery without significant changes
in the majority of the cases, especially for STG-2, partly due to the use of the
network diameter.
5. Conclusions
Structural controllability offers a powerful abstraction for understanding the
properties of critical nodes in a control network, which is vital to restoring
control following node or link failures and, in particular, deliberate attacks.
This helps minimize the period during which a control system is held by an
adversary. Also, it helps minimize the period during which the system may
reach undesirable states – in the case of electrical power systems and networks,
this period can be in the order of seconds or less before severe effects occur.
The main contributions of this paper are the three repair strategies for con-
trollability in control graphs using the structural controllability abstraction,
and relying on the PDS formulation to gain a clearer understanding of the ef-
fects of topology constraints on the repair strategies. These include re-linking
without restrictions, re-linking with constrained network diameter and the use
of pre-computed instances of driver nodes. In this way, controllability in power-law networks can be restored more efficiently than by re-computing the controlling nodes when their links have been perturbed by attacks on availability.
The three strategies have been analyzed formally and subjected to a complex-
ity analysis. The results highlight that the use of a network diameter can be a
suitable option to establish control with low computational and storage costs.
Our future work will focus on extending the analysis to explore the possibility
of restoring control subgraphs instead of the entire network while retaining ac-
ceptable control graph parameters (primarily the number of nodes, maximum
out-degree and diameter), thereby improving the respective approaches and
their complexity. Another topic involves the renewed study of power-law net-
works and optimizing approximation mechanisms for controllability that give
satisfactory average-time complexity. Finally, our research will also investigate
new attack models, especially those involving interactions between attackers
and defenders.
Acknowledgements
This research was partially funded by the Marie Curie COFUND Programme
U-Mobility supported by the University of Malaga, by the EC FP7 Project
under GA No. 246550, and by the Ministerio de Economia y Competitividad
(COFUND2013-40259). This research was also partially funded by the EU
FP7 ARTEMIS Project under GA No. 269374 and by the Spanish Ministry of
Science and Innovation under the ARES Project (CSD2007-00004).
References
[1] C. Alcaraz, E. Miciolino and S. Wolthusen, Structural controllability of
networks for non-interactive adversarial vertex removal, Proceedings of the
Eighth International Conference on Critical Information Infrastructures
Security, pp. 129–132, 2013.
[2] C. Alcaraz, E. Miciolino and S. Wolthusen, Multi-round attacks on struc-
tural controllability properties for non-complete random graphs, Proceed-
ings of the Sixteenth Information Security Conference, 2014.
[3] D. Atkins, T. Haynes and M. Henning, Placing monitoring devices in elec-
tric power networks modeled by block graphs, Ars Combinatoria, vol. 79,
2006.
1. Introduction
Supervisory control and data acquisition (SCADA) systems are computer-
based process control systems that control and monitor remote physical pro-
cesses. SCADA systems are strategically important because they are widely
used in the critical infrastructure. Several incidents and cyber attacks affecting
SCADA systems have been documented; these clearly illustrate the vulner-
ability of critical infrastructure assets. The reported incidents demonstrate
that cyber attacks against SCADA systems can have severe financial impact
and can result in damage that is harmful to humans and the environment.
In 2000, a disgruntled engineer compromised a sewage control system in Ma-
roochy Shire, Australia, causing approximately 264,000 gallons of raw sewage
to leak into a nearby river [13]. In 2003, the Slammer worm caused a safety
monitoring system at the Davis-Besse nuclear plant in Oak Harbor, Ohio to
go offline for approximately five hours [11]. The insidious Stuxnet worm [3],
which was discovered in 2010, targeted nuclear centrifuge system controllers,
modifying system behavior by distorting monitored process information and
altering control actions.
Cyber security researchers have developed numerous intrusion detection sys-
tems to detect attacks against SCADA systems. Much of the research uses
training and validation data sets created by the same researchers who developed
the intrusion detection systems. Indeed, no standardized data set containing
normal SCADA network traffic and attack traffic is currently available to re-
searchers. In order to evaluate the performance of data mining and machine
learning algorithms for SCADA intrusion detection systems, a network data
set used for benchmarking intrusion detection system performance is sorely
needed. This paper describes four data sets, which include network traffic, pro-
cess control and process measurement features from a set of 28 attacks against
two laboratory-scale industrial control systems that use the MODBUS appli-
cation layer protocol. The data sets, which are freely available, enable effective
comparisons of intrusion detection solutions for SCADA systems.
2. Related Work
Several SCADA security researchers have developed intrusion detection sys-
tems that monitor network traffic and detect attacks against SCADA systems.
Table 1 lists example intrusion detection systems, the threat models they use
and the network protocols they analyze. Note that each intrusion detection
system uses a unique threat model. Some threat models are based on attacks
executed against SCADA laboratory testbeds while others are based on ma-
nipulated data sets drawn from other domains. The network protocols also
differ; MODBUS is the most common protocol (used in three systems) while
the IEEE C37.118 protocol is used in just one system. The remaining systems
use threat models with attacks implemented at different network layers.
A noticeable drawback of the research identified in Table 1 is that the threat
models only include subsets of attack classes. Not surprisingly, exploit cov-
erage is limited for each of the data sets. Only a few of the threat models
consider reconnaissance attacks while some models only include response injec-
tion attacks. Indeed, the malicious behavior captured in the data sets is neither
consistent nor comprehensive in terms of normal operations and attacks. For
this reason, it is difficult to judge the effectiveness of an intrusion detection
system against sophisticated attacks. This also leads to a situation in which
researchers cannot independently verify intrusion detection results and cannot
compare the performance of intrusion detection systems.
4. Description of Attacks
The data sets presented in this paper include network traffic, process con-
trol and process measurement features from normal operations and attacks
against the two SCADA systems. The attacks are grouped into four classes:
(i) reconnaissance; (ii) response injection; (iii) command injection; and (iv)
denial-of-service (DoS).
firmware revisions. The points scan allows the attacker to build a memory map
of MODBUS coils, discrete inputs, holding registers and input registers.
process measurement beyond its normal range. The low frequency measurement
injection attack decreases the rate of change of a process measurement below
its normal range. A replayed measurement injection attack resends process
measurements that were previously sent from the server to a client.
MODBUS server. Finally, the change ASCII input delimiter attack changes
the delimiter used for MODBUS ASCII devices.
control system based on payload content. Note that each instance contains
a label identifying it as normal MODBUS traffic or as attack traffic with the
designated attack class.
Four data sets were created as part of this research. Table 2 provides the
descriptions of the four data sets. Data Set I contains transactions from the gas
pipeline system. Data Set II contains transactions from the water storage tank
system. The two data sets were generated from network flow records captured
with a serial port data logger.
Two reduced size data sets were also created. Data Set III is a gas pipeline
system data set, which was created by randomly selecting 10% of the instances
in Data Set I. Likewise, Data Set IV is a water storage tank system data set,
which was created by randomly selecting 10% of the instances in Data Set II.
The two reduced data sets minimize memory requirements and processing time
when validating classification algorithms. They are intended for applications
for which quick feedback is desired.
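As an illustration, this 10% subsampling can be reproduced with a uniform random sample; the file names below are placeholders, not the names of the published data set files.

```python
import pandas as pd

# Hypothetical file names; the published data sets may use a different layout.
full = pd.read_csv("gas_pipeline_full.csv")        # Data Set I
reduced = full.sample(frac=0.10, random_state=42)  # 10% of the instances (Data Set III)
reduced.to_csv("gas_pipeline_reduced.csv", index=False)
```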
Two categories of features are present in the data sets: network traffic fea-
tures and payload content features. Network traffic features describe the com-
munications patterns in SCADA systems. Compared with traditional enterprise
networks, SCADA network topologies and services are relatively static. Note
that some attacks against SCADA systems may change network communica-
tions patterns. As such, network traffic features are used to describe normal
traffic patterns in order to detect malicious activity. Network traffic features
include the device address, function code, length of packet, packet error check-
ing information and time intervals between packets. Payload content features
describe the current state of the SCADA system; they are useful for detect-
ing attacks that cause devices (e.g., PLCs) to behave abnormally. Payload
content features include sensor measurements, supervisory control inputs and
distributed control states.
Attribute Description
command address Device ID in command packet
response address Device ID in response packet
command memory Memory start position in command packet
response memory Memory start position in response packet
command memory count Number of memory bytes for R/W command
response memory count Number of memory bytes for R/W response
command length Total length of command packet
response length Total length of response packet
time Time interval between two packets
crc rate CRC error rate
systems are configured so that all the slave devices (servers) see all the master
transactions. Each slave must check the device address to discern the intended
recipient before acting on a packet. Based on the system configuration, the
set of device addresses that a slave device should encounter is fixed; device
addresses not specified in the configuration are anomalous.
The command memory, response memory, command memory count and re-
sponse memory count include internal memory addresses and field sizes for read
and write commands. The memory of a MODBUS server is grouped into data
blocks called coils, discrete inputs, holding registers and input registers. Coils
and discrete inputs represent a single, read-only Boolean bit with authorized
values of 0x00 and 0xFF. Holding and input registers are 16-bit words; holding
registers are read/write capable while input registers are read only. Each data
block may have its own set of contiguous address space or the data blocks may
share a common memory address space based on vendor implementation. The
command memory and response memory features are coil or register read/write
start addresses taken from command and response packets, respectively. The
command and response memory count features are the numbers of objects to
be read and written, respectively.
The command and response packet length features provide the lengths of
the MODBUS query and response frames, respectively. The MODBUS protocol
data unit (PDU) is limited to 253 bytes with an additional three bytes for device
ID and CRC fields, resulting in a 256-byte packet. In the gas pipeline and water
storage tank systems, the master repeatedly performs a block write to a fixed
memory address followed by a block read from a fixed memory address. The
read and write commands have fixed lengths for each system, and the read and
write responses have fixed lengths for each system. Note, however, that many
of the described attacks have different packet lengths. As such, the packet
length feature provides a means to detect many attacks.
The time interval attribute is a measurement of the time between a MOD-
BUS query and its response. The MODBUS protocol is a request-response
protocol and the time interval varies only slightly during normal operations.
The malicious command injection, malicious response injection and DoS at-
tacks often result in significantly different time interval measurements due to
the nature of the attacks.
The last attribute is the command/response CRC error rate. This attribute
measures the rates of CRC errors identified in command and response packets.
Because SCADA network traffic patterns are relatively static, the normal com-
mand and response CRC error rates are expected to stay somewhat constant.
In a normal system, the error rates should be low; however, the rates are ex-
pected to increase when a system is subjected to a denial-of-service attack such
as the invalid CRC attack.
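To illustrate how these features could feed a simple detector, the sketch below flags transactions whose packet lengths, query-response interval or CRC error rate deviate from a fixed baseline; the attribute names follow the attribute table above (with underscores), and the baseline values are arbitrary assumptions rather than values from the data sets.

```python
# Minimal rule-based sanity check over the network traffic features described
# above; all baseline values are illustrative assumptions.
BASELINE = {
    "command_length": 12,   # expected fixed command length (bytes)
    "response_length": 14,  # expected fixed response length (bytes)
    "max_time": 0.5,        # expected upper bound on query-response interval (s)
    "max_crc_rate": 0.01,   # expected upper bound on CRC error rate
}

def is_suspicious(instance):
    """instance: dict with keys matching the attribute table (underscored)."""
    return (
        instance["command_length"] != BASELINE["command_length"]
        or instance["response_length"] != BASELINE["response_length"]
        or instance["time"] > BASELINE["max_time"]
        or instance["crc_rate"] > BASELINE["max_crc_rate"]
    )

sample = {"command_length": 12, "response_length": 14, "time": 0.07, "crc_rate": 0.0}
print(is_suspicious(sample))  # False for a nominal transaction
```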
order to mask the actual compressor/pump working state. Note that, in the
manual mode, the compressor/pump state is controlled by the manual com-
pressor/pump setting value. A malicious state command injection attack may
change the compressor/pump mode continually or intermittently.
Table 5 shows the eight attributes that are specific to the gas pipeline system. The first attribute identifies the setpoint for the nominal gas pressure. The second attribute identifies the operating mode of the system. In the automatic mode, the PLC logic attempts to maintain the gas pressure in the pipeline using a PID control scheme by selecting whether the compressor or the relief valve is activated. If the control scheme is zero, then the compressor is activated to
increase pressure; if the control scheme is one, then the relief valve is activated
using a solenoid to decrease the pressure. In the manual mode, the operator
controls the pressure by sending commands to start the compressor or open the
relief valve. Additionally, there are five attributes related to the PID controller.
The gain, reset, dead band, rate and cycle time impact PID controller behavior
and should be fixed during system operation. A malicious parameter command
injection attack tries to modify these parameters to interrupt normal control
operations.
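For illustration only, the automatic-mode actuator selection described above can be expressed as a small function; the function name and return values are assumptions, not identifiers taken from the data sets, and the PID details are omitted.

```python
def select_actuator(control_scheme):
    """Gas pipeline automatic mode: scheme 0 drives the compressor,
    scheme 1 opens the solenoid-driven relief valve (as described above)."""
    if control_scheme == 0:
        return "compressor_on"      # raise pipeline pressure
    elif control_scheme == 1:
        return "relief_valve_open"  # lower pipeline pressure
    raise ValueError("unknown control scheme")

print(select_actuator(0))  # compressor_on
```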
Table 6 shows the four attributes that are specific to the water storage tank
system: HH, H, L and LL. In the automatic mode, the PLC logic maintains the
water level between the L and H setpoints using an on/off controller scheme.
When the sensors detect that the water level has reached the L level, the PLC
logic turns the water pump on. Alternatively, when the sensors determine that
the water level has reached the H level, the PLC logic turns the water pump
off. Note that the water storage tank includes a manual drainage valve that
allows water to drain out of the tank when the valve is open. If the manual
drainage valve is open, the water level in the tank oscillates between the H and
L setpoints continuously as the pump cycles on and off to compensate. When
the manual drainage valve is closed, the pump stays on until the water level
reaches the H setpoint, at which point it turns off and maintains a constant
level. If the water level rises to the HH setpoint or falls to the LL setpoint due
to a system fault, then an alarm is triggered at the human machine interface
that monitors the water storage tank. In the manual mode, the pump state
is controlled manually by the human machine interface (i.e., an operator can
manually activate and deactivate the pump).
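A minimal sketch of the automatic-mode logic just described, including the HH/LL alarm condition, follows; the setpoint values are placeholders rather than values from the testbed.

```python
# Illustrative setpoints (placeholders, not values from the testbed).
LL, L, H, HH = 10.0, 20.0, 80.0, 90.0

def step(level, pump_on):
    """One scan of the automatic-mode PLC logic described above."""
    alarm = level >= HH or level <= LL   # HMI alarm condition
    if level <= L:
        pump_on = True                   # refill when the L setpoint is reached
    elif level >= H:
        pump_on = False                  # stop when the H setpoint is reached
    return pump_on, alarm

print(step(19.5, pump_on=False))  # (True, False): pump turns on, no alarm
```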
Table 7 lists the eight possible label values. Recall that each data set instance
is labeled as normal or according to its attack class. The labeling scheme was
chosen to match the KDD Cup 1999 Data Set [6], which identified attacks
by class. Note that specific attacks in each attack class have similar exploit
methods and similar impact on the SCADA system.
5.4 Discussion
The data sets described in this paper are relevant to other SCADA systems –
systems that use protocols other than MODBUS as well as systems other than
gas pipelines and water storage tanks. The features in the data sets are divided
into two groups in a similar manner as SCADA protocols divide packets into
network traffic related fields and content fields. Indeed, other protocols include
similar, albeit not identical, network traffic information such as addresses, func-
tion codes, payloads and checksums. Additionally, most SCADA protocols tend
to adhere to query-response traffic patterns similar to MODBUS. The content
features in the data sets include remote commands and system states similar to
how other types of systems monitor and update system settings. As such, the
data sets provide a framework to measure the accuracy of intrusion detection
approaches designed for a variety of SCADA systems.
6. Conclusions
Researchers have developed numerous intrusion detection approaches for de-
tecting attacks against SCADA systems. To date, researchers have generally
engaged unique threat models and the associated network traffic data sets to
train and validate their intrusion detection systems. This leads to a situation
where researchers cannot independently verify the results of other research ef-
forts, cannot compare the effectiveness of intrusion detection systems against
each other and ultimately cannot adequately judge the quality of intrusion
detection systems.
The four data sets developed in this research include network traffic, process
control and process measurement features from two laboratory-scale SCADA
systems. Data Set I contains transactions from a gas pipeline system while
Data Set II contains transactions from a water storage tank system. The data
sets were generated from network flow records captured with a serial port data
logger in a laboratory environment. A set of 28 attacks was used to create
the data sets; the attacks were grouped into four categories: reconnaissance,
response injection, command injection and denial-of-service attacks. Reduced
size data sets corresponding to Data Sets I and II were also created. Data Set
III is a gas pipeline system data set containing 10% of the instances in Data
Set I while Data Set IV is a water storage tank system data set containing
10% of the instances in Data Set II. The four data sets comprising normal and
attack traffic can be used by security researchers to compare different SCADA
intrusion detection approaches and implementations.
References
[6] S. Hettich and S. Bay, The UCI KDD Archive, Department of Information
and Computer Science, University of California at Irvine, Irvine, California
(kdd.ics.uci.edu), 1999.
[7] O. Linda, M. Manic and M. McQueen, Improving control system cyber-
state awareness using known secure sensor measurements, Proceedings of
the Seventh International Conference on Critical Information Infrastruc-
tures Security, pp. 46–58, 2012.
[8] O. Linda, T. Vollmer and M. Manic, Neural network based intrusion de-
tection system for critical infrastructures, Proceedings of the International
Joint Conference on Neural Networks, pp. 1827–1834, 2009.
[9] T. Morris, A. Srivastava, B. Reaves, W. Gao, K. Pavurapu and R. Reddi,
A control system testbed to validate critical infrastructure protection con-
cepts, International Journal of Critical Infrastructure Protection, vol. 4(2),
pp. 88–103, 2011.
[10] P. Oman and M. Phillips, Intrusion detection and event monitoring in
SCADA networks, in Critical Infrastructure Protection, E. Goetz and S.
Shenoi (Eds.), Springer, Boston, Massachusetts, pp. 161–173, 2008.
[11] K. Poulsen, Slammer worm crashed Ohio nuke plant network, Security-
Focus, Symantec, Mountain View, California (www.securityfocus.com/
news/6767), August 19, 2003.
[12] J. Rrushi and K. Kang, Detecting anomalies in process control networks,
in Critical Infrastructure Protection III, C. Palmer and S. Shenoi (Eds.),
Springer, Heidelberg, Germany, pp. 151–165, 2009.
[13] J. Slay and M. Miller, Lessons learned from the Maroochy water breach, in
Critical Infrastructure Protection, E. Goetz and S. Shenoi (Eds.), Springer,
Boston, Massachusetts, pp. 73–82, 2008.
[14] C. Ten, J. Hong and C. Liu, Anomaly detection for cybersecurity of sub-
stations, IEEE Transactions on Smart Grid, vol. 2(4), pp. 865–873, 2011.
[15] A. Valdes and S. Cheung, Communication pattern anomaly detection in
process control systems, Proceedings of the IEEE Conference on Technolo-
gies for Homeland Security, pp. 22–29, 2009.
[16] D. Yang, A. Usynin and J. Hines, Anomaly-based intrusion detection for
SCADA systems, presented at the IAEA Technical Meeting on Cyber Secu-
rity of Nuclear Power Plant Instrumentation and Control and Information
Systems, 2006.
[17] Y. Zhang, L. Wang, W. Sun, R. Green and M. Alam, Distributed intrusion
detection system in a multi-layer network architecture of smart grids, IEEE
Transactions on Smart Grid, vol. 2(4), pp. 796–808, 2011.
Chapter 6
Haihui Gao, Yong Peng, Zhonghua Dai, Ting Wang, Xuefeng Han,
and Hanjing Li
1. Introduction
Industrial control systems (ICSs) monitor and control processes in critical
infrastructure assets [12]. Due to their increased connectivity with corporate
networks and the Internet, industrial control systems are no longer immune to
cyber attacks. Indeed, in 2010, the Stuxnet worm demonstrated to the world
the seriousness of industrial control system vulnerabilities and the potential
threats [9].
In order to protect industrial control systems, it is important to conduct
cyber security research and testing to identify and mitigate existing vulnerabil-
ities [1, 7]. However, testing and evaluation of actual industrial control systems
are difficult to perform due to the uptime requirements and the risk of damage
to operational systems. Therefore, it is necessary to build suitable experimen-
tal platforms to develop and test cyber security solutions for industrial control
systems [9].
Figure 1. Industrial control system reference model: the Internet and the corporate IT network (Level 3) sit behind a firewall; the supervisory control LAN (Level 2) hosts the HMI, engineering workstation and historian; the control network (Level 1) contains the PLCs.
The emulation, physical devices and simulation for industrial control sys-
tems (EPS-ICS) testbed presented in this paper seeks to address this problem.
The testbed provides configurable fidelity using physical devices for core sys-
tem components, while emulating or simulating the other components. The
proposed solution is an inexpensive, albeit useful, approximation of an indus-
trial control system environment. Indeed, the EPS-ICS testbed strikes the right
balance between research requirements and construction costs.
2. Architecture
Figure 1 shows an example industrial control system reference model that
conforms to the ANSI/ISA-99 standard [5]. The architecture is segmented into
four levels: (i) corporate network; (ii) supervisory control local area network
(LAN); (iii) control network; and (iv) input/output (I/O) network.
In the ANSI/ISA-99 standard, the corporate network level (Level 3) is re-
sponsible for management and related activities (e.g., production scheduling,
operations management and financial transactions) [11]. This level is con-
sistent with traditional information technology, including the general deploy-
ment of services and systems such as FTP, websites, mail servers, enterprise
resource planning (ERP) systems and office automation systems. The super-
visory control LAN level (Level 2) includes the functions involved in monitor-
ing and controlling physical processes and the general deployment of systems
such as human-machine interfaces (HMIs), engineering workstations and his-
torians. The control network level (Level 1) includes the functions involved
in sensing and manipulating physical processes. Typical devices at this level
are programmable logic controllers (PLCs), distributed control systems, safety
instrumented systems and remote terminal units (RTUs). The I/O network
level (Level 0) includes the actual physical processes and sensors and actuators
that are directly connected to process equipment.
3. Testbed Construction
Industrial control system testbeds may be categorized as replicated, software-based or hybrid:
A replicated testbed is a copy of a real system with the same physical de-
vices and information systems. An example is the National SCADA Testbed
(NSTB) of the U.S. Department of Energy [8]. Although a replicated architec-
ture provides the highest fidelity, building an identical replica of a real-world
system is usually cost prohibitive.
A software testbed uses modeling methodologies instead of actual physical
devices; it typically includes a physical process simulator, network simulator
and attack simulator. Such a testbed is a low cost solution for research fo-
cused on attacks on industrial control systems and the development of security
strategies. However, due to the absence of real components and devices, the
architecture provides low fidelity.
A hybrid testbed incorporates replicated devices and systems as well as
software models. The architecture provides a high degree of fidelity and is
also cost effective. The EPS-ICS testbed described in this paper is based on a
hybrid architecture.
[Figures: EPS-ICS testbed architecture, with an emulated network testbed for Levels 3 and 2, physical devices (PLCs, RTUs and DCS controllers) at Level 1, and a simulated process (a Matlab/Simulink oil-fired boiler with fuel valve, blower, motors, oil inlet and water tank) connected through interfaces; the network layout includes an Internet zone for external security testing, a DMZ (web and DNS), a corporate zone and an internal security test zone.]
Interfaces connect the physical and simulated devices; special hardware between the devices enables data exchange. Peripheral component interconnect (PCI) modules support full communications between Matlab/Simulink models and external controllers.
Furnace Model. The mass balance, energy balance and furnace radiation
heat transfer equations are:
ṁf + ṁa − ṁg = Vf · d(ρg)/dt   (1)

ṁf Qf + ṁa ha − ṁg hg − Qrht = Vf · d(ρg hg)/dt   (2)
Boiler Drum Model. The upper section of the boiler drum contains steam
and the bottom section contains water. The liquid zone mass conservation, va-
por zone mass conservation, drum energy balance and drum liquid level equa-
tions are:
ṁw + (1 − x)ṁrc − ṁdcin − ṁpw − ṁec = d(ρw Vdw)/dt   (4)

ṁrc x − ṁs + ṁec = d(ρs Vds)/dt   (5)

Vdw = (1/3)·π·Lv²·(3r − Lv) + (1/2)·(L − 2r)·r²·(θ − sin θ)   (7)

θ = 2·cos⁻¹((r − Lv)/r)   (8)
where ṁw is the feed water flow from the economizer, x is the steam dryness,
ṁrc is the steam-water flow in the riser, ṁdcin is the downcomer inlet flow,
ṁpw is the blow-down flow, ṁec is the dynamic evaporation flow, ρw is the
saturated water density, Vdw is the drum liquid zone volume, ṁs is the steam
discharge capacity, ρs is the steam density, Vds is the drum vapor zone volume,
hw is the feed water enthalpy from the economizer, hs is the steam enthalpy,
Mdm is the drum metal mass, Cdm is the drum metal specific heat capacity, Td is the drum temperature, J is the unit conversion factor, Vd is the drum volume, Pd is the drum pressure, L is the drum length, r is the drum radius and Lv = f⁻¹(Vdw) is the drum liquid level.
Riser Model. The riser contains both liquid and vapor. The liquid zone
mass conservation, vapor zone mass conservation, energy balance, metal energy
balance, average ratio of vapor per cross-sectional area in the vapor zone, liquid
zone accounted for in the riser length ratio, and steam volume in the riser
equations are:
ṁdcout − (1 − x)ṁrc − ṁevp − ṁecl = d(ρw Vrc^w)/dt   (9)

ṁecl + ṁevp − x·ṁrc = d(ρs Vrc^s)/dt   (10)

Qrht − Qrc = Mmrc Cmrc · d(Tmrc)/dt   (12)

ϕrc = k / (1 + (ρs/ρw)·(1/xs − 1))   (13)

Vrc^s = Vrc (1 − γrc) ϕrc   (15)
where ṁdcout is the downcomer outlet water flow, ṁevp is the evaporation generated by heat absorption, ṁecl is the dynamic evaporation flow, Vrc^w is the water volume in the riser, Qrc is the medium heat in the riser, hdcout is the medium enthalpy at the downcomer outlet, Vrc^s is the steam volume in the riser, Vrc is the riser volume, Mmrc is the riser metal mass, Cmrc is the riser metal specific heat capacity, Tmrc is the riser metal temperature, ϕrc is the liquid zone steam section ratio in the ascending pipe and xs is the average steam dryness in the vapor zone.
Downcomer Model. The downcomer preheats the water supply and re-
turns cool water to the bottom of the drum. The energy balance equation
is:
ṁdcin hdcin − ṁdcout hdcout = d(ρw Vdc hdcout + Mmdc Cmdc Tmdc)/dt   (16)

where ṁdcin is the inlet water flow, ṁdcout is the outlet water flow, hdcin is the inlet water enthalpy, hdcout is the outlet water enthalpy, Vdc is the volume, Mmdc is the metal mass, Cmdc is the metal specific heat capacity and Tmdc is the metal temperature at the downcomer.
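To give a flavor of how such balance equations can be simulated outside Matlab/Simulink, the sketch below integrates a heavily simplified version of the drum liquid-zone mass balance (Equation (4)) with a constant water density and prescribed flows; all numerical values are arbitrary assumptions and the full model is considerably richer.

```python
import numpy as np
from scipy.integrate import solve_ivp

RHO_W = 900.0  # assumed constant saturated water density (kg/m^3)

def drum_liquid_volume(t, y, m_w, x, m_rc, m_dcin, m_pw, m_ec):
    """Simplified Equation (4): d(rho_w * V_dw)/dt = sum of liquid-zone flows."""
    net_flow = m_w + (1.0 - x) * m_rc - m_dcin - m_pw - m_ec  # kg/s
    return [net_flow / RHO_W]                                  # dV_dw/dt

# Arbitrary, illustrative flow values (kg/s) and initial volume (m^3).
sol = solve_ivp(drum_liquid_volume, (0.0, 60.0), [5.0],
                args=(40.0, 0.1, 50.0, 45.0, 1.0, 2.0), max_step=1.0)
print(sol.y[0][-1])  # liquid-zone volume after 60 s
```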
[Figure: structure of the oil-fired boiler simulation model, showing the furnace, boiler drum, riser, downcomer, economizer, superheater and air preheater blocks and the mass and energy flows exchanged between them.]
temperature, hsin is the inlet steam enthalpy, hsout is the outlet steam enthalpy
and h̄s is the inlet and outlet average steam enthalpy.
Figure 6 shows the oil-fired boiler simulation model implemented using Mat-
lab/Simulink. The boiler simulation model interacts with external controllers.
The resulting EPS-ICS testbed can be used for cyber security research and
testing.
The first step in the assessment is system discovery, which identifies infor-
mation assets, operating system types, service ports and running applications.
The second step involves a man-in-the-middle attack that tampers with the data
transmitted between host computers and end devices. The third step involves
a denial-of-service (SYN flood) attack that consumes end device resources. The
fourth step involves a replay attack that bypasses password protection, uploads
and modifies an end device program and disrupts system execution. Figure 8
shows the results of the assessment.
Figure 8. Assessment results: the boiler status display reports Motor 1, Motor 2 and the blower in the OFF state.
6. Conclusions
The EPS-ICS testbed presented in this paper is designed specifically for in-
dustrial control system security research and testing. It seamlessly integrates
emulation, physical device and simulation technologies to strike the right bal-
ance between fidelity and construction costs. The industrial boiler control
system case study demonstrates the application and utility of the EPS-ICS
testbed for industrial control system evaluation and certification. Future re-
search will focus on the continued refinement of the EPS-ICS testbed, which
will involve developing new monitoring and analysis techniques, expanding the
applicability of the testbed and constructing a complementary cyber-physical
testbed.
References
[1] M. Brandle and M. Naedele, Security for process control systems: An
overview, IEEE Security and Privacy, vol. 6(6), pp. 24–29, 2008.
[2] Flux Research Group, Emulab, Total Network Testbed, School of Com-
puting, University of Utah, Salt Lake City, Utah (www.emulab.net), 2014.
[3] M. Flynn and M. O'Malley, A drum boiler model for long term power
system dynamic simulation, IEEE Transactions on Power Systems, vol.
14(1), pp. 209–217, 1999.
[4] H. Gan, J. Zhang and H. Zeng, Development of main boiler simulation sys-
tem for LNG ship, International Journal of Advancements in Computing
Technology, vol. 4(17), pp. 466–475, 2012.
INFRASTRUCTURE SECURITY
Chapter 7
1. Introduction
The pervasive growth of network technology has led to the integration of
telecommunications technologies and physical processes to create cyber-physical
systems. Cardenas, et al. [3] define a cyber-physical system as integrating
computing, communications and storage capabilities with monitoring and/or
control of entities in the physical world, which is done in a dependable, safe, se-
cure and efficient manner under real-time constraints. A cyber-physical system
is characterized by the tight connection and coordination between cyber and physical components.
2. Evidence Theory
Evidence theory is a mathematical formalism for handling uncertainty by
combining evidence from different sources to converge to an accepted belief [9].
The basic concept is to reduce uncertainty in order to identify the set that
contains the correct answer to a question.
Dempster’s Rule. Dempster’s rule of combination [4], which was the first
to be formalized, is a purely conjunctive operation. This rule strongly empha-
sizes the agreement between multiple sources and ignores conflicting evidence
through a normalization factor.
Note that Dempster’s rule assigns a null mass to the empty set, which has
certain limitations when the conflict value is very high.
Smets' Rule. Smets' rule of combination [12] provides the ability to explicitly express contradictions in the transferable belief model by allowing m(∅) ≥ 0. Smets' rule, unlike Dempster's rule, avoids normalization while preserving commutativity and associativity. The rule is formalized as follows:

mi(γa) ⊗ mj(γa) = Σ_{γb ∩ γc = γa} mi(γb) mj(γc)   ∀γa ∈ Γ(Ω).   (5)
The inequality m(∅) > 0 can be explained in two ways. The first is the open world assumption, which relaxes the requirement in Dempster's framework [4] that the frame of discernment must contain the true value. Under the closed world assumption, the set of hypotheses necessarily contains all the possibilities; under the open world interpretation, where ∅ plays the role of the complement of Ω, the mass m(∅) > 0 represents the case where the truth is not contained in Ω.
The second interpretation of m(∅) > 0 is that there is some underlying
conflict between sources. Hence, the mass m(∅) represents the degree of conflict.
In particular, the mass m(∅) is computed as:
mi(∅) ⊗ mj(∅) = 1 − Σ_{γb ∩ γc ≠ ∅} (mi(γb) ⊗ mj(γc)).   (6)
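A small sketch of the unnormalized (Smets) conjunctive combination for mass functions over subsets of Ω is shown below; frozensets stand for focal elements, the empty frozenset stands for ∅, and the example masses are arbitrary.

```python
from itertools import product

def smets_combine(m1, m2):
    """Unnormalized conjunctive combination: mass may accumulate on the empty set."""
    combined = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        key = a & b  # set intersection of the focal elements
        combined[key] = combined.get(key, 0.0) + wa * wb
    return combined

C, P, N = "C", "P", "N"
m_cyber = {frozenset({C}): 0.7, frozenset({P, N}): 0.2, frozenset({C, P, N}): 0.1}
m_phys  = {frozenset({P}): 0.6, frozenset({C, N}): 0.3, frozenset({C, P, N}): 0.1}

result = smets_combine(m_cyber, m_phys)
print(result[frozenset()])  # m(empty set): the conflict between the two sources
```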
in the network and a physical layer sensor is used to indicate whether a piece
of equipment (e.g., circuit breaker) is working or not.
In order to apply evidence theory to determine the cause of a malfunction,
it is necessary to define the appropriate frame of discernment Ω. In the
example under consideration, there are three hypotheses: normal behavior (N ),
physical fault (P ) and cyber threat (C). The system has normal behavior when
the breaker is working and the network packets conform to the operational
timing and volume constraints. A physical fault exists when the sensors detect
a breaker fault. A cyber threat exists when there is excess or low packet
volume. As shown in Figure 2, in the classical evidence theory framework, the
hypotheses are mutually exclusive with empty intersections.
A plausible scenario is simulated using the specified architecture and pa-
rameters. The scenario involves an attacker who compromises the operation
of a piece of equipment (circuit breaker) via a telecommunications attack (dis-
tributed denial-of-service attack). A simulation, which has a duration of 100
seconds, is divided into four different situations:
Table 1 summarizes the simulation events, with a focus on the time and
information sources.
The goal is to fuse all the data provided by the sensors during a simula-
tion in order to detect a cyber-physical attack. As such, the relative frame of
discernment Ω according to the classical evidence theory is:
Ω = {C, P, N } . (8)
Each sensor has to distribute a unitary mass over specific focal sets during
a simulation. Using a combination rule, a fusion result can then be obtained.
Specifically, the focal sets for the cyber sensor are {C, N, P ∪ N, Ω}. Note
that a cyber security expert could identify a cyber anomaly, but is unlikely to
discern a physical anomaly. Similarly, the focal sets for the physical sensor are
{P, N, C ∪ N, Ω}.
Santini, et al. [8] have used the PCR-6 rule to develop metrics for identify-
ing the effects of cyber attacks that are designed to inflict physical damage. A
cyber-physical fault is detected in the presence of mutually exclusive hypothe-
ses by noticing the existence of non-zero similar masses in the cyber cause set
and the physical cause set. Such problems are primarily related to the BPA as-
signments for the sources, which are application dependent. Another problem
relates to the interpretation of conflict values that is done in an ad hoc man-
ner. The following exponential function (depending on the number of captured
packets) is used as the BPA assignment to set the mass of {C}:
e^(−(a·p)/x)   (10)
where a and p are positive tuning parameters and x is the number of packets.
Equation (10) is used in the same manner to express the mass of {P } after a
physical fault, where x is the persistence of the fault.
When two information sources that have high conflict exist in the cyber and
physical realms, the rough values obtained after fusion using the PCR-6 rule
are unsuitable. The solution proposed in [8] is to evaluate at each fusion step
the conflict value of the mass distribution over Ω using Smets' rule and compare
it with the sum of the two masses in {C} and {P }. The cyber-physical alarm
triggering equation is given by:
max{mPCR−6(γa)} ∀γa ∈ Ω,                               if mSmets({∅}) ≤ ρ
mPCR−6({C}) + mPCR−6({P}) ≥ mSmets({∅}),               if mSmets({∅}) ≥ ρ   (11)
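The BPA assignment of Equation (10) and the triggering test of Equation (11) can be sketched as follows; the parameter values a and p match the table captions above, the threshold ρ is an illustrative choice, and the PCR-6 and Smets fusion results are assumed to be available from a separate fusion step.

```python
import math

def bpa_cyber(x, a=5.0, p=2.0):
    """Equation (10): mass assigned to {C} as a function of the packet count x."""
    return math.exp(-(a * p) / x)

def cyber_physical_alarm(m_pcr6, m_smets_empty, rho=0.3):
    """Equation (11): below the conflict threshold, report the maximal PCR-6 element;
    above it, raise the alarm if m({C}) + m({P}) dominates the conflict mass."""
    if m_smets_empty <= rho:
        return max(m_pcr6, key=m_pcr6.get)
    return m_pcr6.get("C", 0.0) + m_pcr6.get("P", 0.0) >= m_smets_empty

print(bpa_cyber(x=50))  # mass of {C} for 50 captured packets
print(cyber_physical_alarm({"C": 0.4, "P": 0.35, "N": 0.25}, m_smets_empty=0.5))
```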
[Figure: BPA values of m(C), m(P) and m(N) over the 100-second simulation.]
The main problem is the intersection operator. In fact, after defining the
frame of discernment Ω, it is necessary to define a special power set called
the hyper power set DΩ . The cardinality of DΩ due to the intersection oper-
ator follows the Dedekind number sequence: 1, 2, 5, 19, 167, 7580, 7828353,
56130437228687557907787... [11, 13]. Note that only cases up to n < 7 are
tractable with current computing technology. This paper resolves the prob-
lem by using a hybrid knowledge model based on classical evidence theory and
Dezert-Smarandache theory, which is described in the following section.
Ω′ = {C′, P′, N, C ∩ P}   (12)

Γ(Ω′) = {∅, C′, P′, N, C ∩ P, C′ ∪ P′, C′ ∪ N, C′ ∪ (C ∩ P), P′ ∪ N, P′ ∪ (C ∩ P), N ∪ (C ∩ P), C′ ∪ P′ ∪ N, C′ ∪ P′ ∪ (C ∩ P), C′ ∪ N ∪ (C ∩ P), P′ ∪ N ∪ (C ∩ P), Ω′}   (13)
Table 2. BPA assignment for cyber sensor with the new frame (a = 5, p = 2).
Considering the results obtained in the case study above and the results
obtained using the approach presented in [8], we selected the function defined
in Equation (10) for the BPA assignment. The BPA values for the cyber sensor
and physical sensor are summarized in Tables 2 and 3, respectively. Note that
the only difference is related to the BPA assignment of the focal sets:
Table 3. BPA assignment for physical sensor with the new frame (a = 5, p = 2).
m(N) has the same value because its intersection with the new set is empty, i.e., {N} ∩ {C ∩ P} = ∅.

m(C) is divided into the sets {C′} and {C ∩ P} belonging to Ω′, as reported in Table 2.

m(P) is divided between m({P′}) and m({C ∩ P}) of Ω′, as reported in Table 3.

m({P ∪ N}) is now assigned to m({P′ ∪ N ∪ (C ∩ P)}) and m({C ∪ N}) to m({C′ ∪ N ∪ (C ∩ P)}), as reported in Tables 2 and 3 (a small redistribution sketch follows this list).
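A minimal sketch of this redistribution from the classical frame Ω to the hybrid frame Ω′ is given below; the 50/50 split of m(C) and m(P) between the primed singletons and {C ∩ P} is an assumption made for illustration, since the actual split values of Tables 2 and 3 are not reproduced here.

```python
def to_hybrid_frame(m, split=0.5):
    """Map masses over Omega = {C, P, N} (and coarser sets) to Omega' (assumed split)."""
    m2 = {}
    m2["N"] = m.get("N", 0.0)                    # unchanged: {N} ∩ {C∩P} = ∅
    m2["C'"] = m.get("C", 0.0) * (1.0 - split)   # part of m(C) stays on {C'}
    m2["P'"] = m.get("P", 0.0) * (1.0 - split)   # part of m(P) stays on {P'}
    m2["C∩P"] = split * (m.get("C", 0.0) + m.get("P", 0.0))
    m2["P'∪N∪(C∩P)"] = m.get("P∪N", 0.0)         # re-labelled coarser sets
    m2["C'∪N∪(C∩P)"] = m.get("C∪N", 0.0)
    m2["Ω'"] = m.get("Ω", 0.0)
    return m2

print(to_hybrid_frame({"C": 0.6, "P∪N": 0.3, "Ω": 0.1}))
```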
Figure 5. BPA trends in the power set Ω and hybrid power set Ω′: (a) conflict among sources and the sum m(C) + m(P) in Ω; (b) conflict among sources and the sum of m(P − C), m(C − P) and the intersection m(C ∩ P) in Ω′.
Using the new frame of discernment and the PCR-6 rule, an operator is able
to recognize, with the help of the fusion algorithm, a cyber-physical anomaly
represented by C ∩ P . With the hybrid frame of discernment, the results can
be analyzed using a classical metric (see Equation (11)). Note that throughout
the simulation there is one element of the power set with the highest value. As
such, an operator does not need any other metrics to trigger a particular event
(i.e., cyber-physical anomaly).
For the other elements of the power set Γ(Ω′), the sets represented in Figure 8 are the only ones with non-zero masses. The values m(C′ ∪ N ∪ (C ∩ P)) (triangle-marked line) and m(P′ ∪ N ∪ (C ∩ P)) (dotted line) are the same.
Figure 6. Results using Dempster's rule for the new frame of discernment Ω′ (BPA values of m(C − P), m(P − C), m(N) and m(C ∩ P) over time).
Figure 7. Results using the PCR-6 rule for the frame of discernment Ω′ (BPA values of m(C − P), m(P − C), m(N) and m(C ∩ P) over time).
Table 4. Computational times for the power sets Γ(Ω) and Γ(Ω′).

Table 4 shows the computational times of the fusion script for the two frames of discernment Ω and Ω′. The script, which was written in Matlab [6], was tested on a laptop with a 2.6 GHz quad-core Intel Core i7 processor and 8 GB RAM. The script was executed 100 times; Table 4 reports the means and the variances. The frame of discernment with fewer elements (i.e., Ω) requires less time on average than Ω′, but the time required has greater variance. Note that the performance would improve if a compiled rather than an interpreted programming language, such as Java or C++, were used. Nevertheless, the results are encouraging with regard to the application of evidence theory in real-time environments.

Figure 8. Results using the PCR-6 rule for the remaining meaningful elements of Ω′ (BPA values of m(N ∪ (C ∩ P)), m(C′ ∪ N ∪ (C ∩ P)), m(P′ ∪ N ∪ (C ∩ P)) and m(Ω′)).
6. Conclusions
The application of evidence theory to diagnose faults in a cyber-physical
system is an important topic in critical infrastructure protection. In certain
situations, such as when cyber and physical faults are both present, the clas-
sical Dempster-Shafer evidence theory is somewhat restrictive. Therefore, it is
necessary to redefine the frame of discernment to better represent the knowl-
edge model due to non-empty intersections between hypotheses. The Dezert-
Smarandache model explicitly considers the intersection, but it has a high com-
putational overhead due to the cardinality of the hyper power set. The solution,
as presented in this paper, is to use a hybrid knowledge model where the in-
tersection is included in the frame of discernment. The results obtained are
encouraging. The conflict value is lower and the situation is described by the
singleton set {C ∩ P } as having the highest value among the elements of the
hybrid power set during a cyber-physical anomaly.
Our research is currently focusing on generalizing evidence theory using dif-
ferent BPA values. An issue requiring further research is defining BPAs for
different cyber attacks that seek to inflict physical damage. Another problem
is to manage conflicts and understand the source of inconsistent results. Addi-
Acknowledgement
This research was partially supported by the 7th Framework Programme of
the European Union STREP Project under Grant Agreement 285647 (COCKPITCI – Cybersecurity of SCADA: Risk Prediction, Analysis and Reaction Tools for Critical Infrastructures, www.cockpitci.eu).
References
[1] O. Basir and X. Yuan, Engine fault diagnosis based on multi-sensor infor-
mation fusion using Dempster-Shafer evidence theory, Information Fusion,
vol. 8(4), pp. 379–386, 2007.
[2] M. Burmester, E. Magkos and V. Chrissikopoulos, Modeling security in
cyber-physical systems, International Journal of Critical Infrastructure
Protection, vol. 5(3-4), pp. 118–126, 2012.
[3] A. Cardenas, S. Amin and S. Sastry, Secure control: Towards survivable
cyber-physical systems, Proceedings of the Twenty-Eighth International
Conference on Distributed Computing Systems Workshops, pp. 495–500,
2008.
[4] A. Dempster, Upper and lower probabilities induced by a multivalued map-
ping, in Classic Works of the Dempster-Shafer Theory of Belief Functions,
R. Yager and L. Liu (Eds.), Springer, Berlin-Heidelberg, Germany, pp. 57–
72, 2008.
[5] C. Krishna and I. Koren, Adaptive fault-tolerance for cyber-physical sys-
tems, Proceedings of the International Conference on Computing, Network-
ing and Communications, pp. 310–314, 2013.
[6] MathWorks, MATLAB version 8.0.0, Natick, Massachusetts (www.math
works.com/products/matlab), 2014.
[7] R. Poovendran, Cyber-physical systems: Close encounters between two
parallel worlds, Proceedings of the IEEE, vol. 98(8), pp. 1363–1366, 2010.
[8] R. Santini, C. Foglietta and S. Panzieri, Evidence theory for smart grid di-
agnostics, Proceedings of the Fourth IEEE/PES Conference on Innovative
Smart Grid Technologies Europe, 2013.
[9] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press,
Princeton, New Jersey, 1976.
[10] C. Siaterlis and B. Genge, Theory of evidence-based automated decision
making in cyber-physical systems, Proceedings of the IEEE International
Conference on Smart Measurements for Future Grids, pp. 107–112, 2011.
[11] F. Smarandache and J. Dezert (Eds.), Advances and Applications of DSmT
for Information Fusion (Collected Works), American Research Press, Re-
hoboth, New Mexico, 2004.
[12] P. Smets and R. Kennes, The transferable belief model, Artificial Intelli-
gence, vol. 66(2), pp. 191–234, 1994.
[13] D. Wiedemann, A computation of the eighth Dedekind number, Order, vol.
8(1), pp. 5–6, 1991.
Chapter 8
Abstract Visa, border entry and security clearance interviews are critical home-
land security activities that provide access privileges to the geographical
United States or to classified information. The person conducting such
an interview may not be an expert in the subject area or could be de-
ceived by a manipulative interviewee, resulting in negative security con-
sequences. This paper demonstrates how an interactive voice response
system can be used to generate context-sensitive, yet randomized, di-
alogs that provide confidence in the trustworthiness of an interviewee
based on his/her ability to answer questions. The system uses contex-
tual reasoning and ontological inference to derive new facts dynamically.
Item response theory is employed to create relevant questions based on
social, environmental, relational and historical attributes related to in-
terviewees who seek access to controlled areas or sensitive information.
1. Introduction
Security mechanisms such as guarded gates, border control points and visa
issuance counters are implemented to allow access to individuals upon proper
authentication and authorization. Legitimacy is usually determined by rules,
regulations and/or policies applied by entry control personnel who attempt
to ensure that the entry requirements are enforced. Correctly identifying a
person may require an examination of an electronic passport, identity card and
paper documents in addition to asking the person questions about information
contained in the documents.
2. Background
This section describes the theory and technology underlying the interactive
voice response system.
P = c + (1 − c)/(1 + e^(−a(θ−b)))   (1)

where e is Euler's number, θ is the examinee's ability, a is the item discrimination parameter, b is the item difficulty parameter and c is the guessing parameter.
In order to determine the discrimination and difficulty parameters of a test
item, item response theory uses Bayesian estimation, maximum likelihood es-
timation (MLE) and other similar methods [7, 8]. To estimate the examinee’s
ability, item response theory utilizes an iterative maximum likelihood estima-
tion process involving an a priori value of the ability, item parameters and
response vector:
θ̂s+1 = θ̂s + (Σ_{i=1}^{N} ai [ui − Pi(θ̂s)]) / (Σ_{i=1}^{N} ai² Pi(θ̂s) Qi(θ̂s))   (2)
where θ̂s is the estimated ability in iteration s; ai is the discrimination pa-
rameter of item i (i = 1, 2, · · · , N ); ui is the response of the examinee (one
for correct or zero for incorrect); Pi (θ̂s ) is the probability of correct response
according to Equation (1); and Qi (θ̂s ) is the probability of incorrect response.
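A compact sketch of Equations (1) and (2) is shown below; the default guessing value c = 0 corresponds to the two-parameter model used later in the paper, and the starting ability and iteration count are illustrative choices.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """Equation (1): three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, a, b, theta=0.0, iters=20):
    """Equation (2): iterative maximum likelihood update of the ability estimate."""
    for _ in range(iters):
        p = [p_correct(theta, ai, bi) for ai, bi in zip(a, b)]
        num = sum(ai * (ui - pi) for ai, ui, pi in zip(a, responses, p))
        den = sum(ai**2 * pi * (1.0 - pi) for ai, pi in zip(a, p))
        if den == 0.0:
            break
        theta += num / den
    return theta

# Three items with unit discrimination and difficulties 0.5, 1.0 and 2.0.
print(estimate_ability([1, 1, 0], a=[1.0, 1.0, 1.0], b=[0.5, 1.0, 2.0]))
```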
3. System Architecture
Figure 2 shows the overall architecture of the ontology-based interactive
voice response system. Axiomatic and derived facts from the ontology are used
to create questions asked by the system. Given that a large number of facts
can be derived from a context-sensitive ontology, but only a few questions can
be asked during an interview, item response theory is used to select the facts
that are used to generate questions.
The item response theory module transforms a question into VoiceXML and
plays it to the user. The system then waits for the user’s response and the
system’s voice recognition software attempts to recognize the input and check
the correctness of the answer. Based on the answer, the item response theory
estimation procedure either increases or decreases the a priori ability score.
The system uses item response theory to manage and control dialog ques-
tions generated from a large pool of ontologically-derived facts in a manner
that shortens the length of a dialog while maintaining the maximum accuracy
in estimating the user’s trustworthiness. When item response theory is not
employed, the dialogs tend to be very long or are randomly generated with the
possibility of repeated questions.
Another key characteristic of the system is its use of the OWL annotation
property to assign item response theory parameters to axioms. Annotations
were selected in order to keep the semantics of the original ontology and struc-
ture intact. Every asserted axiom in the ontology is annotated with three item
response theory parameters, namely discrimination (a), difficulty (b) and guess-
ing (c). Currently, it is assumed that all the asserted axioms have the same
default degree of difficulty and a discrimination value of one.
The most important characteristic of the system is that weights are assigned
to questions and their answers according to the lengths of the inference or ex-
planation paths. The lengths of the paths are then translated to item response
theory difficulty values. Table 1 shows the difficulty value assignment scheme
used by the system.
Higher values or weights are assigned according to the number of explanation
axioms used to infer a fact. Consequently, such questions are considered to be
more difficult than those generated from asserted facts. The item response
theory based solution algorithm uses the two-parameter model that relies on
the difficulty and discrimination parameters. Figure 3 shows the algorithm
used for ability estimation.
After every interactive iteration involving question generation and answer-
ing, the item response theory algorithm estimates the ability of the user before
selecting and asking the next question. When the ability estimation reaches a
predefined threshold, the system concludes the dialog and conveys the decision.
Consequently, the decision is based on the item response theory characteristics
of the axioms, not on the percentage of correctly-answered questions as in tra-
ditional testing.
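A high-level sketch of this adaptive dialog loop follows; it reuses estimate_ability from the earlier sketch, and the item bank structure, trust threshold and difficulty-from-path-length mapping are illustrative assumptions rather than the system's actual parameters.

```python
def difficulty_from_path(path_length):
    """Assumed mapping: longer inference/explanation paths yield harder items."""
    return 0.5 * path_length

def run_dialog(items, ask, threshold=1.5, max_questions=10):
    """items: list of dicts with 'a' (discrimination), 'path' (explanation length)
    and 'question'; ask(question) returns 1 (correct) or 0 (incorrect)."""
    theta, responses, a, b = 0.0, [], [], []
    for item in items[:max_questions]:
        u = ask(item["question"])
        responses.append(u)
        a.append(item["a"])
        b.append(difficulty_from_path(item["path"]))
        theta = estimate_ability(responses, a, b)  # from the previous sketch
        if theta >= threshold:
            return "trusted", theta                # conclude the dialog early
    return "not trusted", theta
```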
ther filters the triple (subject, property, object) that was previously asserted
or inferred by the reasoner. The next axiom/question is determined based on
a component of the triple.
The axiom parameters of the current question are examined while the voice
rendering loop is executing. Subsequently, a query is passed to the reasoner that
returns axioms based on the current context. The next question is generated
from the newly executed query result and the item response theory parameters
that satisfy the item selection criteria for ability estimation and identification.
In the case of historical contextual reasoning, the current context is expanded
by saving the user’s session questions and answers in an ontology. A reasoner
is executed over the closure of the session with the axioms in the item bank
ontology. As a result, questions can be asked in subsequent sessions that are
related to the questions posed in previous sessions. Selection strategies for
multiple sessions include:
Asking a question related to a question that a user answered incorrectly
in a previous session.
Asking a question requiring deeper knowledge than a correctly-answered
question in a previous session.
Asking a question about personal relationships related to a previous ses-
sion (e.g., about co-workers, family members or friends).
This capability provides benefits when evaluating an individual multiple
times or a group of related people. For example, related attributes are very
likely to be encountered during immigration and security clearance interviews.
They also provide the ability to detect abnormal changes in user behavior and
personality.
with Twitter. Information extracted from the informal and social knowledge
web service is used to populate the context ontology. Figure 5 shows a code
snippet and its results, which include the Twitter followers of a specific user
and the status of the user. The method twitter.getFollowersIDs returns
the IDs and the method twitter.updateStatus returns the status and loca-
tion. Axioms such as [User 0001 isFollowing 1162907539], [81382232 isLocate-
dIn “Fairfax, VA”] and [User 0004 isSameIndividualAs 383575552] are added
to the context ontology.
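As an illustration only (the paper's actual implementation uses Twitter4J and OWL), follower facts retrieved from the social network could be added to a context ontology as triples, for instance with rdflib; the namespace and property names below are assumptions, while the example identifiers come from the axioms quoted above.

```python
from rdflib import Graph, Namespace, URIRef, Literal

CTX = Namespace("http://example.org/context#")  # illustrative namespace
g = Graph()

def add_follower_axioms(user_id, follower_ids, location=None):
    user = URIRef(CTX[f"User_{user_id}"])
    for fid in follower_ids:
        g.add((user, CTX.isFollowing, URIRef(CTX[f"User_{fid}"])))
    if location:
        g.add((user, CTX.isLocatedIn, Literal(location)))

add_follower_axioms("0001", [1162907539, 81382232], location="Fairfax, VA")
print(len(g))  # 3 triples added to the context ontology
```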
Leveraging the informal data clearly enhances the questions asked by the
system. Indeed, questions can be asked about facts collected from formal in-
terviews, official forms, previous sessions as well as social network attributes.
5. Implementation
This section describes the implementation and the performance characteris-
tics of contextual queries.
Every student has a random number of friends (one to twenty) who are
also students.
Table 2 presents the sample statistics, including the numbers of axioms and
their sizes.
To test the performance of contextual reasoning queries, three contextual
SPARQL queries were created and executed on the data set. Additionally, two
baseline queries were developed for comparison.
Query18 returns all the axioms to provide a baseline for comparison with
Query15, Query16 and Query17.
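The SPARQL text of Query15 through Query18 is not reproduced in this chapter; purely as an illustration, a baseline query in the spirit of Query18, which simply returns every asserted triple, can be issued as follows (rdflib and the file name are used here only for the example).

```python
from rdflib import Graph

g = Graph()
g.parse("lubm_sample.owl")  # hypothetical file holding the generated data set

# Baseline in the spirit of Query18: return all axioms (triples).
all_axioms = g.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
print(len(all_axioms))
```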
Table 3 shows the SPARQL query execution results after each successful
execution. Note that Query18 100 is not included in the table because the
6. Conclusions
The implementation of an interview system using ontologies, item response
theory and contextual reasoning is certainly feasible. The use of ontologies
and reasoning is critical to developing dialogs with items that are differentiated
quantitatively. Item response theory provides a means to quantitatively charac-
terize dialog items and to measure user trustworthiness and ability. Contextual
reasoning provides the means to select the most appropriate questions quanti-
tatively as well as questions that are semantically-relevant to the domain and
subject of focus. The enhancement of contextual reasoning with social media
information effectively supplements policy and ontology formal attributes with
static and dynamic social attributes.
The paper also demonstrates that social media information is very useful
for driving dialogs in interviews. However, the information was used without
any analysis. Sentiment analysis is a growing area of research that attempts
to predict trends in the inclinations or feelings of groups of people towards
life issues. By leveraging social media, it is possible to deduce an individual’s
sentiments about a variety of issues, especially those related to national security.
Our future research will focus on implementing a sentiment analysis module
that builds on the social attributes module. The use of social media raises some
legal concerns. At this time, we assume that individuals provide consent to use
public social media information; however, a comprehensive evaluation of the
legal ramifications is required before social media can be used in real interviews
of individuals who seek access to controlled areas or sensitive information.
Chapter 9
A SURVEY OF CRITICAL INFRASTRUCTURE SECURITY
Abstract Traditionally, securing against environmental threats was the main focus
of critical infrastructure protection. However, the emergence of cyber
attacks has changed the focus – infrastructures are facing a different
danger that has life-threatening consequences and the risk of significant
economic losses. Clearly, conventional security techniques are struggling
to keep up with the volume of innovative and emerging attacks. Fresh
and adaptive infrastructure security solutions are required. This paper
discusses critical infrastructures and the digital threats they face, and
provides insights into current and future infrastructure security strate-
gies.
1. Introduction
The critical infrastructures work together to provide a continuous flow of
goods and services, which range from food and water distribution, power supply,
military defense and transport, to healthcare and government services, to name
but a few [32]. A failure in one infrastructure can directly impact multiple other
infrastructures. Beyond the traditional critical infrastructures, non-traditional
infrastructures have emerged; these include telephone systems, banking, elec-
tric power distribution and automated agriculture. A well-established critical
infrastructure network is considered to be the hallmark of an advanced society,
and nations are usually judged by the quality of their critical infrastructure
networks and the services they provide to citizenry [12]. However, critical
infrastructures also represent one of the greatest weaknesses of modern society
because a disruption of a critical infrastructure can have life-threatening and
broadly debilitating consequences for the population, economy and government [40].
As the dependence of society on critical infrastructures
increases, it is vital that the infrastructures are protected and the potential for
disasters is reduced to the greatest extent possible.
Historically, the main focus was on developing infrastructures that would be
resilient to environmental conditions [36] and natural disasters. The shutdown
of the Torness nuclear power station in Scotland by a large bloom of jelly-
fish that blocked the water intake system demonstrates the unpredictability of
nature and the importance of planning for damaging natural phenomena.
As technology advanced [7], critical infrastructures increasingly came to rely
on digital control systems and networking; this has expanded the focus of crit-
ical infrastructure protection to include cyber threats as well as environmental
incidents and accidents [3]. Critical infrastructure assets are tempting targets
for hackers, criminal organizations, terrorist groups and nation states. Remote
attacks on critical infrastructures are a new approach for conducting warfare,
with the potential to bring about at least as much damage as traditional phys-
ical attacks. Cyber attacks make it possible to incapacitate a country and
cause harm to its population. Indeed, because of the interconnectivity and in-
terdependence of critical infrastructures across national borders, there is a high
risk that a failure in one infrastructure can propagate to other infrastructures,
resulting in cascading failures [21] that could affect practically all aspects of
society in multiple countries [26].
This paper presents a survey of computer security techniques currently used
to protect critical infrastructures. It also discusses why effective protection
methods are essential for modern critical infrastructures.
2. Motivation
The threat levels that currently face critical infrastructures are higher than
ever before. Not only do critical infrastructures have to cope with accidents and
changing environmental conditions, but the scope, magnitude and sophistica-
tion of cyber attacks are placing great strain on defensive mechanisms. Critical
infrastructure protection strategies must continually evolve to keep up with
new and emerging threats.
computer systems and networks. During the last few months of 2011, several
malicious email attacks were directed at British Government officials. The
email messages, which contained viruses, were doctored to look like they had
been sent by government colleagues or White House officials.
Phishing attacks are engineered to steal information that is used for identity
theft and financial profit. These attacks have many forms, but one of the
most common is to direct a user to a fake website that closely resembles a
legitimate website. The counterfeit website is often used to collect user names
and passwords as well as banking and credit card information [39].
A common but more complex attack involves distributed denial of ser-
vice [33], in which computer systems are sent large volumes of traffic that
consume their resources and cause them to crash. Distributed denial-of-service
attacks are effective because legitimate resource requests and bad requests are
often practically indistinguishable, making the attacks difficult to block [1]. An-
other sophisticated technique is a man-in-the-middle attack [34] that interposes
malicious code between system components in order to insert fabricated com-
mands and/or responses. A man-in-the-middle attack can have effects ranging
from information theft to system disruption; such an attack can be mitigated
by employing an authentication protocol to ensure that communications reach
their intended recipients [11].
MI5, the British security service, has announced its intention to invest mil-
lions of pounds in cyber defense activities to combat system vulnerabilities and
counter cyber threats; other government organizations are also focusing on de-
fensive measures [10]. Meanwhile, several other countries have reported steep
increases in attacks. China reported that millions of cyber attacks a day were
targeted at Beijing Olympic Games venues in 2008 [24]. While an Olympic
Games is not an infrastructure, it is an iconic gathering of people from around
the world and would be one of the highest profile targets imaginable.
and treatment systems, oil and gas pipelines, and, of course, the electric power
grid, which is certainly the most important critical infrastructure to modern
society.
3. Critical Infrastructures
The complexity of critical infrastructures, along with the tight demands for
service delivery, operational efficiency and reliability, has led to the widespread
use of control systems in critical infrastructures. However, control systems require
extensive networking resources, which introduce numerous vulnerabilities.
Insiders: Insider attacks are among the most serious threats to crit-
ical infrastructure assets. Insiders, who may be motivated by revenge
or greed, are knowledgeable about infrastructure assets and their weak-
nesses, and often have high-level access privileges or know how to bypass
security controls [17].
are designed for individuals such as system administrators, managers and key
executives who would require access to infrastructure assets and information
systems of increasing sensitivity.
A defense-in-depth implementation positions intrusion detection systems in
the different layers to detect hostile activities and raise alerts [41]. The intru-
sion detection systems typically perform anomaly detection and/or signature-
based detection. Anomaly detection involves the detection of abnormal system
and/or network behavior (e.g., a sudden, unexpected increase in data flow in a
certain part of a system). Signature-based detection involves the use of known
attack signatures; on its own, this technique is ineffective at detecting new (i.e.,
zero-day) attacks [20]. For this reason, critical infrastructure assets typically
incorporate multiple intrusion detection systems based on different detection
modalities to maximize protection.
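As a toy illustration of the two detection modalities (and not of any particular
product), the following sketch raises an alert either when observed traffic deviates
strongly from a learned baseline or when a payload matches a known signature; the
threshold factor and the signature list are invented for the example.

import java.util.List;

public class ToyIntrusionDetector {
    private final double baselineBytesPerSecond; // learned from normal operation
    private final double anomalyFactor;          // e.g., three times the baseline
    private final List<String> knownSignatures;  // patterns of known attacks

    public ToyIntrusionDetector(double baseline, double factor, List<String> signatures) {
        this.baselineBytesPerSecond = baseline;
        this.anomalyFactor = factor;
        this.knownSignatures = signatures;
    }

    // Anomaly detection: abnormal behavior relative to the learned baseline.
    public boolean isAnomalous(double observedBytesPerSecond) {
        return observedBytesPerSecond > anomalyFactor * baselineBytesPerSecond;
    }

    // Signature-based detection: the payload matches a known attack pattern.
    // On its own this misses zero-day attacks, which is why both modalities are combined.
    public boolean matchesSignature(String payload) {
        return knownSignatures.stream().anyMatch(payload::contains);
    }

    public boolean raiseAlert(double observedBytesPerSecond, String payload) {
        return isAnomalous(observedBytesPerSecond) || matchesSignature(payload);
    }
}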
One of the problems with using intrusion detection systems in critical infras-
tructures is that their relatively large footprint makes it difficult to implement
them on field devices that have limited computing resources. Additionally,
the systems are often unable to identify the most serious attacks and they
tend to impact system operation (especially, the tight timing requirements of
SCADA systems) [8, 38]. Moreover, intrusion detection systems may generate
large numbers of false positive errors, resulting in false alerts. Given the scale
of critical infrastructures, massive numbers of alerts could be generated [25],
potentially misleading operators and masking real attacks [6].
Unified threat management (UTM) systems, which first appeared in 2004,
are now widely used to secure large-scale information technology systems [41].
UTM systems use a combination of firewalls, pattern recognition systems, in-
trusion detection systems and embedded analysis middleware to implement
strong protection within the hardware, software and network layers. The util-
ity of UTM systems for critical infrastructure protection derives from their
provision of multiple security features within a unified architecture [41].
The benefits of using UTM systems include lower costs because of the re-
duced number of security appliances. The systems are also easy to deploy,
which makes them ideal for organizations with limited technical capabilities.
However, one of the main problems with UTM systems is their integration of
multiple security technologies (e.g., control interfaces, message formats, com-
munication protocols and security policies), which can complicate administra-
tive and management activities; the result is that applications tend to work
independently of each other.
Wen [35] has proposed the use of intrusion detection systems involving a
combination of technologies to detect intrusions that originate from internal
and external sources. The approach, which uses pattern matching and log file
analysis to scan internal network activity and incoming network packets for
anomalies, helps combat the insider threat as well as external attacks.
Nai Fovino, et al. [22] have developed an innovative approach to detect com-
plex attacks on SCADA systems. Their approach combines signature-based
intrusion detection with state analysis. The system can be enhanced by incor-
porating ad hoc rules to detect sophisticated attacks on SCADA systems.
In addition to focusing on network intrusions, it is important to address
attacks that have successfully breached network security. This is accomplished
using host-based monitoring and anomaly detection. The approach requires
the careful analysis of normal operating conditions to establish baselines and
thresholds for identifying anomalous activities. The baselines and thresholds
should be adjusted continually to reduce false positive errors.
Wang, et al. [33] have proposed an augmented attack tree model to combat
distributed denial-of-service attacks. Their approach creates attack trees to
model attacks and guide the development of attack detection and mitigation
strategies. While the approach is innovative, specifying attack trees for the
multitude of possible attacks is an arduous task. Moreover, the attack models
have to be tuned to the specific infrastructure asset being protected.
Schweitzer, et al. [27] discuss how one would know if an attack is actually
taking place. They posit that an attack would initially involve probes for col-
lecting information about the targeted infrastructure to be used in conducting
the attack. Once the main attack is underway, it is necessary to focus on
the intruders’ movements within the infrastructure. Schweitzer and colleagues
emphasize the need to use multiple, independent communications channels, so
that if one channel is compromised, an alternative channel exists to signal an
alarm. SCADA systems used in critical infrastructures typically incorporate
redundant communications channels to ensure reliable operations; this feature
can be leveraged to signal attacks as well as to mitigate their effects.
5. Conclusions
Critical infrastructures are becoming more and more indispensable as populations
grow and demands for new and expanded services increase.
Clearly, modern society cannot function if major components of the critical
infrastructure are damaged or destroyed. Despite governmental policy and reg-
ulation and massive injections of funding and resources, the vast majority of
critical infrastructure assets may not be able to cope with sophisticated and
evolving cyber threats. Critical infrastructures are large, complex and expen-
sive assets. Since it is not possible to rebuild these assets from scratch to ensure
“baked in” security, the only option is to focus on integrating conventional and
innovative security mechanisms in comprehensive defense-in-depth approaches
founded on risk management and resilience to ensure that successful attacks do
not result in catastrophes.
References
[1] A. Al Islam and T. Sabrina, Detection of various denial-of-service and dis-
tributed denial-of-service attacks using RNN ensemble, Proceedings of the
Twelfth International Conference on Computers and Information Technol-
ogy, pp. 603–608, 2009.
[2] R. Anderson and S. Fuloria, Who controls the off switch? Proceedings of
the First IEEE International Conference on Smart Grid Communications,
pp. 96–101, 2010.
[3] M. Brownfield, Y. Gupta and N. Davis, Wireless sensor network denial-
of-sleep attack, Proceedings of the Sixth Annual IEEE SMC Information
Assurance Workshop, pp. 356–364, 2005.
[4] L. Buttyan, D. Gessner, A. Hessler and P. Langendoerfer, Application of
wireless sensor networks in critical infrastructure protection: Challenges
and design options, IEEE Wireless Communications, vol. 17(5), pp. 44–
49, 2010.
[31] J. Walker, B. Williams and G. Skelton, Cyber security for emergency man-
agement, Proceedings of the IEEE International Conference on Technolo-
gies for Homeland Security, pp. 476–480, 2010.
[32] C. Wang, L. Fang and Y. Dai, A simulation environment for SCADA secu-
rity analysis and assessment, Proceedings of the International Conference
on Measuring Technology and Mechatronics Automation, vol. 1, pp. 342–
347, 2010.
[33] J. Wang, R. Phan, J. Whitley and D. Parish, Augmented attack tree
modeling of distributed denial of services and tree based attack detec-
tion method, Proceedings of the Tenth IEEE International Conference on
Computer and Information Technology, pp. 1009–1014, 2010.
[34] Y. Wang, H. Wang, Z. Li and J. Huang, Man-in-the-middle attack on BB84
protocol and its defense, Proceedings of the Second IEEE International
Conference on Computer Science and Information Technology, pp. 438–
439, 2009.
[35] W. Wen, An improved intrusion detection system, Proceedings of the In-
ternational Conference on Computer Applications and System Modeling,
vol. 5, pp. 212–215, 2010.
[36] T. Wilson, C. Stewart, V. Sword-Daniels, G. Leonard, D. Johnston, J.
Cole, J. Wardman, G. Wilson and S. Barnard, Volcanic ash impacts on
critical infrastructure, Physics and Chemistry of the Earth, Parts A/B/C,
vol. 45-46, pp. 5–23, 2011.
[37] S. Wolthusen, GIS-based command and control infrastructure for criti-
cal infrastructure protection, Proceedings of the First IEEE International
Workshop on Critical Infrastructure Protection, pp. 40–50, 2005.
[38] H. Xue, MultiCore systems architecture design and implementation of
UTM, Proceedings of the International Symposium on Information Sci-
ence and Engineering, pp. 441–445, 2008.
[39] W. Yu, S. Nargundkar and N. Tiruthani, A phishing vulnerability analysis
of web-based systems, Proceedings of the IEEE Symposium on Computers
and Communications, pp. 326–331, 2008.
[40] F. Yusufovna, F. Alisherovich, M. Choi, E. Cho, F. Abdurashidovich and
T. Kim, Research on critical infrastructures and critical information in-
frastructures, Proceedings of the Symposium on Bio-Inspired Learning and
Intelligent Systems for Security, pp. 97–101, 2009.
[41] Y. Zhang, F. Deng, Z. Chen, Y. Xue and C. Lin, UTM-CM: A practical
control mechanism solution for UTM systems, Proceedings of the IEEE
International Conference on Communications and Mobile Computing, pp.
86–90, 2010.
III
INFRASTRUCTURE MODELING AND SIMULATION
Chapter 10
1. Introduction
Critical infrastructures are the backbone of modern society, enabling the
vital functionalities that support economic and social interactions. The Euro-
pean Commission’s 2008/114/EC Directive [5] defines critical infrastructure as
“an asset, system or part thereof located in Member States which is essential
for the maintenance of vital societal functions, health, safety, security, eco-
nomic or social well-being of people, and the disruption or destruction of which
would have a significant impact in a Member State as a result of the failure
to maintain those functions.” It is important to note that system failures in
a specific critical infrastructure sector can, due to their strategic role in the
socio-economic context, produce domino effects that can potentially impact all
aspects of society. Understanding these effects and strategic interconnections is
essential when responding to events, setting policies and determining protective
investments.
Thurlby and Warren [12] state that, in order to rank preventative mea-
sures, the economic costs and potential savings (i.e., reduced casualties and/or
economic losses) must be evaluated. Thus, there is a growing need to under-
stand the costs for society as a whole – beyond those of the initially-impacted
infrastructures – to fully comprehend the magnitude of an event and make
appropriate response decisions.
A number of powerful simulation tools have been developed to help under-
stand how networks may be affected by major incidents, many of which help or-
ganizations to improve their response readiness. Nevertheless, the relationship
between long-term strategic choices and the ability of infrastructure networks
to withstand disruptive events is not well understood. Indeed, decision making
concerning investments in critical infrastructure assets, particularly those
related to network control systems and the people who manage the systems,
has not been thoroughly investigated to determine the long-term implications.
While it is clear that spending less on assets, systems and people will degrade
a system, it is not obvious how much impact any particular choice has over an
extended period of time. The primary issues that need to be addressed are:
How long-term choices related to strategic issues make a network more
resilient.
How these choices and others can minimize service loss when disruptive
events occur.
How strategic and operational choices can minimize the time taken for
a network to recover and, thus, minimize the total cumulative loss of
services.
The Critical Infrastructure Simulation of Advanced Models for Intercon-
nected Network Resilience (CRISADMIN) Project studies the effects produced
by critical events in an environment in which the interdependencies among
several critical infrastructure sectors are modeled using a system dynamics ap-
proach and simulated in a synthetic environment. This paper discusses the
key features of the methodology. The intention is to provide insights into the
activities and expected outputs of the project, providing researchers and pro-
fessionals with a methodology for crisis management.
2. CRISADMIN Approach
The CRISADMIN Project is focused on developing a tool for evaluating
the impacts of critical events on critical infrastructures. The tool is intended
to serve as a decision support system that is able to test and analyze critical
infrastructure interdependencies, determine the modalities through which they
are affected by predictable and unpredictable events (e.g., terrorist attacks and
natural disasters), and investigate the impacts of possible countermeasures and
prevention policies.
To achieve these challenging objectives, a three-step approach has been
formulated:
Theoretical Model Definition: The first step is to define the system
characteristics in order to establish the investigative boundaries and key
reference points. This objective is achieved through the formulation of a
theoretical model that identifies variables and parameters that best repre-
sent (or approximate) the infrastructures of interest. Special attention is
focused on the identification of social system variables (i.e., “soft” param-
eters that are particularly difficult to quantify). Through careful analysis
of the literature, these variables are represented in a manner compatible
with system dynamics.
System Dynamics Model Development: Causal relations between
the parameters defined in the theoretical model are identified; this facil-
itates the construction of a number of causal maps. The causal maps
provide the foundation for the simulation model structure that is vali-
dated using real case studies.
Data Collection: Quantitative data concerning critical infrastructure
functionality is collected from a number of case studies. In addition,
data related to the socio-economic framework is gathered according to
its availability and reliability with reference to critical events that have
occurred in Europe in recent years.
Starting with the definition of a theoretical reference framework, the goal is
to design a system dynamics model that constitutes the logical base for develop-
ing the decision support system. The effort engages case studies for model de-
velopment and analysis. The models are integrated within the decision support
structure to produce a readily accessible and usable decision making tool.
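As a rough illustration of what a system dynamics formulation computes (the
CRISADMIN model itself is far richer and is validated against the case studies),
the following sketch integrates a single stock, the backlog of unmet service
demand, whose inflow and outflow stand in for the causal relations captured in
the causal maps. Every variable name and rate in the sketch is invented for the
example.

public class StockFlowSketch {
    public static void main(String[] args) {
        double dt = 0.05;                 // simulation time step (hours)
        double backlog = 0.0;             // stock: unmet service demand
        double restorationCapacity = 40;  // outflow limit (units per hour)

        for (double t = 0.0; t < 24.0; t += dt) {
            // Inflow: demand generated by the critical event, decaying over time.
            double demand = (t < 2.0) ? 500.0 : 500.0 * Math.exp(-(t - 2.0));

            // Outflow: restoration, limited by capacity and by what is backlogged.
            double restoration = Math.min(restorationCapacity, backlog / dt);

            // Stock equation: backlog(t + dt) = backlog(t) + (inflow - outflow) * dt
            backlog += (demand - restoration) * dt;
        }
        System.out.printf("Residual backlog after 24 h: %.1f units%n", backlog);
    }
}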
3. Theoretical Model
The theoretical model defines the main factors that should be considered in
an emergency situation. The goal is to enhance the preparedness and response
capability of all the involved actors in order to mitigate and recover from the
negative effects of a catastrophic event. The main factors are investigated in
terms of mutual influences, those that reinforce and those that dampen the
effects of an event. Special attention is focused on the involved actors (i.e.,
victims, spectators and individuals responsible for managing the emergency) [3].
As in all complex environments, the vast majority of factors in emergency
situations are highly interconnected. The primary objective of the theoretical
model is to identify the main dependencies that impact the evolution of an
event. Territorial features, the socio-economic environment, event timing (e.g.,
time and duration) and actor preparedness are included in the analysis. In the
CRISADMIN Project, the effects of a critical event are studied in the context
of three critical infrastructure sectors, namely transportation (private and pub-
lic), energy (electricity distribution and consumption) and telecommunications
(mobile and fixed).
Data domains are grouped according to the parameters included in the
model. Specifically, four data domains are considered:
Territory: This domain includes the set of variables and parameters
that describe the geographic features of the territory. Territorial charac-
teristics are particularly relevant to natural disasters; however, they may
also affect the efficiency of responses in other critical situations (e.g., high
territorial diversity exerts a negative influence on the promptness of emer-
gency transportation). In this data domain, the main elements are the
territorial factors and geographical nature (e.g., extension and locality)
that impact vital services and social aspects.
Environment: This domain refers to the set of variables and parameters
related to the presence and activities of human beings in the territory,
such as energy-related supply chain capacity, public transportation, pop-
ulation density and socio-economic patterns in the affected area. In the
case of human-initiated critical events, environmental parameters are es-
sential to successful crisis response.
Apparatus: This domain includes the set of variables and parameters
related to the professionals and operators who manage the effects of catas-
trophic events and the subsequent recovery. Typically, the apparatus in-
cludes multiple agencies and organizations, each of which have a specific
role in managing minor emergencies as well as unexpected critical events.
In some countries, civil authorities coordinate the activities of all the var-
ious apparatus organizations in order to mitigate the effects of a critical
event.
Events: This domain refers to the set of variables and parameters that
define “normal” conditions. The data describes the evolution of normal
situations over time (in contrast, the geographical features in the territory
domain are time independent). Data related to the environment and
apparatus depend on the normal life-cycles and are tied to the hour of
the day (e.g., work hours and commuting hours), day of the week (e.g.,
workday, weekend, bank holiday and special days) and month of the year
(e.g., festivals and vacation periods). These dependencies, which can be
more or less substantial for the different variables, are considered when
modeling the evolution of a critical event from the very first moments
after it occurs. After the first parameter adjustment at t0 , the evolution
of an event is generally considered to be independent of the hour, day
and month because of the emergency effects.
Figure 1 presents the CRISADMIN theoretical model with the four data
domains. Examples of parameters related to the three critical infrastructure
sectors are shown to illustrate the items that require investigation when re-
sponding to a critical event.
In addition, the model takes into account cultural and societal settings (e.g.,
values, attitudes and demographics) that strongly influence how victims and first
responders in specific environments prepare for and react to critical events.
6. Data Collection
In order to apply the CRISADMIN approach, four critical events related
to previous terrorist attacks and floods were identified and analyzed. The
following criteria drove the selection of the events:
flood in the United Kingdom, the 2001 Po river floods in Italy and the
2011 Genoa flood in Italy.
Critical Infrastructure Impact: Terrorist attacks and floods destroy
essential assets, and impact critical infrastructures directly or indirectly.
These events tend to have major impacts on the transportation, energy
and telecommunications sectors.
Four case studies were used to apply and validate the CRISADMIN ap-
proach. The case studies include: (i) Madrid bombings of 2004; (ii) London
bombings of 2005; (iii) Central and Eastern Europe floods of 2002; and (iv)
United Kingdom floods of 2007. The following sections briefly describe the
selected case studies and highlight their essential elements and impacts.
structures were not directly targeted by the bombing attacks. However, the
telecommunications infrastructure experienced massive overloads due to general
panic and crisis management needs.
pressure. Rainfall accumulations were generally less than 125 mm over the
two-day period, but intense rainfall of up to 255 mm was observed in some
locations.
The rainfall triggered flood waves in the upper portions of the Danube and
Vltava catchment areas. One flood wave progressed down the Danube through
Austria, Slovakia and Hungary, causing minor damage. A more critical flood
wave progressed down the Vltava through Prague and down the Elbe through
northern Bohemia and Germany. Upon reaching Germany, the flood waters
in the Elbe inundated Dresden, causing damage to residential and commercial
property as well as many historical buildings in the city center. The increase
in river height in Dresden was more gradual and of greater magnitude than the
flood peak in Prague. Although Prague itself was hardly hit by the flash flood,
damage occurred in the historical and residential parts of the city center.
The greatest number of fatalities (58) was caused by floods resulting from
the first wave on the eastern coast of the Black Sea. Seventeen people died in
the Czech Republic, 21 in Dresden and more than 100 fatalities were reported
across Europe. Direct and indirect impacts on the transportation and energy
infrastructures were registered.
know how to react. At the peak of the flooding, around 350,000 homes across
Gloucestershire were left without water and 50,000 homes without power.
7. CRISADMIN Prototype
The CRISADMIN Project seeks to demonstrate, by means of a prototype,
that a flexible system dynamics modeling engine can assist first responders and
decision makers in managing critical events. During actual events, knowledge
of the past, coupled with the current aspects of a given context, form the basis
for selecting modeling parameters and defining influences.
The CRISADMIN decision support system takes into account experience
gained through participation in projects associated with the design of modeling
methods and tools for monitoring and countering emergencies [1]. The decision
support system incorporates a three-tiered architecture: (i) a back-end that
stores variables and parameters associated with the four domains; (ii) a core
that houses the system dynamics modeling engine; and (iii) a front-end that
maintains the parameters, activates the functions and presents results.
The simulation model will be made available to institutions and organiza-
tions across the European Union – public entities (e.g., civil protection and
fire brigades) as well as private entities (e.g., infrastructure asset owners and
operators). Crisis management is typically performed in interconnected oper-
ations control rooms (OCRs) that continuously monitor critical events. The
CRISADMIN decision support system is designed for use by analysts in OCRs
as they coordinate activities during critical events. The decision support tool
will be used to support operational decisions that benefit from the continuous
monitoring capabilities provided by OCRs. The tool will provide decision mak-
ers with a starting point that is both expandable and customizable. The tool
environment will also include several fixed, non-customizable scenarios that
encompass different crisis situations. This feature will enable
decision makers to understand the dynamics of interacting critical infrastruc-
ture assets. The prototype will also provide decision makers with points of
reference as they select appropriate policy alternatives for crisis management.
8. Conclusions
Decision makers responsible for infrastructure protection and crisis man-
agement must understand the consequences of policy and investment options
before they enact solutions. This notion is particularly important due to the
highly complex alternatives that must be considered when protecting critical
infrastructures in the current threat environment. An effective way to examine
and pursue trade-offs involving risk reduction and protection investments is to
utilize a decision support system that incorporates information about threats
and the consequences of disruptions. System dynamics modeling, simulation
and analysis can be used to conduct impact assessments and risk analyses based
on realistic scenarios.
Acknowledgements
This research, conducted by personnel from the Department of Computer,
Control and Management Engineering of La Sapienza University, FORMIT
Foundation, Erasmus University Rotterdam, Theorematica and Euro Works
Consulting, was performed under the CRISADMIN Project. The CRISAD-
MIN Project is supported by the Prevention, Preparedness and Consequence
Management of Terrorism and Other Security-Related Risks Program launched
by the Directorate-General of Home Affairs of the European Commission.
References
[1] P. Assogna, G. Bertocchi, A. Di Carlo, F. Milicchio, A. Paoluzzi,
G. Scorzelli, M. Vicentino and R. Zollo, Critical infrastructures as complex
systems: A multi-level protection architecture, Proceedings of the Third In-
ternational Workshop on Critical Information Infrastructure Security, pp.
368–375, 2008.
[2] BBC News, The summer floods: What happened (news.bbc.co.uk/2/hi/
uk_news/7446721.stm), June 25, 2008.
[3] L. Bourque, K. Shoaf and L. Nguyen, Survey research, International Jour-
nal of Mass Emergencies and Disasters, vol. 15(1), pp. 71–101, 1997.
[4] W. Enders and T. Sandler, The Political Economy of Terrorism, Cam-
bridge University Press, Cambridge, United Kingdom, 2012.
[5] European Commission, Identification and Designation of European Crit-
ical Infrastructures and the Assessment of the Need to Improve Their
Protection, Council Directive 2008/114/EC, Brussels, Belgium, December
8, 2008.
[6] S. Friedman, Learning to make more effective decisions: Changing beliefs
as a prelude to action, The Learning Organization, vol. 11(2), pp. 110–128,
2004.
Mohammed Talat Khouj, Sarbjit Sarkaria, Cesar Lopez, and Jose Marti
1. Introduction
All of modern society, and urban communities in particular, relies heavily on
the system of interconnected critical infrastructures. These systems are inherently
complex in terms of interconnections and interdependencies. Thus, they are
vulnerable to major disruptions that could cascade to other dependent systems
with possibly disastrous consequences. For example, the Indian Blackout of 2012,
the largest power blackout in history, caused massive disruptions
2. Related Work
This section describes related work in the areas of disaster mitigation in
interdependent critical infrastructures, agent-based modeling for disaster miti-
gation and disaster mitigation applications using reinforcement learning.
[Figure: interconnected critical infrastructure components, including an electric
power plant, a transmission substation, a food distribution center, communications,
local roads, a bridge and highway, a regional refinery, a compressor station, an
oil field, pipes, a water purification plant and pumping station, a hospital,
firefighters, paramedics and 911 E-Comm.]
O’Reilly, et al. [14] have specified a system dynamics model that describes
the interactions between interconnected critical infrastructures. They use the
model to analyze the impact of a telecommunications infrastructure failure on
emergency services. The important conclusion is that lost communications
negatively impacts medical services and drastically increases treatment costs.
Similarly, Arboleda, et al. [2] have addressed the impact of failures of inter-
dependent infrastructure components on the operation of healthcare facilities.
The goal was to determine the unsatisfied demand of interconnected infras-
tructure systems and the resulting costs using a network flow model. Linear
programming was used to assess the level of interdependency between a health-
care facility and the primary infrastructure systems linked to it.
In other work, Arboleda and colleagues [1] examined the internal operating
capabilities of healthcare facilities in terms of the interactions between different
service areas (emergency room, intensive care unit, operation room and wards).
This was performed using a system dynamics simulation model. The goal was
to assess the vulnerabilities of a healthcare facility during a disaster. The
approach enabled the identification of policies to best mitigate the effects of a
disruption.
Arboleda, et al. [3] have also integrated a network flow model and system
dynamics model. This was done to simulate the impact of infrastructure system
disruptions on the provision of healthcare services.
These studies and others make it clear that wise decisions to reallocate and
utilize the available resources are vital when dealing with interconnected critical
infrastructures. Informed decisions can potentially mitigate death and devastation.
Wiering and Dorigo [19] have developed an intelligent system that enables
decision makers to mitigate the consequences of natural and human-initiated
disasters (e.g., forest fires). Such disasters involve many interacting sub-pro-
cesses that make it difficult for human experts to estimate costs. The system
of Wiering and Dorigo uses reinforcement learning to learn the best policy or
actions to be chosen in a variety of simulated disaster scenarios.
Su, et al. [16] have proposed a path selection algorithm for disaster response
management. The algorithm is designed for search and rescue activities in
dangerous and dynamic environments. The algorithm engages reinforcement
learning to help disaster responders discover the fastest and shortest paths to
targeted locations. To accomplish this, a learning agent interacts with a two-
dimensional geographic grid model. After a number of trials, the agent learns
how to avoid dangerous states and to navigate around inaccessible states.
3.1 i2Sim
i2Sim is a hybrid discrete-time simulator that combines agent-based model-
ing with input-output production models. The simulator can model and play
out scenarios involving interdependent systems. i2Sim is designed as a real-
time simulator that can also serve as a decision support tool while a disaster is
actually occurring. The simulation capability of i2Sim enables decision mak-
ers to evaluate the predicted consequences of suggested actions before they are
executed [10].
The dynamic aspects of an i2Sim model are implemented by the movement
of tokens between i2Sim production cells (i.e., modeled infrastructures such as
power stations) through designated channels (i.e., lifelines such as water pipes).
In fact, i2Sim cells and channels correspond to discrete entities in the real world.
Figure 2 presents an example i2Sim model.
In i2Sim, each production cell performs a function. A function relates the
outputs to a number of possible operating states – physical modes (PM) and
resource modes (RM) – of the system. At every operating point along an
event timeline, the i2Sim description corresponds to a system of discrete time
equations expressed as a transportation matrix (Figure 3). The transportation
to the start state. The estimate is computed by averaging the samples that are
returned.
The action-value function can be implemented as a lookup table. The table
associates a long-term predicted reward Q(s, a) value with each state-action
pair defined for the modeled system. The table represents the acquired expe-
rience of the reinforcement learning agent and is updated during the learning
process.
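A minimal sketch of the lookup-table idea follows, assuming a simple incremental
Monte Carlo update in which each sampled return is averaged into Q(s, a). The
state and action are reduced to integer indices here, and the episode generation
and reward computation that would come from the i2Sim simulation are omitted.

import java.util.HashMap;
import java.util.Map;

public class QLookupTable {
    // Running average of sampled returns for each state-action pair.
    private final Map<String, Double> q = new HashMap<>();
    private final Map<String, Integer> visits = new HashMap<>();

    private String key(int state, int action) {
        return state + ":" + action;
    }

    // Monte Carlo update: average the sampled return G into Q(s, a).
    public void update(int state, int action, double sampledReturn) {
        String k = key(state, action);
        int n = visits.merge(k, 1, Integer::sum);
        double old = q.getOrDefault(k, 0.0);
        q.put(k, old + (sampledReturn - old) / n);
    }

    public double value(int state, int action) {
        return q.getOrDefault(key(state, action), 0.0);
    }

    // Greedy action over a small discrete action set (e.g., distributor settings).
    public int greedyAction(int state, int numActions) {
        int best = 0;
        for (int a = 1; a < numActions; a++) {
            if (value(state, a) > value(state, best)) {
                best = a;
            }
        }
        return best;
    }
}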
Note that the simulated system presents the state of the modeled environ-
ment that is detected by the learning agent. In the example considered here,
the state is defined using two critical infrastructures: Power Station 4 and the
Water Pumping Station. The physical mode (PM) and the resource mode (RM)
of Power Station 4 and the Water Pumping Station are specified as PMXP4 and
RMYP4 for power, and PMXW and RMYW for water. The values of X and Y
range from one to five. When X has a value of one, the modeled infrastructure
has no physical damage; when X is equal to five, the modeled infrastructure
has collapsed completely. Similarly, when Y has a value of one, all the required
resources to maintain the minimum functionality of the modeled infrastructure
are available; when Y is equal to five, the required resources are not available.
Table 1 presents a sample lookup table used by the learning agent. In the ta-
ble, the state-action pairs are captured using the variables: PMXP4, the Power
Station 4 physical mode (state); RMYP4, the Power Station 4 resource mode
(state); PMXW, the Water Pumping Station physical mode (state); RMYW,
the Water Pumping Station resource mode (state); DP4, the Power Station 4
distributor (action); and DW, the Water Pumping Station distributor (action).
The state is thus specified by PMXP4 and RMYP4 for power and PMXW and RMYW for
water, respectively, and can be represented at any given time by:
$$
s =
\begin{pmatrix}
PM1_{P4} & RM1_{P4} & PM1_{W} & RM1_{W} \\
PM1_{P4} & RM1_{P4} & PM1_{W} & RM2_{W} \\
\vdots & \vdots & \vdots & \vdots \\
PM1_{P4} & RM5_{P4} & PM5_{W} & RM5_{W} \\
PM2_{P4} & RM2_{P4} & PM1_{W} & RM1_{W} \\
PM2_{P4} & RM2_{P4} & PM1_{W} & RM2_{W} \\
\vdots & \vdots & \vdots & \vdots \\
PM2_{P4} & RM5_{P4} & PM5_{W} & RM5_{W} \\
\vdots & \vdots & \vdots & \vdots \\
PM3_{P4} & RM5_{P4} & PM5_{W} & RM5_{W} \\
\vdots & \vdots & \vdots & \vdots \\
PM4_{P4} & RM5_{P4} & PM5_{W} & RM5_{W} \\
\vdots & \vdots & \vdots & \vdots \\
PM5_{P4} & RM5_{P4} & PM5_{W} & RM5_{W}
\end{pmatrix}
\qquad (4)
$$
The first row in the equation above corresponds to state s1 = (PM1P4, RM1P4,
PM1W, RM1W), where PM1P4 is the Physical Mode 1 of Power Station 4,
RM1P4 is the Resource Mode 1 of Power Station 4, PM1W is the Physical
Mode 1 of the Water Pumping Station and RM1W is the Resource Mode 1 of
the Water Pumping Station.
The number of states Ns in the model (the total number of rows in the vector s) is
given by:

$$N_s = Z^K = 15^2 = 225 \ \text{states} \qquad (5)$$

where Z is the number of resource modes available for each controlled production
cell and K is the number of controlled production cells.
The number of available actions Na is given by:
4. Example Scenario
This section uses an example scenario to demonstrate the application of the
learning agent to an urban community model simulated by i2Sim (Figure 5).
The goal of the agent is to find the optimum trajectory that leads to the
maximum outcome. The expectation is that this approach will converge quickly
to the maximum number of discharged patients.
[Figure 5: i2Sim model of the urban community. Power Stations 2, 3 and 4 (P2, P3,
P4) supply electricity to Hospital 1 (H1), Hospital 2 (H2) and Venue 2 (V2); the
Power Station 4 distributor (DP4) allocates its generated power, and the Water
Pumping Station (W) with its distributor (DW) delivers pumped water to both
hospitals; casualties flow from the venues to the hospitals, which discharge
patients.]
from Power Station 4 and water from an external source. The output of the
cell is high pressure pumped water that goes to a water distributor, which
distributes the water via water channels (water pipes).
Venue 1 and Venue 2 are facilities that contain large numbers of people.
Venue 2 is more than 65,000 m2 in area and hosts up to 60,000 people. Venue
1 is slightly smaller at about 44,000 m2 and hosts up to 20,000 people. It is
assumed that both venues are hosting events and are fully occupied. Thus, the
total population is 80,000.
Two hospitals are modeled, Hospital 1 (main hospital) and Hospital 2 (al-
ternative hospital). The input resources come from the four electrical power
stations (electricity) and the water pumping station (water). Based on the
availability of these resources, the rate of discharged patients for each hospital
is known from historical data.
The lookup table was initialized randomly at the start of the first scenario and
learning continued from one scenario to the next.
The model simulated a period of ten hours in five-minute increments. The
statistics and system latencies used by the simulator were taken from an internal
technical report [7]. The report helped guide the rates used in the simulation.
For example, a crowd of 80,000 is expected to have up to 480 injuries.
Figure 6 shows the results for two consecutive sets of trials that were ini-
tialized independently (light and dark lines). In both cases, the convergence to
an optimum solution occurred and all 480 patients were discharged from both
emergency units during the lifetime of the simulation.
During the early phases of learning in both trials, the agent had little or
no experience and was unable to maximize the number of discharged patients.
However, this was not the case in the later runs, where the agent showed
an ability to fully satisfy the demands of the modeled interconnected critical
infrastructures by carefully balancing resources across all the infrastructure
components.
In contrast, a naive decision maker might select a resource configuration
that would only favor the hospitals, but this would be a sub-optimal solution.
Instead, the actions taken by the trained agent were those that intelligently
utilized the available limited resources (power and water) without exhausting
them, which ultimately satisfied the sudden needs of all the interconnected crit-
ical infrastructures, including the venues and, most importantly, the hospitals.
The simulation of each scenario required about three minutes using a com-
puter with an Intel Core i5 2.8 GHz CPU and 8 GB RAM. In total, 100 runs
took about 600 minutes. This is important in real-world deployments because
simulations should be faster than real time in order to assist emergency respon-
ders in making informed decisions as a situation unfolds.
5. Conclusions
The modeling and analysis framework presented in this paper is an innova-
tive approach for studying the impact of natural or human-initiated disasters on
critical infrastructures and optimally allocating the available resources during
disaster response. The framework relies on i2Sim and reinforcement learning
using Monte Carlo policy estimation (RL-MC). i2Sim permits the simulation
of complex interconnected critical infrastructures while the RL-MC approach
supports rapid learning based on experiential knowledge in order to provide
intelligent advice on allocating limited resources. The experimental results re-
veal that decision makers can reduce the impact of disruptions by employing
the look-ahead and optimization features provided by the framework. The
loosely coupled nature of reinforcement learning also enables it to be applied
to a variety of resource optimization scenarios.
Our future research will analyze the computational aspects of the learning
system. A speed versus accuracy trade-off exists between approaches that use
the conventional lookup table implementation of an action-value function and
other approaches that use function approximation techniques.
Acknowledgement
This research was partially supported by the Ministry of Higher Education
of the Kingdom of Saudi Arabia.
References
[1] C. Arboleda, D. Abraham and R. Lubitz, Simulation as a tool to assess
the vulnerability of the operation of a health care facility, Journal of Per-
formance of Constructed Facilities, vol. 21(4), pp. 302–312, 2007.
[2] C. Arboleda, D. Abraham, J. Richard and R. Lubitz, Impact of interde-
pendencies between infrastructure systems in the operation of health care
facilities during disaster events, Proceedings of the Twenty-Third Joint In-
ternational Conference on Computing and Decision Making in Civil and
Building Engineering, pp. 3020–3029, 2006.
[3] C. Arboleda, D. Abraham, J. Richard and R. Lubitz, Vulnerability assess-
ment of health care facilities during disaster events, Journal of Infrastruc-
ture Systems, vol. 15(3), pp. 149–161, 2009.
[4] G. Atanasiu and F. Leon, Agent-based risk assessment and mitigation for
urban public infrastructure, Proceedings of the Sixth Congress on Forensic
Engineering, pp. 418–427, 2013.
[5] E. Bonabeau, Agent-based modeling: Methods and techniques for simulat-
ing human systems, Proceedings of the National Academy of Sciences, vol.
99(3), pp. 7280–7287, 2002.
[6] F. Daniel, India power cut hits millions, among world’s worst outages,
Reuters, July 31, 2012.
[7] M. Khouj and J. Marti, Modeling Critical Infrastructure Interdependencies
in Support of the Security Operations for the Vancouver 2010 Olympics,
Technical Report, Department of Electrical and Computer Engineering,
University of British Columbia, Vancouver, Canada, 2010.
[8] M. Khouj, S. Sarkaria and J. Marti, Decision assistance agent in real-time
simulation, International Journal of Critical Infrastructures, vol. 10(2), pp.
151–173, 2014.
[9] K. Kowalski-Trakofler, C. Vaught and T. Scharf, Judgment and decision
making under stress: An overview for emergency managers, International
Journal of Emergency Management, vol. 1(3), pp. 278–289, 2003.
[10] J. Marti, J. Hollman, C. Ventura and J. Jatskevich, Dynamic recovery
of critical infrastructures: Real-time temporal coordination, International
Journal of Critical Infrastructures, vol. 4(1/2), pp. 17–31, 2008.
Okan Pala, David Wilson, Russell Bent, Steve Linger, and James Arnold
Abstract Electric power, water, natural gas and other utilities are served to con-
sumers via functional sources such as electric power substations, pumps
and pipes. Understanding the impact of service outages is vital to de-
cision making in response and recovery efforts. Often, data pertaining
to the source-sink relationships between service points and consumers
is sensitive or proprietary, and is, therefore, unavailable to external en-
tities. As a result, during emergencies, decision makers often rely on
estimates of service areas produced by various methods. This paper,
which focuses on electric power, assesses the accuracy of four meth-
ods for estimating power substation service areas, namely the standard
and weighted versions of Thiessen polygon and cellular automata ap-
proaches. Substation locations and their power outputs are used as
inputs to the service area calculation methods. Reference data is used
to evaluate the accuracy in approximating a power distribution network
in a mid-sized U.S. city. Service area estimation methods are surveyed
and their performance is evaluated empirically. The results indicate
that the performance of the approaches depends on the type of analysis
employed. When the desired analysis includes aggregate economic or
population predictions, the weighted version of the cellular automata
approach has the best performance. However, when the desired analy-
sis involves facility-specific predictions, the weighted Thiessen polygon
approach tends to perform the best.
1. Introduction
Electric power, water, natural gas, telecommunications and other utilities are
served to consumers using functional sources (facilities) such as power substa-
tions, pumps and pipes, switch controls and cell towers. Each of these sources is
related to a geographical service area that includes consumers. Data pertaining
to the source-sink relationships between service points and consumers is often
sensitive or proprietary and is, therefore, unavailable to external entities. Dur-
ing emergencies, decision makers who do not have access to utility information
must rely on estimates of service areas derived by various methods. Decision
makers have a strong interest in quantifying the accuracy of critical infras-
tructure service area estimation methods and developing enhanced estimation
techniques [14, 22, 25].
This paper assesses the accuracy of four methods that are commonly used
to estimate infrastructure impact after a disruptive event. The term “impact”
refers to the inability of a utility to provide a service, such as power or gas,
due to infrastructure damage. The paper focuses on two types of impacts:
(i) aggregate impacts, such as economic activity and the population affected
by the outage; and (ii) point data impacts, such as whether specific assets
are included in an outage. The methods include Voronoi (Thiessen) polygons,
Voronoi (Thiessen) polygons with weights, cellular automata and cellular au-
tomata with weights. The methods are compared using a reference model of a
power distribution network for a mid-sized U.S. city.
2. Background
Power, gas, water and other infrastructures serve customers in geographical
regions called service areas. Although infrastructure operators have detailed
information about the source-sink relationships between their assets, this in-
formation is neither organized to facilitate large-scale analyses nor documented
by public regulatory agencies. In addition, the data is often highly
sensitive or proprietary.
Determining service areas in the absence of data has long been a problem, but
estimating the service areas accurately is very important in disaster recovery
situations [7, 14, 22, 25]. Typically, the geographic boundary of a service point
is required to estimate the source-sink relationships between serving entities
(sources) and served entities (sinks). Increasing the accuracy of the estimates
could lead to more efficient recovery. Moreover, understanding the compar-
ative merits of different estimation approaches is necessary to enable decision
makers to select the right mitigation and remediation strategies in disaster situ-
ations. This paper focuses on Voronoi diagram (Thiessen polygon) and cellular
automata estimation approaches.
[Figure 1: example substation service areas; the labeled substation power outputs
range from 2 MW to 13 MW.]
This approach is potentially more realistic than an approach that uses
Thiessen polygons with equal weights. For example, as shown in Figure 1(b), a
2 MW electric power substation serves a smaller area than neighboring power
substations with larger power outputs.
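One common way to realize such weighting is a multiplicatively weighted Voronoi
assignment, in which each location is assigned to the substation with the smallest
distance divided by the substation weight. The sketch below merely illustrates this
idea with invented coordinates and outputs; the ArcGIS extension [13] used in this
work may employ a different weighting scheme.

public class WeightedVoronoiAssignment {
    // Hypothetical substation data: locations (km) and power outputs (MW) used as weights.
    static final double[] X = {2.0, 7.5, 5.0};
    static final double[] Y = {3.0, 6.0, 9.0};
    static final double[] MW = {2.0, 12.0, 10.0};

    // Assign a consumer location to the substation with the smallest
    // weight-scaled distance, so higher-output substations claim larger areas.
    static int assign(double px, double py) {
        int best = -1;
        double bestScore = Double.MAX_VALUE;
        for (int i = 0; i < X.length; i++) {
            double d = Math.hypot(px - X[i], py - Y[i]);
            double score = d / MW[i];   // multiplicatively weighted distance
            if (score < bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Consumer at (4, 5) is served by substation " + assign(4.0, 5.0));
    }
}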
3. Assessment Methodology
Four algorithms are used to estimate service areas for electric power: (i)
Thiessen polygons; (ii) Thiessen polygons with weights based on the electric
power substation loads; (iii) cellular automata; and (iv) cellular automata with
weights based on the electric power substation loads.
An electric power network in a mid-sized U.S. city comprising roughly 150
substations is used in the evaluation. The reference dataset includes the trans-
mission network, substations, power demand and substation service areas. The
reference substation service areas, which are polygonal in shape, were drawn
up by an electric power system expert. Economic and population information
derived from the 2010 LandScan dataset [5] is incorporated, along with the
daytime/nighttime population information from [8, 20].
The ESRI suite of GIS tools was used to implement the Thiessen polygon
approach. The weighted Thiessen polygons were created using the publicly-
available ArcGIS extension [13]. IEISS [6] was used to create the cellular au-
tomata and weighted cellular automata polygons; this algorithm grows cells in
a raster format starting from each source point (i.e., electric power substation)
until it runs out of space or electric power resources.
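The following is a rough sketch of the raster-growth idea behind the cellular
automata approach (it is not the IEISS implementation): each substation seeds a
cell and claims neighboring cells in rounds until its capacity budget, or the
available space, is exhausted. The grid size, substation locations and the
conversion of power output into a cell budget are invented for the example.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ServiceAreaGrowth {
    public static void main(String[] args) {
        int size = 100;                       // raster of size x size cells
        int[] rows = {20, 70, 50};            // hypothetical substation row positions
        int[] cols = {30, 60, 80};            // hypothetical substation column positions
        int[] budget = {3000, 4000, 1000};    // cells each substation may claim,
                                              // proportional to its power output
        int[][] owner = new int[size][size];  // 0 = unclaimed, otherwise substation id

        // One growth frontier per substation; growth proceeds in round-robin rounds.
        List<Queue<int[]>> frontiers = new ArrayList<>();
        for (int s = 0; s < rows.length; s++) {
            Queue<int[]> q = new ArrayDeque<>();
            q.add(new int[] {rows[s], cols[s]});
            frontiers.add(q);
        }

        boolean progress = true;
        while (progress) {
            progress = false;
            for (int s = 0; s < rows.length; s++) {
                Queue<int[]> frontier = frontiers.get(s);
                int steps = frontier.size();
                while (steps-- > 0 && budget[s] > 0 && !frontier.isEmpty()) {
                    int[] cell = frontier.poll();
                    int r = cell[0], c = cell[1];
                    if (r < 0 || r >= size || c < 0 || c >= size || owner[r][c] != 0) {
                        continue;   // out of bounds or already claimed
                    }
                    owner[r][c] = s + 1;   // claim the cell
                    budget[s]--;
                    progress = true;
                    frontier.add(new int[] {r + 1, c});
                    frontier.add(new int[] {r - 1, c});
                    frontier.add(new int[] {r, c + 1});
                    frontier.add(new int[] {r, c - 1});
                }
            }
        }

        int claimed = 0;
        for (int[] row : owner) {
            for (int v : row) {
                if (v == 1) {
                    claimed++;
                }
            }
        }
        System.out.println("Cells claimed by substation 1: " + claimed);
    }
}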
Figure 3. Point layer (10K) overlaid with weighted Thiessen polygon layer.
Figure 4. Point layer (10K) overlaid with cellular automata polygon layer.
Figure 5. Point layer (10K) overlaid with weighted cellular automata polygon layer.
[Figure: analysis workflow in which a layer of 10,000 random points is spatially joined with the reference service area layer and with each of the estimated layers (TP, WTP, CA and WCA); the joined attributes are compared to produce an error matrix for each method]
The distance from each point to its serving source point (i.e., the
serving point in the reference dataset) is measured for each polygon. To classify
the points uniformly based on their proximity to the serving source point, the
distances are normalized based on the size of the service area polygon that
overlays the point for each method. This approach enables a decision maker to
quantify the quality of the results based on where a point is located within a
service area. Facilities located closer to the service source (i.e., electric power
substation) have higher confidence values than those that are further away from
the service source. This reduction in confidence can, in fact, be quantified.
As an example, consider two hospitals as point data. The first hospital
is located 500 yards away from Substation A and the second hospital is two
miles away from Substation B. If the service area sizes are the same for both
substations, it is reasonable to compare the hospital to substation distances
and to calculate the confidence that the hospitals are correctly associated with
the substations. However, if the service area of Substation A is much smaller
than that of Substation B, then the distances must be normalized.
Normalization and point classification are based on the distance to the
source. Let P(s) be the service area polygon for service point s, A be the
area of polygon P(s) and r be the radius of a circle with the same area A as
the service area polygon P(s). Furthermore, let i be a randomly-placed point
in the agreement zone (i.e., the region where the reference data polygon and
the polygon produced by the service area estimation method overlap), and let
d be the distance between i and service point s; d is normalized and the point
is classified as follows:
(i) Point i is classified in Proximity Class #1 (closest 25%) if d < r/4.
(ii) Point i is classified in Proximity Class #2 (25–50%) if r/4 < d < r/2.
(iii) Point i is classified in Proximity Class #3 (50–75%) if r/2 < d < 3r/4.
(iv) Point i is classified in Proximity Class #4 (farthest 25%) if d ≥ 3r/4.
All the points are classified using the normalized distances and the classifica-
tions are used to measure the effect of proximity on the accuracy of point data
and to quantify the confidence in a method when reference data is unavailable.
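A minimal sketch of the normalization and classification step is given below; the polygon area and the point-to-source distance are assumed to be available from the GIS overlay of the service area layers.

    # Sketch of the distance normalization and proximity classification.
    # The polygon area and point-to-source distance (in consistent units) are
    # assumed to be available from the GIS overlay of the service area layers.
    import math

    def proximity_class(distance, polygon_area):
        """Classify a point by its normalized distance to the serving source.

        r is the radius of a circle whose area equals the service area polygon;
        the classes correspond to the quarters of r used in the analysis.
        """
        r = math.sqrt(polygon_area / math.pi)
        if distance < r / 4:
            return 1          # Proximity Class 1: closest 25 percent
        elif distance < r / 2:
            return 2
        elif distance < 3 * r / 4:
            return 3
        return 4              # Proximity Class 4: beyond 3r/4

    # The same distance falls in different classes for different polygon sizes.
    print(proximity_class(distance=450.0, polygon_area=4.0e6))   # smaller area -> 2
    print(proximity_class(distance=450.0, polygon_area=2.5e7))   # larger area  -> 1

As in the hospital example above, the same absolute distance is judged differently once the size of the service area polygon is taken into account, which is exactly the effect the normalization is meant to capture.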
4. Experimental Results
This section presents the results of the performance analysis of the four
approaches, namely the standard and weighted versions of the Thiessen polygon
and cellular automata approaches. For the weighted methods, peak energy
consumption (in MW) is used for the weights.
Figure 8 shows the results obtained for the four approaches. Each sub-figure
displays one service area creation method along with the reference dataset.
Figure 8(a) compares the Thiessen polygon approach results with the reference
set while Figure 8(b) compares the weighted Thiessen polygon approach results
with the reference set. Figures 8(c) and 8(d) show the corresponding results for
the cellular automata and weighted cellular automata approaches, respectively.
The first set of results pertains to aggregate statistical accuracy. In partic-
ular, the area, population and various economic indicators are compared with
the results of the reference service areas.
Table 1 shows the mean differences in the daytime and nighttime populations
between the calculated and reference service areas. A smaller value is a better
result because the population value produced by the method is closer to the
population value produced for the reference service area. For the daytime and
nighttime populations, the weighted cellular automata approach yields the best
results (smallest differences) compared with the reference data. On the other
hand, the cellular automata approach yields results with the highest differences.
Similar results were obtained for the sum of differences in populations (Ta-
ble 2). The weighted cellular automata approach yields the best results. The
weighted Thiessen polygon approach yields better results than the standard
Thiessen polygon and cellular automata approaches.
Tables 3 and 4 show the means of the differences in the economic impact
for various metrics (direct, indirect, induced, employment and business). In
all cases, the difference is the lowest for the weighted cellular automata ap-
proach, second lowest for the cellular automata approach and third lowest for
the weighted Thiessen polygon approach. The only exception is the economic
impact on business (Table 4), for which the weighted Thiessen polygon ap-
proach and cellular automata approach swap places. The largest mean differ-
ence value is produced by the Thiessen polygon approach. Although the mean
difference for the weighted Thiessen polygon approach is larger than that for
the cellular automata approach (except for the economic impact on business),
the differences are not as notable as the differences for the other categories.
Tables 5 and 6 show the results corresponding to the sums of the differences;
the results have the same trends as in the case of the mean differences.
The final aggregate statistic is the total surface area of the polygons. The
results of the total surface area comparisons indicate that the average refer-
ence polygon area is 1,033 acres. As shown in Table 7, the weighted cellular
automata and Thiessen polygon approaches yield polygons that are the clos-
est in size (on average) to the reference polygon sizes. The cellular automata
approach yields the least accurate approximation for this metric.
Table 8. Overlay agreement accuracy of the four approaches.

    Approach                        Estimation Accuracy (%)
    Thiessen Polygons                        54.1
    Weighted Thiessen Polygons               68.9
    Cellular Automata                        52.3
    Weighted Cellular Automata               59.5
For the point accuracy analysis, 10,000 points were selected randomly across
the study area and an error matrix was created for each method. The matrices
were used to calculate the overlay agreement accuracy. Table 8 shows that the
weighted Thiessen polygon approach yields the best overall results (68.9%),
followed by the weighted cellular automata approach (59.5%), while the cellular
automata approach has the lowest accuracy (52.3%).
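A minimal sketch of the overlay agreement computation is shown below; the pairing of reference and estimated substations for each random point is assumed to come from the spatial joins, and the identifiers are hypothetical.

    # Sketch: overlay agreement accuracy for one estimation method. Each random
    # point carries the substation of the reference polygon it falls in and the
    # substation of the estimated polygon it falls in (both from the spatial
    # join); the accuracy is the share of points on which the two agree.
    def overlay_agreement(points):
        """points: iterable of (reference_substation, estimated_substation)."""
        points = list(points)
        agree = sum(1 for ref, est in points if ref == est)
        return 100.0 * agree / len(points)

    sample = [("S12", "S12"), ("S12", "S07"), ("S03", "S03"), ("S07", "S07")]
    print(f"{overlay_agreement(sample):.1f} % agreement")   # 75.0 % for this sample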
As shown in Table 9, the results are nuanced. The weighted cellular au-
tomata approach has the highest point accuracy (91%) when points in the
closest 25% area of each polygon are considered, followed by the weighted
Thiessen polygon approach (86%), the cellular automata approach (85%) and
the Thiessen polygon approach (81%). Farther away from the source point, a
drop in the accuracy of the unweighted approaches (Thiessen polygon and cellu-
lar automata) is observed. The accuracies of the weighted approaches decrease
considerably, but they are still higher than the accuracies of the unweighted
approaches.
Table 10. Estimation accuracy when neighboring polygons are included.

    Approach                        Estimation Accuracy (%)
    Thiessen Polygons                        96.5
    Weighted Thiessen Polygons               97.4
    Cellular Automata                        95.2
    Weighted Cellular Automata               97.9
It is important to note that the accuracies of all the approaches improve dra-
matically when neighboring polygons are included. Instead of assigning a point
to a single polygon, a point is assigned to a polygon together with its neighboring
polygon; this relaxes the analysis so that a point is associated with one source
facility from a small set of candidate facilities. The corresponding results are
shown in Table 10, where the points are correctly assigned to a set of source
facilities more than 95% of the time for all four approaches.
5. Discussion
Critical infrastructures, such as electric power, natural gas, water and tele-
communications, provide vital services to society. In the event of an outage,
these services must be restored as soon as possible to bring the situation back
to normal and reduce the negative impacts of the outage. Several factors make
it difficult for decision makers to assess the impacts of an outage. Critical
infrastructure networks are inherently complex and the relationships between
network elements as well as those between other networks are not well under-
stood. Outage propagation is complicated to trace, especially in the case of
an electric power disruption. In addition, information on source-sink relation-
ships is not readily available. Therefore, prioritizing restoration and repair for
network elements can be an extremely challenging task.
Moreover, critical infrastructure networks are interconnected and it is often
the case that networks depend on other networks to function. For example,
an electric network provides power to water pumps, which are part of a water
network. Likewise, telecommunications towers and hubs also require electricity
to function. Therefore, an electric power network outage can cascade within
the network as well as to other networks. The accurate determination of service
areas is vital to modeling cross-infrastructure effects. Applying four well-known
estimation methods, namely standard and weighted Thiessen polygon and cel-
lular automata approaches, to service area determination for electric power
networks yields interesting insights. In general, the weighted cellular automata
approach is the best performer while the Thiessen polygon approach has the
worst performance. However, for points closest to the boundaries of service
areas, the weighted Thiessen polygon approach has the best accuracy.
Visual inspection of the weighted cellular automata polygons compared with
the reference dataset polygons provides some insights into the point accuracy
results. Two situations lead to the lower accuracy of weighted cellular au-
tomata polygons in the point accuracy analysis. The first involves weighted
cellular automata polygons at the outer edge of the study area and is an ar-
tifact of how the cellular automata algorithm is designed. Cellular automata
algorithms favor growth in unconstrained regions and, thus, polygons at the
edges tend to grow outward rather than inward, leading to unrealistic results.
This behavior can be controlled by introducing boundaries that limit cellular
automata growth. The second situation occurs for a few cases in the dataset
where the ratio of power output for a specific substation to the total service
area in the reference dataset is too large (e.g., when some of the power is pro-
vided to an industrial complex). Including substations with large outputs and
small area coverage in the reference dataset also contributes to errors.
Finally, cellular automata algorithms incorporate several parameters that
must be tuned. This study has used “out of the box” parameters for cellu-
lar automata to allow for the least-biased comparisons with Thiessen polygon
approaches. However, while parameter tuning can dramatically improve the
performance of cellular automata approaches, the tuning is highly specific to
the application domain.
6. Conclusions
Sophisticated modeling and simulation tools are vital to enable decision mak-
ers to predict, plan for and respond to complex critical infrastructure service
outages [27, 37]. However, modeling and simulation tools cannot function ef-
fectively without adequate, good-quality data. Unfortunately, data pertaining
to critical infrastructure assets is highly sensitive and is, therefore, difficult to
obtain; detailed data about infrastructure dependencies is even more difficult
to obtain.
In the absence of data of adequate quantity and quality, the only feasible
solution is to rely on estimation methods to predict the impacts of critical
infrastructure service outages on populations, regional economies and other
critical infrastructure components. The empirical evaluation of service area
estimation techniques described in this paper reveals that the weighted cellular
automata and weighted Thiessen polygon approaches produce better estimates
than their standard (unweighted) counterparts. Also, the results demonstrate
that the weighted cellular automata approach has the best aggregate statistical
accuracy while the weighted Thiessen polygon approach has the best point
accuracy. However, parameter tuning dramatically improves the performance
of the cellular automata approach.
Future research will proceed along three directions. First, other critical in-
frastructures will be investigated to gain an understanding of the aspects that
are unique to critical infrastructures and those that are common between crit-
ical infrastructures. Second, other comparison metrics will be developed; for
example, substation loads (in MW) could be compared with the expected con-
sumptions by populations and businesses in service areas to assess the accuracy
of the computed polygons. Third, formal probability-based methods will be in-
vestigated to cope with the error and uncertainty that underlie service area
algorithms.
References
Abstract This paper describes an asset vulnerability model decision support tool
(AVM-DST) that is designed to guide strategic investments in critical
infrastructure protection. AVM-DST is predicated on previous research
on an alternative risk methodology for assessing the current infrastruc-
ture protection status, evaluating future protective improvement mea-
sures and justifying national investments. AVM-DST is a web-based
application that works within the U.S. Department of Homeland Secu-
rity Risk Management Framework and enables decision makers to view
infrastructure assets risk profiles that highlight various features of in-
terest, select protective improvement measures within a given budget
based on defined investment strategies or other criteria, and evaluate
protective purchases against varying probabilities of attack over a given
period of time. In addition to reviewing the concepts and formulations
underlying the application, this paper describes the AVM-DST capabil-
ities, functions, features, architecture and performance.
1. Introduction
The events of September 11, 2001 and their aftermath exposed the vulner-
ability of the critical infrastructure to asymmetric domestic attacks. The 2002
Homeland Security Act made critical infrastructure protection a core mission of
the Department of Homeland Security (DHS). From the outset, DHS’s goal has
been to develop a program that would “establish standards and benchmarks for
infrastructure protection and provide the means to measure performance” [17].
Quantifiable metrics are not only essential to developing coherent strategy, but
they are also the law under the 1993 Government Performance and Results Act.
Nevertheless, despite successive attempts over the ensuing years [2–5], a 2010
2. AVM Overview
In 2013, an asset vulnerability model (AVM) was developed to overcome the
challenges cited in the National Research Council report [16] and provide DHS
with a quantitative means to guide strategic investments in critical infrastruc-
ture protection [21]. AVM is a risk analysis methodology that works within the
DHS Risk Management Framework to provide a baseline analysis, cost-benefit
analysis and decision support tools that provide guidance in selecting criti-
cal infrastructure protective improvement measures. AVM is predicated on a
measure designated as Θ, which represents the attacker’s probability of failure.
The selection of Θ was informed by the game theoretic research of Sandler and
Lapan [18] that evaluates defensive strategies based on an attacker’s choice of
target. The Θ formulation is constructed from five parameters corresponding to
the five phases of emergency management – prevent, protect, mitigate, respond
and recover [12]:
Table 1. Critical infrastructure sectors.

ID Infrastructure Sector
1 Chemical Plants
2 Dams
3 Energy
4 Financial Services
5 Food and Agriculture
6 Information Networks
7 Nuclear Reactors, Materials and Waste
8 Transportation Systems
9 Water and Wastewater Systems
According to the National Research Council, a good risk analysis (i) con-
veys current risk levels; (ii) supports cost-benefit analysis; (iii) demonstrates
risk reduction effects across multiple assets at different levels of management;
and (iv) measures and tracks investments and improvement in overall system
resilience over time [16]. Working within the DHS Risk Management Frame-
work, AVM can convey current risk levels through a baseline analysis of the
critical infrastructure sectors identified in Table 1 using the Θ risk formulation
in Equation (1). AVM can further facilitate cost-benefit analyses of proposed
protective improvement measures using the following formulation:
Figure 2 shows the same assets sorted by Θ, from the most protected to the
least protected. Similarly, the data may be sorted by asset type to
display the relative protection of assets in the same sector, or by asset location
to depict the relative protection of assets in a given geographic region. Other
views may also be generated as desired.
– Click the “Sort By” box and select the field to use in sorting.
– Click the “Direction” for sorting and select either ascending or de-
scending.
– Click “Sort” to update the chart.
Edit Improvements: The record details for assets selected to receive
protective improvements are displayed in the asset detail grid panel.
AVM-DST allows users to sort this data by clicking the column header
associated with the field that is to be sorted. AVM-DST also allows users
to remove selected asset improvements by right clicking on the desired
record and choosing “Delete.”
Export Improvements: Selected improvements may be exported in a
CSV file to support implementation efforts. The following actions must
be performed to export selected improvements:
– Click “Export” on the chart control panel (Figure 6).
– Depending on the browser being used, open the file immediately by
selecting the program with which to open it or save the file to the
browser-specific download directory.
Click “Allocate.”
Click “Simulate.”
5. AVM-DST Implementation
AVM-DST was constructed in phases using an incremental development pro-
cess. Phase 1 developed the visualization and data handling capabilities. Phase
2 added the decision support and decision analysis features. AVM-DST is writ-
ten in JavaScript and utilizes the Ext JS application framework along with the
CanvasJS charting plugin. This enables AVM-DST to run in any browser.
5.1 Architecture
AVM-DST is a stand-alone, client-side, browser-oriented web application
built using JavaScript and HTML5. It does not currently contain any server
side components. It was built using the model-view-controller paradigm, which
is recommended, albeit not required, for Ext JS applications. In this paradigm,
the model is the representation of the data to be used. The model describes
the objects and their fields and specifies object relationships and hierarchies. It
also includes the functions used to manipulate the data. Ext JS uses data stores
to load, handle and manipulate collections of model instances. A view serves
as the visual interface between the user and the application. This includes
windows, panels and widgets that facilitate input from the user and display
output. The controller handles the business logic of the application. It reacts
to events and updates the models and views accordingly.
5.2 Development
AVM-DST v1.0 was a proof-of-concept prototype. It included the basic func-
tionality for importing, displaying and sorting asset data. AVM-DST v1.0 used
Ext JS built-in charts that did not support zooming and panning. Also, per-
formance issues restricted the number of assets to no more than a few hundred.
AVM-DST v2.0 used CanvasJS to dramatically increase performance and
add zooming and panning. This one change enabled AVM-DST to be used to
manipulate thousands of assets. It also allowed record details of selected assets
to be displayed below the main chart are exported in the CSV format.
AVM-DST v3.0 marked the Phase 2 development by incorporating decision
support and analysis tools. It included the control panels, but only the chart
and allocate panels were functional. AVM-DST v3.0 did not implement the
attack simulation functionality.
AVM-DST v4.0 added the attack simulation functionality. It also added the
secondary chart panel to display the results.
AVM-DST v5.0, the current version, fixed the bugs identified in the previ-
ous version and optimized the attack simulation algorithm to run faster and
accommodate more simulations over longer probable attack periods.
5.3 Performance
AVM-DST was tested on a machine running Windows 7 64-bit with a 3.2 GHz
Intel Core i7-4770k CPU and an NVIDIA GeForce GTX 770 GPU. The browser
used during testing was Firefox 26.0 and the input test file contained 1,000
records. The least cost investment strategy required the most time to run
for this data set, so it was used predominantly during performance testing.
Simulation times were recorded using a ten-year probable attack period with
ten simulations and 1,000 simulations. The time to run simulations is not
always directly proportional to the number of simulations because of a constant
pre-processing time for tasks (e.g., sorting) that are only done once regardless
of the number of simulations. Table 2 shows the run times of various functions.
6. Lessons Learned
Performance is always a concern when handling thousands of data records,
especially when using web technology. AVM-DST is a stand-alone client-side
web application. Because it does not require server-side interaction after it is
initially loaded, it does not experience network delays or server-side processing
delays that are commonly associated with web applications. AVM-DST was
tested using a data file containing 1,000 records and is expected to be able to
handle much larger data files.
Initially, the application utilized the built-in Ext JS charts that rely on
SVG technology. Because of this, AVM-DST experienced performance prob-
lems when handling charts. The browser crashed when the application was
tested on the 1,000-record file. Efforts were made to mitigate the problem by
implementing paging functionality that loads portions of the data at a time.
However, this was not ideal. For this reason, CanvasJS was incorporated be-
cause it can quickly and seamlessly handle thousands of data points in the
charts.
The asset selection decision support tool must sort the data based on the
selection model and then iteratively evaluate each asset for selection. This
process is fairly quick so the real performance bottleneck arises when populating
the grid with the selected assets.
The performance of the decision analysis tool does not depend on the size
of the input file because it only considers the asset that is most likely to be
attacked at each iteration. Instead, it is dependent on the probable period of
attack and the number of simulations to be performed. Before optimization,
this algorithm removes the destroyed assets from the data set during each
iteration and then restores and re-sorts them during the next simulation. To
prevent the browser from becoming unresponsive, the number of simulations
was limited to ten and the probability of attack was incremented in five percent
intervals. After optimization, the algorithm sorted only once, then maintained
a counter that referenced the next asset being considered and incremented
the counter when it was destroyed. On the next simulation, the counter was
then reset to zero. In this manner, a substantial amount of file overhead was
eliminated by performing only a single sort and not removing the destroyed
asset records. These changes resulted in significant performance improvement.
They also afforded greater simulation resolution, allowing the probability of
attack to be incremented by only one percent at each iteration while still
executing 1,000 simulations in less than three seconds.
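A minimal sketch of the optimized simulation loop described above is given below; the asset fields, the likelihood ordering and the loss measure are assumptions made only for illustration and do not reproduce the AVM-DST code itself (which is written in JavaScript).

    # Sketch of the optimized attack simulation loop described above: the assets
    # are sorted once by attack likelihood, and a counter marks the next asset
    # to be considered instead of removing destroyed records from the data set.
    # Asset fields, the likelihood ordering and the loss measure are assumptions.
    import random

    def run_simulations(assets, attack_period_years, num_simulations, p_attack):
        ordered = sorted(assets, key=lambda a: a["likelihood"], reverse=True)  # single sort
        totals = []
        for _ in range(num_simulations):
            next_asset = 0                       # counter reset, nothing restored
            destroyed_value = 0.0
            for _year in range(attack_period_years):
                if next_asset >= len(ordered):
                    break
                if random.random() < p_attack:   # attack succeeds this period
                    destroyed_value += ordered[next_asset]["value"]
                    next_asset += 1              # advance past the destroyed asset
            totals.append(destroyed_value)
        return sum(totals) / len(totals)

    assets = [{"likelihood": random.random(), "value": random.uniform(1.0, 10.0)}
              for _ in range(1000)]
    print(run_simulations(assets, attack_period_years=10, num_simulations=1000,
                          p_attack=0.05))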
7. Conclusions
AVM-DST leverages the AVM risk methodology to enable decision makers
to view infrastructure asset risk profiles that highlight various features of in-
terest, select protective improvement measures within a given budget based
on seven defined investment strategies and other criteria, and evaluate pro-
tective purchases against varying probabilities of attack over a given period
of time. Built as a stand-alone, client-side, browser-oriented web application
using JavaScript and HTML5, AVM-DST offers a robust range of capabilities
and functions that are easily accessible from a compact interface supporting a
variety of user features and options. Performance tests show that AVM-DST
is capable of handling large data sets with no noticeable delays; it promptly
displays simulation results for thousands of assets. Indeed, the AVM-DST re-
search demonstrates that it is possible to guide strategic critical infrastructure
protection efforts by assessing the current protection status, evaluating future
protective improvement measures and justifying national investments.
Future work related to the AVM-DST web application includes developing
additional analytics for the analysis and evaluation component, improved simu-
lation of attack scenarios based on intelligence, support for enhanced trade-offs
and extensions for including additional investment strategies. Metrics will be
added to the simulations to provide insights into the effectiveness of invest-
ment strategies. Additionally, display and visualization enhancements will be
implemented, especially optimizing the rendering of the grid panel when the
investment allocation tool populates it with the selected assets.
References
[1] A. Clauset and R. Woodard, Estimating the historical and future proba-
bilities of large terrorist events, Annals of Applied Statistics, vol. 7(4), pp.
1838–1865, 2013.
1. Introduction
Cyber attacks against supervisory control and data acquisition (SCADA)
systems [22] have shown that security violations can compromise the proper
functioning of critical infrastructures. The Stuxnet worm [13] exploited vul-
nerabilities in the information and communications technology layer (primarily
deficient security policies and bugs in special purpose systems), ultimately af-
fecting the operation of programmable logic controllers and the uranium hexa-
fluoride centrifuges they controlled. Cyber attacks typically induce faults in
sensors and actuators, and alter supervisory mechanisms and notification sys-
tems. Once activated, the faults become errors and result in improper opera-
tions. These can cause failures in critical infrastructures and eventually affect
services, facilities, people and the environment.
Sophisticated wireless sensor networks [5] are increasingly used to monitor
critical infrastructure assets, including dams and pipelines [4, 18]. In fact, sen-
sor networks are rapidly being integrated in SCADA environments. Wireless
sensor networks are often deployed in hydroelectric power plants and dams to
monitor feed water supply, power generation, structural stability, environmen-
tal conditions and pollution levels. A single dam can have a thousand sen-
sors, with additional sensors deployed in areas surrounding the water reservoir.
Wireless sensor networks expose SCADA systems to new threats introduced
by the information and communications technology layer. Unlike traditional
sensor systems, wireless sensor networks are also vulnerable to signal eaves-
dropping and physical tampering, along with new ways of compromising data
confidentiality, integrity and availability. The effects of cyber attacks against
a dam include: (i) anomalous variations in seepage channel flows; (ii) uncon-
trolled gate opening; (iii) excessive turbine and infrastructure vibrations; (iv)
structural instability; and (v) reservoir level variations.
Despite the adoption of security policies and the implementation of counter-
measures, SCADA systems and wireless sensor networks continue to be vulner-
able [2, 17]. SCADA systems are generally unable to cope with cyber attacks
primarily because they were not designed with security in mind. Protection
from cyber attacks has to be provided by additional security mechanisms that
must be integrated with existing SCADA systems in a seamless manner. Logi-
cal security is commonly provided by security information and event manage-
ment (SIEM) systems, which are specifically designed to manage and operate
information and communications technology applications.
This paper presents a next-generation SIEM platform that performs real-
time impact assessment of cyber attacks against monitoring and control sys-
tems in interdependent critical infrastructures. Run-time service level analysis
is performed in the SIEM workflow. This is enabled by three novel contribu-
tions: (i) enhanced security event collectors (probes) that perform advanced
semantic analysis of non-IP domains (e.g., wireless sensor networks) in the
SIEM framework; (ii) impact assessment based on interdependency simulation;
and (iii) transformation of SIEM risk assessment metrics to critical infrastruc-
ture operational levels (i.e., levels of services provided by the attacked systems).
The approach also helps predict service level variations when limits are imposed
on information sharing among different critical infrastructures.
Romano, et al. [23] have proposed the use of an enhanced SIEM system to
monitor the security level of a traditional dam that incorporates legacy control
systems and wireless sensor networks; the system was designed to collect data
from physical devices (sensors) and correlate physical events with events gener-
ated at the logical layer. This paper further enhances the SIEM system to assess
the impact of cyber attacks against a dam that exhibits interdependencies with
other critical infrastructures. The goal is to improve risk analyses performed
by SIEM systems with qualitative and quantitative analysis of service level
variations. This ultimately reduces the time required for decision making and
improves decision outcomes in the presence of impending failures. The impact
assessment module of the SIEM system relies on i2Sim [16], an infrastructure
interdependency simulator that models resource flows between critical infras-
tructures and assesses how the output of one critical infrastructure is affected
by the availability of resources provided by other critical infrastructures.
2. Related Work
This section discusses related work on next-generation SIEM systems for
service level monitoring and models for evaluating critical infrastructure inter-
dependencies.
Collections of events occurring in network systems enable the SIEM frame-
work to assess the security level of network domains. A common way to store
this information is to save it in logs generated by security probes and logi-
cal sensors. Since logs have heterogeneous formats (semantics and syntax), it
is necessary to convert log data into a common representation. The overall
process encompasses data gathering, parsing, field normalization and format
conversion. Mostly, this process is executed by SIEM agents that collect data
from several sources. In order to use SIEM systems to protect critical infras-
tructures, obtain a holistic view of security and enable impact analysis of cyber
attacks on service levels, it is necessary to incorporate enhanced data collec-
tors [6]. Specifically, enhanced data processing has to be introduced at the
edge of the SIEM architecture to perform multi-level data aggregation and to
manage data processing in the organizational domain [6].
Two widely-used data collectors, OSSIM-Agents [1] for the Open Source Se-
curity Information Management (OSSIM) SIEM platform and Prelude-LML for
the Prelude OSS SIEM system [19], collect data using transport protocols (e.g.,
Syslog, Snare, FTP and SNMP) and produce OSSIM and IDMEF [8] messages,
respectively. Both types of collectors execute format translation tasks, but do
not perform content analysis and advanced data manipulation such as aggrega-
tion, filtering, correlation, anonymization and content-based encryption. Cop-
polino, et al. [7] have demonstrated that the OSSIM SIEM system can be used
to protect critical infrastructures in a non-intrusive manner (i.e., without mod-
ifying SIEM framework components). They also show how to process physical
layer data on the OSSIM server. Specifically, the server is configured to an-
alyze environmental and physical measurements to detect physical anomalies
in the SCADA workflow of a dam infrastructure. The introduction of SIEM
technology in a dam protection system enables a massive number of messages
to be sent from data sources (measurement collection points) located in the
field towards the core of the OSSIM architecture (OSSIM server).
risk (and severity) are important to calculate the impacts of the cyber attacks
that are detected.
The impact assessment process can be described as follows:
Each event e is normalized by the GET framework in order to have a
standard structure and appear as an information vector of the monitored
activity e(x1 , ..., xN ) where N is the number of fields that comprise the
normalized event format.
The SIEM server stores all the information that can help improve the
accuracy of detection by the organization that hosts the SIEM system.
This information includes the real vulnerabilities that affect a targeted
host (e.g., known bugs) and the relevance of the target as a company asset.
This information is referred to as “context information” or simply “the
context” and is expressed as a vector of the additional data a(s1 , . . . , sm ).
It is worth noting that this information is known only to the organization
in charge of the targeted asset (e.g., a company that manages the infras-
tructure) because it includes very sensitive information such as hardware
characteristics, IP addresses, software versions and business relevance.
This information cannot be shared with other infrastructures.
The correlation process operates on sequences of events (e(k)) and addi-
tional data vectors (a). At the end of the process, alarms may be triggered
if the security thresholds are exceeded. The SIEM server applies a risk
assessment function R to calculate the risk associated with a sequence of
events e in conjunction with the a information, i.e., R(e, a).
For example, consider the implementation of risk assessment as provided by
OSSIM SIEM. The OSSIM rules are called directives. When a directive is fired,
the following function is applied:

Risk = (Priority × Reliability × Asset)/25
In OSSIM, the Priority range is zero to five, the Reliability range is zero to
ten and the Asset range is zero to five. Thus, Risk ranges from zero to ten. Pri-
ority and Asset are assigned through an offline analysis of host vulnerabilities,
the typology of the attack and the relevance of the targeted asset to the orga-
nization; these constitute the context vector in the model above. Reliability is
computed by observing the e sequence and by summing the Reliability of each
event. In OSSIM, Reliability is taken to be the probability that an attack is
real, given the current events observed in the system. Note that low Risk values
(e.g., zero) do not signal danger; they mean that at least one of the assessment
parameters has very low security relevance.
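A minimal sketch of this risk computation, using the parameter ranges given above and the values from the worked example later in the paper, might look as follows.

    # Minimal sketch of the OSSIM-style risk computation described above:
    # Priority (0-5) and Asset (0-5) come from the offline context analysis,
    # Reliability (0-10) is accumulated over the observed event sequence, and
    # the product is scaled so that Risk stays in the range 0-10.
    def ossim_risk(priority, reliability, asset):
        assert 0 <= priority <= 5 and 0 <= reliability <= 10 and 0 <= asset <= 5
        return (priority * reliability * asset) / 25.0

    # Values from the example scenario: Priority 5, Reliability 8, Asset 5.
    print(ossim_risk(priority=5, reliability=8, asset=5))   # -> 8.0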
asset to the infrastructure that uses a service. Thus, criticality is not a unique
parameter, but is strictly dependent on the infrastructure that consumes the
service; it is computed by the provider based on information shared with the
consumer. Indeed, criticality focuses on the need as indicated by the consumer
infrastructure, which is not aware of the systems in the provider infrastructure.
Given the information supplied by the consumer, the provider calculates a
criticality value for each asset that is involved.
4. Example Scenario
The example scenario uses an attack on wireless sensor network nodes to
demonstrate how the enhanced SIEM system can help evaluate the impact of
an attack on infrastructure services. Figure 3 shows the scenario involving a
dam that feeds a hydroelectric power station, which feeds a power distribu-
tion substation through a transmission network (not modeled for simplicity).
Arrows in the figure indicate functional dependencies between critical infras-
tructures.
The dam provides water to the hydroelectric power station through a gate
that is remotely controlled to release basin water and activate the power plant
turbine. The dam and hydroelectric power station are controlled by a SCADA
system that utilizes a wireless sensor network. Water fed to the hydroelectric
power station is conveyed through pipes called penstocks. It is important to
guarantee that the water flow values in the penstocks are within the operational
range. Lower values can result in low power generation while higher values can
lead to excessive turbine rotational speed and turbine vibration, which can
result in physical damage to the infrastructure [15].
A hospital, water distribution station and manufacturing plant receive elec-
tricity from the power distribution substation. All the dependencies are mod-
eled using i2Sim. A cyber attack is launched against the wireless sensor network
that monitors the dam; the objective is to measure the impact on the operabil-
ity level of the hospital, which requires electricity and water. Table 1 shows
the electrical demands of the critical infrastructures in the scenario.
The wireless sensor network enables the SCADA system to monitor physical
parameters. Four types of sensors are used: (i) three water flow sensors placed
in the penstocks (WF1, WF2, WF3); (ii) two water level sensors that monitor
erosion and piping phenomena under the dam wall (WL1 and WL2); (iii) a
tilt sensor placed on the dam gate to measure the gate opening level (inclina-
tion); and (iv) a vibration sensor placed on the turbine. The sensors, which
correspond to nodes in the wireless sensor network, send their measurements at
regular intervals to the wireless sensor network base station (BS). The base sta-
tion acts as a wireless remote terminal unit (RTU) that forwards measurements
to the remote SCADA server. Opening commands are issued by the remote
SCADA facility to the gate actuator. The information and communications
technology components deployed include a network-based intrusion detection
system (N-IDS) installed in the remote SCADA server facility, a host-based
intrusion detection system (H-IDS) positioned in the dam facility and a SIEM
platform with a correlation engine located in a remote office. Figure 4 shows the
results of applying the MHR approach, which models the services and equip-
ment that are relevant to the critical infrastructure impact assessment module
of the SIEM platform.
The alarms generated by the SIEM correlator are mapped to physical modes of the consid-
ered critical infrastructures. Changes to the physical modes of i2Sim result in
changes to the RMs of the affected cells that measure their operability levels.
[Figure 5: sequence of monitored events, including an N-IDS detection of a port scan of the SCADA network (Reliability 2) and gate opening level measurements]
control threshold. The result of the attack is that the gate opening moves to
level 1 although measurements indicate that the gate opening is at level 2 (last
measurements in the sequence in Figure 5).
The anomaly is revealed by two security probes: the first (WF SP) reveals
an inconsistency in the water flows and the second (G F) reveals a gate opening
level inconsistency for all three sensors. Note that another security probe that
monitors the water level in the seepage does not show any inconsistency for WL1
and WL2. The alarms from the security probes are correlated by the SIEM
platform according to the rule shown in Figure 6. The rule takes into account
the two events from the H-IDS and N-IDS due to the worm activities and
access to the wireless sensor network RTU host. The final alarm generated by
the SIEM server contains evidence that the wireless sensors exhibit anomalies.
In particular, the security probes indicate that WFx in the Penstock1 zone
exhibits anomalous conditions. Such parameters, despite being irrelevant to
the rule, are crucial to understanding the impact of the attack (i.e., reduction in
the power supplied by the hydroelectric power station). The parameters are
used by i2Sim to evaluate the impact of the attack. In the rule, the Priority
is highest (5), Reliability is 8 (sum of single event reliabilities) and Asset has
the highest value (5). Thus, the Risk is (5 × 8 × 5)/25 = 8. This value must
be associated with the service criticality of the wireless sensors with respect to
the power production service in the critical infrastructure impact assessment
module.
flow measurements and lead to under- or over-production of energy, thus impacting
the dependent critical infrastructures.
In this scenario, the Risk (R) of the attack is 8 while the event criticality (C)
is in the range 0 to 0.5 (0 is not critical and 0.5 is highly critical). Given that
the energy production is affected by the wireless sensor network measurements
by a factor of 0.5, the resulting impact is PM = R × C = 8 × 0.5 = 4.
The physical mode (PM) value is the physical mode in i2Sim where a value
of one corresponds to fully operational and a value of five corresponds to not
operational. Specifically, PM = 4 indicates that the cyber attack moves the
physical mode functionality down to its lowest energy production level. The
0.5 factor was chosen because the wireless sensor network affects the total
productivity of the power plant. Figure 7 shows a scenario where a cyber attack
against the water flow sensors is detected. Due to the existing interdependency
phenomena, the cyber attack degrades the operability level of the hospital.
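A minimal sketch of the mapping from the SIEM risk value and the service criticality to the i2Sim physical mode is given below; the rounding and clamping to the 1–5 scale are assumptions added for illustration.

    # Sketch of the impact mapping used in the scenario: the SIEM Risk value is
    # combined with the service criticality supplied by the consumer infrastructure
    # to obtain the i2Sim physical mode (1 = fully operational, 5 = not operational).
    # The rounding and clamping to the 1-5 scale are assumptions for illustration.
    def physical_mode(risk, criticality):
        pm = risk * criticality                  # PM = R x C, as in the example
        return max(1, min(5, round(pm)))

    print(physical_mode(risk=8, criticality=0.5))   # -> 4, lowest production level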
5. Conclusions
The next-generation SIEM platform described in this paper is designed to
support the real-time impact assessment of cyber attacks that affect interde-
pendent critical infrastructures. The platform can detect cyber attacks against
wireless sensor network nodes and can conduct real-time assessments of the
impact of the attacks on the services provided by the wireless sensor nodes as
well as the potential cascading effects involving other critical infrastructures.
As demonstrated in the scenario, the i2Sim tool can be used to model the
physical layer and services of an interdependent system (i.e., a dam and hydro-
electric power plant) in order to analyze the impact of service degradation. The
scenario helps understand how the interdependent system reacts to an attack
that impacts water flow from the dam. The resulting functioning levels of the
hydroelectric power plant and the effects on other critical infrastructures can
be provided as inputs to an operator dashboard to help make decisions about
appropriate mitigation strategies. Our future research will continue this line of
inquiry, in particular, validating the approach and the SIEM platform using a
realistic testbed that incorporates a dam equipped with sensors and actuators.
Acknowledgement
This research was supported by the Seventh Framework Programme of the
European Commission (FP7/2007-2013) under Grant Agreement No. 313034
(Situation Aware Security Operations Center (SAWSOC) Project). The re-
search was also supported by the TENACE PRIN Project (No. 20103P34XC)
funded by the Italian Ministry of Education, University and Research.
References
[1] AlienVault, OSSIM Sensor (www.alienvault.com/wiki/doku.php?id=
documentation:agent).
[2] C. Alcaraz and J. Lopez, A security analysis for wireless sensor mesh net-
works in highly critical systems, IEEE Transactions on Systems, Man and
Cybernetics, Part C: Applications and Reviews, vol. 40(4), pp. 419–428,
2010.
[3] A. Alsubaie, A. Di Pietro, J. Marti, P. Kini, T. Lin, S. Palmieri and A.
Tofani, A platform for disaster response planning with interdependency
simulation functionality, in Critical Infrastructure Protection VII, J. Butts
and S. Shenoi (Eds.), Heidelberg, Germany, pp. 183–197, 2013.
[4] X. Bai, X. Meng, Z. Du, M. Gong and Z. Hu, Design of wireless sen-
sor network in SCADA system for wind power plant, Proceedings of the
IEEE International Conference on Automation and Logistics, pp. 3023–
3027, 2008.
[5] P. Baronti, P. Pillai, V. Chook, S. Chessa, A. Gotta and Y. Hu, Wireless
sensor networks: A survey on the state of the art and the 802.15.4 and
ZigBee standards, Computer Communications, vol. 30(7), pp. 1655–1695,
2007.
[6] L. Coppolino, S. D’Antonio, V. Formicola and L. Romano, Enhancing
SIEM technology to protect critical infrastructures, Proceedings of the Sev-
enth International Workshop on Critical Information Infrastructure Secu-
rity, pp. 10–21, 2010.
[7] L. Coppolino, S. D’Antonio, V. Formicola and L. Romano, Integration of
a system for critical infrastructure protection with the OSSIM SIEM plat-
form: A dam case study, Proceedings of the Thirtieth International Con-
ference on Computer Safety, Reliability and Security, pp. 199–212, 2011.
[8] H. Debar, D. Curry and B. Feinstein, The Intrusion Detection Message
Exchange Format (IDMEF), RFC 4765, 2007.
[9] S. De Porcellinis, S. Panzieri and R. Setola, Modeling critical infrastruc-
ture via a mixed holistic reductionistic approach, International Journal of
Critical Infrastructures, vol. 5(1/2), pp. 86–99, 2009.
Abstract This paper describes an approach for assessing potential casualties due
to events that adversely impact critical infrastructure sectors. The ap-
proach employs the consequence calculation model (CMM) to integrate
quantitative data and qualitative information in evaluating the socio-
economic impacts of sector failures. This is important because a critical
event that affects social and economic activities may also cause injuries
and fatalities. Upon engaging a structured method for gathering infor-
mation about potential casualties, the consequence calculation model
may be applied to failure trees constructed using various approaches.
The analysis of failure trees enables decision makers to implement ef-
fective strategies for reducing casualties due to critical events.
1. Introduction
The European Commission Directive 2008/114/EC of 2008 [5] defines a crit-
ical infrastructure as “an asset, system or part thereof located in [m]ember
[s]tates which is essential for the maintenance of vital societal functions, health,
safety, security, economic or social well-being of people, and the disruption or
destruction of which would have a significant impact in a [m]ember [s]tate as a
result of the failure to maintain those functions.” The directive clarifies a Eu-
ropean critical infrastructure as one that is located in a European Union (EU)
member state whose destruction or malfunction would have a significant impact
in at least two EU member states. The significance of the impact should be
assessed in terms of cross-cutting criteria, including the effects of cross-sector
dependencies involving other infrastructures.
According to Article 3 of Directive 2008/114/EC [5], the identification pro-
cess of each member state should be based on the following cross-cutting crite-
ria:
[Figure: failure tree of affected sectors (A–O) over time instants t0–t4, with impact labels EE, PE, I and F]
Note that, according to Assumption A1, the impacts of the affected sectors
are independent, while the disruption of one sector is strictly related to the
disruption of one or more other sectors.
[Figure: simplified failure tree of the affected sectors over time instants t0–t4, with impact labels I and F]
The total impact is computed as the sum of the injured persons and the sum of
the fatalities occurring in all the affected sectors. Without any loss of
generality, the model for assessing the impact in terms of injured persons and
fatalities can be described, in general, in terms of casualties and applied to
the two cases. Given n sectors, only mc of the sectors (mc ≤ n) suffer effects
in terms of casualties. In the proposed model, the casualties caused by the
disruption of the j th sector at time t are linked to the operativity levels
according to the equation:

yj(t) = αj Θ[θj − xj(t)]. (2)
Equation (2) implies that the outage of the j th sector has an instantaneous
effect (at the same instant of time) on the casualties. This is relaxed by intro-
ducing a delay time tj for the j th sector and modifying the equation accordingly:
yj(t + tj) = αj Θ[θj − xj(t)]. (3)
Thus, the operativity level of the j th sector at time t influences the casualties
at time t + tj . To this point, the additional hypotheses have not come into play.
In the case that the operativity levels do not take values in the real interval [0, 1],
but only take discrete values of 0 or 1 (Hypothesis H3), the parameter θj has
no meaning. In fact, it is perfectly reasonable for a completely functional sector
not to have any effect on the casualties, while a completely non-functional sector
must have some effect on the casualties. In this case, Equation (3) reduces to:
yj(t + tj) = αj [1 − xj(t)]. (4)
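The following sketch evaluates Equations (3) and (4) for a single sector; the parameter values are illustrative.

    # Sketch of the casualty model in Equations (3) and (4): the casualties
    # attributable to sector j at time t + t_j depend on its operativity level
    # x_j(t). The values of alpha_j, theta_j and x_j below are illustrative.
    def heaviside(z):
        return 1.0 if z > 0 else 0.0

    def casualties_continuous(alpha_j, theta_j, x_j):
        """Equation (3): y_j(t + t_j) = alpha_j * Theta[theta_j - x_j(t)]."""
        return alpha_j * heaviside(theta_j - x_j)

    def casualties_binary(alpha_j, x_j):
        """Equation (4): y_j(t + t_j) = alpha_j * (1 - x_j(t)) for x_j in {0, 1}."""
        return alpha_j * (1 - x_j)

    # A sector generating 20 casualties per unit time below a threshold of 0.3:
    print(casualties_continuous(alpha_j=20, theta_j=0.3, x_j=0.1))   # -> 20.0
    print(casualties_binary(alpha_j=20, x_j=0))                      # -> 20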
4. Information Collection
Several academic and empirical works have attempted to assess casualties
due to critical events. For example, Cavalieri, et al. [3] evaluate the number
of casualties (injuries and fatalities) based on the number of displaced people
in the case of an earthquake or damage to infrastructure systems. Hirsch [7]
assesses casualties due to critical events based on health care system response.
Casualty assessment in the consequence calculation model employs a general
approach. Four pieces of information are needed to validate the model with
discrete operativity level values (Hypothesis H3): (i) sectors that potentially
cause casualties (mc ); (ii) average number of casualties induced by the complete
disruption of the j th sector per unit of time (αj ); (iii) delay time of the j th sector
(tj ); and (iv) number of casualties induced at time t by the complete disruption
of the j th sector (for validation purposes) (Cj (t)).
Casualty information needed by the consequence calculation model for an
Italian case study was collected from four data sources (DS1–DS4):
DS1: A pilot survey involving nearly 200 sector experts that collected
information pertaining to the identification of sector components and the
assessment of potential impacts due to sector failures.
DS2: A questionnaire submitted to one expert from each sector that po-
tentially suffers casualties. The information helped refine the assessment
of the potential casualties occurring as a result of sector failures.
the main interest is in evaluating the number of casualties caused in the time
frame starting right after the end of the direct effects of an event. This per-
spective is considered in the concrete application of the consequence calculation
model, which seeks to provide indications of intervention priorities in different
sectors in order to contain the potential consequences. For example, in the
case of the L’Aquila earthquake, analysis of the data using the consequence
calculation model should discriminate between casualties (injured persons and
fatalities) directly caused by the event and the casualties caused by consequent
failures of infrastructures in the affected area.
Another challenge arises because, in the consequence calculation model, each
sector is supposed to have a deterministic impact in case of a total failure re-
gardless of the timing of the failure (Assumption A2). For example, in the case
of the L’Aquila earthquake, data on casualties caused by consequent failures of
infrastructures in the affected area were not collected with respect to detailed
time frames (e.g., casualties due to the electricity sector outage after one hour,
one day or one week).
Step 2: Second round of interviews with the experts for the selected
sectors. The experts were given an ad hoc questionnaire (Questionnaire
for impact evaluation in terms of casualties in the event of sector failures)
(DS2).
Step 1 yields the sectors that cause casualties. In theory, a total disruption
of any sector would cause casualties in the long term. The sectors that cause
casualties are those that have higher probabilities of generating injuries and
fatalities in the short term. The selection of sectors was made on the basis
of information provided by experts in the pilot survey and a “reasonability
assessment” made by the research team. A preliminary cut was made of the
sectors that might be directly responsible for the occurrence of casualties.

[Figure 3: number of injuries over time following a sector failure]
The key element of Step 2 was the interviews of sector experts (DS2). Gen-
eral considerations regarding the propensity of a sector to generate casualties
in the short term due to a complete and prolonged outage came with detailed
information on the impacts along the time dimension. In particular, the Ital-
ian sector experts were asked to provide indications to help construct casualty
curves of injured persons and fatalities (Figure 3). The casualty curves can
help overcome the limitations of Hypothesis H1 by adding a time after which
no more impacts occur. Note that the non-recovery of a sector implies the
indefinite generation of new casualties.
The key information provided by the experts for their sectors of reference
included:
The instant of time when the effects start and the instant of time when
the effects end with respect to the instant of time when the failure occurs.
6. Conclusions
Consolidated approaches are required to assess the consequences of critical
events, especially the casualties that potentially occur when critical infrastruc-
tures are disrupted, damaged or destroyed. The consequence calculation model
is readily applied to any structured classification of socio-economic activities
with a predefined geographical scope. The model relies on the definition of sec-
tors of economic activity as identified in official statistical classifications (e.g.,
NACE for the European context), but it can also be implemented by classifying
socio-economic activities in any coherent manner. Moreover, the consequence
calculation model can be applied to assess the effects of critical events regard-
less of the approach used to represent interdependencies (e.g., input-output
relationships and direct recognition).
The application of the consequence calculation model in the Italian context
proved to be a challenging task. Due to the paucity of publicly-available data, it
was necessary to solicit information from sector experts to apply the model and
validate the results. Nevertheless, the model and its failure trees are invaluable
to operators and strategic decision makers.
Future research will focus on alleviating the limitations induced by the as-
sumptions and hypotheses, thereby providing civil protection personnel and
first responders with an effective planning instrument for analyzing potential
casualties. Extending the scope to additional countries is another important
research topic – it will help tune the model and enhance strategies for reducing
event consequences, especially casualties, that directly affect populations.
Acknowledgement
This research was initiated under Project DOMINO, which was supported
by the Prevention, Preparedness and Consequence Management of Terrorism
and Other Security Related Risks Programme launched by the Directorate-
General Home Affairs of the European Commission. Project DOMINO research
was conducted by the Ugo Bordoni Foundation (Italy), FORMIT Foundation
(Italy) and Theorematica (Italy) with the support of the Presidency of the
Council of Ministers (Italy), Home Office (United Kingdom), SGDN (France)
and Ministry of Emergency Situations (Bulgaria).
ADVANCED TECHNIQUES
Chapter 16

EVALUATION OF FORMAT-PRESERVING ENCRYPTION
ALGORITHMS FOR CRITICAL INFRASTRUCTURE
PROTECTION
1. Introduction
Legacy industrial control systems were developed and implemented well be-
fore the threats associated with modern networking were recognized. The trend
to interconnect industrial control systems, however, has introduced many se-
curity concerns [26]. The systems were designed for performance, reliability
2. Background
Encryption is the mathematical manipulation of data in a manner that
makes it unintelligible to unauthorized parties, yet recoverable by intended
recipients [27]. Figure 1 shows the modern cryptography hierarchy. Crypto-
graphic algorithms can be categorized as symmetric or asymmetric algorithms,
also known as private-key or public-key algorithms, respectively. Symmetric
algorithms use the same key for encryption and decryption; the key must be
distributed offline or via a secure key distribution protocol. Asymmetric al-
gorithms use two keys: one for encryption and the other for decryption. One
of the keys (private key) is kept secret by one party; the other key (public
key) can be distributed openly. This resolves the problem of key distribution,
but asymmetric algorithms are typically more complex and computationally
intensive than symmetric algorithms.
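To make the distinction concrete, the following minimal sketch contrasts the two
families. It assumes the third-party Python cryptography package (not used in
this work); the message and key sizes are arbitrary illustrative choices.

    # Minimal sketch contrasting symmetric and asymmetric encryption.
    # Assumes the third-party "cryptography" package (pip install cryptography).
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # Symmetric: one shared key both encrypts and decrypts.
    shared_key = Fernet.generate_key()
    f = Fernet(shared_key)
    assert f.decrypt(f.encrypt(b"setpoint=42")) == b"setpoint=42"

    # Asymmetric: the public key encrypts; only the private key decrypts.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    ciphertext = private_key.public_key().encrypt(b"setpoint=42", oaep)
    assert private_key.decrypt(ciphertext, oaep) == b"setpoint=42"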
Cryptographic algorithms operate as block ciphers or stream ciphers. Stream
ciphers encipher the plaintext one character at a time and concatenate the in-
dependent encryptions to produce the ciphertext. Stream ciphers are fast, but
are prone to weaknesses with regard to integrity protection and authentica-
tion [27]. On the other hand, block ciphers are slower, but their mechanisms
ensure the security properties of confusion and diffusion. Confusion means that
the key does not relate in a simple manner to the ciphertext; it refers to making
the relationship as complex as possible by using the key non-uniformly through-
out the encryption process. Diffusion means that changing a single character
in the plaintext causes several characters in the ciphertext to change, and vice
versa [27]. Block ciphers are widely used in modern cryptography, and three in
particular – AES, 3DES and Skipjack – are recommended for use by NIST [6].
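A quick way to observe diffusion is to encrypt two single blocks that differ in one
bit and count the differing ciphertext bits. The sketch below assumes the third-party
Python cryptography package; ECB mode is used only to isolate a single block
operation, not as a recommended mode.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def aes_encrypt_block(key: bytes, block: bytes) -> bytes:
        # Encrypt exactly one 16-byte block (ECB isolates the raw block operation).
        encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        return encryptor.update(block) + encryptor.finalize()

    key = os.urandom(16)
    p1 = os.urandom(16)
    p2 = bytes([p1[0] ^ 0x01]) + p1[1:]                  # differs from p1 in one bit
    c1, c2 = aes_encrypt_block(key, p1), aes_encrypt_block(key, p2)
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
    print(f"{flipped} of 128 ciphertext bits changed")   # typically close to 64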
AES, 3DES and Skipjack are applied to 64-bit or 128-bit blocks of data.
When AES was designed, 128-bit message blocks were commonly used for cryp-
tographic applications [22]. Messages that do not fit the prescribed block size
are padded or truncated. However, many supervisory control and data ac-
quisition (SCADA) systems used in the critical infrastructure do not permit
padding. SCADA systems traditionally use low-bandwidth links and compact
communications protocols such as Modbus and DNP3 [28]. Solutions have been
developed to retrofit security in these systems, but they often incur significant
processing and buffering overhead that cannot be tolerated in systems with
strict timing constraints [28]. A preferred solution is an algorithm that can
transform formatted data into a sequence of symbols such that the encrypted
data has the same format and length as the original data [22].
Figure 2. Feistel structure of the FF1, FF2 and FF3 algorithms [6].
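To illustrate the format-preserving idea behind the Feistel construction, the toy
sketch below enciphers an even-length decimal string into another decimal string
of the same length, using HMAC-SHA256 as the round function. It is an
illustration only; it is not FF1, FF2 or FF3 and carries none of their security
guarantees.

    import hashlib
    import hmac

    def _round_value(key: bytes, rnd: int, value: int, modulus: int) -> int:
        # Pseudorandom round function built from HMAC-SHA256 (illustration only).
        digest = hmac.new(key, f"{rnd}:{value}".encode(), hashlib.sha256).digest()
        return int.from_bytes(digest, "big") % modulus

    def toy_fpe_encrypt(digits: str, key: bytes, rounds: int = 10) -> str:
        # Balanced Feistel network over an even-length decimal string.
        assert len(digits) % 2 == 0, "toy cipher assumes an even number of digits"
        half = len(digits) // 2
        modulus = 10 ** half
        left, right = int(digits[:half]), int(digits[half:])
        for rnd in range(rounds):
            left, right = right, (left + _round_value(key, rnd, right, modulus)) % modulus
        return f"{left:0{half}d}{right:0{half}d}"

    def toy_fpe_decrypt(digits: str, key: bytes, rounds: int = 10) -> str:
        half = len(digits) // 2
        modulus = 10 ** half
        left, right = int(digits[:half]), int(digits[half:])
        for rnd in reversed(range(rounds)):
            left, right = (right - _round_value(key, rnd, left, modulus)) % modulus, left
        return f"{left:0{half}d}{right:0{half}d}"

    ciphertext = toy_fpe_encrypt("4111111111111111", b"demo key")
    assert toy_fpe_decrypt(ciphertext, b"demo key") == "4111111111111111"
    assert len(ciphertext) == 16 and ciphertext.isdigit()   # same format and length

The ciphertext occupies exactly the same sixteen-digit format as the plaintext,
which is the property that allows format-preserving encryption to be retrofitted
into fixed-format protocol fields and database columns.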
FF2 Algorithm: The FF2 algorithm is derived from the VAES3 algo-
rithm proposed by Vance [29]. Figure 4 describes the FF2 algorithm,
which generates a subkey for the block cipher in the Feistel round func-
tion; this can help protect the original key from side-channel analysis [6].
FF2 differs from FF1 in that it employs a larger tweak with an indepen-
dent tweak radix to allow for additional variation in the cipher.
of the tweaks that are supported. In particular, the FF3 employs a 64-
bit tweak, which is split into right and left halves that are used to add
diffusion to odd and even encryption rounds, respectively.
test, spectral test, non-periodic templates test, overlapping template test, uni-
versal statistical test, random excursion test, random excursion variant test,
Lempel-Ziv complexity test, linear complexity test and an approximate en-
tropy test [23]. The Rijndael algorithm performed satisfactorily in all the tests
and was selected as the AES algorithm.
Since FPE algorithms are modes of operation of the underlying block cipher,
FF1, FF2 and FF3 should benefit from the statistical characteristics of AES.
This hypothesis is supported by theoretical results [13, 19, 20]. Our evaluation
uses Shannon entropy measurements to assess the security characteristics of
the three FFX algorithms. Note that entropy is a measure of unpredictabil-
ity or information content; Shannon entropy quantifies the expected value of
the information contained in a message and is typically measured in bits per
byte [27].
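For reference, Shannon entropy in bits per byte can be computed directly from
byte frequencies; a minimal Python sketch (the function name is ours):

    import math
    from collections import Counter

    def shannon_entropy_bits_per_byte(data: bytes) -> float:
        # H = -sum_i p_i * log2(p_i) over the observed byte values.
        if not data:
            return 0.0
        counts = Counter(data)
        n = len(data)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # A uniformly random file approaches 8 bits/byte; repetitive data scores lower.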
In addition to security performance, the computational performance of the
algorithms is an important criterion. Several metrics may be used to measure
the computational performance: encryption time, processing time and total
clock cycles per encryption [9]. The total clock cycle metric was used in this
research to evaluate the computational speed of the FF1, FF2 and FF3 algo-
rithms.
3. Experimental Design
In order to determine the security and performance of the FF1, FF2 and
FF3 algorithms for critical infrastructure assets, a set of experiments was de-
signed to test the hypothesis suggested by the algorithm designers and NIST [6]
that the algorithms inherit the strong security characteristics of the underlying
block cipher. NIST has not released details of its internal deliberations and
performance assessments.
As such, statistical tests were conducted to determine the ability of the
FPE algorithms to provide confusion and diffusion, and to output ciphertext
that is computationally indistinguishable from a random process. A dataset
containing input plaintext with varying levels of entropy was created. The
FF1, FF2 and FF3 algorithms were applied to this dataset. The algorithms
were implemented in C using the offspark AES library [18] and the entropy
of the resulting ciphertext was measured.
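The implementation used in this work is in C with an FPGA companion; purely as
an illustration of how such a plaintext dataset might be constructed, the Python
sketch below builds thirteen-byte blocks with a chosen number of deterministic
bytes placed either at the front or at random positions (the fixed byte value 0x41
and the helper name are our own choices).

    import os
    import random

    BLOCK_LEN = 13   # thirteen-byte plaintext blocks, as in the experiment

    def plaintext_block(num_fixed: int, placement: str = "front", seed: int = 0) -> bytes:
        # One block with num_fixed deterministic bytes; the rest come from os.urandom.
        rng = random.Random(seed)
        block = bytearray(os.urandom(BLOCK_LEN))
        if placement == "front":
            positions = range(num_fixed)
        else:
            positions = rng.sample(range(BLOCK_LEN), num_fixed)
        for i in positions:
            block[i] = 0x41
        return bytes(block)

    # e.g., the "12 Front" and "12 Random" scenarios:
    front_block = plaintext_block(12, "front")
    random_block = plaintext_block(12, "random")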
The second objective of our research was to evaluate the computational speed
of the three algorithms by measuring the operational latency of a hardware
implementation. This was accomplished by implementing the algorithms in
VHDL using the Xilinx ISE suite for the Virtex-6 FPGA (XC6VLX240T) [31].
A hardware-agnostic design was used to mitigate effects due to the Virtex-6
CMOS technology and Xilinx FPGA architecture. The operational latency
was estimated using the number of clock cycles between the input of plaintext
and the output of its ciphertext.
counters. The algorithms were coded in VHDL, simulated, placed and routed,
and synthesized on a Virtex-6 (XC6VLX240T) device using the Xilinx ISE de-
sign suite. Post-PAR static timing analysis and device utilization analysis were
performed on each implementation.
The throughput, latency and hardware resource requirements are usually
the most critical parameters when evaluating a hardware implementation. Our
research evaluated the speed of each algorithm by measuring the operational
latency of an encryption cycle. To eliminate bias due to the use of a particular
FPGA technology, we estimated operational latency as the number of clock
cycles required for an algorithm to encrypt plaintext.
4.1 Security
A thirteen-byte sequence of random data obtained from Random.org served
as the control in the entropy experiment. An all-random input plaintext file
created with the sequence was determined to have an entropy of 7.996 bits/byte.
In the following analysis, the mean entropy was calculated for 20 trials of each
scenario. Note that there was no statistically significant variance between the
various trials. Figure 7 and Table 1 present the security performance of each
algorithm estimated in terms of the ciphertext entropy for each level of the
experimental factor. As expected, the entropy decreases in the plaintext as
the number of deterministic bytes increases. The input entropy ranges from
7.24 bits/byte for three deterministic bytes out of the thirteen total bytes to
4.25 bits/byte for twelve out of thirteen fixed bytes. The distribution of the
deterministic bytes, whether located in the front of the string or randomly
dispersed throughout the string, does not have a significant effect on the entropy
of the plaintext.
All three algorithms provide high levels of ciphertext security with no dis-
cernible differences in performance. In all but one scenario (12 Front), the
ciphertext is indistinguishable from a random sequence with entropy above
7.996 bits/byte. The plaintext in the 12 Front scenario with entropy of 4.256
bits/byte causes a lower entropy in the ciphertext of 7.94 bits/byte versus the
7.996 bits/byte for the random sequence. The lowered entropy presents an up-
per bound on the obfuscation capabilities of FPE. Further study is necessary
to clarify this performance limitation and categorize suitable plaintext.
The three FPE algorithms provide higher levels of ciphertext entropy when the
same number of deterministic bytes is randomly distributed throughout the string,
as in the 12 Random scenario. These results indicate that the distribution of
repeated patterns in the plaintext affects the ability of the algorithms to obfuscate
the data more than the amount of repeated information does.
4.2 Performance
The performance results shown in Table 2 indicate that the underlying AES
core is the principal factor in the area and speed of the implementation. The
AES implementation employed in the designs requires 31 clock cycles per en-
cryption and 1,864 slices (slices are the basic building blocks in an FPGA
implementation). Each slice contains a number of look up tables (LUTs) that
are used to implement AND gates, OR gates and other Boolean functions. In
addition to LUTs, slices also contain a number of registers that hold state and
are used to implement sequential logic. In the device utilization report, any slice
that is used even partially is counted towards the number of occupied slices.
A design may be fitted into fewer slices if necessary, but mapping unrelated
logic into the same slice may impact the ability to meet timing constraints [31].
The Virtex-6 provides 18 Kb and 36 Kb blocks of RAM for storing data. Our
implementations did not require any 36 Kb RAM blocks.
The iterative looping architecture employed in the design minimizes the
hardware resources needed for each algorithm. The FF1 implementation uses
two cascaded AES blocks per round, which causes the area and number of slices
required to be approximately twice those of one AES block. FF2 makes only
one call to AES per round, but uses an additional AES block to generate the
subkey. FF3 has the smallest footprint of the three algorithms because it relies
sparingly on calls to AES.
The maximum frequency is based on the worst path delay found in the de-
sign, and it indicates the fastest frequency at which a signal may be toggled
given this constraint. A simulation test bench was used to measure the opera-
tional latency of each implementation. The numbers of clock cycles required
for completing one round and for completing an entire encryption cycle are
reported for each algorithm (Table 2). The FF1 algorithm makes two calls to
AES per round, which makes it the slowest of the three algorithms. FF2 is
faster than FF1 because of its single call to AES in its F-block. FF3 is the
fastest of the three algorithms because it uses only eight rounds. The overall
results indicate that the FF3 algorithm requires the least hardware resources
and has the lowest operational latency.
5. Conclusions
The FF1, FF2 and FF3 format-preserving encryption algorithms have im-
portant applications in critical infrastructure protection. In particular, the
algorithms could be incorporated in security modules for legacy protocols and
databases that are currently incompatible with standard cryptographic prac-
tices.
The experimental results demonstrate that the algorithms are secure based on
their ability to obfuscate repetitive input data. The algorithms successfully
encipher plaintext with twelve of thirteen bytes containing a deterministic se-
quence. The three algorithms (as recommended by NIST) demonstrate the
inherited security characteristics of the underlying AES cipher.
References
[1] D. Abdul Elminaam, D. Abdul Kader and M. Hadhoud, Performance eval-
uation of symmetric encryption algorithms, International Journal of Com-
puter Science and Network Security, vol. 8(12), pp. 280–285, 2008.
[2] M. Bellare, P. Rogaway and T. Spies, The FFX Mode of Operation for
Format-Preserving Encryption, Report to NIST Describing the FFX Al-
gorithm, National Institute of Standards and Technology, Gaithersburg,
Maryland, 2010.
[3] J. Black and P. Rogaway, Ciphers with arbitrary finite domains, Proceed-
ings of the Cryptographer’s Track at the RSA Conference, pp. 114–130,
2002.
[4] E. Brier, T. Peyrin and J. Stern, BPS: A Format-Preserving Encryption
Proposal, National Institute of Standards and Technology, Gaithersburg,
Maryland, 2010.
[5] M. Brightwell and H. Smith, Using datatype-preserving encryption to en-
hance data warehouse security, Proceedings of the Twentieth National In-
formation Systems Security Conference, 1997.
[6] M. Dworkin, Recommendation for Block Cipher Modes of Operation:
Methods for Format-Preserving Encryption, Draft NIST Special Publica-
tion 800-38G, National Institute of Standards and Technology, Gaithers-
burg, Maryland, 2013.
[7] A. Elbirt, W. Yip, B. Chetwynd and C. Paar, An FPGA-based perfor-
mance evaluation of the AES block cipher candidate algorithm finalists,
IEEE Transactions on Very Large Scale Integration Systems, vol. 9(4), pp.
545–557, 2001.
[8] C. Finke, J. Butts and R. Mills, ADS-B encryption: Confidentiality in
the friendly skies, Proceedings of the Eighth Annual Cyber Security and
Information Intelligence Research Workshop, pp. 9–13, 2013.
[9] T. Good and M. Benaissa, AES on FPGA from the fastest to the small-
est, Proceedings of the Seventh International Workshop on Cryptographic
Hardware and Embedded Systems, pp. 427–440, 2005.
[10] M. Luby and C. Rackoff, How to construct pseudorandom permutations
from pseudorandom functions, SIAM Journal on Computing, vol. 17(2),
pp. 373–386, 1988.
[11] M. McLoone and J. McCanny, High performance single-chip FPGA Ri-
jndael algorithm implementations, Proceedings of the Third International
Workshop on Cryptographic Hardware and Embedded Systems, pp. 65–76,
2001.
[12] B. Morris, P. Rogaway and T. Stegers, How to encipher messages on a small
domain, Proceedings of the Twenty-Ninth Annual International Conference
on Advances in Cryptology, pp. 286–302, 2009.
[13] M. Naor and O. Reingold, On the construction of pseudorandom permuta-
tions: Luby-Rackoff revisited, Journal of Cryptology, vol. 12(1), pp. 29–66,
1999.
[14] National Institute of Standards and Technology, Advanced Encryption
Standard (AES), Federal Information Processing Standards Publication
197, Gaithersburg, Maryland, 2001.
[15] National Institute of Standards and Technology, Critical Infrastructure
Protection, Gaithersburg, Maryland, 2002.
[16] National Institute of Standards and Technology, Cybersecurity Frame-
work, Gaithersburg, Maryland, 2013.
[17] B. Obama, Improving critical infrastructure cybersecurity: Executive Or-
der 13636, Federal Register, vol. 78(33), pp. 11739–11744, 2013.
[18] Offspark, offspark: Straightforward Security Communication, Rijswijk,
The Netherlands, 2014.
[19] J. Patarin, Luby-Rackoff: Seven rounds are enough for 2^(n(1−ε)) security,
Proceedings of the Twenty-Third Annual International Conference on Ad-
vances in Cryptology, pp. 513–529, 2003.
[20] J. Patarin, Security of random Feistel schemes with five or more rounds,
Proceedings of the Twenty-Fourth Annual International Conference on Ad-
vances in Cryptology, pp. 106–122, 2004.
[21] Random.org, Random Binary File 2013-09-17, Dublin, Ireland (www.
random.org/files), 2013.
[22] P. Rogaway, A Synopsis of Format-Preserving Encryption, Voltage Secu-
rity, Cupertino, California, 2013.
[23] J. Soto, Randomness Testing of the AES Candidate Algorithms, National
Institute of Standards and Technology, Gaithersburg, Maryland, 1999.
[24] T. Spies, Feistel Finite Set Encryption Mode, National Institute of Stan-
dards and Technology, Gaithersburg, Maryland, 2008.
Chapter 17

1. Introduction
The consensus problem is a fundamental problem in the domain of fault-
tolerant distributed systems. It requires the system processes to agree on a
common value despite the presence of some faulty processes. Fischer, et al. [17]
some important properties observed in large real-world networks are still miss-
ing in graphs that exhibit different exponents while still showing a power-law
degree sequence. Therefore, we argue that non-complete graphs are particularly
interesting when studying the feasibility and efficiency of consensus problems.
Building on our earlier work on Erdos-Renyi random graphs [29], this paper
focuses on randomized asynchronous binary Byzantine consensus for graphs in
the G(n, d) configuration model with power-law degree sequence and presents
an algorithm that achieves the desired primary result with reduced message
complexity for non-complete graphs.
of Ben-Or’s algorithm recently proposed by Correia, et al. [13] is considered.
Their approach differs from this work in that it considers fully-connected com-
munications networks. This paper shows that, when choosing a non-complete
graph as a communications system, no additional asynchronous messaging as-
sumptions are needed. Moreover, it is possible to increase message complexity
efficiency by considering higher degree nodes that forward received messages
with high probability Phigh and lower degree nodes that forward messages with
low probability Plow .
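A minimal sketch of this degree-dependent forwarding rule is given below; the
threshold and probability values are placeholders rather than values from this
work.

    import random

    P_HIGH, P_LOW = 0.9, 0.1        # placeholder forwarding probabilities
    DEGREE_THRESHOLD = 10           # placeholder degree threshold d

    def forward_probability(degree: int) -> float:
        # High-degree nodes relay with P_high, low-degree nodes with P_low.
        return P_HIGH if degree >= DEGREE_THRESHOLD else P_LOW

    def relay(node_degree: int, neighbors: list, rng=random) -> list:
        # Neighbors that receive the message from this node in one step.
        if rng.random() < forward_probability(node_degree):
            return list(neighbors)
        return []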
2. Related Work
The consensus problem is a fundamental problem in the domain of dis-
tributed systems. Fischer, et al. [17] proved that a deterministic algorithm
cannot solve the consensus problem in an asynchronous model even in the pres-
ence of one faulty process. In the asynchronous model, each communication
can take an arbitrary and unknown amount of time, and there is no assumption
of a joint clock as in the synchronous model. However, Ben-Or [6] showed that
a randomized algorithm can solve the consensus problem even when a constant
fraction of processes are faulty. Interested readers are referred to [1, 4] for
a complete proof of correctness of Ben-Or’s algorithm and a detailed survey
of randomized consensus protocols. Consensus in the asynchronous Byzan-
tine message-passing model has been shown to require n ≥ 3f + 1 processes
in several variations of the basic model. Recently, Correia, et al. [13] showed
that it is possible to solve Ben-Or’s asynchronous Byzantine binary random
consensus problem with 2f + 1 processes. Consensus protocols play an im-
portant role in replication algorithms that can be utilized to protect critical
infrastructures [14]. Castro and Liskov [8], Chun, et al. [9] and Veronese, et
al. [27] proposed replication algorithms to implement highly-resilient services;
some of these algorithms can be used to control services such as water, power
and gas [18, 26].
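The two resilience bounds translate into a one-line helper; the function name and
the with_wormhole flag are our own labels for the classic n ≥ 3f + 1 setting versus
the 2f + 1 setting of Correia, et al. [13], which assumes a trusted (wormhole)
component.

    def max_byzantine_faults(n: int, with_wormhole: bool = False) -> int:
        # Largest f tolerated by n processes: n >= 3f+1 classically,
        # n >= 2f+1 when a trusted (wormhole) component is available.
        return (n - 1) // 2 if with_wormhole else (n - 1) // 3

    # e.g., ten processes tolerate three Byzantine faults classically, four with a wormhole.
    assert max_byzantine_faults(10) == 3
    assert max_byzantine_faults(10, with_wormhole=True) == 4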
Traditionally, the consensus problem was formulated in the context of ran-
dom and fully-connected networks, although this assumption is typically not
stated. Unfortunately, many large complex networks are poorly approximated
by complete graphs or even simple random graphs. Many of these networks
also exhibit scale-free properties. This has led to the consensus problem being
studied in scale-free networks by Wang, et al. [28] using the Barabasi-Albert
model [5], which relies on a generative model. However, the preferential attach-
ment model only produces networks with a power-law exponent of three, so
some important properties observed in large real-world networks, which exhibit
different exponents while still showing a power-law degree sequence, are missing
in these graphs. This paper focuses on the randomized asynchronous binary
Byzantine consensus problem for graphs in the G(n, d) configuration model with
power-law degree sequence.
Fischer, et al. [17] have shown that a deterministic protocol cannot guarantee
agreement even against benign failures in asynchronous systems. Over the
years, several techniques have been proposed to circumvent this impossibility
result. One of the first approaches to solving the consensus problem was to use
randomization. Existing results allow processes to reach an agreement in fully-
connected networks. However, the case of non-complete graphs is particularly
interesting when studying the feasibility and efficiency of consensus problems in
real-world networks such as the Internet, World-Wide Web, metabolic networks
and power networks with approximate structures [22], all of which have the
power-law form P (k) ∼ k^−γ.
Several models have been introduced to generate graphs with power-law
distributions. This paper considers a simple generalization of the traditional
random graph model called the configuration model [2, 20]. Chung and Lu [10]
introduced a modified version of the configuration model where, given a se-
quence (d1, . . . , dn), nodes vi and vj are connected with probability proportional
to di dj. Bollobas, et al. [7] also showed analytically that graphs constructed
free real-world networks include power networks, the World-Wide Web, email
networks, social networks and networks of Internet routers [15]. Scale-free
networks usually have nonhomogeneous topologies where the majority of the
nodes have few links, but a small number of nodes have a large number of
links and P (k) decays according to the power law P (k) ∼ k^−γ, where γ is the
power-law exponent [21]. Most real-world networks have the scale-free property
with γ satisfying the constraint 2 < γ < 3 [3]. For 2 ≤ γ < 3, a network with
N nodes has a constant or at most O(log N) average degree, but the variance of
the degree distribution is unbounded. It is in this regime of γ that power-law
networks display many of their advantageous properties, such as a small diameter,
tolerance to random node deletions and a natural hierarchy in which there are
sufficiently many nodes of high degree.
Many models have been proposed for representing real networks. One of
them is the configuration model, which creates random graphs that can have
any generic degree distribution, and can, therefore, capture the degree charac-
teristics of real-world networks. This paper uses the configuration model with
a predefined degree distribution to generate static power-law networks.
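As an illustration, a power-law degree sequence can be drawn and handed to a
configuration-model generator; the sketch below assumes the third-party networkx
package and a simple continuous power-law sampler (all parameter values are
arbitrary).

    import random
    import networkx as nx   # third-party; pip install networkx

    def powerlaw_degree_sequence(n: int, gamma: float, k_min: int = 2, seed: int = 1) -> list:
        # Draw n integer degrees approximately following P(k) ~ k^-gamma for k >= k_min.
        rng = random.Random(seed)
        degrees = [int(k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))) for _ in range(n)]
        degrees = [min(d, n - 1) for d in degrees]    # cap degrees so a simple graph is possible
        if sum(degrees) % 2:                          # the configuration model needs an even sum
            degrees[0] += 1
        return degrees

    degrees = powerlaw_degree_sequence(n=1000, gamma=2.5)
    G = nx.configuration_model(degrees, seed=1)        # multigraph realizing the degree sequence
    G = nx.Graph(G)                                    # collapse parallel edges
    G.remove_edges_from(list(nx.selfloop_edges(G)))    # drop self-loops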
We consider an undirected simple graph G(n, d) consisting of N vertices with
a degree sequence d = (d(1), d(2), . . . d(n)). The neighborhood of ni is denoted
by Λi and the degree distribution for the graph denoted by P (k) is defined to
be the fraction of nodes in the graph with degree k. The degree distribution
can be calculated as follows [25]:
P (k) = |{v | d(v) = k}| / N

where d(v) is the degree of node v and N is the number of nodes in the graph.
The average degree in the graph is denoted by ⟨k⟩ ≡ Σ_k k P (k). The number of
edges in the graph is given by m = ⟨k⟩N/2.
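These quantities translate directly into code; a small sketch computing P(k), the
average degree and m from a given degree sequence:

    from collections import Counter

    def degree_statistics(degrees: list):
        # Return P(k) as a dict, the average degree <k>, and the edge count m = <k>N/2.
        N = len(degrees)
        counts = Counter(degrees)
        P = {k: counts[k] / N for k in sorted(counts)}   # fraction of nodes with degree k
        avg_k = sum(k * p for k, p in P.items())         # <k> = sum over k of k * P(k)
        m = avg_k * N / 2                                 # number of edges
        return P, avg_k, m

    P, avg_k, m = degree_statistics([1, 2, 2, 3])
    assert abs(avg_k - 2.0) < 1e-9 and m == 4.0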
query. It has been proven that, when 2 < γ < 3, the diameter of the network,
d ∼ ln(ln N), is smaller than that of small-world networks (O(ln N)) and remains
almost constant while the network grows [12]. The bond percolation step
guarantees that a query message is received by nodes in a highly-connected com-
ponent of diameter O(log N) consisting of high-degree nodes. The content
and query implantation steps ensure that the content/message of a node is
cached in at least one of the nodes in this high-degree connected component
with probability approaching one, and that one of the nodes in the connected
component receives a query implantation with probability approaching one.
When a node issues a query message, each edge passes it with probability q.
Thus, with qE = qc kN/γ messages in total, any content/high-degree
node can be located with probability approaching one in time O(log N ). After
the first phase, the second phase of Algorithm 1 starts message dissemination
using the hub node(s). Upon receiving this message and comparing the de-
gree of a node with the degree threshold (d), a high-degree node forwards the
received message msg with a high probability Phigh or a low-degree node for-
wards it with a low probability Plow where Phigh > Plow . At each step, the
5. Consensus Algorithm
This section describes the consensus algorithm and discusses its key features.
6. Conclusions
This paper has studied the existence and efficiency of randomized asyn-
chronous binary Byzantine consensus for graphs in the G(n, d) configuration
model with power-law degree sequence. A key result is that it is possible to re-
duce the message complexity in non-complete random graphs using high-degree
nodes to forward messages with high probability and low-degree nodes to for-
ward messages with low probability. Additionally, the modified Correia, et al.
variant of Ben-Or’s algorithm over non-complete graphs using the G(n, d) con-
figuration model with power-law degree sequence yields the desired primary
result. Specifically, it is possible to solve the asynchronous Byzantine binary
consensus problem with 2f + 1 processes over non-complete graphs using the
G(n, d) configuration model with power-law degree sequence by employing a re-
liable broadcast algorithm (that requires a wormhole component, although this
has a considerably lower cost than increasing the density of the graph) and an
References
[1] M. Aguilera and S. Toueg, The correctness proof of Ben-Or’s randomized
consensus algorithm, Distributed Computing, vol. 25(5), pp. 371–381, 2012.
[2] W. Aiello, F. Chung and L. Lu, A random graph model for power-law
graphs, Experimental Mathematics, vol. 10(1), pp. 53–66, 2001.
[3] R. Albert and A. Barabasi, Statistical mechanics of complex networks,
Reviews of Modern Physics, vol. 74(1), pp. 47–97, 2002.
[4] J. Aspnes, Randomized protocols for asynchronous consensus, Distributed
Computing, vol. 16(2/3), pp. 165–175, 2003.
[5] A. Barabasi and R. Albert, Emergence of scaling in random networks,
Science, vol. 286(5439), pp. 509–512, 1999.
[6] M. Ben-Or, Another advantage of free choice: Completely asynchronous
agreement protocols, Proceedings of the Second Annual ACM Symposium
on Principles of Distributed Computing, pp. 27–30, 1983.
[7] B. Bollobas, O. Riordan, J. Spencer and G. Tusnady, The degree sequence
of a scale-free random graph process, Random Structures and Algorithms,
vol. 18(3), pp. 279–290, 2001.
[8] M. Castro and B. Liskov, Practical Byzantine fault tolerance and proactive
recovery, ACM Transactions on Computer Systems, vol. 20(4), pp. 398–
461, 2002.
[9] B. Chun, P. Maniatis, S. Shenker and J. Kubiatowicz, Attested append-
only memory: Making adversaries stick to their word, ACM SIGOPS Op-
erating Systems Review, vol. 41(6), pp. 189–207, 2007.
[10] F. Chung and L. Lu, Complex Graphs and Networks, American Mathe-
matical Society, Providence, Rhode Island, 2006.