Comparison Between Differential and Correlation Power Analysis Attacks On Embedded Systems Master Thesis

This master's thesis compares differential power analysis (DPA) and correlation power analysis (CPA) attacks on embedded systems. Over 3,000 traces were acquired from an AES encryption execution on a ChipWhisperer NANO platform with a STM32F0 target processor. DPA and CPA techniques were used to analyze the traces in an effort to recover the secret AES key. Countermeasures against such side-channel attacks are discussed as well as ways to bypass existing countermeasures.


POLITECNICO DI TORINO

DEPARTMENT OF CONTROL AND COMPUTER ENGINEERING (DAUIN)

Master Degree in Computer Engineering

Master Degree Thesis

Comparison between Differential and Correlation
Power Analysis Attacks on Embedded Systems

Author: Maurizio Di Lorenzo

Supervisor: Paolo Ernesto PRINETTO

December, 2021
Acknowledgements

I would like to express all my gratitude to my family, who patiently endured me during my study
weekends and provided me with unconditional support.
Special thanks to Gianluca Roascio for his help. He was a guide for me, and he has always been
available, even in these difficult times.

Abstract

Today, embedded electronic systems are everywhere, controlling every aspect of everyday life both in
professional and in private environments. Most of them manage private information or sensitive data
and implement cryptographic algorithms to protect that information from theft.
Even if the algorithms themselves can be considered secure, they can be broken by physically
observing certain properties of the electronic devices, such as the current absorbed or the time
taken to execute the algorithms. This is how Side-Channel Attacks take place.
These kinds of exploits are very effective ways to gain access to secret information hidden in
embedded systems. They rely on information channels that were not intended to be used and that are, in general,
underestimated at the development stage. The general principle was applied long before the
advent of embedded devices: in fact, it can be applied to a mechanical system or even to
humans without the need for complex measurement systems, as when a thief opens a safe using a
stethoscope, or simply his ear, listening for the ticks that reveal a correct combination digit.
Recently, the spread of embedded devices hosting private or sensitive information, for example
in the Internet of Things (IoT) domain, has pushed companies to increase their focus on security, spending
time and device hardware resources on secure ciphers based on standard algorithms that enable the
device to communicate with the external world.
In this work, the power absorption side channel is investigated, since it does not require costly
instrumentation and is easily accessible. Using side-channel techniques, an attacker can gain insights
into working data or execution paths and observe behavior related to secret information, reducing
the complexity needed to discover that information (e.g., the secret key of an advanced encryption
algorithm). Side channels rarely give direct access to secret information, but most of the time they
enormously reduce the number of attempts the attacker has to make to obtain a secret.
The thesis work analyzes two commonly used techniques to attack the AES cryptographic algorithm:
Differential Power Analysis (DPA) and Correlation Power Analysis (CPA), using a low-cost acquisition
system, ChipWhisperer. Such a platform is equipped with all the components required to execute
experimental tests: a synchronous acquisition system, a target victim processor, and software libraries.
Several thousand traces have been acquired from AES encryption executions on the platform to
gather enough data to test and compare the two methodologies. Comparison results with
respect to the target technology are presented.
Finally, an overview of the commonly adopted countermeasures is presented, together with
a list of known methods to make them ineffective.

Contents

Acknowledgements
Abstract
1 Introduction
2 State of the Art
    2.1 Side-Channel Attacks
3 A Real Study-Case: CWNANO
    3.1 ChipWhisperer NANO
    3.2 Target STM32F0
    3.3 AES Algorithm Description
        3.3.1 SubBytes
        3.3.2 ShiftRows
        3.3.3 MixColumns
        3.3.4 AddRoundKey
        3.3.5 Key Expansion
    3.4 CW-Nano Relevant API Description
        3.4.1 Object Scope
        3.4.2 Object Target
        3.4.3 Useful Functions
    3.5 Target Code Description
    3.6 Data Acquisition and Analysis
        3.6.1 Leak Model
        3.6.2 Code Execution Timings
        3.6.3 Points of Attack
        3.6.4 Differential Power Analysis
        3.6.5 Correlation Power Analysis
4 Conclusion
    4.1 Results Evaluation
    4.2 Known Countermeasures
    4.3 Future Developments

Chapter 1

Introduction

An embedded system can be defined as a computing device that has a dedicated function within a larger
system. It is embedded as part of a complete device, often including electrical or electronic hardware
and, possibly, mechanical parts.
Embedded systems are made of central processing units, memories and I/O devices, similar to a
personal computer; sometimes they are personal computers, possibly ruggedized, devoted to running
special software to control their main system. Usually, they run the device control software under
real-time computing constraints [1]. This definition covers a wide range of devices, used in many different
applications, including the so-called Internet of Things (IoT).
The focus of this work is on Side-Channel attacks. A non-exhaustive list of their direct objectives is
given here. The consequences of these attacks range from the loss of insignificant personal information,
through economic losses, up to the compromise of strategic infrastructure as analyzed in [2], perhaps as an indirect
result of the attack. Most commonly the goal is of an economic nature.
The following is a list of embedded system applications and of the possible direct consequences if their
security is violated.

Automotive: nowadays every car has a large number of electronic control units (ECUs from now on),
in the range from 50 to 100, each of which is an embedded system. These are in general connected to
one or more local networks and sometimes connected to the external world through the ECU devoted
to the multimedia and navigation system [3].
Recently, the advent of autonomous driving capability has increased the number of ECUs needed both to
manage the sensors and to perform the control itself.
The use of hybrid and pure electric motors has also had an impact on the complexity and, in most
cases, on the number of ECUs. As shown in Figure 1.1, a car's software complexity is even higher than
that of an aircraft.

Figure 1.1: Average Lines of Software Code in Modern Luxury Vehicle Compared to Types of
Aircraft [4].

Attack objectives:
• Change some working parameters: to increase engine power, modify the emission profiles, etc.

• Cause vehicle malfunctioning: can cause injury to people or damage the reputation of the car
manufacturer, etc.

• Get physical access to the vehicle: to steal the vehicle or the goods present inside it, etc.

Other transport sectors (aerospace, naval, train...): these types of vehicles are generally
equipped with numerous control units devoted to many different functionalities; in many cases their
number is mainly due to the redundancy needed to achieve the required safety level.
These are not easily attackable with methodologies that require physical access to the devices, such
as Side-Channel and Fault Injection attacks, because of poor accessibility. However, there is information about
tests carried out by DHS (the US Department of Homeland Security) on remote attacks, with positive
results (as reported in the article in [5]). This kind of information is kept confidential, so no further
details are available.
Attack objectives:
• To cause vehicle malfunctioning: can cause injury to people (terrorist attacks).

• To damage the reputation of the manufacturer: to cause heavy financial impacts.

Traffic control and infrastructures: traffic control systems cover the supervision of all
possible vehicle types, on the ground (automotive, trains), at sea (open sea and harbor) and in the air. Also
in these cases, the control systems rely on sparse sensor networks and/or on GPS signals, which are prone to being
hacked, as are the control ECUs placed on board and in the control base systems.
Different security levels are already defined for different applications: a secondary road traffic
light management system is less sensitive than a harbor or airport traffic management system.
Attack objectives:
• To cause traffic chaos: can cause financial losses to people and/or companies that rely on
a defined transfer time.

• To cause injury or death: in very sensitive cases a malfunction can cause severe accidents.

Figure 1.2: Electronic GAS consumption meter.

Consumption meters: consumption meters (or smart meters) are used by supplying companies
to charge their customers for the consumption measured over a defined period of time.
Today almost all consumption meters have been converted to smart connected devices able to
send out the readings automatically, without involving customers or external operators, to save costs;
an example of a current smart meter is shown in Figure 1.2.
Some evaluations of their attackability have been carried out; one example is reported in [3], where the
authors demonstrate the effectiveness of cyber attacks.
Attack objectives:
• To steal goods without paying for them: the main reason to hack such systems is to report
a lower (or no) consumption in order to reduce the cost of the energy used.

• To cause financial damage: the hackers can cause excessive consumption measurements to damage
the customer (higher cost) or the reputation of the supplying company.

Figure 1.3: Sony Bravia Android TV [6].

Smart entertainment systems: most home entertainment systems (smart TVs as shown in
Figure 1.3, radios, etc.) are connected devices with many functionalities: they can browse or
play content from the internet, subscribe to content platforms, manage payments, and collect audio
and video for communication tools such as Skype, Meet and others.
Attack objectives:
• To cause financial damage: stealing money through subscribed content.

• To cause malfunction: making it impossible to get the desired content.

• To steal private information: listening to private conversations or recording videos.

House appliances and security systems: many house appliances have been developed as
connected IoT devices. This has been done mainly to increase comfort: a cyber butler can manage
some annoying stuff for us.
These new features allow us to better manage the house heaters or coolers, switching
them on or off based on the presence of people; to use the available electrical power in a smarter way
(starting the washing machine after the dishwasher has finished its most power-hungry phases)
or in a more economical way; to keep the fridge contents under control (what is missing and the
expiry date of some foods); to control the lights and shutters; etc.
Attack objectives:
• To create misbehavior: cause annoying problems through Denial of Service (DoS) of IoT devices.

• To cause financial damage: unwanted power consumption or damage to goods.

• To cause issues on the power grid: as described in [7], cyber attacks that switch on a large number of
high-wattage devices simultaneously can cause problems for the power grid.

Figure 1.4: PaceMaker [8].

Medical: almost all medical devices are based on embedded systems. They can be used to
monitor patients (and possibly raise an alert if something goes wrong) or to act on a patient directly,
as insulin auto-feeders or pacemakers do (see Figure 1.4). They can be either implanted in the
patient's body or used as external devices.
Remote patient assistance, adopted mainly to increase patient comfort or to avoid hospital-related infections,
is widely used, so the vulnerability of these devices can become a severe problem.
As reported in [2], companies are already working to mitigate their product vulnerabilities;
an example is the ICS Advisory (ICSMA-17-241-01), in which Abbott Laboratories reports a pacemaker
vulnerability and recalls patients for a software update.
Attack objectives:
• To create misbehavior: can harm or kill patients.

• To cause financial damage: to the company that produces or sells the medical devices.

PC peripherals: all PC peripherals are based on embedded systems that manage communication,
configure critical parameters and perform the peripheral's function. Most of the time they have been
hacked to gain privileged access to the PC they are connected to, or to cause some annoying problem.
Attack objectives:
• To create misbehavior: cause annoying problems such as device Denial of Service (DoS).

• To cause product damage: the product or the PC can be damaged.

• To steal personal data: the PC activity and personal data (passwords, etc.) can be stolen via a
virus injected through a peripheral device.

Industrial and agriculture: industry and agriculture have similar requirements in terms of
sensors (which have to be placed over a large area) and actuators, together with the need for centralized
control.
These requirements are satisfied by SCADA systems. Another common requirement of those devices
is a long life and compatibility with old systems, in order to keep the plants working by simply
replacing damaged devices without the need to redesign the entire network system; for this
reason, a wide range of different connectivity options is in general available, and they do not always ensure the
required security level. Attacks have been carried out against large production plants by gaining access to a SCADA
network, sometimes passing through small IoT sensors or actuators to obtain the right to
access the network. For further details refer to [2] and [9].
Attack objectives:
• To create misbehavior: cause annoying problems that delay production.

• To cause product damage: the produced component (or vegetable) is defective or damaged.

• To cause production system damage: changing the working parameters of some devices can cause
damage (rotational speed, robot arm trajectories, etc.).

• To cause operator injuries: changing the working parameters of some devices can cause damage
(rotational speed, robot arm trajectories, etc.) or misbehavior that compromises operator
safety.

• To steal proprietary data: the program of an automatic working center contains the locally managed
workflow, production rates, etc.

Chapter 2

State of the Art

A good classification of hardware cyber-attack types by objective and methodology can be found
in [10]. According to it, the attack methods analyzed in the following can be
classified as Passive Non-Invasive and Active Non-Invasive.
These attack methods rely on both software and hardware weaknesses. Software weaknesses are
usually caused by the pursuit of performance and can be mitigated by using best practices
available from suppliers operating in security assurance (e.g., RAMBUS or RADWARE) or through
related scientific publications. Hardware weaknesses are inherently related to the technology used and to the
internal structure of the processors; they are harder to mitigate because any improvement
may require a redesign.

2.1 Side-Channel Attacks

Side-channel attacks reveal information that is unintentionally made available through a
non-standard channel; these "channels" are physically observable quantities that are not intended to be used
by the system under attack as communication channels.
The use of side channels began well before the electronic era and has since been exploited using
the capability of electrical and electronic systems to access this information more easily.
Examples of human side channels are eye movements, breathing or heart rate, involuntary
muscle contractions, etc.; such signals, if observed by an expert eye, can reveal whether someone is lying.

Figure 2.1: Polygraph [11].

The "lie detector" (shown in Figure 2.1) measures these human side channels to facilitate
and objectify their observation.
Another example is the way a safe with a mechanical combination lock can be opened without
knowing its combination. The thief listens to the safe from outside using a stethoscope to hear the "clicks" that
reveal the correct combination numbers (see Figure 2.2).
In both cases, the use of side channels gives access to information that should have remained
secret.

Figure 2.2: Side channel Safe cracking, from [12].

The discovery of side channels and the beginning of their use on electric/electronic devices (mainly
cipher machines used for secret military communications) can be dated to the end of the Second World War, even though
they were studied and applied mostly later on, during the Cold War, on both sides of the Iron Curtain
[13].

Figure 2.3: TTY mixer 131B2 TM-11-2222, from [14].

At the end of the Second World War, the development of cipher algorithms using random key generation
and automatic electro-mechanical cipher/decipher devices made it almost impossible to decipher
messages without some information about the key. In 1943, tests at Bell Laboratories on the
teletypewriter SET 131-B2, part of the Sigtot system, showed that information related to the
deciphered (plain text) message leaked via electromagnetic emissions (Figure 2.3 shows a component of the Sigtot
system).
At that time this discovery just led to the definition of a security zone around the cipher/decipher
installation site, to be kept clear of enemy spying antennas to avoid any information leaks.
After this discovery the TEMPEST project was opened: in the first stage, its goal was to define how
to avoid information leaks and, later on, how to leverage them to steal the content of enemy (or even allied) secret
communications.

To exploit Side-Channel attacks, the attacker must gather some knowledge about how the device
to be attacked works and gain access to the physical channel to be measured.
Nowadays it is easier than in the past to obtain this knowledge:
From a hardware perspective: the device is usually made of commercial off-the-shelf components (such
as the CPU or some devices connected to it) and the necessary information can be freely found on
the internet. If the data is not available, some experimental tests have to be carried out; also in
this case, the relatively low price of the devices to be attacked (for example the electronic keys of smart
appliances or the access key to a banking system) gives a ratio of cost to possible income good enough to
make the attack attractive.
From a software perspective: most security-related algorithms are defined in standards (sometimes
the source code is available in a freely accessible library).

A list of the most common physical quantities used as side channels on embedded systems is given
below:

• Time: time measurement is the most used side channel. It is easy to measure and does not in general
require any expensive equipment. It can be used alone (in case of attacks made on communication channels
only) or with other side channels to observe the time relation between defined events.

• Sunk Current or Power: another commonly used side channel. It requires some additional
instrumentation, in general commonly accessible. Any switching electronic device, such as the
CPU registers or data buses, exhibits a different current absorption for a defined state or, in
most cases, for a state change; in CMOS devices a bit flip causes sunk current peaks
roughly proportional to the number of bits flipped.

• EM radiation: it requires more complex instrumentation. It is based on the same principle as
current or power measurement: a fast current change generates EM emissions. It
can be more precise than power measurement because the EM probe also gives some information
on the physical position of the EM emissions and allows better signal isolation from other
EM sources present on the board or even in another area of the same die.

• Light emission: in the trivial case, some information can be captured by observing the light
emitted by device LEDs (think of a serial
communication LED directly driven by a line voltage level).
More complex cases are related to the observation of a single transistor: when it switches, a few photons
are emitted and can be observed on a decapsulated device [15]; this requires fast light detectors
and is, in general, more complex and costly than the methods listed before.

• Sound emission: also in this case there are two approaches. The first is passive listening to
human-machine interaction to convert the sound emission back into the sequence of operations
(e.g., keys pressed in the right sequence on a keyboard).
The more complex one leverages the sound emitted by vibrating electronic components, which changes,
as explained in [16], with the power requirements of the various computational
phases. Capacitors and coils are the biggest contributors to this phenomenon.

• Temperature: the passive side-channel attack based on die temperature requires the device
to be decapsulated and instrumented to measure its temperature. It is similar to the power side
channel because the power used has a direct impact on the die temperature (see [17]). This
approach is more complex and requires, when it is possible at all, a deep analysis to reveal the
wanted hidden information.
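
The proportionality between current peaks and flipped bits mentioned for the power side channel is commonly expressed through the Hamming distance between the old and the new value of a register. The following is a minimal Python sketch of that idea; the function names are hypothetical and chosen only for illustration, and no claim is made that a real device follows this model exactly.

```python
def hamming_weight(value: int) -> int:
    """Number of bits set to 1 in value."""
    return bin(value).count("1")


def hamming_distance(before: int, after: int) -> int:
    """Number of bits that flip when a register goes from before to after."""
    return hamming_weight(before ^ after)


# An 8-bit register going from 0x0F to 0xF0 flips every bit,
# while going from 0x0F to 0x0E flips a single bit.
print(hamming_distance(0x0F, 0xF0))
print(hamming_distance(0x0F, 0x0E))
```

Under this assumption, the first transition would be expected to produce a larger current peak than the second one.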

Figure 2.4: Side channel analysis instrument bench [18].

As said before, to exploit most side-channel attacks, the attacker must have physical access
to the target device and some instrumentation such as that shown in Figure 2.4.

Chapter 3

A Real Study-Case: CWNANO

In this Chapter, an experiment to identify the key of an AES algorithm is described.

3.1 ChipWhisperer NANO

To ease the measurement acquisition phase, a standard off-the-shelf product has been used: the ChipWhisperer
NANO¹ (CWNANO). A picture of this tiny board is shown in Figure 3.1. This is the smallest
and lowest-cost platform from ChipWhisperer; nevertheless, its capabilities are enough to gather the data
needed to perform a complete Side-Channel Power Analysis, test different methodologies and
get the expected results.

Figure 3.1: ChipWhisperer NANO [19].

The product specifications, taken from the NewAE CWNANO website [19] and its datasheet, are summarized
in Table 3.1. The board embeds the victim board, equipped with an STM32F030F4P6 described in
the following section.
The target board clock is supplied by the CW-NANO acquisition controller; this allows the
analog sampling point to be precisely synchronized with the target CPU clock edges. The current absorbed
by the victim is measured through a shunt resistor placed on the PCB track of the positive power supply
pin (see the target description); the signal is then conditioned by an analog front end that converts
it to a voltage level suitable for the high-speed 8-bit parallel analog-to-digital converter ADC1173.
Figure 3.2 shows the electrical diagram of the analog input section.

1 https://rtfm.newae.com/Capture/ChipWhisperer-Nano/

Feature                  Notes/Range
ADC Specs                8-bit, 20 MS/s
ADC Clock Source         Internally generated, external input
Analog Input             AC-coupled, fixed gain of 10 dB
Sample Buffer Size       50 000 samples
ADC Decimation           No
ADC Offset Adjustment    No
ADC Trigger              Rising-edge
Presampling              No
Phase Adjustment         No
Capture Streaming        No
Clock Generation Range   60 MHz, divisible by 1, 2, 4, 8, or 16
Clock Output             Regular only

Table 3.1: CWNANO specs.

The circuit performs AC coupling and applies an offset so that the zero-current level sits at the middle
of the analog-to-digital converter measurement range (half of Vdd). The complete transfer function of
this circuit is given in Equation 3.1, where I_T is the current flowing through the shunt resistor R12
on the victim board and V_u is the signal named AIN in the Figure 3.2 schematic.

$$
V_u = \frac{V_{dd}}{2} - \left(V_{dd} - R_{12} \cdot I_T\right) \cdot \frac{s \cdot C_{25} \cdot R_{25}}{s \cdot C_{25} \cdot R_{24} + 1} \tag{3.1}
$$

From a dynamic perspective, there is a zero at 0 rad/s and a pole at 370 · 10^3 rad/s, i.e. around 58 kHz; above
that frequency the circuit gain magnitude is about 4.5 V/mA.
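
The conversion of the pole from rad/s to Hz, and the high-pass behavior it implies, can be checked with a few lines of Python. This is only an illustrative sketch: the names `pole_hz` and `normalized_gain` are hypothetical, and the gain is normalized to its high-frequency value because the component values are not reported in the text.

```python
import math

pole_rad_s = 370e3                    # pole location quoted in the text, in rad/s
pole_hz = pole_rad_s / (2 * math.pi)  # same pole expressed in Hz


def normalized_gain(freq_hz: float) -> float:
    """Magnitude of a first-order high-pass response, relative to its
    high-frequency plateau, for a zero at the origin and the pole above."""
    ratio = freq_hz / pole_hz
    return ratio / math.sqrt(1.0 + ratio * ratio)


print(f"pole frequency: {pole_hz / 1e3:.1f} kHz")
for freq in (1e3, pole_hz, 1e6):
    print(f"{freq / 1e3:8.1f} kHz -> {normalized_gain(freq):.3f} of plateau gain")
```

The pole falls slightly below 59 kHz, consistent with the "around 58 kHz" figure above; well below that frequency the slow supply variations are strongly attenuated, while faster current changes pass at nearly full gain.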

Figure 3.2: ChipWhisperer NANO analog input section, from [19].

The higher the absorbed current, the higher the measured output voltage and hence the converted value,
so there is a direct relationship between the current absorbed by the microprocessor and the sampled
value. ChipWhisperer NANO is not only an instrument to gather target side-channel information: it also includes
the capability to inject faults on the power line and to program and control the target victim (at least
if it is the one supplied with the board).
The software API is freely available on the ChipWhisperer website. The supplied installation package
uses a Jupyter Notebook environment to support a try-and-test methodology; the ChipWhisperer
library can also be used in a standard Python program to set up custom automated procedures.

3.2 Target STM32F0

The victim board supplied with the CWNANO is equipped with an STM32F030F4P6. The processor is
manufactured by ST and is based on a 32-bit Cortex-M0 IP from ARM Ltd, a RISC processor with
a three-stage pipeline, with many on-chip peripherals to manage clock, memory, communications and
special functions, and with various I/O configurations selectable for each pin.
The victim board uses a minimal I/O configuration, with a few connections for communication
with the NANO board and for the control of a couple of LEDs; its electrical diagram is in Figure 3.3.

Figure 3.3: ChipWhisperer NANO STM32F030 Target, from [19].

The clock source is supplied by the NANO board (the target firmware must be configured to use
the external clock instead of the internal one to ensure synchronization with the sampling system)
and the absorbed current is sensed via a shunt resistor (R12 in the Figure 3.3 schematic) placed between
the bypass capacitor and the IC power input pin; in this way the capacitor does not affect
the current measurements. It is also possible to apply fault injection via the T_GLITCH
signal on VDD.

3.3 AES Algorithm Description

All the information reported here has been extracted from [20] and [21]; see these documents for
a deeper description.
AES is a symmetric encryption algorithm that processes data in blocks of 128 bits; this
length is normally expressed as 4 words of 32 bits and identified as Nb = 4. Three different
key lengths are defined in the standard: 128, 192 or 256 bits; the key length is likewise expressed
as a number of words, Nk = 4, 6, or 8.
The processing flow, for both the cipher and decipher phases, works on a state of 128 bits
organized as a 4 x 4 byte matrix, i.e. Nb = 4 columns of one word each. The state is initialized with
the data to be ciphered or deciphered and is modified through successive rounds to get the final
ciphered text or plain text. The number of rounds to be executed depends on the length of the key: 10
rounds for 128 bits, 12 for 192 bits, and 14 for 256 bits; it is identified as Nr = 10, 12 or 14.
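
The three parameter sets listed above can be summarized in a small helper. The sketch below is only illustrative: the function name `aes_parameters` is hypothetical, and the relation Nr = Nk + 6 is not stated in this text; it is simply a compact way of writing the three (Nk, Nr) pairs given above.

```python
NB = 4  # number of 32-bit columns in the state, fixed for AES


def aes_parameters(key_bits: int) -> tuple[int, int]:
    """Return (Nk, Nr) for an AES key length expressed in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    nk = key_bits // 32  # key length in 32-bit words
    nr = nk + 6          # reproduces the pairs 4 -> 10, 6 -> 12, 8 -> 14
    return nk, nr


for bits in (128, 192, 256):
    nk, nr = aes_parameters(bits)
    print(f"AES-{bits}: Nk = {nk}, Nr = {nr}")
```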
The pseudo-code in Figure 3.4 shows the sequence of operations for the Cipher algorithm.

Figure 3.4: AES Cipher pseudo-code [21].

The inverse cipher has a similar structure (almost the reverse of the one used in the cipher);
its description is not needed for the purpose of this work, so it will not be explicitly described; refer to
[21] for a detailed explanation.
After the state is loaded, an AddRoundKey() is executed, then the rounds are repeated until the end.
All the rounds of the cipher algorithm following the first AddRoundKey() consist of the same sequence of
functions (each one described in the following), except the last one, where MixColumns()
is missing.
The following transformations are defined using Galois Field GF(2^8) operations (see [20] and [21]
for a detailed description); for simplicity, in this work they are treated as logic functions and look-up
tables, as in their code implementation.

3.3.1 SubBytes
SubBytes() is a non-linear transformation operating at byte level (see Figure 3.5) using a substitution
table named S-box, shown in Figure 3.6.

Figure 3.5: S-box application on each byte of the State, from [21].

The table is invertible (each input value gives a unique output value that is not repeated).
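
The invertibility of the S-box can be verified numerically. The Python sketch below rebuilds the table from its definition in the AES standard (multiplicative inverse in GF(2^8) followed by an affine transformation) instead of copying the values of Figure 3.6; this construction is taken from the standard and is not described in this section, and the helper names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(value: int, shift: int) -> int:
    """Rotate a byte to the left by shift bits."""
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


def sbox_entry(value: int) -> int:
    """S-box output: field inverse followed by the AES affine transformation."""
    inv = gf_inverse(value)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(value) for value in range(256)]
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value

# Every output appears exactly once, so the table can be inverted.
print(len(set(SBOX)) == 256)
print(all(INV_SBOX[SBOX[v]] == v for v in range(256)))
```

In a real implementation the table is normally stored as a constant look-up table, as assumed in the rest of this work.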

Figure 3.6: S-box values from [21].

3.3.2 ShiftRows
The ShiftRows() transformation performs a cyclic shift (rotation) to the left of each state row,
made of 4 bytes, by a defined number of bytes: 0, 1, 2 and 3 for rows 0 to 3 respectively.

Figure 3.7: ShiftRows from [21].

As depicted in Figure 3.7, while the row content moves to the left, the byte that drops out of
the row is placed at the rightmost position (like a rotation on a circle); this behavior is repeated at
each shift.
The last row is shifted three times, so the result of the transformation is the one in
Equation 3.2.

S3,0 , S3,1 , S3,2 , S3,3 → S3,3 , S3,0 , S3,1 , S3,2 (3.2)
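
The rotation described above can be sketched in Python as follows. The state is assumed here to be stored as a list of four rows, which is a choice made only for readability; the function name is hypothetical.

```python
def shift_rows(state: list[list[str]]) -> list[list[str]]:
    """Rotate row r of the state to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


# Symbolic state: entry "Sr,c" sits in row r, column c.
state = [[f"S{r},{c}" for c in range(4)] for r in range(4)]
shifted = shift_rows(state)

for row in shifted:
    print(" ".join(row))
```

The last printed row reproduces the right-hand side of Equation 3.2.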

3.3.3 MixColumns
The MixColumns() transformation operates on the state column by column, applying a polynomial
transformation. It is shown schematically in Figure 3.8.

Figure 3.8: MixColumns from [21].

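This transformation applies the formulas in Equation 3.3; note that the dot multiplication is a
finite field multiplication (see [21]).

\[
\begin{aligned}
S'_{0,c} &= (\{02\} \bullet S_{0,c}) \oplus (\{03\} \bullet S_{1,c}) \oplus S_{2,c} \oplus S_{3,c} \\
S'_{1,c} &= S_{0,c} \oplus (\{02\} \bullet S_{1,c}) \oplus (\{03\} \bullet S_{2,c}) \oplus S_{3,c} \\
S'_{2,c} &= S_{0,c} \oplus S_{1,c} \oplus (\{02\} \bullet S_{2,c}) \oplus (\{03\} \bullet S_{3,c}) \\
S'_{3,c} &= (\{03\} \bullet S_{0,c}) \oplus S_{1,c} \oplus S_{2,c} \oplus (\{02\} \bullet S_{3,c})
\end{aligned}
\tag{3.3}
\]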
The finite field multiplications are the only operations that cannot be easily mapped to a C
operator. These multiplications, however, are only by {02} or {03}; note that the polynomial forms of these
numbers are {02} = x and {03} = x + 1.
A dedicated function computes the GF(2^8) multiplication by x: the byte is shifted left by one bit (b << 1)
and the result is XORed with {1b} if the shift carries out of the byte; this function is called xtime().
The multiplications in Equation 3.3 can therefore be replaced using Equation 3.4, where xtime() is easily
implemented in C.

\[ (\{02\} \bullet A) \oplus (\{03\} \bullet B) = \mathrm{xtime}(A) \oplus \mathrm{xtime}(B) \oplus B = \mathrm{xtime}(A \oplus B) \oplus B \tag{3.4} \]

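The transformation then becomes the one in Equation 3.5.

\[
\begin{aligned}
S'_{0,c} &= \mathrm{xtime}(S_{0,c} \oplus S_{1,c}) \oplus S_{1,c} \oplus S_{2,c} \oplus S_{3,c} \\
S'_{1,c} &= \mathrm{xtime}(S_{1,c} \oplus S_{2,c}) \oplus S_{2,c} \oplus S_{3,c} \oplus S_{0,c} \\
S'_{2,c} &= \mathrm{xtime}(S_{2,c} \oplus S_{3,c}) \oplus S_{3,c} \oplus S_{0,c} \oplus S_{1,c} \\
S'_{3,c} &= \mathrm{xtime}(S_{3,c} \oplus S_{0,c}) \oplus S_{0,c} \oplus S_{1,c} \oplus S_{2,c}
\end{aligned}
\tag{3.5}
\]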
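
As a quick check of Equation 3.5, the sketch below implements xtime() and the column transformation in Python. The function name mix_column and the test column are illustrative choices, not taken from the target firmware.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8): shift left, reduce with {1b} on carry."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B             # clears the carry bit and applies {1b}
    return b & 0xFF


def mix_column(column):
    """Apply Equation 3.5 to one state column [S0, S1, S2, S3]."""
    s0, s1, s2, s3 = column
    return [
        xtime(s0 ^ s1) ^ s1 ^ s2 ^ s3,
        xtime(s1 ^ s2) ^ s2 ^ s3 ^ s0,
        xtime(s2 ^ s3) ^ s3 ^ s0 ^ s1,
        xtime(s3 ^ s0) ^ s0 ^ s1 ^ s2,
    ]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Using xtime(A ⊕ B) instead of two separate multiplications needs one call per output byte, which is the reason for the rewriting in Equation 3.4.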

3.3.4 AddRoundKey
The AddRoundKey() transformation operates at word level, applying a binary XOR between each state
word and key schedule word as visible in Figure 3.9.

Figure 3.9: AddRoundKey from [21].

This operation can be done at byte level with the same result, and it is sometimes more convenient
to implement it together with SubBytes, to avoid saving its result into the state.
Equation 3.6 describes the AddRoundKey() function; it is repeated for all the state columns,
referred to as c, with values from 0 to 3. The key is expanded, and rn represents the round
number used to select the relevant section of the expanded key.

\[
\begin{aligned}
S'_{0,c} &= S_{0,c} \oplus K_{0,c,rn} \\
S'_{1,c} &= S_{1,c} \oplus K_{1,c,rn} \\
S'_{2,c} &= S_{2,c} \oplus K_{2,c,rn} \\
S'_{3,c} &= S_{3,c} \oplus K_{3,c,rn}
\end{aligned}
\tag{3.6}
\]
The expanded key is derived from the secret key so that 128 key bits are
available for each round. The expanded key, known as the key schedule, is computed by the
KeyExpansion() function.
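
At byte level the transformation is a plain XOR between the 16 state bytes and the 16 bytes of the selected round key. A minimal sketch follows; the function name add_round_key is hypothetical, the plain text block is an arbitrary example, and the key is the one that appears in the attack results later in this chapter.

```python
def add_round_key(state, round_key):
    """XOR each state byte with the corresponding round key byte."""
    return bytes(s ^ k for s, k in zip(state, round_key))


plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")

# For round 0 the round key is the secret key itself.
state = add_round_key(plaintext, key)
```

Since XOR is its own inverse, applying the same round key twice gives back the original state; this is why the output of AddRoundKey(0) exposes the key to anyone who knows the plain text.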

3.3.5 Key Expansion
The Key Expansion algorithm generates the number of key words needed to complete the required rounds
Nr, which is defined by the length of the key Nk. The key expansion algorithm pseudo-code is shown in Figure
3.10.

Figure 3.10: Key Expansion algorithm pseudo-code from [21].

The first Nk words of the final key schedule are the stored secret key; the remaining
part is computed in a loop using the following functions:

• SubWord() applies the S-box to each byte of a four-byte group.
• RotWord() rotates a group of four bytes left by one position: a0, a1, a2, a3 → a1, a2, a3, a0.
• Rcon[i] is a group of four bytes a0, a1, a2, a3 where a1 = a2 = a3 = 0 and a0 has a value depending
on i, defined as the polynomial x^(i−1) evaluated in GF(2^8). In practice its values are taken from
the following byte values (written in hexadecimal notation): [00, 01, 02, 04, 08, 10, 20, 40, 80, 1b, 36].
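
The Rcon byte values can be derived by repeated multiplication by x in GF(2^8). The sketch below is illustrative only: rcon_values and rot_word are hypothetical helper names, and the leading 00 is kept as an unused placeholder for index 0, as in the list above. The rotation matches the [a1, a2, a3, a0] result produced by the KeyExpansion() routine of the target code.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def rcon_values(count):
    """Return x^0, x^1, ..., x^(count-1) evaluated in GF(2^8)."""
    values, value = [], 0x01
    for _ in range(count):
        values.append(value)
        value = xtime(value)
    return values


def rot_word(word):
    """Rotate a four-byte word left by one byte."""
    return word[1:] + word[:1]


RCON = [0x00] + rcon_values(10)
```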

3.4 CW-Nano Relevant API Description

The ChipWhisperer API documentation can be found in [22]; what follows is a short description of the
classes used and of the parameters needed to understand how they have been used in this work. See
the online documentation for configuration details.
To use its objects and functions the chipwhisperer library must be imported; usually a shorter name
such as cw is assigned:

import chipwhisperer as cw

3.4.1 Object Scope


The scope object creates and configures the data acquisition system on the ChipWhisperer.
To configure and enable it, instantiate this object as a standard class with scope(). It accepts two
optional parameters, type and sn, both with default value None. If not specified, these parameters
are detected automatically from the ChipWhisperer hardware connected to the PC. If more
than one ChipWhisperer is connected to the PC, the desired board can be addressed with either the
type (if the boards are of different types) or the serial number, which selects the physical board to be
addressed. The instantiated scope object has different sub-modules, methods and attributes depending on
the detected hardware; for the CW-Nano they are the following:

• scope.adc : gives access to the ADC configuration.

  - scope.adc.samples : read/set the number of samples to be stored.
  - scope.adc.clk_src : read/set the ADC clock source; it can be 'int' or 'ext'.
  - scope.adc.clk_freq : read/set the ADC sample frequency; the value is rounded to the
    closest possible integer value.

• scope.io : GPIO settings of the acquisition module, used to manage target communication, programming and
measurement triggering. These are generally left at their default configuration, except during the target
programming procedure, where the CW-Nano board requires a target reset cycle. The available
sub-modules are:

  - scope.io.tio1 : on the schematic T_GPIO1, connected to STM32F0 pin PA9,
  - scope.io.tio2 : on the schematic T_GPIO2, connected to STM32F0 pin PA10,
  - scope.io.tio3 : on the schematic T_GPIO3, connected to STM32F0 pin PA6,
  - scope.io.tio4 : on the schematic T_GPIO4, connected to STM32F0 pin PA7,
  - scope.io.pdid : on the schematic T_PDID, routed to pin 20 of the 20-pin connector used with an
    external target; not connected to the STM32F0,
  - scope.io.pdic : on the schematic T_PDIC, connected to STM32F0 pin BOOT0,
  - scope.io.nrst : on the schematic T_nRST, connected to STM32F0 pin NRST,
  - scope.io.clkout : on the schematic T_CLKOUT, connected to STM32F0 pin PF0-OSC_IN,
  - scope.io.cdc_settings : sets how the USART parameters can be changed through the
    USB CDC.

• scope.glitch : used to configure the glitches on the target power line in terms of duration and time offset
from the trigger.

  - scope.glitch.ext_offset : offset from the trigger rising edge,
  - scope.glitch.repeat : width of the glitch in cycles.

• scope.default_setup() configures the scope object with default values; for the CW-Nano they are:

ChipWhisperer Nano Device
fw_version =
    major = 0
    minor = 30
    debug = 0
io =
    tio1         = None
    tio2         = None
    tio3         = None
    tio4         = None
    pdid         = True
    pdic         = False
    nrst         = True
    clkout       = 7500000.0
    cdc_settings = array('B', [1, 1])
adc =
    clk_src  = int
    clk_freq = 7500000.0
    samples  = 5000
glitch =
    repeat     = 0
    ext_offset = 0

• scope.con() connects to the attached CW-Nano; an optional parameter with the CW-Nano serial
number can be used.

• scope.dis() disconnects the scope object from the CW-Nano.

• scope.arm() arms the ADC; the trigger is the GPIO4 rising edge (fixed trigger).

• scope.capture() captures a new trace.

• scope.get_last_trace() returns an array with the samples of the last captured trace.

• scope.get_serial_ports() gets the CDC serial ports associated with this scope.

3.4.2 Object Target


The target object provides the interface to the target device (for the CW-Nano, the on-board STM32F0). The
default target interface is the Simple Serial target. It requires only one parameter, the scope object
instance described above, and allows the target type to be configured with optional parameters, some of
which are accepted only if required by the target firmware. A basic usage example is:

import chipwhisperer as cw
scope = cw.scope()
target = cw.target(scope)

• target.baud : manages the serial communication baud rate.

• target.write(data) : writes the data passed as a string parameter to the target.

• target.read(num_char=0, timeout=250) : reads data from the target. The two optional parameters
define how the communication is managed:
  num_char is the number of characters to be read; the default value 0 reads all the
  available characters;
  timeout is the time to wait before returning if no character is available; the default value is 250 ms.

• target.in_waiting() : returns the number of characters available to be read in the serial buffer.

• target.in_waiting_tx() : returns the number of characters waiting to be sent in the ChipWhisperer
serial output buffer.

• target.simpleserial_wait_ack(timeout=500) : waits for an ack from the target for timeout ms; if
the optional parameter is not given it waits for 500 ms.

• target.simpleserial_write(cmd, num, end='\n') : writes a SimpleSerial command cmd
(usually a one-character string) with data num (in byte-array format); end is an optional parameter
that defines the end-of-string value, with default value '\n' (ASCII newline character).

target.simpleserial_write('p', text)

• target.simpleserial_read(cmd, pay_len, end='\n', timeout=250, ack=True) : reads
data from the target related to command cmd (usually a one-character string); pay_len is the number of
bytes to be received; the other parameters are optional and define the end of string, the maximum
receive timeout and whether an ack is expected at the end of the command.

response = target.simpleserial_read('r', 16)

• target.simpleserial_read_witherrors(cmd, pay_len, end='\n', timeout=250,
glitch_timeout=8000, ack=True) : to be used in glitch tests, when the expected results
can include errors in data content or format. See the online ChipWhisperer documentation for
further details.

• target.set_key(key, ack=True, timeout=250) : equivalent to

target.simpleserial_write('k', key)

• target.close() : closes the target.

• target.con(scope=None, **kwargs) : connects to the target.


3.4.3 Useful Functions
The ChipWhisperer API also provides helper functions that aggregate command sequences to
record traces, a function to program the embedded target device, and objects to perform different kinds
of analysis, included in the chipwhisperer.analyzer library (not used in this work).
In this work the program_target(scope, prog_type, fw_path, **kwargs) function is used; it
programs the target without the need for external tools.
Parameter description:

• scope : the scope class instance to be used for the target connection.
• prog_type : the type of target to be programmed; the supported ones are:
  - programmers.STM32FProgrammer
  - programmers.XMEGAProgrammer
  - programmers.AVRProgrammer
• fw_path : path to the hex file to program, including file name and extension.

As an example, the command used to program the device with the target code described later is:

cw.program_target(scope, cw.programmers.STM32FProgrammer, "./simpleserial-aes-CWNANO.hex")
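
The calls described in this section can be combined into a simple acquisition loop. The following is a sketch, not the acquisition code used for this work: capture_traces is a hypothetical helper, it only relies on the scope and target methods listed above, and it assumes that scope.capture() returns a true value when the capture times out.

```python
import os


def capture_traces(scope, target, key, count):
    """Return a list of (plaintext, trace) pairs acquired with a fixed key."""
    target.set_key(key)                          # SimpleSerial 'k' command
    acquired = []
    for _ in range(count):
        plaintext = bytearray(os.urandom(16))    # random 16-byte block
        scope.arm()                              # ADC waits for the trigger
        target.simpleserial_write('p', plaintext)
        timed_out = scope.capture()              # assumed true on timeout
        if timed_out:
            continue                             # skip incomplete captures
        target.simpleserial_read('r', 16)        # read back the ciphertext
        acquired.append((bytes(plaintext), list(scope.get_last_trace())))
    return acquired


# Typical use with real hardware (requires the chipwhisperer package):
#   import chipwhisperer as cw
#   scope = cw.scope()
#   scope.default_setup()
#   target = cw.target(scope)
#   data = capture_traces(scope, target, bytearray(16), 1000)
```

Passing the scope and target objects as arguments keeps the loop independent of the connected board and lets it be exercised with stand-in objects when no hardware is attached.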

3.5 Target Code Description

To test the effectiveness of the two side-channel power attacks that are the subject of this work, the AES test
software available in the CWNANO firmware package has been used. It implements a basic AES algorithm
without using a hardware encryption accelerator.
As visible in the description below, the source code uses many nested functions to achieve the desired
behavior; this software structure, even if not easily readable, was defined by NewAE to
allow easy customization of the algorithm and/or the hardware platform while keeping the common part
of the source code shared by all the supported platforms.
The target hardware, the algorithm and any special configuration are selected through variables
set at compile time (usually in a Jupyter environment, using Python variables
passed through a structured makefile).
Only a few commands are relevant for this work: 'k', 'p' and 'r'.
• Command 'k' is used by the target.set_key() function and sets the secret key to be used by the
AES algorithm.

The command executes the AES128_ECB_indp_setkey() function, where the Key pointer is
initialized with the address of the received key data buffer; the function then executes the key
expansion.

static uint8_t* Key;

...

void AES128_ECB_indp_setkey(uint8_t* key)
{
    Key = key;
    KeyExpansion();
}

KeyExpansion() is the routine that creates the complete key array RoundKey[] used
throughout the different rounds of the AES algorithm.
Only the first part of the function is relevant for this work, to verify that the first 16 RoundKey[]
bytes are filled with the received key values; the rest of the function computes the remainder of the key
expansion, used in the later cipher rounds.

static uint8_t RoundKey[176];

...

// This function produces Nb(Nr+1) round keys. The round keys
// are used in each round to decrypt the states.
static void KeyExpansion(void)
{
    uint32_t i, j, k;
    uint8_t tempa[4]; // Used for the column/row operations

    // The first round key is the key itself.
    for (i = 0; i < Nk; ++i)
    {
        RoundKey[(i * 4) + 0] = Key[(i * 4) + 0];
        RoundKey[(i * 4) + 1] = Key[(i * 4) + 1];
        RoundKey[(i * 4) + 2] = Key[(i * 4) + 2];
        RoundKey[(i * 4) + 3] = Key[(i * 4) + 3];
    }

    // All other round keys are found from the previous round keys.
    for (; (i < (Nb * (Nr + 1))); ++i)
    {
        for (j = 0; j < 4; ++j)
        {
            tempa[j] = RoundKey[(i - 1) * 4 + j];
        }
        if (i % Nk == 0)
        {
            // This function rotates the 4 bytes in a word to the left once.
            // [a0,a1,a2,a3] becomes [a1,a2,a3,a0]

            // Function RotWord()
            {
                k = tempa[0];
                tempa[0] = tempa[1];
                tempa[1] = tempa[2];
                tempa[2] = tempa[3];
                tempa[3] = k;
            }

            // SubWord() is a function that takes a four-byte input word and
            // applies the S-box to each of the four bytes to produce an output word.

            // Function Subword()
            {
                tempa[0] = getSBoxValue(tempa[0]);
                tempa[1] = getSBoxValue(tempa[1]);
                tempa[2] = getSBoxValue(tempa[2]);
                tempa[3] = getSBoxValue(tempa[3]);
            }

            tempa[0] = tempa[0] ^ Rcon[i / Nk];
        }
        else if (Nk > 6 && i % Nk == 4)
        {
            // Function Subword()
            {
                tempa[0] = getSBoxValue(tempa[0]);
                tempa[1] = getSBoxValue(tempa[1]);
                tempa[2] = getSBoxValue(tempa[2]);
                tempa[3] = getSBoxValue(tempa[3]);
            }
        }
        RoundKey[i * 4 + 0] = RoundKey[(i - Nk) * 4 + 0] ^ tempa[0];
        RoundKey[i * 4 + 1] = RoundKey[(i - Nk) * 4 + 1] ^ tempa[1];
        RoundKey[i * 4 + 2] = RoundKey[(i - Nk) * 4 + 2] ^ tempa[2];
        RoundKey[i * 4 + 3] = RoundKey[(i - Nk) * 4 + 3] ^ tempa[3];
    }
}

• Command 'p' supplies 16 bytes (128 bits) of plain text and encrypts them.

Before starting the encryption it raises the trigger signal (GPIO4) to synchronize the start
of data acquisition. This is useful because the CWNANO can store only a relatively small amount of
data before it is transferred to the host computer, so it is important to acquire only the
relevant part of the current absorption trace.

uint8_t get_pt(uint8_t* pt, uint8_t len)
{
    trigger_high();
    aes_indep_enc(pt); /* encrypting the data block */
    trigger_low();
    simpleserial_put('r', 16, pt);
    return 0x00;
}

The function aes_indep_enc(pt) is the actual cipher and it is wrapped to
AES128_ECB_indp_crypto(uint8_t* input):

void AES128_ECB_indp_crypto(uint8_t* input)
{
    state = (state_t*)input;
    BlockCopy(input_save, input);
    Cipher();
}

The plain text is copied byte by byte (with the BlockCopy() function) into the state variable, to be
modified by the following AES cipher algorithm Cipher():

// Cipher is the main function that encrypts the PlainText.
static void Cipher(void)
{
    uint8_t round = 0;

    // Add the First round key to the state before starting the rounds.
    AddRoundKey(0);

    // There will be Nr rounds.
    // The first Nr-1 rounds are identical.
    // These Nr-1 rounds are executed in the loop below.
    for (round = 1; round < Nr; ++round)
    {
        SubBytes();
        ShiftRows();
        MixColumns();
        AddRoundKey(round);
    }

    // The last round is given below.
    // The MixColumns function is not here in the last round.
    SubBytes();
    ShiftRows();
    AddRoundKey(Nr);
}

• Command 'r' sends the ciphertext back at the host's request.

3.6 Data Acquisition and Analysis

3.6.1 Leak Model


A simple model of the leak exploited in power analysis, from [23], considers a single data line driven
by two MOSFETs (one on the high side and one on the low side) and a load capacitor between the data bus
and ground, as depicted in Figure 3.11.

Figure 3.11: Charge and discharge of a CMOS inverter from [23].

An output LOW-to-HIGH transition causes a temporary current to flow from Vdd through the shunt
resistor to charge CL; the HIGH-to-LOW transition discharges CL through the low-side MOSFET.
As stated above, the current absorbed by a processor changes as a function of the executed instructions.
In CMOS devices, large current variations occur when a parallel bus changes the number
of active (logic HIGH) bits, as happens during a memory write.
The number of non-zero bits of a byte or a word is known as the Hamming Weight (HW from now
on) and, according to the leak model, it should be directly related to the absorbed current. The
Hamming Distance is the number of bits that differ between two states; it is often used to predict the
amount of current needed during a transition between two states, since only the bits that flip produce
a difference in current absorption. The model is usually selected by analyzing the hardware
characteristics or, in most cases, both models are tested and the best performing one is selected.
To verify the HW model, about one thousand traces have been acquired while supplying random
plain text; they have then been grouped by the HW predicted for both the AddRoundKey(0) and the SubBytes() results,
and averaged. The sample point has been identified (in both cases) by subtracting the HW 0 group
average from the average of all traces and searching for the absolute maximum value in the sample range defined for the
relevant function.
This produces Figures 3.12 and 3.13 below: as expected, the current versus HW relationship is almost
linear, and the current increases as the HW increases. In the AddRoundKey(0) case (Figure 3.12) the
flat top is caused by the saturation of the ADC input, which happens each time the recorded trace value reaches
0.5 (positive saturation) or -0.5 (negative saturation).
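
The two leak models, and the grouping by HW used for this verification, can be sketched as follows. The function names are illustrative; mean_leak_by_hw assumes that one sample per trace has already been extracted at the identified sample point.

```python
def hamming_weight(value):
    """Number of bits set to 1 in an integer."""
    return bin(value).count("1")


def hamming_distance(a, b):
    """Number of bits that differ between two values."""
    return hamming_weight(a ^ b)


def mean_leak_by_hw(predicted_bytes, samples):
    """Average the measured sample for each predicted Hamming Weight (0..8).

    predicted_bytes[i] is the byte value predicted for trace i and
    samples[i] is the trace value at the chosen sample point.
    """
    sums, counts = [0.0] * 9, [0] * 9
    for value, sample in zip(predicted_bytes, samples):
        hw = hamming_weight(value)
        sums[hw] += sample
        counts[hw] += 1
    return {hw: sums[hw] / counts[hw] for hw in range(9) if counts[hw]}
```

Plotting the returned averages against the HW gives a curve of the kind shown in Figures 3.12 and 3.13.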

Figure 3.12: Hamming Weight measured at AddRoundKey(0) on byte 0.

Figure 3.13: Hamming Weight measured at SBOX on byte 0.

3.6.2 Code Execution Timings


The code execution timings have been mapped using instrumented code; since the target
executes only the AES algorithm, no timing variations are expected between traces acquired at different times.
Table 3.2 lists the execution time of the AES functions described above, mapped
to the number of acquired samples; this has been used to identify the starting sample of each function.

Event            Duration [us]   Duration [smp]   Absolute [smp]
Trigger          0               0                0
BlockCopy()      28.0            210              0 - 210
AddRoundKey(0)   40.0            300              210 - 510
SubBytes()       33.6            252              510 - 762
ShiftRows()      12.0            90               762 - 852
MixColumns()     67.6            508              852 - 1360
AddRoundKey(1)   40.0            300              1360 - 1660
...              ...             ...              ...

Table 3.2: AES algorithm timings (smp stands for samples).
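
The absolute sample ranges of Table 3.2 follow from the cumulative sum of the per-function sample counts. A short sketch, with the hypothetical helper sample_windows, reproduces them so that each function window can later be addressed by name:

```python
from itertools import accumulate

# Per-function durations in samples, in execution order (from Table 3.2).
SAMPLE_COUNTS = [
    ("BlockCopy()", 210),
    ("AddRoundKey(0)", 300),
    ("SubBytes()", 252),
    ("ShiftRows()", 90),
    ("MixColumns()", 508),
    ("AddRoundKey(1)", 300),
]


def sample_windows(counts):
    """Map each function name to its (start, end) sample range."""
    ends = list(accumulate(n for _, n in counts))
    starts = [0] + ends[:-1]
    return {name: (start, end)
            for (name, _), start, end in zip(counts, starts, ends)}


WINDOWS = sample_windows(SAMPLE_COUNTS)
```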

The data in Table 3.2 is used to tag the sampled current trace shown in Figure 3.14.
A first visual analysis of Figure 3.14 shows that the shape of the current absorbed during the execution
of each function is clearly identifiable and, zooming in on the relevant part of the trace, the iterations within
a single function are also easily identifiable (see Figure 3.15).

Figure 3.14: First round sampled current tagged (regions labeled BlockCopy(), AddRoundKey(0), SubBytes(), ShiftRows(), MixColumns(), AddRoundKey(1)).

This approach is known as SPA (Simple Power Analysis); it is generally applied to gain some
insight into the algorithm execution [24].

Figure 3.15: AddRoundKey(0) and SubBytes() details of first round sampled current.

3.6.3 Points of Attack


Analyzing both the code and the sampled traces, we need to identify the possible points of attack:

• BlockCopy() : here the plain text is copied to the state memory; it is clearly visible but of
no interest because it involves only known information.

• AddRoundKey(0) : the plain text is XORed with the secret KEY and stored in the state memory.
This is an interesting point because it handles the secret data we want to reveal; it could, however,
be easily shielded with small code modifications that embed the XOR in the call to
SubBytes() at byte level. The firmware we analyzed does not apply this optimization.

• SubBytes() : this is a good point of attack because it involves the secret KEY and two known
operations.

• The other functions add further transformations that make the identification of the secret KEY
harder, mainly because a single KEY byte influences more than one memory write.

The side-channel power analysis attack is now applied at the two identified points of attack,
AddRoundKey(0) and SubBytes(), whose traces are depicted in detail in Figure 3.15.

3.6.4 Differential Power Analysis


To attack the device by leveraging the leak model defined above, Differential Power Analysis (DPA) [25] can
be used.
DPA relies on the statistical relevance of the power absorption differences caused by different data
content or execution paths. Being a statistical approach, it requires multiple acquisitions, which are divided
into two groups according to a guessed secret value; the difference of the group means is then compared with
the expected result. The method cannot be applied to single traces because of the acquisition
noise, caused mainly by other circuit components working together with the observed device (for example
the pipeline or other microprocessor IP blocks causing unexpected power absorption peaks).
To obtain good results, every group of traces should be composed of random data where only one
parameter, selected through the guessed secret value, is common (for example the value of one bit during a memory
write operation).
If there is a deterministic correlation, the difference between the mean values of the two trace groups highlights it
with an identifiable peak (either positive or negative); otherwise it will be close to zero (consider the
difference between the averages of two random sequences: if the number of elements is large enough, the result
is close to zero), see Figure 3.16.

Figure 3.16: Differential Power Analysis schema (the acquired traces are split by the group selection into Group A and Group B; the difference of the two group means reveals the identified peak).

As stated before, the traces can be grouped using different selection criteria. In every case the
procedure starts by guessing the secret value to be identified and classifying the acquired traces according
to the data value expected when the guess is applied to the supplied plain text.
The expected result is computed through an offline execution of the relevant part of the AES algorithm
on the known plain text supplied for each current trace acquisition. Starting from the first point of
attack identified for the AES algorithm, AddRoundKey, it simply applies Equation 3.7:

\[ \mathrm{OutValue}[x] = \mathrm{PlainText}[x] \oplus \mathrm{KEY}[x] \tag{3.7} \]

We now have an expected value for each state byte to which the guessed KEY byte has been applied,
and the result is different for each acquired trace.
To group the results we work at single-byte level and choose a selection criterion; the easiest
and most common one is based on the value of a specific bit in the relevant OutValue.
As an example, the result of DPA on three guesses of the KEY[0] value, with the grouping criterion on OutValue[0]
bit 0, is shown in Figure 3.17. The green and orange traces are completely superposed, which is
why only the green trace, plotted last, is visible. The differences between the average
traces have a magnitude of about 0.005, which is not visible by simply comparing the
average traces, where the amplitude of the variations is about 0.6 (see Figure 3.15).
One of the most common ways to identify the correct guess is to record the maximum peak
value of the difference trace associated with every guess in a set of guesses (in the above case the 256 possible values of KEY[0])
and to search for the maximum among them.
This works most of the time. Sometimes, however, the identified peak is not the one we are looking for,
which results in a wrong value identification. These unexpected peaks are called Ghost Peaks and are caused by
other writes of data with high HW. In Figure 3.17 only the difference traces of three guessed values
are shown, to avoid a too crowded graph; even so, multiple positive peaks are visible: the two
largest are at sample number 41 (guess 0x00) and at sample number 258 (guesses 0x2b and 0xff).

Figure 3.17: DPA on guess KEY[0].bit0 = 1.

The first peak is clearly a Ghost Peak because it falls in the BlockCopy() window, outside the
AddRoundKey(0) window where we expect to see the wanted result. To avoid this wrong identification, the maximum
peak search is restricted to a subset of trace samples (called a window) on the AddRoundKey(0)
region: from sample 210 to 510, as depicted in Figure 3.18.

Figure 3.18: DPA on guess KEY[0].bit0 = 1 detail of AddRoundKey(0) window.

Running it for the 256 possible values (0 to 255) of a single KEY byte, and for all the KEY bytes, gives the results
reported in Figure 3.19.

SubKey n:00 guessed value: 0xff - real value 0x2b at position 258
SubKey n:01 guessed value: 0x00 - real value 0x7e at position 270
SubKey n:02 guessed value: 0xff - real value 0x15 at position 284
SubKey n:03 guessed value: 0x00 - real value 0x16 at position 296
SubKey n:04 guessed value: 0x00 - real value 0x28 at position 318
SubKey n:05 guessed value: 0x00 - real value 0xae at position 331
SubKey n:06 guessed value: 0x00 - real value 0xd2 at position 344
SubKey n:07 guessed value: 0x00 - real value 0xa6 at position 357
SubKey n:08 guessed value: 0xff - real value 0xab at position 380
SubKey n:09 guessed value: 0xff - real value 0xf7 at position 393
SubKey n:10 guessed value: 0xff - real value 0x15 at position 406
SubKey n:11 guessed value: 0x00 - real value 0x88 at position 418
SubKey n:12 guessed value: 0xff - real value 0x09 at position 441
SubKey n:13 guessed value: 0xff - real value 0xcf at position 454
SubKey n:14 guessed value: 0xff - real value 0x4f at position 467
SubKey n:15 guessed value: 0x00 - real value 0x3c at position 479

Figure 3.19: DPA on guess KEY[0].bit0 = 1 results with window restriction, red lines are errors.

Knowing the stored secret KEY values, we can clearly see that all the results are wrong. This
happens because of the bit-wise nature of the XOR: since the groups are selected on bit 0, the two
group averages are superposed for all the KEY guesses that have the same bit 0 value as the secret KEY.
As an example, the correct value of the first byte is 0x2b → 0010 1011b, identified as 0xff → 1111 1111b.
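
The effect can be reproduced without any trace. In the sketch below (bit0_partition is a hypothetical helper), the set of traces placed in the "bit 0 = 1" group is computed for different guesses: every guess with the same bit 0 produces exactly the same partition, so the resulting difference traces are identical and the guesses cannot be told apart.

```python
def bit0_partition(plaintext_bytes, guess):
    """Indices of the traces whose predicted OutValue (pt XOR guess) has bit 0 set."""
    return frozenset(i for i, pt in enumerate(plaintext_bytes)
                     if (pt ^ guess) & 1)


plaintext_bytes = list(range(256))      # one trace per possible byte value
group_2b = bit0_partition(plaintext_bytes, 0x2B)
group_ff = bit0_partition(plaintext_bytes, 0xFF)
group_00 = bit0_partition(plaintext_bytes, 0x00)
```

Here group_2b and group_ff coincide, while group_00 is their complement, which only swaps the two groups and flips the sign of the difference trace.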
One possible solution is to iterate the attack over every bit, testing only the guessed values
0x00 and 0xff, and to build the final value as the sequence of the single-bit values. In this case the
results are more precise, but some bit values are still identified wrongly, as shown in Figure 3.20.

SubKey n:00 bit 00 guessed value: 1 at position 258


SubKey n:00 bit 01 guessed value: 1 at position 258
SubKey n:00 bit 02 guessed value: 0 at position 258
SubKey n:00 bit 03 guessed value: 0 at position 254
SubKey n:00 bit 04 guessed value: 0 at position 257
SubKey n:00 bit 05 guessed value: 1 at position 258
SubKey n:00 bit 06 guessed value: 0 at position 257
SubKey n:00 bit 07 guessed value: 0 at position 257
––> SubKey n:00 guessed value: 0x23 - real value 0x2b
SubKey n:01 bit 00 guessed value: 0 at position 270
SubKey n:01 bit 01 guessed value: 1 at position 271
SubKey n:01 bit 02 guessed value: 1 at position 271
SubKey n:01 bit 03 guessed value: 1 at position 270
SubKey n:01 bit 04 guessed value: 1 at position 271
SubKey n:01 bit 05 guessed value: 1 at position 271
SubKey n:01 bit 06 guessed value: 0 at position 269
SubKey n:01 bit 07 guessed value: 0 at position 270
––> SubKey n:01 guessed value: 0x3e - real value 0x7e

Figure 3.20: DPA on guess KEY with: check for all bits results and window restriction, red lines are
errors.
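
The byte values reported in Figure 3.20 are obtained by assembling the eight single-bit guesses. A small sketch (byte_from_bits and wrong_bits are hypothetical helpers) rebuilds the two bytes from the listed bits and locates the bits that differ from the real KEY values:

```python
def byte_from_bits(bits):
    """Assemble a byte from per-bit guesses; bits[i] is the value of bit i."""
    return sum(bit << i for i, bit in enumerate(bits))


def wrong_bits(guessed, real):
    """Positions of the bits where the guessed byte differs from the real one."""
    diff = guessed ^ real
    return [i for i in range(8) if (diff >> i) & 1]


# Bit 0 to bit 7 as listed in Figure 3.20 for SubKey 0 and SubKey 1.
byte0 = byte_from_bits([1, 1, 0, 0, 0, 1, 0, 0])
byte1 = byte_from_bits([0, 1, 1, 1, 1, 1, 0, 0])
```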

The results for the first two bytes, shown above, highlight two wrong bit identifications: bit 3 in byte 0
and bit 6 in byte 1. In both cases the sample position of the wrongly identified bit is slightly different
from the one found for the other bits of the same byte; this is a clear sign of the mistake, since the
byte write instruction is executed in parallel in the same clock cycle. It is due to the Ghost Peak visible
in Figure 3.21.
This problem can also be solved by windowing the traces to a narrow area; however, the positions of
the Ghost Peak and of the searched peak are very close, so other solutions are preferable.

Figure 3.21: DPA bitwise on guess KEY[0].bit0 and bit 3.

A way to improve the DPA results is to change the grouping criterion so that it involves more than
one bit, increasing the HW distance between the groups (see [26]); this should increase the distance between the mean
values of the two groups. With this approach we go back to the full list of KEY guesses, grouping on the expected results
with HW >= 4 (i.e. at least 4 bits set to 1); this gives the results in Figure 3.22, where all the secret KEY
values have been correctly identified.

SubKey n:00 guessed value: 0x2b - real value 0x2b at position 258
SubKey n:01 guessed value: 0x7e - real value 0x7e at position 271
SubKey n:02 guessed value: 0x15 - real value 0x15 at position 284
SubKey n:03 guessed value: 0x16 - real value 0x16 at position 296
SubKey n:04 guessed value: 0x28 - real value 0x28 at position 318
SubKey n:05 guessed value: 0xae - real value 0xae at position 332
SubKey n:06 guessed value: 0xd2 - real value 0xd2 at position 345
SubKey n:07 guessed value: 0xa6 - real value 0xa6 at position 358
SubKey n:08 guessed value: 0xab - real value 0xab at position 380
SubKey n:09 guessed value: 0xf7 - real value 0xf7 at position 393
SubKey n:10 guessed value: 0x15 - real value 0x15 at position 406
SubKey n:11 guessed value: 0x88 - real value 0x88 at position 418
SubKey n:12 guessed value: 0x09 - real value 0x09 at position 440
SubKey n:13 guessed value: 0xcf - real value 0xcf at position 454
SubKey n:14 guessed value: 0x4f - real value 0x4f at position 467
SubKey n:15 guessed value: 0x3c - real value 0x3c at position 480

Figure 3.22: DPA on guess KEY: group selection HW=4 and window restriction.
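
The HW >= 4 grouping can be tried on synthetic data. The sketch below is not the analysis code used for this work: the traces are simulated under the assumption that one sample leaks the Hamming Weight of PlainText ⊕ KEY plus a small bounded noise, the names dpa_hw_guess, SECRET and LEAK_AT are hypothetical, and the signed maximum of the difference trace is used as the selection value (under this model the complemented guess yields a mirrored, negative peak).

```python
import random


def hamming_weight(value):
    return bin(value).count("1")


def dpa_hw_guess(plaintext_bytes, traces, window):
    """Return (best_guess, peak_position) for one key byte.

    Traces are split on HW(pt XOR guess) >= 4 and the difference of the
    group means is searched for its maximum inside the sample window.
    """
    start, end = window
    best_guess, best_pos, best_peak = None, None, float("-inf")
    for guess in range(256):
        high = [t[start:end] for pt, t in zip(plaintext_bytes, traces)
                if hamming_weight(pt ^ guess) >= 4]
        low = [t[start:end] for pt, t in zip(plaintext_bytes, traces)
               if hamming_weight(pt ^ guess) < 4]
        if not high or not low:
            continue
        # Column-wise means of the two groups, then their difference.
        diff = [sum(h) / len(high) - sum(l) / len(low)
                for h, l in zip(zip(*high), zip(*low))]
        peak = max(diff)
        if peak > best_peak:
            best_guess, best_pos, best_peak = guess, start + diff.index(peak), peak
    return best_guess, best_pos


# Synthetic acquisition: every plain text byte value once, in random order.
rng = random.Random(0)
SECRET, LEAK_AT, LENGTH = 0x2B, 7, 10
plaintext_bytes = list(range(256))
rng.shuffle(plaintext_bytes)
traces = []
for pt in plaintext_bytes:
    trace = [rng.uniform(-0.1, 0.1) for _ in range(LENGTH)]   # bounded noise
    trace[LEAK_AT] += hamming_weight(pt ^ SECRET)              # simulated leak
    traces.append(trace)

guess, position = dpa_hw_guess(plaintext_bytes, traces, (0, LENGTH))
```

On real traces the window argument would be the AddRoundKey(0) sample range, and the noise is far larger than in this simulation, which is why hundreds of acquisitions are needed.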

Figure 3.23 shows the detail of the correct identification of the first KEY byte (picture on the right) versus all
the other guesses (picture on the left): the maximum peak in the AddRoundKey(0) window among all the guesses
is at sample number 258, with a value of about 0.025, while all the others are below 0.015.

Figure 3.23: DPA AddRoundKey(0) HW=4, byte 0 identification.

The same approach can be applied to the SubBytes() window, applying Equation 3.8 to compute
the expected values.

\[ \mathrm{OutValue}[x] = \mathrm{SubBytes}(\mathrm{PlainText}[x] \oplus \mathrm{KEY}[x]) \tag{3.8} \]

In this case we can try again a simpler group classification, such as the bit 0 value, because the bit
values of the two groups are not correlated with the KEY used, so the average can lead to better results, as
shown in Figure 3.24.

SubKey n:00 guessed value: 0x2b - real value 0x2b at position 548
SubKey n:01 guessed value: 0x7e - real value 0x7e at position 598
SubKey n:02 guessed value: 0x15 - real value 0x15 at position 648
SubKey n:03 guessed value: 0x16 - real value 0x16 at position 698
SubKey n:04 guessed value: 0x28 - real value 0x28 at position 559
SubKey n:05 guessed value: 0xae - real value 0xae at position 609
SubKey n:06 guessed value: 0xd2 - real value 0xd2 at position 659
SubKey n:07 guessed value: 0xa6 - real value 0xa6 at position 710
SubKey n:08 guessed value: 0xab - real value 0xab at position 570
SubKey n:09 guessed value: 0xf7 - real value 0xf7 at position 620
SubKey n:10 guessed value: 0x15 - real value 0x15 at position 670
SubKey n:11 guessed value: 0x88 - real value 0x88 at position 721
SubKey n:12 guessed value: 0x09 - real value 0x09 at position 581
SubKey n:13 guessed value: 0xcf - real value 0xcf at position 631
SubKey n:15 guessed value: 0x3c - real value 0x3c at position 731

Figure 3.24: DPA on SubBytes() attack results.

Figure 3.25 shows the detail of the correct identification of the first secret KEY byte (picture on the right) versus
all the other guesses (picture on the left): the maximum peak in the SubBytes(0) window among all the guesses
is at sample number 548, with a value of about 0.0094, while all the others are below 0.0056.

Figure 3.25: DPA SubBytes(0), byte 0 identification.

Both these approaches can lead to the correct identification of the secret KEY using some insight
into how the algorithm is implemented.

A way to classify the efficiency of a method is to see how many traces are needed to reach the
desired result. To measure it, the acquired traces, each coupled with its supplied plain text, have been
shuffled to obtain three independent, randomly ordered data sets. Each data set has been evaluated with
an increasing number of traces, counting the number of correctly identified secret KEY bytes. To obtain
a stable value, the average of the results on the three shuffled data sets is used as the correctness
indicator and shown in the graphs.
Figure 3.26 shows the number of traces required using AddRoundKey() with group selection on
HW >= 4: about 320 traces are needed in this case.

Figure 3.26: DPA AddRoundKey(0), with group selection on HW >= 4, traces needed to get correct
secret KEY identification.

Figure 3.27 shows the number of traces required using SubBytes() with group selection on the bit 0
value: about 600 traces are needed in this case.

Figure 3.27: DPA SubBytes(0) with group selection on bit 0 value, traces needed to get correct secret
KEY identification.

Figure 3.28 shows the number of traces required using SubBytes() with group selection on HW
>= 4: about 80 traces are needed in this case.

Figure 3.28: DPA SubBytes(0) with group selection on HW >= 4, traces needed to get correct secret
KEY identification.

So the HW criterion used to group the traces greatly improves the detection effectiveness, reducing
the number of acquired traces needed.
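
The trace-count evaluation described in this section (three shuffled orderings, an increasing number of traces, average number of correctly identified KEY bytes) can be sketched as follows. It is a hedged illustration only: the leakage is simulated as one sample per key byte at the AddRoundKey() output, only four key bytes are used to keep it short, and the names `dpa_hw_guess` and `success_curve`, the noise level and the trace counts are assumptions of the example.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

def dpa_hw_guess(plain_byte, samples):
    """Difference of means with groups split on HW(plain ^ guess) >= 4."""
    diffs = np.full(256, -np.inf)
    for guess in range(256):
        high = HW[plain_byte ^ guess] >= 4
        if high.any() and not high.all():      # both groups must be non-empty
            diffs[guess] = samples[high].mean() - samples[~high].mean()
    return int(np.argmax(diffs))               # positive HW/power relation assumed

def success_curve(plain, leak, key, counts, n_shuffles=3, seed=0):
    """Average number of correctly recovered key bytes for each trace count."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(len(counts))
    for _ in range(n_shuffles):
        order = rng.permutation(len(plain))    # one independent random ordering
        p, l = plain[order], leak[order]
        for i, n in enumerate(counts):
            totals[i] += sum(
                int(dpa_hw_guess(p[:n, b], l[:n, b]) == key[b])
                for b in range(len(key))
            )
    return totals / n_shuffles

# Simulated data set (assumption): 2000 traces, first four key bytes only.
rng = np.random.default_rng(42)
key = np.array([0x2B, 0x7E, 0x15, 0x16], dtype=np.uint8)
plain = rng.integers(0, 256, size=(2000, key.size), dtype=np.uint8)
leak = HW[plain ^ key] + rng.normal(0.0, 0.5, size=plain.shape)

counts = [50, 500, 2000]
curve = success_curve(plain, leak, key, counts)
print(dict(zip(counts, curve)))
```

The smallest trace count at which the curve reaches the full number of key bytes plays the role of the "traces needed" figure reported in the graphs.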

3.6.5 Correlation Power Analysis
A further improvement can be achieved using the Correlation Power Analysis algorithm (CPA from
here on) [27].
It works by applying the Pearson Correlation Coefficient (PCC from now on), shown in Equation 3.9,
to select the most probable secret KEY value using a relatively small number of traces.

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \cdot \sigma_Y} \tag{3.9}$$

Equation 3.9 can be written for discrete sample sequences x and y, resulting in Equation 3.10,
where x̄ and ȳ are the mean values of x and y respectively.
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \cdot \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{3.10}$$

The PCC gives us a number that shows how much two sets of data are linearly correlated; the
result is a number in the range [-1,1] where the magnitude says how well the two data series are linearly
correlated: 1 is perfect correlation while 0 is uncorrelated, and the sign gives us the slope of correlation.
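
As a minimal sketch, Equation 3.10 can be written directly in Python with NumPy and compared against `numpy.corrcoef`. The function name `pearson` and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation coefficient, term by term as in Equation 3.10."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                          # (x_i - x_mean)
    dy = y - y.mean()                          # (y_i - y_mean)
    num = (dx * dy).sum()
    den = np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())
    return float(num / den)

x = [1.0, 2.0, 3.0, 4.0]
r_up = pearson(x, [2.0, 4.0, 6.0, 8.0])        # same slope: close to +1
r_down = pearson(x, [8.0, 6.0, 4.0, 2.0])      # opposite slope: close to -1
r_ref = float(np.corrcoef(x, [2.0, 4.0, 6.0, 8.0])[0, 1])
print(r_up, r_down, r_ref)
```

Note that the denominator is zero when one of the two sequences is constant, so such sequences have to be skipped or handled separately.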

CPA uses the same leak model used for DPA, where the absorbed power is a linear function of the
Hamming Weight of the data to be written in memory, plus some noise (see Equation 3.11).
Under this assumption, if a and b are constant and the noise is negligible, the PCC magnitude will be 1,
with a sign equal to the sign of a. Note that we don't need to evaluate the values of a and b; we just
assume they have small variations during the complete data acquisition procedure.

$$P_a = a \cdot HW(E_x) + b \tag{3.11}$$
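
The claim about the PCC under the linear model of Equation 3.11 can be checked numerically. The sketch below uses arbitrary values for a and b and an arbitrary noise level; all of them are assumptions of the example, not measured quantities.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])
rng = np.random.default_rng(1)
hw = HW[rng.integers(0, 256, size=500)]        # HW(Ex) for 500 random bytes

def pcc(x, y):
    return float(np.corrcoef(x, y)[0, 1])

r_pos = pcc(hw, 0.02 * hw + 0.1)               # a > 0, no noise
r_neg = pcc(hw, -0.02 * hw + 0.1)              # a < 0, no noise
noisy = 0.02 * hw + 0.1 + rng.normal(0.0, 0.02, size=hw.size)
r_noisy = pcc(hw, noisy)                       # with noise the magnitude drops below 1
print(r_pos, r_neg, r_noisy)
```

Since the PCC is invariant to scaling and offset, the values of a and b never have to be estimated; only the noise reduces the magnitude of the coefficient.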

One of the arrays to be used is the expected power (let's call it y): it has one element for each
acquired trace, calculated from the secret KEY guess and the relevant plain text byte of that trace, as
depicted in Figure 3.29.

Plaintext (one row per acquired trace, one column per byte):

Byte n       0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
Trace 01  [122, 157, 142, 142, 209, 173,  22, 113,  63, 122, 101, 202, 239,  11, 233,  75, 245]
Trace 02  [183, 223,  69,  69,  60,  59, 201, 241,  86, 200, 170,  89, 130,  22, 210, 154,  11]
Trace 03  [ 68, 122, 173, 173, 223, 119,   0, 144, 239,  69,  97, 140, 242,   7, 223, 190, 239]
Trace 04  [227, 159, 238, 238,  17,   4,  71, 158,  26, 120,  43, 238, 134,  55, 247, 129, 103]
Trace 05  [103, 158, 166, 166, 103, 224,  35,  35,  60, 142, 171, 226, 219,   5, 232, 241, 241]
Trace 06  [108,  65, 216, 216, 128,  48, 164, 187, 195, 217, 138,  81, 236,  29, 128,  10, 222]
Trace 07  [ 27, 181,  80,  80, 125, 126,  33, 234,  43, 104, 142, 102, 213, 171, 192, 141, 112]
Trace 08  [151,  59, 194, 194, 196, 124, 110,  90,  37,  39, 149, 177,  16, 228,  88, 132, 123]
Trace …   [  …,   …,   …,   …,   …,   …,   …,   …,   …,   …,   …,   …,   …,   …,   …,   …,   …]

Blocks in the figure: Guess Key[0]; AES intermediate value Ex(.); Power estimator Pa(.)

Figure 3.29: Power array estimator for guesses on Key[0].
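
A possible way to build the y arrays for all 256 guesses on Key[0] is sketched below, using the byte 0 column of the eight plain texts listed in Figure 3.29 and the AddRoundKey() output as intermediate value. The variable names are illustrative assumptions; the scale factors a and b of the leak model are deliberately left out because the PCC does not depend on them.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

# Byte 0 of the plain text of traces 01..08 (values from Figure 3.29).
plain_byte0 = np.array([122, 183, 68, 227, 103, 108, 27, 151], dtype=np.uint8)
guesses = np.arange(256, dtype=np.uint8)

# Ex: intermediate value for every (guess, trace) pair, AddRoundKey() model.
intermediate = plain_byte0[None, :] ^ guesses[:, None]     # shape (256, 8)

# y: expected power up to the unknown a and b, one row per key guess.
y_all = HW[intermediate]

print(y_all[0x00])    # guess 0x00: Hamming Weight of the plain text itself
print(y_all[0x2B])    # guess 0x2b
```

For the SubBytes() Attack Point the same construction applies, with the S-box look-up inserted before the Hamming Weight.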

The second array (let's call it x) is made of the samples of all the acquired traces at the same relative
time (e.g. sample number 0 of 20 traces gives an array of 20 elements), as depicted in Figure 3.30.

Sampled power data (one row per acquired trace, one column per sample):

Sample n       0      1      2      3      4      5      6      7      8      9     10  …
Trace 01  [ 0.23,  0.25,  0.14,  0.50,  0.27,  0.17,  0.08,  0.15,  0.00, -0.18,  0.02, …
Trace 02  [ 0.29,  0.26,  0.14,  0.50,  0.26,  0.15,  0.07,  0.14, -0.01, -0.19,  0.02, …
Trace 03  [ 0.30,  0.28,  0.14,  0.50,  0.27,  0.16,  0.07,  0.14,  0.00, -0.18,  0.01, …
Trace 04  [ 0.30,  0.27,  0.14,  0.50,  0.26,  0.15,  0.09,  0.15,  0.00, -0.18,  0.01, …
Trace 05  [ 0.31,  0.25,  0.13,  0.50,  0.27,  0.17,  0.09,  0.15,  0.00, -0.18,  0.01, …
Trace 06  [ 0.32,  0.25,  0.14,  0.50,  0.25,  0.15,  0.08,  0.16,  0.00, -0.18,  0.01, …
Trace 07  [ 0.30,  0.27,  0.14,  0.50,  0.25,  0.14,  0.08,  0.15,  0.01, -0.18,  0.03, …
Trace 08  [ 0.30,  0.25,  0.14,  0.50,  0.26,  0.15,  0.09,  0.15,  0.00, -0.20,  0.00, …
Trace …   [    …,     …,     …,     …,     …,     …,     …,     …,     …,     …,     …, …

x = [0.23, 0.29, 0.30, 0.30, 0.31, 0.32, 0.30, 0.30, …]

Figure 3.30: Sample array from acquired traces at sample n. 0.
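
With the traces stored as a two-dimensional array (one row per trace, which is an assumption about the data layout), the x array is simply one column. The sketch below uses the first four samples of the eight traces listed in Figure 3.30.

```python
import numpy as np

# First four samples of traces 01..08 (values from Figure 3.30).
traces = np.array([
    [0.23, 0.25, 0.14, 0.50],
    [0.29, 0.26, 0.14, 0.50],
    [0.30, 0.28, 0.14, 0.50],
    [0.30, 0.27, 0.14, 0.50],
    [0.31, 0.25, 0.13, 0.50],
    [0.32, 0.25, 0.14, 0.50],
    [0.30, 0.27, 0.14, 0.50],
    [0.30, 0.25, 0.14, 0.50],
])

x = traces[:, 0]                 # sample n. 0 of every trace
print(x)

# A column with zero variance cannot be correlated (zero denominator in Eq. 3.10).
constant_columns = np.flatnonzero(traces.std(axis=0) == 0.0)
print(constant_columns)
```

In these eight rows sample n. 3 is constant, so a practical implementation would need to skip or guard such columns before computing the PCC.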

The PCC is computed for each sample using the same y array and the x array related to the sample
under test. The result is an array of PCCs with the same length as a trace, see Figure 3.31. Only the
maximum value of this array (and possibly its position) is relevant for the single-byte key
identification. In general the absolute value is considered, because the relationship between HW and
absorbed power can be either positive or negative depending on how the data capture is made; in the
CWNano case we know it is positive, so we can focus on positive values only. This process is
then repeated for all the key guesses and for all the key bytes to discover the complete secret key.

Figure 3.31: Correlation Power Analysis schema.
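
The complete procedure for one key byte can be sketched in vectorized form as follows. This is an illustration under stated assumptions: the traces are simulated (Gaussian noise plus a Hamming Weight leak at one sample position), the AddRoundKey() output is used as intermediate value, and the function name `cpa_byte`, the leak position and the amplitudes are invented for the example.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

def cpa_byte(plain_byte, traces):
    """Return (best_guess, sample_position, pcc) for one key byte."""
    guesses = np.arange(256, dtype=np.uint8)
    y = HW[plain_byte[None, :] ^ guesses[:, None]].astype(float)   # (256, n_traces)
    x = traces - traces.mean(axis=0)                               # (n_traces, n_samples)
    y = y - y.mean(axis=1, keepdims=True)
    num = y @ x                                                    # (256, n_samples)
    den = np.sqrt((y ** 2).sum(axis=1))[:, None] * np.sqrt((x ** 2).sum(axis=0))[None, :]
    pcc = num / den                                                # Equation 3.10 everywhere
    guess, pos = np.unravel_index(np.argmax(pcc), pcc.shape)       # positive values only
    return int(guess), int(pos), float(pcc[guess, pos])

# Simulated acquisition (assumption): 200 traces of 50 samples, leak at sample 30.
rng = np.random.default_rng(0)
n_traces, n_samples, leak_pos, key_byte = 200, 50, 30, 0x2B
plain = rng.integers(0, 256, size=n_traces, dtype=np.uint8)
traces = rng.normal(0.0, 0.01, size=(n_traces, n_samples))
traces[:, leak_pos] += 0.02 * HW[plain ^ key_byte]

guess, pos, value = cpa_byte(plain, traces)
print(hex(guess), pos, value)
```

Taking `np.argmax` on the signed PCC matches the positive HW to power relationship stated for the CWNano; for a setup with unknown polarity the maximum of the absolute value would be searched instead.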

As a first step, CPA is applied to the first Attack Point, the output of the AddRoundKey() function, as
shown in Figure 3.32.

Figure 3.32: PCC on AddRoundKey(0), first 1000 samples.

Again there is a Ghost Peak visible in the BlockCopy() function window. This is easily explained:
when the KEY guess is 0, the AddRoundKey() output is exactly the plain text value, which is the data
handled by the BlockCopy() function. The details of the acquired samples are shown in Figure 3.33.

Figure 3.33: PCC on AddRoundKey(0) Ghost Peak detail in BlockCopy() window.
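
The origin of this Ghost Peak can be reproduced without any trace. The sketch below assumes that BlockCopy() leaks the Hamming Weight of the plain text byte and correlates it with the AddRoundKey() hypothesis of every key guess, over all 256 plain text values; the variable names are illustrative.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

plain = np.arange(256, dtype=np.uint8)     # every plain text value exactly once
copy_leak = HW[plain]                      # assumed leak of BlockCopy(): HW(plain text)

# PCC between the AddRoundKey() hypothesis HW(plain ^ guess) and the BlockCopy() leak.
ghost = np.array([
    np.corrcoef(HW[plain ^ g], copy_leak)[0, 1] for g in range(256)
])

print(ghost[0x00])   # guess 0: the hypothesis is the plain text itself
print(ghost[0x2B])   # the real first key byte, Hamming Weight 4
print(ghost[0xFF])   # all bits flipped
```

For uniformly distributed bytes this correlation equals 1 - HW(guess)/4, so in the BlockCopy() window the guess 0 always wins, which is why the search for the maximum has to be restricted to the AddRoundKey(0) window.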

The correlation we are looking for appears later, at a higher sample index, as depicted in Figure 3.34.

Figure 3.34: PCC on AddRoundKey(0) Peak detail in AddRoundKey(0) window.

The complete process, with the search for the maximum restricted to the AddRoundKey(0) window, is
able to identify the correct secret key, see Figure 3.35.

byte:00 - 2b OK - Pos: 258 PCC: 0.8090464806115729
byte:01 - 7e OK - Pos: 271 PCC: 0.8354009375878052
byte:02 - 15 OK - Pos: 283 PCC: 0.8342175527632381
byte:03 - 16 OK - Pos: 297 PCC: 0.7843584960046946
byte:04 - 28 OK - Pos: 318 PCC: 0.8588368993055520
byte:05 - ae OK - Pos: 332 PCC: 0.8537232609962971
byte:06 - d2 OK - Pos: 345 PCC: 0.8791922099081197
byte:08 - ab OK - Pos: 380 PCC: 0.7599447137533424
byte:09 - f7 OK - Pos: 393 PCC: 0.8119241752369173
byte:10 - 15 OK - Pos: 405 PCC: 0.8661184533292429
byte:11 - 88 OK - Pos: 419 PCC: 0.8362670369931144
byte:12 - 09 OK - Pos: 440 PCC: 0.7978349837789444
byte:13 - cf OK - Pos: 454 PCC: 0.6730465814729092
byte:14 - 4f OK - Pos: 467 PCC: 0.8107924650751189
byte:15 - 3c OK - Pos: 480 PCC: 0.7696166996511269

Figure 3.35: PCC on AddRoundKey(0) results restricted to AddRoundKey(0) window.

CPA is then applied to the second Attack Point, the output of the SubBytes(0) function, as shown in
Figure 3.36.

Figure 3.36: PCC on SubBytes(0), first 1000 samples.

The detail of the correlation peak is visible in Figure 3.37. Note that in this case the Ghost Peak
is not present at the beginning of the trace; this happens because there is no correlation between the
estimated power array and the plain text handled by the BlockCopy() function.

Figure 3.37: PCC on SubBytes(0) Peak detail.

The complete process applied on SubBytes(0) is able to identify the correct secret key without
window restriction as shown in Figure 3.38:

byte:00 - 2b OK - Pos: 548 PCC: 0.8251359345665976
byte:01 - 7e OK - Pos: 598 PCC: 0.9028572174914622
byte:02 - 15 OK - Pos: 648 PCC: 0.8667232746940737
byte:03 - 16 OK - Pos: 698 PCC: 0.9162618575062188
byte:04 - 28 OK - Pos: 559 PCC: 0.8239385753611774
byte:05 - ae OK - Pos: 609 PCC: 0.8117521284154102
byte:06 - d2 OK - Pos: 660 PCC: 0.8144383121109557
byte:08 - ab OK - Pos: 570 PCC: 0.8639884468362278
byte:09 - f7 OK - Pos: 620 PCC: 0.8695165115842424
byte:10 - 15 OK - Pos: 889 PCC: 0.8920412937657465
byte:11 - 88 OK - Pos: 721 PCC: 0.8353320933038173
byte:12 - 09 OK - Pos: 581 PCC: 0.8968311919205509
byte:13 - cf OK - Pos: 631 PCC: 0.8963570395501116
byte:14 - 4f OK - Pos: 681 PCC: 0.9204068436012930
byte:15 - 3c OK - Pos: 731 PCC: 0.9076983019850139

Figure 3.38: PCC on SubBytes(0) results with no window restriction.

As done before, Figure 3.39 shows the number of traces required for the complete and correct
identification of the secret KEY, over 3 random trace selections, when CPA is applied to
AddRoundKey(0): about 95 traces are needed.

Figure 3.39: CPA on AddRoundKey(0), traces needed to get correct KEY identification.

Figure 3.40 shows the number of traces required for the complete and correct identification of the
secret KEY, over 3 random trace selections, when CPA is applied to SubBytes(0): about 30 traces are needed.

Figure 3.40: CPA on SubBytes(0), traces needed to get correct KEY identification.

Chapter 4

Conclusion

4.1 Results Evaluation

The work has been carried out looking at two possible Attack Points and two methods to analyze the
acquired data traces.
As regards the Attack Points, it has been shown that it is preferable to choose a point where
the data saved in memory has a poor correlation, at single-bit level, with the plain text supplied
to the encryption algorithm. Despite this, it is still possible to choose a non-optimal Attack Point at
the price of using a higher number of acquired data traces.
The possibility of attacking an algorithm at two points makes it possible, if necessary, to reconstruct
the part of the algorithm between these points when it is unknown or only partially known. The look-up
table implemented by the SubBytes() function can be rebuilt after identifying the secret KEY through the
AddRoundKey() Attack Point, and then focusing on the output of SubBytes() using selected plain
texts, with the aim of identifying the bytes stored in the look-up table. In this case this activity was
unnecessary, since the look-up table contents are clearly reported in the AES specification [21].
As reported in Table 4.1, DPA requires, in general, more data traces than CPA to get a correct KEY
identification; furthermore, DPA needs a better separation between the groups in terms of
Hamming Weight to be able to correctly identify the secret KEY. This means that if only a small number
of data traces is available and the plain text does not have a good random distribution, one group may
not be big enough to compute a good average.
CPA is less expensive in terms of the number of acquired data traces needed, even if it is more
complex from a computational perspective.

                   Attack Points
Algorithm       AddRoundKey()    SubBytes()
DPA (bit 0)     Doesn't work     600
DPA (HW=4)      320              80
CPA             95               30

Table 4.1: Required traces for a successful attack: results comparison.

The comparison results can be summarized as follows, in terms of required traces:

• CPA works better than any DPA variant (around 3 times fewer traces)

• DPA works better when the groups are selected with a larger Hamming Weight distance (around 7
times fewer traces)

• Attack Points where the Hamming Weight is less correlated with the plain text data work
better
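
The approximate factors quoted in this summary follow directly from the figures of Table 4.1, as the short computation below shows (the dictionary layout is only an illustrative choice).

```python
# Required traces from Table 4.1 ("Doesn't work" entries are omitted).
required = {
    ("DPA (bit 0)", "SubBytes"): 600,
    ("DPA (HW=4)", "AddRoundKey"): 320,
    ("DPA (HW=4)", "SubBytes"): 80,
    ("CPA", "AddRoundKey"): 95,
    ("CPA", "SubBytes"): 30,
}

# CPA versus the best DPA variant, for each Attack Point.
cpa_gain = {
    point: required[("DPA (HW=4)", point)] / required[("CPA", point)]
    for point in ("AddRoundKey", "SubBytes")
}

# HW grouping versus bit 0 grouping for DPA on SubBytes().
hw_gain = required[("DPA (bit 0)", "SubBytes")] / required[("DPA (HW=4)", "SubBytes")]

print(cpa_gain, hw_gain)
```

The ratios are about 3.4 and 2.7 for CPA versus DPA, and 7.5 for the HW grouping versus the bit 0 grouping, consistent with the rounded factors given in the list.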

The computational effort required by both methodologies is small enough to be easily handled by
a standard-performance PC, like the ones typically used in a home or professional environment.
The implementation of the algorithms doesn't require expensive or complicated software
environments: they can be easily coded in Python, which is completely free and available on the net,
together with some libraries, also free; this is the most common choice.
The availability of many examples with the related implementation guides makes these methodologies
easily accessible even to relatively inexperienced users.
A stumbling block is still the hardware side of the work needed to acquire the desired data traces.
Even if it is simple, it still requires adequate tools and the skills necessary to modify or build an
interface for acquiring the physical quantity under investigation (the absorbed current in the case of
this work). The inexpensive tools available today, such as the ChipWhisperer used for this work, can give
access to these methodologies to a larger audience of users; however, the relatively low sample rate and
the need to interface it to a real victim still create a sort of barrier for the completely inexperienced
user. Some different configurations are documented on the net, using more sophisticated instruments such
as a PicoScope or a standard high-performance digital acquisition oscilloscope, with a ChipWhisperer used
only to manage the communication with the target and the relevant acquisition trigger; in this
case the cost of the tools increases considerably.

4.2 Known Countermeasures

Attack methodologies undergo continuous evolution, and so do the countermeasures that make them less
effective. The delay in identifying countermeasures and the difficulty of implementing them on devices
already released on the market are determining factors in the choice of their use.
All the countermeasures that require hardware changes can certainly be taken into account for new
products, and can be introduced on products already in production by raising their modification
level, even if this is much more difficult and expensive. Products already delivered to customers are
harder still. Sometimes, if the product is cheap enough, like a smart card used to
access bank accounts, the old devices can simply be thrown away and replaced.
For complex and costly products, countermeasures that only impact software are more appropriate.
In many cases these products can receive updates remotely (think of a router or
a mobile phone), which makes the update easier and interrupts their operation for a much shorter time.
The goal of the countermeasures is to hide the relevant side-channel information leak, leveraging the
leak model used. With reference to [28] and [29], a non-exhaustive list of countermeasures, both hardware
and software, that can be adopted today to deal with the attack methodologies described in this work
is:

• Hiding: this kind of countermeasure is typically implemented in hardware; its goal is to remove
the dependency of the current absorption on the data, which is due to the different behavior of HIGH to
LOW and LOW to HIGH switching (HW). Many different implementations have been proposed;
the most referenced one uses memory writes with both the value and its logical complement,
as in Dual Rail Logic [30]. Less complex solutions try to filter out the high-frequency part of
the absorbed current, sometimes with a simple capacitor placed very close to the relevant power
supply pins, or with more complex solutions such as [31].
This method can also be applied in software, balancing the number of set bits by processing a
word that contains both the valid data and its complement.
This is, in theory, a very effective method; however, the data dependency is not completely removed,
due to physical differences between different parts of the device, and it can usually be bypassed
with more precise measurements and signal processing analysis.

• Increasing noise: adding noise, either in hardware (on-chip noise generators, executing
different instructions in parallel, widening the data path) or in software (doing things in parallel),
is another way to hide the exploitable power components. These solutions do not completely hide
the exploitable side-channel differences in power absorption either, but they can make the attack
harder.

• Masking: also known as secret sharing, it can be implemented in either hardware or software. The
goal is to randomize the intermediate values while keeping the same input and output as the standard
algorithm. In hardware it is typically achieved through masked logic elements, working at gate level
with a randomly selected bit mask sequence applied to the masked elementary logic function. In
software it can be achieved in a similar way [32], for example by applying (XOR) the random
mask at a certain point of the algorithm and again at the end, in a way that doesn't change the
algorithm results; in this case too, the mask is randomly chosen and changed at every algorithm
execution, or sooner if possible.

• Random shuffle: implemented in either hardware or software [33], the goal of this method is to
avoid the time correspondence (sample position) between different traces.

• Random instruction insertion: implemented in either hardware [34] or software, the goal of this
method is both to avoid the time correlation (sample position) between different traces and to add
some noise due to the execution of unexpected instructions.
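
The software variants of three of the countermeasures listed above (hiding through a balanced word, masking of the S-box look-up, random shuffle of the byte order) can be sketched as follows. This is a functional illustration only, not a hardened implementation: the function names are hypothetical, `random.Random` is used purely for reproducibility and is not a suitable source of masks, and Python itself gives no control over the actual power consumption.

```python
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox():
    """AES S-box from its definition: field inverse, then affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

def balanced_word(value):
    """Hiding: pack a byte with its complement, so the word always has HW 8."""
    return (value << 8) | (value ^ 0xFF)

def masked_table(sbox, m_in, m_out):
    """Masking: table T such that T[x ^ m_in] == sbox[x] ^ m_out."""
    table = [0] * 256
    for x in range(256):
        table[x ^ m_in] = sbox[x] ^ m_out
    return table

def shuffled_sub_bytes(state, sbox, rng):
    """Random shuffle: substitute the state bytes in a random order."""
    order = list(range(len(state)))
    rng.shuffle(order)
    out = list(state)
    for i in order:
        out[i] = sbox[state[i]]
    return out

SBOX = build_sbox()
rng = random.Random(0)                      # illustration only, not a secure generator
key_byte, plain_byte = 0x2B, 0x7A
m_in, m_out = rng.randrange(256), rng.randrange(256)
table = masked_table(SBOX, m_in, m_out)
masked_out = table[(plain_byte ^ m_in) ^ key_byte]   # unmasked value never appears
unmasked = masked_out ^ m_out                        # mask removed only at the end
print(hex(unmasked), hex(SBOX[plain_byte ^ key_byte]))
```

With fresh masks at every execution, the value looked up in the table is no longer a fixed function of plain text and key, which is what breaks the first-order leak model used by DPA and CPA.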

4.3 Future Developments

As in a war, the exploitation of a vulnerability linked to a side channel and the identification of the
related countermeasures follow each other until a definitive solution is identified, perhaps involving
several technologies or making it cheaper to direct efforts toward another side channel or methodology.
All the countermeasures listed above are intended to make side-channel attacks much
more complicated, reducing their effectiveness in terms of costs and benefits. The natural continuation
of this work consists in studying the countermeasures listed in the previous section and the related
attack methods. Several activities are already present in the literature (see [35]), such as higher-order
attacks [36], alignment algorithms [37] and noise removal from the acquired data [38].
One additional possible evolution is the study of a way to automate the identification of
the countermeasures in use, one or more, and to define the right methodology or set of methodologies
to put an effective attack in place.

Bibliography

[1] S. Heath, "1 - What is an embedded system?," in Embedded Systems Design (Second Edition)
(S. Heath, ed.), pp. 1-14, Oxford: Newnes, second edition ed., 2002.

[2] I. Stellios, P. Kotzanikolaou, M. Psarakis, C. Alcaraz, and J. Lopez, "A survey of IoT-enabled
cyberattacks: Assessing attack paths to critical infrastructures and services," IEEE Communications
Surveys & Tutorials, vol. 20, no. 4, pp. 3453-3495, 2018.

[3] S. Kumar, H. Kumar, and G. R. Gunnam, "Security integrity of data collection from smart
electric meter under a cyber attack," in 2019 2nd International Conference on Data Intelligence
and Security (ICDIS), pp. 9-13, 2019.

[4] "U.S. GAO - Vehicle cybersecurity: DOT and industry have efforts under way, but DOT needs to
define its role in responding to a real-world attack." https://www.gao.gov/products/GAO-16-350.
(Accessed on 08/31/2020).

[5] C. Biesecker, "Connected aircraft open new cyber threat vectors to commercial aviation,
Thales USA chief warns."
https://www.defensedaily.com/connected-aircraft-open-new-cyber-threat-vectors-commercial-aviation-thales-usa-chief-warns/cyber/,
2018. (Accessed on 01/01/2021).

[6] Kskhh, "A Sony Bravia smart TV showing the home screen." (CC BY-SA 4.0).

[7] S. Soltan, P. Mittal, and H. V. Poor, "BlackIoT: IoT botnet of high wattage devices can disrupt
the power grid," in 27th USENIX Security Symposium (USENIX Security 18), (Baltimore, MD),
pp. 15-32, USENIX Association, Aug. 2018.

[8] S. Fruitsmaak, "An artificial pacemaker (serial number 1723182) from St. Jude Medical, with
electrode. The body of the device is about 4 centimeters long, and the electrode measures roughly
58 centimeters." (CC BY 3.0).

[9] D. Quarta, M. Pogliani, M. Polino, F. Maggi, A. M. Zanchettin, and S. Zanero, "An experimental
security analysis of an industrial robot controller," in 2017 IEEE Symposium on Security and
Privacy (SP), pp. 268-286, IEEE, 2017.

[10] P. Prinetto and G. Roascio, "Hardware security, vulnerabilities, and attacks: a comprehensive
taxonomy," CEUR Workshop Proceedings, 2020.

[11] FBI, "Polygraph picture, public domain." (Accessed on 01/12/2021).

[12] T. Hunkin, "Illegal engineering." https://www.timhunkin.com/94_illegal_engineering.htm.
(Accessed on 11/06/2021).

[13] D. Easter, "The impact of `tempest' on anglo-american communications security and intelligence,
1943-1970," Intelligence and National Security, pp. 1-16, jul 2020. (Accessed on 08/26/2020).

[14] U. S. G. P. Office, "Teletypewriter set 131b2."
http://www.navy-radio.com/manuals/tty/fgq1-tm-11-2209.pdf, 1946. pag 59.

[15] H. Wang, D. Ji, Y. Zhang, K. Chen, J. Chen, and Y. Wang, "Optical side channel attacks on
singlechip," in Proceedings of the 2015 International Conference on Industrial Technology and
Management Science, Atlantis Press, 2015.

[16] D. Genkin, A. Shamir, and E. Tromer, "Acoustic cryptanalysis," Journal of Cryptology, vol. 30,
pp. 392-443, feb 2016.

[17] M. Hutter and J.-M. Schmidt, "The temperature side channel and heating fault attacks," in Smart
Card Research and Advanced Applications, pp. 219-235, Springer International Publishing, 2014.

[18] RAMBUS, "DPA workstation testing platform."
https://www.rambus.com/security/dpa-countermeasures/. rev 04.

[19] "CW1101 ChipWhisperer-Nano." https://rtfm.newae.com/Capture/ChipWhisperer-Nano/.
(Accessed on 14/04/2021).

[20] J. Daemen and V. Rijmen, The design of Rijndael: AES - the Advanced Encryption Standard.
Springer-Verlag, 2002.

[21] "FIPS PUB 197: Advanced encryption standard (AES)," tech. rep., National Institute of
Standards and Technology, nov 2001.

[22] "CW1101 ChipWhisperer-Nano - API description."
https://chipwhisperer.readthedocs.io/en/latest/api.html. (Accessed on 11/06/2021).

[23] F.-X. Standaert, "Introduction to side-channel attacks," in Integrated Circuits and Systems,
pp. 27-42, Springer US, dec 2009.

[24] M.-L. Akkar, R. Bevan, P. Dischamp, and D. Moyart, "Power analysis, what is now possible...," in
Advances in Cryptology - ASIACRYPT 2000 (T. Okamoto, ed.), (Berlin, Heidelberg), pp. 489-502,
Springer Berlin Heidelberg, 2000.

[25] P. Kocher, J. Jaffe, B. Jun, and P. Rohatgi, "Introduction to differential power analysis," Journal
of Cryptographic Engineering, vol. 1, pp. 5-27, mar 2011.

[26] Y. Han, X. Zou, Z. Liu, and Y. Chen, "Efficient DPA attacks on AES hardware implementations,"
International Journal of Communications, Network and System Sciences, vol. 01, no. 01,
pp. 68-73, 2008.

[27] E. Brier, C. Clavier, and F. Olivier, "Correlation power analysis with a leakage model," in Lecture
Notes in Computer Science, pp. 16-29, Springer Berlin Heidelberg, 2004.

[28] L. Zhang, L. Vega, and M. Taylor, "Power side channels in security ICs: Hardware
countermeasures," May 2016.

[29] D. Das, M. Nath, S. Ghosh, and S. Sen, "Killing EM side-channel leakage at its source," IEEE,
aug 2020.

[30] J.-L. Danger, S. Guilley, S. Bhasin, and M. Nassar, "Overview of dual rail with precharge logic
styles to thwart implementation-level attacks on hardware cryptoprocessors," in 2009 3rd
International Conference on Signals, Circuits and Systems (SCS), IEEE, nov 2009.

[31] M. Kar, A. Singh, S. K. Mathew, A. Rajan, V. De, and S. Mukhopadhyay, "Reducing power
side-channel information leakage of AES engines using fully integrated inductive voltage regulator,"
IEEE Journal of Solid-State Circuits, vol. 53, pp. 2399-2414, aug 2018.

[32] S. Bhasin, J.-L. Danger, S. Guilley, and Z. Najm, "A low-entropy first-degree secure provable
masking scheme for resource-constrained devices," in Proceedings of the Workshop on Embedded
Systems Security - WESS '13, ACM Press, 2013.

[33] N. Veyrat-Charvillon, M. Medwed, S. Kerckhof, and F.-X. Standaert, "Shuffling against
side-channel attacks: A comprehensive study with cautionary note," in Advances in Cryptology -
ASIACRYPT 2012 (X. Wang and K. Sako, eds.), (Berlin, Heidelberg), pp. 740-757, Springer
Berlin Heidelberg, 2012.

[34] J. A. Ambrose, R. G. Ragel, and S. Parameswaran, "A smart random code injection to mask
power analysis based side channel attacks," in Proceedings of the 5th IEEE/ACM international
conference on Hardware/software codesign and system synthesis - CODES+ISSS '07,
ACM Press, 2007.

[35] C. Clavier, J.-S. Coron, and N. Dabbous, "Differential power analysis in the presence of hardware
countermeasures," in Cryptographic Hardware and Embedded Systems - CHES 2000, pp. 252-263,
Springer Berlin Heidelberg, 2000.

[36] K. Lemke-Rust and C. Paar, "Gaussian mixture models for higher-order side channel analysis,"
in Cryptographic Hardware and Embedded Systems - CHES 2007 (P. Paillier and I. Verbauwhede,
eds.), (Berlin, Heidelberg), pp. 14-27, Springer Berlin Heidelberg, 2007.

[37] Q. Tian and S. A. Huss, "A general approach to power trace alignment for the assessment of
side-channel resistance of hardened cryptosystems," in 2012 Eighth International Conference on
Intelligent Information Hiding and Multimedia Signal Processing, IEEE, jul 2012.

[38] T.-H. Le, J. Clediere, C. Serviere, and J.-L. Lacoume, "Noise reduction in side channel attack
using fourth-order cumulant," IEEE Transactions on Information Forensics and Security, vol. 2,
no. 4, pp. 710-720, 2007.
