A Novel Completeness Test and Its Application To Side Channel Attacks and Simulators
Abstract
Today’s side channel attack targets are often complex devices in which
instructions are processed in parallel and work on 32-bit data words. Con-
sequently, the state that is involved in producing leakage in these modern
devices is large, and basing evaluations (i.e. worst case attacks), simu-
lators, and assumptions for (masking) countermeasures on a potentially
incomplete state can lead to drastically wrong conclusions.
We put forward a novel notion for the “completeness” of an assumed
state, together with an efficient statistical test that is based on “collapsed
models”. Our novel test can be used to recover a state that contains
multiple 32-bit variables in a grey box setting. We illustrate how our
novel test can help to guide side channel attacks and we reveal new attack
vectors for existing implementations. We also show how the application
of our statistical test shows where even the most recent leakage simulators
do not capture all available leakage of their respective target devices.
1 Introduction
Since Kocher’s seminal work [21], research has explored the properties of all
“key ingredients” that contribute to successful side channel (key recovery) at-
tacks. These key ingredients include side channel distinguishers (i.e. the way in
which side channel leakage is statistically exploited in attacks), and side chan-
nel leakage models (i.e. the way in which side channel leakage is predicted
or explained by an adversary). The latter component, the leakage model, is
crucial for attacks (a better model leads to attacks requiring fewer leakage ob-
servations), and therefore it is also of fundamental significance in the context
of security evaluations, but it also greatly matters in the context of leakage
simulators as well as for assumptions that designers make when implementing
masking countermeasures.
But what does a leakage model consist of? Informally, most of the existing
literature understands a leakage model to be a leakage function that maps a
collection of device internal variables (the state) to a real value (if it is a uni-
variate model). Considering this informal definition in the context of attacks,
it is clearly desirable to try and find a function that offers good predictions
for the true device leakage, because it enables successful attacks with fewer
traces. Thus, a lot of research has gone into deriving good estimates for leakage
functions from real device data [11, 31, 15, 35, 26]. However, the leakage function
itself is only part of what constitutes a leakage model: the state that is being
leaked on is equally relevant.
In a realistic setting (such as low to mid-range processors, or dedicated hard-
ware implementations of core crypto algorithms), finding the state is far from
easy, not only because such devices tend to be closed source. If open source descrip-
tions are available, e.g. ARM released some semi-obfuscated VHDL descrip-
tions, then these are at best architecturally similar to the commercial prod-
ucts of the same type, but they are not micro-architecturally equivalent at all.
Micro-architectural effects have been explored and exploited across many recent
works [29, 19, 27, 18, 25]. These papers show how a wrong assumption about the
state renders provably secure masking schemes completely insecure in practice.
In the context of application specific crypto cores, the situation is not better
as their descriptions are typically also not available to the public. Taking the
perspective of a designer of an application specific crypto core (who has access
to such a description), it is in principle possible to identify the components that
are active during any cycle. However, including everything that contributes, without understanding the size or relevance of each contribution, may lead to a model that is entirely impractical to work with. Thus even in
this context, a methodology to decide what is the “state that matters” would
be desirable.
Our contribution. We stress that finding the exact intermediate state from
a typical processor in a grey box setting is a long-standing problem: like many
(statistical learning) problems, a universally optimal solution is unlikely to exist.
Thus, whilst we do not claim optimality of our work, we claim the following
contributions:
effective attacks, it plays an important role in comprehensive security eval-
uations.
5. We discuss the importance of completeness in the context of simulators for
leakage detections and demonstrate that our approach can lead to better
models for simulations.
2 Preliminaries
2.1 Leakage modelling: state of the art
We use some simple notations throughout this paper: for the sake of readabil-
ity we do not include time indices in our notations. Consequently, any set of
variables, and any leakage function, is specific to a point in time during the
computation of some algorithm1 .
We call the set X the entire device state. X comprises the key- and input-dependent variables that the leakage function L acts on. A leakage observation y ∈ Y is available to the adversary. We also follow the usual convention that traces are noisy, whereby the leakage contribution L(X) and the (Gaussian) noise N(0, σ²) are independent:

y = L(X) + N(0, σ²)
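As a minimal illustration of this convention (a sketch; the helper names and the Hamming-weight choice for L are ours, purely for exposition):

import numpy as np

rng = np.random.default_rng(0)

def hamming_weight(x: np.ndarray) -> np.ndarray:
    # Popcount of each byte value: one simple illustrative choice for L(X).
    return np.unpackbits(x.astype(np.uint8)[:, None], axis=1).sum(axis=1)

def simulate_traces(q: int, sigma: float):
    x = rng.integers(0, 256, size=q)                         # device state X (one byte here)
    y = hamming_weight(x) + rng.normal(0.0, sigma, size=q)   # y = L(X) + N(0, sigma^2)
    return x, y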
ever occurred). However, such a model is clearly unusable for attacks/evaluations
(because it requires guessing the entire key) and it is also unusable for simulators
(because the estimation of its distribution is computationally infeasible).
The de facto practice in all attacks/evaluations, when building leakage mod-
els, is to divide the model building process into two steps. The first step is
identifying a concise (i.e. easily enumerable) state Z. For instance, a popular
assumption is that the intermediate state depends completely on the output of
an S-box (denoted by Sout ) computation, which leads to the cardinality of the
state being small (e.g. #{Z} = 2^8 for AES).
The second step is to estimate the leakage function assuming it only acts on
Z. Various techniques have been proposed, including naive templating [11],
regression-based modelling [31, 15], step-wise regression [38], etc. Previous works [38, 31, 17] have also proposed various metrics to evaluate/certify the device's leakage (as well as the quality of models built from the measurements). As many of these will be utilised later, the next two subsections explain these techniques in detail; then we move on to our point of interest: what should we do about the first step?
reality, in which case the model quality relates to how many of the relevant
factors it can identify. In the context of leakage attacks/evaluations, models
are used to predict side channel leaks. Therefore we use metrics such as the coefficient of determination and cross-validation to judge their quality. In the context of leakage simulators, models are built that are supposed to include as many relevant leakage sources as possible. Therefore, the quality relates to how two (or more) models compare with each other in terms of explaining the realistic leakage.
where q represents the number of traces and z (i) represents the value of z for the
i-th measurement. Meanwhile, the explained data-variance can be interpreted
as the explained sum of squares (ESS),
ESS = \sum_{i=1}^{q} \left( \tilde{L}(z^{(i)}) - \bar{y} \right)^2
F-tests Given two "nested" models (i.e. there is a so-called full model and a restricted model which consists of only a subset of the full model's terms), the F-test is the most natural way to decide whether the restricted model misses a significant contribution compared with the full model. More specifically, assuming a full model L̃f(Zf) and a restricted model L̃r(Zr), where Zr is constructed by removing zf − zr explanatory variables (i.e. setting their regression coefficients to 0) from Zf, one can compute the F-statistic as
F = \frac{(RSS_r - RSS_f)/(z_f - z_r)}{RSS_f / (q - z_f)}
The resulting F-statistic follows the F distribution with (zf − zr, q − zf) degrees of freedom. A p-value below a statistically motivated threshold rejects the null hypothesis (the two models are equivalent) and hence suggests that at least one of the removed variables is potentially useful. This approach was used in [26] to derive relatively fine-grained models on selected intermediate states.
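In code, the test takes only a few lines; the sketch below (our own helper, based on ordinary least squares) takes the design matrices of two nested models and returns the F-statistic and its p-value:

import numpy as np
from scipy import stats

def nested_f_test(y, X_full, X_restr):
    # F-test for nested linear models; X_restr's columns must be a subset of X_full's.
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)
    q, zf = X_full.shape
    zr = X_restr.shape[1]
    F = ((rss(X_restr) - rss(X_full)) / (zf - zr)) / (rss(X_full) / (q - zf))
    p = stats.f.sf(F, zf - zr, q - zf)   # a small p rejects "the models are equivalent"
    return F, p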
3.1 Completeness
So far the nested F-test has mainly been used for determining low-degree terms of L̃(Z) [26, 38]. However, in the wider statistical literature, its primary usage is in selecting contributing factors from a variety of potential explanatory variables, which helps to build a model with high explanatory power.
Definition 1. For any two nested models L̃(A), L̃(B), we call A complete (with respect to B) if the F-test results support the assumption that L̃(A) does not miss any significant contributing factor compared with L̃(B). With infinite profiling traces (i.e. infinite statistical power), this usually suggests B ⊆ A (if A ⊆ B is already assumed, then A = B).
L̃r = β⃗(Z)

If the F-test reports a p-value lower than the significance level, we can conclude that at least one of the regression terms that depend on x3 is contributing significantly. In other words, x3 ∈ X and X ⊄ Z, which suggests the model built with Z is missing some leakage: it is not complete with respect to X̂ and therefore it is also not complete with respect to X.
3.2 Collapsed F-test for completeness
The full model is often too large: it consists of many large variables in practice
(in statistical terminology it is a factorial design). Consequently, we must find ways to reduce this complexity. Techniques such as aliasing (i.e. if
several factors are known to be related) or identifying confounding variables (i.e.
variables that impact on multiple factors) are well known in statistical model
building. What is possible is typically situation dependent. We make two key
observations that apply to side channel leakage modelling.
The first observation (mentioned earlier in the text) is that the F-test, although often used with proportional models, actually tests for the
inclusion/exclusion of explanatory variables in nested models. Such models are
nominal models, i.e. the regression coefficient is either 0 or 1: either a variable
matters or it does not.
The second observation is that although our explanatory variables are n bits wide, their bits are rarely independent; and we consider a variable as relevant if any of its bits are relevant. Consequently, there is an elegant trick to restrict our explanatory variables to a small space.
a = (a0 , a0 , ..., a0 ), a0 ∈ F2 .
Applying this restriction to the other 3 inputs, the full model now contains only 2^4 parameters, which is easy to work with.
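In other words, every profiling input for an operand is generated by replicating a single random bit (a hypothetical helper, for illustration):

import numpy as np

def collapse_operand(a0: np.ndarray, n: int = 32) -> np.ndarray:
    # Replicate the single random bit a0 into all n positions: a = (a0, a0, ..., a0),
    # so each 32-bit operand only ever takes the values 0x00000000 or 0xFFFFFFFF.
    return np.where(a0.astype(bool), (1 << n) - 1, 0).astype(np.uint64)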
Of course, such a restriction does not come for free: originally, there could be many interaction terms between the explanatory variables. In the model Lc these terms are "collapsed" and "added" to the remaining terms, e.g. a1 a0 becomes a0 as a1 = a0. In fact, as there is only 1 bit of randomness, a0 now becomes an alias for the operand A: having this term in Lc suggests A appears in L, but certainly not in the same way as in Lc. We can expand this idea by allowing two bits of randomness: this enables us to differentiate between linear and non-linear models2.
Formalising this idea, we define a mapping called "collapse" (Coll) on the uj(Z), where Z = {AA′BB′}. Recall that uj(Z) is defined (Section 2.2) as:

u_j(Z) = \prod_i z_i^{j_i}
2 We could take this further and include more “in-variable” interactions, but we left this
where j_i represents the i-th bit of the binary representation of j. For any j ∈ [0, 2^{4n}), we define a 2^{4n} → 2^4 map Coll as follows: the bit of Coll(j) that corresponds to a given variable is set if and only if j selects at least one bit of that variable (all bits of a collapsed variable are equal, so every monomial over a non-empty subset of its bits collapses to the same term). Whether a collapsed term contributes can easily be tested in an F-test. In the following, we show that passing the F-test in our "collapsed" case is a necessary (but not sufficient) condition for passing the F-test in the original setup.
Theorem 1. If a collapsed term u_{jcoll}(Zc) cannot be ignored from L̃c (i.e. β_{jcoll} ≠ 0), at least one of the corresponding u_j(Z) cannot be ignored from L̃ (i.e. β_j ≠ 0).
Proof. In the original case, any leakage model can always be written as
\tilde{L}(Z) = \sum_{j=0}^{2^{4n}-1} \beta_j u_j(Z)
However, considering the inputs have been restricted, this model collapses to:

\tilde{L}_c(Z_c) = \sum_{j_{coll}=0}^{2^4-1} \Big( \sum_{\forall j,\, Coll(j)=j_{coll}} \beta_j \Big)\, u_{j_{coll}}(Z_c)
Thus, if a certain collapsed term u_{jcoll}(Zc) has a significant contribution to L̃c (i.e. β_{jcoll} ≠ 0), one can conclude that:

\sum_{\forall j,\, Coll(j)=j_{coll}} \beta_j \neq 0 \;\Rightarrow\; \exists j,\ \beta_j \neq 0

Clearly nothing can be concluded if the above sum equals 0, which is why this is only a necessary condition.
Theorem 1 implies that whilst we still cannot directly test L̃f = β⃗(X̂), we can estimate the restricted (and collapsed) model L̃cr = β⃗(Zc) from realistic measurements and test it against the estimated collapsed full model L̃cf = β⃗(X̂c): if the F-test finds enough evidence to reject a model L̃cr in relation to the collapsed full model L̃cf, then it is clear that the model L̃r = β⃗(Z) would also be rejected in comparison to the full model L̃f = β⃗(X̂) (i.e. Z is not complete with respect to X̂).
Toy example. Suppose we want to test L̃r = β⃗{AB}, β⃗ ∈ (0, 1)^{2^{2n}} against L̃f = β⃗{AA′BB′}, β⃗ ∈ (0, 1)^{2^{4n}}. As mentioned before, for n = 32, direct testing is not feasible. However, we can bound the inputs and test

L̃cr = β0 + β1·a0 + β2·b0 + β3·a0·b0
If the F-test rejects the null hypothesis, then we know that the missing terms make a difference not only in L̃c but also in L̃. Therefore, we can conclude that L̃r is also not complete, without explicitly testing it. The price to pay is that unlike the original F-test, our collapsed test provides a necessary yet not sufficient condition: that being said, any Z that fails our test still presents a genuine concern, as it directly suggests the selected Z is unlikely to be complete and all subsequent steps can potentially be jeopardised.
Since from now on we will always work with collapsed models, we will not continue using the double subscript cr, but revert to just using r.
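As a concrete sketch (reusing the nested_f_test helper from Section 2, with our own design-matrix construction over the collapsed bits):

import numpy as np

def design_matrix(bit_columns):
    # All 2^k monomials u_j over the given k one-bit columns (including the intercept).
    k = len(bit_columns)
    cols = []
    for j in range(2 ** k):
        term = np.ones_like(bit_columns[0], dtype=float)
        for i in range(k):
            if (j >> i) & 1:
                term = term * bit_columns[i]
        cols.append(term)
    return np.column_stack(cols)

# a0, b0: collapsed current operand bits; ap0, bp0: collapsed previous values.
# X_restr = design_matrix([a0, b0])              # collapsed L_r over {A, B}: 2^2 terms
# X_full  = design_matrix([a0, ap0, b0, bp0])    # collapsed L_f over {A, A', B, B'}: 2^4 terms
# F, p = nested_f_test(y, X_full, X_restr)       # rejection => Z = {A, B} is not complete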
f^2 = \frac{R_F^2 - R_R^2}{1 - R_F^2} = \frac{RSS_R - RSS_F}{RSS_F}
Under the alternative hypothesis, the computed F-statistic follows a non-central F distribution with non-centrality parameter λ and degrees of freedom df1 (numerator) and df2 (denominator). When f² = 0, this becomes the null distribution, the central F-distribution. Thus, when the false positive rate is set to α, the threshold for the F-statistic is the 1 − α quantile of the central F-distribution, t_α = F^{-1}(1 − α; df1, df2), and the type-II error rate is

\beta = F_{nc}(t_\alpha;\, df_1, df_2, \lambda)

where F_nc is the CDF of the non-central F distribution. The statistical power for effect size f² is then 1 − β. Our test in Section 5.1 has df1 = {256 − 7, 256 − 19, 256 − 16}, df2 = q − 256, q = 20000, and per-test α = 10^{−3.7}, which comes to 1 − β ≈ 1 for the small effect size f² = 0.02 in [12]. According to [37] this corresponds indeed to what they observed in similar experiments.
Summarising, the F-Test on collapsed models has high power for relevant
side channel scenarios according to [37], and assuming sufficient traces (20k in
our calculation).
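These power figures are straightforward to reproduce; the sketch below assumes the common convention λ = f²·(df1 + df2 + 1) for the non-centrality parameter:

from scipy import stats

def f_test_power(f2: float, df1: int, df2: int, alpha: float) -> float:
    lam = f2 * (df1 + df2 + 1)                    # non-centrality parameter
    thresh = stats.f.ppf(1.0 - alpha, df1, df2)   # threshold at false-positive rate alpha
    return stats.ncf.sf(thresh, df1, df2, lam)    # statistical power 1 - beta

# e.g. f_test_power(0.02, 256 - 7, 20000 - 256, 10 ** -3.7) evaluates to approximately 1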
[Figure 1: Collapsed F-test results for the four Sbox computations (four panels); x-axis: Time (*4ns), y-axis: -log(p-value).]
Figure 1 illustrates the F-test results for all 4 Sbox computations. Clearly, the architectural interpretation of "only computation leaks" is not enough: the F-tests for the single-byte models (blue lines) all exceed the threshold and thus fail. This suggests any attack that is based on a single byte might not be optimal. Consequently, security evaluations/leakage certifications that are based on single bytes can be overly optimistic.
One step further. Now let us investigate further which variables may be missing. By deliberately adding possible terms to the model and repeating the test, we found one acceptable leakage model, namely one which assumes each computation leaks all 4 bytes. The red line in Figure 1 shows that this model is not rejected.
With this new finding in mind, we revert to the un-collapsed case: with the un-collapsed trace set, we plot the linear regression attack [15] results (targeting each Sbox output) with the correct key in Figure 2: as expected, each Sbox computation leaks mainly for the byte it is computing. However, from the second Sbox onwards, we start to see a small peak (slightly above the threshold) for the first Sbox output. For the last byte, all three previous Sbox outputs have an influence, which is consistent with what we observed in Figure 1. The fact that a linear regression distinguisher can also find these leaks confirms the existence and exploitability of such leakage in realistic attacks4. This suggests that single byte attacks are far from optimal.
entropy and can be expressed with only 2 bits. Therefore L̃(Z) can still portray any leakage
caused by S(x0 ).
4 Additional profiling is also possible, where one could estimate coefficients separately to create proportional models for dedicated attacks; this is however not the focus of this paper.
[Figure 2: Linear regression attack results targeting each Sbox output (four panels); x-axis: Time (*4ns), y-axis: -log(p-value).]
purpose, we examined the source code and noticed that the code in TinyAES [22] stores the 4 S-box bytes within one word, but does not access them adjacently in the Sbox look-ups (this explains why we did not see any Hamming-distance style leakage). Based on existing knowledge, we suspect the observed leakage stems from the processor loading more than the target byte in a load instruction: as suggested in Section 5.2 of [25], it is indeed quite common to load the entire word from memory and then discard the unnecessary bytes.
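Under this hypothesis, a single ldrb contributes word-wise rather than byte-wise leakage; the following sketch (our own simplification, purely illustrative) captures the hypothesised contribution:

def popcount(x: int) -> int:
    return bin(x).count("1")

def hypothesised_load_leakage(word: int, prev_bus: int) -> int:
    # The load drives the entire 32-bit word onto the bus, so both its weight and
    # its transition against the previous bus value may leak, not just the target byte.
    return popcount(word) + popcount(word ^ prev_bus)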
not available on our platform. We rewrote an equivalent version in pure Thumb-16 assembly. This makes no difference in our leakage analysis as we are not targeting this part.
authors’ analysis [1]), or take it as a follow-up analysis where the shuffling permutation has already been recovered using the technique in [3].
Code snippet for the Sbox In order to compute the Sbox's output using the pre-computed table, one must transfer the additive mask ra_i to rin, then after the table look-up, transfer rout back to ra_i. The SubBytesWithMask function performs this task as follows:
SubBytesWithMask:
... //r3=C(x) r10=ra
... //r0=i r8=S’
ldrb r4, [r3, r0] //(1) r4=C(x)_i^rin
ldrb r6, [r10, r0] //(2) r6=ra_i
eor r4, r6 //(3) r4=C(x)_i^rin^ra_i
ldrb r4, [r8, r4] //(4) r4=rmS(x)^rout
eor r4, r6 //(5) r4=rmS(x)^rout^ra_i
strb r4, [r3, r0] //(6) store r4 to state
... //removing rout later
Note that rin is added before this function, therefore lines (1)-(3) purely focus on removing ra_i. Similarly, removing rout is postponed to the end of the Sbox calculation and is therefore not shown in this snippet.
Only computation leaks. Following the spirit of Section 4.1, we analyse the
leakage of the first Sbox look-up and use 1 bit to represent each xi . All random
masks must also be considered in our leakage analysis: we use 6 bits to represent
ra0:3, rin and rout respectively. When collapsed to 1 bit, rm is restricted to 1 (i.e. this nullifies the protection of rm)6. Thus, we exclude this bit from our F-test and analyse the leakage where rm is set to 1. This means we will not cover any potential unexpected leakage introduced by rm in our experiment: of course, one can always switch to the 2-bit version and use more traces to cover rm.

[Figure 3: Collapsed F-test results (left) and linear regression attack results (right) for the masked Sbox; x-axis: Time (*4ns), y-axis: -log(p-value).]
The complete model is therefore defined as

L̃(X̂) = β⃗{∀j uj(X̂) | x̂ = x0:3 ∥ ra0:3 ∥ rin ∥ rout, x̂ ∈ X̂}
Following our common practice, it is expected that all the computed values leak, plus some transitions. As a starting point, let us first use a coarse-grained model that captures all possible computations for the first Sbox:

L̃(Z) = β⃗{∀j uj(Z) | z = x0 ∥ ra0 ∥ rin ∥ rout, z ∈ Z}
Readers can easily verify that all the intermediate values appearing in the code snippet can be expressed by this restricted model L̃(Z). However, once again, we find this x0-only model is hardly satisfying in the collapsed F-test: as we can see in Figure 3, the blue line clearly passes the threshold, which suggests the realistic leakage contains much more than what L̃(Z) can express.
One step further. Following the same clue we found in Section 4.1, it is sensible to assume each ldrb loads not only the target byte, but also the other bytes within the same word. Thus, our line (1) loads the entire masked state word rather than just C(x)_0 ⊕ rin.
analyses and attacks are performed on the un-collapsed traces, where the protection of rm
still applies.
Let us further consider line (4): if it also loads a word, it transfers all four masked Sbox output bytes. The tricky bit of this word is that its byte order depends on rin, which varies from trace to trace. Therefore, if we calculate the memory bus transition leakage from line (2) to (4), the correct form can be complicated. Nonetheless, we can always create a conservative term Za1 where za1 = x0 ∥ rin ∥ rout ∥ ra1: adding β⃗(Za1) covers all possible transitions between ra1 and the Sbox output bytes from line (4), regardless of which byte is being transferred. Similarly, we add Za2 and Za3 to L̃(Z) and construct a model that passes the F-test (i.e. the cyan line in the left half of Figure 3).
We further verify our inference from the F-test, namely that ldrb loads a word and causes word-wise transitions. In order to confirm such leakage does exist, we go back to the original un-collapsed implementation and perform a linear regression attack [15] on rm ⊗ xi ⊕ rin. In theory, ldrb should load x0 only, which means only rm ⊗ x0 ⊕ rin should be computed for the masked table look-up. However, we did observe that the other 3 bytes also contribute peaks in the regression results in the right half of Figure 3. To our knowledge, the most reasonable explanation is that such leakage stems from the transition between lines (1) and (2), where the entire word is loaded in both cases.
[Figure 4: Univariate 2nd-order (left) and bivariate 2nd-order (right) detection results for collision vs. non-collision traces; x-axis: Traces (*50), y-axis: -log10(p-value).]
effective attacks that build proportional models guided by our nominal models.
From a non-profiled perspective, it also considerably extends the attack scope
from the authors’ correlation analyses [1].
4.3 A Case Study for Hardware
In this section we move on to another realistic scenario: the ASIC DES imple-
mentation that was the basis of the DPAContest [30]. As the goal of this trace
set is validating attacks, for our purpose, it serves as a highly practical/realistic
example.
The DPAContest website provides power traces for two unprotected ASIC DES crypto-processors and one FPGA-based implementation. We used the first data set, namely secmatv1_2006_04_0809. Our analysis of secmatv3_20070924 led to the same conclusions.
The DES core runs at 32 MHz, while the scope captures traces at 20 GSa/s. As a consequence, the captured traces cover a 1 µs time period with 20000 measurement values, whereby each clock cycle contains 626.67 measurement values. To avoid any statistical influence from the input, we select the cycle in which the third encryption round flips to the fourth round (around index [6893, 7520) in the traces). Considering the implementation is parallel, we further assume each S-box computation is independent of the other concurrent S-boxes7. Our following analysis is limited to modelling the power consumption of the first DES S-box, while the power consumption of the other S-boxes simply becomes noise. We do not see this as a particular restriction because this is indeed a quite common choice for attacks/evaluations [17, 30], so our results at least apply to those cases. The entire data set contains more than 80k traces: in our experiments, the first 60k were used for constructing models with our F-test methodology, while the last 20k traces serve as a cross-validation set.
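For the cross-validation set, each candidate model (with coefficients fitted on the 60k profiling traces) can be scored by its out-of-sample R²; a minimal sketch with our own naming:

import numpy as np

def cross_validated_r2(y_test, X_test, beta):
    # R^2 of a model fitted on the profiling traces, evaluated on held-out traces.
    resid = y_test - X_test @ beta
    rss = float(resid @ resid)
    tss = float(((y_test - y_test.mean()) ** 2).sum())
    return 1.0 - rss / tss   # can become negative if the model over-fits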
[Figure 5: F-test results for three candidate models (top row; x-axis: Time (*0.05ns), y-axis: -log(p-value)) and the corresponding cross-validation R² (bottom row).]
[Figure 6: Profiled key recovery on secmatv1_2006_04_0809 (left) and secmatv3_20070924 (right); x-axis: Traces (*10).]
combinatorial components can contribute (see L̃TA vs. L̃l), but the benefit will be counteracted by the over-fitting effect (L̃f vs. L̃le). Consistent with all previous results, transition leakage makes a significant contribution (L̃l vs. L̃le): this suggests that if adding transitions does not enlarge the key guess space, one should prioritise models that contain transition leakage.
[Figure 7: (a) Simplified architectural view of an ARM core (register bank, barrel shifter, 32x8 multiplier, 32-bit ALU and the address/data registers) [24]; (b) abstract view of the execute stage: pipeline registers holding operands a and b (previous values a′ and b′), with ALU output c (previous value c′); (c) functional view of the ISWd2 ASM function with shared inputs a = (a(1)∥a(2)), b = (b(1)∥b(2)), randomness r and output c = (c(1)∥c(2)).]
Architectural view Because the actual state is unknown, our analysis must
be guided by (an abstraction of) the available information about the M3 ar-
chitecture. Figure 7a shows a simplified architectural description for a realistic
ARM M3 core [24]: whilst different realisations of an M3 adhere to this architec-
ture, their micro-architectural features (such as buffers or registers) will differ.
A common micro-architectural element for such a processor architecture would
be some so-called pipeline registers: these are the input registers in Figure 7b.
Thus we can map the entire red block in Figure 7a to Figure 7b.
Common instruction-wise model A common simplification in many pre-
vious grey-box simulators is focusing on the execute leakage within the ALU
(i.e. Figure 7b instead of Figure 7a). This choice is quite reasonable: even if
the processor has a multi-stage pipeline, we do not necessarily care about the
leakage from fetching the instructions (as it is often not data-dependent10 ). Fol-
lowing our principles in Section 3.1, the reference full model for Figure 7b can
be written as
L̃f = β⃗{AA′BB′}
Note that the output value C is completely determined by A and B, therefore there is no need to add C to the model here. However, if further restrictions (e.g. the leakage of A is linear) are added, we might need to add C where necessary. In our experiments, we also consider the following leakage models that correspond to the most commonly used models in the existing literature:
L̃l = β⃗{A, B, C}_l: this model is a linear function in the current inputs and output. Because of the linearity of the model, it is imperative to include the output here. E.g. if the circuit implements the bitwise-and function, the leakage on ab cannot be described by any linear function in a and b. In the existing literature this is often further simplified to the Hamming weight of just the output (aka the HW model).
L̃le = β⃗{A, B, C, A′, B′, C′, dA, dB, dC}_l, where dA = A ⊕ A′, dB = B ⊕ B′, dC = C ⊕ C′: this model further includes the previous values and the Hamming distance (transition) terms.
Target instruction. Before any further analysis, we craft a code snippet that
can trigger the simplified leakage in Figure 7b, while not causing any other type
of data-dependent leakage from other pipeline stages (i.e. fetch and decode):
eors r2,r2 //r2=0
eors r1,r3 //r1=a’, r3=b’
nop
nop
eors r5,r7 //r5=a, r7=b **Target**
nop
nop
10 Otherwise, the program has data-dependent branches, which should be checked through
[Figure 8: Collapsed F-test results for the target eors instruction (three panels); x-axis: Time (*4ns), y-axis: -log(p-value).]
eors r5,r7 represents the cycle we are targeting in Figure 7b: the 2 pipeline registers are set to the values a and b, where the previous values are a′ and b′. a′ and b′ are set by eors r1,r3: since both lines use eors, a (b) and a′ (b′) should share the same pipeline register. The 2 nop-s before/after ensure that all data-dependent leakage is caused by eors r5,r7: in a 3-stage pipeline micro-processor, the previous XOR-s should already have been committed and retired, while the fetcher/decoder should be executing nop-s (which, in theory, do not cause any data-dependent leakage11).
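Under this register-transition view, the data-dependent part of the target cycle would include terms of the following form (a simplified Hamming-distance sketch for intuition, not the fitted nominal model):

def popcount(x: int) -> int:
    return bin(x).count("1")

def pipeline_transition_leakage(a: int, a_prev: int, b: int, b_prev: int) -> int:
    # eors r5,r7 after eors r1,r3: the operand registers flip a' -> a and b' -> b.
    return popcount(a ^ a_prev) + popcount(b ^ b_prev)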
Table 1: Leakage detection results on a 2-share ISW multiplication gadget
Full model We now switch to a more abstract view: Figure 7c shows the functional view of our code in Table 1. Clearly, the functional inputs a, b and r no longer reflect any architectural port/bus/register. Having said that, assuming the processor starts from a constant state (in our experiments, ensured by clearing the data registers and memory buses before the function call), all leakage can still be bounded by all possible inputs. Thus, if both shares of a and b as well as r are collapsed to 2 bits, the full model can be defined as

L̃f = β⃗{A(1) A(2) B(1) B(2) R}
[Figure 9: F-test results across cycles C1-C10 (two panels); x-axis: Time (*4ns), y-axis: -log(p-value).]
One step further. Similar to Section 4, we can try to build a better leakage model by adding terms and re-evaluating the model quality through the collapsed F-test. The final model from this ad-hoc procedure is called L̃b. Developing an architectural reasoning for this model is beyond the scope of this paper. However, Figure 9 shows that L̃b only marginally fails our test, and is thus considerably better than the linear extended model that many modern simulators use (with respect to our target processor). We built this model by observing that most operands influence the leakage for at least 2 cycles, which suggests that the decoding stage contributes significantly to the data-dependent leakage. Consequently, we include data from the decoding stage, and this leads to L̃b.
Comparing L̃b. Whilst we now have a model that explains the device leakage of an M3 for a relatively complex gadget according to our novel test, it remains open whether this better model actually improves simulator-based leakage detections. Thus we perform a leakage detection test (first-order t-test) for the 2-share ISW implementation above, on realistic traces measured from our M3 core, on traces from ELMO, on traces from MAPS, and on traces where we use L̃b to predict leakage. The last four columns in Table 1 show the resulting leakage detection test results.
MAPS captures all register transitions, including the pipeline registers in the micro-architecture (command line option "-p") [13]. MAPS reports 3 leaking instructions in our experiments: 2 are verified by the realistic 1st-order t-test, while cycle 3 is not. Technically, this may not be a false positive because MAPS uses the 32-bit instruction mov.w instead of the tested 16-bit instruction mov12.
ELMO captures the operands and transitions on the ALU data bus [26]: ELMO reports exactly the same leaking cycles as MAPS. Detailed analysis shows that both cycles leak information through their operands' transitions: ELMO captures these as data-bus transitions, while MAPS attributes them to pipeline register transitions. Considering the pipeline registers are connected to the corresponding ALU data bus, this is hardly unexpected13.
Our manually constructed model leads to significantly better leakage predictions than both MAPS and ELMO, as Table 1 shows. It reports the same leaking cycles as we found in the real measurements. Specifically, cycle 5 reports leakage from the ALU output bus transition (aka C ⊕ C′ in Figure 7b), which is part of L̃le but not covered by ELMO or MAPS. We suspect cycles 6 (1250-1500) and 9 (2000-2250) come from the decoding stage: they are merely a preview of the leakage of cycles 7 and 10.
Extrapolating from this example, it seems clear that building simulators
based on insufficient models (in particular models that wrongly exclude parts
of the state) leads to incorrect detection results.
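The detection itself is the standard fixed-vs-random first-order Welch t-test, applied identically to measured and simulated traces; a minimal sketch (rows are traces, columns are sample points):

import numpy as np

def first_order_ttest(fixed: np.ndarray, rand: np.ndarray) -> np.ndarray:
    # Welch's t-statistic per sample point; |t| > 4.5 is the conventional
    # detection threshold.
    diff = fixed.mean(axis=0) - rand.mean(axis=0)
    var = fixed.var(axis=0, ddof=1) / len(fixed) + rand.var(axis=0, ddof=1) / len(rand)
    return diff / np.sqrt(var)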
experiments.
13 At this point we want to clarify that although ELMO is specifically designed for the M0
core architecture, and we are working with an M3 here, the parts of the architecture that
relate to the instructions in our implementation are identical (to the best of our knowledge)
to the M0.
Table 2: Leakage assumptions of various models and tools (hardware gate level)
L̃ = β{A, B, C}.
When considering a glitch-extended probe [9], probing C gives both A and B. If there are no hidden elements then C is fully determined by A and B, and thus we can omit the output C. Thus, the model of a glitch-extended probe corresponds to

L̃ = β{AB}.
Similarly, when extended with transition leakage, A, B and C are each extended with their previous values. Namely, such a probe captures a subset of

L̃ = β{AA′, BB′, CC′}.

Finally, when the probe is extended with both glitches and transitions, the adversary can potentially learn the full model:

L̃ = β{AA′BB′}.
Clearly the (1,1,0)-robust probing model contains all information that is available to an adversary for a simple (gate-level) exemplary circuit. According to our discussion above, it is thus statistically complete. For clarity, we provide a full comparison in Table 2.
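To summarise the four cases programmatically (our own encoding of the comparison; the value sets follow the discussion above):

# Value sets visible to a single probe on the output C of a lone gate c = g(a, b),
# under each probing-model extension (glitches, transitions).
PROBE_MODELS = {
    "(0,0,0) standard":            {"C"},
    "(1,0,0) glitch-extended":     {"A", "B"},              # C is determined by A, B
    "(0,1,0) transition-extended": {"C", "C'"},
    "(1,1,0) glitch+transition":   {"A", "A'", "B", "B'"},  # the full model
}

for model, revealed in PROBE_MODELS.items():
    print(f"{model:30s} -> {sorted(revealed)}")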
Theory vs. Practice The previous section strongly depends on the as-
sumption that our exemplary circuit was a single gate with no hidden (micro-
architectural) elements. If this assumption also holds in practice, then the
(1,1,0)-probes capture all available information. However, this is likely to only
ever hold in a white-box hardware context. This statement is particularly im-
portant when considering the use of probing based verification tools that operate
on a “high level language”: unless a verification tool is applied to the hardware netlist, the resulting security proof does not necessarily provide any security guarantee for the resulting implementation. This is because the implementation may run on a device that contains unknown micro-architectural elements, or exhibits types of leakage that were not considered by the proof.
• We have shown that in some reported best attacks, some leakage was
missed.
• Predictive models are not guaranteed to capture all relevant leakage.
Therefore accuracy metrics like cross-validation R2 or SSE alone should
not be the basis for a security certification in the style of [17, 2, 23].
Acknowledgments
The authors were funded in part by the ERC via the grant SEAL (Project
Reference 725042).
References
[1] Ryad Benadjila, Louiza Khati, Emmanuel Prouff, and Adrian Thillard. Hardened library for AES-128 encryption/decryption on ARM Cortex M4 architecture.
[2] Olivier Bronchain, Julien M. Hendrickx, Clément Massart, Alex Olshevsky,
and François-Xavier Standaert. Leakage certification revisited: Bounding
model errors in side-channel security evaluations. In Alexandra Boldyreva
and Daniele Micciancio, editors, Advances in Cryptology - CRYPTO 2019
- 39th Annual International Cryptology Conference, Santa Barbara, CA,
USA, August 18-22, 2019, Proceedings, Part I, volume 11692 of Lecture
Notes in Computer Science, pages 713–737. Springer, 2019.
Korea, December 7-11, 2020, Proceedings, Part I, volume 12491 of Lecture Notes in Computer Science, pages 787–816. Springer, 2020.
[6] Roderick Bloem, Hannes Groß, Rinat Iusupov, Bettina Könighofer, Stefan Mangard, and Johannes Winter. Formal verification of masked hardware implementations in the presence of glitches. IACR Cryptol. ePrint Arch., 2017:897, 2017.
[7] Yuval Ishai, Amit Sahai, and David A. Wagner. Private circuits: Securing hardware against probing attacks. In Dan Boneh, editor, Advances in Cryptology - CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa Barbara, California, USA, August 17-21, 2003, Proceedings, volume 2729 of Lecture Notes in Computer Science, pages 463–481. Springer, 2003.
[8] Gilles Barthe, Sonia Belaïd, François Dupressoir, Pierre-Alain Fouque, Benjamin Grégoire, Pierre-Yves Strub, and Rébecca Zucchini. Strong non-interference and type-directed higher-order masking. In Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, editors, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, pages 116–129. ACM, 2016.
[9] Sebastian Faust, Vincent Grosso, Santos Merino Del Pozo, Clara Paglialonga, and François-Xavier Standaert. Composable masking schemes in the presence of physical defaults & the robust probing model. IACR Trans. Cryptogr. Hardw. Embed. Syst., 2018(3):89–120, 2018.
[10] Gilles Barthe, Sonia Belaïd, Gaëtan Cassiers, Pierre-Alain Fouque, Benjamin Grégoire, and François-Xavier Standaert. maskVerif: Automated verification of higher-order masking in presence of physical defaults. In Kazue Sako, Steve A. Schneider, and Peter Y. A. Ryan, editors, Computer Security - ESORICS 2019 - 24th European Symposium on Research in Computer Security, Luxembourg, September 23-27, 2019, Proceedings, Part I, volume 11735 of Lecture Notes in Computer Science, pages 300–318. Springer, 2019.
[11] Suresh Chari, Josyula R. Rao, and Pankaj Rohatgi. Template attacks. In
Cryptographic Hardware and Embedded Systems - CHES 2002, 4th Interna-
tional Workshop, Redwood Shores, CA, USA, August 13-15, 2002, Revised
Papers, pages 13–28, 2002.
Workshop, COSADE 2018, Singapore, April 23-24, 2018, Proceedings, vol-
ume 10815 of Lecture Notes in Computer Science, pages 82–98. Springer,
2018.
[14] Yves Crama and Peter L. Hammer, editors. Boolean Models and Methods in
Mathematics, Computer Science, and Engineering. Cambridge University
Press, 2010.
[15] Julien Doget, Emmanuel Prouff, Matthieu Rivain, and François-Xavier
Standaert. Univariate side channel attacks and leakage modeling. J. Cryp-
togr. Eng., 1(2):123–144, 2011.
[16] François Durvaux, François-Xavier Standaert, and Santos Merino Del Pozo.
Towards easy leakage certification: extended version. J. Cryptogr. Eng.,
7(2):129–147, 2017.
[17] François Durvaux, François-Xavier Standaert, and Nicolas Veyrat-
Charvillon. How to certify the leakage of a chip? In Phong Q. Nguyen and
Elisabeth Oswald, editors, Advances in Cryptology - EUROCRYPT 2014 -
33rd Annual International Conference on the Theory and Applications of
Cryptographic Techniques, Copenhagen, Denmark, May 11-15, 2014. Pro-
ceedings, volume 8441 of Lecture Notes in Computer Science, pages 459–
476. Springer, 2014.
[18] Si Gao, Ben Marshall, Dan Page, and Elisabeth Oswald. Share-slicing:
Friend or foe? IACR Transactions on Cryptographic Hardware and Em-
bedded Systems, 2020(1):152–174, Nov. 2019.
[19] Barbara Gigerl, Vedad Hadzic, Robert Primas, Stefan Mangard, and Rod-
erick Bloem. Coco: Co-design and co-verification of masked software im-
plementations on cpus. IACR Cryptol. ePrint Arch., 2020:1294, 2020.
[20] Benjamin Jun Gilbert Goodwill, Josh Jaffe, Pankaj Rohatgi, et al. A testing
methodology for side-channel resistance validation. In NIST non-invasive
attack testing workshop, volume 7, pages 115–136, 2011.
[21] Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Differential power anal-
ysis. In Advances in Cryptology - CRYPTO ’99, 19th Annual International
Cryptology Conference, Santa Barbara, California, USA, August 15-19,
1999, Proceedings, pages 388–397, 1999.
[22] kokke. Tiny AES in C.
[23] Liran Lerman, Nikita Veshchikov, Olivier Markowitch, and François-Xavier
Standaert. Start simple and then refine: Bias-variance decomposition as a
diagnosis tool for leakage profiling. IEEE Trans. Computers, 67(2):268–283,
2018.
[24] ARM Limited. ARM7TDMI Technical Reference Manual. https://developer.arm.com/documentation/ddi0210/c/, 2004.
[25] Ben Marshall, Dan Page, and James Webb. Miracle: Micro-architectural
leakage evaluation. IACR Cryptol. ePrint Arch., 2021. https://eprint.
iacr.org/2021/261.
[26] David McCann, Elisabeth Oswald, and Carolyn Whitnall. Towards practi-
cal tools for side channel aware software engineering: ’grey box’ modelling
for instruction leakages. In 26th USENIX Security Symposium (USENIX
Security 17), pages 199–216, Vancouver, BC, 2017. USENIX Association.
[27] Lauren De Meyer, Elke De Mulder, and Michael Tunstall. On the effect of
the (micro)architecture on the development of side-channel resistant soft-
ware. IACR Cryptol. ePrint Arch., 2020:1297, 2020.
[28] Silvio Micali and Leonid Reyzin. Physically observable cryptography (ex-
tended abstract). In Moni Naor, editor, Theory of Cryptography, First
Theory of Cryptography Conference, TCC 2004, Cambridge, MA, USA,
February 19-21, 2004, Proceedings, volume 2951 of Lecture Notes in Com-
puter Science, pages 278–296. Springer, 2004.
[29] Kostas Papagiannopoulos and Nikita Veshchikov. Mind the gap: Towards secure 1st-order masking in software. In Sylvain Guilley, editor, Constructive Side-Channel Analysis and Secure Design - 8th International Workshop, COSADE 2017, Paris, France, April 13-14, 2017, Revised Selected Papers, volume 10348 of Lecture Notes in Computer Science, pages 282–297. Springer, 2017.
[30] Télécom ParisTech. DPA Contest 2008/2009.
[31] Werner Schindler, Kerstin Lemke, and Christof Paar. A stochastic model
for differential side channel cryptanalysis. In Josyula R. Rao and Berk
Sunar, editors, Cryptographic Hardware and Embedded Systems - CHES
2005, 7th International Workshop, Edinburgh, UK, August 29 - September
1, 2005, Proceedings, volume 3659 of Lecture Notes in Computer Science,
pages 30–46. Springer, 2005.
[32] Madura A. Shelton, Niels Samwel, Lejla Batina, Francesco Regazzoni,
Markus Wagner, and Yuval Yarom. Rosita: Towards automatic elimination
of power-analysis leakage in ciphers. CoRR, abs/1912.05183, 2019.
[33] Galit Shmueli. To explain or to predict? Statist. Sci., 25(3):289–310, 08
2010.
[34] François-Xavier Standaert, Tal Malkin, and Moti Yung. A unified frame-
work for the analysis of side-channel key recovery attacks. In Antoine
Joux, editor, Advances in Cryptology - EUROCRYPT 2009, 28th Annual
International Conference on the Theory and Applications of Cryptographic
Techniques, Cologne, Germany, April 26-30, 2009. Proceedings, volume
5479 of Lecture Notes in Computer Science, pages 443–461. Springer, 2009.
[35] Carolyn Whitnall and Elisabeth Oswald. Profiling DPA: efficacy and ef-
ficiency trade-offs. In Guido Bertoni and Jean-Sébastien Coron, editors,
Cryptographic Hardware and Embedded Systems - CHES 2013 - 15th Inter-
national Workshop, Santa Barbara, CA, USA, August 20-23, 2013. Pro-
ceedings, volume 8086 of Lecture Notes in Computer Science, pages 37–54.
Springer, 2013.
[36] Carolyn Whitnall and Elisabeth Oswald. A cautionary note regarding the
usage of leakage detection tests in security evaluation. Cryptology ePrint
Archive, Report 2019/703, 2019.
[37] Carolyn Whitnall and Elisabeth Oswald. A critical analysis of ISO 17825
(’testing methods for the mitigation of non-invasive attack classes against
cryptographic modules’). In Advances in Cryptology - ASIACRYPT 2019 -
25th International Conference on the Theory and Application of Cryptology
and Information Security, Kobe, Japan, December 8-12, 2019, Proceedings,
Part III, pages 256–284, 2019.
(a) HD leakage without any noise (b) HD leakage with noise variance 0.1
Figure 11: F-test with noise variance 0.1
A.2 HI&PI
Bronchain et al. proposed that, using the concepts of Perceived Information (PI) and Hypothetical Information (HI), one can "bound the information loss due to model errors quantitatively" by comparing these two metrics, estimate the true unknown MI and obtain "close to worst-case" evaluations [2].
It is critical to remember that the "worst case" is restricted to the computed MI: returning to our previous example, estimating HI and PI still bounds the correct mutual information MI(K1; P1, L). The additional Hamming distance term affects how we should interpret this metric: when combining multiple key bytes to obtain the overall security level, MI(K1; P1, L) might not be as helpful as one may hope.
More concretely, we tested our example simulated leakage with the code provided in [2]: as we can see in Figure 12, PI and HI still bound the correct MI. The only difference here is that the MI itself decreases, as P0 and K0 are not taken into consideration.