A_Tale_of_Resilience_On_the_Practical_Security_of_Masked_Software_Implementations
[6], [7], [8]. Still, few works show the practical security difference between different maskings in presence of transition-based leakages [9], [10].
Besides transitions, the threat posed by parallel processing of shares (PPS) is overlooked for software-masked implementations,1 although Moos and Moradi show a simple preprocessing technique to efficiently exploit it in hardware [12]. In software, the micro-architecture of modern CPUs relies on techniques to increase execution performance [13], potentially handling multiple shares per clock cycle. As such, PPS implies new potential vulnerabilities in masked software implementations. To the best of our knowledge, the study of such vulnerabilities remains unexplored.
1 To the best of our knowledge, the only work mentioning the existence of PPS in software is [11], in footnote 1, but this work does not study the security implications further.
This work explores the practical security of several first-order software masked implementations in presence of both recombination effects and PPS-based leakages. We study three masking schemes: the most studied Boolean (BM) [14] and inner-product (IPM) [15], and the arithmetic-sum (ASM) [16]. Our investigation first assesses the potential sources of vulnerabilities in software due to transition-based and PPS-based leakages, and then evaluates the practical security of masked software with respect to the identified vulnerabilities. In more detail, our methodology develops in three steps:
1) We characterise micro-architectural leakage effects: we carefully handcraft micro-benchmarks to assess the presence of transition-based and PPS-based leakages in software (Section IV).
2) We characterise the impact of the observed leakage effects on masking encodings: we quantify the leaked information and investigate its exploitability (Section V, Section VI).
3) We characterise the impact of the observed leakage effects on masked implementations: once the leakage impact on the encodings is evaluated, we assess the practical security of fully masked software implementations (Section VII). Specifically, we target as a use-case the AES-128 block-cipher [17].
To provide a comprehensive analysis, we split the security assessment into an information leakage assessment, which analyses the information leaked by the encoding or the fully masked implementation, and an information leakage exploitation, which evaluates the exploitability of such information. In addition, as the design and implementation of the execution platform potentially impact the observed leakage [2], [18], [19], we conduct our investigation on two different micro-controllers, an STM32F215 and an STM32F303.
A. CONTRIBUTIONS
To the best of our knowledge, we provide the first investigation on the practical security of different masking schemes in software in presence of both recombination effects and PPS. In particular we show that:
• PPS-based leakage is observable in the software context (Section IV).
• PPS induces information leakage for all the considered masking encodings (Section V).
• Such leakage can be exploited against all the considered masking schemes by slightly adapting Moos and Moradi's methodology [12] (Section VI).
• Transitions and PPS-based leakages lead to successful attacks against all the considered masked implementations. In particular, we exhibit two attacks against inner-product masking: one exploiting PPS leakages and one exploiting a vulnerability due to transition-based leakage on the logarithm representation of the encodings within the finite field multiplication implementation (Section VII).
II. RELATED WORK
The nature of this work touches different areas of the side-channel domain. This section compares our work with the most relevant ones from each area.
A. MICRO-ARCHITECTURE-INDUCED LEAKAGES
Several works investigate the different micro-architectural sources of leakage, spanning different micro-architectures and processors. Table 1 summarises the related state of the art, highlighting the investigated leakage sources, types of micro-architecture and the CPU use-case(s). Papagiannopoulos and Veshchikov assess some recombination effects (i.e., register overwrite and memory persistence) violating the ILA on a simple AVR ATMega163 micro-controller [20]. Marshall et al. assess the presence of multiple transition-based leakages on different platforms [2]. They highlight how similar platforms, executing the very same piece of code, may or may not exhibit a transition-based leakage. Furthermore, they highlight how speculative execution potentially introduces unexpected transition-based leakage. With ARMISTICE, de Grandmaison et al. show how the encoding of instructions also potentially affects the variability of the observable leakages [8]. Concerning platforms provided with superscalar capabilities, Barenghi and Pelosi show, on the ARM Cortex-A7 and ARM Cortex-M7, that the increased parallelism provided by such micro-architectures increases the sources of transition-based leakage [6]. Besides the pervasiveness of transition-based leakage, a micro-architecture potentially encompasses other recombination effects. Gao et al. show that intra-register leakage interaction can break the security of share-slicing implementations [21], suggesting that leakage is due to glitch-based recombinations in the barrel-shifter unit of ARM Cortex-M0 and Cortex-M3 implementations. On the contrary, Gigerl et al. show how signal glitches in the forwarding logic of the superscalar RISC-V SweRV CPU recombine multiple shares, reducing the masking security order beyond the factor of 2 predicted
TABLE 1. Summary of the state of the art concerning micro-architectural leakage investigations. For each work, we report the targeted leakage source,
the type of investigated micro-architecture and the CPUs analysed. With ‘‘?’’ we mark works for which it is unknown whether a given leakage source
is targeted.
TABLE 2. Summary of the state of the art concerning the practical security analyses of masking in software. For each work, we report the targeted
masking scheme, the leakage source against which the practical security is evaluated and the analyses carried out.
by the theory [3]. As a matter of fact, the current state of the art focuses on micro-architecture-induced recombination effects. Our work represents a novel and orthogonal effort: we show that PPS-based leakage can be observed in software implementations, even on in-order scalar processors.
B. PRACTICAL SECURITY OF SOFTWARE MASKED IMPLEMENTATIONS
Few works explore the practical security of masking against micro-architectural leakages. Table 2 summarises the related state of the art, highlighting the investigated masking, the considered leakage sources and the analyses carried out. Beckers et al. show that several deemed-to-be-secure software implementations, either masked via first-order BM or IPM, are vulnerable to simple first-order analyses (i.e., CPA and TVLA) [10]. Although we follow a similar investigation approach, our goal is different: whereas they aim to verify the claims concerning the security of open-source implementations, we evaluate the practical security of different masking schemes against transition-based and PPS-based leakages. The work of Wu et al. is closer to ours: they analyse the practical security of different code-based software instances of BM and IPM, up to the third masking order [9]. They rely on first-order analyses (i.e., CPA and TVLA), as well as Template Attacks (TA) and bivariate CPA. Interestingly, they analyse the practical security of code-based IPM with respect to different public vectors (Section III-B), providing a better characterisation of the encoding's security. Neither work (explicitly) targets micro-architectural leakages, although the leakages they observe for BM implementations probably result from transition-based leakages. In contrast, we explicitly take advantage of transition-based and PPS-based leakages against each studied masking (Section VI and Section VII). Finally, with respect to the two previous works, we consider the ASM, a masking scheme employed for ARX ciphers (e.g., Speck [22]) and post-quantum cryptosystems (e.g., Kyber [23]).
III. BACKGROUND
This section provides the essential background to understand our methodology. We first introduce the notations employed throughout our work. Then, we introduce the masking countermeasures we study, the necessary security concepts and the potential threat implied by so-called physical effects. We follow with an overview of three statistical tools we employ to assess and exploit the information leakage from the investigated software implementations. Finally, we overview a trace preprocessing technique able to exploit PPS-based leakages.
A. NOTATIONS
We refer to a random variable with a capital italic letter, e.g., $X$. We denote the sampling space of $X$ as $\mathcal{X}_{2^k}$, where $k > 0$. We refer to the distribution of a random variable $X$ as $\mathcal{D}_X$. We refer to a realisation of the random variable $X \in \mathcal{X}_{2^k}$ as $x \in \mathbb{F}_{2^k}$, where $\mathbb{F}_{2^k}$ is a finite field. We implicitly consider any value $x \in \mathbb{F}_{2^k}$ in binary form. We denote the $i$-th bit of $x$ as $x^i$, where $i \in [0, k)$. We refer to vectors in bold-face style, e.g., $\mathbf{X}$. We refer to the $j$-th component of a vector $\mathbf{X}$ as $X_j$. We refer to a set of $n$ traces, each of $m$ samples, with $T_{n \times m}$, and to the subset of traces at sample $0 \le i < m$ with $T^i_{n \times m}$.
B. MASKING
A side-channel attacker exploits the statistical link between an observed physical quantity (e.g., the instantaneous power consumption) and secret-dependent data, which the target implementation manipulates. The masking countermeasure
counteracts such attacks by breaking this statistical link. That is, given any secret-dependent datum $X$, masking encodes it with a so-called (probabilistic) encoding (Def. 1).
Definition 1 (Encoding): Given a random variable $X \in \mathcal{X}_{2^k}$, where $k \ge 1$, the tuple $\mathbf{X} = (X_i)_{i=0}^{n} \in \mathcal{X}_{2^k}^{(n+1)}$ is an encoding of $X$. The random variables $X_i \in \mathcal{X}_{2^k}$ are called shares. $n$ defines the masking order.
The encoding of $X$ is built from an $n$th-order masking scheme $M$. Informally, an $n$th-order masking scheme is a vector-valued function $M : \mathcal{X}_{2^k} \mapsto \mathcal{X}_{2^k}^{(n+1)}$, such that it satisfies correctness (i.e., the $M$ function is invertible) and $d$th-order security (Def. 2).
Definition 2 (dth-Order Security): Let $M$ be an $n$th-order masking scheme. $M$ satisfies $d$th-order security if and only if, for each $X \in \mathcal{X}_{2^k}$, any subset of (at most) $d$ shares of $\mathbf{X} = (X_i)_{i=0}^{n} = M(X)$ does not depend on $X$. $d \le n$ defines the security order of $M$.
Examples of masking schemes are the Boolean masking (BM) [14], the arithmetic-sum masking (ASM) [16] and the inner-product masking (IPM) [15]. Respectively, they generate Boolean (Def. 3), arithmetic-sum (Def. 4) and inner-product (Def. 5) encodings.
Definition 3 (Boolean Encoding): Let us consider $X \in \mathcal{X}_{2^k}$, where $k \ge 1$ and $\mathbf{X} = (X_i)_{i=0}^{n} = BM(X)$ the Boolean encoding of $X$. Then $X = \bigoplus_{i=0}^{n} X_i$, where $\oplus$ is the eXclusive OR.
Definition 4 (Arithmetic-Sum Encoding): Let us consider $X \in \mathcal{X}_{2^k}$, where $k \ge 1$ and $\mathbf{X} = (X_i)_{i=0}^{n} = ASM(X)$ the arithmetic-sum encoding of $X$. Then $X = \boxplus_{i=0}^{n} X_i$, where $\boxplus$ is the arithmetic sum.
Definition 5 (Inner-Product Encoding): Let us consider $X \in \mathcal{X}_{2^k}$, where $k \ge 1$ and $\mathbf{X} = (X_i)_{i=0}^{n} = IPM(X)$ the inner-product encoding of $X$. Then $X = \langle \mathbf{L}; \mathbf{X} \rangle$. $\mathbf{L} = (1, L_i)_{i=1}^{n} \in \mathcal{X}_{2^k}^{(n+1)}$ is a public random vector, and $\langle \cdot ; \cdot \rangle$ is the inner-product operator.
The appeal of the masking countermeasure lies within the provable security framework, composed of:
• Leakage Model: it describes how an implementation leaks information through a given side channel. Typically, the leakage model takes the form of an Additive Gaussian Noise (AGN) function:
$$L(v) = L_d(v) + \mathcal{N}(0, \sigma) \qquad (1)$$
where $\mathcal{N}(0, \sigma)$ is a Gaussian noise, and $L_d(\cdot)$ is a deterministic function. Typically, $L(\cdot)$ is a value-based leakage function (Def. 6), such as the Hamming-Weight function:
$$L(v) = HW(v) + \mathcal{N}(0, \sigma) \qquad (2)$$
where HW is:
$$HW(x) = \sum_{0 \le i < k} x^i. \qquad (3)$$
• Attacker Model: it describes how many intermediate variables the attacker can observe (we distinguish between univariate and multivariate) and the maximum statistical moment order the attacker can compute.
Definition 6 (Value-based Leakage Function [5]): Let $V$ be a finite set of intermediate variables and $L(\cdot) = L_d(\cdot) + \mathcal{N}(0, \sigma)$ be a leakage function made of a deterministic part $L_d(\cdot)$ and an (additive) random noise $\mathcal{N}(0, \sigma)$. This leakage function is value-based if its deterministic part can only take a value $v \in V$ as argument.
Under the so-called Noisy Leakage security model (parallel computation, value-based leakage, univariate attacker), Chari et al. proved that the masking countermeasure exponentially amplifies (in the number $n$ of shares) the difficulty of an attack, expressed in number of traces to collect and analyse [1]. Ishai et al. defined the $d$-probing security model (value-based leakage, multivariate attacker), under which an implementation is secure against any $d$-variate attacker [24]. Barthe et al. defined the Bounded-Moment model (value-based leakage, univariate attacker), which proves the security of masked implementations against attackers able to compute statistical moments of order (up to) $d$ [11].
C. PHYSICAL EFFECTS
Security proofs of masking schemes typically assume a value-based leakage model, i.e., each share leaks independently of the others. The literature refers to it as the Independent Leakage Assumption (ILA). However, in practice, masked implementations do not comply with such hypothesis. Indeed, several physical effects, such as memory transitions, glitches and coupling, recombine the shares, hence violating the ILA.
A typical class of models capturing such effects are the so-called Transition-based Leakage Functions (Def. 7). A well-known example is the Hamming-Distance leakage function (Eq. 4).
Definition 7 (Transition-based Leakage Function [5]): Let $V$ be a set of intermediate variables, and $T := \{v \oplus v' \mid \forall\, v, v' \in V\} \cup V$ the set of all the transitions between these intermediate variables. A leakage function $L(\cdot)$ is transition-based if its deterministic part $L_d(\cdot)$ takes values $t \in T$ as argument.
$$L(X, Y) = HW(X \oplus Y) + \mathcal{N}(0, \sigma) = HD(X, Y) + \mathcal{N}(0, \sigma) \qquad (4)$$
Balasch et al. proved that the security order of a $d$-probing secure implementation in the value-based leakage model is halved in the transition-based leakage model (i.e., $\lfloor d/2 \rfloor$). The literature refers to this as the security-order reduction theorem [5]. Specific to the context of masked software implementations, a CPU micro-architecture exposes many elements violating the ILA, such as micro-architectural registers (e.g., the inter-stage pipeline registers or the Memory Data Register of the Load-Store Unit [2], [6]), the forwarding logic, the Barrel-Shifter Unit, the Arithmetic-Logic Unit, and the Load-Store Unit [3], [21]. Gigerl et al. show that, in presence of glitch-based recombinations, the order
reduction exceeds the reduction factor of 2 considered in presence of transition-based leakage [3]. Furthermore, the micro-architectural properties of complex CPUs also imply a security order reduction greater than 2 [3].
The CMOS technology is still mainstream in digital design, and the overall power consumption of a CMOS-based circuit is the superposition of the power consumptions of its sub-elements [25]. We can describe the induced leakage via the Sum-of-Hamming-Weights leakage function:
$$L(X, Y) = SHW(X, Y) + \mathcal{N}(0, \sigma). \qquad (5)$$
Such AGN model assumes as its deterministic component the SHW function:
$$SHW(X, Y) = HW(X) + HW(Y). \qquad (6)$$
In this paper we use the binary form of the SHW function, which can be readily extended to accept an arbitrary number of arguments.
D. PEARSON’s CORRELATION COEFFICIENT
The Pearson’s Correlation Coefficient (PCorrl) is a statistical tool which quantifies the linear relationship between two random variables. Given two arbitrary random variables $X$, $Y$, the PCorrl is defined as:
$$\rho(X, Y) = \frac{E[(X - \mu_X) \cdot (Y - \mu_Y)]}{\sigma_X \cdot \sigma_Y} \qquad (7)$$
where $\mu$ and $\sigma$ represent, respectively, the mean value and standard deviation of the given random variable. The coefficient takes values in the interval $[-1, +1]$, where the extremes indicate perfect linear dependency between the two variables, whereas a coefficient of 0 indicates no linear dependency.
E. TVLA
When performing security evaluations of a cryptographic implementation, the evaluator ideally aims to provide the most general possible answer regarding the security of the implementation (i.e., is the implementation secure?). The Test Vector Leakage Assessment (TVLA) [26] reduces the problem to testing whether two sets of side-channel traces $S_{fixed}$ and $S_{random}$ can be distinguished by their statistical moments (alternative hypothesis) or not (null hypothesis). $S_{random}$ refers to side-channel traces collected while the implementation processes a different plaintext for each trace, whereas $S_{fixed}$ refers to the usage of the same plaintext for each trace. In the case of univariate first-order TVLA, the evaluator computes the t-statistic $t$:
$$t = \frac{\hat{\mu}_{fixed} - \hat{\mu}_{random}}{\sqrt{\frac{\hat{\sigma}^2_{fixed}}{n_{fixed}} + \frac{\hat{\sigma}^2_{random}}{n_{random}}}} \qquad (8)$$
where $\hat{\mu}_{fixed}$, $\hat{\mu}_{random}$ refer to the sample mean, $\hat{\sigma}^2_{fixed}$, $\hat{\sigma}^2_{random}$ to the sample variance and $n_{fixed}$, $n_{random}$ to the number of traces of the fixed and random set, respectively. The implementation leaks information with a certain probability if the t-statistic exceeds a given t-threshold. The t-threshold is normally set to ±4.5, which means we can reject the null hypothesis with a confidence of 99.999%.
At its core, TVLA relies on hypothesis testing. As such, it is affected by statistical errors too. We distinguish between Type-I errors (or false positives) and Type-II errors (or false negatives) [27]. Type-I errors refer to the cases where the test fails (null hypothesis rejected), although the implementation does not leak. Type-II errors, on the other hand, refer to the acceptance of the null hypothesis, although the implementation actually leaks. Type-II errors are the most troublesome, as they would report an implementation as leakage-free when it is not. As a mitigation technique against these types of errors, a strategy is to repeat the TVLA several times, each with a distinct fixed key [26].
F. MUTUAL INFORMATION
The Mutual Information (MI) is an information-theoretic tool for the quantification of linear and non-linear relationships between two random variables. The metric has different definitions, according to the nature (discrete or continuous) of the random variable. Equation 9 reports the definition of MI in the case of two discrete random variables $X$ and $Y$.
$$MI(X, Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \cdot \log_2 \frac{p(x, y)}{p(x) \cdot p(y)} \qquad (9)$$
Although it can capture any type of relationship, the computation of the MI relies on the knowledge of the joint probability distribution $p(x, y)$. Generally, the distribution is unknown and can only be estimated. Therefore, MI cannot be directly computed, requiring the employment of estimators. Such estimators rely on different techniques such as histograms, Gaussian mixtures, k-nearest neighbours, or neural networks [28], [29], [30]. Among these estimators, the empirical Hypothetical Information (HI) provides an upper-bound to the MI [31], while converging towards MI as the number of traces increases. As such, HI fits in those contexts where a conservative analysis of the security of an implementation (i.e., overestimating the information leakage) is preferable.
G. BIASING LEAKAGE DISTRIBUTIONS (BLD) TO ATTACK MASKED PARALLEL IMPLEMENTATIONS
The strength of masking lies in the need, for an attacker, to compute higher-order statistical moments and/or to perform multivariate statistical analyses. When considering hardware masked implementations, security evaluators assume a parallel computation model. Under this computation model, the implementation can treat related shares at the same time sample. Considering an $n$th-order masking scheme, the attacker, who observes all the $n + 1$ shares of a key-dependent encoded value, needs, at least, the statistical moment of order $n + 1$ to detect any key-dependent information. Moos and Moradi proposed a preprocessing technique to reduce such minimal key-dependent order moment [12].
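As a minimal sketch of the two statements above (the t-statistic of Eq. 8, and the need for a moment of order $n + 1$ when the shares leak in parallel), the following Python snippet simulates a first-order Boolean encoding of a 2-bit value leaking under a noisy SHW model. Everything in it is an illustrative assumption rather than the authors' setup: the trace count, the noise level, the seed, and the simplification that the fixed and random sets differ in the encoded value instead of the plaintext. The second-order test simply centres each set and squares its samples before reusing the same t-statistic.

```python
import math
import random

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def welch_t(fixed, rnd):
    """t-statistic between two trace sets, as in Eq. 8."""
    n_f, n_r = len(fixed), len(rnd)
    mu_f, mu_r = sum(fixed) / n_f, sum(rnd) / n_r
    var_f = sum((s - mu_f) ** 2 for s in fixed) / (n_f - 1)
    var_r = sum((s - mu_r) ** 2 for s in rnd) / (n_r - 1)
    return (mu_f - mu_r) / math.sqrt(var_f / n_f + var_r / n_r)

def centred_square(traces):
    """Second-order preprocessing: centre the set, then square each sample."""
    mu = sum(traces) / len(traces)
    return [(s - mu) ** 2 for s in traces]

def shw_sample(x, rng, sigma, k=2):
    """One noisy SHW sample of a first-order Boolean encoding of x."""
    x1 = rng.randrange(1 << k)   # fresh uniform mask
    x0 = x ^ x1                  # x = x0 XOR x1
    return hw(x0) + hw(x1) + rng.gauss(0.0, sigma)

rng = random.Random(0)            # fixed seed, for reproducibility only
N, SIGMA, FIXED_X = 5000, 0.5, 3  # illustrative values, not from the paper
s_fixed = [shw_sample(FIXED_X, rng, SIGMA) for _ in range(N)]
s_random = [shw_sample(rng.randrange(4), rng, SIGMA) for _ in range(N)]

t_first = welch_t(s_fixed, s_random)
t_second = welch_t(centred_square(s_fixed), centred_square(s_random))
print(f"first-order t = {t_first:.2f}, second-order t = {t_second:.2f}")
```

Under these assumptions the first-order statistic should stay within the ±4.5 threshold, since both sets have the same mean, whereas the second-order statistic should exceed it by a wide margin, since the variance of the SHW leakage depends on the encoded value.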
FIGURE 1. SHW distributions obtained for various secret values masked with first-order Boolean masking. x is the secret value, and x1 a random value
used for Boolean masking. Top row: distributions of SHW without preprocessing. Bottom row: distribution obtained when keeping only the lowest k%
values (k = 25% here). While the mean is independent of the secret without preprocessing, it becomes dependent on the secret when only the lowest k%
samples are kept.
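The effect described in this caption can be reproduced with a short, noise-free Python sketch. It enumerates every first-order Boolean sharing of a 2-bit value, computes the SHW leakage, and compares the plain mean with the mean of the lowest 25% of samples. The helper names are hypothetical, and exhaustive enumeration stands in for measured traces; it is an illustration of the idea, not the preprocessing code of [12].

```python
def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def shw_distribution(x, k=2):
    """Sorted noise-free SHW leakage of every Boolean sharing (x0, x1) of x."""
    return sorted(hw(x ^ x1) + hw(x1) for x1 in range(1 << k))

def mean(values):
    return sum(values) / len(values)

def bld_lowest(sorted_samples, keep=0.25):
    """Keep only the lowest `keep` fraction of the already sorted samples."""
    n_keep = max(1, int(len(sorted_samples) * keep))
    return sorted_samples[:n_keep]

raw_means = {}
biased_means = {}
for x in range(4):
    dist = shw_distribution(x)
    raw_means[x] = mean(dist)                  # first-order moment, no preprocessing
    biased_means[x] = mean(bld_lowest(dist))   # first-order moment after biasing

print("raw means:   ", raw_means)
print("biased means:", biased_means)
```

Without preprocessing every secret value yields a mean of 2.0, whereas after keeping the lowest quarter the means become 0.0, 1.0, 1.0 and 2.0 for x = 0, 1, 2, 3: in this toy setting the biased first-order moment equals HW(x).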
Informally, the technique consists in selecting, for each trace sample, a subset of the measured traces, preserving only certain leakage values. Such Biasing Leakage Distribution (BLD) preprocessing biases the leakage distribution of each trace sample, converting higher-order leakages to lower-order leakages.
To exemplify this technique, let us consider a first-order BM encoding of $X \in \mathcal{X}_{2^2}$. Further, let us assume that the two shares $X_0$, $X_1$ are processed in parallel, and that the implementation leaks according to a noise-free SHW model (Eq. 5). Fig. 1, top row, reports the marginal distributions of each realisation of $X$. Each marginal distribution exhibits the same first-order moment (i.e., mean). That is, the first-order moment is independent of the encoded value $X$, as expected for a first-order masking scheme. Fig. 1, bottom row, reports the marginal distributions of the realisations of $X$ after a preprocessing keeping the k = 25% of samples with the lowest values of the leakage distributions [12]. The preprocessed first-order moments of the marginal distributions depend on the secret value, making it possible to mount first-order attacks. In practice, the resulting order reduction varies depending on the value of the threshold k, and on the heuristic used for trace pruning (e.g., keeping the ones with the lowest leakage values) [12].
IV. PARALLEL PROCESSING OF SHARES IN SOFTWARE
As our goal is to evaluate the practical security of masked software implementations (Section V, Section VI, Section VII), we first need to assess the potential sources of leakage. To this end, we proceed as follows: we first provide a rationale explaining how the complexity of a CPU micro-architecture potentially induces PPS (Section IV-A). Then, we describe the three carefully hand-crafted assembler programs (called micro-benchmarks, or UBenches) that we designed to investigate the presented rationale (Section IV-B). To confirm or reject the presence of PPS, we run side-channel analyses on each UBench (Section IV-D).
As presented in Section III-C, the micro-architecture of modern CPUs constitutes a rich source of recombination effects; in particular, of transition-based leakages. Hence, we also include a UBench exercising a transition-based leakage originating within the micro-architecture. We have released these micro-benchmarks (C and binary code) as publication artefacts (https://zenodo.org/record/8094516).
A. RATIONALE
The micro-architecture of modern CPUs extensively relies on hardware-oriented techniques to increase the instruction throughput [13]. Due to instruction pipelining, the micro-architecture is partitioned into several stages, where each stage takes care of a part of the instruction life cycle. Fig. 2 depicts a simplified 3-stage, in-order, micro-architecture. In such example, the Instruction Fetch (IF) stage fetches the next instruction to be executed, the Instruction Decode (DE) interprets the instruction (e.g., selecting operands from the Register File), whereas the Instruction Execute (EXE) executes the instruction. We remark that, in such example, the execution of memory-related
B. MICRO-BENCHMARKS
We design three distinct micro-benchmarks, one for each potential PPS case we identified. Each UBench shares the same structure: a preamble followed by a workload (Listing 1). We implement the UBenches in Thumb-2 assembler, targeting ARM-based target platforms (Section IV-C).
The UBench preamble consists in a sequence of machine instructions preparing the architectural and micro-architectural states and the inputs for the workload. The preparation of the micro-architectural state consists in the randomization of the state of specific elements (e.g., micro-architectural registers, memory data-path), which may otherwise induce unintended leakage. The workload consists in a sequence of machine instructions, which attempts to exercise a desired leakage effect. The trigger_high() and trigger_low() functions, which surround the workload, respectively start and stop the collection of power-based side-channel traces. To clearly identify the workload-induced leakage effect, we pad the workload’s beginning and ending with eor.w instructions provided with random inputs. To make clear the handling of these values, we comment each UBench instruction with its effect.
1) NOTATION
We denote the UBench target words as X0 and X1, whereas rndN refers to one of the UBench random input values. We denote with R_val a generic 32-bit register containing the value val. As a special case, we denote with R_destN a 32-bit register containing the result of the N-th UBench instruction. We refer to the immediate address of a value val with addr[val]. We denote a constant value const with #const.
Listing 2. UB-SHW-LDRB workload.
3) PPS-RELATED UBench #2
The second PPS-related UBench stimulates the parallel manipulation of values during the readings of X0 and X1 from the memory and the register file, respectively. Listing 3 reports the corresponding workload, hereafter referred to as UB-SHW-LDR-EOR. The ldr.w instruction enters the EXE stage at clock cycle #k. X0 enters the micro-architecture at clock cycle #k+1. Due to the pipeline stall inserted during the address generation, the eor.w instruction passes the DE stage at clock cycle #k+1. During the DE stage, the X1 is read from the register file. As a consequence, at clock cycle #k+1, the values X0 and X1 are simultaneously alive in the micro-architecture.
4) PPS-RELATED UBench #3
The third PPS-related UBench stimulates the parallel manipulation of values by processing X0 and X1, each handled by a
FIGURE 3. PCorrl-based evaluation of PPS-based and transition-based leakages. Each row reports the PCorrl from a different UBench: first row for
UB-SHW-LDRB (Listing 2), second row for UB-SHW-LDR-EOR (Listing 3), third row for UB-SHW-MOV-EOR (Listing 4), fourth row for UB-HD (Listing 5).
The first two columns report the results under the SHW leakage model, and the last two columns under the HD leakage model. The first and third
columns report the results for the STM32F215 board, whereas the second and fourth ones report those for the STM32F303 board. Each UBench is evaluated on two
sets (test and control) of 30,000 power-consumption traces.
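A sample-wise PCorrl evaluation of this kind can be sketched in Python as follows. The traces here are synthetic and purely hypothetical: 2,000 traces of 5 samples in which a single sample leaks the Hamming distance between two random 8-bit words, plus Gaussian noise. The sketch only illustrates how Eq. 7 is applied column by column under the HD and SHW models; it does not reproduce the measured results of the figure.

```python
import math
import random

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def pearson(xs, ys):
    """Sample Pearson correlation coefficient (Eq. 7)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def correlation_trace(model, traces):
    """PCorrl between the model predictions and each sample index."""
    n_samples = len(traces[0])
    return [pearson(model, [t[i] for t in traces]) for i in range(n_samples)]

rng = random.Random(1)                    # fixed seed, for reproducibility only
N, M, LEAK_AT, SIGMA = 2000, 5, 3, 1.0    # hypothetical toy sizes
inputs = [(rng.randrange(256), rng.randrange(256)) for _ in range(N)]
traces = []
for a, b in inputs:
    t = [rng.gauss(0.0, SIGMA) for _ in range(M)]
    t[LEAK_AT] += hw(a ^ b)               # transition-like leakage at one sample
    traces.append(t)

hd_model = [hw(a ^ b) for a, b in inputs]        # HD leakage model
shw_model = [hw(a) + hw(b) for a, b in inputs]   # SHW leakage model
rho_hd = correlation_trace(hd_model, traces)
rho_shw = correlation_trace(shw_model, traces)
print("HD model: ", [round(r, 3) for r in rho_hd])
print("SHW model:", [round(r, 3) for r in rho_shw])
```

In this toy setting the HD model should show a clear peak at the leaking sample, while the SHW model should not, because HW(a) + HW(b) is uncorrelated with HW(a ⊕ b) for independent uniform words.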
(Section IV-B). We analyse the encodings’ resilience in two steps: (a) quantification and comparison of the leaked information (Section V-A); (b) exploitation of the leaked information through first-order analyses (Section V-B).
A. THEORETICAL EVALUATION
As remarked in Section IV-D, the SHW and HD leakage models might not perfectly describe the actual behaviour of our target boards. In order to evaluate the leakage resilience in the case such models capture the leakage behaviour, we first conduct an information-theoretic analysis. For such purpose, we numerically estimate $MI(X, L(X_0, X_1))$, where $X \in \mathcal{X}_{2^4}$ and the shares $X_0, X_1 \in \mathcal{X}_{2^4}$ encode $X$ according to BM, ASM or IPM. For IPM, we arbitrarily select $\mathbf{L} = (1, 6) \in \mathbb{F}_{2^4}^{2}$. We describe the leakage $L$ via an AGN leakage model (Eq. 1). According to the targeted leakage effect, we employ either $L_d(\cdot) = SHW$ or $L_d(\cdot) = HD$.
Fig. 4 reports the results of the information-theoretic leakage evaluation. We observe that the BM encoding leaks the most, while the IPM one leaks the least. Comparing the information leakage between the two leakage models, the SHW model not only provides the least information quantity, but it decreases faster. This is witnessed by the slope of the curves, as the SHW curve reports a slope of −2, whereas the HD one reports a slope of −1. As reported by Duc et al., such slope reports the minimal statistical moment to break the encoding [34].
We verify this observation by mounting a first-order correlation analysis on simulated power-consumption traces.
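The analyses of this section can be approximated in the noise-free limit with a short Python sketch that enumerates every pair $(x, x_1)$ of 4-bit values, derives $x_0$ for each first-order encoding, and computes both the discrete MI of Eq. 9 and the correlation $\rho(HW(X), L_d(X_0, X_1))$ of Eq. 7. Two points are assumptions of this sketch and not taken from the paper: the noise is dropped entirely (so no noise sweep or slope is reproduced), and the field $\mathbb{F}_{2^4}$ is built with the reduction polynomial $x^4 + x + 1$, which the text does not specify.

```python
import math
from collections import Counter

K = 4            # share width in bits
POLY = 0b10011   # ASSUMPTION: GF(2^4) reduction polynomial x^4 + x + 1
L1 = 6           # public IPM coefficient, L = (1, 6)

def hw(v):
    return bin(v).count("1")

def gf_mul(a, b):
    """Multiplication in GF(2^K) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> K:
            a ^= POLY
        b >>= 1
    return r

# First-order encoders: given x and a uniform mask x1, return the share x0.
ENCODERS = {
    "BM": lambda x, x1: x ^ x1,                 # x = x0 XOR x1
    "ASM": lambda x, x1: (x - x1) % (1 << K),   # x = x0 + x1 mod 2^K
    "IPM": lambda x, x1: x ^ gf_mul(L1, x1),    # x = x0 XOR (L1 * x1)
}
MODELS = {
    "SHW": lambda x0, x1: hw(x0) + hw(x1),      # PPS-based leakage
    "HD": lambda x0, x1: hw(x0 ^ x1),           # transition-based leakage
}

def joint_counts(enc, model):
    """Joint counts of (x, leakage) over all uniform (x, x1) pairs."""
    counts = Counter()
    for x in range(1 << K):
        for x1 in range(1 << K):
            counts[(x, model(enc(x, x1), x1))] += 1
    return counts

def mutual_information(counts):
    """Discrete MI in bits (Eq. 9) from joint counts."""
    n = sum(counts.values())
    cx, cl = Counter(), Counter()
    for (x, l), c in counts.items():
        cx[x] += c
        cl[l] += c
    return sum(c / n * math.log2(c * n / (cx[x] * cl[l]))
               for (x, l), c in counts.items())

def correlation(counts):
    """Pearson correlation between HW(x) and the leakage (Eq. 7)."""
    n = sum(counts.values())
    mean_h = sum(hw(x) * c for (x, _), c in counts.items()) / n
    mean_l = sum(l * c for (_, l), c in counts.items()) / n
    cov = sum((hw(x) - mean_h) * (l - mean_l) * c
              for (x, l), c in counts.items()) / n
    var_h = sum((hw(x) - mean_h) ** 2 * c for (x, _), c in counts.items()) / n
    var_l = sum((l - mean_l) ** 2 * c for (_, l), c in counts.items()) / n
    return cov / math.sqrt(var_h * var_l)

results = {}
for e_name, enc in ENCODERS.items():
    for m_name, model in MODELS.items():
        jc = joint_counts(enc, model)
        results[(e_name, m_name)] = (mutual_information(jc), correlation(jc))

for key, (mi, rho) in sorted(results.items()):
    print(key, f"MI = {mi:.4f} bit, rho = {rho:+.4f}")
```

Under these assumptions, BM under the HD model leaks $HW(X)$ exactly (correlation 1, about 2.03 bits), IPM under the HD model is independent of $X$ (zero MI and zero correlation), and the first-order correlation vanishes for all three encodings under the SHW model even though the MI does not.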
Specifically, we generate 1,000,000 traces, each of 1 sample, via an AGN leakage model, and we compute $\rho(HW(X), L(X_0, X_1))$, where $X, X_0, X_1 \in \mathcal{X}_{2^4}$. Fig. 5 reports the results of the first-order analyses. As expected, under the HD model, we detect correlation for both the BM and ASM encodings. Consistently with the information-theoretic analysis, we do not detect correlation for the IPM encoding. Concerning the SHW case, the first-order analysis does not identify correlation with the encoded value $X$. Such evidence illustrates the need for, at least, a second-order statistical moment to correlate with $X$.
From the information-theoretic analyses, we observed that ASM and IPM encodings tend to better mitigate transition-based and PPS-based leakages. We corroborated such analyses by first-order moment analyses, evaluating the correlation between the encoded value and the simulated power-consumption. We observed results consistent with the information-theoretic ones. Furthermore, we highlighted how first-order moments cannot detect any information in the presence of PPS-based leakage.
FIGURE 4. Information-Theoretic leakage resilience analyses results. The plot reports the numerically estimated $MI(X, L(X_0, X_1))$ evolution according to an increasing noise variance $\sigma^2$ (both in Log10 scale). We describe the leakage $L$ as an AGN leakage model (Eq. 1), where $L_d(\cdot) = SHW$ or $L_d(\cdot) = HD$, for PPS-based and transition-based leakages, respectively. Due to estimation errors, for $\sigma^2 \ge 10^2$, the SHW curve diverges from the expected straight line. As IPM reaches perfect independence from $X$ in the HD case, we omit the related curve.
FIGURE 5. PCorrl-based leakage resilience analyses results on simulated traces. The plot reports $\rho(HW(X), L(X_0, X_1))$ according to an increasing noise variance $\sigma^2$ (Log10 scale), for the HD and SHW models. We generate 1,000,000 power-consumption traces, each of 1 sample. We simulate the traces according to an AGN leakage model (Eq. 1), where $L_d(\cdot) = SHW$ or $L_d(\cdot) = HD$, for PPS-based and transition-based leakages, respectively. The metric does not detect correlation with $X$ under the SHW for BM, ASM and IPM.
B. EXPERIMENTAL EVALUATION
In the previous sub-section, we analysed the information-theoretic resilience of first-order BM, ASM and IPM. We completed the analyses with a PCorrl-based evaluation on simulated traces. Such evaluation highlighted the better leakage resilience of ASM and IPM encodings. Despite the interest of an ideal setting (i.e., simulated traces), masked software implementations are executed in an imperfect one, where the leakage behaviour potentially deviates from the hypothetical one. As such, in this section we evaluate the leakage resilience of the three masking schemes when the first-order encodings are manipulated on our two boards, the STM32F215 and STM32F303. For this purpose, we re-use the UBenches of Section IV-B, which stimulate PPS-based and transition-based leakages. Differently from the information-theoretic analyses, for IPM we arbitrarily select $\mathbf{L} = (1, 170) \in \mathbb{F}_{2^8}^{2}$.
For each UBench, we capture 4,000,000 traces, each of 90 samples. We first quantify the leaked information by computing $HI(X, T^i_{4M \times 90})$. We set the random target inputs X0, X1, manipulated by each UBench, to the realisation of the shares $X_0, X_1 \in \mathcal{X}_{2^8}$, in each of the studied masking encodings BM, ASM, and IPM. As explained in Section III-F, the HI provides an upper bound of MI. This property is of particular interest in our case as we want to assess conservatively the amount of leakage. HI also converges towards the true MI as the number of traces gets higher [31].
The first two columns of Fig. 6 present the results of the HI analysis for the considered masking encodings and UBenches. We compute the HI via the ENNEMI Python library [35], which implements a k-nearest-neighbour algorithm. Despite the high number of traces and the univariate setting, which favour HI convergence, we observe weak information leakage on the STM32F215 for both UB-SHW-LDR-EOR and UB-SHW-LDRB. As shown in Fig. 3, PPS leakage seems very low on this board, which may explain this result. On the STM32F303, UB-SHW-LDR-EOR and UB-SHW-LDRB show a tiny peak of information, whose significance is uncertain. By contrast, peaks of information are clearly visible for UB-SHW-MOV-EOR and UB-HD on both boards. As expected, the BM encoding leaks the most information, while leakage is hardly visible for the IPM for the given number of traces.
For completeness, we also run first-order moment analyses on the same trace sets. Specifically, we compute $\rho(HW(X), T^i_{4M \times 90})$ where $i \in [0, 90)$. The last two columns of Fig. 6 report the results.
Unexpectedly, we observe a correlation peak for the UB-SHW-MOV-EOR. As explained in Section V-A, a first-order
FIGURE 6. Experiment-based quantification of the transition-based and PPS-based leakages. Each row reports the results for a different UBench: first row for UB-SHW-LDRB (Listing 2), second row for UB-SHW-LDR-EOR (Listing 3), third row for UB-SHW-MOV-EOR (Listing 4), fourth row for UB-HD (Listing 5). The first two columns report the HI metric, whereas the last two report the PCorrl metric. The first and third columns report the results for the STM32F215 board, whereas the second and fourth ones for the STM32F303 board. For each UBench and board, we compute both metrics on a 4,000,000 power-consumption trace set.
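The blindness of first-order correlation to PPS-based leakage can be reproduced on noise-free data. The following is a minimal sketch, assuming a Boolean encoding of 4-bit values, an HD model for transitions and an SHW model for PPS; the helper names (`hw`, `pearson`) are illustrative and not taken from the paper's tooling. It enumerates every (value, mask) pair instead of sampling, so the statistics are exact.

```python
from itertools import product
from math import sqrt

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

BITS = 4  # assumption: 4-bit values, exhaustively enumerated
# Boolean masking: x1 = x XOR x0, for every value x and every mask x0
pairs = [(x, x0, x ^ x0) for x, x0 in product(range(1 << BITS), repeat=2)]

model = [hw(x) for x, _, _ in pairs]            # HW of the encoded value
hd = [hw(x0 ^ x1) for _, x0, x1 in pairs]       # transition-based leakage
shw = [hw(x0) + hw(x1) for _, x0, x1 in pairs]  # PPS-based leakage
mean_shw = sum(shw) / len(shw)
shw_sq = [(s - mean_shw) ** 2 for s in shw]     # centred second-order moment

rho_hd = pearson(model, hd)          # first-order, transitions
rho_shw = pearson(model, shw)        # first-order, PPS
rho_shw_sq = pearson(model, shw_sq)  # second-order, PPS
```

Under these assumptions, `rho_hd` is maximal, `rho_shw` vanishes, and only the centred squared leakage `rho_shw_sq` correlates with the encoded value, which matches the need for at least a second-order moment.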
moment cannot detect correlation with an encoded value via PPS-based leakage. Still, the peak takes place at the same time sample where we verified the presence of PPS-based leakage (Section IV). Hence, we ascribe the observed correlation to a recombination effect that occurs simultaneously with the PPS event.
Up to now, we evaluated the leakage resilience of different masking encodings against transition-based and PPS-based leakages. Concerning transition-based leakages, the results highlight the better leakage resilience of ASM and IPM encodings. Concerning the PPS-based ones, despite the use of 4,000,000 traces, the HI-based analyses hardly identify any PPS-based information leakage. Nonetheless, a different approach, e.g., the BLD preprocessing [12], could better take advantage of the existing information leakage.
With this last remark, we employ the BLD preprocessing proposed by Moos and Moradi [12]. Their approach takes advantage of the PPS, converting higher-order leakages into lower-order ones, reducing the security order of the encoding (Section III-G). We directly focus on experimental analyses, as simulation-based ones are extensively provided in the original work [12]. Due to its high correlation with the PPS-based leakage (Fig. 1), we limit our analysis to the trace set collected with the UB-SHW-LDRB execution on the STM32F303. From experimental attempts, we identified k = 10% (i.e., 400,000 traces per sample) as a good threshold. Fig. 7 provides the correlation curves from the BLD-based analyses. This time, we detect correlation peaks for both BM and ASM encodings, confirming the potential exploitability of PPS-based leakage.
This section has shown that transition-based and PPS-based leakages represent a concrete vulnerability in software masking implementations, leaking exploitable information through simple first-order analyses. Among the selected candidates, the IPM was found to be the least vulnerable, preventing even the exploitation of higher-order leakages by means
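The effect of the trace filtering can be illustrated on simulated data. The sketch below is an assumption-laden toy, not the authors' setup: it models PPS as SHW leakage of Boolean shares with a small Gaussian noise, and it assumes that the preprocessing keeps, for a given sample point, the fraction k of traces with the lowest leakage value. The function name `bld_filter` and the noise level are hypothetical.

```python
import random
from math import sqrt

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def bld_filter(model, leak, k):
    """Keep the fraction k of traces with the lowest leakage (assumed rule)."""
    order = sorted(range(len(leak)), key=leak.__getitem__)
    kept = order[: int(k * len(leak))]
    return [model[i] for i in kept], [leak[i] for i in kept]

rng = random.Random(0)  # fixed seed for reproducibility
model, leak = [], []
for x in range(256):          # every encoded value
    for x0 in range(256):     # every mask
        x1 = x ^ x0           # Boolean masking
        model.append(hw(x))
        # PPS modelled as the sum of the shares' Hamming weights, plus noise
        leak.append(hw(x0) + hw(x1) + rng.gauss(0.0, 0.1))

rho_all = pearson(model, leak)                    # no filtering
rho_bld = pearson(*bld_filter(model, leak, 0.10))  # k = 10%
```

Without filtering, the first-order correlation is negligible; after keeping the lowest 10% of the traces, the mask distribution is biased and a first-order correlation with HW(X) appears.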
Fig. 8 reports $\mathcal{D}_{(\mathrm{HD}(X_0,X_1),X)}$ and $\mathcal{D}_{(\mathrm{SHW}(X_0,X_1),X)}$ for BM, ASM and IPM. As the distributions differ, so do the marginal distributions. It is possible to exploit such difference to define (statistical-)moment-based leakage models.
For instance, we can associate to each realisation of $X$ the first-order moment of the marginal $\mathcal{D}_{(\mathrm{HD}(X_0,X_1),X=x)}$:

$$\mathrm{HD}_{fo}(x) = \frac{1}{|\mathbb{F}_{2^8}|^2} \sum_{x_i \in \mathbb{F}_{2^8},\; x = \bigodot_i x_i} \mathrm{HD}(x_0, x_1) \qquad (12)$$
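As an illustration of Eq. 12, the following sketch tabulates a first-order-moment HD model for two of the encodings. It assumes that BM recombines the shares as x = x0 XOR x1 and ASM as x = (x0 + x1) mod 256, and it normalises by the number of valid share pairs (256) rather than by the constant of Eq. 12; the two differ only by a constant factor, which does not affect a correlation. The function name `hd_fo` is illustrative.

```python
def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def hd_fo(x, encoding):
    """Mean of HD(x0, x1) over all 8-bit share pairs that encode x."""
    total = 0
    for x0 in range(256):
        if encoding == "BM":       # assumed: x = x0 XOR x1
            x1 = x ^ x0
        elif encoding == "ASM":    # assumed: x = (x0 + x1) mod 256
            x1 = (x - x0) % 256
        else:
            raise ValueError("unsupported encoding: " + encoding)
        total += hw(x0 ^ x1)       # Hamming distance between the shares
    return total / 256

# One model value per realisation of the encoded byte
model_bm = [hd_fo(x, "BM") for x in range(256)]
model_asm = [hd_fo(x, "ASM") for x in range(256)]
```

For BM the model collapses to HW(x), since the Hamming distance between the two shares is the Hamming weight of the encoded value. For ASM the model still depends on x, but in a less direct way, which is what a moment-based model is meant to capture.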
FIGURE 8. Distribution of the HD and SHW leakage models. Given $X \in \mathcal{X}_{2^4}$, the first row reports $\mathcal{D}_{(\mathrm{HD}(X_0,X_1),X)}$, whereas the second one reports $\mathcal{D}_{(\mathrm{SHW}(X_0,X_1),X)}$, where $X_0, X_1 \in \mathcal{X}_{2^4}$ represent the shares obtained from the application of BM (first column), ASM (second column) or IPM (third column) to X.
we exploit univariate higher-order moment leakages with filtering. This last phase is particularly important to assess the practical security against PPS, since its first-order moment leakages cannot be directly exploited (Section V-A).

A. EXPERIMENTAL SETUP
The vanilla implementation follows the FIPS-PUB-197 specification [17], except for the key-scheduling: the implementation generates the next round key between the SubByte and the MixColumns steps.
Each first-order masked implementation follows from the manual application of the related masking scheme to the vanilla implementation. In particular, the BM and IPM ones follow the specification of Rivain et al. [36] and Balasch et al. [15], respectively. For the IPM version, we resort to $L = (1, 170) \in \mathbb{F}_{2^8}^2$, the same we employed for the experiment-based analyses (Section V, Section VI). We implement the finite field multiplication using log/exp tables [37].
Concerning the ASM implementation, an inherent difficulty is the masking of the field addition (i.e., the eXclusive-OR, XOR). Indeed, the XOR is non-linear with respect to the arithmetic-sum operation. We mask the XOR operation by means of a masked look-up table. A straightforward tabulation of the operation would require $2^{16}$ bytes of memory. To reduce the memory consumption, we tabulate the XOR on 4 bits, where the concatenation of the least (and most) significant nibbles of the inputs indexes the table. We compute the XOR between two 8-bit inputs as a double access to such table: one to process the least significant nibbles of the inputs, and one to process the most significant ones. We remark that the output carry of the arithmetic-sum potentially leaks information on the processed values. To prevent such leakage, we pre-charge the landing bit of the output-carry with a fresh random value.
In the vanilla implementation, for performance reasons, we tabulate the SBOX and the XTIME functions. In the ASM implementation, we implement the same functions by means of masked look-up tables. Concerning the BM and IPM implementations, we compute those functions on the fly.
We resort to the experimental setup introduced in Section IV-C (software toolchain and side-channel measurement setup). We develop each implementation in C, and compile them with the compiler toolchain and compilation options reported in Section IV-C. Table 3 reports the mean execution time, number of PRNG calls, and memory impact of each AES-128 implementation. We report such parameters for both STM32F215 and STM32F303. Each masked implementation draws fresh randomness from the xoroshiro64** 1.0 PRNG [38]. The execution time from Table 3 includes time spent in the PRNG. We remark the long execution time (500,000 clock cycles on the STM32F215) for the ASM implementation. We ascribe it to the MixColumns step, which performs several accesses to the table-based XOR implementation. We remark that our experimental setup provides us with correctly-aligned side-channel traces. Hence, we do not require any re-alignment of the side-channel traces.
For the purpose of our analyses (e.g., leakage resilience against physical effects), we have to guarantee the correct application of the masking scheme. Each of the selected schemes considers a value-based leakage model. Thus, we verify that no value-based leakage can be detected from each implementation. To this end, we run TVLA analyses on simulated power traces collected during the execution of each implementation on an ISA-level simulator of the ARMv7 profile. Specifically, we simulate the power consumption stemming from the usage of the register file and memory requests via load and store instructions. For all the implementations, we do not reject the null hypothesis (i.e., the implementation does not leak in the value-based model), which supports the correct application of the three considered masking schemes.

B. INFORMATION LEAKAGE EVALUATION
As a first step in the leakage resilience assessment of our AES-128 implementations, we proceed with the TVLA methodology. Precisely, we analyse the full first round of each implementation, except for the ASM implementation: as pointed out in Section VII-A, the MixColumns step accounts for the largest part of the execution time. To reduce the trace collection time without compromising the validity of our results, we exclude the ASM's MixColumns from the leakage evaluation. As introduced in Section III-E, the TVLA allows an evaluator to determine whether an implementation leaks or not, independently of the particular attack or leakage model.
For the vanilla, BM and ASM implementations, we collect 15,000 power-consumption traces for each of the fixed and random sets. Concerning the IPM implementation, we observed that it is characterised by a higher leakage resilience (Section V, Section VI). To be more confident in its evaluation, we perform the same assessment with 90,000 power traces for each of the fixed and random trace sets.
As explained in Section III-E, the TVLA methodology is prone to errors of type I and II, where the latter represent the most problematic ones. To cope with them, for each implementation, we repeat the TVLA assessment two times, each with a distinct fixed key, and we measure the maximum absolute t-statistic for each sample point of the traces. Fig. 11 reports the TVLA results for each AES-128 implementation and each target board.
The vanilla, BM and ASM implementations leak information along the whole first round. As we verified that the masking countermeasure is correctly applied at binary level, and as first-order statistical moments cannot detect leakage from PPS, we ascribe such leakage to recombination effects (e.g., transitions).
We remark that the ASM implementation presents fewer leaking samples than the BM. The algebraic structure of the ASM encoding potentially contributes to such observation.
Unexpectedly, the leakage assessment on the IPM implementations reveals several leakage points along the full first
TABLE 3. Mean execution time (in clock cycles), number of calls to the PRNG, and segment size (in bytes) of each AES-128 implementation.
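The nibble-wise tabulation of the XOR described for the ASM implementation in Section VII-A can be sketched as follows. This is only an unmasked illustration of the indexing scheme and of the memory trade-off (256 entries instead of 2^16); the implementation evaluated in the paper uses a masked table, which this sketch does not attempt to reproduce. The names `XOR4` and `xor8` are hypothetical.

```python
# 256-entry table: the index concatenates two 4-bit operands,
# the value is the XOR of those two nibbles.
XOR4 = [(i >> 4) ^ (i & 0xF) for i in range(256)]

def xor8(a, b):
    """XOR of two 8-bit values computed with two accesses to XOR4."""
    lo = XOR4[((a & 0xF) << 4) | (b & 0xF)]  # least significant nibbles
    hi = XOR4[((a >> 4) << 4) | (b >> 4)]    # most significant nibbles
    return (hi << 4) | lo

# Usage: 0xA5 XOR 0x3C computed through the table
result = xor8(0xA5, 0x3C)
```

Each 8-bit XOR costs two table accesses, which is consistent with the MixColumns step dominating the ASM execution time.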
FIGURE 11. TVLA results on the 4 AES-128 implementations. In red, we report the maximum t-statistic between two t-tests. In blue, the t-statistic threshold (±4.5) for the null hypothesis rejection. We execute each t-test by using a distinct fixed key. The first and third columns refer to the STM32F215 board, whereas the second and fourth ones to the STM32F303 board. Each plot refers to a 15,000-vs-15,000 t-test, except for the IPM AES-128, which refers to a 90,000-vs-90,000 t-test.
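The fixed-vs-random assessment behind Fig. 11 can be sketched with a per-sample Welch t-test, repeated and combined through the maximum absolute t-statistic. The traces below are simulated under hypothetical parameters (two sample points, Gaussian noise, a mean shift of 0.5 on the leaking sample); they are not the measurements of the paper, and the helper names are illustrative.

```python
import random
from math import sqrt
from statistics import fmean, variance

THRESHOLD = 4.5  # usual TVLA rejection threshold on |t|

def welch_t(a, b):
    """Welch's t-statistic between two sets of observations."""
    return (fmean(a) - fmean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def tvla(fixed, rand):
    """Per-sample t-statistics; fixed and rand are lists of traces."""
    return [welch_t(f, r) for f, r in zip(zip(*fixed), zip(*rand))]

def max_abs_t(*runs):
    """Per-sample maximum |t| over several repetitions of the test."""
    return [max(abs(t) for t in column) for column in zip(*runs)]

rng = random.Random(1)  # fixed seed for reproducibility

def simulate(n, shift):
    """Toy traces: sample 0 never leaks, sample 1 leaks in the fixed set."""
    fixed = [[rng.gauss(0, 1), rng.gauss(shift, 1)] for _ in range(n)]
    rand = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    return fixed, rand

# Two repetitions, standing in for the two distinct fixed keys
runs = [tvla(*simulate(2000, 0.5)) for _ in range(2)]
t_max = max_abs_t(*runs)
leaking = [t > THRESHOLD for t in t_max]
```

Taking the maximum over the repetitions is a conservative choice: a sample is flagged as soon as one of the fixed keys reveals leakage, which reduces the risk of false negatives.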
round. We found out that such leakages stem from recombination effects that impact the log/exp-based field multiplication. Specifically, we verified the statistical dependence between $\mathrm{HD}(\log_3(X_0), \log_3(X_1))$ and the encoded value x. We conjecture that the non-linear nature of the logarithm function introduces some bit-interaction effect between the bits of the shares. Such effect counteracts the randomness diffusion of the IPM, making transition-based leakage exploitable again. Yet, we remark that, despite the higher number of employed traces, we observe a much lower magnitude of the t-statistic with respect to the one of the other implementations.

C. INFORMATION LEAKAGE EXPLOITATION
In the previous section, we assessed the leakage resilience of our AES-128 implementations. We observed results consistent with the encoding analyses (Section V, Section VI), except for the IPM. In fact, we observed unexpected leakage stemming from the finite field multiplication. Despite the presence of leakage, the TVLA methodology does not provide any clue concerning the exploitability of the leaked information.
With this section, we explore the resilience of our software masked implementations against information leakage exploitation; specifically, against univariate side-channel attacks. To this end, we rely on standard, BLD-based (Section III-G) and moment-based-model (Section VI) CPA attacks. For each implementation and target board, we measure 1,000,000 power traces.
The side-channel analysis proceeds as follows. We analyse the usage of the first secret key byte during the SubByte step of the first round, and we compute $\rho(L(X)_d, T_j^{1M \times m})$, where m varies according to the target implementation. Table 4 summarises the leakage models $L(\cdot)_d$ employed to attack each implementation.
For the IPM implementations, we also target the SubByte's input, which comes as the result of the field multiplication. We employ the first-order-moment leakage model $\mathrm{HD}_{fo,\log}$:

$$\mathrm{HD}_{fo,\log}(x) = \frac{1}{|\mathbb{F}_{2^8}|^2} \sum_{x_i \in \mathbb{F}_{2^8}} \mathrm{HD}(\log_3(x_0), \log_3(x_1)) \qquad (15)$$

Fig. 12 reports the results of the different CPA attacks, and Table 5 reports the minimum number of traces required to mount a successful CPA attack. Despite the correct application of the masking scheme on the binaries, we need only 140 and 241,000 traces to break the BM and ASM implementations, respectively. Consistently with the result from Section VI, the $\mathrm{HD}_{fo}$ model improves the attack efficiency against the ASM implementation, reducing by up to 8.6 times
the minimum number of traces to mount a successful CPA attack, with respect to a plain use of the HW model.
By targeting the SBOX input, we successfully retrieve the target key byte on IPM implementations. This suggests

FIGURE 12. CPA results for the four AES-128 implementations. In grey, the wrong key hypotheses, whereas in red the correct one. Fig. 12f, 12g and 12h report the PCorrl in Log10 scale. For each implementation, we employ a different leakage model (Table 4). For the $\mathrm{SHW}_{fo,k\%}$ model, the X-axis reports the number of collected traces (i.e., before trace filtering). Each row refers to a different implementation/leakage model combination. First and second columns refer, respectively, to the STM32F215 and STM32F303 board.

VIII. DISCUSSION
In this section, we warn about unanticipated sources of weaknesses in masked implementations, then we discuss how parallel-oriented architectures and programming models can introduce PPS in software, and we give some principles to prevent the vulnerabilities created by PPS.

A. ON THE RESILIENCE OF IPM TO TRANSITION-BASED LEAKAGE
In Section V, we have shown that IPM encodings are immune to transition-based leakages, which is consistent with the literature. Yet, in Section VII we were able to successfully attack IPM masked implementations through a leakage
TABLE 5. Minimum number of traces to mount a successful CPA attack against the AES-128 implementations. We report "failed" in case of attack failure with 1,000,000 traces.
TABLE 6. Key ranking of correct key guess when employing the $\mathrm{SHW}_{fo,k\%}$ against IPM implementations. We report the correct key-guess rank and related number of traces for k ∈ {0.1%, 0.2%, ..., 1%, 2%}. We omit the entries for k > 2%, as we did not succeed in the attack. The number of traces corresponds to the number of collected traces (i.e., not the number of traces actually analysed).
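For readers unfamiliar with the CPA procedure behind Table 5, the sketch below runs a first-order CPA on simulated traces. It is a baseline on an unmasked target with a plain HW model on the S-box output, under hypothetical parameters (key byte, noise level, 500 traces); it does not reproduce the moment-based or BLD-based models of Table 4. The S-box is generated from its FIPS-197 definition to keep the block self-contained.

```python
import random
from math import sqrt

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def sbox(a):
    """AES S-box: field inversion (a^254) followed by the affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, a)
    out = inv
    for s in (1, 2, 3, 4):
        out ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return out ^ 0x63

SBOX = [sbox(a) for a in range(256)]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def cpa(plaintexts, leaks):
    """Return the key-byte guess with the highest absolute correlation."""
    scores = []
    for guess in range(256):
        model = [hw(SBOX[p ^ guess]) for p in plaintexts]
        scores.append(abs(pearson(model, leaks)))
    return max(range(256), key=scores.__getitem__)

rng = random.Random(2)  # fixed seed for reproducibility
KEY = 0x2B              # hypothetical key byte
plaintexts = [rng.randrange(256) for _ in range(500)]
# One-sample traces: HW of the S-box output plus Gaussian noise
leaks = [hw(SBOX[p ^ KEY]) + rng.gauss(0, 1.0) for p in plaintexts]
recovered = cpa(plaintexts, leaks)
```

Against a masked target, the same loop applies once the HW model is replaced by a model that survives the masking, such as a moment-based one.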
model targeting such leakages. We found the root cause in the use of logarithms in the finite field multiplication implementation. Transition-based leakages on the logarithm representation of the encodings induced exploitable leakage. Such gap underlines the importance of studying the masking resistance both theoretically and practically. It suggests that the different representations of masked encodings used in an implementation should all be considered for security assessment.

B. PPS AND PARALLEL-ORIENTED ARCHITECTURES
The PPS threat emerges whenever data processing parallelism can be achieved. From a hardware point-of-view, PPS readily extends to any architecture encompassing any kind of feature implying data parallelism. In our work, we focused on simple micro-architectures encompassing instruction pipelining, which implies a sort of data parallelism. Gigerl et al. show that super-scalar micro-architectures exhibit more sources of transition-based leakage [3] due to pipeline depth and multiple issuing of instructions. In such micro-architectures, data parallelism is exacerbated, and so is the possible occurrence of PPS.
Instruction Set Extensions (ISE) play an important role in the introduction of PPS. Miyajan et al. suggest the employment of SIMD (Single Instruction Multiple Data) ISE to provide efficient and secure software masked implementations [39]. The SIMD ISE enables data-level parallel processing, handling multiple data via a single instruction [13]. The explicit data parallelism naturally implies PPS. Such remark extends also to GPU architectures, designed to intrinsically support data-level parallelism. Still, we are not aware of any work concerning their usage to accelerate software masked implementations. Finally, FPGAs represent an interesting case: they can be employed either for hardware implementations or for the implementation of full CPUs [40]. In both cases, the designs might rely on some parallel features, e.g., [41], potentially introducing the PPS vulnerability.

C. PREVENTING PPS IN SOFTWARE
PPS emerges whenever the micro-architecture handles related shares in parallel. As discussed, architectures encompassing parallel features and certain programming models potentially introduce the PPS threat. As a naïve solution, the programmer should rely on programming techniques which do not promote data parallelism, and execute the implementation on an architecture not endowed with parallel features. Yet, such approach would increase the already high cost of a masked implementation, in particular for masked instances of order n > 1.
Instead, we advocate for a more principled approach, based on the concept of Non-Completeness. Non-Completeness is a security property defined in the context of Threshold Implementations [42]. Informally, by seeing an n-th order masked algorithm as a composition of sub-functions, each sub-function has to handle no more than n shares. Gaspoz and Dhooghe extend this property to provide necessary security properties against micro-architecture-induced recombination effects [43]. In particular, we remark their Horizontal Register Non-Completeness as a necessary condition to avoid PPS. Such property conflicts with certain programming techniques, e.g., share-slicing [21], which aim at the efficient implementation of masked software implementations.
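The informal non-completeness requirement (no more than n shares handled together) lends itself to a simple check on an instruction stream. The sketch below relies on a deliberately simplified, hypothetical model: each instruction is described only by the set of share indices of one secret that it manipulates, and a sliding window of consecutive instructions stands in for the operations that the micro-architecture may process together. It is not a substitute for the formal properties of [42], [43].

```python
def violations(stream, n, window=1):
    """Start indices of windows that touch more than n distinct shares.

    stream: list of sets of share indices (0..n) touched by each instruction.
    n:      masking order (an n-th order encoding has n + 1 shares).
    window: number of consecutive instructions assumed to be handled together.
    """
    bad = []
    for start in range(len(stream) - window + 1):
        seen = set().union(*stream[start:start + window])
        if len(seen) > n:
            bad.append(start)
    return bad

# First-order example (two shares, indices 0 and 1); empty sets are
# instructions that do not touch any share of the secret.
stream = [{0}, {1}, set(), {0}, set(), {1}]

per_instruction = violations(stream, n=1, window=1)
two_in_flight = violations(stream, n=1, window=2)
```

In this toy stream no single instruction handles both shares, yet a window of two instructions exposes both at index 0; inserting an unrelated instruction between the two share manipulations, as done later in the stream, removes the violation.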
Yet, their notion of non-completeness does not take into consideration the PPS stemming from the pipeline's depth (i.e., number of pipeline stages). Indeed, PPS originates also from related shares manipulated in different pipeline stages. It is possible to extend the non-completeness property at pipeline level, requiring that the pipeline does not process more than n shares at a time. Gigerl et al. suggest a stricter version of this Pipeline Non-Completeness property, separating the processing of related shares according to the pipeline's depth and number of instructions that can be executed in parallel to prevent glitch-based leakage [3].
Admittedly, register and pipeline non-completeness might not be sufficient to prevent PPS. Indeed, the register file, caches and memory potentially store all the shares of an encoding. Static power leakage potentially allows an attacker to observe these shares, enabling successful attacks [44]. The risk implied by static power leakage is still unexplored in the software context.
We conclude this discussion by remarking that the IPM scheme (more generally, the family of code-based masking) can amplify the security order naturally expected [9], [45], [46]. That is, given a masking of order n, according to the particular public vector L, the security order can be higher than n. Although we analysed IPM instantiated with non-optimal codes (i.e., which do not amplify the security order), the use of optimal codes can be a sound way to better mitigate PPS-based leakage. We leave as an interesting future work the investigation of the practical security guarantees of optimal code-based software masked implementations when register and pipeline non-completeness are satisfied.

IX. CONCLUSION
Recent literature has highlighted the CPU micro-architecture as a rich source of recombination effects (e.g., transitions), which severely decrease the security of masking. Despite the pervasiveness of such effects, our work shows that they do not represent the only threat to the practical security of masking in software: the parallel processing of shares (PPS), exercised by a CPU micro-architecture, represents a potential threat too. Relying on an adaptation of the preprocessing technique proposed by Moos and Moradi [12], we show how to exploit PPS-based leakage against first-order instances of Boolean, arithmetic-sum and inner-product masking. Furthermore, despite the fact that some schemes, such as the inner-product masking, provide immunity to transition-based leakage, particular operations can remove such immunity. Specifically, we show how the employment of the log operation in the field multiplication algorithm allows the successful exploitation of transition-based leakage against the inner-product masking.

ACKNOWLEDGMENT
The authors thank Arnaud de Grandmaison and Emanuele Valea for their helpful comments and many fruitful discussions, Romain Frappier for his contributions to the implementations of masked AES software, and Arnaud de Grandmaison for his contributions to the verification of first-order security in the value-based leakage model.

REFERENCES
[1] S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi, “Towards sound approaches to counteract power-analysis attacks,” in Advances in Cryptology – CRYPTO. Berlin, Germany: Springer, 1999.
[2] B. Marshall, D. Page, and J. Webb, “MIRACLE: MIcRo-architectural leakage evaluation: A study of micro-architectural power leakage across many devices,” TCHES, vol. 2022, no. 1, pp. 175–220, Nov. 2021.
[3] B. Gigerl, R. Primas, and S. Mangard, “Secure and efficient software masking on superscalar pipelined processors,” in Advances in Cryptology – ASIACRYPT. Cham, Switzerland: Springer, 2021.
[4] T. D. Cnudde, B. Bilgin, B. Gierlichs, V. Nikov, S. Nikova, and V. Rijmen, “Does coupling affect the security of masked implementations?” in Proc. COSADE, 2017, pp. 1–18.
[5] J. Balasch, B. Gierlichs, V. Grosso, O. Reparaz, and F. Standaert, “On the cost of lazy engineering for masked software implementations,” in Proc. CARDIS, 2014, pp. 64–81.
[6] A. Barenghi and G. Pelosi, “Side-channel security of superscalar CPUs: Evaluating the impact of micro-architectural features,” in Proc. DAC, 2018, pp. 1–6.
[7] A. Barenghi, L. Breveglieri, N. Izzo, and G. Pelosi, “Exploring Cortex-M microarchitectural side channel information leakage,” IEEE Access, vol. 9, pp. 156507–156527, 2021.
[8] A. D. Grandmaison, K. Heydemann, and Q. L. Meunier, “ARMISTICE: Microarchitectural leakage modeling for masked software formal verification,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 11, pp. 3733–3744, Nov. 2022.
[9] Q. Wu, W. Cheng, S. Guilley, F. Zhang, and W. Fu, “On efficient and secure code-based masking: A pragmatic evaluation,” IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2022, pp. 192–222, Jun. 2022.
[10] A. Beckers, L. Wouters, B. Gierlichs, B. Preneel, and I. Verbauwhede, “Provable secure software masking in the real-world,” in Proc. COSADE, 2022, pp. 215–235.
[11] G. Barthe, F. Dupressoir, S. Faust, B. Grégoire, F. Standaert, and P. Strub, “Parallel implementations of masking schemes and the bounded moment leakage model,” in Advances in Cryptology – EUROCRYPT. Cham, Switzerland: Springer, 2017.
[12] T. Moos and A. Moradi, “On the easiness of turning higher-order leakages into first-order,” in Proc. COSADE, 2017, pp. 153–170.
[13] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. San Francisco, CA, USA: Morgan Kaufmann, 2012.
[14] L. Goubin and J. Patarin, “DES and differential power analysis the ‘duplication’ method,” in Proc. Int. Workshop Cryptographic Hardw. Embedded Syst., 1999, pp. 158–172.
[15] J. Balasch, S. Faust, and B. Gierlichs, “Inner product masking revisited,” in Advances in Cryptology – EUROCRYPT. Berlin, Germany: Springer, 2015.
[16] T. S. Messerges, “Securing the AES finalists against power analysis attacks,” in Proc. Int. Workshop Fast Softw. Encryption, 2001, pp. 150–164.
[17] (2001). Advanced Encryption Standard (AES). NIST. [Online]. Available: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197-upd1.pdf
[18] L. D. Meyer, E. D. Mulder, and M. Tunstall, “On the effect of the (micro) architecture on the development of side-channel resistant software,” Cryptol. ePrint Arch., Tech. Paper 2020/1297, 2020.
[19] V. Arora, I. Buhan, G. Perin, and S. Picek, “A tale of two boards: On the influence of microarchitecture on side-channel leakage,” in Proc. CARDIS, 2021, pp. 80–96.
[20] K. Papagiannopoulos and N. Veshchikov, “Mind the gap: Towards secure 1st-order masking in software,” in Proc. COSADE, 2017, pp. 282–297.
[21] S. Gao, B. Marshall, D. Page, and E. Oswald, “Share-slicing: Friend or foe?” IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2019, pp. 152–174, Nov. 2019.
[22] R. Beaulieu, D. Shors, J. Smith, S. Treatman-Clark, B. Weeks, and L. Wingers, “The SIMON and SPECK families of lightweight block ciphers,” Cryptol. ePrint Arch., Tech. Paper 2013/404, 2013.
[23] J. Bos, L. Ducas, E. Kiltz, T. Lepoint, V. Lyubashevsky, J. M. Schanck, P. Schwabe, G. Seiler, and D. Stehle, “CRYSTALS-kyber: A CCA-secure module-lattice-based KEM,” in Proc. IEEE Eur. Symp. Secur. Privacy, Apr. 2018, pp. 353–367.
[24] Y. Ishai, A. Sahai, and D. A. Wagner, “Private circuits: Securing hardware against probing attacks,” in Advances in Cryptology – CRYPTO. Berlin, Germany: Springer, 2003.
[25] S. Mangard, E. Oswald, and T. Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards. Berlin, Germany: Springer, 2007.
[26] T. Schneider and A. Moradi, “Leakage assessment methodology: A clear roadmap for side-channel evaluations,” in Proc. CHES. Academic, 2015, pp. 495–513.
[27] S. M. Ross, Introductory Statistics, 3rd ed. 2010.
[28] N. Veyrat-Charvillon and F. Standaert, “Mutual information analysis: How, when and why?” in Proc. CHES, 2009, pp. 429–443.
[29] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 69, no. 6, Jun. 2004, Art. no. 066138.
[30] V. Cristiani, M. Lecomte, and P. Maurine, “Leakage assessment through neural estimation of the mutual information,” in Proc. ACNS, 2020, pp. 144–162.
[31] O. Bronchain, J. M. Hendrickx, C. Massart, A. Olshevsky, and F. Standaert, “Leakage certification revisited: Bounding model errors in side-channel security evaluations,” in Advances in Cryptology – CRYPTO. 2019.
[32] CW1200 ChipWhisperer-Pro. NewAE. Accessed: Apr. 16, 2023. [Online]. Available: https://rtfm.newae.com/Capture/ChipWhisperer-Pro/
[33] CW308-STM32F2: VCC Internal Regulator. NewAE. Accessed: Apr. 16, 2023. [Online]. Available: https://rtfm.newae.com/Targets/UFO%20Targets/CW308T-STM32F/#vcc-int-supply
[34] A. Duc, S. Faust, and F.-X. Standaert, “Making masking security proofs concrete (or how to evaluate the security of any leaking device), extended version,” J. Cryptol., vol. 32, no. 4, pp. 1263–1297, Oct. 2019.
[35] P. Laarne. (2022). Polsys/Ennemi: 1.1.1. [Online]. Available: https://doi.org/10.5281/zenodo.5848134
[36] M. Rivain and E. Prouff, “Provably secure higher-order masking of AES,” in Cryptographic Hardware and Embedded Systems. Berlin, Germany: Springer, 2010.
[37] D. Goudarzi and M. Rivain, “How fast can higher-order masking be in software?” in Advances in Cryptology – EUROCRYPT. Cham, Switzerland: Springer, 2017.
[38] D. Blackman and S. Vigna, “Scrambled linear pseudorandom number generators,” ACM Trans. Math. Softw., vol. 47, no. 4, pp. 1–32, Dec. 2021.
[39] A. Miyajan, Z. Shi, C.-H. Huang, and T. F. Al-Somani, “Accelerating higher-order masking of AES using composite field and SIMD,” in Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), Dec. 2015, pp. 575–580.
[40] T. Gokulan, A. Muraleedharan, and K. Varghese, “Design of a 32-bit, dual pipeline superscalar RISC-V processor on FPGA,” in Proc. 23rd Euromicro Conf. Digit. Syst. Design (DSD), Aug. 2020, pp. 340–343.
[41] J. Vliegen, O. Reparaz, and N. Mentens, “Maximizing the throughput of threshold-protected AES-GCM implementations on FPGA,” in Proc. IEEE 2nd Int. Verification Secur. Workshop (IVSW), Jul. 2017, pp. 140–145.
[42] B. Bilgin, B. Gierlichs, S. Nikova, V. Nikov, and V. Rijmen, “Higher-order threshold implementations,” in Advances in Cryptology – ASIACRYPT, vol. 8874. Berlin, Germany: Springer, 2014.
[43] J. Gaspoz and S. Dhooghe, “Threshold implementations in software: Micro-architectural leakages in algorithms,” IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2023, pp. 155–179, Mar. 2023.
[44] A. Moradi, “Side-channel leakage through static power: Should we care about in practice?” in Cryptographic Hardware and Embedded Systems – CHES. Berlin, Germany: Springer, 2014.
[45] W. Wang, F. Standaert, Y. Yu, S. Pu, J. Liu, Z. Guo, and D. Gu, “Inner product masking for bitslice ciphers and security order amplification for linear leakages,” in Proc. CARDIS, 2016, pp. 174–191.
[46] W. Cheng, S. Guilley, C. Carlet, J.-L. Danger, and S. Mesnager, “Information leakages in code-based masking: A unified quantification approach,” IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2021, pp. 465–495, Jul. 2021.

LORENZO CASALINO received the master's degree in computer science and engineering from Politecnico di Milano, Italy, in 2020. He is currently pursuing the Ph.D. degree with the CEA-List, Grenoble, France. His research interests include side-channel analyses, related micro-architecture-aware countermeasures, and their automated application.

NICOLAS BELLEVILLE received the Ph.D. degree from Université Grenoble Alpes, France, in 2019. Since 2019, he has been a Researcher with the CEA-List, Grenoble, France. His research interests include side-channel attacks, their countermeasures, and the automated application of countermeasures during compilation.

DAMIEN COUROUSSÉ received the Ph.D. degree from Institut National Polytechnique de Grenoble, in 2008. He has been a Research Engineer and a Senior Expert with the CEA-List, since 2011. His research interests include embedded software and its interaction with hardware, compilation, and runtime code generation for performance and security, with a recent focus on hardware security.

KARINE HEYDEMANN received the Ph.D. degree in computer science from the University of Rennes 1, in 2004. She was an Associate Professor with the LIP6, Sorbonne University, from 2006 to 2022. She is currently a Senior Expert Architect with Thales DIS. She is also an Associate Researcher with the LIP6. Her research interests include hardware micro-architecture, compilation, code optimization, and physical attacks, including modeling of hardware fault injection effects, automated code hardening, and robustness analysis.