
Received 1 July 2023, accepted 15 July 2023, date of publication 24 July 2023, date of current version 15 August 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3298436

A Tale of Resilience: On the Practical Security of Masked Software Implementations

LORENZO CASALINO 1, NICOLAS BELLEVILLE 1, DAMIEN COUROUSSÉ 1, AND KARINE HEYDEMANN 2,3

1 Univ. Grenoble Alpes, CEA, List, F-38000 Grenoble, France
2 Thales DIS, 13590 Meyreuil, France
3 Sorbonne Université, CNRS, LIP6, F-75005 Paris, France

Corresponding author: Nicolas Belleville ([email protected])


This work was supported in part by the French National Research Agency (ANR) under Grant ANR-20-CE39-0010.

ABSTRACT Masking constitutes a provably secure approach against side-channel attacks. However, recombination effects (e.g., transitions) severely reduce the proven security. In the software domain, CPU micro-architectures encompass techniques that improve execution performance. Several studies show that such techniques induce recombination effects. Furthermore, these techniques implicitly induce some form of parallelism, and the associated potential threat has never been investigated. In addition, the practical security of masking depends on the chosen masking scheme. Few works have analysed the security of software protected by different masking schemes, and none has considered the parallelism threat. Thus, the literature lacks a comprehensive investigation of the practical security of software implementations relying on various masking schemes in the presence of micro-architecture-induced recombination effects and parallelism. This work takes a first step towards filling this gap. Specifically, we evaluate the practical security offered by first-order Boolean, arithmetic-sum and inner-product masking against transitions and parallelism in software. First, we assess the presence of transition-based and parallelism-based leakages in software. Second, we evaluate the security of the encodings of the selected masking schemes with respect to each leakage source via micro-benchmarks. Third, we assess the practical security of different AES-128 software implementations, one for each selected masking scheme. We carry out the investigation on the STM32F215 and STM32F303 micro-controllers. We show that 1) the CPU's parallel features allow successful attacks against masked implementations resistant to transition-based leakages; 2) implementation choices (e.g., finite field multiplication) affect the practical security of masked software implementations in the presence of recombination effects.

INDEX TERMS Masking, processor micro-architecture, side-channel analysis, software masking.

The associate editor coordinating the review of this manuscript and approving it for publication was Sedat Akleylek.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

VOLUME 11, 2023
L. Casalino et al.: Tale of Resilience: On the Practical Security of Masked Software Implementations

I. INTRODUCTION
Side-channel attacks threaten the security of embedded hardware and software components, in particular cryptographic primitives. To counteract a side-channel attacker, an nth-order masking countermeasure encodes secret-dependent data into n + 1 random values, called shares. Under the assumption of independently leaking shares (ILA) and of sufficient noise, a successful attack requires the computation of higher-order statistical moments of the encoding's distribution. The difficulty of this task increases exponentially in n [1], defining the security order of masking. In practice, physical phenomena such as (memory-state) transitions [2], glitches [3] and wire cross-talk [4] recombine shares, reducing the expected security order of a masked implementation [3], [5].
Several works highlight the pervasiveness of transition-based leakages in CPU micro-architectures [2], [3],

[6], [7], [8]. Still, few works show the difference in practical security between masking schemes in the presence of transition-based leakages [9], [10].
Besides transitions, the threat posed by the parallel processing of shares (PPS) is overlooked for software-masked implementations,¹ although Moos and Moradi show a simple preprocessing technique to exploit it efficiently in hardware [12]. In software, the micro-architecture of modern CPUs relies on techniques to increase execution performance [13], potentially handling multiple shares per clock cycle. As such, PPS implies new potential vulnerabilities in masked software implementations. To the best of our knowledge, such vulnerabilities remain unexplored.
¹ To the best of our knowledge, the only work mentioning the existence of PPS in software is [11], in footnote 1, but this work does not study the security implications further.
This work explores the practical security of several first-order masked software implementations in the presence of both recombination effects and PPS-based leakages. We study three masking schemes: the two most studied ones, Boolean (BM) [14] and inner-product (IPM) [15], and arithmetic-sum (ASM) [16]. Our investigation first assesses the potential sources of vulnerabilities in software due to transition-based and PPS-based leakages, and then evaluates the practical security of masked software with respect to the identified vulnerabilities. In more detail, our methodology develops in three steps:
1) We characterise micro-architectural leakage effects: we carefully handcraft micro-benchmarks to assess the presence of transition-based and PPS-based leakages in software (Section IV).
2) We characterise the impact of the observed leakage effects on masking encodings: we quantify the leaked information and investigate its exploitability (Section V, Section VI).
3) We characterise the impact of the observed leakage effects on masked implementations: having evaluated the leakage impact on the encodings, we assess the practical security of fully masked software implementations (Section VII). Specifically, we target as a use-case the AES-128 block cipher [17].
To provide a comprehensive analysis, we split the security assessment into an information leakage assessment, which analyses the information leaked by the encoding or by the fully masked implementation, and an information leakage exploitation, which evaluates the exploitability of such information. In addition, as the design and implementation of the execution platform potentially impact the observed leakage [2], [18], [19], we carry out our investigation on two different micro-controllers, an STM32F215 and an STM32F303.

A. CONTRIBUTIONS
To the best of our knowledge, we provide the first investigation of the practical security of different masking schemes in software in the presence of both recombination effects and PPS. In particular, we show that:
• PPS-based leakage is observable in the software context (Section IV).
• PPS induces information leakage for all the considered masking encodings (Section V).
• Such leakage can be exploited against all the considered masking schemes by slightly adapting the methodology of Moos and Moradi [12] (Section VI).
• Transition-based and PPS-based leakages lead to successful attacks against all the considered masked implementations. In particular, we exhibit two attacks against inner-product masking: one exploiting PPS leakages and one exploiting a vulnerability due to transition-based leakage on the logarithm representation of the encodings within the finite field multiplication implementation (Section VII).

II. RELATED WORK
This work touches different areas of the side-channel domain. This section compares our work with the most relevant ones from each area.

A. MICRO-ARCHITECTURE-INDUCED LEAKAGES
Several works investigate the different micro-architectural sources of leakage, spanning different micro-architectures and processors. Table 1 summarises the related state of the art, highlighting the investigated leakage sources, the types of micro-architecture and the CPU use-case(s). Papagiannopoulos and Veshchikov assess some recombination effects (i.e., register overwrite and memory persistence) violating the ILA on a simple AVR ATMega163 micro-controller [20]. Marshall et al. assess the presence of multiple transition-based leakages on different platforms [2]. They highlight how similar platforms, executing the very same piece of code, may or may not exhibit a transition-based leakage. Furthermore, they highlight how speculative execution potentially introduces unexpected transition-based leakage. With ARMISTICE, de Grandmaison et al. show that the encoding of instructions also potentially affects the variability of the observable leakages [8]. Concerning platforms with superscalar capabilities, Barenghi and Pelosi show, on the ARM Cortex-A7 and ARM Cortex-M7, that the increased parallelism provided by such micro-architectures increases the sources of transition-based leakage [6]. Besides the pervasiveness of transition-based leakage, a micro-architecture potentially encompasses other recombination effects. Gao et al. show that intra-register leakage interaction can break the security of share-slicing implementations [21], suggesting that the leakage is due to glitch-based recombinations in the barrel-shifter unit of ARM Cortex-M0 and Cortex-M3 implementations. Gigerl et al., in turn, show how signal glitches in the forwarding logic of the superscalar RISC-V SweRV CPU recombine multiple shares, reducing the masking security order beyond the factor of 2 predicted


TABLE 1. Summary of the state of the art concerning micro-architectural leakage investigations. For each work, we report the targeted leakage source,
the type of investigated micro-architecture and the CPUs analysed. With ‘‘?’’ we mark works for which it is unknown whether a given leakage source
is targeted.

TABLE 2. Summary of the state of the art concerning the practical security analyses of masking in software. For each work, we report the targeted masking scheme, the leakage source against which the practical security is evaluated and the analyses carried out.

by the theory [3]. As a matter of fact, the current state of the art focuses on micro-architecture-induced recombination effects. Our work represents a novel and orthogonal effort: we show that PPS-based leakage can be observed in software implementations, even on in-order scalar processors.

B. PRACTICAL SECURITY OF SOFTWARE MASKED IMPLEMENTATIONS
Few works explore the practical security of masking against micro-architectural leakages. Table 2 summarises the related state of the art, highlighting the investigated masking schemes, the considered leakage sources and the analyses carried out. Beckers et al. show that several deemed-to-be-secure software implementations, masked via either first-order BM or IPM, are vulnerable to simple first-order analyses (i.e., CPA and TVLA) [10]. Although we follow a similar investigation approach, our goal is different: whereas they aim to verify the security claims of open-source implementations, we evaluate the practical security of different masking schemes against transition-based and PPS-based leakages. The work of Wu et al. is closer to ours: they analyse the practical security of different code-based software instances of BM and IPM, up to the third masking order [9]. They rely on first-order analyses (i.e., CPA and TVLA), as well as Template Attacks (TA) and bivariate CPA. Interestingly, they analyse the practical security of code-based IPM with respect to different public vectors (Section III-B), providing a better characterisation of the encoding's security. Neither work (explicitly) targets micro-architectural leakages, although the leakages they observe for BM implementations probably result from transition-based leakages. In contrast, we explicitly take advantage of transition-based and PPS-based leakages against each studied masking scheme (Section VI and Section VII). Finally, with respect to the two previous works, we consider the ASM, a masking scheme employed for ARX ciphers (e.g., Speck [22]) and post-quantum cryptosystems (e.g., Kyber [23]).

III. BACKGROUND
This section provides the essential background to understand our methodology. We first introduce the notations employed throughout our work. Then, we introduce the masking countermeasures we study, the necessary security concepts and the potential threat implied by so-called physical effects. We follow with an overview of three statistical tools we employ to assess and exploit the information leakage from the investigated software implementations. Finally, we overview a trace preprocessing technique able to exploit PPS-based leakages.

A. NOTATIONS
We refer to a random variable with a capital italic letter, e.g., $X$. We denote the sampling space of $X$ as $\mathcal{X}_{2^k}$, where $k > 0$. We refer to the distribution of a random variable $X$ as $\mathcal{D}_X$. We refer to a realisation of the random variable $X \in \mathcal{X}_{2^k}$ as $x \in \mathbb{F}_{2^k}$, where $\mathbb{F}_{2^k}$ is a finite field. We implicitly consider any value $x \in \mathbb{F}_{2^k}$ in binary form. We denote the $i$-th bit of $x$ as $x^i$, where $i \in [0, k)$. We write vectors in bold face, e.g., $\mathbf{X}$. We refer to the $j$-th component of a vector $\mathbf{X}$ as $X_j$. We refer to a set of $n$ traces, each of $m$ samples, as $T_{n \times m}$, and to the subset of traces at sample $0 \le i < m$ as $T^i_{n \times m}$.

B. MASKING
A side-channel attacker exploits the statistical link between an observed physical quantity (e.g., the instantaneous power consumption) and secret-dependent data, which the target implementation manipulates. The masking countermeasure


counteracts such attacks by breaking this statistical link. That is, given any secret-dependent datum $X$, masking encodes it with a so-called (probabilistic) encoding (Def. 1).
Definition 1 (Encoding): Given a random variable $X \in \mathcal{X}_{2^k}$, where $k \ge 1$, the tuple $\mathbf{X} = (X_i)_{i=0}^{n} \in \mathcal{X}_{2^k}^{(n+1)}$ is an encoding of $X$. The random variables $X_i \in \mathcal{X}_{2^k}$ are called shares. $n$ defines the masking order.
The encoding of $X$ is built from an $n$th-order masking scheme $M$. Informally, an $n$th-order masking scheme is a vector-valued function $M : \mathcal{X}_{2^k} \mapsto \mathcal{X}_{2^k}^{(n+1)}$ that satisfies correctness (i.e., the function $M$ is invertible) and $d$th-order security (Def. 2).
Definition 2 ($d$th-Order Security): Let $M$ be an $n$th-order masking scheme. $M$ satisfies $d$th-order security if and only if, for each $X \in \mathcal{X}_{2^k}$, any subset of (at most) $d$ shares of $\mathbf{X} = (X_i)_{i=0}^{n} = M(X)$ does not depend on $X$. $d \le n$ defines the security order of $M$.
Examples of masking schemes are the Boolean masking (BM) [14], the arithmetic-sum masking (ASM) [16] and the inner-product masking (IPM) [15]. Respectively, they generate Boolean (Def. 3), arithmetic-sum (Def. 4) and inner-product (Def. 5) encodings.
Definition 3 (Boolean Encoding): Let us consider $X \in \mathcal{X}_{2^k}$, where $k \ge 1$, and $\mathbf{X} = (X_i)_{i=0}^{n} = \mathrm{BM}(X)$ the Boolean encoding of $X$. Then $X = \bigoplus_{i=0}^{n} X_i$, where $\oplus$ is the eXclusive OR.
Definition 4 (Arithmetic-Sum Encoding): Let us consider $X \in \mathcal{X}_{2^k}$, where $k \ge 1$, and $\mathbf{X} = (X_i)_{i=0}^{n} = \mathrm{ASM}(X)$ the arithmetic-sum encoding of $X$. Then $X = \boxplus_{i=0}^{n} X_i$, where $\boxplus$ is the arithmetic sum.
Definition 5 (Inner-Product Encoding): Let us consider $X \in \mathcal{X}_{2^k}$, where $k \ge 1$, and $\mathbf{X} = (X_i)_{i=0}^{n} = \mathrm{IPM}(X)$ the inner-product encoding of $X$. Then $X = \langle \mathbf{L}; \mathbf{X} \rangle$, where $\mathbf{L} = (1, L_i)_{i=1}^{n} \in \mathcal{X}_{2^k}^{(n+1)}$ is a public random vector and $\langle \cdot; \cdot \rangle$ is the inner-product operator.
The appeal of the masking countermeasure lies in its provable security framework, composed of:
• Leakage Model: it describes how an implementation leaks information through a given side channel. Typically, the leakage model takes the form of an Additive Gaussian Noise (AGN) function:
$$L(v) = L(v)_d + \mathcal{N}(0, \sigma) \tag{1}$$
where $\mathcal{N}(0, \sigma)$ is a Gaussian noise, and $L(\cdot)_d$ is a deterministic function. Typically, $L(\cdot)$ is a value-based leakage function (Def. 6), such as the Hamming-Weight function:
$$L(v) = \mathrm{HW}(v) + \mathcal{N}(0, \sigma) \tag{2}$$
where HW is:
$$\mathrm{HW}(x) = \sum_{0 \le i < k} x^i. \tag{3}$$
• Attacker Model: it describes how many intermediate variables the attacker can observe (we distinguish between univariate and multivariate) and the maximum statistical moment order the attacker can compute.
Definition 6 (Value-based Leakage Function [5]): Let $\mathcal{V}$ be a finite set of intermediate variables and $L(\cdot) = L(\cdot)_d + \mathcal{N}(0, \sigma)$ be a leakage function made of a deterministic part $L(\cdot)_d$ and an (additive) random noise $\mathcal{N}(0, \sigma)$. This leakage function is value-based if its deterministic part can only take a value $v \in \mathcal{V}$ as argument.
Under the so-called Noisy Leakage security model (parallel computation, value-based leakage, univariate attacker), Chari et al. proved that the masking countermeasure exponentially amplifies (in the number n of shares) the difficulty of an attack, expressed in the number of traces to collect and analyse [1]. Ishai et al. defined the d-probing security model (value-based leakage, multivariate attacker), under which an implementation is secure against any d-variate attacker [24]. Barthe et al. defined the Bounded-Moment security model (value-based leakage, univariate attacker), which proves the security of masked implementations against attackers able to compute statistical moments of order (up to) d [11].

C. PHYSICAL EFFECTS
Security proofs of masking schemes typically assume a value-based leakage model, i.e., each share leaks independently of the others. The literature refers to it as the Independent Leakage Assumption (ILA). However, in practice, masked implementations do not comply with this hypothesis. Indeed, several physical effects, such as memory transitions, glitches and coupling, recombine the shares, hence violating the ILA.
A typical class of models capturing such effects is that of the so-called Transition-based Leakage Functions (Def. 7). A well-known example is the Hamming-Distance leakage function (Eq. 4).
Definition 7 (Transition-based Leakage Function [5]): Let $\mathcal{V}$ be a set of intermediate variables, and $\mathcal{T} := \{v \oplus v' \mid \forall\, v, v' \in \mathcal{V}\} \cup \mathcal{V}$ the set of all the transitions between these intermediate variables. A leakage function $L(\cdot)$ is transition-based if its deterministic part $L(\cdot)_d$ takes values $t \in \mathcal{T}$ as argument.
$$L(X, Y) = \mathrm{HW}(X \oplus Y) + \mathcal{N}(0, \sigma) = \mathrm{HD}(X, Y) + \mathcal{N}(0, \sigma) \tag{4}$$
Balasch et al. proved that the security order of a d-probing secure implementation in the value-based leakage model is halved in the transition-based leakage model (i.e., $\lfloor d/2 \rfloor$). The literature refers to this as the security-order reduction theorem [5]. Specific to the context of masked software implementations, a CPU micro-architecture exposes many elements violating the ILA, such as micro-architectural registers (e.g., the inter-stage pipeline registers or the Memory Data Register of the Load-Store Unit [2], [6]), the forwarding logic, the Barrel-Shifter Unit, the Arithmetic-Logic Unit, and the Load-Store Unit [3], [21]. Gigerl et al. show that, in the presence of glitch-based recombinations, the order


reduction exceeds the factor of 2 considered in the presence of transition-based leakage [3]. Furthermore, the micro-architectural properties of complex CPUs also imply that the security-order reduction factor is greater than 2 [3].
CMOS technology is still mainstream in digital design, and the overall power consumption of a CMOS-based circuit is the superposition of the power consumptions of its sub-elements [25]. We can describe the induced leakage via the Sum-of-Hamming-Weights leakage function:
$$L(X, Y) = \mathrm{SHW}(X, Y) + \mathcal{N}(0, \sigma). \tag{5}$$
This AGN model assumes as its deterministic component the SHW function:
$$\mathrm{SHW}(X, Y) = \mathrm{HW}(X) + \mathrm{HW}(Y). \tag{6}$$
In this paper we use the binary form of the SHW function, which can be readily extended to accept an arbitrary number of arguments.

D. PEARSON'S CORRELATION COEFFICIENT
Pearson's Correlation Coefficient (PCorrl) is a statistical tool which quantifies the linear relationship between two random variables. Given two arbitrary random variables $X$, $Y$, the PCorrl is defined as:
$$\rho(X, Y) = \frac{E[(X - \mu_X) \cdot (Y - \mu_Y)]}{\sigma_X \cdot \sigma_Y} \tag{7}$$
where $\mu$ and $\sigma$ represent, respectively, the mean value and the standard deviation of the given random variable. The coefficient takes values in the interval $[-1, +1]$, where the extremes indicate perfect linear dependency between the two variables, whereas a coefficient of 0 indicates no linear dependency.

E. TVLA
When performing the security evaluation of a cryptographic implementation, the evaluator ideally aims to provide the most general answer possible regarding the security of the implementation (i.e., is the implementation secure?). The Test Vector Leakage Assessment (TVLA) [26] reduces the problem to testing whether two sets of side-channel traces $S_{fixed}$ and $S_{random}$ can be distinguished by their statistical moments (alternative hypothesis) or not (null hypothesis). $S_{random}$ refers to side-channel traces collected while the implementation processes a different plaintext for each trace, whereas $S_{fixed}$ refers to the usage of the same plaintext for each trace. In the case of univariate first-order TVLA, the evaluator computes the t-statistic $t$:
$$t = \frac{\hat{\mu}_{fixed} - \hat{\mu}_{random}}{\sqrt{\frac{\hat{\sigma}^2_{fixed}}{n_{fixed}} + \frac{\hat{\sigma}^2_{random}}{n_{random}}}} \tag{8}$$
where $\hat{\mu}_{fixed}$, $\hat{\mu}_{random}$ refer to the sample mean, $\hat{\sigma}^2_{fixed}$, $\hat{\sigma}^2_{random}$ to the sample variance and $n_{fixed}$, $n_{random}$ to the number of traces of the fixed and random set, respectively. The implementation leaks information with a certain probability if the t-statistic exceeds a given t-threshold. The t-threshold is normally set to ±4.5, which means we can reject the null hypothesis with a confidence of 99.999%.
At its core, TVLA relies on hypothesis testing. As such, it is affected by statistical errors too. We distinguish between Type-I errors (or false positives) and Type-II errors (or false negatives) [27]. Type-I errors refer to the cases where the test fails (null hypothesis rejected), although the implementation does not leak. Type-II errors, on the other hand, refer to the acceptance of the null hypothesis, although the implementation actually leaks. Type-II errors are the most troublesome, as they would report an implementation as leakage-free when it is not. One mitigation strategy against these types of errors is to repeat the TVLA several times, each with a distinct fixed key [26].

F. MUTUAL INFORMATION
The Mutual Information (MI) is an information-theoretic tool for the quantification of linear and non-linear relationships between two random variables. The metric has different definitions, according to the nature (discrete or continuous) of the random variables. Equation 9 reports the definition of MI in the case of two discrete random variables $X$ and $Y$.
$$\mathrm{MI}(X, Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \cdot \log_2 \frac{p(x, y)}{p(x) \cdot p(y)} \tag{9}$$
Although it can capture any type of relationship, the computation of the MI relies on the knowledge of the joint probability distribution $p(x, y)$. Generally, the distribution is unknown and can only be estimated. Therefore, MI cannot be directly computed and requires the use of estimators. Such estimators rely on different techniques such as histograms, Gaussian mixtures, k-nearest neighbours, or neural networks [28], [29], [30]. Among these estimators, the empirical Hypothetical Information (HI) provides an upper bound on the MI [31], while converging towards MI as the number of traces increases. As such, HI fits those contexts where a conservative analysis of the security of an implementation (i.e., one that overestimates the information leakage) is preferable.

G. BIASING LEAKAGE DISTRIBUTIONS (BLD) TO ATTACK MASKED PARALLEL IMPLEMENTATIONS
The strength of masking lies in the need, for an attacker, to compute higher-order statistical moments and/or to perform multivariate statistical analyses. When considering hardware masked implementations, security evaluators assume a parallel computation model. Under this computation model, the implementation can process related shares at the same time sample. Considering an nth-order masking scheme, the attacker, who observes all the n + 1 shares of a key-dependent encoded value, needs at least the statistical moment of order n + 1 to detect any key-dependent information. Moos and Moradi proposed a preprocessing technique to reduce this minimal key-dependent moment order [12].
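To make the moment-order requirement concrete, the following Python sketch enumerates the noise-free SHW leakage (Eq. 6) of a first-order Boolean encoding whose two shares are processed in the same sample. It is only an illustration under stated assumptions: the helper names (hw, shw_leakages, moments) and the 2-bit word size are chosen for this example and are not taken from the evaluated implementations.

```python
from statistics import mean, pvariance


def hw(v: int) -> int:
    """Hamming weight of a non-negative integer (Eq. 3)."""
    return bin(v).count("1")


def shw_leakages(x: int, k: int = 2) -> list[int]:
    """Noise-free SHW(X0, X1) for every mask x1, with x0 = x XOR x1.

    Assumption: both shares leak in the same sample (parallel processing).
    """
    return [hw(x ^ x1) + hw(x1) for x1 in range(1 << k)]


def moments(k: int = 2) -> dict[int, tuple[float, float]]:
    """Map each secret x to the (mean, variance) of its leakage distribution."""
    return {
        x: (mean(shw_leakages(x, k)), pvariance(shw_leakages(x, k)))
        for x in range(1 << k)
    }


if __name__ == "__main__":
    for x, (mu, var) in moments().items():
        print(f"x={x:02b}  mean={mu}  variance={var}")
```

For k = 2, every secret yields a mean of 2, while the variance equals the number of zero bits of x (2, 1, 1 and 0 for x = 0, 1, 2 and 3). The first moment that depends on the secret is therefore the second one, as expected for n = 1.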


FIGURE 1. SHW distributions obtained for various secret values masked with first-order Boolean masking. x is the secret value, and x1 a random value
used for Boolean masking. Top row: distributions of SHW without preprocessing. Bottom row: distribution obtained when keeping only the lowest k%
values (k = 25% here). While the mean is independent of the secret without preprocessing, it becomes dependent on the secret when only the lowest k%
samples are kept.
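The effect summarised in Fig. 1 can be reproduced with a short Python sketch. It is a simplified, noise-free illustration rather than the procedure of [12]: it enumerates all masks exhaustively instead of pruning measured traces. The names hw, shw_leakages and biased_mean, the parameter keep (the kept fraction, written k% in the caption) and the rounding rule for the number of kept values are assumptions made for this example.

```python
import math
from statistics import mean


def hw(v: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")


def shw_leakages(x: int, bits: int = 2) -> list[int]:
    """Noise-free SHW of the two Boolean shares (x XOR x1, x1) for every mask x1."""
    return [hw(x ^ x1) + hw(x1) for x1 in range(1 << bits)]


def biased_mean(x: int, keep: float = 0.25, bits: int = 2) -> float:
    """Mean of the lowest `keep` fraction of the leakage values of secret x."""
    leak = sorted(shw_leakages(x, bits))
    n_keep = max(1, math.ceil(keep * len(leak)))  # assumed rounding rule
    return mean(leak[:n_keep])


if __name__ == "__main__":
    for x in range(4):
        plain = mean(shw_leakages(x))
        print(f"x={x:02b}  mean={plain}  biased mean={biased_mean(x)}")
```

With 2-bit values and keep = 0.25, the unbiased mean is 2 for every secret, whereas the biased mean equals HW(x), i.e., 0, 1, 1 and 2 for x = 0, 1, 2 and 3. The biased first-order moment thus depends on the secret.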

Informally, the technique consists in selecting, for each trace sample, a subset of the measured traces, preserving only certain leakage values. Such Biasing Leakage Distribution (BLD) preprocessing biases the leakage distribution of each trace sample, converting higher-order leakages into lower-order leakages.
To exemplify this technique, let us consider a first-order BM encoding of $X \in \mathcal{X}_{2^2}$. Further, let us assume that the two shares $X_0$, $X_1$ are processed in parallel, and that the implementation leaks according to a noise-free SHW model (Eq. 5). Fig. 1, top row, reports the marginal distributions of each realisation of $X$. Each marginal distribution exhibits the same first-order moment (i.e., the mean). That is, the first-order moment is independent of the encoded value $X$, as expected for a first-order masking scheme. Fig. 1, bottom row, reports the marginal distributions of the realisations of $X$ after a preprocessing that keeps the k = 25% of samples with the lowest leakage values [12]. The preprocessed first-order moments of the marginal distributions depend on the secret value, making it possible to mount first-order attacks. In practice, the resulting order reduction varies depending on the value of the threshold k and on the heuristic used for pruning traces (e.g., keeping the ones with the lowest leakage values) [12].

IV. PARALLEL PROCESSING OF SHARES IN SOFTWARE
As our goal is to evaluate the practical security of masked software implementations (Section V, Section VI, Section VII), we first need to assess the potential sources of leakage. To this end, we proceed as follows: we first provide a rationale explaining how the complexity of a CPU micro-architecture potentially induces PPS (Section IV-A). Then, we describe the three carefully hand-crafted pieces of assembly code (called micro-benchmarks, or UBenches) that we designed to investigate the presented rationale (Section IV-B). To confirm or reject the presence of PPS, we run side-channel analyses on each UBench (Section IV-D).
As presented in Section III-C, the micro-architecture of modern CPUs constitutes a rich source of recombination effects, in particular of transition-based leakages. Hence, we also include a UBench exercising a transition-based leakage originating within the micro-architecture. We have released these micro-benchmarks (C and binary code) as publication artefacts (https://zenodo.org/record/8094516).

A. RATIONALE
The micro-architecture of modern CPUs extensively relies on hardware-oriented techniques to increase the instruction throughput [13]. Due to instruction pipelining, the micro-architecture is partitioned into several stages, where each stage takes care of a part of the instruction life cycle. Fig. 2 depicts a simplified 3-stage, in-order micro-architecture. In this example, the Instruction Fetch (IF) stage fetches the next instruction to be executed, the Instruction Decode (DE) stage interprets the instruction (e.g., selecting operands from the Register File), whereas the Instruction Execute (EXE) stage executes the instruction. We remark that, in this example, the execution of memory-related


instructions (e.g., load and store) requires 2 clock cycles, whereas arithmetic-logic instructions require 1. For memory accesses, the target address is sent to the memory in the first cycle of the EXE stage. During the second cycle, the data to be stored is sent to the memory, or the data to be read is received from the memory. The address computation phase employs the ALU. To avoid any resource conflict, the fetch and decode stages are stalled during the address computation phase. Although quite simple, this model captures the micro-architecture organisation of real micro-controller-grade CPUs (e.g., ARM Cortex-M3 and Cortex-M4 [7], [8]). With this model in mind, it is easy to understand how PPS can happen in software. Indeed, as mentioned above, each stage takes care of one part of the instruction life cycle: the execution of the DE stage happens in parallel with the execution of the EXE stage. As a consequence, whenever the two stages of the simplified micro-architecture manipulate related shares, the micro-architecture processes shares in parallel.

FIGURE 2. Simplified model of a 3-stage, in-order micro-architecture.

B. MICRO-BENCHMARKS
We design three distinct micro-benchmarks, one for each potential PPS case we identified. All UBenches share the same structure: a preamble followed by a workload (Listing 1). We implement the UBenches in Thumb-2 assembly, targeting ARM-based platforms (Section IV-C).
The UBench preamble consists of a sequence of machine instructions preparing the architectural and micro-architectural states and the inputs for the workload. The preparation of the micro-architectural state consists in randomizing the state of specific elements (e.g., micro-architectural registers, memory data-path), which may otherwise induce unintended leakage. The workload consists of a sequence of machine instructions that attempts to exercise a desired leakage effect. The trigger_high() and trigger_low() functions, which surround the workload, respectively start and stop the collection of power-based side-channel traces. To clearly identify the workload-induced leakage effect, we pad the beginning and the end of the workload with eor.w instructions provided with random inputs. To make the handling of these values clear, we comment each UBench instruction with its effect.

1) NOTATION
We denote the UBench target words as X0 and X1, whereas rndN refers to one of the UBench random input values. We denote with R_val a generic 32-bit register containing the value val. As a special case, we denote with R_destN a 32-bit register containing the result of the N-th UBench instruction. We refer to the immediate address of a value val with addr[val]. We denote a constant value const with #const.

Listing 1. Common structure of leakage micro-benchmarks.

2) PPS-RELATED UBench #1
The first PPS-related UBench stimulates the parallel manipulation of bytes when loading a specific one from a given memory address. The preamble crafts a 32-bit word and stores it on the memory stack. This word contains the least-significant byte (LSB) of both X0 and X1. The workload reads the LSB of X0 by accessing its address. Manipulating the LSB of each share allows different word layouts. Listing 2 reports the workload employed in our evaluations, hereafter referred to as UB-SHW-LDRB. We comment the workload with a byte-oriented representation of the word's layout (LSB on the right).

Listing 2. UB-SHW-LDRB workload.

3) PPS-RELATED UBench #2
The second PPS-related UBench stimulates the parallel manipulation of values during the readings of X0 and X1 from the memory and the register file, respectively. Listing 3 reports the corresponding workload, hereafter referred to as UB-SHW-LDR-EOR. The ldr.w instruction enters the EXE stage at clock cycle #k. X0 enters the micro-architecture at clock cycle #k+1. Due to the pipeline stall inserted during the address generation, the eor.w instruction passes the DE stage at clock cycle #k+1. During the DE stage, X1 is read from the register file. As a consequence, at clock cycle #k+1, the values X0 and X1 are simultaneously alive in the micro-architecture.

4) PPS-RELATED UBench #3
The third PPS-related UBench stimulates the parallel manipulation of values by processing X0 and X1, each handled by a



L. Casalino et al.: Tale of Resilience: On the Practical Security of Masked Software Implementations

Listing 3. UB-SHW-LDR-EOR workload.

distinct ALU instruction. Listing 4 reports the corresponding workload, hereafter referred to as UB-SHW-MOV-EOR. The mov.w instruction (and thus its input operand X0) enters the EXE stage at clock cycle #k. At the same clock cycle, the eor.w instruction enters the DE stage, where the target value X1 is read from the register file. As a consequence, the values X0 and X1 are both in the micro-architecture within the same clock cycle #k.

Listing 4. UB-SHW-MOV-EOR workload.

5) TRANSITION-RELATED UBench
This UBench tests the transition-based leakage stemming from the update of the inter-stage pipeline registers. Listing 5 reports the corresponding workload, hereafter referred to as UB-HD. At clock cycle #k, the first eor.w instruction enters the DE stage. X0 is read from the register file and stored in the DE/EXE inter-stage register. At clock cycle #k+1, the second eor.w enters the DE stage. X1 is read from the register file and stored in the DE/EXE register. The update of the DE/EXE register potentially causes a transition-based leakage.

Listing 5. UB-HD workload.

C. EXPERIMENTAL SETUP
We execute the UBenches on the STM32F215 and STM32F303 micro-controllers. The former hosts an ARM Cortex-M3 CPU, whereas the latter hosts an ARM Cortex-M4 CPU. We compile each UBench with arm-none-eabi-gcc version 9.2.1. We tune the compilation with -Os, -mthumb, and -mcpu=cortex-m3 or -mcpu=cortex-m4 for the STM32F215 and the STM32F303, respectively. To minimise execution time variability across runs of the same code, we fetch code from the Flash, disable the instruction and data cache and set the Flash access latency to 0 clock cycles. We collect power-based side-channel traces via the ChipWhisperer setup, with a CW-308 UFO acquisition board and the CW-1200 oscilloscope [32]. We set the micro-controllers' clock frequency to 7.384 MHz, and the oscilloscope samples the power consumption at a rate of 29.538 MHz. Hence, 4 samples per clock cycle are measured. The STM32F215 comes with an internal voltage regulator, which we left turned on and set to 1.2 V [33].

D. EVALUATION
For each UBench, we generate two datasets of randomly chosen input values: the test dataset and the control dataset. Then, for each input dataset, we collect a trace set of 30,000 power-consumption traces, each of 90 samples. Finally, for both collected trace sets, we compute $\rho(L(X_0, X_1)_d, T^{i}_{30k \times 90})$, where $i \in [0, 90)$ and X0, X1 belong to the test dataset (i.e., the control input dataset is unused). With this procedure, we verify that any correlation stems from the manipulation of X0 and X1, not from other experimental factors.
The two leftmost columns of Fig. 3 report the results under the SHW leakage model (Eq. 5), whereas the two rightmost columns report the results under the HD leakage model (Eq. 4). Except for the UB-SHW-LDRB on the STM32F215, when using the proper leakage model (i.e., SHW and HD for PPS-oriented and transition-oriented UBenches, respectively), we observe a higher correlation in the test traces, confirming the presence of the targeted leakage effect. When looking for other effects (i.e., transitions in the PPS-oriented UBenches or PPS in transition-oriented UBenches), we do not observe any significant correlation, indicating that the searched effect is negligible. Concerning the UB-SHW-LDRB, as explained in Section IV-B, we test all the different word layouts. For the sake of brevity, we only report the results of UB-SHW-LDRB for the layout illustrated in Listing 2, but all the other word layouts give similar results.
Finally, we observe lower correlation values for the STM32F215 as compared to the STM32F303. This difference, potentially stemming from micro-architectural differences and/or the noise generated by the STM32F215's internal regulator (Section IV-C), provides us with two distinct noise settings for the same leakage model. We will take advantage of this difference to explore the practical resilience of BM, ASM and IPM in different noise settings.
In this section, we have experimentally shown that both transition-based and PPS-based leakages potentially occur in software. In the following section, we employ the developed UBenches to assess the security of masking encodings against transition-based and PPS-based leakages.

V. EVALUATION OF THE PRACTICAL RESILIENCE OF MASKING ENCODINGS
In the previous section, we verified the presence of both transition-based and PPS-based leakages on our two target micro-controllers. This section evaluates the practical resilience of first-order masking encodings against such leakage sources. We develop the evaluation in two settings: an ideal one (leakage model and leakage effect match) and a real one (leakage model and leakage effect potentially differ). For the latter case, we rely on the UBenches designed to assess the presence of PPS and transition-based leakages


FIGURE 3. PCorrl-based evaluation of PPS-based and transition-based leakages. Each row reports the PCorrl from a different UBench: first row for UB-SHW-LDRB (Listing 2), second row for UB-SHW-LDR-EOR (Listing 3), third row for UB-SHW-MOV-EOR (Listing 4), fourth row for UB-HD (Listing 5). The first two columns report the results under the SHW leakage model, and the last two columns under the HD leakage model. The first and third columns report the results for the STM32F215 board, whereas the second and fourth ones report the results for the STM32F303 board. Each UBench is evaluated on two sets (test and control) of 30,000 power-consumption traces.
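The per-sample correlation reported in this figure can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' tooling: it assumes that SHW is the sum of the Hamming weights of the two values and HD the Hamming weight of their XOR (Eq. 5 and Eq. 4 are not reproduced in this section), and every name in it (`pcorr_per_sample`, the leaky sample index, the noise level) is hypothetical.

```python
import numpy as np

def hamming_weight(values):
    """Hamming weight of each 32-bit word in an array."""
    values = np.asarray(values, dtype=np.uint32)
    weights = np.zeros(values.shape, dtype=np.int64)
    for bit in range(32):
        weights += (values >> np.uint32(bit)) & np.uint32(1)
    return weights

def shw(x0, x1):
    """Assumed SHW model: sum of the Hamming weights of both values."""
    return hamming_weight(x0) + hamming_weight(x1)

def hd(x0, x1):
    """Assumed HD model: Hamming distance between both values."""
    return hamming_weight(np.bitwise_xor(x0, x1))

def pcorr_per_sample(model_values, traces):
    """Pearson correlation between a leakage model and each trace sample."""
    m = np.asarray(model_values, dtype=np.float64)
    t = np.asarray(traces, dtype=np.float64)
    m = m - m.mean()
    t = t - t.mean(axis=0)
    return (m @ t) / np.sqrt((m @ m) * np.sum(t * t, axis=0))

# Synthetic stand-in for measured traces: one sample leaks SHW(X0, X1).
rng = np.random.default_rng(1)
n_traces, n_samples, leaky_sample = 5000, 90, 40
x0 = rng.integers(0, 2**32, size=n_traces, dtype=np.uint32)
x1 = rng.integers(0, 2**32, size=n_traces, dtype=np.uint32)
test_traces = rng.normal(0.0, 4.0, size=(n_traces, n_samples))
test_traces[:, leaky_sample] += shw(x0, x1)
# Control traces come from unrelated inputs, so the test inputs should not correlate.
control_traces = rng.normal(0.0, 4.0, size=(n_traces, n_samples))

rho_test = pcorr_per_sample(shw(x0, x1), test_traces)
rho_control = pcorr_per_sample(shw(x0, x1), control_traces)
```

A peak that appears in `rho_test` but not in `rho_control` suggests that the correlation stems from the manipulated values rather than from other experimental factors.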

(Section IV-B). We analyse the encodings' resilience in two steps: (a) quantification and comparison of the leaked information (Section V-A); (b) exploitation of the leaked information through first-order analyses (Section V-B).

A. THEORETICAL EVALUATION
As remarked in Section IV-D, the SHW and HD leakage models might not perfectly describe the actual behaviour of our target boards. To evaluate the leakage resilience in the case where such models capture the leakage behaviour, we first conduct an information-theoretic analysis. For this purpose, we numerically estimate $\mathrm{MI}(X, L(X_0, X_1))$, where $X \in \mathcal{X}_{2^4}$ and the shares $X_0, X_1 \in \mathcal{X}_{2^4}$ encode $X$ according to BM, ASM or IPM. For IPM, we arbitrarily select $L = (1, 6) \in \mathbb{F}_{2^4}^2$. We describe the leakage $L$ via an AGN leakage model (Eq. 1). According to the targeted leakage effect, we employ either $L(\cdot)_d = \mathrm{SHW}$ or $L(\cdot)_d = \mathrm{HD}$.
Fig. 4 reports the results of the information-theoretic leakage evaluation. We observe that the BM encoding leaks the most, while the IPM one leaks the least. Comparing the information leakage between the two leakage models, the SHW model not only provides the smallest amount of information, but this amount also decreases faster as the noise grows. This is witnessed by the slope of the curves: the SHW curve has a slope of −2, whereas the HD one has a slope of −1. As reported by Duc et al., this slope indicates the minimal statistical moment needed to break the encoding [34].
We verify this observation by mounting a first-order correlation analysis on simulated power-consumption traces.
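A first-order correlation analysis on simulated traces of this kind can be sketched as below. The sketch assumes an additive Gaussian noise model of the form "deterministic part plus noise", 4-bit values, and the usual definitions of SHW (sum of Hamming weights) and HD (Hamming weight of the XOR); it covers BM and ASM only, since the field representation needed for IPM is not given in this section. All function names are illustrative.

```python
import numpy as np

N_BITS = 4
MASK = (1 << N_BITS) - 1

def hw(values):
    """Hamming weight of each N_BITS-wide value."""
    values = np.asarray(values, dtype=np.int64)
    return sum((values >> bit) & 1 for bit in range(N_BITS))

def shw(x0, x1):
    return hw(x0) + hw(x1)

def hd(x0, x1):
    return hw(x0 ^ x1)

def encode_bm(x, rng):
    """Boolean masking: x = x0 XOR x1."""
    x0 = rng.integers(0, MASK + 1, size=x.shape)
    return x0, x ^ x0

def encode_asm(x, rng):
    """Arithmetic-sum masking: x = (x0 + x1) mod 2^N_BITS."""
    x0 = rng.integers(0, MASK + 1, size=x.shape)
    return x0, (x - x0) & MASK

def simulate_corr(encode, model, sigma, n=200_000, seed=0):
    """Correlation between HW(X) and a simulated one-sample leakage."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, MASK + 1, size=n)
    x0, x1 = encode(x, rng)
    leakage = model(x0, x1) + rng.normal(0.0, sigma, size=n)
    return float(np.corrcoef(hw(x), leakage)[0, 1])

for name, encode in (("BM", encode_bm), ("ASM", encode_asm)):
    for model in (hd, shw):
        rho = simulate_corr(encode, model, sigma=1.0)
        print(f"{name} / {model.__name__.upper()}: rho = {rho:+.3f}")
```

Under these assumptions, the SHW leakage shows no first-order correlation with HW(X) for either encoding, because the mean of SHW is the same for every value of X, whereas the HD leakage of BM equals HW(X) up to noise.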


FIGURE 4. Information-theoretic leakage resilience analysis results. The plot reports the evolution of the numerically estimated $\mathrm{MI}(X, L(X_0, X_1))$ according to an increasing noise variance $\sigma^2$ (both in $\log_{10}$ scale). We describe the leakage $L$ as an AGN leakage model (Eq. 1), where $L(\cdot)_d = \mathrm{SHW}$ or $L(\cdot)_d = \mathrm{HD}$, for PPS-based and transition-based leakages, respectively. Due to estimation errors, for $\sigma^2 \geq 10^2$, the SHW curve diverges from the expected straight line. As IPM reaches perfect independence from $X$ in the HD case, we omit the related curve.

Specifically, we generate 1,000,000 traces, each of 1 sample, via an AGN leakage model, and we compute $\rho(\mathrm{HW}(X), L(X_0, X_1))$, where $X, X_0, X_1 \in \mathcal{X}_{2^4}$. Fig. 5 reports the results of the first-order analyses. As expected, under the HD model, we detect correlation for both the BM and ASM encodings. Consistently with the information-theoretic analysis, we do not detect correlation for the IPM encoding. Concerning the SHW case, the first-order analysis does not identify correlation with the encoded value $X$. This evidence illustrates the need for at least a second-order statistical moment to correlate with $X$.

FIGURE 5. PCorrl-based leakage resilience analysis results on simulated traces. The plot reports $\rho(\mathrm{HW}(X), L(X_0, X_1))$ according to an increasing noise variance $\sigma^2$ ($\log_{10}$ scale), for the HD and SHW models. We generate 1,000,000 power-consumption traces, each of 1 sample. We simulate the traces according to an AGN leakage model (Eq. 1), where $L(\cdot)_d = \mathrm{SHW}$ or $L(\cdot)_d = \mathrm{HD}$, for PPS-based and transition-based leakages, respectively. The metric does not detect correlation with $X$ under the SHW model for BM, ASM and IPM.

From the information-theoretic analyses, we observed that ASM and IPM encodings tend to better mitigate transition-based and PPS-based leakages. We corroborated these analyses with first-order moment analyses, evaluating the correlation between the encoded value and the simulated power consumption. We observed results consistent with the information-theoretic ones. Furthermore, we highlighted how first-order moments cannot detect any information in the presence of PPS-based leakage.
Having obtained an overview of the resilience of first-order BM, ASM and IPM in an ideal setting (i.e., the leakage models perfectly describe the target leakage effect), in the next section we evaluate the resilience of such encodings in a more realistic context.

B. EXPERIMENTAL EVALUATION
In the previous sub-section, we analysed the information-theoretic resilience of first-order BM, ASM and IPM. We completed the analyses with a PCorrl-based evaluation on simulated traces. This evaluation highlighted the better leakage resilience of ASM and IPM encodings. Despite the interest of an ideal setting (i.e., simulated traces), masked software implementations execute in an imperfect one, where the leakage behaviour potentially deviates from the hypothetical one. As such, in this section we evaluate the leakage resilience of the three masking schemes when the first-order encodings are manipulated on our two boards, the STM32F215 and STM32F303. For this purpose, we re-use the UBenches of Section IV-B, which stimulate PPS-based and transition-based leakages. Differently from the information-theoretic analyses, for IPM we arbitrarily select $L = (1, 170) \in \mathbb{F}_{2^8}^2$.
For each UBench, we capture 4,000,000 traces, each of 90 samples. We first quantify the leaked information by computing $\mathrm{HI}(X, T^{i}_{4M \times 90})$. We set the random target inputs X0, X1, manipulated by each UBench, to the realisation of the shares $X_0, X_1 \in \mathcal{X}_{2^8}$, in each of the studied masking encodings BM, ASM, and IPM. As explained in Section III-F, the HI provides an upper bound of MI. This property is of particular interest in our case, as we want to assess the amount of leakage conservatively. HI also converges towards the true MI as the number of traces gets higher [31].
The first two columns of Fig. 6 present the results of the HI analysis for the considered masking encodings and UBenches. We compute the HI via the ENNEMI Python library [35], which implements a k-nearest-neighbour algorithm. Despite the high number of traces and the univariate setting, which favours HI convergence, we observe weak information leakage on the STM32F215 for both UB-SHW-LDR-EOR and UB-SHW-LDRB. As shown in Fig. 3, PPS leakage seems very low on this board, which may explain this result. On the STM32F303, UB-SHW-LDR-EOR and UB-SHW-LDRB show a tiny peak of information, whose significance is uncertain. By contrast, peaks of information are clearly visible for UB-SHW-MOV-EOR and UB-HD on both boards. As expected, the BM encoding leaks the most information, while leakage is hardly visible for the IPM for the given number of traces.
For completeness, we also run first-order moment analyses on the same trace sets. Specifically, we compute $\rho(\mathrm{HW}(X), T^{i}_{4M \times 90})$, where $i \in [0, 90)$. The last two columns of Fig. 6 report the results.
Unexpectedly, we observe a correlation peak for the UB-SHW-MOV-EOR. As explained in Section V-A, a first-order


FIGURE 6. Experiment-based quantification of the transition-based and PPS-based leakages. Each row reports the results for a different UBench: first row for UB-SHW-LDRB (Listing 2), second row for UB-SHW-LDR-EOR (Listing 3), third row for UB-SHW-MOV-EOR (Listing 4), fourth row for UB-HD (Listing 5). The first two columns report the HI metric, whereas the last two report the PCorrl metric. The first and third columns report the results for the STM32F215 board, whereas the second and fourth ones report the results for the STM32F303 board. For each UBench and board, we compute the metrics on a 4,000,000 power-consumption trace set.
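When a first-order correlation over the full trace set shows no peak, as for some PPS-oriented UBenches in this figure, a biased selection of traces in the spirit of the BLD preprocessing [12] can reveal a dependence. The sketch below is only our reading of the "keep the k% lowest (or highest) values per sample" idea. It is not the procedure of [12] itself, it uses synthetic Boolean-masked data with an assumed SHW leakage (sum of Hamming weights), and all names are hypothetical.

```python
import numpy as np

def hw8(values):
    """Hamming weight of each 8-bit value."""
    values = np.asarray(values, dtype=np.int64)
    return sum((values >> bit) & 1 for bit in range(8))

def bld_filter(samples, k_percent, lowest=True):
    """Boolean mask keeping the k% lowest (or highest) values of one sample."""
    if lowest:
        return samples <= np.percentile(samples, k_percent)
    return samples >= np.percentile(samples, 100.0 - k_percent)

def bld_pcorr(model_values, traces, k_percent=10.0, lowest=True):
    """Per-sample correlation computed only on the traces kept by the filter."""
    model_values = np.asarray(model_values, dtype=np.float64)
    traces = np.asarray(traces, dtype=np.float64)
    rho = np.empty(traces.shape[1])
    for i in range(traces.shape[1]):
        keep = bld_filter(traces[:, i], k_percent, lowest)
        rho[i] = np.corrcoef(model_values[keep], traces[keep, i])[0, 1]
    return rho

# Synthetic Boolean-masked data: sample 1 leaks SHW(X0, X1), the rest is noise.
rng = np.random.default_rng(7)
n_traces = 200_000
x = rng.integers(0, 256, size=n_traces)
x0 = rng.integers(0, 256, size=n_traces)
x1 = x ^ x0
traces = rng.normal(0.0, 0.5, size=(n_traces, 3))
traces[:, 1] += hw8(x0) + hw8(x1)

rho_plain = float(np.corrcoef(hw8(x), traces[:, 1])[0, 1])
rho_bld = bld_pcorr(hw8(x), traces, k_percent=10.0)
```

In this toy setting the mean of the leakage does not depend on X but its spread does, so restricting the analysis to one tail of the distribution turns a second-order dependence into a first-order one.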

moment cannot detect correlation with an encoded value via PPS-based leakage. Still, the peak takes place at the same time sample where we verified the presence of PPS-based leakage (Section III). Hence, we ascribe the observed correlation to a recombination effect that occurs simultaneously with the PPS event.
Up to now, we evaluated the leakage resilience of different masking encodings against transition-based and PPS-based leakages. Concerning transition-based leakages, the results highlight the better leakage resilience of ASM and IPM encodings. Concerning the PPS-based ones, despite the use of 4,000,000 traces, the HI-based analyses hardly identify any PPS-based information leakage. Nonetheless, a different approach, e.g., the use of the BLD preprocessing [12], could better take advantage of the existing information leakage.
With this last remark in mind, we employ the BLD preprocessing proposed by Moos and Moradi [12]. Their approach takes advantage of the PPS, converting higher-order leakages into lower-order ones, reducing the security order of the encoding (Section III-G). We directly focus on experimental analyses, as simulation-based ones are extensively provided in the original work [12]. Due to its high correlation with the PPS-based leakage (Fig. 3), we limit our analysis to the trace set collected with the UB-SHW-LDRB execution on the STM32F303. From experimental attempts, we identified k = 10% (i.e., 400,000 traces per sample) as a good threshold. Fig. 7 provides the correlation curves from the BLD-based analyses. This time, we detect correlation peaks for both BM and ASM encodings, confirming the potential exploitability of PPS-based leakage.
This section has shown that transition-based and PPS-based leakages represent a concrete vulnerability in software masking implementations, leaking exploitable information through simple first-order analyses. Among the selected candidates, the IPM was found to be the least vulnerable, preventing even the exploitation of higher-order leakages by means


of the BLD approach. Yet, this approach relies on the HW model's distribution of the encoded value $X$, independently of the targeted masking scheme and leakage source. In reality, the distribution of the HD and SHW models changes with the masking encoding. In the following section, we take advantage of this observation to break all the evaluated software masked implementations of AES with first-order analysis.

FIGURE 7. Evaluation of the BLD approach (Section III-G). We collect 4,000,000 power-consumption traces and apply the BLD approach for k = 10. We compute the PCorrl by means of the HW model. We collect the traces during the execution of the UB-SHW-LDRB (Listing 2) on the STM32F303 board.

VI. EXPLOITATION OF LEAKAGE MODEL DISTRIBUTION IN IMPROVED CORRELATION ATTACKS
In the previous section, we have evaluated the resilience of BM, ASM and IPM first-order encodings, remarking the better leakage resilience of ASM and IPM ones. This result stems from the consistent employment of the HW to model the leakage of the encoded variable $X$. In general, this model provides low discrimination capabilities when targeting recombination effects such as transitions. For instance, given a first-order IPM encoding of an arbitrary $X$, $\mathrm{HD}(X_0, X_1) \neq \mathrm{HW}(X)$. The same observation holds for PPS-based leakages. In this section, we take advantage of the above remark to enhance the practical security investigation of masking encodings. We proceed as follows: we first elaborate on the unsuitability of the HW model when targeting transition-based and PPS-based leakages; we then discuss how to exploit the leakage model's distribution to build more efficient ones. Finally, we put the developed models into practice, mounting first-order analyses and comparing the new security results with the previous ones.

A. RATIONALE
When targeting leakages involving multiple shares, the HW model generally provides low discrimination capabilities. Considering the case of transitions and PPS-based leakages, the HD and SHW distributions are different from the HW's (Eq. 10, Eq. 11).

$$D_{(\mathrm{HD}(X_0,X_1),\,X)} \neq D_{(\mathrm{HW}(X),\,X)} \tag{10}$$

$$D_{(\mathrm{SHW}(X_0,X_1),\,X)} \neq D_{(\mathrm{HW}(X),\,X)} \tag{11}$$

Fig. 8 reports $D_{(\mathrm{HD}(X_0,X_1),\,X)}$ and $D_{(\mathrm{SHW}(X_0,X_1),\,X)}$ for BM, ASM and IPM. As the distributions differ, so do the marginal distributions. It is possible to exploit this difference to define (statistical-)moment-based leakage models.
For instance, we can associate to each realisation of $X$ the first-order moment of the marginal $D_{(\mathrm{HD}(X_0,X_1),\,X=x)}$:

$$\mathrm{HD}_{fo}(x) = \frac{1}{|\mathbb{F}_{2^8}|^2} \sum_{x_i \in \mathbb{F}_{2^8},\; x = \bigodot_i x_i} \mathrm{HD}(x_0, x_1) \tag{12}$$

Nevertheless, this moment-based approach cannot improve the PCorrl results in the IPM case, as $D_{(\mathrm{HD}(X_0,X_1),\,X)}$ is independent of $X$ (and thus, any statistical moment is independent of $X$).
Concerning the SHW model, which we use to model the PPS-based leakages, the first-order moment of $D_{(\mathrm{SHW}(X_0,X_1),\,X)}$ is independent of $X$ for all three masking schemes. Thus, we cannot straightforwardly employ the $\mathrm{SHW}_{fo}$ (Eq. 13) model.

$$\mathrm{SHW}_{fo}(x) = \frac{1}{|\mathbb{F}_{2^8}|^2} \sum_{x_i \in \mathbb{F}_{2^8},\; x = \bigodot_i x_i} \mathrm{SHW}(x_0, x_1) \tag{13}$$

Yet, we can resort to the BLD preprocessing to make the mean of $D_{(\mathrm{SHW}(X_0,X_1),\,X)}$ secret-dependent. We define the biased version of the $\mathrm{SHW}_{fo}$ model:

$$\mathrm{SHW}_{fo,k\%}(x) = \frac{1}{|\mathbb{F}_{2^8}|^2} \sum_{x_i \in \mathbb{F}_{2^8},\; x = \bigodot_i x_i} \mathrm{SHW}_{k\%}(x_0, x_1) \tag{14}$$

where

$$\mathrm{SHW}_{k\%}(x_0, x_1) = \begin{cases} \mathrm{SHW}(x_0, x_1), & \text{if } \mathrm{SHW}(x_0, x_1) \in O_{k\%}(x) \\ 0, & \text{otherwise} \end{cases}$$

and $O_{k\%}(x)$ contains the k% lowest (or highest) realisations of $\mathrm{SHW}(X_0, X_1)$ when $X = x$.

B. EVALUATION
We start with a first evaluation of the $\mathrm{HD}_{fo}$ leakage model for transition-based leakages. We target the ASM scheme, as for BM we cannot improve the results, and the IPM is intrinsically immune to this leakage type. We compute $\rho(\mathrm{HD}_{fo}(X), T^{i}_{4M \times 90})$, with $T_{4M \times 90}$ the trace set collected from the STM32F303 board executing UB-HD. Fig. 9 confirms the better suitability of the first-order moment leakage model, as we get a higher PCorrl value with respect to the HW model. Then, we test the improvements concerning the exploitation of PPS-based leakages. We employ the $\mathrm{SHW}_{fo,k\%}$ model against each masking scheme, computing $\rho(\mathrm{SHW}_{fo,k\%}(X), T^{i}_{4M \times 90})$, with $T_{4M \times 90}$ the trace set collected from the STM32F303 board executing UB-SHW-LDRB. We experimentally select k = 10% (i.e., 400,000 traces for each sample $0 \leq i < 90$) as it works well for BM, ASM and IPM. Fig. 10 compares the PCorrl when employing the HW model and our moment-based leakage model. The HW allows the detection of correlation peaks in the case of BM and ASM schemes, but none in the IPM


FIGURE 8. Distribution of the HD and SHW leakage models. Given $X \in \mathcal{X}_{2^4}$, the first row reports $D_{(\mathrm{HD}(X_0,X_1),\,X)}$, whereas the second one reports $D_{(\mathrm{SHW}(X_0,X_1),\,X)}$, where $X_0, X_1 \in \mathcal{X}_{2^4}$ represent the shares obtained from the application of BM (first column), ASM (second column) or IPM (third column) to $X$.
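The distributions shown in this figure, and the first-order moments derived from them, can be reproduced by exhaustive enumeration for small values. The sketch below does so for 4-bit BM and ASM encodings, assuming SHW is the sum of the Hamming weights and HD the Hamming weight of the XOR; IPM is left out because the field representation is not specified in this section. The moment is normalised by the number of sharings of each x, which differs from the constant factor used in Eq. 12 and Eq. 13 but does not affect a correlation. Function names are illustrative.

```python
from collections import Counter
from itertools import product

N_BITS = 4
SIZE = 1 << N_BITS

def hw(value):
    return bin(value).count("1")

def hd(x0, x1):
    return hw(x0 ^ x1)

def shw(x0, x1):
    return hw(x0) + hw(x1)

# How each encoding recombines the two shares into the encoded value.
DECODERS = {
    "BM": lambda x0, x1: x0 ^ x1,
    "ASM": lambda x0, x1: (x0 + x1) % SIZE,
}

def conditional_distribution(model, decode):
    """For each x, count the values of model(x0, x1) over all sharings of x."""
    dist = {x: Counter() for x in range(SIZE)}
    for x0, x1 in product(range(SIZE), repeat=2):
        dist[decode(x0, x1)][model(x0, x1)] += 1
    return dist

def first_order_moment(model, decode):
    """Mean of model(x0, x1) over all sharings of each x."""
    dist = conditional_distribution(model, decode)
    return {
        x: sum(value * count for value, count in counts.items()) / sum(counts.values())
        for x, counts in dist.items()
    }

for name, decode in DECODERS.items():
    print(name, "HD_fo :", first_order_moment(hd, decode))
    print(name, "SHW_fo:", first_order_moment(shw, decode))
```

For BM the HD moment coincides with HW(x), which is consistent with the remark that the moment-based model brings no improvement there, while for ASM it varies with x in a different way; the SHW moment is constant for both encodings.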

case. In contrast, our moment-based model not only improves the correlation results for the ASM, but it detects a correlation peak in the IPM case.
Such results corroborate the observations made in Section V, remarking the better leakage resilience of ASM and IPM encodings against transition-based and PPS-based leakages.

FIGURE 9. Experiment-based comparison of the $\mathrm{HD}_{fo}$ and the HW leakage models. We consider the ASM case. We compute PCorrl on 4,000,000 power-consumption traces. We collect traces during the execution of the UB-HD (Listing 5) on the STM32F215 and the STM32F303 boards.

FIGURE 10. Experiment-based comparison of the HW and the $\mathrm{SHW}_{fo,k\%}$ model. We consider the case of the BLD-based PCorrl analyses, for k = 10 (Section III-G). For the $\mathrm{SHW}_{fo,k\%}$ model, we set k = 10. We compute PCorrl over 4,000,000 power-consumption traces. We collect the traces during the execution of the UB-SHW-LDRB (Listing 2) on the STM32F303 board.

VII. SIDE-CHANNEL RESILIENCE OF SOFTWARE MASKED AES-128
With Section V and Section VI we assessed the practical security of different first-order masking encodings. Such analyses are fundamental to get insights on the achievable security of masked implementations. Inner-product encoding showed perfect resistance against transition-based leakage, while Boolean and arithmetic encodings were more vulnerable. All masking encodings showed vulnerability to PPS-based leakage. We question how these findings translate to a full implementation.
This section aims at evaluating the impact of transition-based and PPS-based leakages on 4 software implementations of the AES-128 block-cipher: an unprotected version (vanilla from now on) and three masked ones, one for each masking scheme investigated. We have released all our investigated implementations (both C and binary codes) as publication artefacts (https://zenodo.org/record/8094516).
Our security assessment splits into two phases: in the first one, we evaluate whether the masked implementations leak information; in the second one, we assess the resistance of such implementations against the exploitation of the (potential) leakage. The first phase relies on the TVLA methodology (Section III-E) to provide an assessment independent of the class of attacker. The second phase relies on the same techniques employed to analyse the masking encodings (Section V, Section VI). Specifically, we evaluate the security with and without the BLD technique (Section III-G). We start by exploiting univariate first-order moment leakages, then


we exploit univariate higher-order moment leakages with filtering. This last phase is particularly important to assess the practical security against PPS, since its first-order moment leakages cannot be directly exploited (Section V-A).

A. EXPERIMENTAL SETUP
The vanilla implementation follows the FIPS-PUB-197 specification [17], except for the key-scheduling: the implementation generates the next round key between the SubByte and the MixColumns steps.
Each first-order masked implementation results from the manual application of the related masking scheme to the vanilla implementation. In particular, the BM and IPM ones follow the specification of Rivain et al. [36] and Balasch et al. [15], respectively. For the IPM version, we resort to $L = (1, 170) \in \mathbb{F}_{2^8}^2$, the same we employed for the experiment-based analyses (Section V, Section VI). We implement the finite field multiplication using log/exp tables [37].
Concerning the ASM implementation, an inherent difficulty is the masking of the field addition (i.e., the eXclusive-OR, XOR). Indeed, the XOR is non-linear with respect to the arithmetic-sum operation. We mask the XOR operation by means of a masked look-up table. A straightforward tabulation of the operation would require $2^{16}$ bytes of memory. To reduce the memory consumption, we tabulate the XOR on 4 bits, where the concatenation of the least (and most) significant nibbles of the inputs indexes the table. We compute the XOR between two 8-bit inputs as a double access to this table: one to process the least significant nibbles of the inputs, and one to process the most significant ones. We remark that the output carry of the arithmetic sum potentially leaks information on the processed values. To prevent such leakage, we pre-charge the landing bit of the output carry with a fresh random value.
In the vanilla implementation, for performance reasons, we tabulate the SBOX and the XTIME functions. In the ASM implementation, we implement the same functions by means of masked look-up tables. Concerning the BM and IPM implementations, we compute those functions on the fly.
We resort to the experimental setup introduced in Section IV-C (software toolchain and side-channel measurement setup). We develop each implementation in C language, and compile them with the compiler toolchain and compilation options reported in Section IV-C. Table 3 reports the mean execution time, number of PRNG calls, and memory impact of each AES-128 implementation. We report such parameters for both STM32F215 and STM32F303. Each masked implementation draws fresh randomness from the xoroshiro64** 1.0 PRNG [38]. The execution time from Table 3 includes time spent in the PRNG. We remark the long execution time (500,000 clock cycles on the STM32F215) for the ASM implementation. We ascribe it to the MixColumns step, which performs several accesses to the table-based XOR implementation. We remark that our experimental setup provides us with correctly-aligned side-channel traces. Hence, we do not require any re-alignment of the side-channel traces.
For the purpose of our analyses (e.g., leakage resilience against physical effects), we have to guarantee the correct application of the masking scheme. Each of the selected schemes considers a value-based leakage model. Thus, we verify that no value-based leakage can be detected from each implementation. To this end, we run TVLA analyses on simulated power traces collected during the execution of each implementation on an ISA-level simulator of the ARMv7 profile. Specifically, we simulate the power consumption stemming from the usage of the register file and memory requests via load and store instructions. For all the implementations, we do not reject the null hypothesis (i.e., the implementation does not leak in the value-based model), which supports the correct application of the three considered masking schemes.

B. INFORMATION LEAKAGE EVALUATION
As a first step in the leakage resilience assessment of our AES-128 implementations, we proceed with the TVLA methodology. Precisely, we analyse the full first round of each implementation, except for the ASM implementation: as pointed out in Section VII-A, the MixColumns step accounts for the largest part of the execution time. To reduce the trace collection time without compromising the validity of our results, we exclude the ASM's MixColumns from the leakage evaluation. As introduced in Section III-E, the TVLA allows an evaluator to determine whether an implementation leaks or not, independently of the particular attack or leakage model. For the vanilla, BM and ASM implementations, we collect 15,000 power-consumption traces for each of the fixed and random sets. Concerning the IPM implementation, we observed that it is characterised by a higher leakage resilience (Section V, Section VI). To be more confident in its evaluation, we perform the same assessment with 90,000 power-based traces for both the fixed and random trace sets. As explained in Section III-E, the TVLA methodology is prone to errors of type I and II, the latter being the most problematic ones. To cope with them, for each implementation, we repeat the TVLA assessment two times, each with a distinct fixed key, and we measure the maximum absolute t-statistic for each sample point of the traces. Fig. 11 reports the TVLA results for each AES-128 implementation and each target board.
The vanilla, BM and ASM implementations leak information along the whole first round. As we verified that the masking countermeasure is correctly applied at binary level, and as first-order statistical moments cannot detect leakage from PPS, we ascribe such leakage to recombination effects (e.g., transitions).
We remark that the ASM implementation presents fewer leaking samples than the BM. The algebraic structure of the ASM encoding potentially contributes to this observation.
Unexpectedly, the leakage assessment on the IPM implementations reveals several leakage points along the full first


TABLE 3. Mean execution time (in clock cycles), number of calls to the PRNG, and segment size (in bytes) of each AES-128 implementation.

FIGURE 11. TVLA results on the 4 AES-128 implementations. In red, we report the maximum t-statistic between two t-tests. In blue, the t-statistic threshold (±4.5) for the null hypothesis rejection. We execute each t-test by using a distinct fixed key. The first and third columns refer to the STM32F215 board, whereas the second and fourth ones refer to the STM32F303 board. Each plot refers to a 15,000-vs-15,000 t-test, except for the IPM AES-128, which refers to a 90,000-vs-90,000 t-test.
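The fixed-vs-random test summarised in this figure can be sketched with a per-sample Welch t-test, keeping the maximum absolute statistic over the repetitions with distinct fixed keys and comparing it to the ±4.5 threshold. The code below is an illustrative sketch on synthetic traces, not the evaluation code of the paper: trace counts, the leaking sample and all names are hypothetical.

```python
import numpy as np

def welch_t(fixed, random):
    """Per-sample Welch t-statistic between fixed and random trace sets."""
    fixed = np.asarray(fixed, dtype=np.float64)
    random = np.asarray(random, dtype=np.float64)
    mean_diff = fixed.mean(axis=0) - random.mean(axis=0)
    var_fixed = fixed.var(axis=0, ddof=1) / fixed.shape[0]
    var_random = random.var(axis=0, ddof=1) / random.shape[0]
    return mean_diff / np.sqrt(var_fixed + var_random)

def tvla_max_abs_t(trace_set_pairs):
    """Maximum absolute t-statistic per sample over several t-tests."""
    stats = np.stack([np.abs(welch_t(f, r)) for f, r in trace_set_pairs])
    return stats.max(axis=0)

# Synthetic example: two repetitions (one per fixed key), sample 10 leaks
# through a small mean shift of the fixed set in the first repetition only.
rng = np.random.default_rng(3)
n_traces, n_samples, leaky_sample = 2000, 50, 10
pairs = []
for shift in (0.5, 0.0):
    fixed_set = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
    fixed_set[:, leaky_sample] += shift
    random_set = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
    pairs.append((fixed_set, random_set))

THRESHOLD = 4.5
max_abs_t = tvla_max_abs_t(pairs)
leaking_samples = np.flatnonzero(max_abs_t > THRESHOLD)
```

Taking the maximum over the repetitions means a sample is flagged as soon as one fixed key exposes it, which is one way of limiting missed detections.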

round. We found out that the source of such leakages stems from recombination effects that impact the log/exp-based field multiplication. Specifically, we verified the statistical dependence between $\mathrm{HD}(\log_3(X_0), \log_3(X_1))$ and the encoded value $x$. We conjecture that the non-linear nature of the logarithm function introduces some bit-interaction effect between the share's bits. This effect counteracts the randomness diffusion of the IPM, making transition-based leakage exploitable again. Yet, we remark that, despite the higher number of employed traces, we observe a much lower magnitude of the t-statistic with respect to the one of the other implementations.

C. INFORMATION LEAKAGE EXPLOITATION
In the previous section, we assessed the leakage resilience of our AES-128 implementations. We observed results consistent with the encoding analyses (Section V, Section VI), except for the IPM. In fact, we observed unexpected leakage stemming from the finite field multiplication. Despite the presence of leakage, the TVLA methodology does not provide any clue concerning the exploitability of the leaked information.
With this section, we explore the resilience of our software masked implementations against information leakage exploitation; specifically, against univariate side-channel attacks. To this end, we rely on standard, BLD-based (Section III-G) and moment-based-model (Section VI) CPA attacks. For each implementation and target board, we measure 1,000,000 power traces.
The side-channel analysis proceeds as follows. We analyse the usage of the first secret key byte during the SubByte step of the first round, and we compute $\rho(L(X)_d, T^{j}_{1M \times m})$, where $m$ varies according to the target implementation. Table 4 summarises the leakage models $L(\cdot)_d$ employed to attack each implementation.
For the IPM implementations, we also target the SubByte's input, which comes as a result of the field implementation. We employ the first-order-moment leakage model $\mathrm{HD}_{fo,\log}$:

$$\mathrm{HD}_{fo,\log}(x) = \frac{1}{|\mathbb{F}_{2^8}|^2} \sum_{x_i \in \mathbb{F}_{2^8}} \mathrm{HD}(\log_3(x_0), \log_3(x_1)) \tag{15}$$

Fig. 7 reports the results of the different CPA attacks, and Table 5 reports the minimum number of traces required to mount a successful CPA attack. Despite the correct application of the masking scheme on the binaries, we exploit only 140 and 241,000 traces to break the BM and ASM implementations, respectively. Consistently with the result from Section VI, the $\mathrm{HD}_{fo}$ model improves the attack efficiency against the ASM implementation, reducing up to ×8.6 times


the minimum number of traces to mount a successful CPA attack, with respect to a plain use of the HW model.
By targeting the SBOX input, we successfully retrieve the target key byte on IPM implementations. This suggests that the design of masking schemes should also consider the implementation of the employed algorithms (e.g., finite field multiplication). We remark that the attack on the STM32F215 takes longer to succeed. This may be due to the lower accuracy of the HD model for this device and/or the higher noise affecting the platform (Section IV-D).
We conclude the leakage exploitation analyses with the BLD-based CPA attacks (Section III-G). We evaluate the resilience of each implementation according to several k values. Table 6 reports the rank of the correct key hypothesis with 1,000,000 traces, and the minimum number of traces to reach that rank. On the STM32F303, the correct key hypothesis frequently appears among the best correlated key candidates. Table 6 reports the number of traces necessary to observe the correct key byte hypothesis among the 4 best correlated key candidates. Then, an attacker can brute-force the $4^{16}$ possible 128-bit keys.
We remark that (1) the choice of the threshold value k is relevant to mount a successful CPA attack, and (2) low k values increase the probability of a successful side-channel attack. We ascribe this to the higher noise of this setting compared to the more controlled context of the encoding analyses (Section V, Section VI).
Our results emphasise the threat that PPS and recombination effects represent. Also, we highlight the practical security impact of different representations of data in a masked software implementation (e.g., logarithm of a share). As a first guideline to mitigate the exploitation of PPS-based leakage, developers should avoid packing shares within the same word (Listing 2). However, such a condition is necessary, but not sufficient, as PPS potentially stems from other sources (Section IV-B).

TABLE 4. Summary of the leakage models used for the side-channel analysis of each AES-128 implementation.

FIGURE 12. CPA results for the four AES-128 implementations. In grey, the wrong key hypotheses, whereas in red the correct one. Fig. 12f, 12g and 12h report the PCorrl in log10 scale. For each implementation, we employ a different leakage model (Table 4). For the $\mathrm{SHW}_{fo,k\%}$ model, the X-axis reports the number of collected traces (i.e., before trace filtering). Each row refers to a different implementation/leakage model combination. First and second columns refer, respectively, to the STM32F215 and STM32F303 board.

TABLE 5. Minimum number of traces to mount a successful CPA attack against the AES-128 implementations. We report failed in case of attack failure with 1,000,000 traces.

TABLE 6. Key ranking of correct key guess when employing the $\mathrm{SHW}_{fo,k\%}$ against IPM implementations. We report the correct key-guess rank and related number of traces for k ∈ {0.1%, 0.2%, . . . , 1%, 2%}. We omit the entries for k > 2%, as we did not succeed in the attack. The number of traces corresponds to the number of collected traces (i.e., not the number of traces actually analysed).

VIII. DISCUSSION
In this section, we warn about unanticipated sources of weaknesses in masked implementations, then we discuss how parallel-oriented architectures and programming models can introduce PPS in software, and we give some principles to prevent the vulnerabilities created by PPS.

A. ON THE RESILIENCE OF IPM TO TRANSITION-BASED LEAKAGE
In Section V, we have shown that IPM encodings are immune to transition-based leakages, which is consistent with literature knowledge. Yet, in Section VII we were able to successfully attack IPM masked implementations through a leakage

model targeting such leakages. We found the root cause in the use of logarithms in the finite field multiplication implementation. Transition-based leakages on the logarithm representation of the encodings induced exploitable leakage. Such a gap underlines the importance of studying the masking resistance both theoretically and practically. It suggests that the different representations of masked encodings used in an implementation should all be considered for security assessment.

B. PPS AND PARALLEL-ORIENTED ARCHITECTURES
The PPS threat emerges whenever data processing parallelism can be achieved. From a hardware point of view, PPS readily extends to any architecture encompassing any kind of feature implying data parallelism. In our work, we focused on simple micro-architectures encompassing instruction pipelining, which implies a sort of data parallelism. Gigerl et al. show that super-scalar micro-architectures exhibit more sources of transition-based leakage [3] due to pipeline depth and multiple issuing of instructions. In such micro-architectures, data parallelism is exacerbated, and so is the possible occurrence of PPS.
Instruction Set Extensions (ISE) play an important role in the introduction of PPS. Miyajan et al. suggest the employment of SIMD (Single Instruction Multiple Data) ISE to provide efficient and secure software masked implementations [39]. The SIMD ISE enables data-level parallel processing, handling multiple data via a single instruction [13]. The explicit data parallelism naturally implies PPS. Such a remark extends also to GPU architectures, designed to intrinsically support data-level parallelism. Still, we are not aware of any work concerning their usage to accelerate software masked implementations. Finally, FPGAs represent an interesting case: they can be employed either for hardware implementations, or for the implementation of full CPUs [40]. In both cases, the designs might rely on some parallel features, e.g., [41], potentially introducing the PPS vulnerability.

C. PREVENTING PPS IN SOFTWARE
PPS emerges whenever the micro-architecture handles related shares in parallel. As discussed, architectures encompassing parallel features and certain programming models potentially introduce the PPS threat. As a naïve solution, the programmer should rely on programming techniques which do not promote data parallelism, and execute the implementation on an architecture not endowed with parallel features. Yet, such an approach would increase the already high cost of a masked implementation, in particular for masked instances of order n > 1.
Instead, we advocate for a more principled approach, based on the concept of Non-Completeness. Non-Completeness is a security property defined in the context of Threshold Implementations [42]. Informally, by seeing an n-th order masked algorithm as a composition of sub-functions, each sub-function has to handle no more than n shares. Gaspoz and Dhooghe extend this property to provide necessary security properties against micro-architecture-induced recombination effects [43]. In particular, we remark their Horizontal Register Non-Completeness as a necessary condition to avoid PPS. Such a property conflicts with certain programming techniques, e.g., share-slicing [21], which aim at the efficient implementation of masked software implementations.


Yet, the notion of non-completeness of Gaspoz and Dhooghe does not take into consideration the PPS stemming from the pipeline's depth (i.e., number of pipeline stages). Indeed, PPS originates also from related shares manipulated in different pipeline stages. It is possible to extend the non-completeness property at pipeline level, requiring that the pipeline does not process more than n shares at a time. Gigerl et al. suggest a stricter version of this Pipeline Non-Completeness property, separating the processing of related shares according to the pipeline's depth and number of instructions that can be executed in parallel to prevent glitch-based leakage [3].
Admittedly, register and pipeline non-completeness might not be sufficient to prevent PPS. Indeed, the register file, caches and memory potentially store all the shares of an encoding. Static power leakage potentially allows an attacker to observe these shares, enabling successful attacks [44]. The risk implied by static power leakage is still unexplored in the software context.
We conclude this discussion by remarking that the IPM scheme (more generally, the family of code-based masking) can amplify the security order naturally expected [9], [45], [46]. That is, given a masking of order n, according to the particular public vector L, the security order can be higher than n. Although we analysed IPM instantiated with non-optimal codes (i.e., which do not amplify the security order), the use of optimal codes can be a sound way to better mitigate PPS-based leakage. We leave as interesting future work the investigation of the practical security guarantees of optimal code-based software masked implementations when register and pipeline non-completeness are satisfied.

IX. CONCLUSION
Recent literature has highlighted the CPU micro-architecture as a rich source of recombination effects (e.g., transitions), which severely decrease the security of masking. Despite the pervasiveness of such effects, our work shows that they do not represent the only threat to the practical security of masking in software: the parallel processing of shares (PPS), exercised by a CPU micro-architecture, represents a potential threat too. Relying on an adaptation of the preprocessing technique proposed by Moos and Moradi [12], we show how to exploit PPS-based leakage against first-order instances of Boolean, arithmetic-sum and inner-product masking. Furthermore, despite the fact that some schemes, such as the inner-product masking, provide immunity to transition-based leakage, particular operations can remove such immunity. Specifically, we show how the employment of the log operation in the field multiplication algorithm allows the successful exploitation of transition-based leakage against the inner-product masking.

ACKNOWLEDGMENT
The authors thank Arnaud de Grandmaison and Emanuele Valea for their helpful comments and many fruitful discussions, Romain Frappier for his contributions to the implementations of masked AES software, and Arnaud de Grandmaison for his contributions to the verification of first-order security in the value-based leakage model.

REFERENCES
[1] S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi, "Towards sound approaches to counteract power-analysis attacks," in Advances in Cryptology–CRYPTO. Berlin, Germany: Springer, 1999.
[2] B. Marshall, D. Page, and J. Webb, "MIRACLE: MIcRo-architectural leakage evaluation: A study of micro-architectural power leakage across many devices," TCHES, vol. 2022, no. 1, pp. 175–220, Nov. 2021.
[3] B. Gigerl, R. Primas, and S. Mangard, "Secure and efficient software masking on superscalar pipelined processors," in Advances in Cryptology–ASIACRYPT. Cham, Switzerland: Springer, 2021.
[4] T. D. Cnudde, B. Bilgin, B. Gierlichs, V. Nikov, S. Nikova, and V. Rijmen, "Does coupling affect the security of masked implementations?" in Proc. COSADE, 2017, pp. 1–18.
[5] J. Balasch, B. Gierlichs, V. Grosso, O. Reparaz, and F. Standaert, "On the cost of lazy engineering for masked software implementations," in Proc. CARDIS, 2014, pp. 64–81.
[6] A. Barenghi and G. Pelosi, "Side-channel security of superscalar CPUs: Evaluating the impact of micro-architectural features," in Proc. DAC, 2018, pp. 1–6.
[7] A. Barenghi, L. Breveglieri, N. Izzo, and G. Pelosi, "Exploring Cortex-M microarchitectural side channel information leakage," IEEE Access, vol. 9, pp. 156507–156527, 2021.
[8] A. D. Grandmaison, K. Heydemann, and Q. L. Meunier, "ARMISTICE: Microarchitectural leakage modeling for masked software formal verification," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 11, pp. 3733–3744, Nov. 2022.
[9] Q. Wu, W. Cheng, S. Guilley, F. Zhang, and W. Fu, "On efficient and secure code-based masking: A pragmatic evaluation," IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2022, pp. 192–222, Jun. 2022.
[10] A. Beckers, L. Wouters, B. Gierlichs, B. Preneel, and I. Verbauwhede, "Provable secure software masking in the real-world," in Proc. COSADE, 2022, pp. 215–235.
[11] G. Barthe, F. Dupressoir, S. Faust, B. Grégoire, F. Standaert, and P. Strub, "Parallel implementations of masking schemes and the bounded moment leakage model," in Advances in Cryptology–EUROCRYPT. Cham, Switzerland: Springer, 2017.
[12] T. Moos and A. Moradi, "On the easiness of turning higher-order leakages into first-order," in Proc. COSADE, 2017, pp. 153–170.
[13] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. San Francisco, CA, USA: Morgan Kaufmann, 2012.
[14] L. Goubin and J. Patarin, "DES and differential power analysis the 'duplication' method," in Proc. Int. Workshop Cryptographic Hardw. Embedded Syst., 1999, pp. 158–172.
[15] J. Balasch, S. Faust, and B. Gierlichs, "Inner product masking revisited," in Advances in Cryptology–EUROCRYPT. Berlin, Germany: Springer, 2015.
[16] T. S. Messerges, "Securing the AES finalists against power analysis attacks," in Proc. Int. Workshop Fast Softw. Encryption, 2001, pp. 150–164.
[17] (2001). Advanced Encryption Standard (AES). NIST. [Online]. Available: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197-upd1.pdf
[18] L. D. Meyer, E. D. Mulder, and M. Tunstall, "On the effect of the (micro) architecture on the development of side-channel resistant software," Cryptol. ePrint Arch., Tech. Paper 2020/1297, 2020.
[19] V. Arora, I. Buhan, G. Perin, and S. Picek, "A tale of two boards: On the influence of microarchitecture on side-channel leakage," in Proc. CARDIS, 2021, pp. 80–96.
[20] K. Papagiannopoulos and N. Veshchikov, "Mind the gap: Towards secure 1st-order masking in software," in Proc. COSADE, 2017, pp. 282–297.
[21] S. Gao, B. Marshall, D. Page, and E. Oswald, "Share-slicing: Friend or foe?" IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2019, pp. 152–174, Nov. 2019.
[22] R. Beaulieu, D. Shors, J. Smith, S. Treatman-Clark, B. Weeks, and L. Wingers, "The SIMON and SPECK families of lightweight block ciphers," Cryptol. ePrint Arch., Tech. Paper 2013/404, 2013.
[23] J. Bos, L. Ducas, E. Kiltz, T. Lepoint, V. Lyubashevsky, J. M. Schanck, P. Schwabe, G. Seiler, and D. Stehle, "CRYSTALS-kyber: A CCA-secure module-lattice-based KEM," in Proc. IEEE Eur. Symp. Secur. Privacy, Apr. 2018, pp. 353–367.


[24] Y. Ishai, A. Sahai, and D. A. Wagner, "Private circuits: Securing hardware against probing attacks," in Advances in Cryptology–CRYPTO. Berlin, Germany: Springer, 2003.
[25] S. Mangard, E. Oswald, and T. Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards. Berlin, Germany: Springer, 2007.
[26] T. Schneider and A. Moradi, "Leakage assessment methodology: A clear roadmap for side-channel evaluations," in Proc. CHES. Academic, 2015, pp. 495–513.
[27] S. M. Ross, Introductory Statistics, 3rd ed. 2010.
[28] N. Veyrat-Charvillon and F. Standaert, "Mutual information analysis: How, when and why?" in Proc. CHES, 2009, pp. 429–443.
[29] A. Kraskov, H. Stögbauer, and P. Grassberger, "Estimating mutual information," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 69, no. 6, Jun. 2004, Art. no. 066138.
[30] V. Cristiani, M. Lecomte, and P. Maurine, "Leakage assessment through neural estimation of the mutual information," in Proc. ACNS, 2020, pp. 144–162.
[31] O. Bronchain, J. M. Hendrickx, C. Massart, A. Olshevsky, and F. Standaert, "Leakage certification revisited: Bounding model errors in side-channel security evaluations," in Advances in Cryptology–CRYPTO. 2019.
[32] CW1200 ChipWhisperer-Pro. NewAE. Accessed: Apr. 16, 2023. [Online]. Available: https://rtfm.newae.com/Capture/ChipWhisperer-Pro/
[33] CW308-STM32F2 – VCC Internal Regulator. NewAE. Accessed: Apr. 16, 2023. [Online]. Available: https://rtfm.newae.com/Targets/UFO%20Targets/CW308T-STM32F/#vcc-int-supply
[34] A. Duc, S. Faust, and F.-X. Standaert, "Making masking security proofs concrete (or how to evaluate the security of any leaking device), extended version," J. Cryptol., vol. 32, no. 4, pp. 1263–1297, Oct. 2019.
[35] P. Laarne. (2022). Polsys/Ennemi: 1.1.1. [Online]. Available: https://doi.org/10.5281/zenodo.5848134
[36] M. Rivain and E. Prouff, "Provably secure higher-order masking of AES," in Cryptographic Hardware and Embedded Systems. Berlin, Germany: Springer, 2010.
[37] D. Goudarzi and M. Rivain, "How fast can higher-order masking be in software?" in Advances in Cryptology–EUROCRYPT. Cham, Switzerland: Springer, 2017.
[38] D. Blackman and S. Vigna, "Scrambled linear pseudorandom number generators," ACM Trans. Math. Softw., vol. 47, no. 4, pp. 1–32, Dec. 2021.
[39] A. Miyajan, Z. Shi, C.-H. Huang, and T. F. Al-Somani, "Accelerating higher-order masking of AES using composite field and SIMD," in Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), Dec. 2015, pp. 575–580.
[40] T. Gokulan, A. Muraleedharan, and K. Varghese, "Design of a 32-bit, dual pipeline superscalar RISC-V processor on FPGA," in Proc. 23rd Euromicro Conf. Digit. Syst. Design (DSD), Aug. 2020, pp. 340–343.
[41] J. Vliegen, O. Reparaz, and N. Mentens, "Maximizing the throughput of threshold-protected AES-GCM implementations on FPGA," in Proc. IEEE 2nd Int. Verification Secur. Workshop (IVSW), Jul. 2017, pp. 140–145.
[42] B. Bilgin, B. Gierlichs, S. Nikova, V. Nikov, and V. Rijmen, "Higher-order threshold implementations," in Advances in Cryptology–ASIACRYPT, vol. 8874. Berlin, Germany: Springer, 2014.
[43] J. Gaspoz and S. Dhooghe, "Threshold implementations in software: Micro-architectural leakages in algorithms," IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2023, pp. 155–179, Mar. 2023.
[44] A. Moradi, "Side-channel leakage through static power: Should we care about in practice?" in Cryptographic Hardware and Embedded Systems–CHES. Berlin, Germany: Springer, 2014.
[45] W. Wang, F. Standaert, Y. Yu, S. Pu, J. Liu, Z. Guo, and D. Gu, "Inner product masking for bitslice ciphers and security order amplification for linear leakages," in Proc. CARDIS, 2016, pp. 174–191.
[46] W. Cheng, S. Guilley, C. Carlet, J.-L. Danger, and S. Mesnager, "Information leakages in code-based masking: A unified quantification approach," IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2021, pp. 465–495, Jul. 2021.

LORENZO CASALINO received the master's degree in computer science and engineering from Politecnico di Milano, Italy, in 2020. He is currently pursuing the Ph.D. degree with the CEA-List, Grenoble, France. His research interests include side-channel analyses, related micro-architecture-aware countermeasures, and their automated application.

NICOLAS BELLEVILLE received the Ph.D. degree from Université Grenoble Alpes, France, in 2019. Since 2019, he has been a Researcher with the CEA-List, Grenoble, France. His research interests include side-channel attacks, their countermeasures, and the automated application of countermeasures during compilation.

DAMIEN COUROUSSÉ received the Ph.D. degree from Institut National Polytechnique de Grenoble, in 2008. He has been a Research Engineer and a Senior Expert with the CEA-List, since 2011. His research interests include embedded software and its interaction with hardware, compilation, and runtime code generation for performance and security, with a recent focus on hardware security.

KARINE HEYDEMANN received the Ph.D. degree in computer science from the University of Rennes 1, in 2004. She was an Associate Professor with the LIP6, Sorbonne University, from 2006 to 2022. She is currently a Senior Expert Architect with Thales DIS. She is also an Associate Researcher with the LIP6. Her research interests include hardware micro-architecture, compilation, code optimization, and physical attacks, including modeling of hardware fault injection effects, automated code hardening, and robustness analysis.
