Revealing Secret Key From Low Success Rate Deep Learning-Based Side Channel Attacks
Abstract—Non-profiled deep learning-based side channel attacks utilize deep neural networks to extract sensitive information with high accuracy. These attacks pose a significant threat to the security of cryptographic devices. Unlike profiled attacks, non-profiled attacks do not require prior knowledge of the target device, making them more versatile. Deep learning algorithms enable attackers to learn complex relationships between side channel signals and secret information, enabling the recovery of cryptographic keys even when common SCA countermeasures are deployed. However, non-profiled DLSCA cannot reveal the secret key if the correct key's metric is not clearly distinguished from those of the incorrect candidates. This paper discusses this issue of non-profiled DLSCA. Then, a new metric based on the inversion of exponential rank (IER) is proposed to enhance the performance of these attacks. The experimental results show that the proposed technique could reveal the secret subkey even if the partial success rate percentage is only 10% in the ASCAD dataset. Furthermore, when utilizing minimally tuned models and the IER metric to execute attacks on the CHES-CTF 2018 data, there is a substantial increase in the percentage of correctly revealed bytes, rising from 62.5% to 93.75%.

Index Terms—Deep learning, Side channel attack, Key rank, metric

I. INTRODUCTION

Deep learning (DL) has emerged as a powerful tool in various domains, revolutionizing the field of artificial intelligence. With its ability to automatically learn intricate patterns and extract meaningful representations from vast amounts of data, deep learning has been applied to numerous applications, ranging from image recognition to natural language processing. However, like any powerful technology, deep learning can also be harnessed for malicious purposes. One such application is the field of side-channel attacks (SCA). Moreover, the emergence of deep learning has led to a substantial increase in the effectiveness of these attacks, increasing their potency and consequences. In particular, in smart healthcare systems that handle sensitive data, cyber-security issues are critical. In our on-going project “AIPOSH”, funded by the ASEAN IVO program, a comprehensive cyber-security platform will be provided with artificial intelligence powered hardware-software oriented solutions for Internet-of-things based smart healthcare systems, toward a technology roadmap for ASEAN countries in the field.

In the literature, a multitude of studies have demonstrated the efficiency of DL-based SCA (DLSCA) in breaking the implementation of cryptographic algorithms. The performance of these attacks is commonly evaluated by using SCA metrics [1] like rank (or key rank), success rate (SR), or guessing entropy (GE). While the first metric requires only one attack, the others are the result of performing the attacks multiple times to achieve stable results. Recently, the authors in [2] have investigated the performance of DLSCA. According to their study, DL techniques have multiple sources of randomness (due to the initialization, regularization, and optimization procedure). Consequently, DL algorithms exhibit stochastic behavior, leading to variations in attack results and necessitating the execution of multiple independent attacks to obtain reliable outcomes. However, it is important to note that the aforementioned research exclusively investigated DLSCA within a profiling context. In addition, DL has been exploited in several other studies in the non-profiled context [3]–[6]. Therefore, non-profiled DLSCA encounters similar challenges and limitations as identified in the profiling scenario.

To investigate this issue more deeply, and as a step toward a deep learning based security evaluation method for cryptographic algorithms in IoT based smart healthcare systems, this paper focuses on exploring the negative effect of the randomness sources in non-profiled DL-based SCA techniques. In particular, we introduce a novel metric aiming to yield improved outcomes compared to the conventional metrics used in a non-profiled context. Notably, the proposed metric can be used as a new distinguisher that is capable of unveiling the secret key. Our main contributions are:

• Our investigation focuses on examining the impact of algorithmic randomness on the performance of non-profiled DLSCA.
• We introduce a novel metric called inversion of exponential rank (IER) to serve as a valuable tool for evaluating and quantifying the stability of non-profiled DL attacks in the absence of knowledge about the secret key.
• A novel non-profiled DLSCA approach utilizing the IER metric is presented. The proposed technique demonstrates exceptional performance in uncovering secret keys even under limited attack success rates.

The rest of this paper is organized as follows. In Section II, DL-based non-profiled techniques are presented. Section III introduces a new SCA metric, and a new distinguisher is
presented. In Section IV, the experimental results are shown to demonstrate the efficiency of the proposed techniques. Finally, we conclude the paper in Section V.

[Fig. 1: six panels plotting loss against the number of epochs (0 to 30) for the correct key guess and the incorrect key guesses; the correct key reaches Rank = 0, 1, 6, 14, 36, and 72 in the six panels.]
Fig. 1. The results of different DDLA-SHW based attacks using randomly initialized (Glorot initialization) weights on the same dataset.

II. NON-PROFILED DEEP LEARNING BASED SCA

A. Differential deep learning analysis

Differential deep learning analysis (DDLA) is a deep learning based technique that exploits side-channel information obtained only from the target device [3]. First, the attacker obtains $N$ power traces $\{t_1, t_2, \ldots, t_N\}$ of cryptographic operations performed with a fixed secret key. Following a “divide and conquer” strategy, the attacker attacks AES by training a network model to obtain outputs from the power traces. The outputs are then compared to the specific intermediate values (labels) $\{y_{k,1}, y_{k,2}, \ldots, y_{k,N}\}$ for each key candidate $k$. Each label $y_{k,i}$ is computed from the key candidate $k$ and the plaintext $i$ using a power consumption model $h$, such as Hamming Weight (HW), Hamming Distance (HD) or Least Significant Bit (LSB). The key corresponding to the highest accuracy after the training is specified as the correct key. The procedure of the non-profiled DLSCA is shown in Algorithm 1.

Algorithm 1 Differential Deep Learning Analysis (DDLA) [3]
Input: $D$ traces $(t_i)_{1 \le i \le D}$, corresponding plaintexts $(d_i)_{1 \le i \le D}$, and $K$ key hypotheses. A network $Net$ and number of epochs $n_e$
Output: $k_{cr} \in \mathcal{K}$
1: Set training data as $X = (t_i)_{1 \le i \le D}$.
2: for $k_j \in \mathcal{K}$ do
3:   Re-initialize trainable parameters of $Net$
4:   Compute the series of hypothetical values $(h_{k_j,i})_{1 \le i \le D}$
5:   Set training labels as $(y_{k_j,i} = h_{k_j,i})_{1 \le i \le D}$
6:   Perform DL training: $DL(Net, X, (y_{k_j,i}), n_e)$
7: end for
8: return key $k_{cr}$ which leads to the best DL training metrics

B. Sources of randomness in non-profiled deep learning based SCA

As indicated in [2], several common sources of randomness have a negative impact on the attack results. The authors showed that the random sources are connected with the dataset (dataset randomness) and the machine learning algorithm (algorithmic randomness). However, the source that impacts the attack results the most is algorithmic randomness. The common sources are the initialization of weights and biases, regularization techniques, and optimization techniques.

Similar to profiled DLSCA, non-profiled DLSCA also faces the randomness sources of the DL model. Furthermore, non-profiled DLSCA techniques like DDLA and MOR [6] determine the secret key based on the training metrics, such as loss and accuracy. Therefore, an attacker could obtain unstable DLSCA results in a non-profiled context. To illustrate this assumption, we take the attack results using DDLA-SHW described in [7] as an example. As illustrated in Fig. 1, the results differ even with the same dataset and hyper-parameters. The attacker can reveal the correct key by comparing the loss value of all candidates or by using other SCA metrics like key rank (KR) to determine the best candidate. However, when an un-optimized model is used, it is difficult to locate the correct key in the graph, even if it has the lowest loss value in most epochs.

III. PROPOSED A NEW SCA METRIC FOR NON-PROFILED DLSCA

A. SCA metrics

This part briefly introduces the metrics commonly used in the SCA domain, such as score & rank, success rate (SR) and guessing entropy (GE).

1) Score & rank: In the case of attacking an 8-bit Sbox output, the set of key candidates $k$ is limited to $\mathcal{K} = \{0, 1, \ldots, 255\}$. The attack produces 256 scores $[score_0, score_1, \ldots, score_{255}]$, where $score_i$ is the attack score of the key candidate $i$. For example, $score_{17}$ is the attack score achieved by the key candidate $k = 17$. Finally, we can produce a vector $[rank_0, rank_1, \ldots, rank_{255}]$, where $rank_k$ is the rank of key candidate $k$ and the best possible rank equals 1. For example, if the best score came from $k = 17$, then $rank_{17} = 1$. By convention, a rank equal to 1 indicates the best key candidate. However, we set the rank of the best key candidate to zero in our scenario.

Taking the attack results using DDLA-SHW as an example, the KR metric is determined for each attack as illustrated in Fig. 1. It is clear that the correct key usually has a lower loss (low KR) value than the incorrect keys, as shown in Fig. 1.a,b,c. However, detecting the key through the normal method or even the “early stop” technique is not enough.

2) Success rate: The goal of the attacker is to recover the key, while the evaluator's goal is to assess how hard it is to find the key, even if she cannot do so. This divergence motivates the “known-key analysis” and its respective metrics, namely the known-key score, the known-key rank, and the success
rate [1]. With $Q$ traces in the attack phase, an attack outputs a key guessing vector $g$ in decreasing order of probability, where $g_1$ denotes the most likely and $g_{|\mathcal{K}|}$ the least likely key candidate. The success rate of order $o$ is the average empirical probability that the correct key is located within the first $o$ elements of the key guessing vector $g$:

$$ SR_o^i = \begin{cases} 1, & \text{if } rank_{k_c} \le o \\ 0, & \text{otherwise} \end{cases} \tag{1} $$

$$ SR_o = \frac{1}{n} \sum_{i=1}^{n} SR_o^i. \tag{2} $$

3) Guessing Entropy: Similar to SR, guessing entropy is a commonly used SCA metric. The guessing entropy metric can be used to evaluate the rank of the correct key and is directly derived from the average rank of the correct key $k_c$. For a certain experiment $i$, $GE^i$ is equal to $\log_2(rank_{k_c})$ [1]. Then, GE is determined by the average over all $n$ experiments as presented in Equation 3.

$$ GE = \frac{1}{n} \sum_{i=1}^{n} GE^i. \tag{3} $$

Although effective, GE does not consider the internal relationships between the correct key and the other candidates.

B. Inversion of Exponential Rank

The aforementioned metrics are primarily employed for known-key analysis, indicating the level of difficulty for an attacker to extract the secret key from a given set of measured traces. Specifically, when the SR percentage is low due to non-zero key ranks, an evaluator can infer that “By employing [...] commonly occurs when an unoptimized model is utilized for DLSCA. As demonstrated in the previous section, non-profiled DLSCA leverages training metrics to discriminate the secret key from other potential candidates. Consequently, the attacker must fine-tune the DL model's hyperparameters to maximize the distinction between the correct and incorrect keys. Additionally, the DLSCA attack needs to be repeated to obtain reliable outcomes. Hence, hyperparameter tuning in DL is a time-consuming and costly process, and the ability to reveal the secret key from low success rate attacks using minimally tuned models is vital in SCA evaluations.

A notable characteristic of non-profiled DLSCA is that the training metrics associated with the secret key are consistently better (lower loss, higher accuracy) than those of other key hypotheses. When an optimized model is employed, the correct key $k_{ck}$ has the lowest loss (or the highest accuracy). Otherwise, the loss (accuracy) of key $k_{ck}$ is not the lowest (highest). Nevertheless, it still demonstrates lower loss or higher accuracy compared to numerous other key hypotheses. Importantly, these metrics are consistently achieved across most attacks. Hence, we can assess the consistency of a hypothesis key across $n$ attacks to distinguish it from other keys. To investigate our hypothesis, a metric based on key rank called inversion of exponential rank (IER) is proposed and calculated as follows:

$$ IER_j = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\alpha^{KR_{i,j}}}, \quad (\alpha > 1) \tag{4} $$

where $1 \le i \le n$ and $0 \le j \le 255$.

By using the inversion of exponential rank with $\alpha > 1$, the IER of the correct key reaches 1 when $KR_{i,ck}$ equals zero for all $i$. In this case, the SR percentage of the $n$ attacks equals 100%. Conversely, a significantly small IER value indicates a higher rank. The values of IER for various ranks, calculated using different values of $\alpha$, are presented in Table I. Bold font highlights the values that exceed 0.1. The analysis reveals that the determination of significant ranks can be influenced by selecting the value of $\alpha$. For instance, when considering $\alpha = 1.3$, only key guesses within the rank range of 0 to 5 significantly contribute to higher values of IER. The key guesses with higher ranks have a negligible impact on IER. Consequently, the more consistently the DLSCA attacks yield low-ranked keys, the higher the IER value.

From the analysis above, the proposed metric could be used in evaluating the performance of different models under the same attack conditions. Moreover, the novelty of this metric lies in its capability to reveal the secret key without requiring prior knowledge of the correct key. This can be achieved by following the steps outlined below:

• The attack is performed and repeated $n$ times on the same dataset. The ranks KR of all hypothesis keys are calculated on each attack.
• The $IER_j$ of the key guess number $j$ is determined [...] specified as the correct key.

Our proposed distinguisher is completed in Algorithm 2.

Algorithm 2 Proposed non-profiled DLSCA using IER metric
Input: $D$ traces $(t_i)_{1 \le i \le D}$, corresponding plaintexts $(d_i)_{1 \le i \le D}$, and $K$ key hypotheses. A network $Net$ and number of epochs $n_e$
Output: $k_{cr} \in \mathcal{K}$
1: Set training data as $X = (t_i)_{1 \le i \le D}$.
2: for $i \in$ iteration do
3:   for $k_j \in \mathcal{K}$ do
4:     Re-initialize trainable parameters of $Net$
5:     Compute the series of hypothetical values $(h_{k_j,i})_{1 \le i \le D}$
6:     Set training labels as $(y_{k_j,i} = h_{k_j,i})_{1 \le i \le D}$
7:     $DL(Net, X, (y_{k_j,i}), n_e)$
8:     Calculate the key rank $KR_{i,j}$
9:   end for
10: end for
11: Calculate the IER for all key guesses using (4)
12: return key $k_{cr}$ which leads to the highest IER
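The IER distinguisher can be sketched in a few lines, assuming the per-attack training losses are already available as an n × |K| matrix; the DL training itself is not shown. The function names `key_ranks`, `ier`, `success_rate`, and `ier_distinguisher`, as well as the toy loss values, are illustrative assumptions and do not come from the paper. Ranks are zero-based, and a lower loss is treated as a better score.

```python
import numpy as np


def key_ranks(losses):
    """Zero-based rank of every key guess in one attack (lowest loss -> rank 0)."""
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(losses, kind="stable")  # key guesses sorted from best to worst
    ranks = np.empty(len(losses), dtype=int)
    ranks[order] = np.arange(len(losses))
    return ranks


def ier(rank_matrix, alpha=1.3):
    """Equation 4: average over the n attacks (rows) of 1 / alpha ** KR[i, j]."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    kr = np.asarray(rank_matrix, dtype=float)
    return np.mean(alpha ** (-kr), axis=0)


def success_rate(rank_matrix, key, order=1):
    """Known-key SR of the given order; with zero-based ranks, 'within the
    first `order` elements' means rank < order."""
    kr = np.asarray(rank_matrix)
    return float(np.mean(kr[:, key] < order))


def ier_distinguisher(loss_matrix, alpha=1.3):
    """Rank all key guesses in each attack, then return the guess with the highest IER."""
    ranks = np.vstack([key_ranks(row) for row in loss_matrix])
    scores = ier(ranks, alpha)
    return int(np.argmax(scores)), scores


# Toy data (hypothetical): 3 attacks, 4 key guesses. Keys 1, 3 and 0 each win
# exactly one attack, so rank 0 alone does not single out a key, but key 1 is
# ranked 0, 1, 1 and is therefore the most consistent candidate.
TOY_LOSSES = [
    [0.9, 0.5, 0.6, 0.8],
    [0.7, 0.6, 0.9, 0.5],
    [0.5, 0.6, 0.8, 0.9],
]

if __name__ == "__main__":
    guess, scores = ier_distinguisher(TOY_LOSSES, alpha=1.3)
    toy_ranks = np.vstack([key_ranks(row) for row in TOY_LOSSES])
    print("IER per key guess:", np.round(scores, 3))
    print("final guess:", guess)
    print("SR (order 1) of that guess:", success_rate(toy_ranks, guess))
```

In this toy case the selected key is ranked first in only one of the three attacks, which mirrors the low success rate situation the metric is intended for. Real use would replace `TOY_LOSSES` with the losses (or another training metric) collected from repeated attacks.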
TABLE I
IER VALUES OF KEY RANKS USING DIFFERENT α.

                              IER by Equation 4
Rank (KR)  α=1.01  α=1.05  α=1.1  α=1.3  α=1.5  α=1.7  α=1.9
0          1.00    1.00    1.00   1.00   1.00   1.00   1.00
1          0.99    0.95    0.91   0.77   0.67   0.59   0.53
2          0.98    0.91    0.83   0.59   0.44   0.35   0.28
3          0.97    0.86    0.75   0.46   0.30   0.20   0.15
4          0.96    0.82    0.68   0.35   0.20   0.12   0.08
5          0.95    0.78    0.62   0.27   0.13   0.07   0.04
10         0.91    0.61    0.39   0.07   0.02   0.00   0.00
20         0.82    0.38    0.15   0.01   0.00   0.00   0.00
40         0.67    0.14    0.02   0.00   0.00   0.00   0.00
80         0.45    0.02    0.00   0.00   0.00   0.00   0.00
160        0.20    0.00    0.00   0.00   0.00   0.00   0.00

[Fig. 2: two panels, a) and b), plotting IER (0.00 to 0.30) against the key guesses (0 to 250); the correct subkey 224 is marked in both panels.]
Fig. 2. Attack results using DDLA-SHW and IER. a) 30 epochs; b) 20 epochs.
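The entries of Table I can be reproduced from Equation 4 by considering a single attack (n = 1), in which case the IER reduces to 1/α^KR. The short sketch below does this; the names `ier_single`, `build_table`, `ALPHAS`, and `RANKS` are illustrative assumptions, and values are rounded to two decimals as in the table.

```python
ALPHAS = [1.01, 1.05, 1.1, 1.3, 1.5, 1.7, 1.9]
RANKS = [0, 1, 2, 3, 4, 5, 10, 20, 40, 80, 160]


def ier_single(rank, alpha):
    """Equation 4 with n = 1: contribution of one attack with a zero-based key rank."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return alpha ** (-rank)


def build_table(ranks=RANKS, alphas=ALPHAS):
    """Map each rank to its row of IER values, rounded to two decimals."""
    return {r: [round(ier_single(r, a), 2) for a in alphas] for r in ranks}


if __name__ == "__main__":
    print("Rank  " + " ".join(f"a={a:<5}" for a in ALPHAS))
    for rank, row in build_table().items():
        print(f"{rank:<5} " + " ".join(f"{v:<7.2f}" for v in row))
```

With α = 1.3, the rows for ranks 0 to 5 stay above 0.1 while the row for rank 10 drops to 0.07, which matches the observation that only ranks 0 to 5 contribute significantly for that choice of α.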
Byte   1   2   3    4    5    6    7    8   9   10   11   12   13   14   15  16
Value  23  92  242  153  122  133  131  65  60  119  223  172  126  108  89  216
apply Algorithm 2 on all bytes of the secret key. The outputs for the four bytes that have less than 40% SR are plotted in Fig. 3 and Fig. 4. Interestingly, 12 bytes are revealed correctly, two of which (the 8th and 12th) are recovered from low SR results (36.67% for both). In addition, the other three bytes (2nd, 3rd, and 15th) are potentially distinguished, since the IER values of their correct keys are noticeably higher than those of other candidates. These results highlight the improved performance of the proposed distinguisher, utilizing the IER metric, in enhancing the attack's effectiveness.

[Fig. 3: panels plotting MSE against the number of epochs (0 to 16).]
Fig. 3. Mean square errors (MSE) of the guessed subkey having SR percentage lower than 40%.

[Fig. 4: six panels (Byte 2, Byte 3, Byte 8, Byte 12, Byte 15, Byte 16) plotting IER against the key guess (0 to 250), each marking the final guess and the correct key.]
Fig. 4. The attack results of MOR attack using IER metric on the bytes having low SR percentage (30 attacks, 15 epochs/attack).

In the next step, we performed further experiments using MOR and MOR combined with IER. However, one of the hyper-parameters of the MOR architecture is modified to improve the attack's performance. Our choice is based on the impact [...] [6]. Therefore, the number of epochs is chosen as 20 in this experiment. It is worth noting that the remaining hyper-parameters are kept as in the original MOR architecture. The SR of the attacks on 6 bytes with 20 epochs is displayed in the third row of Table IV. As expected, the SR of these bytes goes up significantly. However, the SR of the 2nd byte is still low (approximately 20%), and the SR of the 16th byte does not change (0%). Next, the IER distinguisher is applied to the outcomes as illustrated in Fig. 5, and three correct bytes are detected (the circled point at the maximum IER value). Consequently, these attacks significantly increase the percentage of correctly revealed bytes (to approximately 93.75%). Notably, the IER metric proves effective in revealing the secret key even in scenarios with low success rates, similar to its performance observed with the ASCAD dataset. These results provide clear evidence of the efficiency of the IER distinguisher in increasing the probability that non-profiled DLSCA attacks reveal the secret key.

V. CONCLUSION

In conclusion, non-profiled DLSCA leveraging deep neural networks has emerged as a formidable threat to the security of cryptographic devices, enabling accurate extraction of sensitive information. These attacks are advantageous over profiled attacks as they do not require prior knowledge of the target device, making them highly versatile. Deep learning algorithms empower attackers to uncover complex relationships between side-channel signals and secret information, bypassing typical side-channel analysis (SCA) countermeasures. However, non-profiled DLSCA encounters challenges when the metric of the correct key is not distinguishable from incorrect ones. This paper has addressed this issue and proposed a novel metric, the inversion of exponential rank (IER), to enhance the performance of non-profiled DLSCA attacks. Experimental results demonstrate the effectiveness of the proposed technique, even in scenarios where the partial success rate percentage is as low as 10% using the ASCAD dataset. Moreover, when applied to the CHES-CTF 2018 data, the percentage of correctly revealed bytes increases significantly, from the initial 62.5% to 93.75%. In our future work, we will further explore the proposed metric by combining it with other deep learning techniques, such as [...]
TABLE IV
ATTACK RESULTS OF MOR AND MOR COMBINED IER ON THE CHES-CTF 2018 DATASET.

                                            Byte
Attack             No. of epochs  Results  1      2     3      4      5    6   7      8      9      10  11  12     13  14     15  16
MOR [6]            15             SR (%)   96.67  3.33  26.67  93.33  100  60  86.67  36.67  73.33  60  70  36.67  70  73.33  10  0
MOR+IER (α = 1.3)  15                      ✓      ✗     ✗      ✓      ✓    ✓   ✓      ✓      ✓      ✓   ✓   ✓      ✓   ✓      ✗   ✗
MOR [6]            20             SR (%)   -      20    53.33  -      -    -   -      90     -      -   -   60     -   -      60  0
MOR+IER (α = 1.3)  20                      ✓      ✓     ✓      ✓      ✓    ✓   ✓      ✓      ✓      ✓   ✓   ✓      ✓   ✓      ✓   ✗

✓: secret key byte successfully revealed
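The percentages quoted for the CHES-CTF 2018 experiments can be cross-checked against Table IV with a few lines of arithmetic. The sketch below is illustrative only: the rule that MOR alone reveals a byte when its SR exceeds 50% is an assumption made here (the criterion is not stated in the text), chosen because it reproduces the reported 62.5%. The variable and function names are hypothetical.

```python
# Values copied from Table IV (16 key bytes, byte 1 first).
MOR_SR_15_EPOCHS = [96.67, 3.33, 26.67, 93.33, 100, 60, 86.67, 36.67,
                    73.33, 60, 70, 36.67, 70, 73.33, 10, 0]
MOR_IER_15_EPOCHS = "✓✗✗✓✓✓✓✓✓✓✓✓✓✓✗✗"
MOR_IER_20_EPOCHS = "✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✗"

# Assumption (not from the paper): a byte counts as revealed by MOR alone
# when more than half of the repeated attacks rank the correct key first.
SR_THRESHOLD = 50.0


def percent(count, total=16):
    """Share of revealed bytes, in percent."""
    return 100.0 * count / total


def revealed_by_sr(sr_values, threshold=SR_THRESHOLD):
    """Number of bytes whose success rate exceeds the assumed threshold."""
    return sum(sr > threshold for sr in sr_values)


def revealed_by_ier(marks):
    """Number of bytes marked as revealed in a MOR+IER row."""
    return marks.count("✓")


if __name__ == "__main__":
    print("MOR, 15 epochs:    ", percent(revealed_by_sr(MOR_SR_15_EPOCHS)), "%")
    print("MOR+IER, 15 epochs:", percent(revealed_by_ier(MOR_IER_15_EPOCHS)), "%")
    print("MOR+IER, 20 epochs:", percent(revealed_by_ier(MOR_IER_20_EPOCHS)), "%")
```

Under that assumption, 10 of 16 bytes are revealed by MOR alone at 15 epochs, 12 of 16 by MOR+IER at 15 epochs, and 15 of 16 by MOR+IER at 20 epochs.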
[Fig. 5: four panels (Byte 3, Byte 15, Byte 16, Byte 2) plotting IER against the key guess (0 to 250), each marking the final guess and the correct key (242, 89, 216, and 92, respectively).]
Fig. 5. The attack results of MOR attack using IER metric on the bytes having low SR percentage (30 attacks, 20 epochs/attack).

[5] V.-P. Hoang, N.-T. Do, and V. S. Doan, “Efficient non-profiled side channel attack using multi-output classification neural network,” IEEE Embedded Systems Letters, pp. 1–1, 2022.
[6] N.-T. Do, V.-P. Hoang, and V. S. Doan, “A novel non-profiled side channel attack based on multi-output regression neural network,” Journal of Cryptographic Engineering, Mar. 2023.
[7] N.-T. Do, V.-P. Hoang, V. S. Doan, and C.-K. Pham, “On the performance of non-profiled side channel attacks based on deep learning techniques,” IET Information Security, vol. 17, no. 3, pp. 377–393, Dec. 2022.
[8] R. Benadjila, E. Prouff, R. Strullu, E. Cagli, and C. Dumas, “Deep learning for side-channel analysis and introduction to ASCAD database,” Journal of Cryptographic Engineering, vol. 10, no. 2, pp. 163–188, Nov. 2019.
[9] A. Gohr, S. Jacob, and W. Schindler, “CHES 2018 side channel contest CTF - solution of the AES challenges,” Cryptology ePrint Archive, Paper 2019/094, 2019. [Online]. Available: https://eprint.iacr.org/2019/094