Machine Learning For Side-Channel Analysis: Stjepan Picek TU Delft, The Netherlands

The document discusses using machine learning techniques for side-channel analysis attacks. It introduces side-channel attacks and profiling attacks like template attacks. Machine learning is well-suited for side-channel analysis as supervised learning models can be used to profile devices. Deep learning techniques like multilayer perceptrons are promising for side-channel analysis due to their ability to approximate complex functions given sufficient data, but require large datasets to achieve valid results. Common challenges in applying machine learning to side-channel attacks are also discussed.

Machine Learning for Side-channel Analysis

Stjepan Picek; TU Delft, The Netherlands

Cryptacus, Rennes, September 20, 2018

Outline

1 Introduction

2 Side-channel Analysis and Machine Learning

3 The Promises of Deep Learning

4 Common Problems

5 Conclusions

Introduction

Intro to Implementation Attacks and SCA

Implementation attacks
Implementation attacks do not target weaknesses of the algorithm
itself, but weaknesses of its implementation.

Side-channel attacks (SCAs) are passive, non-invasive attacks.
SCAs represent one of the most powerful categories of attacks
on crypto devices.
A side channel is something that enables you to know something
about something without directly observing that something.

Profiled Attacks

Profiled attacks have a prominent place as the most powerful
among side-channel attacks.
In the profiling phase, the adversary estimates leakage models
for targeted intermediate computations, which are then
exploited to extract secret information in the actual attack
phase.
The template attack (TA) is the most powerful attack from an
information-theoretic point of view.

Template Attack

Using a copy of the device, record a large number of
measurements using different plaintexts and keys. We require
information about every possible subkey value.
Create a template of the device's operation. A template is a set
of probability distributions that describe what the power traces
look like for many different keys.
On the device that is to be attacked, record a (small) number of
measurements (called attack traces) using different plaintexts.
Apply the template to the attack traces. For each subkey,
record which value is the most likely to be the correct subkey.
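
The four steps above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the setup used in the talk: the target is a hypothetical 4-bit subkey processed through the PRESENT S-box, each "trace" is a single sample equal to the Hamming weight of the S-box output plus Gaussian noise, and each template is a univariate Gaussian. All function names and parameters (`leak`, `profile`, `attack`, `noise_std`, the key `0x9`) are made up for the example.

```python
import math
import random

# 4-bit S-box of the PRESENT cipher, used only as a small stand-in target.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def leak(plaintext, key, rng, noise_std=0.5):
    """Simulated one-sample trace: Hamming weight of the S-box output plus noise."""
    value = SBOX4[plaintext ^ key]
    return bin(value).count("1") + rng.gauss(0.0, noise_std)


def profile(rng, repetitions=8):
    """Profiling phase on the device copy: measure every plaintext/key pair."""
    samples = {value: [] for value in range(16)}
    for _ in range(repetitions):
        for key in range(16):
            for plaintext in range(16):
                samples[SBOX4[plaintext ^ key]].append(leak(plaintext, key, rng))
    templates = {}
    for value, xs in samples.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
        templates[value] = (mean, var)  # one Gaussian per intermediate value
    return templates


def log_likelihood(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def attack(traces, plaintexts, templates):
    """Attack phase: rank all subkey guesses by summed log-likelihood."""
    scores = []
    for guess in range(16):
        scores.append(sum(
            log_likelihood(t, *templates[SBOX4[p ^ guess]])
            for t, p in zip(traces, plaintexts)))
    return sorted(range(16), key=lambda g: scores[g], reverse=True)


rng = random.Random(1)
templates = profile(rng)
secret_key = 0x9  # hypothetical subkey of the attacked device
attack_plaintexts = [rng.randrange(16) for _ in range(300)]
attack_traces = [leak(p, secret_key, rng) for p in attack_plaintexts]
ranking = attack(attack_traces, attack_plaintexts, templates)
print("most likely subkey:", hex(ranking[0]))
```

Real template attacks typically use multivariate Gaussians over several points of interest per trace; the univariate version is chosen here only to keep the sketch short.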

Machine Learning as a Side-channel Attack

The profiled scenario in SCA has clear connections with
supervised machine learning.
Consequently, some machine learning (ML) techniques also
belong to the class of profiled attacks.

Machine Learning
Machine Learning
Machine Learning (ML) is a subfield of computer science that
evolved from the study of pattern recognition and computational
learning theory in artificial intelligence.

Machine Learning
A field of study that gives computers the ability to learn without
being explicitly programmed.

Machine Learning
A computer program is said to learn from experience E with
respect to some task T and some performance measure P, if its
performance on T, as measured with P, improves with experience
E.

Machine Learning

Algorithms extract information from data.
They also learn a model to discover something about the data
in the future.

Dangers of Extrapolating

Machine Learning Types

Supervised learning – the available data also include information
on how to correctly classify at least a part of the data.
Unsupervised learning – the input data do not tell the algorithm
what the clusters should be.
Semi-supervised learning – information on how to correctly
classify is available for only a small part of the data.
Reinforcement learning – take actions based on current
knowledge of the environment and receive feedback in the
form of rewards.

Machine Learning Setting

Side-channel Analysis and Machine Learning

Machine learning techniques also represent an extremely
powerful paradigm in side-channel analysis.
We can use machine learning in SCA for classification,
clustering, feature engineering, and preprocessing.

Supervised Learning

Supervised learning – the available data include information on
how to correctly classify at least a part of the data.
Common tasks are classification and regression.

SCA with ML in a Usual Way

As an example, we consider an AES software implementation.
The operation that leaks the most is the S-box computation.
Labels: the Hamming weight model gives 9 classes, the S-box
output gives 256 classes.
The first step is to select the machine learning algorithms to
use, preferably algorithms belonging to different machine
learning families.
Conduct a proper algorithm tuning phase.
Divide the data into training and testing data.
Conduct (or skip) the tuning phase.
After the tuning phase is done, test/verify the algorithm.
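
The labelling and data-splitting steps above can be sketched as follows. This is an illustration under assumptions, not code from the talk: the AES S-box is generated from its algebraic definition (inverse in GF(2^8) followed by the affine map) to keep the block self-contained, the key byte `0x2B` is a hypothetical known profiling key, and the helper names (`label_sbox`, `label_hw`, `train_test_split`) are invented for the example. Traces are omitted; only plaintext bytes and their labels are produced.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_aes_sbox():
    """Generate the AES S-box: multiplicative inverse, then the affine map."""
    def rotl8(x, shift):
        return ((x << shift) | (x >> (8 - shift))) & 0xFF

    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def label_sbox(plaintext_byte, key_byte):
    """256-class label: the S-box output itself."""
    return SBOX[plaintext_byte ^ key_byte]


def label_hw(plaintext_byte, key_byte):
    """9-class label: Hamming weight of the S-box output."""
    return bin(label_sbox(plaintext_byte, key_byte)).count("1")


def train_test_split(items, test_fraction, rng):
    """Shuffle a copy of the data and cut it into training and testing parts."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    test_count = round(len(shuffled) * test_fraction)
    return shuffled[:len(shuffled) - test_count], shuffled[len(shuffled) - test_count:]


rng = random.Random(0)
key_byte = 0x2B  # hypothetical known key byte of the profiling device
plaintexts = [rng.randrange(256) for _ in range(1000)]
dataset = [(p, label_hw(p, key_byte)) for p in plaintexts]
train, test = train_test_split(dataset, 0.2, rng)
print(len(train), "training items,", len(test), "testing items")
```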

The Promises of Deep Learning

Deep Learning
Stacked neural networks, i.e., networks consisting of multiple
layers.
Layers are made of nodes.

Multilayer Perceptron
One input layer, one output layer, at least one hidden layer.

Universal Approximation Theorem

A feed-forward network with a single hidden layer containing a
finite number of neurons can approximate continuous
functions on compact subsets of $\mathbb{R}^n$.
Given enough hidden units and enough data, multilayer
perceptrons can approximate virtually any function to any
desired accuracy.
The results are valid if and only if the number of training
examples is sufficiently large.
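
A small numerical illustration of the statement above, under assumptions of my own: the activation is ReLU (the slide does not name one), the target is sin on [0, 2π], and the weights are set by an explicit piecewise-linear construction rather than learned by training. The point is only that the approximation error of a single hidden layer shrinks as hidden units are added; the names `build_relu_network`, `forward`, and `max_error` are hypothetical.

```python
import math


def relu(z):
    return z if z > 0.0 else 0.0


def build_relu_network(f, a, b, hidden_units):
    """Single-hidden-layer ReLU network that interpolates f on a uniform grid."""
    step = (b - a) / hidden_units
    knots = [a + k * step for k in range(hidden_units)]       # hidden biases
    values = [f(a + k * step) for k in range(hidden_units + 1)]
    slopes = [(values[k + 1] - values[k]) / step for k in range(hidden_units)]
    # The output weight of unit k is the change of slope at knot k.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1]
                             for k in range(1, hidden_units)]
    return values[0], knots, weights


def forward(x, network):
    """Network output: bias + sum_k w_k * relu(x - knot_k)."""
    bias, knots, weights = network
    return bias + sum(w * relu(x - t) for w, t in zip(weights, knots))


def max_error(f, network, a, b, grid=2000):
    """Largest absolute error on a dense evaluation grid."""
    points = (a + (b - a) * i / grid for i in range(grid + 1))
    return max(abs(f(x) - forward(x, network)) for x in points)


a, b = 0.0, 2.0 * math.pi
errors = {}
for units in (4, 16, 64):
    network = build_relu_network(math.sin, a, b, units)
    errors[units] = max_error(math.sin, network, a, b)
    print(units, "hidden units -> max error", errors[units])
```

In practice the weights are learned from data, which is where the requirement for a sufficiently large training set comes in; the construction here sidesteps training entirely.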

No Free Lunch Theorem

It proves that there exists no single model that works best for
every problem.
To find the best model for a certain problem, numerous
algorithms and parameter combinations should be tested.
Even then, we cannot be sure that we have found the best model,
but at least we should be able to estimate the possible
trade-offs between the speed, accuracy, and complexity of the
obtained models.

Machine Learning Taxonomy

Figure: Taken from trymachinelearning.com

Deep Learning

By adding more hidden layers to a multilayer perceptron, we
arrive at deep learning.
Some definitions say that anything with more than one hidden
layer is deep learning.
A field that has existed for a number of years but one that
gained a lot of attention in the last decade.
Sets of algorithms that attempt to model high-level
abstractions in data by using model architectures with
multiple processing layers, composed of a sequence of scalar
products and non-linear transformations.
In many tasks, deep learning is actually not necessary since
other machine learning techniques perform well.

Deep Learning

In some tasks, deep learning outperforms other machine learning
techniques as well as human experts.
Deep learning algorithms scale with data while shallow
learning converges.
Shallow learning refers to machine learning methods that
plateau at a certain level of performance as more training
examples are added.
Often, machine learning works from engineered features while
deep learning works from raw features.

Deep Learning and Side-channel Analysis

Can deep learning solve all problems in SCA?
Convolutional Neural Networks are the best (for now).
Deep learning in SCA is not really deep (but getting deeper).
When to use deep learning and when other machine learning
techniques?
Even when deep learning works better than other machine
learning techniques, we do not really know why.

Figure: CNN architecture: input → conv1 → pool1 → conv2 → pool2 → conv3 → pool3 → conv4 → pool4 → flatten → out. Diagram labels: 16, 32, 64, 128 (each annotated with (4)) and 9.

More Complex Architectures

Common Problems

Everything is empirical. So, what does knowledge about one
scenario tell us about other scenarios?
Datasets. Only a few are publicly available, and they are not
the most representative examples. Issues with data selection.
Only a few machine learning techniques are really tested.
Most papers concentrate on the classification part but neglect
other parts of the process.
Published results are not reproducible.
The connection between machine learning metrics and
side-channel analysis metrics is not clear.

Imbalanced Data

Imbalanced data occurs with the widely used Hamming weight and
Hamming distance models.
If the dataset is difficult to classify, ML techniques will have
trouble classifying data into anything other than the most
represented class.
Accuracy as a metric can indicate good performance, but SCA
metrics will show (really) bad results.

Table: Class taxonomy (number of 8-bit values per Hamming weight)

HW value      0   1   2   3   4   5   6   7   8
Occurrences   1   8  28  56  70  56  28   8   1
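
The occurrence counts in the table are the binomial coefficients C(8, k): the number of 8-bit values with exactly k bits set. A short check, which also shows the accuracy a classifier would reach on uniformly distributed intermediate values by always predicting the majority class (variable names are illustrative only):

```python
from math import comb

# Number of byte values with Hamming weight k, for k = 0..8.
occurrences = [comb(8, k) for k in range(9)]
total = sum(occurrences)

# A classifier that always answers "HW = 4" is right this often
# when the intermediate values are uniformly distributed.
majority_accuracy = max(occurrences) / total

print(occurrences)
print("majority-class accuracy:", majority_accuracy)
```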

Imbalanced Data

Confusion matrix (rows: actual class; columns: predicted class)

Actual    0    1    2    3     4    5    6    7    8
0         0    0    0    0    99    0    0    0    0
1         0    0    0    0   727    0    0    0    0
2         0    0    0    0  2767    0    0    0    0
3         0    0    0    0  5481    0    0    0    0
4         0    0    0    0  6815    0    0    0    0
5         0    0    0    0  5422    0    0    0    0
6         0    0    0    0  2777    0    0    0    0
7         0    0    0    0   809    0    0    0    0
8         0    0    0    0   103    0    0    0    0
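
The numbers in this confusion matrix can be turned into metrics with a few lines. The sketch below rebuilds the matrix from its only non-zero column and computes overall accuracy and per-class recall; the mean of the per-class recalls (often called balanced accuracy) is added here as an illustration and is not a metric named on the slides.

```python
# Number of test traces per actual Hamming-weight class (rows of the matrix).
support = [99, 727, 2767, 5481, 6815, 5422, 2777, 809, 103]

# Every trace was predicted as class 4, so only column 4 is non-zero.
confusion = [[count if predicted == 4 else 0 for predicted in range(9)]
             for count in support]

total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[c][c] for c in range(9)) / total
recall = [confusion[c][c] / sum(confusion[c]) for c in range(9)]
mean_recall = sum(recall) / len(recall)

print("traces:", total)
print("accuracy:", accuracy)
print("per-class recall:", recall)
print("mean recall:", mean_recall)
```

The accuracy looks tolerable for a 9-class problem, yet eight of the nine classes are never predicted, which is why accuracy alone says little about how well a key can be recovered.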

What is There from Theory

When using machine learning, we conduct experimental work.
Can we show any theoretical result?
That is hard but (hopefully) possible.
So far, nothing has been done.
PAC learning.
Bias–variance decomposition.
Vapnik–Chervonenkis theory.
Coresets.

Accuracy as a Measure of Performance

We discuss accuracy as a metric, but many models do not
use it in their loss functions since it is not easy to optimize.
The main learning problem can be defined as follows:

$$\min_{\theta} \sum_{i} L\bigl(y_i, f(X_i; \theta)\bigr), \tag{1}$$

where $L$ is the objective function we minimize,
$y_i \in \{c_0, c_1, \ldots, c_C\}$ is the $i$th ground-truth label
drawn from the dataset, $X_i \in \mathbb{R}^d$ is the $i$th trace
corresponding to $y_i$, $d$ is the length of the trace, and $f$ is
any learnable model parameterized by $\theta$.

Accuracy as a Measure of Performance

If the empirical error function is not differentiable, it is not
easy to use it with a deep neural network since the
backpropagation algorithm requires a differentiable system.
We use categorical cross-entropy for the multi-class
classification problem as a differentiable approximation of the
accuracy function.
How do we connect such machine learning metrics with SCA
metrics like success rate and guessing entropy?
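
To make the difference between the two quantities concrete, the sketch below evaluates accuracy and categorical cross-entropy on two hand-made sets of predicted class probabilities. The probabilities and the names `model_a` and `model_b` are invented for illustration; both "models" classify the same examples correctly, so accuracy cannot tell them apart, while cross-entropy can.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of examples whose most probable class equals the label."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += predicted == label
    return hits / len(labels)


def categorical_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    losses = [-math.log(max(probs[label], eps))
              for probs, label in zip(probabilities, labels)]
    return sum(losses) / len(losses)


labels = [0, 1, 2]

# Hypothetical outputs of two models on the same three traces.
model_a = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.40, 0.35, 0.25]]
model_b = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.40, 0.35, 0.25]]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, accuracy(probs, labels), categorical_cross_entropy(probs, labels))
```

Accuracy only looks at the top-ranked class, whereas cross-entropy depends on the probability assigned to the true class.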

Curse of Dimensionality

Describes the effects of the exponential increase in volume
associated with an increase in the number of dimensions.
As the dimensionality of the problem increases, the classifier's
performance increases until the optimal feature subset is
reached.
Further increasing the dimensionality without increasing the
number of training samples results in a decrease in the
classifier performance.
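
The exponential growth in volume can be illustrated by counting cells: if each feature is quantized into a fixed number of bins, the number of cells is bins^d, so a fixed number of traces covers the space ever more sparsely. The bin count of 10 and the 10,000 traces below are arbitrary values chosen for the example.

```python
def cells(bins_per_dimension, dimensions):
    """Number of cells in a grid with the given resolution per dimension."""
    return bins_per_dimension ** dimensions


def average_samples_per_cell(num_samples, bins_per_dimension, dimensions):
    """How many samples fall into a cell on average."""
    return num_samples / cells(bins_per_dimension, dimensions)


num_traces = 10_000
densities = []
for d in (1, 2, 3, 5, 10):
    density = average_samples_per_cell(num_traces, 10, d)
    densities.append(density)
    print(f"d={d:2d}  cells={cells(10, d):>12d}  traces per cell={density:g}")
```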

Hyper-parameter Tuning

Once the machine learning algorithm is selected and the data are
prepared, we need to tune the algorithm.
Almost every machine learning algorithm has some parameters
to tune.
Results will (significantly) differ depending on the tuning phase.
The more parameters, the larger the search space.
For simpler algorithms (i.e., those with fewer parameters), grid
search is possible.
More complicated techniques involve Bayesian optimization or
evolutionary algorithms, to name a few.
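
A grid search simply evaluates every combination of candidate hyper-parameter values and keeps the best one. The sketch below is a toy under assumptions of my own: the classifier is a plain k-nearest-neighbours vote, the data are two synthetic Gaussian blobs standing in for labelled traces, and the tuned hyper-parameters are `k` and the Minkowski exponent `p`. None of this comes from the talk.

```python
import itertools
import random


def make_data(n, rng):
    """Two noisy 2-D Gaussian blobs; a synthetic stand-in for labelled traces."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 0.0 if label == 0 else 1.5
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def knn_predict(train, x, k, p):
    """Majority vote among the k nearest training points (Minkowski-p distance)."""
    def dist(a, b):
        # The p-th root is omitted because it does not change the ordering.
        return sum(abs(u - v) ** p for u, v in zip(a, b))

    neighbours = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > k else 0


def validation_accuracy(train, valid, k, p):
    return sum(knn_predict(train, x, k, p) == y for x, y in valid) / len(valid)


def grid_search(train, valid, grid):
    """Evaluate every (k, p) combination and return the best one."""
    results = {}
    for k, p in itertools.product(grid["k"], grid["p"]):
        results[(k, p)] = validation_accuracy(train, valid, k, p)
    best = max(results, key=results.get)
    return best, results


rng = random.Random(42)
train, valid = make_data(200, rng), make_data(100, rng)
grid = {"k": [1, 3, 5, 7, 9], "p": [1, 2]}  # odd k avoids tied votes
best, results = grid_search(train, valid, grid)
print("best (k, p):", best, "validation accuracy:", results[best])
```

The grid has 5 × 2 = 10 points; each extra hyper-parameter multiplies that count, which is why larger search spaces call for the other techniques listed above. The final model should still be checked on a test set that played no part in the tuning.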

Underfitting and Overfitting

Overfitting – if a model is too complex for the problem, then
it can learn the detail and noise in the training data, which
negatively impacts the performance of the model on new data
→ a model that models the training data too well.
Underfitting – if a model is too simple for the problem, then it
cannot generalize to new data.
Simple model → high bias.
Complex model → high variance.

Bias and Variance

Figure: Bias and variance (http://scott.fortmann-roe.com/docs/BiasVariance.html)

Conclusions

SCA is a very active research domain.
A large part of the community is actually interested in profiling
attacks and machine learning.
What is interesting and what is realistic are not the same.
We are able to do bits and pieces here and there, but we are
still missing the most important answers (and questions).
We can use a lot of knowledge from other domains, but we
need to recognize what and where.

Questions?

Thanks for your attention! Q?
