04 - Machine Learning for Embedded and Edge AI
The basics of learning
A “toy” example
[Figure: data points labeled “Pass” and “NO Pass”]
Data-processing and applications
[Figure: a data-generating process P feeds an application; a model of the system sits between them]
Learning the system model
[Figure: the data-generating process P produces pairs (x, y), from which a model is estimated for the application]
Supervised Learning: Statistical framework
[Figure: the two canonical supervised tasks, regression and classification]
Statistical Learning: the approach
Designing a classifier
Consider a two-dimensional problem
[Figure: labeled points in the (Feature 1, Feature 2) plane]
Designing a classifier requires identifying the function that separates the labeled points.
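To make this concrete, here is a minimal sketch (an illustration, not code from the deck) that fits a linear separating function to synthetic two-dimensional labeled points; all names and data are invented:

```python
# Fit a linear separating function to synthetic 2-D labeled points.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
class0 = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(50, 2))
class1 = rng.normal(loc=[+1.0, +1.0], scale=0.5, size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]
# Points on one side of w1*f1 + w2*f2 + b = 0 are assigned one class.
print(f"separating line: {w1:.2f}*f1 + {w2:.2f}*f2 + {b:.2f} = 0")
```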
Some issues we need to focus on
Non-linear regression
[Figure: sample pairs (xi, yi) scattered around a non-linear curve in x]
Non-linear regression: statistical framework
Inherent, approximation and estimation risks
Approximation and estimation risks
[Figure: the target space and the model space; the selected model and the best reachable model, e.g., the best reachable linear model within a “cloud” of linear models]
Approximation and estimation risks
[Figure: the optimal model lies in the target space; the approximation error separates it from the best reachable model in the model space, and the estimation error separates the best reachable model from the selected model]
What about Neural Networks?
Modelling space and time
[Figure: complete neuron cell diagram, from Wikimedia Commons: http://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Complete_neuron_cell_diagram_en.svg/481px-Complete_neuron_cell_diagram_en.svg.png]
Neural computation
Neural computation
Activation functions:
• Heaviside
• Sigmoidal
• Linear
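A minimal sketch of these three activation functions (an illustration, not code from the deck):

```python
import numpy as np

def heaviside(a):
    # Step activation: 0 below the threshold, 1 at or above it.
    return np.where(a >= 0.0, 1.0, 0.0)

def sigmoid(a):
    # Smooth S-shaped activation, bounded in (0, 1).
    return 1.0 / (1.0 + np.exp(-a))

def linear(a):
    # Identity activation, typical for regression outputs.
    return a
```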
Multi-layer Neural Networks
[Figure: a multi-layer network mapping the inputs xi through stacked hidden layers]
Why Neural Networks?
Universal approximation theorem
A feed-forward network with a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden neurons.
Approximation and estimation risks
[Figure: as before, the optimal model, the approximation error to the best reachable model, and the estimation error to the selected model]
Approximation and estimation risks
[Figure: with a sufficiently rich NN model space the approximation error vanishes (Approx. Error = 0); only the estimation error between the optimal model and the selected model remains]
How “good” is my good ML solution?
Two examples: how good is my good solution?
Two confusion matrices (rows: Output Class, columns: Target Class; each cell percentage is the fraction of all samples; the last column is the per-row precision):

Example 1 (150 samples):
           Target 1     Target 2     Target 3     Precision
Output 1   50 (33.3%)    0 (0.0%)     0 (0.0%)    100%
Output 2    0 (0.0%)    47 (31.3%)    0 (0.0%)    100%
Output 3    0 (0.0%)     3 (2.0%)    50 (33.3%)   94.3%

Example 2 (75 samples):
           Target 1     Target 2     Target 3     Precision
Output 1   23 (30.7%)    1 (1.3%)     0 (0.0%)    95.8%
Output 2    0 (0.0%)    25 (33.3%)    2 (2.7%)    92.6%
Output 3    0 (0.0%)     1 (1.3%)    23 (30.7%)   95.8%
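Confusion matrices like these can be computed directly; a sketch with scikit-learn and hypothetical label arrays (note that scikit-learn places target classes on the rows and output classes on the columns, the transpose of the layout above):

```python
# Sketch: computing a confusion matrix; the labels are hypothetical.
import numpy as np
from sklearn.metrics import confusion_matrix

y_target = np.array([1, 1, 2, 2, 3, 3, 3])  # true classes
y_output = np.array([1, 1, 2, 2, 3, 2, 3])  # classifier outputs

cm = confusion_matrix(y_target, y_output)
print(cm)             # rows: target class, columns: output class
print(cm / cm.sum())  # per-cell fraction of all samples
```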
Assessing the performance
▪ Apparent Error Rate (AER), or resubstitution: the whole set Z_N is used both to infer the model and to estimate its error.
▪ Sample Partitioning (SP): S_D and S_E are obtained by randomly splitting Z_N into two disjoint subsets. S_D is used to estimate the model and S_E to estimate its accuracy.
▪ Leave-One-Out (LOO): S_E contains a single pattern of Z_N, and S_D contains the remaining N − 1 patterns. The procedure is iterated N times, holding out each pattern of Z_N in turn, and the resulting N estimates are averaged.
▪ w-fold Cross-Validation (wCV): Z_N is randomly split into w disjoint subsets of equal size. For each subset, the remaining w − 1 subsets are merged to form S_D and the reserved subset is used as S_E. The w estimates are averaged (see the sketch below).
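As a concrete sketch of the last scheme, w-fold cross-validation can be written with scikit-learn's KFold; the model and dataset below are placeholders, not the course example:

```python
# Placeholder data Z_N and model; only the wCV mechanics matter here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)  # Z_N
w = 5
errors = []
for train_idx, test_idx in KFold(n_splits=w, shuffle=True,
                                 random_state=0).split(X):
    # The merged w-1 folds form S_D; the reserved fold is S_E.
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    errors.append(1.0 - model.score(X[test_idx], y[test_idx]))
print(f"{w}-fold CV error estimate: {np.mean(errors):.3f}")
```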
Memory demand of NNs
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
[Figure: one input node, four hidden neurons, one output neuron; each hidden neuron and the output neuron has a bias input]
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
[Figure: the same network, highlighting the first-layer weights w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4 on the input-to-hidden connections]
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
[Figure: the same network, highlighting the hidden-neuron biases b(1)1, b(1)2, b(1)3, b(1)4]
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
[Figure: the same network, highlighting the second-layer weights w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1 on the hidden-to-output connections]
Parameters so far: [w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1]
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
[Figure: the same network, highlighting the output bias b(2)1]
All parameters: [w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1] + [b(2)1]
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
4 hidden neurons, 1 input
[Figure: the full network annotated with all first-layer weights, hidden biases, second-layer weights, and the output bias]
[w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1] + [b(2)1]
Sizes: 1×4 + 4 + 4×1 + 1 = 13 parameters
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
4 hidden neurons, 1 input: the same network seen as two layers
[Figure: Layer 1 maps the input to the 4 hidden neurons; Layer 2 maps the hidden neurons to the output]
[w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1] + [b(2)1]
Sizes: 1×4 + 4 + 4×1 + 1 = 13 parameters
Let’s start with a simple 1-hidden layer FFNN: 1 input, 16 hidden, 1 output
16 hidden neurons, 1 input
[Figure: Layer 1 maps the input to 16 hidden neurons; Layer 2 maps them to the output]
[w(1)1,1, w(1)1,2, …, w(1)1,16] + [b(1)1, b(1)2, …, b(1)16] + [w(2)1,1, w(2)2,1, …, w(2)16,1] + [b(2)1]
Sizes: 1×16 + 16 + 16×1 + 1 = 49 parameters
Generalizing the memory demand in multi-input/multi-output NNs
k hidden neurons, h inputs, j outputs
[Figure: Layer 1 maps the h inputs to the k hidden neurons; Layer 2 maps them to the j outputs]
[w(1)1,1, w(1)1,2, …, w(1)h,k] + [b(1)1, …, b(1)k] + [w(2)1,1, w(2)1,2, …, w(2)k,j] + [b(2)1, …, b(2)j]
Sizes: h×k + k + k×j + j parameters
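The formula can be checked mechanically against a framework's own count; a sketch using Keras, with the sizes of the 1-4-1 example above:

```python
# Sketch: the h*k + k + k*j + j count matches Keras's count_params().
from tensorflow import keras

def ffnn_params(h, k, j):
    # Layer 1: h*k weights + k biases; Layer 2: k*j weights + j biases.
    return h * k + k + k * j + j

model = keras.Sequential([
    keras.Input(shape=(1,)),   # h = 1 input
    keras.layers.Dense(4),     # k = 4 hidden neurons
    keras.layers.Dense(1),     # j = 1 output
])
assert model.count_params() == ffnn_params(1, 4, 1) == 13
```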
What about multiple hidden layers? Let’s see with 2 hidden layers
What about multiple hidden layers? Let’s see a FFNN with 2 hidden layers
k1 and k2 hidden neurons, h inputs, j outputs
[Figure: Layer 1 maps the h inputs to k1 hidden neurons, Layer 2 maps them to k2 hidden neurons, Layer 3 maps these to the j outputs]
Sizes: h×k1 + k1 + k1×k2 + k2 + k2×j + j parameters
16 and 16 hidden neurons, 1 input, 1 output
[Figure: Layer 1 (1 input → 16 hidden), Layer 2 (16 → 16 hidden), Layer 3 (16 hidden → 1 output)]
Sizes: 1×16 + 16 + 16×16 + 16 + 16×1 + 1
Parameters of the first layer = 32, of the second layer = 256 + 16 = 272, of the third layer = 17; total = 321
An example with TensorFlow
An example of neural network training in TensorFlow
• A task to solve
• Some (generated) data
• The libraries that we need, imported in your environment
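The code itself is not reproduced in this text export; a plausible set of imports for such an example, assuming NumPy, Matplotlib, and TensorFlow/Keras:

```python
# Assumed imports; the original slide code is not in this export.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
```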
Task and data
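The exact task is not recoverable from this export. As a placeholder assumption, a common 1-input/1-output regression demo is fitting a sine curve; this sketch continues from the imports above:

```python
# Assumed task: learn y = sin(x) from scattered samples.
SAMPLES = 1000
rng = np.random.default_rng(1337)
x = rng.uniform(0.0, 2.0 * np.pi, SAMPLES)  # random inputs in [0, 2*pi]
y = np.sin(x)                               # noiseless target, for now
```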
Add some noise
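Continuing the sketch, Gaussian noise makes the mapping stochastic, as in the statistical framework seen earlier:

```python
# Perturb the targets so the data no longer lie exactly on the curve.
y += 0.1 * rng.normal(size=SAMPLES)
```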
Train – Validation – Test split
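A plausible 60/20/20 split (the slide's exact proportions are an assumption):

```python
# Split into 60% training, 20% validation, 20% test.
TRAIN_SPLIT = int(0.6 * SAMPLES)
VAL_SPLIT = int(0.8 * SAMPLES)
x_train, x_val, x_test = np.split(x, [TRAIN_SPLIT, VAL_SPLIT])
y_train, y_val, y_test = np.split(y, [TRAIN_SPLIT, VAL_SPLIT])
```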
Define the network
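A 1-16-1 Keras model consistent with the memory-demand recap that follows; the ReLU activation is an assumption:

```python
# One hidden layer with 16 neurons, one linear output neuron.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
```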
Define the network - 2
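A guess at this second step: compiling the model with a regression loss and inspecting it, which also prints the 49-parameter count:

```python
# MSE loss for regression; MAE tracked as a more interpretable metric.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.summary()   # shows 1x16 + 16 + 16x1 + 1 = 49 parameters
```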
A recap of the memory demand of a 1-16-1 NN
16 hidden neurons, 1 input (already seen)
[Figure: Layer 1 maps the input to 16 hidden neurons; Layer 2 maps them to the output]
[w(1)1,1, w(1)1,2, …, w(1)1,16] + [b(1)1, b(1)2, …, b(1)16] + [w(2)1,1, w(2)2,1, …, w(2)16,1] + [b(2)1]
Sizes: 1×16 + 16 + 16×1 + 1 = 49 parameters
Training the network
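A sketch of the training call; the epoch count and batch size are assumptions:

```python
# Fit on the training set, monitoring the validation set at each epoch.
history = model.fit(
    x_train, y_train,
    epochs=500,
    batch_size=16,
    validation_data=(x_val, y_val),
)
```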
Evaluating the model – training and validation loss and MAE
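A sketch of plotting the training and validation curves from the returned history:

```python
# Loss (MSE) and MAE over the epochs, for training and validation sets.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["loss"], label="training loss")
ax1.plot(history.history["val_loss"], label="validation loss")
ax1.set_xlabel("epoch")
ax1.legend()
ax2.plot(history.history["mae"], label="training MAE")
ax2.plot(history.history["val_mae"], label="validation MAE")
ax2.set_xlabel("epoch")
ax2.legend()
plt.show()
```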
The model working in practice (on the training set)
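A sketch of overlaying the model's predictions on the training data:

```python
# Predict on the training inputs and compare with the noisy targets.
y_fit = model.predict(x_train)
plt.scatter(x_train, y_train, s=8, label="training data")
plt.scatter(x_train, y_fit, s=8, color="red", label="model output")
plt.legend()
plt.show()
```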
Define a more complex network
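A 1-16-16-1 model matching the 321-parameter recap on the next slide; the training settings mirror the simpler model:

```python
# Two hidden layers of 16 ReLU neurons each, one linear output.
model2 = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model2.compile(optimizer="adam", loss="mse", metrics=["mae"])
history2 = model2.fit(x_train, y_train, epochs=500, batch_size=16,
                      validation_data=(x_val, y_val))
```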
A recap of the memory demand of a 1-16-16-1 NN
[Figure: Layer 1 (1 input → 16 hidden), Layer 2 (16 → 16 hidden), Layer 3 (16 hidden → 1 output)]
Sizes: 1×16 + 16 + 16×16 + 16 + 16×1 + 1
Parameters of the first layer = 32, of the second layer = 272, of the third layer = 17; total = 321
Evaluating the model – training and validation loss and MAE
Test the model on the test set
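A sketch of the final evaluation on the held-out test set:

```python
# Report loss and MAE on data never seen during training or validation,
# and plot the predictions against the test targets.
test_loss, test_mae = model2.evaluate(x_test, y_test)
y_test_pred = model2.predict(x_test)
plt.scatter(x_test, y_test, s=8, label="test data")
plt.scatter(x_test, y_test_pred, s=8, color="red", label="model output")
plt.legend()
plt.show()
```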
ML is not just Neural Networks
Several families of machine learning algorithms