04 - Machine Learning for Embedded and Edge AI

The document outlines a lecture on Machine Learning for Embedded and Edge AI, covering topics such as technological platforms, algorithms, and the role of cloud computing in AI. It discusses various machine learning techniques, including supervised learning, neural networks, and performance assessment methods. Additionally, it highlights the importance of understanding approximation and estimation risks in model training and evaluation.

Technologies for Artificial Intelligence

Prof. Manuel Roveri – [email protected]

Lecture 4 – Machine Learning for Embedded and Edge AI


Course Topics

1. Introduction to technological platforms for AI


2. Embedded and Edge AI
a. The technology
b. The Algorithms
c. Machine Learning for Embedded and Edge AI
d. Deep Learning for Embedded and Edge AI
3. Cloud computing and AI
a. Cloud computing and the "as-a-service" approach
b. Machine and Deep Learning as a service
c. Time-series: analysis and prediction
d. Generative AI

2
The basics of learning

3
A "toy" example

A physical model? A very tough classification problem, and I did not complete my PhD in Physics yet. A data-driven approach might be a good solution (brute-force as well).

(Figure: labeled examples, each marked "Pass" or "NO Pass".)

What is the learning goal here?

4
Data-processing and applications

(Diagram: the data-generating process P produces data; a model of the system is built from it and used by the application.)

5
Learning the system model

(Diagram: the data-generating process P produces couples (x, y), from which we estimate a model of the system used by the application.)

6
Supervised Learning: Statistical framework
Regression Classification

7
Statistical Learning: the approach

The training set

8
Designing a classifier
Consider a bidimensional problem

(Scatter plot of labeled points in the Feature 1 / Feature 2 plane.)

Designing a classifier requires identifying the function that separates the labeled points.

9
Some issues we need to focus on

• Linear vs. non-linear
• The number of points needed versus the points actually available
• Several techniques are available to design the classifier (KNN, feedforward NN, SVM…)

10
Non-linear regression

(Plot: noisy samples (xi, yi) scattered around an unknown function of x.)

Given a set of n noise-affected couples (xi, yi), we wish to reconstruct the unknown function.
11
Non-linear regression: statistical framework

The time-invariant process generating the data provides, given an input instance xi, the corresponding output yi.

We collect a set of couples (the training set ZN) and wish to model the unknown process with a parameterized family of models.

The goal of learning is to build the simplest approximating model able to explain past data ZN and future instances provided by the data-generating process.

12
Inherent, approximation and estimation risks

Total risk = estimation risk + approximation risk + inherent risk

• The inherent risk depends only on the structure of the learning problem and, for this reason, can be reduced only by improving the problem itself
• The approximation risk depends on how close the model family (also named the hypothesis space) is to the process generating the data
• The estimation risk depends on the ability of the learning algorithm to select a parameter vector close to the best one available within the model family

13
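A compact way to write this decomposition (a sketch; the notation below is assumed here: R(·) is the expected risk, θ^ the parameter vector selected by the learning algorithm, θ° the best parameter vector within the model family, and R* the inherent risk):

```latex
R(\hat{\theta})
= \underbrace{\bigl(R(\hat{\theta}) - R(\theta^{\circ})\bigr)}_{\text{estimation risk}}
+ \underbrace{\bigl(R(\theta^{\circ}) - R^{*}\bigr)}_{\text{approximation risk}}
+ \underbrace{R^{*}}_{\text{inherent risk}}
```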
Approximation and estimation risks

(Diagram: the target space contains the optimal model, i.e. the model with zero loss. The model space, here a cloud of linear models, contains the best reachable model, the best reachable linear model, and the selected model.)

14
Approximation and estimation risks

(Diagram: the approximation error separates the optimal model in the target space from the best reachable model in the model space; the estimation error separates the best reachable model from the selected model.)

15
What about Neural Networks?

16
Modelling space and time

http://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Complete_neuron_cell_diagram_en.svg/481px-Complete_neuron_cell_diagram_en.svg.png

The model encompasses the concepts of space, time, and the status of the neuron.

17
Neural computation

The scalar product evaluates the affinity between the input values and the weights.

18
Neural computation

Activation function
• Heaviside
• Sigmoidal
• Linear

19
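A minimal sketch of these two slides in Python (the names and example values are mine, not from the lecture): the neuron computes the scalar product of inputs and weights, adds a bias, and passes the result through one of the listed activation functions.

```python
import numpy as np

def neuron_output(x, w, b, activation="sigmoid"):
    """Single neuron: scalar product of inputs and weights plus bias,
    followed by a Heaviside, sigmoidal or linear activation."""
    a = np.dot(w, x) + b              # the scalar product measures the affinity of x and w
    if activation == "heaviside":
        return 1.0 if a >= 0 else 0.0
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-a))
    return a                          # linear activation

# Example: two inputs with arbitrary weights and bias
print(neuron_output(np.array([0.5, -1.0]), np.array([0.8, 0.3]), b=0.1))
```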
Multi-layer Neural Networks

(Diagram: a feedforward network with input xi, one or more hidden layers, and an output layer.)

20
Why Neural Networks?

21
Universal approximation theorem

A feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function defined on compact subsets of R^n to arbitrary accuracy.

K. Hornik, "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, Vol. 4, No. 2, pp. 251–257, 1991

22
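A slightly more formal statement (a sketch of the standard single-hidden-layer formulation; the symbols below are assumptions of this write-up, not taken from the slide): for every continuous f on a compact set K ⊂ R^n, every ε > 0, and a suitable (e.g. sigmoidal) activation σ, there exist a width N, weights w_i, v_i and biases b_i such that

```latex
\left|\, f(x) \;-\; \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \right| \;<\; \varepsilon
\qquad \text{for all } x \in K .
```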
Approximation and estimation risks

(Diagram, seen before: the approximation error separates the optimal model in the target space from the best reachable model in the model space; the estimation error separates the best reachable model from the selected model.)

23
Approximation and estimation risks

(Diagram: with a sufficiently rich NN model space the approximation error is zero; only the estimation error separates the selected model from the optimal model in the target space.)

24
How “good” is my good ML solution?

25
Two examples: how good is my good solution?

Confusion Matrix (Apparent Error Rate)

Output \ Target      1             2             3             Precision
1                    50 (33.3%)    0 (0.0%)      0 (0.0%)      100%
2                    0 (0.0%)      47 (31.3%)    0 (0.0%)      100%
3                    0 (0.0%)      3 (2.0%)      50 (33.3%)    94.3%
Recall               100%          94.0%         100%          Overall accuracy: 98.0%

Confusion Matrix (Sample Partitioning)

Output \ Target      1             2             3             Precision
1                    23 (30.7%)    1 (1.3%)      0 (0.0%)      95.8%
2                    0 (0.0%)      25 (33.3%)    2 (2.7%)      92.6%
3                    0 (0.0%)      1 (1.3%)      23 (30.7%)    95.8%
Recall               100%          92.6%         92.0%         Overall accuracy: 94.7%

(Each cell reports the count and its share of the whole dataset.)

26
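A small sketch of how such a confusion matrix can be computed (plain NumPy, with rows indexed by the output class and columns by the target class, as in the tables above; the toy labels are made up for illustration):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[p, t] counts the samples of target class t predicted as class p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[p, t] += 1
    return cm

y_true = np.array([0, 0, 1, 2, 2, 2])   # toy target labels
y_pred = np.array([0, 0, 1, 1, 2, 2])   # toy predicted labels
cm = confusion_matrix(y_true, y_pred, n_classes=3)
overall_accuracy = np.trace(cm) / cm.sum()
print(cm)
print(overall_accuracy)
```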
Assessing the performance
▪ Apparent Error Rate (AER), or resubstitution: the whole set ZN is used both to infer the model and to estimate its error.
▪ Sample Partitioning (SP): SD and SE are obtained by randomly splitting ZN into two disjoint subsets. SD is used to estimate the model and SE to estimate its accuracy.
▪ Leave-One-Out (LOO): SE contains one pattern of ZN, and SD contains the remaining n − 1 patterns. The procedure is iterated n times by holding out each pattern in ZN, and the resulting n estimates are averaged.
▪ w-fold Cross-validation (wCV): ZN is randomly split into w disjoint subsets of equal size. For each subset, the remaining w − 1 subsets are merged to form SD and the reserved subset is used as SE. The w estimates are averaged.

27
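A sketch of the w-fold cross-validation procedure described above (fit_and_score is a hypothetical helper that trains a model on SD and returns its error on SE; it is not part of the lecture code):

```python
import numpy as np

def w_fold_cv(x, y, w, fit_and_score):
    """Randomly split Z_N into w disjoint folds; for each fold, train on the
    remaining w - 1 folds (S_D) and estimate the error on the held-out fold (S_E)."""
    idx = np.random.permutation(len(x))
    folds = np.array_split(idx, w)
    errors = []
    for i in range(w):
        s_e = folds[i]
        s_d = np.concatenate([folds[j] for j in range(w) if j != i])
        errors.append(fit_and_score(x[s_d], y[s_d], x[s_e], y[s_e]))
    return np.mean(errors)   # the w estimates are averaged
```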
Memory demand of NNs

28
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output

Bias Bias

2
Input 1
1 Output

29
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output

Bias Bias

w(1)1,1
w(1)1,2 2
Input 1
1 Output
w(1)1,3
3
w(1)1,4

[w(1)1,1, w(1)1,2 , w(1)1,3 ,w(1)1,4]

30
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output

Bias Bias
b (1) 1
1
b (1) 2
2
Input 1
1 Output
b (1) 3
3
b (1) 4
4

[w(1)1,1, w(1)1,2 , w(1)1,3 ,w(1)1,4] + [b(1) 1, b(1) 2 , b(1) 3 ,b(1) 4]

31
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output

Bias Bias

1 w(2)1,1
w(2)2,1
2
Input 1 w(2)3,1 1 Output

w(2)4,1
4

[w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1]

32
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output

Bias Bias

1
b(2)1
2
Input 1
1 Output

[w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1] + [b(2)1]

33
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
4 hidden neurons

Bias Bias
b (1) 1
1
b (1) 2 w(2)1,1 b(2)1
w(1)1,1 w(2)2,1
w(1)1,2 2
Input (2) 1 Output
1 w(1)1,3 b (1) 3 w 3,1

3 1 Output
1 input w(1) 1,4 b (1) 4 w(2) 4,1

[w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1] + [b(2)1]

1x4 4 4x1 1

Total amount of weights = 13

34
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
4 hidden neurons

Layer 2
Layer 1
Input Output

1 Output
1 input

[w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1] + [b(2)1]

1x4 4 4x1 1

Total amount of weights = 13

35
Let’s start with a simple 1-hidden layer FFNN: 1 input, 4 hidden, 1 output
4 hidden neurons

Layer 2
Layer 1
Input Output

1 Output
1 input

[w(1)1,1, w(1)1,2, w(1)1,3, w(1)1,4] + [b(1)1, b(1)2, b(1)3, b(1)4] + [w(2)1,1, w(2)2,1, w(2)3,1, w(2)4,1] + [b(2)1]

1x4 4 4x1 1

Weights first layer = 8 Weights second layer = 5

36
Let’s start with a simple 1-hidden layer FFNN: 1 input, 16 hidden, 1 output
16 hidden neurons

Layer 2
Layer 1
Input Output


1 Output
1 input

[w(1)1,1, w(1)1,2, … , w(1)1,16] + [b(1)1, b(1)2, … , b(1)16] + [w(2)1,1, w(2)2,1, … , w(2)16,1] + [b(2)1]

1 x 16 16 16 x 1 1

Weights first layer = 32 Weights second layer = 17 Total amount of weights = 49

37
Generalizing the memory demand in multi-input/multi-output NNs
k hidden neurons

Layer 2
Layer 1
Input Output



h input … j Output

[w(1)1,1, w(1)1,2, …, w(1)h,k-1, w(1)h,k] + [b(1)1, … , b(1)k] + [w(2)1,1, w(2)1,2, …, w(2)k,j-1, w(2)k,j] + [b(2)1, … , b(2)j]

hxk k kxj j

Weights first layer = h x k + k Weights second layer = k x j + j

38
What about multiple hidden layers? Let’s see with 2 hidden layers

39
What about multiple hidden layers? Let’s see a FFNN with 2 layers

k1 hidden k2 hidden
neurons neurons

Layer 2
Layer 1

Layer 3
Input Output



j Output
h input

h x k1 k1 k1 x k2 k2 k2 x j j

Weights first layer = h x k1 + k1    Weights second layer = k1 x k2 + k2    Weights third layer = k2 x j + j
40
An example of a 1-16-16-1 NN: the memory demand

16 hidden 16 hidden
neurons neurons

Layer 2
Layer 1

Layer 3
Input Output



1 Output
1 input

1 x 16 16 16 x 16 16 16 x 1 1

Weights first layer = 32    Weights second layer = 256 + 16 = 272    Weights third layer = 17    Total amount of weights = 321
41
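A short sketch that reproduces these counts for a fully connected feedforward network (the layer widths are passed as a list; the function name is mine):

```python
def ffnn_param_count(layer_sizes):
    """Weights and biases of a fully connected FFNN.
    layer_sizes = [inputs, hidden_1, ..., hidden_L, outputs]."""
    total = 0
    for h, k in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += h * k + k       # h x k weight matrix plus one bias per neuron
    return total

print(ffnn_param_count([1, 4, 1]))        # 13, the 1-4-1 example
print(ffnn_param_count([1, 16, 1]))       # 49, the 1-16-1 example
print(ffnn_param_count([1, 16, 16, 1]))   # 321, the 1-16-16-1 example
```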
An example with TensorFlow

42
An example of neural network training in TensorFlow

This is a tutorial for building a simple TensorFlow model. We will need:

• A task to solve
• Some (generated) data
• The libraries we need, imported into our environment

43
Task and data

• Goal: to train a network to model data generated by a sine function.
• This will result in a model that can take a value, x, and predict its sine, y.
• Since y is a continuous dependent variable, this is a regression task.
• For this example, we're using some code to generate a dataset.

44
Add some noise

• Since it was generated directly by the sine function, our data fits a nice, smooth curve.
• Machine learning models are good at extracting meaning from messy, real-world data.
• To demonstrate this, we can add some noise to our data.

45
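A sketch of the data-generation step (the dataset size, seed and noise level are assumptions, not taken from the slides):

```python
import numpy as np

np.random.seed(1337)      # assumed seed, only for reproducibility
SAMPLES = 1000            # assumed dataset size

# Uniformly distributed x values over one sine period, and their sine
x_values = np.random.uniform(low=0, high=2 * np.pi, size=SAMPLES)
np.random.shuffle(x_values)
y_values = np.sin(x_values)

# Add a small amount of Gaussian noise to mimic messy, real-world data
y_values += 0.1 * np.random.randn(*y_values.shape)
```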
Train – Validation – Test split

• To evaluate the accuracy of the trained model, we'll need to compare its predictions to real data.
• This evaluation happens during training (validation) and after training (testing).
• It's important in both cases that we use data not already used to train the model.
• We'll reserve 20% of our data for validation, and another 20% for testing.

46
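Continuing the sketch above, a 60/20/20 split into training, validation and test sets:

```python
TRAIN_SPLIT = int(0.6 * SAMPLES)   # 60% training
TEST_SPLIT = int(0.8 * SAMPLES)    # next 20% validation, remaining 20% test

x_train, x_validate, x_test = np.split(x_values, [TRAIN_SPLIT, TEST_SPLIT])
y_train, y_validate, y_test = np.split(y_values, [TRAIN_SPLIT, TEST_SPLIT])

assert len(x_train) + len(x_validate) + len(x_test) == SAMPLES
```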
Define the network

• Define an empty model with Sequential()
• For each layer in the network, we call model.add(layer)
• To create a Dense layer we call layers.Dense()
• The parameters of this function are:
  • The number of neurons
  • The activation (if present)
  • The input shape, if it's the first layer of the network (for the other layers it is inferred)

47
Define the network - 2

• After all the layers have been added, we need to compile the model with model.compile()
• The parameters are:
  • The optimizer used for training
  • The loss function
  • The metrics used to evaluate the network
• We can print a summary of the model with model.summary()

48
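A minimal sketch of these two slides with tf.keras (the ReLU activation and Adam optimizer are assumptions; the slides only list which parameters exist):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()                                       # empty model
model.add(layers.Dense(16, activation='relu', input_shape=(1,)))    # hidden layer: 16 neurons
model.add(layers.Dense(1))                                          # output layer: 1 neuron, linear

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.summary()   # prints the layer shapes and the 49 parameters of this 1-16-1 network
```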
A recap of the memory demand of a 1-16-1 NN
16 hidden neurons
Already seen ..

Layer 2
Layer 1
Input Output


1 Output
1 input

[w(1)1,1, w(1)1,2, … , w(1)1,16] + [b(1)1, b(1)2, … , b(1)16] + [w(2)1,1, w(2)2,1, … , w(2)16,1] + [b(2)1]

1 x 16    16    16 x 1    1

Weights first layer = 32 Weights second layer = 17 Total amount of weights = 49

49
Training the network

• The training starts by calling model.fit()
• We need to specify:
  • The training data x_train
  • The labels y_train
  • The number of epochs
  • The batch size
  • The validation data

50
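A sketch of the training call, continuing the examples above (the number of epochs and the batch size are assumed values):

```python
history = model.fit(
    x_train, y_train,
    epochs=500,          # assumed value
    batch_size=64,       # assumed value
    validation_data=(x_validate, y_validate),
)
```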
Evaluating the model – training and validation loss and MAE

We can retrieve the training and validation loss and MAE from history, the output of the function fit().

51
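For instance (a sketch; the plotting part assumes matplotlib is available):

```python
import matplotlib.pyplot as plt

loss = history.history['loss']            # training loss per epoch
val_loss = history.history['val_loss']    # validation loss per epoch
mae = history.history['mae']              # training MAE per epoch
val_mae = history.history['val_mae']      # validation MAE per epoch

plt.plot(loss, 'g.', label='Training loss')
plt.plot(val_loss, 'b.', label='Validation loss')
plt.legend()
plt.show()
```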
The model working in practice (on the training set)

52
Define a more complex network

• We can define a second, more complex model to understand if we can improve the results
• Add a second dense layer

53
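A sketch of the deeper 1-16-16-1 model (same assumptions as before for the activation, optimizer and training settings):

```python
model_2 = tf.keras.Sequential()
model_2.add(layers.Dense(16, activation='relu', input_shape=(1,)))
model_2.add(layers.Dense(16, activation='relu'))   # the additional dense layer
model_2.add(layers.Dense(1))

model_2.compile(optimizer='adam', loss='mse', metrics=['mae'])
history_2 = model_2.fit(x_train, y_train, epochs=500, batch_size=64,
                        validation_data=(x_validate, y_validate))
```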
A recap of the memory demand of a 1-16-16-1 NN

16 hidden 16 hidden Already seen ..


neurons neurons

Layer 2
Layer 1

Layer 3
Input Output



1 Output
1 input

1 x 16 16 16 x 16 16 16 x 1 1

Weights first layer = 32    Weights second layer = 256 + 16 = 272    Weights third layer = 17    Total amount of weights = 321
54
Evaluating the model – training and validation loss and MAE

As before, we retrieve the training and validation loss and MAE from history, the output of the function fit().

55
Test the model on the test set

• Finally, we use the function model.predict() to obtain the predictions of the trained model on the test set x_test
• The results improved a lot compared to the model with a single hidden layer!

56
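A sketch of the final evaluation step, continuing the examples above:

```python
# Overall loss and MAE on data the model has never seen
test_loss, test_mae = model_2.evaluate(x_test, y_test)

# Predictions on the test set, e.g. for plotting against the actual values
y_test_pred = model_2.predict(x_test)

plt.clf()
plt.plot(x_test, y_test, 'b.', label='Actual')
plt.plot(x_test, y_test_pred, 'r.', label='Predicted')
plt.legend()
plt.show()
```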
ML is not just Neural Networks

57
Several families of machine learning algorithms

• Decision trees / Random forests
• K-Nearest Neighbors
• Support Vector Machines
• …

We will not explore them here, but it's important to know they exist.

58
