Chapter 5 - Machine Learning Basics
[Figure: the training phase and the prediction phase of a machine learning system]
Comparison
• Traditional Programming: Data + Program → Computer → Output
• Machine Learning: Data + Output → Computer → Program
What is Machine Learning?
• “the acquisition of knowledge or skills through experience, study, or by being taught.”
What is Machine Learning?
• Arthur Samuel (1959): the field of study that gives computers the ability to learn without being explicitly programmed.
Why Study Machine Learning?
Engineering Better Computing Systems
• Develop systems that are too difficult or expensive to construct manually
– because they require specific detailed skills or knowledge
– the knowledge engineering bottleneck
• Develop systems that adapt and customize themselves to individual users
– Personalized news or mail filters
– Personalized tutoring
Why Study Machine Learning?
Cognitive Science
• Computational studies of learning may help us understand learning in humans and other biological organisms.
Where does ML fit in?
Why are things working today?
• More compute power
• Better algorithms/models
ML in a Nutshell
• Tens of thousands of machine learning algorithms exist
– Hundreds of new ones every year
ML in a Nutshell
• Input: x (images, text, emails…)
• Data: $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$
[Figure: example tasks: classification (separating class A from class B), regression, and clustering]
Tasks
• Supervised Learning
– Classification: x → y, where y is discrete
– Regression: x → y, where y is continuous
• Unsupervised Learning
– Clustering: x → y, where y is a discrete cluster ID
– Dimensionality Reduction: x → y, where y is continuous
Supervised Learning: Classification
x → Classification → y (discrete)
Supervised Learning: Regression
x → Regression → y (continuous)
• Example: stock market prediction
• Example: weather prediction (forecasting temperature)
Unsupervised Learning: Clustering
x → Clustering → y (discrete cluster ID)
• The labels y are not provided
• Clustering groups similar data points together
Reinforcement Learning
x → Reinforcement Learning → y (actions)
Reinforcement Learning
• Reinforcement learning is a part of machine learning in which an agent learns from the actions it performs: if the chosen action is correct, the machine gains a reward point.
Reinforcement Learning: Learning to act
• There is only one “supervised” signal, received at the end of the game (e.g., win or lose).
Nearest Neighbor Classifier
• Nearest neighbor: for each test data point, assign the class label of the nearest training data point (see the sketch after this list)
– Adopt a distance function to find the nearest neighbor
o Calculate the distance to each data point in the training set, and assign the class of the nearest data point (minimum distance)
– It does not require learning a set of weights
[Figure: training examples from class 1, training examples from class 2, and a test example assigned to the nearest class]
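A minimal sketch of this rule in NumPy (the toy data below is illustrative, not from the slides):

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, X_test):
    """Assign each test point the label of its closest training point."""
    preds = []
    for x in X_test:
        # Euclidean distance from x to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        preds.append(y_train[np.argmin(dists)])  # label of the nearest point
    return np.array(preds)

# Toy example: two classes in a 2-D feature space
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([1, 1, 2, 2])
print(nearest_neighbor_predict(X_train, y_train, np.array([[0.2, 0.1]])))  # -> [1]
```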
k-Nearest Neighbors Classifier
• k-NN generalizes the nearest neighbor rule: assign each test point the majority class label among its k nearest training points (see the sketch below)
[Figure: 2-D feature space (x1, x2) with training points from two classes (x, o); test points (+) are labeled by a majority vote of their k nearest neighbors]
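A hedged extension of the previous sketch to k > 1 with a majority vote (k = 1 recovers the nearest neighbor rule; scikit-learn's KNeighborsClassifier is the production route):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k closest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dists)[:k]]             # labels of the k closest points
        preds.append(Counter(nearest).most_common(1)[0][0])  # majority label
    return np.array(preds)
```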
Linear Classifier
• Linear classifier
– Find a linear function f of the inputs x_i that separates the classes, $f(x_i) = W x_i + b$
– Use pairs of inputs and labels to find the weights matrix W and the bias vector b
o The weights and biases are the parameters of the function f
– Several methods have been used to find the optimal set of parameters of a linear classifier
o A common choice is the Perceptron algorithm, where the parameters are updated until a minimal error is reached (a single layer; it does not use backpropagation); see the sketch after this list
– The linear classifier is a simple approach, but it is a building block of more advanced classification algorithms, such as SVMs and neural networks
o Early multi-layer neural networks were referred to as multi-layer perceptrons (MLPs)
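A minimal perceptron training sketch under the update rule described above; labels are assumed to be ±1, and the learning rate and epoch count are illustrative:

```python
import numpy as np

def perceptron_train(X, y, epochs=100, lr=1.0):
    """Perceptron: nudge (W, b) whenever a training point is misclassified."""
    W = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):          # y uses labels +1 / -1
            if yi * (W @ xi + b) <= 0:    # misclassified: wrong side of the boundary
                W += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                   # converged: every point classified correctly
            break
    return W, b
```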
Support Vector Machines
• An SVM finds the separating hyperplane that maximizes the margin to the closest training points (the support vectors); see the sketch below
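As a brief illustration, a linear SVM fit with scikit-learn (the library choice and the toy data are assumptions, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data
X = np.array([[0, 0], [0, 1], [2, 2], [2, 3]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear")   # linear kernel -> maximum-margin linear boundary
clf.fit(X, y)
print(clf.support_vectors_)  # the training points that define the margin
print(clf.predict([[1.5, 2.0]]))
```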
Linear vs Non-linear Techniques
Non-linear Techniques
• Non-linear classification
– Features are obtained as non-linear functions of the inputs
– It results in non-linear decision boundaries
– Can deal with non-linearly separable data
– Inputs: $x$; Features: $\varphi(x)$, a non-linear function of $x$; Outputs: $y = W\varphi(x) + b$ (see the sketch below)
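A sketch of this feature-map idea, assuming a hand-picked quadratic map φ (the map, the weights, and the test points are illustrative):

```python
import numpy as np

def phi(x):
    """Non-linear feature map: lift 2-D inputs into a space with squared terms."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

# A linear classifier W @ phi(x) + b in feature space yields a
# non-linear (here quadratic) decision boundary in the input space.
W = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # illustrative weights: x1^2 + x2^2
b = -1.0                                  # boundary: the unit circle x1^2 + x2^2 = 1
def predict(x):
    return 1 if W @ phi(x) + b > 0 else 0

print(predict(np.array([0.1, 0.2])))  # inside the circle -> class 0
print(predict(np.array([2.0, 0.0])))  # outside the circle -> class 1
```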
Binary vs Multi-class Classification
• Binary classification separates two classes; multi-class classification handles more than two, e.g., by combining binary classifiers (one-vs-rest) or by using a multi-way output such as softmax
Introduction to Deep Learning
ML vs. Deep Learning
• In classical machine learning, features are hand-crafted and then passed to a separately trained classifier; in deep learning, feature extraction and classification are learned jointly, end-to-end, from the data
Why is DL Useful?
Introduction to Neural Networks
[Figure: a 16 × 16 = 256-pixel image of a handwritten digit is flattened into inputs x1, …, x256; the network produces ten outputs y1, …, y10, one score per digit class. Here y1 = 0.1 (“is 1”), y2 = 0.7 (“is 2”), …, y10 = 0.2 (“is 0”), so the image is classified as “2”.]
The machine computes a function $f: \mathbb{R}^{256} \rightarrow \mathbb{R}^{10}$ from input pixels to class scores, and this function is represented by a neural network.
Elements of Neural Networks
$z = a_1 w_1 + a_2 w_2 + \cdots + a_K w_K + b, \qquad a = \sigma(z)$
[Figure: a single neuron. Inputs a1, …, aK are multiplied by weights w1, …, wK and summed with the bias b to give z; the activation function σ produces the output a = σ(z).]
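The single-neuron computation above as a short NumPy sketch; the sigmoid choice for σ and the input values match the worked example on a later slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(a, w, b):
    """z = a1*w1 + ... + aK*wK + b, then output = sigma(z)."""
    z = np.dot(a, w) + b
    return sigmoid(z)

# Worked example from the deck: inputs (1, -1), weights (1, -2), bias 1 -> z = 4
print(neuron(np.array([1.0, -1.0]), np.array([1.0, -2.0]), 1.0))  # sigma(4) ≈ 0.98
```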
Elements of Neural Networks
• Hidden layer: $h = \sigma(W_1 x + b_1)$, with weights $W_1$, biases $b_1$, and activation function $\sigma$
[Figure: network with a 3-dimensional input x, a 4-unit hidden layer h, and a 2-dimensional output y]
• Weights: [3 × 4] + [4 × 2] = 20; biases: 4 + 2 = 6; in total 26 learnable parameters
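A sketch of the forward pass for this 3-4-2 network; the random weight initialization is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units (12 weights)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden -> 2 outputs      (8 weights)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W1 @ x + b1)    # hidden layer: h = sigma(W1 x + b1)
    return W2 @ h + b2          # output layer (no activation here)

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                 # 12 + 4 + 8 + 2 = 26 learnable parameters
print(forward(np.array([1.0, -1.0, 0.5])))
```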
Elements of Neural Networks
[Figure: a fully connected network with an input layer x1, …, xN, several hidden layers, and an output layer y1, …, yM]
Elements of Neural Networks
• Example: a neuron with inputs (1, −1), weights (1, −2), and bias 1 computes $z = (1 \cdot 1) + (-1) \cdot (-2) + 1 = 4$
Elements of Neural Networks
• The network as a whole defines a function, e.g. $f: \mathbb{R}^2 \rightarrow \mathbb{R}^2, \qquad f\left(\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right) = \begin{bmatrix} 0.62 \\ 0.83 \end{bmatrix}$
Activation Functions
Activation: Sigmoid
$f(x) = \frac{1}{1 + e^{-x}}: \mathbb{R}^n \rightarrow [0, 1]$ (applied elementwise)
Activation: Tanh
$f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}: \mathbb{R}^n \rightarrow [-1, 1]$ (applied elementwise)
Activation: ReLU
$f(x) = \max(0, x)$ (applied elementwise)
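For comparison, all three activations in a few lines of NumPy (a minimal sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)          # zero for negatives, identity for positives

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```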
Training Neural Networks
Training NNs
• The network parameters include the weight matrices and bias vectors from all layers:
$\theta = \{W_1, b_1, W_2, b_2, \cdots, W_L, b_L\}$
[Figure: the 256-input digit network with a softmax output layer; for an image of “2”, the outputs are y1 = 0.1 (“is 1”), y2 = 0.7 (“is 2”), …, y10 = 0.2 (“is 0”)]
Training NNs
• To train a NN, set the parameters so that, for each image in a training subset, the output element corresponding to the correct class has the maximum value
Training NNs
[Figure: for a training image with true label “1”, the network’s predicted outputs y1, …, y10 are compared against the target, and the cost ℒ(θ) measures the mismatch]
Training NNs
• For a training set of images, calculate the total loss over all images: $\mathcal{L}(\theta) = \sum_{n=1}^{N} \mathcal{L}_n(\theta)$
• Find the optimal parameters that minimize the total loss (see the sketch below)
[Figure: each training input xn is fed through the NN to produce a prediction ŷn, which is compared with its label yn to give the per-example loss ℒn(θ); summing over all N examples gives the total loss]
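A compressed sketch of this minimize-the-total-loss loop, using a single linear layer with a softmax output and plain gradient descent (the data, learning rate, and step count are illustrative assumptions; a real network adds hidden layers and backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                  # toy inputs (N=100, 4 features)
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # toy labels in {0, 1}
W, b = np.zeros((2, 4)), np.zeros(2)           # parameters theta = {W, b}

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for step in range(200):                         # gradient descent on the total loss
    probs = softmax(X @ W.T + b)                # predictions y_hat for all N examples
    loss = -np.log(probs[np.arange(len(y)), y]).mean()   # cross-entropy L(theta)
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1.0           # dL/dz for softmax + cross-entropy
    W -= 0.1 * (grad.T @ X) / len(y)            # update parameters to reduce L
    b -= 0.1 * grad.mean(axis=0)
print(loss)                                     # total loss after training
```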
Backpropagation
• Backpropagation computes the gradient of the loss with respect to every parameter by applying the chain rule backward through the layers; these gradients drive the parameter updates

Underfitting vs. Overfitting
• Underfitting
– The model is too “simple” to represent all the relevant class characteristics
– E.g., a model with too few parameters
– Produces high error on the training set and high error on the validation set
• Overfitting
– The model is too “complex” and fits irrelevant characteristics (noise) in the data
– E.g., a model with too many parameters
– Produces low error on the training set and high error on the validation set (see the sketch below)
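Both failure modes can be seen with polynomial fits of different degrees; a small sketch (the data, noise level, and degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy training data
x_val = np.linspace(0.025, 0.975, 20)                            # held-out points
y_val = np.sin(2 * np.pi * x_val)

for degree in (1, 3, 15):   # too simple, about right, too complex
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    # degree 1 underfits (high train + validation error); degree 15 overfits
    # (near-zero train error but a larger validation error)
    print(degree, round(train_err, 3), round(val_err, 3))
```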
Generalization
Overfitting
• Overfitting: a model with high capacity fits the noise in the data instead of the underlying relationship