Deep Learning Summer School 2015: Introduction To Machine Learning
• Founding project: the Perceptron (Frank Rosenblatt, 1957), the first artificial neuron learning from examples
[Figure: machine learning's origins. Within computer science, artificial intelligence was largely symbolic AI; artificial neural networks bridged it to the neurosciences.]
[Figure: machine learning at the crossroads of several fields: computer science (artificial intelligence, optimization and control, information theory), statistics, the neurosciences (artificial neural networks, computational neuroscience), and physics (statistical physics).]
[Figure: supervised learning data. Inputs X (what we observe) are turned into input feature vectors via preprocessing, feature extraction, etc.; targets Y (what we must predict) are the labels. Example: a “horse” image becomes Xn = (6.8, 54, ... , 17, -3, ...) with label Yn = +1. A new test point is processed the same way.]
• Number of examples: n (sometimes several million)
• Input dimensionality: d, the number of input features characterizing each example (often 100 to 1000, sometimes 10,000 or much more)
• Real-world examples are messy: turning raw inputs and targets into clean (x, y) pairs requires data-plumbing (preprocessing, feature extraction), e.g. images labeled “horse”, “cat”, etc., or text mapped to a feature vector x = (·, ·, ·, ·, ....)
[Figure: each coordinate of x corresponds to one vocabulary word: cat, dog, elephant, horse, jumped, running, the, we, ...]
Bag of words for «The cat jumped»: x = (... 0 ... , 0, 1, ... 0 ... , 1, 0, 0, ...., 0, 0, 1, 0, ... 0 ...), with a 1 at the position of each word of the sentence and 0 everywhere else.
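As a concrete illustration, here is a minimal Python sketch of this bag-of-words encoding; the vocabulary and its ordering are made up for the example.

```python
# Minimal bag-of-words sketch: each sentence becomes a fixed-length vector
# with a 1 at the index of every vocabulary word it contains.
vocabulary = ["cat", "dog", "elephant", "horse", "jumped", "running", "the", "we"]
word_index = {word: i for i, word in enumerate(vocabulary)}

def bag_of_words(sentence):
    x = [0] * len(vocabulary)
    for word in sentence.lower().split():
        if word in word_index:           # out-of-vocabulary words are simply ignored
            x[word_index[word]] = 1      # use += 1 instead to keep word counts
    return x

print(bag_of_words("The cat jumped"))    # 1s at the indices of "cat", "jumped", "the"
```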
• ➠ density estimation
• Supervised learning proceeds in 3 steps:
1. Collect training data: a training set Dn = {(x1, y1), (x2, y2), ..., (xn, yn)}
2. Learn a function (predictor) fθ: input → target
3. Use the learned function on new inputs
[Figure: training points in the (input x, target y) plane, with a learned curve fθ passing through them.]
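A minimal sketch of these 3 steps on a synthetic 1-dimensional regression problem; the data-generating line, the noise level, and the least-squares fit are illustrative assumptions.

```python
import numpy as np

# 1. Collect training data: n pairs (x_i, y_i).  (Synthetic data, for illustration.)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=50)  # noisy linear target

# 2. Learn a function (predictor) f_theta: input -> target.
#    Here f_theta(x) = w*x + b, fit by least squares.
w, b = np.polyfit(x, y, deg=1)

# 3. Use the learned function on a new input.
x_new = 0.3
print("prediction:", w * x_new + b)  # should be close to 2.0 * 0.3 + 0.5 = 1.1
```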
A machine learning algorithm
usually corresponds to a combination of
the following 3 elements
(either explicitly specified or implicit):
✓ the choice of a specific function family F (often a parameterized family)
✓ a way to evaluate the quality of a function f∈F, using a cost (or loss) function L (typically measuring how wrongly f predicts)
✓ a way to search for the «best» function f∈F (typically an optimization of function parameters to minimize the overall loss over the training set).
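To make these three elements concrete, here is a hedged sketch for linear regression: the family F is the set of linear predictors f_w(x) = w·x, the loss L is the squared error, and the search is plain gradient descent on the training loss. The synthetic data, learning rate, and iteration count are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # n=100 examples, d=3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Element 1: the function family F -- linear predictors f_w(x) = w . x
def f(w, X):
    return X @ w

# Element 2: the loss L -- squared error, averaged over the training set
def total_loss(w):
    return np.mean((f(w, X) - y) ** 2)

# Element 3: a way to search F -- gradient descent on the parameters w
w = np.zeros(3)
learning_rate = 0.1
for _ in range(500):
    gradient = 2 * X.T @ (f(w, X) - y) / len(y)     # gradient of the mean squared error
    w -= learning_rate * gradient

print("learned w:", np.round(w, 2), " final loss:", round(total_loss(w), 4))
```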
Dataset D = {(x1, y1), (x2, y2), ...}
• Make sure examples are in random order.
• Split the dataset in two: Dtrain and Dtest.
• Use Dtrain to choose/optimize/find the best predictor f = fˆ(Dtrain).
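A small sketch of this shuffle-and-split recipe, assuming the examples are stored in NumPy arrays; the 80/20 split fraction is an arbitrary choice.

```python
import numpy as np

def split_dataset(X, y, test_fraction=0.2, seed=0):
    """Shuffle the examples into random order, then split into D_train and D_test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))              # random order, as advised above
    X, y = X[order], y[order]
    n_test = int(len(y) * test_fraction)
    D_train = (X[n_test:], y[n_test:])
    D_test = (X[:n_test], y[:n_test])
    return D_train, D_test
```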
Model Selection
F_linear: linear (affine) predictor («linear regression»):
f(x) = wx + b (in 1 dimension)
f(x) = w·x + b (in d dimensions)
• => Under-fitting
[Figure: salmon (saumon) vs. sea bass (bar) examples plotted by lightness (luminosité) vs. width (largeur); an overly simple decision boundary leaves several training errors.]
• => Over-fitting
[Figure: same salmon vs. sea bass data; a highly convoluted boundary separates the training points perfectly but generalizes poorly to the new point marked «?».]
Number of training errors: 0
• Optimal capacity for this problem (relative to the amount of data)
[Figure: same salmon vs. sea bass data; a moderately complex boundary balances training errors against generalization.]
• f∗ = arg min_f R(f): the best possible function (the Bayes decision / Bayes error)
• f∗F = arg min_{f∈F} R(f): the best function in the considered function family F
• fˆ(Dtrain): the function our algorithm learnt using the training set
• Decomposition of the excess error:
R(fˆn) − R(f∗) = (R(fˆn) − R(f∗F)) + (R(f∗F) − R(f∗))
where the first term is the estimation error («variance») and the second is the approximation error («bias»).
[Figure: different training sets Dtrain1, Dtrain2, Dtrain3 yield different learned functions fˆ(Dtrain1), fˆ(Dtrain2), fˆ(Dtrain3), scattered around f∗F within F.]
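The variance term can be seen empirically: re-running the same learning algorithm on independently drawn training sets yields different functions fˆ(Dtrain1), fˆ(Dtrain2), ... The sketch below assumes a synthetic data distribution and a degree-5 polynomial family; the spread of the fitted coefficients across draws reflects the estimation error.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_set(n=20):
    """Draw a fresh training set from the same underlying distribution."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + 0.2 * rng.normal(size=n)
    return x, y

# Fit the same family F (degree-5 polynomials) on several independent training sets:
for i in range(1, 4):
    x, y = sample_training_set()
    coeffs = np.polyfit(x, y, deg=5)
    print(f"f_hat(Dtrain{i}): coefficients =", np.round(coeffs, 2))
# The coefficients differ from draw to draw -- that spread is the variance.
```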
• Bigger n ➪ variance ↓
So we can afford to increase capacity (to lower the bias)
➪ we can use more expressive models
Given the dataset {(x1, y1), (x2, y2), ...} split into Dtrain, Dvalid, and Dtest:
For each considered model (ML algo) A:
  For each considered hyper-parameter config λ:
  • train model A with hyperparams λ on Dtrain: fˆAλ = Aλ(Dtrain)
  • evaluate the resulting predictor on Dvalid (with the preferred evaluation metric): eAλ = R̂(fˆAλ, Dvalid)
Locate the A∗, λ∗ that yielded the best eAλ.
Finally: compute an unbiased estimate of the generalization performance of f∗ using the test set: R̂(f∗, Dtest).
Dtest must never have been used during training or model selection to select, learn, or tune anything.
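A hedged sketch of this selection loop, with a single algorithm A (polynomial fitting) whose hyper-parameter λ is the degree, and mean squared error standing in for the metric R̂; the data and the grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + 0.2 * rng.normal(size=60)
x_tr, y_tr = x[:40], y[:40]                      # D_train
x_va, y_va = x[40:], y[40:]                      # D_valid

def risk(f, x, y):
    return np.mean((f(x) - y) ** 2)              # R_hat: mean squared error

best = None
for degree in range(1, 10):                      # hyper-parameter configs (lambda)
    f_hat = np.poly1d(np.polyfit(x_tr, y_tr, deg=degree))  # f_hat = A_lambda(D_train)
    e = risk(f_hat, x_va, y_va)                  # e = R_hat(f_hat, D_valid)
    if best is None or e < best[0]:
        best = (e, degree, f_hat)

print("lambda* =", best[1], " validation error =", round(best[0], 4))
# Only after this selection is R_hat(f*, D_test) computed, once, on D_test.
```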
Example of model hyper-parameter selection
[Figure: training set error and validation set error (y-axis, 0 to 6.0) plotted against the hyper-parameter value (x-axis, 1 to 15).]
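A sketch that reproduces this kind of curve numerically, under assumed synthetic data: as the hyper-parameter (here, polynomial degree) grows, training error keeps shrinking while validation error typically falls and then rises again; the best value sits at the validation minimum.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + 0.3 * rng.normal(size=40)
x_tr, y_tr = x[:25], y[:25]
x_va, y_va = x[25:], y[25:]

for degree in range(1, 14, 2):                   # the hyper-parameter being swept
    f = np.poly1d(np.polyfit(x_tr, y_tr, deg=degree))
    train_err = np.mean((f(x_tr) - y_tr) ** 2)
    valid_err = np.mean((f(x_va) - y_va) ** 2)
    print(f"degree={degree:2d}  train error={train_err:.3f}  valid error={valid_err:.3f}")
```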