Lecture 5, FGV
Handout of Lecture 5
Bernard Salanié
22 July 2024
We multiply the weight of misclassified observations by a factor exp(s_t) = √((1 − M_t)/M_t);
we divide the weight of well-classified observations by the same factor;
and we use a scale factor W_t to make sure that Σ_i w_{it} = 1 (see the sketch below).
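A minimal sketch of one reweighting round in Python/NumPy; the function and variable names below (adaboost_reweight, y_hat, and so on) are illustrative, not taken from the lecture:

```python
import numpy as np

def adaboost_reweight(weights, y, y_hat):
    """One AdaBoost reweighting step with labels in {-1, +1}.

    weights : current observation weights w_it, summing to 1
    y, y_hat: true labels and weak-learner predictions
    Returns the updated (renormalized) weights and the scale factor W_t.
    """
    miss = (y != y_hat)
    M_t = weights[miss].sum()                 # weighted misclassification rate
    factor = np.sqrt((1.0 - M_t) / M_t)       # exp(s_t)
    new_w = np.where(miss, weights * factor,  # multiply misclassified weights
                     weights / factor)        # divide well-classified weights
    W_t = new_w.sum()                         # scale factor so weights sum to 1
    return new_w / W_t, W_t

# tiny example: 4 observations, the weak learner gets one wrong
w, _ = adaboost_reweight(np.full(4, 0.25),
                         np.array([1, 1, -1, -1]),
                         np.array([1, 1, -1, 1]))
```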
So if our weak learners are still strong enough that γ_t > γ > 0, we converge to perfect classification exponentially fast;
but of course we want to stop before that to avoid overfitting, hence we cross-validate.
We want to show that (1/n) Σ_i 1(y_i ≠ F_t(x_i)) goes to zero exponentially in t.
1. First, note that 1(y_i ≠ F_t(x_i)) ≤ exp(−y_i F_t(x_i)).
2. Then show that w_{i,t+1} = exp(−y_i F_t(x_i)) / (n ∏_{τ=1}^t W_τ) (the weights after round t), so that (1/n) Σ_i exp(−y_i F_t(x_i)) = ∏_{τ=1}^t W_τ since the weights sum to one.
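Filling in the remaining algebra (a sketch of the standard AdaBoost argument; it writes γ_τ = 1/2 − M_τ for the edge of the weak learner at round τ, a definition assumed here rather than restated from the lecture): each scale factor equals
W_τ = M_τ √((1 − M_τ)/M_τ) + (1 − M_τ) √(M_τ/(1 − M_τ)) = 2√(M_τ(1 − M_τ)) = √(1 − 4γ_τ²) ≤ exp(−2γ_τ²).
Combining this with the two steps above gives (1/n) Σ_i 1(y_i ≠ F_t(x_i)) ≤ ∏_{τ=1}^t W_τ ≤ exp(−2 Σ_{τ=1}^t γ_τ²) ≤ exp(−2γ²t) whenever γ_τ ≥ γ, which is the exponential rate claimed above.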
[Figure: a real neuron.]
The logistic (sigmoid) activation function: σ(t) = 1/(1 + exp(−t)).
The softmax activation: σ_j(t) = exp(t_j) / Σ_k exp(t_k).
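A small numerical sketch of these two activations in NumPy; the max-subtraction in the softmax is a standard trick to avoid overflow and does not change the result:

```python
import numpy as np

def sigmoid(t):
    """Logistic activation: sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def softmax(t):
    """Softmax: sigma_j(t) = exp(t_j) / sum_k exp(t_k)."""
    z = np.exp(t - np.max(t))   # subtract the max for numerical stability
    return z / z.sum()

print(sigmoid(0.0))                        # 0.5
print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities summing to 1
```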
More interesting: with a nonlinear activation function, we get a sort of series/sieves flexible method,
with a difference: only one basis function, but many linear indices.
It will get more expressive when we add hidden layers (see the sketch below).
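To see the "one basis function, many linear indices" point, here is a minimal one-hidden-layer forward pass; the shapes and names (p inputs, M hidden units, weights W, b, beta) are my own illustration, not notation from the lecture:

```python
import numpy as np

def one_layer_forward(x, W, b, beta, beta0):
    """Single hidden layer with a sigmoid activation.

    x    : input vector of length p
    W, b : (M, p) weights and length-M intercepts; row l defines the linear index w_l'x + b_l
    beta, beta0 : output-layer weights (length M) and intercept
    Every hidden unit applies the *same* basis function sigma to a *different* linear index.
    """
    sigma = lambda t: 1.0 / (1.0 + np.exp(-t))
    hidden = sigma(W @ x + b)        # M basis-function values sigma(w_l'x + b_l)
    return beta0 + beta @ hidden     # linear combination of those M values
```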
With a ReLU activation, the gradient of L̄ with respect to a first-layer weight w_{1lm}, evaluated at the current iterate w^{(s)}, is
∂L̄/∂w_{1lm}(w^{(s)}) = −2 Σ_{i=1}^n r̂_i(w^{(s)}) x_{im} 1((w_l^{(s)})' x_i > 0).
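A sketch of this gradient in NumPy, together with a finite-difference check. It assumes a one-hidden-layer ReLU network with unit output weights, f(x; w) = Σ_l max(0, w_l'x), and L̄(w) = Σ_i (y_i − f(x_i; w))²; that architecture and the names below are assumptions for illustration, not the handout's exact setup:

```python
import numpy as np

def grad_first_layer(W1, X, y):
    """dL/dw_{1lm} = -2 sum_i r_i(w) x_{im} 1(w_l'x_i > 0) for a ReLU hidden layer.

    W1 : (M, p) first-layer weights, row l is w_l
    X  : (n, p) regressors;  y : (n,) outcomes
    """
    Z = X @ W1.T                               # (n, M) linear indices w_l'x_i
    r = y - np.maximum(Z, 0.0).sum(axis=1)     # residuals r_i(w)
    active = (Z > 0.0).astype(float)           # indicators 1(w_l'x_i > 0)
    return -2.0 * (active * r[:, None]).T @ X  # (M, p) gradient

# finite-difference check of one coordinate (l = 0, m = 1)
rng = np.random.default_rng(0)
X, y, W1 = rng.normal(size=(50, 3)), rng.normal(size=50), rng.normal(size=(4, 3))
loss = lambda W: np.sum((y - np.maximum(X @ W.T, 0.0).sum(axis=1)) ** 2)
eps = 1e-6
W_eps = W1.copy(); W_eps[0, 1] += eps
print((loss(W_eps) - loss(W1)) / eps, grad_first_layer(W1, X, y)[0, 1])
```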
With p inputs, D hidden layers of M neurons each, and K outputs, this gives pM + (D − 1)M² + KM parameters (not counting the intercepts).
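For example (an illustrative choice of sizes, not from the lecture): with p = 10 inputs, D = 3 hidden layers of M = 32 neurons, and K = 1 output, that is 10·32 + 2·32² + 32 = 320 + 2048 + 32 = 2400 parameters.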