Empirical Risk Minimization
Training set: T^m = {(x_i, y_i) ∈ (X × Y) | i = 1, ..., m}

Hypothesis space: H ⊆ Y^X = {h : X → Y}

Learning algorithm: A : ∪_{m=1}^∞ (X × Y)^m → H

Empirical risk: R_{T^m}(h) = (1/m) ∑_{i=1}^m ℓ(y_i, h(x_i))

True risk: R(h) = E_{(x,y)∼p}[ℓ(y, h(x))]
Evaluating the empirical risk of a fixed h relies on the law of large numbers:

(1/m) ∑_{i=1}^m z_i → µ as m → ∞ (in probability),

which requires {z_1, ..., z_m} to be a sample from i.i.d. random variables with expected value µ. The training set T^m = {(x_1, y_1), ..., (x_m, y_m)} is drawn from i.i.d. random variables with distribution p(x, y).
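A minimal simulation of this (the distribution, the noise level, and the fixed threshold hypothesis are all invented for illustration): the average 0/1 loss of a fixed h over i.i.d. samples concentrates around its expectation R(h).

```python
import random

random.seed(0)

# Hypothetical toy setup: x ~ Uniform[0, 1], clean label 1[x > 0.5],
# flipped with probability 0.1; the hypothesis h(x) = 1[x > 0.3]
# is fixed independently of the sample.
def sample(m):
    data = []
    for _ in range(m):
        x = random.random()
        y = int((x > 0.5) != (random.random() < 0.1))  # noisy label
        data.append((x, y))
    return data

def h(x):
    return int(x > 0.3)

def loss(y, y_hat):
    return int(y != y_hat)  # 0/1 loss, bounded in [0, 1]

def empirical_risk(m):
    return sum(loss(y, h(x)) for x, y in sample(m)) / m

# True risk: h disagrees with the clean label on (0.3, 0.5], so
# R(h) = 0.2 * 0.9 + 0.8 * 0.1 = 0.26.
for m in (100, 10_000, 200_000):
    print(m, round(empirical_risk(m), 4))
```

The printed averages approach 0.26 as m grows, exactly as the law of large numbers predicts for a fixed h.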
Evaluation:
h is fixed independently of T^m, z_i = ℓ(y_i, h(x_i)), and {z_1, ..., z_m} is i.i.d.
Learning:
h_m = A(T^m) depends on T^m, z_i = ℓ(y_i, h_m(x_i)), and thus {z_1, ..., z_m} is not i.i.d.
The task for the rest of the lecture is to show how to fix this. The fix requires a uniform law of large numbers.
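A small experiment (the pure-noise setup and all constants are invented for illustration) shows why the lack of independence matters: with labels that are pure noise, every hypothesis has true risk 1/2, yet the empirical risk of the hypothesis selected by empirical risk minimization is systematically below 1/2.

```python
import random

random.seed(1)

# Hypothetical pure-noise setup: y ~ Bernoulli(1/2) independent of x,
# so every threshold classifier has true risk exactly 0.5.
H = [lambda x, t=t: int(x > t) for t in [0.1 * k for k in range(10)]]

def training_risk_of_erm(m):
    T = [(random.random(), random.randint(0, 1)) for _ in range(m)]
    def emp(h):
        return sum(h(x) != y for x, y in T) / m
    h_m = min(H, key=emp)   # ERM: pick the empirically best hypothesis
    return emp(h_m)         # its training risk is optimistically biased

avg = sum(training_risk_of_erm(20) for _ in range(500)) / 500
print(round(avg, 3))  # noticeably below the true risk 0.5
```

Because h_m is chosen to minimize the very quantity being averaged, the losses z_i = ℓ(y_i, h_m(x_i)) are coupled through h_m, and the plain law of large numbers no longer applies.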
P( R(h_m) − R_{T^m}(h_m) ≥ ε ) ≤ P( sup_{h∈H} ( R(h) − R_{T^m}(h) ) ≥ ε ) ≤ B(m, H, ε)
[Figure: true risk R(h) and empirical risk R_{T^m}(h) over H for m = 1000, marking R(h_m) and R_{T^m}(h_m) at the learned hypothesis h_m.]
Uniform Law of Large Numbers
Uniform Law of Large Numbers: for any p(x, y) generating T^m, it holds that

∀ ε > 0 : lim_{m→∞} P( sup_{h∈H} ( R(h) − R_{T^m}(h) ) ≥ ε ) = 0,

where the event inside the probability reads "the empirical risk fails for some h ∈ H". When this holds, we say that the empirical risk converges uniformly to the true risk, or that the hypothesis class H has the uniform convergence property.
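For a small finite class the uniform deviation can be watched directly (the noiseless toy setup and the threshold class are invented for illustration): sup_{h∈H} |R(h) − R_{T^m}(h)| shrinks as m grows.

```python
import random

random.seed(3)

# Hypothetical setup: x ~ Uniform[0, 1], y = 1[x > 0.5] (no noise);
# H = threshold classifiers h_t(x) = 1[x > t] with true risk |t - 0.5|.
thresholds = [0.1 * k for k in range(10)]

def true_risk(t):
    return abs(t - 0.5)  # probability mass of the disagreement region

def sup_deviation(m):
    T = [(x, int(x > 0.5)) for x in (random.random() for _ in range(m))]
    return max(
        abs(sum(int(x > t) != y for x, y in T) / m - true_risk(t))
        for t in thresholds
    )

for m in (100, 1_000, 10_000):
    print(m, round(sup_deviation(m), 4))
```

The worst-case gap over the whole class, not just for one fixed h, tends to zero — which is exactly what the uniform convergence property demands.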
The ULLN holds for a finite hypothesis class
Assume a finite hypothesis class H = {h1, . . . , hK }.
Define the set of "bad" training sets for a hypothesis h:

B(h) = { T^m ∈ (X × Y)^m : |R_{T^m}(h) − R(h)| ≥ ε }

By the Hoeffding inequality, for each fixed h ∈ H:

P( T^m ∈ B(h) ) ≤ 2 e^{−2mε² / (b−a)²}

By the union bound:

P( max_{h∈H} |R_{T^m}(h) − R(h)| ≥ ε ) ≤ ∑_{h∈H} P( T^m ∈ B(h) ) ≤ 2 |H| e^{−2mε² / (b−a)²}

and therefore

∀ ε > 0 : lim_{m→∞} P( max_{h∈H} |R_{T^m}(h) − R(h)| ≥ ε ) = 0
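The union-bound inequality can be sanity-checked by Monte Carlo (the pure-noise setup and all constants are invented for illustration; (b − a) = 1 for the 0/1 loss):

```python
import math
import random

random.seed(2)

# Hypothetical pure-noise setup: y ~ Bernoulli(1/2) independent of x,
# so R(h) = 0.5 for every threshold classifier h_t(x) = 1[x > t].
thresholds = [0.1 * k for k in range(10)]   # |H| = 10
m, eps = 100, 0.15

def max_deviation():
    T = [(random.random(), random.randint(0, 1)) for _ in range(m)]
    return max(
        abs(sum(int(x > t) != y for x, y in T) / m - 0.5)
        for t in thresholds
    )

trials = 2_000
freq = sum(max_deviation() >= eps for _ in range(trials)) / trials
bound = 2 * len(thresholds) * math.exp(-2 * m * eps ** 2)  # 2|H| e^{-2m eps^2}

print(round(freq, 3), "<=", round(bound, 3))
```

The observed frequency stays well below the bound; the union bound is loose for correlated hypotheses, but it is what makes the finite-class argument work without any independence between the hypotheses.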
Generalization bound for finite hypothesis class
Hoeffding inequality generalized to a finite hypothesis class H:

P( max_{h∈H} |R_{T^m}(h) − R(h)| ≥ ε ) ≤ 2 |H| e^{−2mε² / (b−a)²}

holds for all h ∈ H simultaneously and any loss function ℓ : Y × Y → [a, b].
Setting the right-hand side equal to δ and solving for ε gives: with probability at least 1 − δ, simultaneously for all h ∈ H,

R(h) ≤ R_{T^m}(h) + (b − a) √( (log 2|H| + log(1/δ)) / (2m) )

where the second term is denoted ε(m, |H|, δ). Recommendations that follow from the generalization bound: a larger training set m and a smaller hypothesis class |H| both shrink ε(m, |H|, δ).
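The complexity term can be evaluated directly (δ = 0.05 and |H| = 1000 are arbitrary illustration values):

```python
import math

# eps(m, |H|, delta) = (b - a) * sqrt((log 2|H| + log(1/delta)) / (2m))
def eps(m, H_size, delta, a=0.0, b=1.0):
    return (b - a) * math.sqrt(
        (math.log(2 * H_size) + math.log(1 / delta)) / (2 * m)
    )

# Illustration: the bound shrinks as 1/sqrt(m).
for m in (100, 1_000, 10_000):
    print(m, round(eps(m, 1000, 0.05), 4))
```

Note the mild (logarithmic) dependence on |H| and δ versus the 1/√m dependence on the sample size: multiplying m by 100 cuts ε(m, |H|, δ) by a factor of 10.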
[Figure: true risk R(h), empirical risk R_{T^m}(h), and the upper bound R_{T^m}(h) + ε(m, |H|, δ) plotted over hypotheses h_1, ..., h_{i*}, ..., h_K and classes H_1, ..., H_{i*}, ..., H_K.]