CH 3
Course of Machine Learning: Theory and Algorithms
by Professor HSSAYNI
Part 1: Machine learning theory
1. Learning framework
2. Uniform convergence:
   1. ε-representative sample.
   2. Uniform convergence.
3. Learnability of infinite-size hypothesis classes
4. Bias/variance tradeoff
5. Non-uniform learning.
Recap: PAC and agnostic PAC learning

1. PAC (realizable) setting: with probability at least 1 − δ over S ∼ D^m, the error L_D(h_S) is at most ε:
   P_{S∼D^m}[ L_{D,f}(h_S) > ε ] ≤ δ.

2. Agnostic PAC setting:
   P_{S∼D^m}[ L_D(h_S) > min_{h∈H} L_D(h) + ε ] ≤ δ.

The ERM rule selects the final hypothesis h_S from the hypothesis set H by minimizing the empirical error L_S:
   A(S) = h_S = argmin_{h∈H} L_S(h).
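To make the ERM rule concrete, here is a minimal sketch in Python (not from the course): a toy finite class of threshold classifiers with the 0-1 loss. The setting and the names (zero_one_loss, erm, the noisy threshold-0.3 distribution) are illustrative assumptions.

```python
import numpy as np

def zero_one_loss(h, X, y):
    """Empirical risk L_S(h): average 0-1 loss of h on the sample S."""
    return np.mean(h(X) != y)

def erm(hypotheses, X, y):
    """ERM rule: A(S) = h_S = argmin_{h in H} L_S(h)."""
    losses = [zero_one_loss(h, X, y) for h in hypotheses]
    return hypotheses[int(np.argmin(losses))]

# Toy finite class H: threshold classifiers h_t(x) = 1[x >= t]
H = [lambda X, t=t: (X >= t).astype(int) for t in np.linspace(0, 1, 21)]

# Sample S ~ D^m (here: x ~ U[0,1], label = 1[x >= 0.3], flipped with prob. 0.1)
rng = np.random.default_rng(0)
m = 200
X = rng.uniform(0, 1, m)
y = (X >= 0.3).astype(int) ^ (rng.uniform(0, 1, m) < 0.1)

h_S = erm(H, X, y)
print("L_S(h_S) =", zero_one_loss(h_S, X, y))
```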
Motivation
Our aim in this chapter is to prove the following proposition (without the realizability assumption):
• Every finite hypothesis class H has the uniform convergence property, with sample complexity m_H^{CU}(ε, δ), and is therefore agnostic PAC learnable with m_H^{APAC}(ε, δ) ≤ m_H^{CU}(ε/2, δ).
• The key tool is the ε-representativeness of the sample S: ∀h ∈ H, |L_S(h) − L_D(h)| ≤ ε.
2.1. ε-representative sample
Definition:
The sample S is ε-representative (with respect to domain Z, hypothesis class H, loss ℓ, and distribution D) if:
∀h ∈ H, |L_S(h) − L_D(h)| ≤ ε.
Notice:
If S is ε-representative, then ERM_H is a good learning strategy:
• for every h ∈ H, the empirical risk L_S(h) is close to the true risk L_D(h);
• hence minimizing L_S(h) approximately minimizes L_D(h).
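As an illustration of the definition, ε-representativeness can be checked numerically whenever L_D(h) has a closed form. A minimal sketch, assuming the toy setting from the previous snippet (x ∼ U[0,1], target threshold 0.3, 10% label noise), where L_D(h_t) = 0.1 + 0.8·|t − 0.3|; the function names are hypothetical.

```python
import numpy as np

def true_risk(t):
    """Closed-form L_D(h_t) for the toy distribution: error 0.1 where h_t agrees
    with the target threshold 0.3, error 0.9 on the disagreement strip |t - 0.3|."""
    return 0.1 + 0.8 * abs(t - 0.3)

def is_eps_representative(thresholds, X, y, eps):
    """Definition check: for all h in H, |L_S(h) - L_D(h)| <= eps."""
    for t in thresholds:
        L_S = np.mean((X >= t).astype(int) != y)
        if abs(L_S - true_risk(t)) > eps:
            return False
    return True

rng = np.random.default_rng(0)
m = 500
X = rng.uniform(0, 1, m)
y = (X >= 0.3).astype(int) ^ (rng.uniform(0, 1, m) < 0.1)
print(is_eps_representative(np.linspace(0, 1, 21), X, y, eps=0.1))
```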
Lemma:
If S is ε/2-representative with respect to (Z, H, ℓ, D), then every output of the ERM rule,
A(S) = h_S = argmin_{h∈H} L_S(h),
satisfies:
L_D(h_S) ≤ min_{h∈H} L_D(h) + ε.
Proof:
Let S be ε/2-representative. Then, ∀h ∈ H, we have:
|L_S(h) − L_D(h)| ≤ ε/2.
We know that h_S is the output of ERM_H, so:
L_S(h_S) = min_{h∈H} L_S(h).
So, ∀h ∈ H, we have:
L_S(h_S) ≤ L_S(h).
Since S is ε/2-representative, for h_S we also have:
L_D(h_S) ≤ L_S(h_S) + ε/2.
Combining the inequalities, ∀h ∈ H:
L_D(h_S) ≤ L_S(h_S) + ε/2 ≤ L_S(h) + ε/2 ≤ L_D(h) + ε/2 + ε/2 = L_D(h) + ε.
Then:
L_D(h_S) ≤ min_{h∈H} L_D(h) + ε. ∎
2.2. Uniform convergence
Definition:
We say that H has the uniform convergence property with respect to (Z, ℓ) if there exists:
▪ a function m_H^{CU} : (0,1)² → ℕ such that, for every ε, δ ∈ (0,1) and every distribution D over Z:
▪ if S is a sample of size m ≥ m_H^{CU}(ε, δ) whose points are drawn i.i.d. from D, then with probability at least 1 − δ, S is ε-representative:
P_{S∼D^m}[ ∀h ∈ H, |L_S(h) − L_D(h)| ≤ ε ] ≥ 1 − δ.
APAC: m_H^{APAC}(ε, δ) is the minimal sample size for which H is agnostic PAC learnable:
P_{S∼D^m}[ L_D(h_S) > min_{h∈H} L_D(h) + ε ] ≤ δ.
Definition: Sample complexity m_H^{CU}(ε, δ)
The function m_H^{CU} : (0,1)² → ℕ gives the minimal number of examples for which H satisfies uniform convergence with accuracy ε and confidence δ.
If H has the uniform convergence property, then H is agnostic PAC learnable, with:
m_H^{APAC}(ε, δ) ≤ m_H^{CU}(ε/2, δ).
General Objective
Objective of proof:
Prove that if H is a finite class of hypotheses, then H is agnostic PAC learnable.
But, we know that:
Remarks
If S is ε/2-representative (with high probability), then H has the uniform convergence property, and hence H is agnostic PAC learnable (Theorem 1 and the Lemma).
Objective: reformulation
We simply need to prove that, if H is finite and we have a sufficient amount of data, then S is ε-representative (with probability 1 − δ).
Then H is agnostic PAC learnable (by the remarks above).
Concentration inequalities are statistical tools that quantify the deviation between the empirical mean and the true mean when the sample size m is finite. In our setting, for a fixed h, the empirical risk is an average:
L_S(h) = (1/m) Σ_{i=1}^m ℓ(h, z_i),  with expectation  L_D(h) = E_{z∼D}[ℓ(h, z)].
Notice:
The probability of a large deviation decreases exponentially as the size of the sample increases.
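This exponential decrease can be checked by simulation. A minimal sketch, assuming Bernoulli(0.5) losses θ_i ∈ [0, 1] (so μ = 0.5), comparing the observed deviation frequency with the Hoeffding bound 2e^{−2mε²}; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, trials = 0.5, 0.1, 10_000

for m in (50, 100, 200, 400):
    # 'trials' independent samples of size m, theta_i ~ Bernoulli(mu), theta_i in [0, 1]
    means = rng.binomial(1, mu, size=(trials, m)).mean(axis=1)
    empirical = np.mean(np.abs(means - mu) > eps)
    bound = 2 * np.exp(-2 * m * eps**2)
    print(f"m={m:4d}  P[|mean - mu| > eps] ~ {empirical:.4f}  <=  {bound:.4f}")
```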
Proof - Step 1
Let us suppose that H contains a single hypothesis h, and let us prove that S is ε-representative:
|L_D(h) − L_S(h)| ≤ ε.
This amounts to proving that:
P_{S∼D^m}[ |L_D(h) − L_S(h)| > ε ]  is small.
To bound this probability we will use Hoeffding's inequality.
We have:
L_S(h) = (1/m) Σ_{i=1}^m ℓ(h, z_i)  and  L_D(h) = E_{z∼D}[ℓ(h, z)].
In that case, setting θ_i = ℓ(h, z_i), the θ_i are i.i.d. random variables with:
E_{z∼D}[θ_i] = L_D(h),  (1/m) Σ_{i=1}^m θ_i = L_S(h),  and θ_i ∈ [0, 1].
Then, by Hoeffding's inequality:
P_{S∼D^m}[ |(1/m) Σ_{i=1}^m θ_i − L_D(h)| > ε ] ≤ 2 e^{−2mε²},
that is:
P_{S∼D^m}[ |L_D(h) − L_S(h)| > ε ] ≤ 2 e^{−2mε²}.  (Eq. 1)
So:
|L_D(h) − L_S(h)| ≤ ε with high probability if m is sufficiently large.
Proof - Step 2
Now, let us generalize Eq. 1 to all hypotheses h ∈ H.
Let us suppose that H contains several hypotheses, and let us prove, using Boole's inequality (the union bound), that S is ε-representative:
∀h ∈ H, |L_D(h) − L_S(h)| ≤ ε.
We have to show that:
P_{S∼D^m}[ ∃h ∈ H : |L_D(h) − L_S(h)| > ε ]  is small (the probability of failure of S is small).
S fails if it is not ε-representative.
P_{S∼D^m}[ S is not ε-representative ] = P_{S∼D^m}[ ∃h ∈ H : |L_D(h) − L_S(h)| > ε ].
We have:
{ S : ∃h ∈ H, |L_D(h) − L_S(h)| > ε } = ∪_{h∈H} { S : |L_D(h) − L_S(h)| > ε }.
According to Boole's inequality:
P_{S∼D^m}[ ∃h ∈ H : |L_D(h) − L_S(h)| > ε ] ≤ Σ_{h∈H} P_{S∼D^m}[ |L_D(h) − L_S(h)| > ε ].
According to Hoeffding's inequality, for each h ∈ H:
P_{S∼D^m}[ |L_D(h) − L_S(h)| > ε ] ≤ 2 e^{−2mε²}.
So:
Σ_{h∈H} P_{S∼D^m}[ |L_D(h) − L_S(h)| > ε ] ≤ Σ_{h∈H} 2 e^{−2mε²} = 2|H| e^{−2mε²}.
Then we have:
P_{S∼D^m}[ S is not ε-representative ] ≤ 2|H| e^{−2mε²}.
So:
P_{S∼D^m}[ ∀h ∈ H, |L_D(h) − L_S(h)| ≤ ε ] ≥ 1 − 2|H| e^{−2mε²}.  (Eq. 2)
Finally:
S is ε-representative with high probability if m is sufficiently large.
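Eq. 2 can also be illustrated numerically: estimate the probability that some h ∈ H deviates by more than ε, and compare it to 2|H|e^{−2mε²}. A minimal sketch reusing the toy threshold class from the earlier snippets (the union bound is loose, so the observed failure rate should sit well below the bound); all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
thresholds = np.linspace(0, 1, 21)   # finite class H, |H| = 21
eps, m, trials = 0.1, 300, 2000

def true_risk(t):
    # L_D(h_t) for x ~ U[0,1], target threshold 0.3, 10% label noise (toy assumption)
    return 0.1 + 0.8 * abs(t - 0.3)

failures = 0
for _ in range(trials):
    X = rng.uniform(0, 1, m)
    y = (X >= 0.3).astype(int) ^ (rng.uniform(0, 1, m) < 0.1)
    # S fails if it is not eps-representative: some h in H deviates by more than eps
    if any(abs(np.mean((X >= t).astype(int) != y) - true_risk(t)) > eps
           for t in thresholds):
        failures += 1

bound = 2 * len(thresholds) * np.exp(-2 * m * eps**2)
print(f"P[S not eps-representative] ~ {failures/trials:.4f}  <=  {bound:.4f}")
```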
Proof - Step 3
Let us determine the number of examples needed so that S is ε-representative.
We know that m_H^{CU}(ε, δ) is the minimal number of examples for which S is ε-representative with probability at least 1 − δ.
So, we want P_{S∼D^m}[ S is not ε-representative ] ≤ δ.
So:
2|H| e^{−2mε²} ≤ δ  ⟺  m ≥ ln(2|H|/δ) / (2ε²).
Hence, the necessary number of examples is:
m_H^{CU}(ε, δ) = ⌈ ln(2|H|/δ) / (2ε²) ⌉.
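The resulting formula is easy to evaluate. A minimal sketch (the numeric values are purely illustrative):

```python
import math

def m_CU(eps, delta, H_size):
    """m_H^CU(eps, delta) = ceil( ln(2|H|/delta) / (2 eps^2) ) for a finite class."""
    return math.ceil(math.log(2 * H_size / delta) / (2 * eps**2))

# Example: |H| = 1000, accuracy eps = 0.05, confidence delta = 0.01
print(m_CU(0.05, 0.01, 1000))   # -> 2442 examples suffice
```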
Theorem 2:
Let H be a finite class of hypotheses, let Z be the data domain, and let ℓ : H × Z → [0, 1] be the loss function.
Then H has the uniform convergence property with sample complexity:
m_H^{CU}(ε, δ) ≤ ⌈ ln(2|H|/δ) / (2ε²) ⌉.
Moreover, H is agnostic PAC learnable using the ERM_H algorithm, with sample complexity:
m_H^{APAC}(ε, δ) ≤ m_H^{CU}(ε/2, δ) ≤ ⌈ 2 ln(2|H|/δ) / ε² ⌉.
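The two sample complexities of Theorem 2 can be computed directly: halving ε in the uniform-convergence bound yields the stated agnostic PAC bound (about four times more data). A minimal sketch with illustrative values:

```python
import math

def m_CU(eps, delta, H_size):
    return math.ceil(math.log(2 * H_size / delta) / (2 * eps**2))

def m_APAC(eps, delta, H_size):
    # Theorem 2: m_H^APAC(eps, delta) <= m_H^CU(eps/2, delta)
    #          = ceil( 2 ln(2|H|/delta) / eps^2 )
    return m_CU(eps / 2, delta, H_size)

eps, delta, H_size = 0.05, 0.01, 1000
print(m_CU(eps, delta, H_size))    # -> 2442 (uniform convergence at accuracy eps)
print(m_APAC(eps, delta, H_size))  # -> 9765 (agnostic PAC via ERM, ~4x more data)
```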
Supervised Learning Passive Offline Algorithm (SLPOA)
1. PAC setting: P_{S∼D^m}[ L_{D,f}(h_S) > ε ] ≤ δ,  i.e.  L_D(h_S) ≤ ε with probability at least 1 − δ.