CH 3

Université Mohammed V de Rabat

École Nationale Supérieure d'Informatique et d'Analyse des Systèmes


-----------------------------------------------------------------------------------------------------------------

Course of
Machine Learning: Theory and Algorithms
by
Professor HSSAYNI

National Higher School of Computer Science and Systems Analysis


Mohammed V University in Rabat
Email: [email protected]
Tel: 0635906499

Part 1: Machine learning theory

1. Learning framework
2. Uniform convergence:
   1. ε-representative sample.
   2. Uniform convergence.
3. Learnability of infinite-size hypothesis classes
4. Bias/variance tradeoff
5. Non-uniform learning.



Reminder: the supervised learning pipeline

1. Input generation: the inputs $x_i$ are drawn from an unknown distribution $\mathcal{D}_X$, giving $x_1, x_2, \dots, x_M$.
2. Labeling: each input $x_i$ is labeled by $\mathcal{D}((x, y)/x)$, giving $y_1, y_2, \dots, y_M$.
3. Data preprocessing, with the sample-size requirements:
   1. PAC: $\forall \varepsilon, \delta \in (0,1)$, $m \ge m_H^{PAC}(\varepsilon, \delta) = \left\lceil \frac{\ln(|H|/\delta)}{\varepsilon} \right\rceil$;
   2. APAC: $\forall \varepsilon, \delta \in (0,1)$, $m \ge m_H^{APAC}(\varepsilon, \delta)$.
4. Split into a training sample $S = ((x_1, y_1), \dots, (x_m, y_m))$ and a testing sample $S_{test} = ((x_{m+1}, y_{m+1}), \dots, (x_M, y_M))$.
5. ERM over the hypothesis set $H$: $A_\alpha(S) = h_S = \operatorname{argmin}_{h \in H} L_S(h)$ (the final hypothesis).
6. Model measure: compare the error $L_S(h_S)$ with $L_{\mathcal{D}}(h_S)$; the guarantees are
   1. $P_{S \sim \mathcal{D}^m}[S_X : L_{\mathcal{D},f}(h_S) > \varepsilon] \le \delta$ (PAC),
   2. $P_{S \sim \mathcal{D}^m}[S_X : L_{\mathcal{D}}(h_S) > \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon] \le \delta$ (APAC),
   so that $L_{\mathcal{D}}(h_S) < \varepsilon$ (resp. $L_{\mathcal{D}}(h_S) < \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon$) with probability at least $1 - \delta$.


Reminder

• PAC: assumes the realizability hypothesis (a target $f \in H$ with $L_{\mathcal{D},f}(f) = 0$ exists).
• APAC: agnostic PAC; the realizability hypothesis on $f$ is dropped.
• If $H$ is finite and the realizability hypothesis holds, then $H$ is PAC learnable for every $m \ge m_H^{PAC}(\varepsilon, \delta) = \left\lceil \frac{\ln(|H|/\delta)}{\varepsilon} \right\rceil$.

PAC learning:
• $H$ is PAC learnable if there exists $m_H^{PAC} : (0,1)^2 \to \mathbb{N}$ such that, for every $\varepsilon, \delta \in (0,1)$, every distribution $\mathcal{D}$ and every target $f$ (under realizability), training on $S \sim \mathcal{D}^m$ with $m \ge m_H^{PAC}(\varepsilon, \delta)$ yields $h_S$ with
$P_{S \sim \mathcal{D}^m}[S_X : L_{\mathcal{D},f}(h_S) > \varepsilon] \le \delta$.

APAC learning:
• $H$ is agnostic PAC learnable if there exists $m_H^{APAC} : (0,1)^2 \to \mathbb{N}$ such that, for every $\varepsilon, \delta \in (0,1)$ and every distribution $\mathcal{D}$, training on $S \sim \mathcal{D}^m$ with $m \ge m_H^{APAC}(\varepsilon, \delta)$ yields $h_S$ with
$P_{S \sim \mathcal{D}^m}\!\left[L_{\mathcal{D}}(h_S) > \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon\right] \le \delta$.
Motivation

Our aim in this chapter is to prove the following proposition (without the realizability hypothesis):

If $H$ is a finite class of hypotheses, then $H$ follows agnostic PAC learning.

Tool:
Uniform convergence.
Learning: uniform convergence (UC) $\Longrightarrow$ agnostic PAC learning.

Uniform convergence (UC), preview:
• $H$ has the UC property if there exists $m_H^{UC} : (0,1)^2 \to \mathbb{N}$ such that, for every $\varepsilon, \delta \in (0,1)$ and every distribution $\mathcal{D}$, a sample $S \sim \mathcal{D}^m$ with $m \ge m_H^{UC}(\varepsilon, \delta)$ is, with probability at least $1 - \delta$, $\varepsilon$-representative:
$\forall h \in H, \ |L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon$.
2.1. ε-representative sample

Definition:
The sample $S$ is $\varepsilon$-representative (with respect to the domain $Z$, the hypothesis set $H$, the loss $\ell$ and the distribution $\mathcal{D}$) if:
$\forall h \in H, \quad |L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon$.

Notice:
If $S$ is $\varepsilon$-representative, then $ERM_H$ is a good learning strategy:
• the empirical risk $L_S(h)$ is close to the true risk $L_{\mathcal{D}}(h)$, uniformly over $H$;
• hence minimizing $L_S$ approximately minimizes $L_{\mathcal{D}}$.
2.1. ε-representative sample

Lemma:
If $S$ is $\varepsilon/2$-representative with respect to $(Z, H, \ell, \mathcal{D})$, then:
$L_{\mathcal{D}}(h_S) \le \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon$,

such that:
$A_\alpha(S) = h_S = \operatorname{argmin}_{h \in H} L_S(h)$ (the output of the $ERM_H$ rule).
2.1. ε-representative sample

Proof:
Let $S$ be $\varepsilon/2$-representative; then, $\forall h \in H$, we have:
$|L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon/2$.
We know that $h_S$ is the output of $ERM_H$, so:
$L_S(h_S) \le L_S(h), \quad \forall h \in H$.
So, $\forall h \in H$, we have:
$L_{\mathcal{D}}(h_S) \le L_S(h_S) + \varepsilon/2 \le L_S(h) + \varepsilon/2$.
Since $S$ is $\varepsilon/2$-representative, then for every $h \in H$ we also have:
$L_S(h) \le L_{\mathcal{D}}(h) + \varepsilon/2$.
2.1. ε-representative sample

Proof (continued):
Combining the two inequalities:
$L_{\mathcal{D}}(h_S) \le L_S(h_S) + \varepsilon/2 \le L_S(h) + \varepsilon/2 \le L_{\mathcal{D}}(h) + \varepsilon$.
So, $\forall h \in H$:
$L_{\mathcal{D}}(h_S) \le L_{\mathcal{D}}(h) + \varepsilon$.
Then:
$L_{\mathcal{D}}(h_S) \le \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon$. ∎
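To make the Lemma concrete, here is a minimal numerical sketch (illustrative code, not part of the course; the distribution, the threshold class $h_t(x) = \mathbb{1}[x \ge t]$ and helper names such as `risk` are assumptions). It runs $ERM_H$ over a small finite class and checks $L_{\mathcal{D}}(h_S) \le \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon$, taking $\varepsilon/2$ to be the largest observed gap $\max_{h \in H} |L_S(h) - L_{\mathcal{D}}(h)|$:

```python
# Minimal sketch (illustrative, not the course's code): ERM over a finite
# class of threshold classifiers h_t(x) = 1[x >= t], with the 0-1 loss.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n labeled examples: x ~ U[0,1], label 1[x >= 0.5] flipped w.p. 0.1."""
    x = rng.uniform(0, 1, n)
    flip = rng.uniform(0, 1, n) < 0.1
    y = (x >= 0.5).astype(int) ^ flip.astype(int)
    return x, y

def risk(t, x, y):
    """0-1 risk of h_t on the sample (x, y)."""
    return float(np.mean((x >= t).astype(int) != y))

thresholds = np.linspace(0, 1, 21)           # finite hypothesis class H, |H| = 21

x_tr, y_tr = sample(500)                     # training sample S
x_big, y_big = sample(500_000)               # large fresh sample: estimates L_D

LS = np.array([risk(t, x_tr, y_tr) for t in thresholds])    # empirical risks L_S(h)
LD = np.array([risk(t, x_big, y_big) for t in thresholds])  # (estimated) true risks L_D(h)

h_S = int(np.argmin(LS))                     # ERM output: index of argmin of L_S
eps = 2 * np.max(np.abs(LS - LD))            # S is (eps/2)-representative for this eps
print(f"L_D(h_S)    = {LD[h_S]:.3f}")
print(f"lemma bound = {LD.min() + eps:.3f}   (min L_D + eps)")
```

Since $\varepsilon/2$ is taken as the worst gap over $H$, $S$ is $\varepsilon/2$-representative by construction here, so the printed $L_{\mathcal{D}}(h_S)$ should never exceed the lemma's bound.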
2.2. Uniform convergence

Definition:
We say that $H$ has the uniform convergence property with respect to $(Z, \ell)$ if there exists:
▪ a function $m_H^{UC} : (0,1)^2 \to \mathbb{N}$ such that, for every $\varepsilon, \delta \in (0,1)$ and every distribution $\mathcal{D}$ over $Z$:
▪ if $S$ is a sample of size $m \ge m_H^{UC}(\varepsilon, \delta)$ whose points are drawn i.i.d. from $\mathcal{D}$, then, with probability at least $1 - \delta$, $S$ is $\varepsilon$-representative:
$P_{S \sim \mathcal{D}^m}[\forall h \in H : |L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon] \ge 1 - \delta$,
equivalently
$P_{S \sim \mathcal{D}^m}[\exists h \in H : |L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon] \le \delta$ (the bad event: $S$ is not $\varepsilon$-representative),

such that $A_\alpha(S) = h_S = \operatorname{argmin}_{h \in H} L_S(h)$.

Notice:
If $H$ has the uniform convergence property, it is called a "Glivenko-Cantelli class".



H: hypothesis set

PAC: with $m \ge m_H^{PAC}(\varepsilon, \delta)$:
$P_{S \sim \mathcal{D}^m}[S_X : L_{\mathcal{D},f}(h_S) > \varepsilon] \le \delta$

APAC: with $m \ge m_H^{APAC}(\varepsilon, \delta)$:
$P_{S \sim \mathcal{D}^m}\!\left[L_{\mathcal{D}}(h_S) > \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon^{APAC}\right] \le \delta$

UC: with $m \ge m_H^{UC}(\varepsilon^{UC}, \delta)$:
$P_{S \sim \mathcal{D}^m}[\exists h \in H : |L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon] \le \delta$,
i.e. $S$ is ε-representative with probability at least $1 - \delta$.



2.2. Uniform convergence

Definition: Sample complexity
The function $m_H^{UC} : (0,1)^2 \to \mathbb{N}$ determines the minimal number of data points for which $H$ follows uniform convergence with accuracy $\varepsilon$ and confidence $\delta$.

Theorem 1 (from the Lemma and the definition of UC):
If $H$ follows uniform convergence with sample complexity $m_H^{UC}$, then $H$ follows agnostic PAC learning with sample complexity $m_H^{APAC}$ such that:
$m_H^{APAC}(\varepsilon, \delta) \le m_H^{UC}(\varepsilon/2, \delta)$.
Moreover, $ERM_H$ succeeds in the agnostic PAC learning of $H$.

Proof idea: with $m \ge m_H^{UC}(\varepsilon/2, \delta)$ examples, $S$ is $\varepsilon/2$-representative with probability at least $1 - \delta$, and the Lemma then gives $L_{\mathcal{D}}(h_S) \le \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon$ for the $ERM_H$ output.


2.2. Uniform convergence

General objective
Objective of the proof:
Prove that if $H$ is a finite class of hypotheses, then $H$ is agnostic PAC learnable.
But we already know that:
Remark (Theorem 1 and the Lemma):
$S$ is ε-representative (with high probability) $\Rightarrow$ $H$ has the uniform convergence property $\Rightarrow$ $H$ is agnostic PAC learnable.

Objective, reformulated:
We simply have to prove that, for every $h \in H$, given a sufficient amount of data, $S$ is ε-representative (with probability $1 - \delta$).
Then $H$ is agnostic PAC learnable (by the Remark).
2.2. Uniform convergence

Proof strategy for the general objective

Step 1:
Consider that $H$ owns a single hypothesis $h$, and let's prove that $S$ is ε-representative:
$|L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon$ with high probability.
Step 2:
Suppose that $H$ owns many hypotheses, and let's prove, using Boole's inequality, that $S$ is ε-representative:
$\forall h \in H, \ |L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon$ with high probability.
Step 3:
Determine the necessary number of data points so that $S$ can be ε-representative.



2.2. Uniform convergence

Definition: Law of large numbers

Let $\theta_1, \theta_2, \dots, \theta_m$ be i.i.d. random variables with real mean $\mu = E[\theta_i]$, and let $\frac{1}{m} \sum_{i=1}^m \theta_i$ be their empirical mean.
So, if $m \to \infty$, the empirical mean converges to the real mean, with probability equal to 1.

Concentration inequalities are statistical tools that quantify the deviation between the empirical mean and the real mean when $m$ is finite.

Among these inequalities, there exists "Hoeffding's inequality".

2.2. Uniform convergence

Definition: Hoeffding's inequality

Suppose that $\theta_1, \theta_2, \dots, \theta_m$ are i.i.d. random variables with real mean $\mu$, taking values in $[a, b]$. Then, for every $\varepsilon > 0$:
$P\!\left[\left|\frac{1}{m} \sum_{i=1}^m \theta_i - \mu\right| > \varepsilon\right] \le 2 \exp\!\left(\frac{-2m\varepsilon^2}{(b-a)^2}\right)$.
(Here the real mean $\mu$ plays the role of the general risk $L_{\mathcal{D}}$, and the empirical mean plays the role of $L_S$.)

Notice:
This probability decreases exponentially as the size $m$ of the sample increases.
In our setting, $L_{\mathcal{D}}(h) = E_{z \sim \mathcal{D}}[\ell(h, z)]$ and $L_S(h) = \frac{1}{m} \sum_{i=1}^m \ell(h, z_i)$.
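As a quick numerical sanity check of the inequality (an illustrative sketch, not part of the course material; the Bernoulli setup and all variable names are assumptions), one can simulate bounded i.i.d. variables and compare the empirical frequency of large deviations with the bound $2\exp(-2m\varepsilon^2/(b-a)^2)$:

```python
# Sketch: empirical check of Hoeffding's inequality for Bernoulli(0.3)
# variables, so a = 0, b = 1 and the bound is 2*exp(-2*m*eps^2).
import numpy as np

rng = np.random.default_rng(1)
m, eps, mu, trials = 200, 0.05, 0.3, 20_000

theta = rng.binomial(1, mu, size=(trials, m))    # `trials` independent samples of size m
emp_means = theta.mean(axis=1)                   # empirical mean of each sample
dev_freq = float(np.mean(np.abs(emp_means - mu) > eps))

print(f"P[|empirical mean - mu| > eps] ~= {dev_freq:.4f}")
print(f"Hoeffding bound                 = {2 * np.exp(-2 * m * eps**2):.4f}")
```

The observed deviation frequency should sit below the bound; the bound is loose for small $m$ but both decrease exponentially as $m$ grows.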
2.2. Uniform convergence

Proof - Step 1
Suppose that $H = \{h\}$, and let's prove that $S$ is ε-representative:
$|L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon$.
This amounts to proving that:
$P_{S \sim \mathcal{D}^m}[|L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon]$ is small.
To bound this probability we will use Hoeffding's inequality.
We have:
$L_S(h) = \frac{1}{m} \sum_{i=1}^m \ell(h, z_i)$ and $L_{\mathcal{D}}(h) = E_{z \sim \mathcal{D}}[\ell(h, z)]$.
2.2. Uniform convergence

Proof - Step 1
In that case, take $\theta_i = \ell(h, z_i)$, so that $E[\theta_i] = L_{\mathcal{D}}(h)$, $\frac{1}{m} \sum_{i=1}^m \theta_i = L_S(h)$, and $\theta_i \in [0, 1]$ (hence $(b - a)^2 = 1$).
Then, by Hoeffding's inequality:
$P_{S \sim \mathcal{D}^m}[|L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon] \le 2 \exp(-2m\varepsilon^2)$.   (Eq. 1)
So:
$|L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon$ with high probability if $m$ is sufficiently large.



2.2. Uniform convergence

Proof - Step 2
Now, let's generalize Eq. 1 to all hypotheses $h \in H$.
Suppose that $H$ owns several hypotheses, and let's prove, using Boole's inequality, that $S$ is ε-representative:
$\forall h \in H, \ |L_S(h) - L_{\mathcal{D}}(h)| \le \varepsilon$.
We have: if $S$ is ε-representative, the probability of failure of $ERM_H$ is small, and $ERM_H$ fails if $S$ is not ε-representative.
This amounts to proving that:
$P_{S \sim \mathcal{D}^m}[S \text{ is not } \varepsilon\text{-representative}]$ is small.



2.2. Uniform convergence

Proof - Step 2
$P_{S \sim \mathcal{D}^m}[S \text{ is not } \varepsilon\text{-representative with respect to } (Z, H, \ell, \mathcal{D})] = P_{S \sim \mathcal{D}^m}[\exists h \in H : |L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon]$.
We have:
$\{S : \exists h \in H, |L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon\} = \bigcup_{h \in H} \{S : |L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon\}$.
According to Boole's inequality:
$P_{S \sim \mathcal{D}^m}\!\left[\bigcup_{h \in H} \{|L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon\}\right] \le \sum_{h \in H} P_{S \sim \mathcal{D}^m}[|L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon]$.
According to Hoeffding's inequality, we have:
$P_{S \sim \mathcal{D}^m}[|L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon] \le 2 \exp(-2m\varepsilon^2)$.



2.2. Uniform convergence

Proof - Step 2
So:
$\sum_{h \in H} P_{S \sim \mathcal{D}^m}[|L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon] \le \sum_{h \in H} 2 \exp(-2m\varepsilon^2) = 2|H| \exp(-2m\varepsilon^2)$.
Then we have:
$P_{S \sim \mathcal{D}^m}[S \text{ is not } \varepsilon\text{-representative}] \le 2|H| \exp(-2m\varepsilon^2)$.
So:
$P_{S \sim \mathcal{D}^m}[\exists h \in H : |L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon] \le 2|H| \exp(-2m\varepsilon^2)$.   (Eq. 2)
Finally:
$S$ is ε-representative with high probability if $m$ is sufficiently big.

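Eq. 2 can also be checked numerically. In the sketch below (illustrative, not from the course; each "hypothesis" is identified with its 0-1 loss variables $\ell(h, z_i) \sim \text{Bernoulli}(L_{\mathcal{D}}(h))$, and all names are assumptions), the failure probability $P[\exists h \in H : |L_S(h) - L_{\mathcal{D}}(h)| > \varepsilon]$ is estimated by simulation and compared with the union bound $2|H|\exp(-2m\varepsilon^2)$:

```python
# Sketch: empirical check of Eq. 2 for a finite class with |H| = 10.
# Each hypothesis h is represented by its loss sequence: i.i.d. Bernoulli(mu_h),
# so L_D(h) = mu_h and L_S(h) is the mean of m sampled losses.
import numpy as np

rng = np.random.default_rng(2)
m, eps, trials = 300, 0.08, 2_000
mus = np.linspace(0.1, 0.4, 10)                  # true risks L_D(h) of the 10 hypotheses

losses = rng.binomial(1, mus, size=(trials, m, mus.size))  # loss(h, z_i), per trial
LS = losses.mean(axis=1)                                   # L_S(h): shape (trials, |H|)
not_rep = np.any(np.abs(LS - mus) > eps, axis=1)           # is S not eps-representative?

print(f"P[S not eps-representative] ~= {not_rep.mean():.4f}")
print(f"2|H|exp(-2m eps^2)           = {2 * mus.size * np.exp(-2 * m * eps**2):.4f}")
```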


2.2. Uniform convergence

Proof - Step 3
Let's determine the necessary number of data points so that $S$ can be ε-representative.
We know that $m_H^{UC}(\varepsilon, \delta)$ is the minimal number of data points so that $S$ can be ε-representative with probability at least $1 - \delta$.
So, we want $P_{S \sim \mathcal{D}^m}[S \text{ is not } \varepsilon\text{-representative}]$ to be at most $\delta$:
$2|H| \exp(-2m\varepsilon^2) \le \delta \iff \exp(-2m\varepsilon^2) \le \frac{\delta}{2|H|} \iff m \ge \frac{\ln(2|H|/\delta)}{2\varepsilon^2}$.
Hereby, the necessary number of data points is:
$m \ge \left\lceil \frac{\ln(2|H|/\delta)}{2\varepsilon^2} \right\rceil$.

2.2. Uniform convergence

Theorem 2:
Let $H$ be a finite class of hypotheses, let $Z$ be a set of data, and let $\ell : H \times Z \to [0, 1]$ be the cost function.
Then $H$ follows uniform convergence learning with sample complexity:
$m_H^{UC}(\varepsilon, \delta) \le \left\lceil \frac{\ln(2|H|/\delta)}{2\varepsilon^2} \right\rceil$.
Moreover, $H$ follows agnostic PAC learning with the $ERM_H$ algorithm, with sample complexity:
$m_H^{APAC}(\varepsilon, \delta) \le m_H^{UC}(\varepsilon/2, \delta) \le \left\lceil \frac{2 \ln(2|H|/\delta)}{\varepsilon^2} \right\rceil$.
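Theorem 2 can be turned into concrete sample sizes with a small helper (a sketch under the theorem's assumptions, i.e. finite $H$ and losses in $[0, 1]$; the function names `m_uc` and `m_apac` are illustrative):

```python
# Sketch: sample-complexity bounds of Theorem 2 for a finite class H.
import math

def m_uc(H_size: int, eps: float, delta: float) -> int:
    """Uniform convergence: ceil(ln(2|H|/delta) / (2*eps^2))."""
    return math.ceil(math.log(2 * H_size / delta) / (2 * eps**2))

def m_apac(H_size: int, eps: float, delta: float) -> int:
    """Agnostic PAC via m_uc(eps/2, delta) = ceil(2*ln(2|H|/delta) / eps^2)."""
    return m_uc(H_size, eps / 2, delta)

# Example: |H| = 1000 hypotheses, accuracy eps = 0.05, confidence delta = 0.05.
print(m_uc(1000, 0.05, 0.05))    # -> 2120
print(m_apac(1000, 0.05, 0.05))  # -> 8478
```

The roughly factor-of-4 gap between the two values reflects the $\varepsilon \mapsto \varepsilon/2$ substitution from Theorem 1.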
Supervised Learning Passive Offline Algorithm (SLPOA)

1. Input generation: the inputs $x_i$ are drawn from the unknown distribution $\mathcal{D}_X$, giving $x_1, x_2, \dots, x_M$.
2. Labeling: each input $x_i$ is labeled by $\mathcal{D}((x, y)/x)$, giving $y_1, y_2, \dots, y_M$.
3. Data preprocessing, with the sample-size requirements:
   1. PAC: $\forall \varepsilon, \delta$: $m \ge m_H^{PAC}(\varepsilon, \delta) = \left\lceil \frac{\ln(|H|/\delta)}{\varepsilon} \right\rceil$;
   2. APAC: $\forall \varepsilon, \delta$: $m \ge m_H^{APAC}(\varepsilon, \delta) = \left\lceil \frac{2\ln(2|H|/\delta)}{\varepsilon^2} \right\rceil$;
   3. UC: $\forall \varepsilon, \delta$: $m \ge m_H^{UC}(\varepsilon, \delta) = \left\lceil \frac{\ln(2|H|/\delta)}{2\varepsilon^2} \right\rceil$.
4. Split into a training sample $S = ((x_1, y_1), \dots, (x_m, y_m))$ and a testing sample $S_{test} = ((x_{m+1}, y_{m+1}), \dots, (x_M, y_M))$.
5. ERM over the hypothesis set $H$: $A_\alpha(S) = h_S = \operatorname{argmin}_{h \in H} L_S(h)$ (the final hypothesis).
6. Model measure, comparing the error $L_S(h_S)$ with the target $L_{\mathcal{D}}(h_S) \le \varepsilon$:
   1. $P_{S \sim \mathcal{D}^m}[S_X : L_{\mathcal{D},f}(h_S) > \varepsilon] \le \delta$ (PAC);
   2. $P_{S \sim \mathcal{D}^m}[S_X : L_{\mathcal{D}}(h_S) > \min_{h \in H} L_{\mathcal{D}}(h) + \varepsilon] \le \delta$ (APAC);
   3. $P_{S \sim \mathcal{D}^m}[S_X : |L_S(h_S) - L_{\mathcal{D}}(h_S)| > \varepsilon] \le \delta$ (UC).

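Putting the SLPOA loop together, here is a compact end-to-end sketch (illustrative assumptions throughout: the data distribution, the 51-threshold class, and every name in the code are not the course's). It draws the number of examples prescribed by the UC bound of Theorem 2, runs ERM, and compares training and testing error:

```python
# Sketch: the SLPOA pipeline end to end (illustrative, not the course's code).
import math
import numpy as np

rng = np.random.default_rng(3)

def sample(n):
    """x ~ U[0,1]; label 1[x >= 0.5], flipped with probability 0.1."""
    x = rng.uniform(0, 1, n)
    y = (x >= 0.5).astype(int) ^ (rng.uniform(0, 1, n) < 0.1).astype(int)
    return x, y

H = np.linspace(0, 1, 51)                      # finite hypothesis set, |H| = 51
eps, delta = 0.05, 0.05

# 3. Required training size from the UC bound of Theorem 2.
m = math.ceil(math.log(2 * H.size / delta) / (2 * eps**2))

# 4. Training / testing split.
x_tr, y_tr = sample(m)
x_te, y_te = sample(10_000)

# 5. ERM: pick h_S minimizing the empirical risk L_S(h).
LS = np.array([np.mean((x_tr >= t).astype(int) != y_tr) for t in H])
t_S = H[np.argmin(LS)]

# 6. Model measure: training vs. testing error of h_S; with probability
#    at least 1 - delta, |L_S(h) - L_D(h)| <= eps for every h in H.
L_train = float(np.mean((x_tr >= t_S).astype(int) != y_tr))
L_test = float(np.mean((x_te >= t_S).astype(int) != y_te))
print(f"m = {m}, h_S threshold = {t_S:.2f}")
print(f"L_S(h_S) = {L_train:.3f}, L_test(h_S) = {L_test:.3f} (gap <= {eps} w.h.p.)")
```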
