supervised learning

The document discusses supervised learning and focuses on empirical risk minimization (ERM) as a method for estimating the lowest expected risk in classification problems. It outlines the concepts of generalization error, excess risk decomposition, and the challenges of overfitting in relation to ERM. The document also emphasizes the importance of understanding the relationship between empirical and expected risks, and introduces Hoeffding's inequality as a tool for controlling generalization error.

Supervised Learning (COMP0078)

6. Learning Theory (Part I)

Carlo Ciliberto

University College London

Department of Computer Science

Previous Classes

In the previous classes we:

• Formulated the supervised learning problem.

• Focused on Empirical Risk Minimization (ERM).
• Discussed overfitting and how to tackle it (heuristically).

Previous Classes

We studied a few different algorithms (in particular focusing on classification):

• Nearest neighbor
• Least-squares
• Support Vector Machines
• Logistic Regression
• Decision Trees
• Ensemble methods (Bagging, Boosting)

We observed that many of these methods can be formulated as ERM problems.

Outline

Equipped with our shiny new tool-set of supervised learning algorithms, we will now go back to asking ourselves the question:

    When/why/how do these methods “work”?

Outline:

• Empirical Risk Minimization (Again!)
• Generalization error
• Excess Risk Decomposition
• Regularization & Bias-Variance Trade-off (Again!)

Refresher on the Learning Problem

The goal of Supervised Learning is to find a “good” estimator $f_n : \mathcal{X} \to \mathcal{Y}$, approximating the lowest expected risk

\[
\inf_{f : \mathcal{X} \to \mathcal{Y}} \mathcal{E}(f), \qquad \mathcal{E}(f) = \int_{\mathcal{X} \times \mathcal{Y}} \ell(f(x), y) \, d\rho(x, y)
\]

given only a finite number of (training) examples $(x_i, y_i)_{i=1}^n$ sampled independently from the unknown distribution $\rho$.

A Wishlist

What do we mean by... “good”?

To have a low excess risk $\mathcal{E}(f_n) - \mathcal{E}(f_*)$

• Consistency. Does $\mathcal{E}(f_n) - \mathcal{E}(f_*) \to 0$
    • in Expectation?
    • in Probability?
  as the size of the training set $S = (x_i, y_i)_{i=1}^n$ of points randomly sampled from $\rho$ grows, $n \to +\infty$?

• Learning Rates. How “fast” is consistency achieved? Nonasymptotic bounds: finite sample complexity, tail bounds, error bounds...

Empirical Risk as a Proxy

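If $\rho$ is unknown, can we say anything about $\mathcal{E}(f_n) - \inf_{f \in \mathcal{F}} \mathcal{E}(f)$?

We only “glimpse” $\rho$ via the samples $(x_i, y_i)_{i=1}^n$. Can we use them to gather some information about $\rho$ (or better, on $\mathcal{E}(f)$)?

Consider a function $f : \mathcal{X} \to \mathcal{Y}$ and its empirical risk

\[
\mathcal{E}_n(f) = \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i)
\]

Is this a good idea? A simple calculation shows that

\[
\mathbb{E}_{S \sim \rho^n}[\mathcal{E}_n(f)] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}_{(x_i, y_i) \sim \rho}[\ell(f(x_i), y_i)] = \frac{1}{n} \sum_{i=1}^n \mathcal{E}(f) = \mathcal{E}(f)
\]

The expectation of $\mathcal{E}_n(f)$ is the expected risk $\mathcal{E}(f)$!
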
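The unbiasedness of the empirical risk can be checked numerically. The sketch below is only an illustration: the toy distribution (x uniform on [-1, 1], labels given by a threshold at 0.3), the fixed predictor `f` and the 0-1 loss are assumptions chosen so that the expected risk is known in closed form; none of them come from the slides.

```python
import random

def empirical_risk(f, loss, sample):
    """Average loss of f over a finite sample [(x_1, y_1), ..., (x_n, y_n)]."""
    return sum(loss(f(x), y) for x, y in sample) / len(sample)

def zero_one_loss(prediction, label):
    return float(prediction != label)

def draw_sample(n, rng):
    """Toy distribution (an assumption): x ~ Uniform[-1, 1], y = 1 if x >= 0.3 else 0."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, int(x >= 0.3)) for x in xs]

def f(x):
    """A fixed predictor, chosen before seeing any data: threshold at 0."""
    return int(x >= 0.0)

# Under the toy distribution f errs exactly when 0 <= x < 0.3,
# so its expected risk is P(0 <= x < 0.3) = 0.3 / 2 = 0.15.
expected_risk = 0.15

rng = random.Random(0)
n, trials = 50, 2000
risks = [empirical_risk(f, zero_one_loss, draw_sample(n, rng)) for _ in range(trials)]
mean_empirical_risk = sum(risks) / trials
print(f"mean of E_n(f) over {trials} samples: {mean_empirical_risk:.4f} (E(f) = {expected_risk})")
```

Each individual value of the empirical risk fluctuates around 0.15; only the average over many independently drawn training sets is expected to match it closely. Note that `f` is fixed in advance and does not depend on the sample.
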
Empirical Vs Expected

Nice! But... how close is $\mathcal{E}_n(f)$ to $\mathcal{E}(f)$ with respect to $n$?

Let $X$ and $(X_i)_{i=1}^n$ be i.i.d. random variables, $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then

\[
\mathbb{E}[(\bar{X}_n - \mathbb{E}(X))^2] = \mathrm{Var}(\bar{X}_n) = \frac{\mathrm{Var}(X)}{n}
\]

Therefore, the expected (squared) distance between the empirical mean of the $X_i$ and their expectation $\mathbb{E}(X)$ goes to zero as $O(1/n)$ (assuming $X$ has finite variance).

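The $O(1/n)$ decay can be observed in a small simulation. This is a minimal sketch, assuming $X \sim \mathrm{Uniform}[0, 1]$ (so $\mathbb{E}(X) = 1/2$ and $\mathrm{Var}(X) = 1/12$); the distribution and the helper name `mean_squared_deviation` are illustrative choices, not part of the slides.

```python
import random

def mean_squared_deviation(n, trials, rng):
    """Monte Carlo estimate of E[(mean of n draws - E[X])^2] for X ~ Uniform[0, 1]."""
    total = 0.0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        total += (sample_mean - 0.5) ** 2
    return total / trials

rng = random.Random(0)
variance = 1.0 / 12.0  # Var(X) for X ~ Uniform[0, 1]
for n in (10, 100, 1000):
    estimate = mean_squared_deviation(n, trials=1000, rng=rng)
    print(f"n={n:5d}  simulated={estimate:.6f}  Var(X)/n={variance / n:.6f}")
```

The simulated column should track Var(X)/n up to Monte Carlo noise, shrinking by roughly a factor of ten each time n grows tenfold.
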
Empirical Vs Expected Risk

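If $X_i = \ell(f(x_i), y_i)$, we have $\bar{X}_n = \mathcal{E}_n(f)$ and therefore

\[
\mathbb{E}[(\mathcal{E}_n(f) - \mathcal{E}(f))^2] = \frac{V_f}{n}
\]

where $V_f = \mathrm{Var}(\ell(f(x), y))$. In particular (by Jensen's inequality),

\[
\mathbb{E}[|\mathcal{E}_n(f) - \mathcal{E}(f)|] \le \sqrt{\frac{V_f}{n}}
\]

Exercise. Can we similarly get a “tail” bound?

\[
\mathbb{P}\big(|\mathcal{E}_n(f) - \mathcal{E}(f)| \le \varepsilon(n, \delta)\big) \ge 1 - \delta
\]
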
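One possible route to such a tail bound (a sketch, not necessarily the solution the exercise has in mind) is Chebyshev's inequality applied to the variance identity $\mathbb{E}[(\mathcal{E}_n(f) - \mathcal{E}(f))^2] = V_f / n$:

```latex
\[
\mathbb{P}\big(|\mathcal{E}_n(f) - \mathcal{E}(f)| \ge \varepsilon\big)
\le \frac{\mathbb{E}[(\mathcal{E}_n(f) - \mathcal{E}(f))^2]}{\varepsilon^2}
= \frac{V_f}{n \varepsilon^2}.
\]
Setting the right-hand side equal to $\delta$ gives
\[
\varepsilon(n, \delta) = \sqrt{\frac{V_f}{n \delta}},
\qquad
\mathbb{P}\big(|\mathcal{E}_n(f) - \mathcal{E}(f)| < \varepsilon(n, \delta)\big) \ge 1 - \delta .
\]
```

This keeps the $1/\sqrt{n}$ rate but depends on $\delta$ through $1/\sqrt{\delta}$, which grows quickly as $\delta \to 0$.
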
Empirical Vs Expected

Assume there exists a minimizer $f_* : \mathcal{X} \to \mathcal{Y}$ of the expected risk

\[
\mathcal{E}(f_*) = \inf_{f \in \mathcal{F}} \mathcal{E}(f)
\]

Then, for any $f : \mathcal{X} \to \mathcal{Y}$ we can decompose the excess risk as

\[
\mathcal{E}(f) - \mathcal{E}(f_*) = \mathcal{E}(f) - \mathcal{E}_n(f) + \mathcal{E}_n(f) - \mathcal{E}_n(f_*) + \mathcal{E}_n(f_*) - \mathcal{E}(f_*).
\]

We can therefore leverage the statistical relation between $\mathcal{E}_n$ and $\mathcal{E}$ to study the expected risk in terms of the empirical risk.

This leads naturally to Empirical Risk Minimization.

Empirical Risk Minimization (ERM)

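Let $f_n$ be the minimizer of the empirical risk

\[
f_n = \operatorname*{arg\,min}_{f \in \mathcal{F}} \mathcal{E}_n(f)
\]

We automatically have $\mathcal{E}_n(f_n) - \mathcal{E}_n(f_*) \le 0$ (for any training set).

Then...

\[
\mathbb{E}[\mathcal{E}(f_n) - \mathcal{E}(f_*)] \le \mathbb{E}[\mathcal{E}(f_n) - \mathcal{E}_n(f_n)] \qquad \text{(Question. Why?)}
\]

We can “just” focus on studying only the generalization error

\[
\mathbb{E}[\mathcal{E}(f_n) - \mathcal{E}_n(f_n)]
\]
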
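A sketch of one way to answer the question, taking expectations in the excess risk decomposition with $f = f_n$ and using only quantities already defined:

```latex
\[
\mathbb{E}[\mathcal{E}(f_n) - \mathcal{E}(f_*)]
= \mathbb{E}[\mathcal{E}(f_n) - \mathcal{E}_n(f_n)]
+ \underbrace{\mathbb{E}[\mathcal{E}_n(f_n) - \mathcal{E}_n(f_*)]}_{\le 0 \text{ since } f_n \text{ minimizes } \mathcal{E}_n}
+ \underbrace{\mathbb{E}[\mathcal{E}_n(f_*) - \mathcal{E}(f_*)]}_{= 0 \text{ since } f_* \text{ does not depend on the sample}}
\]
```

The last term vanishes because $f_*$ is a fixed function, so $\mathbb{E}[\mathcal{E}_n(f_*)] = \mathcal{E}(f_*)$; the same argument does not apply to $f_n$, which is chosen using the sample.
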
Generalization Error

How can we control the generalization error

\[
\mathcal{E}_n(f_n) - \mathcal{E}(f_n)
\]

with respect to the number $n$ of examples?

This question is far from trivial (a key one in Statistical Learning Theory, SLT, in fact):

In general... $\mathbb{E}[\mathcal{E}_n(f_n) - \mathcal{E}(f_n)] \neq 0$ (Question. Why?)

$\mathcal{E}_n$ and $f_n$ both depend on the sampled training data. Therefore, we cannot use the result

\[
\mathbb{E}[|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)|] \le \sqrt{\frac{\mathrm{Var}(\ell(f_n(x), y))}{n}}
\]

which indeed is not true in general...

Issues with ERM

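Let $\mathcal{X} = \mathcal{Y} = \mathbb{R}$, $\rho$ with dense support¹ and $\ell(y, y) = 0$ for all $y \in \mathcal{Y}$.

For any $S = (x_i, y_i)_{i=1}^n$ with distinct inputs $x_i$, let $f_n : \mathcal{X} \to \mathcal{Y}$ be

\[
f_n(x) = \begin{cases} y_i & \text{if } x = x_i \text{ for some } i = 1, \dots, n \\ 0 & \text{otherwise} \end{cases}
\]

Then, for any number $n$ of training points:

• $\mathbb{E}[\mathcal{E}_n(f_n)] = 0$
• $\mathbb{E}[\mathcal{E}(f_n)] = \mathcal{E}(0)$, which is greater than $\mathcal{E}(f_*)$ (unless $f_* \equiv 0$)

Therefore $\mathbb{E}[\mathcal{E}(f_n) - \mathcal{E}_n(f_n)] = \mathcal{E}(0) \not\to 0$ as $n$ increases!

¹ and such that every pair $(x, y)$ has measure zero according to $\rho$.
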
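The memorizing estimator is easy to reproduce. The following is a minimal sketch under assumed toy choices (x uniform on [-1, 1], labels equal to 1 plus small Gaussian noise, squared loss); a large fresh sample stands in for the expected risk, which cannot be computed exactly from data.

```python
import random

def squared_loss(prediction, label):
    return (prediction - label) ** 2

def draw_sample(n, rng):
    """Toy distribution (an assumption): x ~ Uniform[-1, 1], y = 1 + small Gaussian noise."""
    return [(rng.uniform(-1.0, 1.0), 1.0 + rng.gauss(0.0, 0.1)) for _ in range(n)]

def fit_memorizer(sample):
    """The estimator from the slide: return y_i on a training input x_i, 0 elsewhere."""
    table = dict(sample)
    return lambda x: table.get(x, 0.0)

def average_loss(f, sample):
    return sum(squared_loss(f(x), y) for x, y in sample) / len(sample)

rng = random.Random(0)
for n in (10, 100, 1000):
    train = draw_sample(n, rng)
    f_n = fit_memorizer(train)
    train_risk = average_loss(f_n, train)
    # A large fresh sample is used as a stand-in for the expected risk E(f_n).
    test_risk = average_loss(f_n, draw_sample(20000, rng))
    print(f"n={n:5d}  E_n(f_n)={train_risk:.3f}  E(f_n)~{test_risk:.3f}")
```

The training risk is zero for every n, while the estimated expected risk stays near the risk of the constant-zero predictor (about 1.01 for this toy distribution) no matter how many points are memorized.
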
Overfitting

An estimator $f_n$ is said to overfit the training data if for all $n \in \mathbb{N}$:

• $\mathbb{E}[\mathcal{E}(f_n) - \mathcal{E}(f_*)] > C$ for a constant $C > 0$, and
• $\mathbb{E}[\mathcal{E}_n(f_n) - \mathcal{E}_n(f_*)] \le 0$

According to this definition ERM overfits...

ERM on Finite Hypotheses Spaces

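Is ERM hopeless? Consider the case $\mathcal{X}$ and $\mathcal{Y}$ finite.

Then, $\mathcal{F} = \mathcal{Y}^{\mathcal{X}} = \{f : \mathcal{X} \to \mathcal{Y}\}$ is finite as well (albeit possibly very large), and therefore:

\[
\mathbb{E}|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le \mathbb{E} \sup_{f \in \mathcal{F}} |\mathcal{E}_n(f) - \mathcal{E}(f)| \le \sum_{f \in \mathcal{F}} \mathbb{E}|\mathcal{E}_n(f) - \mathcal{E}(f)| \le |\mathcal{F}| \sqrt{V_{\mathcal{F}}/n}
\]

where $V_{\mathcal{F}} = \sup_{f \in \mathcal{F}} V_f$ and $|\mathcal{F}|$ denotes the cardinality of $\mathcal{F}$.

Here ERM works! $\lim_{n \to +\infty} \mathbb{E}|\mathcal{E}(f_n) - \mathcal{E}(f_*)| = 0$
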
ERM on Finite Hypotheses (Sub) Spaces

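The same argument holds in general: let $\mathcal{H} \subset \mathcal{F}$ be a finite space of hypotheses (even if $\mathcal{F}$ is not). Then,

\[
\mathbb{E}|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le |\mathcal{H}| \sqrt{V_{\mathcal{H}}/n}
\]

In particular, if $f_* \in \mathcal{H}$, then

\[
\mathbb{E}|\mathcal{E}(f_n) - \mathcal{E}(f_*)| \le |\mathcal{H}| \sqrt{V_{\mathcal{H}}/n}
\]

and ERM is a good estimator for the problem considered.
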
Example: Threshold functions

Consider a binary classification problem $\mathcal{Y} = \{0, 1\}$. Someone has told us that the minimizer of the risk is a “threshold function” $f_{a_*}(x) = 1_{[a_*, +\infty)}(x)$ with $a_* \in [-1, 1]$.

[Figure: plot over the horizontal range −1.5 to 1.5 with two curves labelled a and b.]

We can learn on $\mathcal{H} = \{f_a \mid a \in [-1, 1]\} \cong [-1, 1]$. However, on a computer we can only represent real numbers up to a given precision.

Example: Threshold Functions (with precision p)

Discretization: given a $p > 0$, we can consider

\[
\mathcal{H}_p = \{f_a \mid a \in [-1, 1], \; a \cdot 10^p = [a \cdot 10^p]\}
\]

with $[a]$ denoting the integer part (i.e. the closest integer) of a scalar $a$. The value $p$ can be interpreted as the “precision” of our space of functions $\mathcal{H}_p$. Note that $|\mathcal{H}_p| = 2 \cdot 10^p$.

If $f_* \in \mathcal{H}_p$, then we have automatically that

\[
\mathbb{E}|\mathcal{E}(f_n) - \mathcal{E}(f_*)| \le |\mathcal{H}_p| \sqrt{V_{\mathcal{H}}/n} \le 10^p/\sqrt{n}
\]

(This is because $V_{\mathcal{H}} \le 1/4$. Question. Why?)

Rates in Expectation Vs Probability

In practice, even for small values of $p$,

\[
\mathbb{E}|\mathcal{E}(f_n) - \mathcal{E}(f_*)| \le 10^p/\sqrt{n}
\]

will need a very large $n$ in order to have a meaningful bound on the expected error.

Interestingly, we can get much better constants (not rates though!) by working in probability...

Hoeffding’s Inequality

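Let $X_1, \dots, X_n$ be independent random variables s.t. $X_i \in [a_i, b_i]$. Let $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$. Then,

\[
\mathbb{P}\left(|\bar{X} - \mathbb{E}\bar{X}| \ge \epsilon\right) \le 2 \exp\left(-\frac{2 n^2 \epsilon^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)
\]
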
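To get a feel for the numbers, the right-hand side of Hoeffding's inequality can be evaluated directly. This is a small sketch; the function name `hoeffding_tail` and the example values are illustrative assumptions.

```python
import math

def hoeffding_tail(eps, ranges):
    """Upper bound on P(|mean - E[mean]| >= eps) for independent X_i in [a_i, b_i].

    `ranges` is a list of (a_i, b_i) pairs, one per variable.
    """
    n = len(ranges)
    sum_sq_widths = sum((b - a) ** 2 for a, b in ranges)
    return 2.0 * math.exp(-2.0 * n ** 2 * eps ** 2 / sum_sq_widths)

# Example: n = 1000 variables, each taking values in [0, 1].
bound = hoeffding_tail(0.05, [(0.0, 1.0)] * 1000)
print(f"P(|mean - E[mean]| >= 0.05) <= {bound:.4f}")
```

When every variable lies in $[-M, M]$, each width is $2M$ and the exponent simplifies to $-n\epsilon^2 / (2M^2)$.
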
Applying Hoeffding’s inequality

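Assume that for all $f \in \mathcal{H}$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$ the loss is bounded, $|\ell(f(x), y)| \le M$, by some constant $M > 0$. Then, for any $f \in \mathcal{H}$ we have

\[
\mathbb{P}(|\mathcal{E}_n(f) - \mathcal{E}(f)| \ge \epsilon) \le 2 \exp\left(-\frac{n \epsilon^2}{2 M^2}\right)
\]
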
Controlling the Generalization Error

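We would like to control the generalization error $\mathcal{E}_n(f_n) - \mathcal{E}(f_n)$ of our estimator in probability. One possible way to do that is by controlling the generalization error of the whole set $\mathcal{H}$.

\[
\mathbb{P}(|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \ge \epsilon) \le \mathbb{P}\Big(\sup_{f \in \mathcal{H}} |\mathcal{E}_n(f) - \mathcal{E}(f)| \ge \epsilon\Big)
\]

The latter term is the probability that at least one of the events $|\mathcal{E}_n(f) - \mathcal{E}(f)| \ge \epsilon$ occurs for $f \in \mathcal{H}$. In other words, the probability of the union of such events. Therefore

\[
\mathbb{P}\Big(\sup_{f \in \mathcal{H}} |\mathcal{E}_n(f) - \mathcal{E}(f)| \ge \epsilon\Big) \le \sum_{f \in \mathcal{H}} \mathbb{P}(|\mathcal{E}_n(f) - \mathcal{E}(f)| \ge \epsilon)
\]

by the so-called union bound.
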
Hoeffding the Generalization Error

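By applying Hoeffding’s inequality,

\[
\mathbb{P}(|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \ge \epsilon) \le 2|\mathcal{H}| \exp\left(-\frac{n\epsilon^2}{2M^2}\right)
\]

Or, equivalently, that for any $\delta \in (0, 1]$,

\[
|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le \sqrt{\frac{2M^2 \log(2|\mathcal{H}|/\delta)}{n}}
\]

with probability at least $1 - \delta$.
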
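The equivalence between the two statements amounts to solving $2|\mathcal{H}| \exp(-n\epsilon^2 / (2M^2)) = \delta$ for $\epsilon$. The sketch below checks this numerically; the function names and the example values of $n$, $\delta$, $|\mathcal{H}|$ and $M$ are illustrative assumptions.

```python
import math

def union_tail_bound(n, eps, card_h, m):
    """Bound 2|H| exp(-n eps^2 / (2 M^2)) on P(|E_n(f_n) - E(f_n)| >= eps)."""
    return 2.0 * card_h * math.exp(-n * eps ** 2 / (2.0 * m ** 2))

def generalization_radius(n, delta, card_h, m):
    """Radius sqrt(2 M^2 log(2|H|/delta) / n), valid with probability >= 1 - delta."""
    return math.sqrt(2.0 * m ** 2 * math.log(2.0 * card_h / delta) / n)

n, delta, card_h, m = 10_000, 0.01, 200, 1.0
eps = generalization_radius(n, delta, card_h, m)
# Plugging the radius back into the tail bound recovers delta (up to rounding).
print(f"eps = {eps:.4f}, tail bound at eps = {union_tail_bound(n, eps, card_h, m):.4f}")
```

Because $|\mathcal{H}|$ and $\delta$ enter only through a logarithm, doubling the number of hypotheses or halving $\delta$ changes the radius only slightly, whereas quadrupling $n$ halves it.
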
Example: Threshold Functions (in Probability)

Going back to the $\mathcal{H}_p$ space of threshold functions...

\[
|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le \sqrt{\frac{4 + 6p - 2\log\delta}{n}}
\]

since $M = 1$ and $\log 2|\mathcal{H}_p| = \log(4 \cdot 10^p) = \log 4 + p \log 10 \le 2 + 3p$.

For example, let $\delta = 0.001$. We can say that

\[
|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le \sqrt{\frac{6p + 18}{n}}
\]

holds at least 99.9% of the time.

Bounds in Expectation Vs Probability

Comparing the two bounds

\[
\mathbb{E}|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le 10^p/\sqrt{n} \qquad \text{(Expectation)}
\]

While, with probability greater than 99.9%

\[
|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le \sqrt{\frac{6p + 18}{n}} \qquad \text{(Probability)}
\]

Although we cannot be 100% sure of it, we can be quite confident that the generalization error will be much smaller than what the bound in expectation tells us...

Rates: note however that the rates of convergence to 0 are the same (i.e. $O(1/\sqrt{n})$).

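A quick tabulation makes the gap between the two bounds concrete. This is only a sketch: the function names and the grid of $p$ and $n$ values are illustrative assumptions, and the probability bound is the one stated for $\delta = 0.001$.

```python
import math

def expectation_bound(p, n):
    """Bound in expectation from the slides: 10^p / sqrt(n)."""
    return 10.0 ** p / math.sqrt(n)

def probability_bound(p, n):
    """Bound holding with probability >= 99.9% (delta = 0.001): sqrt((6p + 18) / n)."""
    return math.sqrt((6.0 * p + 18.0) / n)

for p in (1, 2, 3):
    for n in (10**4, 10**6):
        print(f"p={p}  n={n:>7}  expectation: {expectation_bound(p, n):8.3f}"
              f"  probability: {probability_bound(p, n):.4f}")
```

Both columns shrink like $1/\sqrt{n}$, but the constant in the expectation bound grows exponentially in $p$ while the constant in the probability bound grows only like $\sqrt{p}$.
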
Improving the bound in Expectation

Exploiting the bound in probability and the knowledge that on $\mathcal{H}_p$ the excess risk is bounded by a constant, we can improve the bound in expectation...

Let $X$ be a random variable s.t. $|X| < M$ for some constant $M > 0$. Then, for any $\epsilon > 0$ we have

\[
\mathbb{E}|X| \le \epsilon \, \mathbb{P}(|X| \le \epsilon) + M \, \mathbb{P}(|X| > \epsilon)
\]

Applying to our problem: for any $\delta \in (0, 1]$

\[
\mathbb{E}|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le (1 - \delta) \sqrt{\frac{2M^2 \log(2|\mathcal{H}_p|/\delta)}{n}} + \delta M
\]

Therefore only $\log|\mathcal{H}_p|$ appears (no $|\mathcal{H}_p|$ alone).

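The improvement can be seen by evaluating both bounds in expectation for the same $p$ and $n$. The sketch below is illustrative: the function names are hypothetical, the choice $\delta = 1/\sqrt{n}$ is just one convenient option (the slides leave $\delta$ free), and $V = 1/4$ is the variance bound quoted earlier for the threshold example.

```python
import math

def improved_expectation_bound(n, delta, card_h, m=1.0):
    """(1 - delta) * sqrt(2 M^2 log(2|H|/delta) / n) + delta * M."""
    radius = math.sqrt(2.0 * m ** 2 * math.log(2.0 * card_h / delta) / n)
    return (1.0 - delta) * radius + delta * m

def naive_expectation_bound(n, card_h, v=0.25):
    """|H| * sqrt(V / n), the earlier bound in expectation."""
    return card_h * math.sqrt(v / n)

p, n = 3, 10**6
card_h = 2 * 10**p            # |H_p| as stated in the slides
delta = 1.0 / math.sqrt(n)    # one possible choice of delta (an assumption)
improved = improved_expectation_bound(n, delta, card_h)
naive = naive_expectation_bound(n, card_h)
print(f"improved: {improved:.5f}   naive: {naive:.5f}")
```

With this choice of $\delta$ the extra term $\delta M$ is itself of order $1/\sqrt{n}$, so the improved bound keeps the same rate while replacing the factor $|\mathcal{H}_p|$ with a logarithmic one.
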
Infinite Hypotheses Spaces

What if $f_* \in \mathcal{H} \setminus \mathcal{H}_p$ for any $p > 0$?

ERM on $\mathcal{H}_p$ will never minimize the expected risk. There will always be a gap for $\mathcal{E}(f_{n,p}) - \mathcal{E}(f_*)$.

For $p \to +\infty$ it is natural to expect such gap to decrease... BUT if $p$ increases too fast (with respect to the number $n$ of examples) we cannot control the generalization error anymore!

\[
|\mathcal{E}_n(f_n) - \mathcal{E}(f_n)| \le \sqrt{\frac{6p + 18}{n}} \to +\infty \quad \text{for } p \to +\infty
\]

Therefore we need to increase $p$ gradually as a function $p(n)$ of the number of training examples. This approach is known as regularization.

Approximation Error for Threshold Functions

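Consider $f_p = 1_{[a_p, +\infty)} = \operatorname*{arg\,min}_{f \in \mathcal{H}_p} \mathcal{E}(f)$ with $a_p \in [-1, 1]$. We decompose the excess risk $\mathcal{E}(f_n) - \mathcal{E}(f_*)$:

\[
\mathcal{E}(f_n) - \mathcal{E}_n(f_n) + \underbrace{\mathcal{E}_n(f_n) - \mathcal{E}_n(f_p)}_{\le 0} + \mathcal{E}_n(f_p) - \mathcal{E}(f_p) + \mathcal{E}(f_p) - \mathcal{E}(f_*)
\]

We already know how to control the generalization of $f_n$ (via the supremum over $\mathcal{H}_p$) and $f_p$ (since it is a single function). Moreover, we have that the approximation error is

\[
\mathcal{E}(f_p) - \mathcal{E}(f_*) \le |a_p - a_*| \le 10^{-p} \qquad \text{(why?)}
\]

Note that it does not depend on training data!
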
Approximation Error for Threshold Functions II

Putting everything together: for any $\delta \in (0, 1]$ and $p \ge 0$,

\[
\mathcal{E}(f_n) - \mathcal{E}(f_*) \le 2\sqrt{\frac{4 + 6p - 2\log\delta}{n}} + 10^{-p} = \phi(n, \delta, p)
\]

holds with probability greater than or equal to $1 - \delta$.

So, for any $n$ and $\delta$, we can choose the best precision as

\[
p(n, \delta) = \operatorname*{arg\,min}_{p \ge 0} \phi(n, \delta, p)
\]

which leads to an error bound $\epsilon(n, \delta) = \phi(n, \delta, p(n, \delta))$ holding with probability larger than or equal to $1 - \delta$.

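The trade-off inside $\phi$ can be explored numerically. The following is a minimal sketch that searches over integer precisions only; restricting $p$ to the grid $0, 1, \dots, p_{\max}$ and the names `phi` and `best_precision` are assumptions made for illustration.

```python
import math

def phi(n, delta, p):
    """phi(n, delta, p) = 2 * sqrt((4 + 6p - 2 log(delta)) / n) + 10^(-p)."""
    return 2.0 * math.sqrt((4.0 + 6.0 * p - 2.0 * math.log(delta)) / n) + 10.0 ** (-p)

def best_precision(n, delta, p_max=12):
    """Minimize phi over the integer grid p = 0, 1, ..., p_max (the grid is an assumption)."""
    return min(range(p_max + 1), key=lambda p: phi(n, delta, p))

delta = 0.001
for n in (10**2, 10**4, 10**6, 10**8):
    p = best_precision(n, delta)
    print(f"n={n:>9}  p(n, delta)={p}  error bound={phi(n, delta, p):.5f}")
```

Small $p$ leaves a large approximation term $10^{-p}$, large $p$ inflates the square-root term, and the selected precision grows slowly with $n$.
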
Regularization

Most hypotheses spaces are “too” large and therefore prone to overfitting. Regularization is the process of controlling the “freedom” of an estimator as a function of the number of training examples.

Idea. Parametrize $\mathcal{H}$ as a union $\mathcal{H} = \cup_{\gamma > 0} \mathcal{H}_\gamma$ of hypotheses spaces $\mathcal{H}_\gamma$ that are not prone to overfitting (e.g. finite spaces). $\gamma$ is known as the regularization parameter (e.g. the precision $p$ in our examples). Assume $\mathcal{H}_\gamma \subset \mathcal{H}_{\gamma'}$ if $\gamma \le \gamma'$.

Regularization Algorithm. Given $n$ training points, find an estimator $f_{\gamma,n}$ on $\mathcal{H}_\gamma$ (e.g. ERM on $\mathcal{H}_\gamma$). Let $\gamma = \gamma(n)$ increase as $n \to +\infty$.

Regularization and Decomposition of the Excess Risk

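Let $\gamma > 0$ and $f_\gamma = \operatorname*{arg\,min}_{f \in \mathcal{H}_\gamma} \mathcal{E}(f)$.

We can decompose the excess risk $\mathcal{E}(f_{\gamma,n}) - \mathcal{E}(f_*)$ as

\[
\underbrace{\mathcal{E}(f_{\gamma,n}) - \mathcal{E}(f_\gamma)}_{\text{Sample error}} + \underbrace{\mathcal{E}(f_\gamma) - \inf_{f \in \mathcal{H}} \mathcal{E}(f)}_{\text{Approximation error}} + \underbrace{\inf_{f \in \mathcal{H}} \mathcal{E}(f) - \mathcal{E}(f_*)}_{\text{Irreducible error}}
\]
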
Irreducible Error

\[
\inf_{f \in \mathcal{H}} \mathcal{E}(f) - \mathcal{E}(f_*)
\]

Recall: $\mathcal{H}$ is the “largest” possible hypotheses space we are considering.

If the irreducible error is zero, $\mathcal{H}$ is called universal (e.g. the RKHS induced by the Gaussian kernel is a universal hypotheses space).

Approximation Error

\[
\mathcal{E}(f_\gamma) - \inf_{f \in \mathcal{H}} \mathcal{E}(f)
\]

• Does not depend on the dataset (deterministic).
• Does depend on the distribution $\rho$.
• Also referred to as bias.

Convergence of the Approximation Error

Under mild assumptions,

\[
\lim_{\gamma \to +\infty} \mathcal{E}(f_\gamma) - \inf_{f \in \mathcal{H}} \mathcal{E}(f) = 0
\]

Density Results

\[
\lim_{\gamma \to +\infty} \mathcal{E}(f_\gamma) - \mathcal{E}(f_*) = 0
\]

• Convergence of Approximation error
+
• Universal Hypotheses space

Note: It corresponds to a density property of the space $\mathcal{H}$ in $\mathcal{F} = \{f : \mathcal{X} \to \mathcal{Y}\}$.

Approximation error bounds

\[
\mathcal{E}(f_\gamma) - \inf_{f \in \mathcal{H}} \mathcal{E}(f) \le A(\rho, \gamma)
\]

• No rates without assumptions (related to the so-called No Free Lunch Theorem).
• Studied in Approximation Theory using tools such as Kolmogorov n-width, K-functionals, interpolation spaces...

Prototypical result: if $f_*$ is $s$-“regular”² (i.e. with regularity parameter equal to $s$), then $A(\rho, \gamma) = c\gamma^{-s}$.

² Some abstract notion of regularity parametrizing the class of target functions. Typical example: $f_*$ in a Sobolev space $W^{s,2}$.

Sample Error

\[
\mathcal{E}(f_{\gamma,n}) - \mathcal{E}(f_\gamma)
\]

Random quantity depending on the data.

Two main ways to study it:

• Capacity/Complexity estimates on $\mathcal{H}_\gamma$.
• Stability.

Sample Error Decomposition

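We have seen how to decompose $\mathcal{E}(f_{\gamma,n}) - \mathcal{E}(f_\gamma)$ into

\[
\underbrace{\mathcal{E}(f_{\gamma,n}) - \mathcal{E}_n(f_{\gamma,n})}_{\text{Generalization error}} + \underbrace{\mathcal{E}_n(f_{\gamma,n}) - \mathcal{E}_n(f_\gamma)}_{\text{Excess empirical risk}} + \underbrace{\mathcal{E}_n(f_\gamma) - \mathcal{E}(f_\gamma)}_{\text{Generalization error}}
\]
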
Generalization Error(s)

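As we have observed,

\[
\mathcal{E}(f_{\gamma,n}) - \mathcal{E}_n(f_{\gamma,n}) \quad \text{and} \quad \mathcal{E}_n(f_\gamma) - \mathcal{E}(f_\gamma)
\]

can be controlled by studying the empirical process

\[
\sup_{f \in \mathcal{H}_\gamma} |\mathcal{E}_n(f) - \mathcal{E}(f)|
\]

Example: we have already observed that for a finite space $\mathcal{H}_\gamma$

\[
\mathbb{E}\Big[\sup_{f \in \mathcal{H}_\gamma} |\mathcal{E}_n(f) - \mathcal{E}(f)|\Big] \le |\mathcal{H}_\gamma| \sqrt{\frac{V_{\mathcal{H}_\gamma}}{n}}
\]
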
ERM on Finite Spaces and Computational Efficiency

The strategy used for threshold functions can be generalized to any $\mathcal{H}$ for which it is possible to find a finite discretization $\mathcal{H}_p$ with respect to the $L^1(\mathcal{X}, \rho_{\mathcal{X}})$ norm (e.g. $\mathcal{H}$ compact with respect to such norm).

However, it could be computationally very expensive to find the empirical risk minimizer on a discretization $\mathcal{H}_p$, since in principle it could be necessary to evaluate $\mathcal{E}_n(f)$ for every $f \in \mathcal{H}_p$.

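For threshold functions the exhaustive search looks as follows. This is a minimal sketch under assumed toy data (noise-free labels generated from a hypothetical true threshold `a_star`); it evaluates the empirical risk of every hypothesis on the grid, which is exactly the cost the slide warns about.

```python
import random

def threshold_grid(p):
    """Thresholds a in [-1, 1] with p decimal digits: a = k / 10^p (both endpoints included)."""
    scale = 10 ** p
    return [k / scale for k in range(-scale, scale + 1)]

def empirical_risk(a, sample):
    """0-1 empirical risk of the threshold function f_a(x) = 1 if x >= a else 0."""
    return sum(int(x >= a) != y for x, y in sample) / len(sample)

def erm_threshold(sample, p):
    """Exhaustive ERM: evaluate E_n(f_a) for every a on the grid and keep a minimizer."""
    return min(threshold_grid(p), key=lambda a: empirical_risk(a, sample))

rng = random.Random(0)
a_star = 0.3  # hypothetical true threshold, used only to generate toy labels
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
sample = [(x, int(x >= a_star)) for x in xs]

a_hat = erm_threshold(sample, p=2)
print(f"a_hat={a_hat}, E_n(f_a_hat)={empirical_risk(a_hat, sample)}, "
      f"hypotheses evaluated={len(threshold_grid(2))}")
```

The number of risk evaluations grows like $10^p$, and each one costs $n$ loss computations, so the total work is of order $n \cdot 10^p$ for this naive search.
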
ERM on Convex Spaces?

As we have seen in previous classes, ERM on convex (thus dense) spaces is often much more amenable to computations.

In principle, we have observed that on infinite hypotheses spaces it is difficult to control the generalization error but...

...we might be able to leverage the discretization argument used for threshold functions to control the generalization error of ERM for a larger family of hypotheses spaces.

Example: Risks for Continuous functions

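Let $\mathcal{X} \subset \mathbb{R}^d$ be a compact (i.e., closed and bounded) space and $C(\mathcal{X})$ be the space of continuous functions. Let $\|\cdot\|_\infty$ be defined for any $f \in C(\mathcal{X})$ as $\|f\|_\infty = \sup_{x \in \mathcal{X}} |f(x)|$.

If the loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is such that $\ell(\cdot, y)$ is uniformly Lipschitz with constant $L > 0$, for any $y \in \mathcal{Y}$, we have that

1) $|\mathcal{E}(f_1) - \mathcal{E}(f_2)| \le L \|f_1 - f_2\|_{L^1(\mathcal{X}, \rho_{\mathcal{X}})} \le L \|f_1 - f_2\|_\infty$, and

2) $|\mathcal{E}_n(f_1) - \mathcal{E}_n(f_2)| \le \frac{1}{n} \sum_{i=1}^n |\ell(f_1(x_i), y_i) - \ell(f_2(x_i), y_i)| \le L \|f_1 - f_2\|_\infty$

Therefore, “close” functions in $\|\cdot\|_\infty$ will have similar expected and empirical risks!
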
Example: Covering numbers

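We define the covering number of $\mathcal{H}$ of radius $\eta > 0$ as the cardinality of a minimal cover of $\mathcal{H}$ with balls of radius $\eta$.

\[
N(\mathcal{H}, \eta) = \inf\Big\{m \;\Big|\; \mathcal{H} \subseteq \bigcup_{i=1}^m B_\eta(h_i), \; h_i \in \mathcal{H}\Big\}
\]

Image credits: Lorenzo Rosasco.

Example. If $\mathcal{H} \cong B_R(0)$ is a ball of radius $R$ in $\mathbb{R}^d$: $N(B_R(0), \eta) = (4R/\eta)^d$
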
Example: Covering numbers (continued)

Note that the resolution $\eta$ is related to the precision in the sense of a distance between two hypotheses. We can apply the same reasoning used for threshold functions

\[
\mathbb{E} \sup_{f \in \mathcal{H}} |\mathcal{E}_n(f) - \mathcal{E}(f)| \le 2L\eta + N(\mathcal{H}, \eta) \sqrt{\frac{V_{\mathcal{H}}}{n}}
\]

For $\eta \to 0$ the covering number $N(\mathcal{H}, \eta) \to +\infty$. However, for fixed $\eta$ and $n \to +\infty$ the second term tends to zero.

It is typically possible to find an $\eta(n)$ for which the bound tends to zero as $n \to +\infty$.

(Exercise. Find such an $\eta(n)$ for $\mathcal{H}$ a ball of radius $R$.)

Example: Covering numbers (continued)

The same argument can be reproduced for bounds in probability, namely for any $\delta \in (0, 1]$,

\[
\sup_{f \in \mathcal{H}} |\mathcal{E}_n(f) - \mathcal{E}(f)| \le 2L\eta + \sqrt{\frac{2M^2 \log(2N(\mathcal{H}, \eta)/\delta)}{n}}
\]

holds with probability at least $1 - \delta$.

Complexity Measures

In general, the error

\[
\sup_{f \in \mathcal{H}_\gamma} |\mathcal{E}_n(f) - \mathcal{E}(f)|
\]

can be controlled via capacity/complexity measures:

• Covering numbers
• Combinatorial dimension, e.g. VC-dimension, fat-shattering dimension
• Rademacher complexities
• Gaussian complexities
• ...

Prototypical Results

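A prototypical result (under suitable assumptions, e.g. regularity of $f_*$):

\[
\mathcal{E}(f_{\gamma,n}) - \mathcal{E}(f_*) \le \underbrace{\mathcal{E}(f_{\gamma,n}) - \mathcal{E}(f_\gamma)}_{\lesssim \, \gamma^\beta n^{-\alpha} \text{ (Variance)}} + \underbrace{\mathcal{E}(f_\gamma) - \mathcal{E}(f_*)}_{\lesssim \, \gamma^{-\tau} \text{ (Bias)}}
\]
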
Choosing γ(n) in practice

The best $\gamma(n)$ depends on the unknown distribution $\rho$. So how can we choose such a parameter in practice?

This problem is known as model selection. Possible approaches:

• Cross validation
• Complexity regularization/structural risk minimization
• Balancing principles
• ...

Abstract Regularization

We got a new perspective on the concept of regularization: controlling the expressiveness of the hypotheses space according to the number of training examples in order to guarantee good prediction performance and consistency.

There are many ways to implement this strategy in practice:

• Tikhonov (and Ivanov) regularization
• Spectral filtering
• Early stopping
• Random sampling
• ...

Wrapping Up

Building on a few (reasonable?) assumptions on the learning problem, we:

• Have shown how the empirical risk can be a proxy for the expected risk.
• Identified the main reasons behind overfitting and discussed how to counteract it (in a more principled way!).
• Highlighted the key role played by the choice of the hypotheses space and how its “complexity” affects performance.

Next class we will focus on one such measure of complexity and derive upper bounds on the generalization error.

Recommended Reading

Chapters 4, 5 and 6 of Shalev-Shwartz, Shai, and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.