
Supervised Learning (COMP0078)

6. Learning Theory (Part I)

Carlo Ciliberto

University College London


Department of Computer Science

1
Previous Classes

In the previous classes we:

• Formulated the supervised learning problem.
• Focused on Empirical Risk Minimization (ERM).
• Discussed overfitting and how to tackle it (heuristically).

2
Previous Classes

We studied a few different algorithms (in particular focusing on classification):

• Nearest neighbor
• Least-squares
• Support Vector Machines
• Logistic Regression
• Decision Trees
• Ensemble methods (Bagging, Boosting)

We observed that many of these methods can be formulated as ERM problems.

3
Outline

Equipped with our shiny new tool-set of supervised learning algorithms, we will now go back to asking ourselves the question:

    When/why/how do these methods "work"?

Outline:

• Empirical Risk Minimization (Again!)
• Generalization error
• Excess Risk Decomposition
• Regularization & Bias-Variance Trade-off (Again!)

4
Refresher on the Learning Problem

The goal of Supervised Learning is to find a "good" estimator fn : X → Y, approximating the lowest expected risk

    inf_{f : X → Y} E(f),    E(f) = ∫_{X×Y} ℓ(f(x), y) dρ(x, y)

given only a finite number of (training) examples (x_i, y_i)_{i=1}^n sampled independently from the unknown distribution ρ.

5
A Wishlist

What do we mean by... “good”?

To have a low excess risk E(fn) − E(f∗).

• Consistency. Does E(fn) − E(f∗) → 0
  • in Expectation?
  • in Probability?
  as the size of the training set S = (x_i, y_i)_{i=1}^n of points randomly sampled from ρ grows, n → +∞?

• Learning Rates. How "fast" is consistency achieved? Nonasymptotic bounds: finite sample complexity, tail bounds, error bounds...

6
Empirical Risk as a Proxy

If ρ is unknown, can we say anything about E(fn) − inf_{f∈F} E(f)?

We only "glimpse" ρ via the samples (x_i, y_i)_{i=1}^n. Can we use them to gather some information about ρ (or better, on E(f))?

Consider a function f : X → Y and its empirical risk

    En(f) = (1/n) ∑_{i=1}^n ℓ(f(x_i), y_i)

Is this a good idea? A simple calculation shows that

    E_{S∼ρ^n}[En(f)] = (1/n) ∑_{i=1}^n E_{(x_i,y_i)∼ρ}[ℓ(f(x_i), y_i)] = (1/n) ∑_{i=1}^n E(f) = E(f)

The expectation of En(f) is the expected risk E(f)!
7
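This unbiasedness is easy to check numerically. Below is a minimal Monte Carlo sketch (the distribution ρ, the fixed hypothesis f = 0, and the squared loss are illustrative assumptions, not from the slides): averaging En(f) over many independent training sets recovers E(f).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative rho: x ~ Uniform[0, 1], y = x + Gaussian noise (sigma = 0.1).
# Fixed hypothesis f(x) = 0 with squared loss l(f(x), y) = (f(x) - y)^2,
# so the expected risk is E(f) = E[y^2] = E[x^2] + sigma^2 = 1/3 + 0.01.
sigma = 0.1
expected_risk = 1.0 / 3.0 + sigma**2

def empirical_risk(n):
    """One draw of E_n(f) for f = 0 on a fresh training set of size n."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = x + sigma * rng.normal(size=n)
    return np.mean(y**2)  # loss of f = 0 is (0 - y)^2 = y^2

# Average E_n(f) over many independent training sets S ~ rho^n:
# this approximates E_S[E_n(f)], which equals E(f).
avg_emp_risk = np.mean([empirical_risk(n=50) for _ in range(20000)])
print(avg_emp_risk, expected_risk)  # the two values are very close
```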
Empirical Vs Expected

Nice! But... how close is En(f) to E(f) with respect to n?

Let X and (X_i)_{i=1}^n be i.i.d. random variables, and let X̄_n = (1/n) ∑_{i=1}^n X_i. Then

    E[(X̄_n − E[X])²] = Var(X̄_n) = Var(X) / n

Therefore, the expected (squared) distance between the empirical mean of the X_i and their expectation E[X] goes to zero as O(1/n) (assuming X has finite variance).
8
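A quick simulation illustrates the O(1/n) decay; X ~ Uniform[0, 1] is an illustrative choice here, so Var(X) = 1/12:

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ Uniform[0, 1]: E[X] = 1/2, Var(X) = 1/12.
n, trials = 40, 100000
samples = rng.uniform(0.0, 1.0, size=(trials, n))
xbar = samples.mean(axis=1)                # one sample mean per trial

mse = np.mean((xbar - 0.5) ** 2)           # Monte Carlo estimate of E[(Xbar_n - E[X])^2]
theory = (1.0 / 12.0) / n                  # Var(X)/n
print(mse, theory)                         # the two values agree closely
```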
Empirical Vs Expected Risk

If X_i = ℓ(f(x_i), y_i), we have X̄_n = En(f) and therefore

    E[(En(f) − E(f))²] = V_f / n

where V_f = Var(ℓ(f(x), y)). In particular,

    E[|En(f) − E(f)|] ≤ √(V_f / n)

Exercise. Can we similarly get a "tail" bound?

    P(|En(f) − E(f)| ≤ ε(n, δ)) ≥ 1 − δ
9
Empirical Vs Expected

Assume there exists a minimizer f∗ : X → Y of the expected risk,

    E(f∗) = inf_{f∈F} E(f)

Then, for any f : X → Y, we can decompose the excess risk as

    E(f) − E(f∗) = [E(f) − En(f)] + [En(f) − En(f∗)] + [En(f∗) − E(f∗)]

We can therefore leverage the statistical relation between En and E to study the expected risk in terms of the empirical risk.

This leads naturally to Empirical Risk Minimization.
10
Empirical Risk Minimization (ERM)

Let fn be the minimizer of the empirical risk,

    fn = arg min_{f∈F} En(f)

We automatically have En(fn) − En(f∗) ≤ 0 (for any training set). Then...

    E[E(fn) − E(f∗)] ≤ E[E(fn) − En(fn)]   (Question. Why?)

We can "just" focus on studying only the generalization error

    E[E(fn) − En(fn)]
11
Generalization Error

How can we control the generalization error

    En(fn) − E(fn)

with respect to the number n of examples?

This question is far from trivial (a key one in statistical learning theory, in fact): in general,

    E[En(fn) − E(fn)] ≠ 0   (Question. Why?)

En and fn both depend on the sampled training data. Therefore, we cannot use the result

    E[|En(fn) − E(fn)|] ≤ √(Var(ℓ(fn(x), y)) / n)

which indeed is not true in general...
12
Issues with ERM

Let X = Y = ℝ, let ρ have dense support¹, and let ℓ(y, y) = 0 ∀y ∈ Y.

For any S = (x_i, y_i)_{i=1}^n with distinct inputs x_i, let fn : X → Y be

    fn(x) = y_i   if x = x_i for some i = 1, …, n
    fn(x) = 0     otherwise

Then, for any number n of training points:

• E[En(fn)] = 0
• E[E(fn)] = E(0), which is greater than E(f∗) (unless f∗ ≡ 0)

Therefore E[E(fn) − En(fn)] = E(0) ↛ 0 as n increases!

¹ and such that every pair (x, y) has measure zero according to ρ
13
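A small numerical sketch of this memorizing estimator (the choices x ~ Uniform[0, 1], y = x, and the squared loss are illustrative assumptions, not from the slides): the empirical risk is exactly zero, while the expected risk stays near E(0) = 1/3 no matter how many points are memorized.

```python
import numpy as np

rng = np.random.default_rng(2)

# rho with dense support: x ~ Uniform[0, 1] (every single point has measure
# zero), y = x, squared loss. f_n returns y_i on training inputs, 0 elsewhere.
def fit_memorizer(x_train, y_train):
    lookup = dict(zip(x_train.tolist(), y_train.tolist()))
    return lambda x: np.array([lookup.get(v, 0.0) for v in x.tolist()])

x_tr = rng.uniform(size=30)
y_tr = x_tr.copy()
f_n = fit_memorizer(x_tr, y_tr)

train_risk = np.mean((f_n(x_tr) - y_tr) ** 2)   # exactly 0: every point memorized
x_te = rng.uniform(size=10000)                  # fresh points: none memorized
test_risk = np.mean((f_n(x_te) - x_te) ** 2)    # approx E[x^2] = 1/3, does not -> 0
print(train_risk, test_risk)
```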
Overfitting

An estimator fn is said to overfit the training data if ∀n ∈ ℕ:

• E[E(fn) − E(f∗)] > C for a constant C > 0, and
• E[En(fn) − En(f∗)] ≤ 0

According to this definition, ERM overfits...

14
ERM on Finite Hypotheses Spaces

Is ERM hopeless? Consider the case where X and Y are finite.

Then F = Y^X = {f : X → Y} is finite as well (albeit possibly very large), and therefore:

    E|En(fn) − E(fn)| ≤ E sup_{f∈F} |En(f) − E(f)| ≤ ∑_{f∈F} E|En(f) − E(f)| ≤ |F| √(V_F / n)

where V_F = sup_{f∈F} V_f and |F| denotes the cardinality of F.

Here ERM works!  lim_{n→+∞} E|E(fn) − E(f∗)| = 0
15
ERM on Finite Hypotheses (Sub) Spaces

The same argument holds in general: let H ⊂ F be a finite space of hypotheses (even if F is not). Then,

    E|En(fn) − E(fn)| ≤ |H| √(V_H / n)

In particular, if f∗ ∈ H, then

    E|E(fn) − E(f∗)| ≤ |H| √(V_H / n)

and ERM is a good estimator for the problem considered.

16
Example: Threshold functions

Consider a binary classification problem with Y = {0, 1}. Someone has told us that the minimizer of the risk is a "threshold function" f_{a∗}(x) = 1_{[a∗,+∞)}(x) with a∗ ∈ [−1, 1].

[Figure: two threshold (step) functions, with thresholds a and b, plotted on [−1.5, 1.5].]

We can learn on H = {f_a | a ∈ [−1, 1]} ≅ [−1, 1]. However, on a computer we can only represent real numbers up to a given precision.

17
Example: Threshold Functions (with precision p)

Discretization: given p > 0, we can consider

    Hp = {f_a | a ∈ [−1, 1], a · 10^p = [a · 10^p]}

with [a] denoting the closest integer to the scalar a. The value p can be interpreted as the "precision" of our space of functions Hp. Note that |Hp| = 2 · 10^p.

If f∗ ∈ Hp, then we automatically have

    E|E(fn) − E(f∗)| ≤ |Hp| √(V_H / n) ≤ 10^p / √n

(This is because V_H ≤ 1/4. Question. Why?)
18
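ERM over the finite class Hp is just a brute-force search over the grid of thresholds. A sketch (the true threshold a∗ = 0.3, the noiseless labels, and the 0-1 loss are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

# Labels generated by a true threshold a* = 0.3 (noiseless, for simplicity);
# 0-1 loss. H_p = thresholds with p decimal digits of precision in [-1, 1].
a_star, p, n = 0.3, 2, 500
x = rng.uniform(-1.0, 1.0, size=n)
y = (x >= a_star).astype(float)

# Grid of thresholds representing H_p.
grid = np.round(np.arange(-1.0, 1.0 + 10.0**(-p) / 2, 10.0**(-p)), p)
emp_risk = [np.mean((x >= a).astype(float) != y) for a in grid]   # E_n(f_a)
a_hat = grid[int(np.argmin(emp_risk))]                            # ERM over H_p
print(a_hat)  # close to a* = 0.3
```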
Rates in Expectation Vs Probability

In practice, even for small values of p, the bound

    E|E(fn) − E(f∗)| ≤ 10^p / √n

will need a very large n in order to be meaningful for the expected error.

Interestingly, we can get much better constants (not rates though!) by working in probability...
19
Hoeffding’s Inequality

Let X_1, …, X_n be independent random variables such that X_i ∈ [a_i, b_i].

Let X̄ = (1/n) ∑_{i=1}^n X_i. Then,

    P(|X̄ − E[X̄]| ≥ ε) ≤ 2 exp(−2n²ε² / ∑_{i=1}^n (b_i − a_i)²)
20
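A sanity check of the inequality for Bernoulli(1/2) variables (an illustrative choice: X_i ∈ [0, 1], so the bound reads 2 exp(−2nε²)):

```python
import numpy as np

rng = np.random.default_rng(4)

# X_i ~ Bernoulli(1/2), so X_i in [0, 1] and the Hoeffding bound is 2 exp(-2 n eps^2).
n, eps, trials = 100, 0.1, 50000
xbar = rng.binomial(1, 0.5, size=(trials, n)).mean(axis=1)

tail = np.mean(np.abs(xbar - 0.5) >= eps)   # empirical P(|Xbar - E[Xbar]| >= eps)
bound = 2.0 * np.exp(-2.0 * n * eps**2)
print(tail, bound)  # the empirical tail sits below the Hoeffding bound
```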
Applying Hoeffding’s inequality

Assume that ∀f ∈ H, x ∈ X, y ∈ Y the loss is bounded, |ℓ(f(x), y)| ≤ M, for some constant M > 0. Then, for any f ∈ H we have

    P(|En(f) − E(f)| ≥ ε) ≤ 2 exp(−nε² / (2M²))

21
Controlling the Generalization Error

We would like to control the generalization error En(fn) − E(fn) of our estimator in probability. One possible way to do that is by controlling the generalization error over the whole set H:

    P(|En(fn) − E(fn)| ≥ ε) ≤ P(sup_{f∈H} |En(f) − E(f)| ≥ ε)

The latter term is the probability that at least one of the events |En(f) − E(f)| ≥ ε occurs for f ∈ H, in other words the probability of the union of such events. Therefore,

    P(sup_{f∈H} |En(f) − E(f)| ≥ ε) ≤ ∑_{f∈H} P(|En(f) − E(f)| ≥ ε)

by the so-called union bound.


22
Hoeffding the Generalization Error

By applying Hoeffding's inequality,

    P(|En(fn) − E(fn)| ≥ ε) ≤ 2|H| exp(−nε² / (2M²))

or, equivalently, for any δ ∈ (0, 1],

    |En(fn) − E(fn)| ≤ √(2M² log(2|H|/δ) / n)

with probability at least 1 − δ.

23
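The resulting deviation bound ε(n, δ) = √(2M² log(2|H|/δ)/n) is cheap to tabulate, and makes the logarithmic dependence on |H| visible. A small helper (the function name is mine, not from the slides):

```python
import math

def deviation_bound(n, delta, card_H, M=1.0):
    """epsilon(n, delta): uniform deviation bound for a finite class H
    with loss bounded by M, via Hoeffding's inequality + union bound."""
    return math.sqrt(2.0 * M**2 * math.log(2.0 * card_H / delta) / n)

# Growing |H| from 100 to 10^6 barely moves the bound (log dependence):
b_small = deviation_bound(n=10000, delta=0.01, card_H=100)
b_large = deviation_bound(n=10000, delta=0.01, card_H=10**6)
print(b_small, b_large)
```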
Example: Threshold Functions (in Probability)

Going back to the space Hp of threshold functions...

    |En(fn) − E(fn)| ≤ √((4 + 6p − 2 log δ) / n)

since M = 1 and log(2|Hp|) = log(4 · 10^p) = log 4 + p log 10 ≤ 2 + 3p.

For example, let δ = 0.001. We can say that

    |En(fn) − E(fn)| ≤ √((6p + 18) / n)

holds at least 99.9% of the time.

24
Bounds in Expectation Vs Probability

Comparing the two bounds:

    E|En(fn) − E(fn)| ≤ 10^p / √n   (Expectation)

while, with probability greater than 99.9%,

    |En(fn) − E(fn)| ≤ √((6p + 18) / n)   (Probability)

Although we cannot be 100% sure of it, we can be quite confident that the generalization error will be much smaller than what the bound in expectation tells us...

Rates: note however that the rates of convergence to 0 are the same (i.e. O(1/√n)).
25
Improving the bound in Expectation

Exploiting the bound in probability and the knowledge that on Hp the excess risk is bounded by a constant, we can improve the bound in expectation...

Let X be a random variable such that |X| < M for some constant M > 0. Then, for any ε > 0 we have

    E|X| ≤ ε P(|X| ≤ ε) + M P(|X| > ε)

Applying this to our problem: for any δ ∈ (0, 1],

    E|En(fn) − E(fn)| ≤ (1 − δ) √(2M² log(2|Hp|/δ) / n) + δM

Therefore only log |Hp| appears (no |Hp| alone).

26
Infinite Hypotheses Spaces

What if f∗ ∈ H \ Hp for any p > 0?

ERM on Hp will never minimize the expected risk: there will always be a gap E(f_{n,p}) − E(f∗).

For p → +∞ it is natural to expect such a gap to decrease... BUT if p increases too fast (with respect to the number n of examples), we cannot control the generalization error anymore:

    |En(fn) − E(fn)| ≤ √((6p + 18) / n) → +∞   for p → +∞ (at fixed n)

Therefore we need to increase p gradually, as a function p(n) of the number of training examples. This approach is known as regularization.
27
Approximation Error for Threshold Functions

Consider fp = 1_{[a_p,+∞)} = arg min_{f∈Hp} E(f), with a_p ∈ [−1, 1]. We decompose the excess risk E(fn) − E(f∗) as

    [E(fn) − En(fn)] + [En(fn) − En(fp)] + [En(fp) − E(fp)] + [E(fp) − E(f∗)]

where the second term satisfies En(fn) − En(fp) ≤ 0, by definition of ERM.

We already know how to control the generalization error of fn (via the supremum over Hp) and of fp (since it is a single function). Moreover, the approximation error is

    E(fp) − E(f∗) ≤ |a_p − a∗| ≤ 10^{−p}   (why?)

Note that it does not depend on the training data!

28
Approximation Error for Threshold Functions II

Putting everything together: for any δ ∈ (0, 1] and p ≥ 0,

    E(fn) − E(f∗) ≤ 2 √((4 + 6p − 2 log δ) / n) + 10^{−p} = φ(n, δ, p)

holds with probability greater than or equal to 1 − δ.

So, for any n and δ, we can choose the best precision as

    p(n, δ) = arg min_{p≥0} φ(n, δ, p)

which leads to an error bound ε(n, δ) = φ(n, δ, p(n, δ)) holding with probability larger than or equal to 1 − δ.

29
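The trade-off inside φ(n, δ, p) can be explored numerically: the square-root term grows with p while 10^{−p} shrinks, so the minimizing precision grows with n. A sketch, with a grid search standing in for the argmin over p ≥ 0:

```python
import numpy as np

def phi(n, delta, p):
    """Excess-risk bound for threshold functions at precision p."""
    return 2.0 * np.sqrt((4.0 + 6.0 * p - 2.0 * np.log(delta)) / n) + 10.0 ** (-p)

def best_precision(n, delta):
    """Grid search standing in for p(n, delta) = argmin_{p >= 0} phi(n, delta, p)."""
    p_grid = np.linspace(0.0, 10.0, 1001)
    vals = phi(n, delta, p_grid)
    i = int(np.argmin(vals))
    return p_grid[i], vals[i]

p_small, _ = best_precision(n=100, delta=0.01)
p_large, _ = best_precision(n=1_000_000, delta=0.01)
print(p_small, p_large)  # the optimal precision increases with n
```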
Regularization

Most hypotheses spaces are "too" large and therefore prone to overfitting. Regularization is the process of controlling the "freedom" of an estimator as a function of the number of training examples.

Idea. Parametrize H as a union H = ∪_{γ>0} Hγ of hypotheses spaces Hγ that are not prone to overfitting (e.g. finite spaces). γ is known as the regularization parameter (e.g. the precision p in our examples). Assume Hγ ⊂ Hγ′ if γ ≤ γ′.

Regularization Algorithm. Given n training points, find an estimator f_{γ,n} on Hγ (e.g. by ERM on Hγ). Let γ = γ(n) increase as n → +∞.

30
Regularization and Decomposition of the Excess Risk

Let γ > 0 and fγ = arg min_{f∈Hγ} E(f).

We can decompose the excess risk E(f_{γ,n}) − E(f∗) as

    [E(f_{γ,n}) − E(fγ)] + [E(fγ) − inf_{f∈H} E(f)] + [inf_{f∈H} E(f) − E(f∗)]
       (Sample error)       (Approximation error)        (Irreducible error)

31
Irreducible Error

    inf_{f∈H} E(f) − E(f∗)

Recall: H is the "largest" possible hypotheses space we are considering.

If the irreducible error is zero, H is called universal (e.g. the RKHS induced by the Gaussian kernel is a universal hypotheses space).

32
Approximation Error

    E(fγ) − inf_{f∈H} E(f)

• Does not depend on the dataset (it is deterministic).
• Does depend on the distribution ρ.
• Also referred to as bias.

33
Convergence of the Approximation Error

Under mild assumptions,

    lim_{γ→+∞} E(fγ) − inf_{f∈H} E(f) = 0

34
Density Results

    lim_{γ→+∞} E(fγ) − E(f∗) = 0

• Convergence of the approximation error
  +
• Universal hypotheses space

Note: this corresponds to a density property of the space H in F = {f : X → Y}.

35
Approximation error bounds

    E(fγ) − inf_{f∈H} E(f) ≤ A(ρ, γ)

• No rates without assumptions (related to the so-called No Free Lunch Theorem).
• Studied in Approximation Theory using tools such as Kolmogorov n-widths, K-functionals, interpolation spaces...

Prototypical result: if f∗ is "regular"² with regularity parameter s, then A(ρ, γ) = c γ^{−s}.

² Some abstract notion of regularity parametrizing the class of target functions. Typical example: f∗ in a Sobolev space W^{s,2}.
36
Sample Error

    E(f_{γ,n}) − E(fγ)

A random quantity depending on the data. Two main ways to study it:

• Capacity/complexity estimates on Hγ.
• Stability.

37
Sample Error Decomposition

We have seen how to decompose E(f_{γ,n}) − E(fγ) as

    [E(f_{γ,n}) − En(f_{γ,n})] + [En(f_{γ,n}) − En(fγ)] + [En(fγ) − E(fγ)]
      (Generalization error)     (Excess empirical risk)   (Generalization error)
38
Generalization Error(s)

As we have observed, the two terms

    E(f_{γ,n}) − En(f_{γ,n})   and   En(fγ) − E(fγ)

can be controlled by studying the empirical process

    sup_{f∈Hγ} |En(f) − E(f)|

Example: we have already observed that for a finite space Hγ,

    E[sup_{f∈Hγ} |En(f) − E(f)|] ≤ |Hγ| √(V_{Hγ} / n)

39
ERM on Finite Spaces and Computational Efficiency

The strategy used for threshold functions can be generalized to any H for which it is possible to find a finite discretization Hp with respect to the L¹(X, ρ_X) norm (e.g. H compact with respect to such a norm).

However, it could be computationally very expensive to find the empirical risk minimizer on a discretization Hp, since in principle it could be necessary to evaluate En(f) for every f ∈ Hp.

40
ERM on Convex Spaces?

As we have seen in previous classes, ERM on convex (thus dense) spaces is often much more amenable to computation.

In principle, we have observed that on infinite hypotheses spaces it is difficult to control the generalization error, but...

...we might be able to leverage the discretization argument used for threshold functions to control the generalization error of ERM for a larger family of hypotheses spaces.

41
Example: Risks for Continuous functions

Let X ⊂ ℝ^d be a compact (i.e., closed and bounded) space and let C(X) be the space of continuous functions on X. Let ∥·∥∞ be defined for any f ∈ C(X) as ∥f∥∞ = sup_{x∈X} |f(x)|.

If the loss function ℓ : Y × Y → ℝ is such that ℓ(·, y) is uniformly Lipschitz with constant L > 0 for any y ∈ Y, we have that

    1) |E(f₁) − E(f₂)| ≤ L ∥f₁ − f₂∥_{L¹(X,ρ_X)} ≤ L ∥f₁ − f₂∥∞, and

    2) |En(f₁) − En(f₂)| ≤ (1/n) ∑_{i=1}^n |ℓ(f₁(x_i), y_i) − ℓ(f₂(x_i), y_i)| ≤ L ∥f₁ − f₂∥∞

Therefore, functions that are "close" in ∥·∥∞ will have similar expected and empirical risks!

42
Example: Covering numbers

We define the covering number of H at radius η > 0 as the cardinality of a minimal cover of H with balls of radius η:

    N(H, η) = inf{ m | H ⊆ ∪_{i=1}^m B_η(h_i), h_i ∈ H }

[Figure: a set H covered by balls of radius η. Image credits: Lorenzo Rosasco.]

Example. If H ≅ B_R(0) is a ball of radius R in ℝ^d, then N(B_R(0), η) ≤ (4R/η)^d.

43
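For intuition in one dimension, B_R(0) = [−R, R] can be covered by ⌈R/η⌉ intervals of radius η, comfortably below the (4R/η)^d bound. A sketch (the helper names are mine, not from the slides):

```python
import math

def covering_bound(R, eta, d):
    """(4R/eta)^d: upper bound on N(B_R(0), eta) for a ball of radius R in R^d."""
    return (4.0 * R / eta) ** d

def interval_cover_size(R, eta):
    """Explicit cover of [-R, R] by intervals of radius eta: centers at
    -R + eta, -R + 3*eta, ..., each covering a segment of length 2*eta."""
    return math.ceil(R / eta)

R, eta = 1.0, 0.1
print(interval_cover_size(R, eta), covering_bound(R, eta, d=1))  # 10 vs 40
```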
Example: Covering numbers (continued)

Note that the resolution η plays the role of the precision, in the sense of a distance between two hypotheses. We can apply the same reasoning used for threshold functions:

    E[sup_{f∈H} |En(f) − E(f)|] ≤ 2Lη + N(H, η) √(V_H / n)

For η → 0 the covering number N(H, η) → +∞; however, for n → +∞ the bound tends to zero. It is typically possible to find an η(n) for which the bound tends to zero as n → +∞.

(Exercise. Find such an η(n) for H a ball of radius R.)

44
Example: Covering numbers (continued)

The same argument can be reproduced for bounds in probability, namely: for any δ ∈ (0, 1],

    sup_{f∈H} |En(f) − E(f)| ≤ 2Lη + √(2M² log(2N(H, η)/δ) / n)

holds with probability at least 1 − δ.

45
Complexity Measures

In general, the error

    sup_{f∈Hγ} |En(f) − E(f)|

can be controlled via capacity/complexity measures:

• covering numbers,
• combinatorial dimensions (e.g. the VC-dimension, the fat-shattering dimension),
• Rademacher complexities,
• Gaussian complexities,
• ...
46
Prototypical Results

A prototypical result (under suitable assumptions, e.g. regularity of f∗):

    E(f_{γ,n}) − E(f∗) ≤ [E(f_{γ,n}) − E(fγ)] + [E(fγ) − E(f∗)]

where the first term (the variance) is ≲ γ^β n^{−α} and the second term (the bias) is ≲ γ^{−τ}.

47
Choosing γ(n) in practice

The best γ(n) depends on the unknown distribution ρ. So how can we choose such a parameter in practice?

This problem is known as model selection. Possible approaches:

• cross validation,
• complexity regularization / structural risk minimization,
• balancing principles,
• ...

48
Abstract Regularization

We have gained a new perspective on the concept of regularization: controlling the expressiveness of the hypotheses space according to the number of training examples, in order to guarantee good prediction performance and consistency.

There are many ways to implement this strategy in practice:

• Tikhonov (and Ivanov) regularization
• Spectral filtering
• Early stopping
• Random sampling
• ...

49
Wrapping Up

Building on a few (reasonable?) assumptions on the learning problem, we:

• have shown how the empirical risk can be a proxy for the expected risk;
• identified the main reasons behind overfitting and discussed how to counteract it (in a more principled way!);
• highlighted the key role played by the choice of the hypotheses space and how its "complexity" affects performance.

Next class we will focus on one such measure of complexity and derive upper bounds on the generalization error.

50
Recommended Reading

Chapters 4, 5 and 6 of Shalev-Shwartz, Shai, and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

51
