0% found this document useful (0 votes)

2 views

Reproducing Kernel Hilbert Spaces-Greg Durrett

The document discusses the concept of Reproducing Kernel Hilbert Spaces (RKHS) and their application in statistical learning theory, emphasizing the importance of regularization to prevent overfitting in high-dimensional data. It introduces the notion of kernels and their role in defining RKHS, along with the representer theorem and various examples of reproducing kernels. The document also highlights the relationship between RKHS and functional analysis, providing a foundation for understanding their utility in machine learning algorithms.

Uploaded by

Avi Roy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views

Reproducing Kernel Hilbert Spaces-Greg Durrett

Uploaded by

Avi Roy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

9.

520: Statistical Learning Theory and Applications February 10th, 2010

Reproducing Kernel Hilbert Spaces

Lecturer: Lorenzo Rosasco Scribe: Greg Durrett

1 Introduction
In the previous two lectures, we’ve discussed the connections of the learning problem to statistical
inference. However, unlike in traditional statistics, our primary goal with learning is to predict the
future rather than describe the data at hand. We also typically have a much smaller sample of data
in a much higher-dimensional space, so we cannot blindly choose a model and assume it will be
accurate. If the model is too highly-parameterized, it will react too strongly to the data, we will
overfit the data, and we will fail to learn the underlying phenomenon (see Figure 1 for an example of
this behavior). However, models with too few parameters may not even describe the training data
adequately, and will provide similarly bad performance.
Regularization provides us with one way to strike the appropriate balance in creating our model.
It requires a (possibly large) class of models and a method for evaluating the complexity of each
model in the class. The concept of “kernels” will provide us with a flexible, computationally feasible
method for implementing this scheme.

(a) Data set for which we wish to (b) Smooth function that will likely (c) Function that probably does not
learn a function be a good predictor of future points model the data well, but still mini-
mizes empirical error

Figure 1: Two different functions learned from a small training set.

The goal of these notes will be to introduce a particularly useful family of hypothesis spaces called
reproducing kernel Hilbert spaces (RKHS), each of which is associated with a particular kernel, and
to derive the general solution of Tikhonov regularization in RKHS, known as the representer theorem.

2 Regularization
The goal of regularization is to restore the well-posedness (specifically, making the result depend
smoothly on the data) of the empirical risk minimization (ERM) technique by effectively restricting
the hypothesis space H. One way of doing this is introduce a penalization term in our minimization
as follows:
ERR(f ) +λ pen(f )
| {z } | {z }
empirical error penalization term

3-1
where the regularization parameter λ controls the tradeoff between the two terms. This will then
cause the minimization to seek out simpler functions, which incur less of a penalty.
Tikhonov regularization can be written in this way, as
n
1X
V (f (xi ), yi ) + λkf k2H , (1)
n i=1

where

• λ > 0 is a regularization parameter,

• V (f (x), y) is the loss function, that is the price we pay when we predict f (x) in place of y, and
• k · kH is the norm in the function space H.
This formulation is powerful, as it does not present a specific algorithm, but rather a large class
of algorithms. By choosing V and H differently, we can derive a wide variety of commonly-used
techniques, including traditional linear regression and support vector machines (SVMs).
Given our intuition about what causes overfitting, the penalization should somehow force us to
choose f to be as smooth as possible while still fitting the data. The norm from our function space
H will allow us to encode this criterion, but in order to design this norm appropriately, we need to
describe reproducing kernel Hilbert spaces.

3 Functional Analysis Background

In order to define RKHS, we will make use of several terms from functional analysis, which we
define here. Additional review on functional analysis can be found in the notes from the math camp,
available on the website.

Definition 1 A function space F is a space whose elements are functions, e.g. f : Rd → R.

Definition 2 An inner product is a function h·, ·i : F × F → R that satisfies the following

properties for every f, g ∈ F and α ∈ R:

1. Symmetric: hf, gi = hg, f i

2. Linear: hr1 f1 + r2 f2 , gi = r1 hf1 , gi + r2 hf2 , gi
3. Positive-definite: hf, f i ≥ 0 for all f ∈ F and hf, f i = 0 iff f = 0.

Definition 3 A norm is a nonnegative function k · k : F → R such that for all f, g ∈ F and α ∈ R

1. kf k ≥ 0 and kf k = 0 iff f = 0;

2. kf + gk ≤ kf k + kgk;
3. kαf k = |α| kf k.
p
A norm can be defined via an inner product, as kf k = hf, f i

Note that while the dot product in Rd is an example of an inner product, an inner product is more
general than this, and requires only those properties given above. Similarly, while the Euclidean
norm is an example of a norm, we consider a wider class of norms on the function spaces we will
use.

3-2
Definition 4 A Hilbert space is a complete, (possibly) infinite-dimensional linear space endowed
with an inner product.
p
A norm in H can be naturally defined from the given inner product, as k · k = h·, ·i. Although
it is possible to impose a different norm so long as it satisfies the criteria given above, we will not do
this in general; our norm is assumed to be the norm derived from the inner product. Furthermore,
we always assume that H is separable (contains a countable dense subset) so that H has a countable
orthonormal basis.
While this tells us what a Hilbert space is, it is not intuitively clear why we need this mechanism,
or what we gain by using it. Essentially, a Hilbert space lets us apply concepts from finite-dimensional
linear algebra to infinite-dimensional spaces of functions. In particular, the fact that a Hilbert space
is complete will guarantee the convergence of certain algorithms. More importantly, the presence
of an inner product allows us to make use of orthogonality and projections, which will later become
important.

3.1 Examples of function spaces

• One function space is the space of continuous functions on the interval [a, b], denoted by C[a, b].
A norm can be established by defining
kf k = max |f (x)|
a≤x≤b

However, there is no inner product for the space that induces this norm, so it is not a Hilbert
space.
• Another example is square integrable functions on the interval [a, b], denoted by L2 [a, b]. We
define the inner product as
Z b
hf, gi = f (x)g(x)dx
a
This produces the correct norm:
Z b
kf k = f 2 (x)dx
a
It can be checked that this space is complete, so it is a Hilbert space. However, there is one
problem with the functions in this space. Consider trying to evaluate the function f (x) at the
point x = k. There exists a function g in the space defined as follows:
(
c if x = k
g(x) =
f (x) otherwise
Because it differs from f only at one point, g is clearly still square-integrable, and moreover,
kf − gk = 0. However, we can set the constant c (or, more generally, the value of g(x) at any
finite number of points) to an arbitrary real value. What this means is that a condition on the
integrability of the function is not strong enough to guarantee that we can use it predictively,
since prediction requires evaluating the function at a particular data value. This characteristic
is what will differentiate reproducing kernel Hilbert spaces from ordinary Hilbert spaces, as
we discuss in the next section.

4 Reproducing Kernel Hilbert Spaces

Definition 5 An evaluation functional over the Hilbert space of functions H is a linear functional
Ft : H → R that evaluates each function in the space at the point t, or
Ft [f ] = f (t) for all f ∈ H.

3-3
Definition 6 A Hilbert space H is a reproducing kernel Hilbert space (RKHS) if the evalu-
ation functionals are bounded, i.e. if for all t there exists some M > 0 such that

|Ft [f ]| = |f (t)| ≤ M kf kH for all f ∈ H

This condition is not trivial. For L2 [a, b], we showed above that there exist functions that are square-
integrable, but which have arbitrarily large values on finite point sets. In this case, no choice of M
will give us the appropriate bound on these functions on these point sets.
While this condition might seem obscure or specific, it is actually quite general and is the weakest
possible condition that ensures us both the existence of an inner product and the ability to evaluate
each function in the space at every point in the domain. In practice, it is difficult to work with this
definition directly. We would like to establish an equivalent notion that is more useful in practice.
To do this, we will need the “reproducing kernel” from which the reproducing kernel Hilbert space
takes its name.
First, from the definition of the reproducing kernel Hilbert space, we can use the Riesz represen-
tation theorem to prove the following property.
Theorem 7 If H is a RKHS, then for each t ∈ X there exists a function Kt ∈ H (called the
representer of t) with the reproducing property

Ft [f ] = hKt , f iH = f (t) for all f ∈ H.

This allows us to represent our linear evaluation functional by taking the inner product with an
element of H. Since Kt is a function in H, by the reproducing property, for each x ∈ X we can write

Kt (x) = hKt , Kx iH .

We take this to be the definition of reproducing kernel in H.

Definition 8 The reproducing kernel (rk) of H is a function K : X × X → R, defined by

K(t, x) := Kt (x)

In general, we have the following definition of a reproducing kernel.

Definition 9 Let X be some set, for example a subset of Rd or Rd itself. A function K : X ×X → R
is a reproducing kernel if it is symmetric, i.e. K(x, y) = K(y, x), and positive definite:
n
X
ci cj K(ti , tj ) ≥ 0
i,j=1

for any n ∈ N and choice of t1 , ..., tn ∈ X and c1 , ..., cn ∈ R.

Having this general notion of a reproducing kernel is important because it allows us to define
an RKHS in terms of its reproducing kernel, rather than attempting to derive the kernel from the
definition of the function space directly. The following theorem formally establishes the relationship
between the RKHS and a reproducing kernel.

Theorem 10 A RKHS defines a corresponding reproducing kernel. Conversely, a reproducing ker-

nel defines a unique RKHS.

Proof: To prove the first statement, we must prove that the reproducing kernel K(t, x) =
hKt , Kx iH is symmetric and positive-definite.
Symmetry follows from the symmetry property of inner products:

hKt , Kx iH = hKx , Kt iH .

3-4
K is positive-definite because
n
X n
X n
X 2
ci cj K(ti , tj ) = ci cj hKti , Ktj iH = cj Ktj ≥ 0.
H
i,j=1 i,j=1 j=1

To prove the second statement, given K one can construct the RKHS H as the completion of the
space of functions spanned by the set {Kx |x ∈ X} with an inner product defined as follows: given
two functions f and g in span{Kx |x ∈ X}
s
X
f (x) = αi Kxi (x)
i=1
0
s
X
g(x) = βi Kx0i (x)
i=1

we define their inner product to be

0
s X
X s
hf, giH = αi βj K(xi , x0j ).
i=1 j=1

(This is only a sketch of the proof.)

Now we have a more concrete concept of what an RKHS is and how we might create such spaces
for ourselves. If we can succeed at writing down a reproducing kernel, we know that there exists
an associated RKHS, and we need not concern ourselves with the particulars of the boundedness
criterion.

4.1 Examples of reproducing kernels

• Linear kernel
K(x, x0 ) = x · x0
• Gaussian kernel
kx−x0 k2
K(x, x0 ) = e− σ2 , σ>0
• Polynomial kernel
K(x, x0 ) = (x · x0 + 1)d , d∈N

4.2 Historical remarks

RKHS were explicitly introduced in learning theory by Girosi (1997). Poggio and Girosi (1989) in-
troduced Tikhonov regularization in learning theory and worked with RKHS only implicitly, because
they dealt mainly with hypothesis spaces on unbounded domains, which we will not discuss here. Of
course, RKHS were used much earlier in approximation theory (eg Wahba, 1990...) and computer
vision (eg Bertero, Torre, Poggio, 1988...).
In general, it is quite difficult to find useful function spaces that aren’t RKHS.

5 Norms and Smoothness

We established earlier that if a space of functions can be represented as an RKHS, it has useful
properties (namely the inner product and the ability for each function to be evaluated at each
point) that allow us to use it to solve learning problems. Armed with the notion of kernels, we can
now describe specific examples of RKHS and examine how their different norms provide different
forms of regularization.

3-5
Sobolev kernel Consider functions f : [0, 1] → R with f (0) = f (1) = 0. The kernel
K(x, y) = Θ(y − x)(1 − y)x + Θ(x − y)(1 − x)y
induces the norm Z Z
kf k2H = (f 0 (x))2 dx = ω 2 |F (ω)|2 dω
R∞
where F (ω) = F{f }(ω) = −∞ f (t)e−iωt dt is the Fourier tranform of f . Such a norm is very
useful because it allows us to regularize on the basis of frequency content. In particular, the more
prominent the high-frequency components of f , the higher kf k2H will be; in fact, the norm will be
infinite for any function whose frequency magnitudes do not decay faster than ω1 . This imposes a
condition on the smoothness of the functions, since a high derivative gives rise to high frequency
components.
The (somewhat mysterious) reproducing kernel written above was designed to yield this useful
norm, and was not created arbitrarily.

Gaussian kernel It is possible to see that the Gaussian kernel yields as the norm:
Z
1 σ2 ω2
kf k2H = d
|F (ω)|2 e 2 dω
2π
which penalizes high-frequency components even more harshly.

5.1 Linear case

We illustrate how regularization controls complexity through a simple linear case. Our function
space is 1-dimensional lines through the origin with a linear kernel:
f (x) = w x and K(x, xi ) ≡ x xi
giving an RKHS norm of
kf k2H = hf, f iH = hKw , Kw iH = K(w, w) = w2
so that our measure of complexity is the slope of the line. We want to separate two classes using
lines and see how the magnitude of the slope corresponds to a measure of complexity.
A classification problem can be thought of as “harder” when the distinctions between the two
classes are less pronounced. In Figure 2, we see that the less separation there is between the x-values
of the two classes, the steeper the slope of the line that is required to model the relationship. Having
a norm that increases with slope is therefore a good choice in this case: by penalizing lines with
high slope, we only use complex solutions to problems if doing so is necessary to reduce the training
error.

6 Solving Tikhonov Regularization: The Representer Theo-

rem
6.1 Well-posedness, existence, and uniqueness
Now that we have RKHS and sensible norms to use for regularization, we can revisit Tikhonov
regularization in a more concrete setting. The algorithms (regularization networks) that we want to
study are defined by an optimization problem over RKHS,
n
1X
fSλ = arg min V (f (xi ), yi ) + λkf k2H
f ∈H n
i=1

3-6
! ! !

"#$ "#$ "#$

" " "

%#$ %#$ %#$

'(&)

'()*

'(&)
% % %

!%#$ !%#$ !%#$

!" !" !"

!"#$ !"#$ !"#$

!! !! !!
!! !"#$ !" !%#$ % %#$ " "#$ ! !! !"#$ !" !%#$ % %#$ " "#$ ! !! !"#$ !" !%#$ % %#$ " "#$ !
& & &

(a) Wide margin (b) Moderate margin (c) Small margin

Figure 2: Three different training sets to demonstrate that higher slopes are necessary to describe
the data as the class distinctions become finer.

where the regularization parameter λ is a positive real number, H is the RKHS as defined by the
reproducing kernel K(·, ·), and V (·, ·) is the loss function.
We have imposed stability on this problem through the use of regularization, but we still need to
check the other two criteria of well-posedness. Does there always exist a solution to the minimization,
and is that solution unique? As it turns out, this requires a condition on the loss function. If the
positive loss function V (·, ·) is convex with respect to its first entry, the functional
n
1X
Φ[f ] = V (f (xi ), yi ) + λkf k2H
n i=1

is strictly convex and coercive (meaning that it grows quickly at the extremes of the space),
hence it has exactly one local (and therefore global) minimum.
Both the squared loss and the hinge loss are convex (see Figure 3). On the contrary the 0-1 loss

V = Θ(−f (x)y),

where Θ(·) is the Heaviside step function, is not convex.

9
4

8
3.5
7
3
6

2.5
5
Hinge Loss
L2 loss

2
4

1.5
3

1
2

0.5
1

0
0
−3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3
y−f(x) y * f(x)

(a) Square loss (b) Hinge loss

Figure 3: Two examples of convex loss functions.

6.2 The representer theorem

There is one additional issue to resolve. Because H is a function space, we note that it may be
infinite-dimensional. While this is not a problem in theory, it does pose a computational problem:
how can we represent a function with an infinite number of parameters on a computer with a finite

3-7
amount of storage? Our solution to Tikhonov regularization could in principle be impossible to write
down for this reason, but it is a surprising result that it actually has a very compact representation,
as described in the following theorem.

Theorem 11 (The Representer Theorem) The minimizer over the RKHS H, fSλ , of the regu-
larized empirical functional
IS [f ] + λkf k2H ,
can be represented by the expression
n
X
fSλ (x) = ci K(xi , x),
i=1

for some n-tuple (c1 , . . . , cn ) ∈ Rn . Hence, minimizing over the (possibly infinite-dimensional)
Hilbert space boils down to minimizing over Rn .

There are only a finite number n of training set points, so the fact that the minimizer can be
written as a linear combination of kernel terms from these points guarantees that we can represent
the minimizer as a vector in Rn .
We provide a sketch of the proof for this theorem.
Proof: Define the linear subspace of H,
n
X
H0 = {f ∈ H | f = αi Kxi }.
i=1

This is the space spanned by the representers of the training set. Let H0⊥ be the linear subspace of
H orthogonal to H0 , i.e.

H0⊥ = {g ∈ H | hg, f i = 0 for all f ∈ H0 }.

H0 is finite-dimensional, hence closed, so we can write H = H0 ⊕ H0⊥ . Now we see that every
f ∈ H can be uniquely decomposed into a component along H0 , denoted by f0 , and a component
perpendicular to H0 , given by f0⊥ :
f = f0 + f0⊥ .
By orthogonality
kf0 + f0⊥ k2 = kf0 k2 + kf0⊥ k2
and by the reproducing property
IS [f0 + f0⊥ ] = IS [f0 ],
since evaluating f (xi ) = f0 (xi ) + f0⊥ (xi ) to compute the empirical error requires taking the inner
product with the representer Kxi , and doing so nullifies the f0⊥ term while preserving the f0 term
intact.
Combining these two facts, we see that

IS [f0 + f0⊥ ] + λkf0 + f0⊥ k2H = IS [f0 ] + λkf0 k2H + λkf0⊥ k2H ≥ IS [f0 ] + λkf0 k2H

Hence the minimum fSλ = f0 must belong to the linear space H0 .

This mechanism for implementing Tikhonov regularization can be applied to regularized least-
squares regression and support vector machines, as we will do in the next two classes.

3-8

Readiness in Work Immersion
69% (13)
Readiness in Work Immersion
6 pages
OET: Speaking Tips
91% (35)
OET: Speaking Tips
16 pages
Class03 Rkhs Scribe
No ratings yet
Class03 Rkhs Scribe
8 pages
Class 03
No ratings yet
Class 03
40 pages
Reproducing Kernel Hilbert Spaces
No ratings yet
Reproducing Kernel Hilbert Spaces
4 pages
7 PDF
No ratings yet
7 PDF
4 pages
Class03 PDF
No ratings yet
Class03 PDF
40 pages
Lecture4 introToRKHS
No ratings yet
Lecture4 introToRKHS
33 pages
Reproducing Kernel Banach Spaces For Machine Learning: Haizhang Zhang Yuesheng Xu
No ratings yet
Reproducing Kernel Banach Spaces For Machine Learning: Haizhang Zhang Yuesheng Xu
35 pages
What Is An RKHS?: 1 Outline
No ratings yet
What Is An RKHS?: 1 Outline
24 pages
Reproducing Kernel Hilbert Spaces
No ratings yet
Reproducing Kernel Hilbert Spaces
5 pages
A Reproducing Kernel Hilbert Space Framework For Information-Theoretic Learning
No ratings yet
A Reproducing Kernel Hilbert Space Framework For Information-Theoretic Learning
12 pages
Functionspaces PDF
No ratings yet
Functionspaces PDF
15 pages
Applications of Reproducing Kernel Hiibert Spaces
No ratings yet
Applications of Reproducing Kernel Hiibert Spaces
16 pages
B4 2LectureNotes
No ratings yet
B4 2LectureNotes
51 pages
Arthur Gretton - Slides4A
No ratings yet
Arthur Gretton - Slides4A
121 pages
Foundations Computational Mathematics: Online Learning Algorithms
No ratings yet
Foundations Computational Mathematics: Online Learning Algorithms
26 pages
Quantum Information Processing Lecture Notes, Wolf
No ratings yet
Quantum Information Processing Lecture Notes, Wolf
62 pages
RKHS_0
No ratings yet
RKHS_0
13 pages
Vector-Valued Reproducing Kernel Hilbert Spaces: With Applications To Function Extension and Image Colorization
No ratings yet
Vector-Valued Reproducing Kernel Hilbert Spaces: With Applications To Function Extension and Image Colorization
71 pages
Chapter 1
No ratings yet
Chapter 1
24 pages
(1963) Probability Density Functionals and Reproducing Kernel Hilbert Spaces (Parzen)
No ratings yet
(1963) Probability Density Functionals and Reproducing Kernel Hilbert Spaces (Parzen)
15 pages
Machine Learning With Kernel Methods
No ratings yet
Machine Learning With Kernel Methods
760 pages
Introduction To Functional Analysis PDF
No ratings yet
Introduction To Functional Analysis PDF
103 pages
(De Gruyter Studies in Mathematics, 87) RenÃ© Erlin Castillo - Fourier Meets Hilbert and Riesz - An Introduction to the Corresponding Transforms-De Gruyter (2022)
No ratings yet
(De Gruyter Studies in Mathematics, 87) RenÃ© Erlin Castillo - Fourier Meets Hilbert and Riesz - An Introduction to the Corresponding Transforms-De Gruyter (2022)
307 pages
Lecture Notes OperatorML 080424
No ratings yet
Lecture Notes OperatorML 080424
65 pages
2-03-Hilbert Space
No ratings yet
2-03-Hilbert Space
16 pages
Reproducing Kernel Hilbert Spaces For Penalized Regression: A Tutorial
No ratings yet
Reproducing Kernel Hilbert Spaces For Penalized Regression: A Tutorial
25 pages
Economics 712 Steven N. Durlauf Fall 2010
No ratings yet
Economics 712 Steven N. Durlauf Fall 2010
13 pages
Hilbert Space For Random Processes
No ratings yet
Hilbert Space For Random Processes
11 pages
An_Introduction_to_Wavelets
No ratings yet
An_Introduction_to_Wavelets
30 pages
Paper-II-Functional-Analysis
No ratings yet
Paper-II-Functional-Analysis
125 pages
Script 2
No ratings yet
Script 2
62 pages
Brief Introduction To Hilbert Space
100% (2)
Brief Introduction To Hilbert Space
9 pages
MTH411 Jan 2022 Notes Final
No ratings yet
MTH411 Jan 2022 Notes Final
107 pages
Bedos LinAlg
No ratings yet
Bedos LinAlg
110 pages
Mmii
No ratings yet
Mmii
316 pages
2011 Zhang and Zhang Appl Comp Harm Anan
No ratings yet
2011 Zhang and Zhang Appl Comp Harm Anan
25 pages
Introduction To Hilbert Spaces Cheet Sheet
No ratings yet
Introduction To Hilbert Spaces Cheet Sheet
22 pages
Notes Functional Analysis (2)
No ratings yet
Notes Functional Analysis (2)
178 pages
(Rosenberger, 1997) - Functional Analysis Introduction To Spectral Theory in Hilbert Spaces
No ratings yet
(Rosenberger, 1997) - Functional Analysis Introduction To Spectral Theory in Hilbert Spaces
62 pages
Hilbert
No ratings yet
Hilbert
91 pages
Class04 Feature+Kernels
No ratings yet
Class04 Feature+Kernels
35 pages
Math Camp 1: Functional Analysis
No ratings yet
Math Camp 1: Functional Analysis
50 pages
analysis b notes
No ratings yet
analysis b notes
149 pages
Hilbert Review
No ratings yet
Hilbert Review
16 pages
Apunte Tópicos
No ratings yet
Apunte Tópicos
37 pages
Main A
No ratings yet
Main A
24 pages
Stats 231 / CS229T Homework 3 Solutions
No ratings yet
Stats 231 / CS229T Homework 3 Solutions
6 pages
Functional Analysis Master
No ratings yet
Functional Analysis Master
107 pages
Functional Analysis
0% (2)
Functional Analysis
107 pages
Functional Analysis Master PDF
No ratings yet
Functional Analysis Master PDF
107 pages
s-m-s-t-c--f-a-23-24--lecture1
No ratings yet
s-m-s-t-c--f-a-23-24--lecture1
11 pages
g
No ratings yet
g
5 pages
Functional Analysis
100% (1)
Functional Analysis
131 pages
Lectures in Functional Analysis-Roman Vershynin PDF
No ratings yet
Lectures in Functional Analysis-Roman Vershynin PDF
131 pages
The Kernels of The Hankel and Toeplitz Operator
No ratings yet
The Kernels of The Hankel and Toeplitz Operator
43 pages
Hilberspaces
No ratings yet
Hilberspaces
33 pages
An Introduction to Lebesgue Integration and Fourier Series
From Everand
An Introduction to Lebesgue Integration and Fourier Series
Howard J. Wilcox
No ratings yet
Optimization in Function Spaces
From Everand
Optimization in Function Spaces
Amol Sasane
No ratings yet
Infinite Series
From Everand
Infinite Series
James M Hyslop
No ratings yet
Exercises of Functional Analysis
From Everand
Exercises of Functional Analysis
Simone Malacrida
No ratings yet
A fast algorithm for 2-D KS two sample tests-Yuanhui Xiao
No ratings yet
A fast algorithm for 2-D KS two sample tests-Yuanhui Xiao
6 pages
Transformations in Density Estimation
No ratings yet
Transformations in Density Estimation
12 pages
Robust Kernel Density Estimation with Median-of-Means principle-Humbert
No ratings yet
Robust Kernel Density Estimation with Median-of-Means principle-Humbert
22 pages
sadhanala19a
No ratings yet
sadhanala19a
10 pages
Robust Kernel Density Estimation-Kim And Scott
No ratings yet
Robust Kernel Density Estimation-Kim And Scott
37 pages
How To Teach Beginner Learners Like A Pro
100% (2)
How To Teach Beginner Learners Like A Pro
12 pages
Interview Assignment
No ratings yet
Interview Assignment
8 pages
LAS Math 4 Week 6.2 Francisca R. Mompero
100% (2)
LAS Math 4 Week 6.2 Francisca R. Mompero
7 pages
CV Gaire 12 2022
No ratings yet
CV Gaire 12 2022
18 pages
EEE141L
No ratings yet
EEE141L
2 pages
Physics Project 2
No ratings yet
Physics Project 2
17 pages
Open House - Class-9 & 11 Sci - Comm
No ratings yet
Open House - Class-9 & 11 Sci - Comm
2 pages
EDU406 Topic 01
No ratings yet
EDU406 Topic 01
2 pages
Tips Regarding Upsc Cse Preparation: Yogesh Kumbhejkar Air 8 Cse 2015
No ratings yet
Tips Regarding Upsc Cse Preparation: Yogesh Kumbhejkar Air 8 Cse 2015
11 pages
Foreign Language Teaching and Learning PDF
No ratings yet
Foreign Language Teaching and Learning PDF
7 pages
Paid Full
No ratings yet
Paid Full
16 pages
Measuring Productivity British English Teacher 2
No ratings yet
Measuring Productivity British English Teacher 2
11 pages
Hay Wingo Phonics
100% (1)
Hay Wingo Phonics
118 pages
g9 System of Quadratic Equation.
No ratings yet
g9 System of Quadratic Equation.
15 pages
Instructional Planning RESEARCH 2 COT 1
No ratings yet
Instructional Planning RESEARCH 2 COT 1
9 pages
10 1001@jama 2019 21579
No ratings yet
10 1001@jama 2019 21579
2 pages
What Is Thoracic Surgery dl4
No ratings yet
What Is Thoracic Surgery dl4
2 pages
Zsigmondy's Theorem
No ratings yet
Zsigmondy's Theorem
4 pages
Cambridge O Level: Physics 5054/32
No ratings yet
Cambridge O Level: Physics 5054/32
9 pages
UB2009 - Worksheet19
No ratings yet
UB2009 - Worksheet19
3 pages
XI Book List
No ratings yet
XI Book List
4 pages
Gibbs Reflective Cycle
100% (1)
Gibbs Reflective Cycle
13 pages
Han_Ong
No ratings yet
Han_Ong
5 pages
Medical Report: Patient Name: Dob: ID: Clinic: Physician: Report Date
No ratings yet
Medical Report: Patient Name: Dob: ID: Clinic: Physician: Report Date
6 pages
Lightning Literature 8 Reviews
100% (1)
Lightning Literature 8 Reviews
5 pages
Ancestors of John Taylor B. 1403 Shadowchurst Kent, England
No ratings yet
Ancestors of John Taylor B. 1403 Shadowchurst Kent, England
2 pages
Cy08msp Codebook 5thdecember23
No ratings yet
Cy08msp Codebook 5thdecember23
4,351 pages
(FROHLICH, 2004) Fueling The Ethnography Imagination by Design
No ratings yet
(FROHLICH, 2004) Fueling The Ethnography Imagination by Design
16 pages

Reproducing Kernel Hilbert Spaces-Greg Durrett

Uploaded by

Reproducing Kernel Hilbert Spaces-Greg Durrett

Uploaded by

9.

520: Statistical Learning Theory and Applications February 10th, 2010

Reproducing Kernel Hilbert Spaces

Figure 1: Two different functions learned from a small training set.

• λ > 0 is a regularization parameter,

3 Functional Analysis Background

Definition 1 A function space F is a space whose elements are functions, e.g. f : Rd → R.

Definition 2 An inner product is a function h·, ·i : F × F → R that satisfies the following

1. Symmetric: hf, gi = hg, f i

Definition 3 A norm is a nonnegative function k · k : F → R such that for all f, g ∈ F and α ∈ R

3.1 Examples of function spaces

4 Reproducing Kernel Hilbert Spaces

|Ft [f ]| = |f (t)| ≤ M kf kH for all f ∈ H

Ft [f ] = hKt , f iH = f (t) for all f ∈ H.

We take this to be the definition of reproducing kernel in H.

In general, we have the following definition of a reproducing kernel.

for any n ∈ N and choice of t1 , ..., tn ∈ X and c1 , ..., cn ∈ R.

Theorem 10 A RKHS defines a corresponding reproducing kernel. Conversely, a reproducing ker-

we define their inner product to be

(This is only a sketch of the proof.)

4.1 Examples of reproducing kernels

4.2 Historical remarks

5 Norms and Smoothness

5.1 Linear case

6 Solving Tikhonov Regularization: The Representer Theo-

"#$ "#$ "#$

" " "

%#$ %#$ %#$

!%#$ !%#$ !%#$

!" !" !"

!"#$ !"#$ !"#$

(a) Wide margin (b) Moderate margin (c) Small margin

where Θ(·) is the Heaviside step function, is not convex.

(a) Square loss (b) Hinge loss

Figure 3: Two examples of convex loss functions.

6.2 The representer theorem

H0⊥ = {g ∈ H | hg, f i = 0 for all f ∈ H0 }.

Hence the minimum fSλ = f0 must belong to the linear space H0 .

You might also like