
Machine Learning

Kernel Methods

Sargur Srihari
[email protected]


Topics in Kernel Methods

1. Linear Models vs Memory-based Models
2. Stored Sample Methods
3. Kernel Functions
   • Dual Representations
   • Constructing Kernels
4. Extension to Symbolic Inputs
5. Fisher Kernel


Linear Models vs Memory-based models

• Linear parametric models for regression and classification have the form y(x,w)
  • During learning we either get a maximum likelihood estimate of w or a posterior distribution of w
  • Training data is then discarded
  • Prediction is based only on the vector w
  • This is true of neural networks as well
• Memory-based methods keep the training samples (or a subset) and use them during the prediction phase

Memory-Based Methods
• Training data points are used in prediction
• Examples of such methods
  • Parzen probability density model: a linear combination of kernel functions centered on each training data point
  • Nearest-neighbor classification
• These are memory-based methods
  • They require a metric to be defined
  • Fast to train, slow to predict


Kernel Functions
• Linear models can be re-cast into an equivalent dual form in which predictions are based on kernel functions evaluated at the training points
• Kernel function is given by
  k(x,x′) = ϕ(x)ᵀϕ(x′)
  • where ϕ(x) is a fixed nonlinear mapping (basis function)
• Kernel is a symmetric function of its arguments
  k(x,x′) = k(x′,x)
• Kernel can be interpreted as the similarity of x and x′
• Simplest choice is the identity mapping in feature space, ϕ(x) = x
  • In which case k(x,x′) = xᵀx′
  • Called the linear kernel

Kernel Trick
• Formulating algorithms in terms of inner products allows extending well-known algorithms by using the kernel trick (kernel substitution)
• Basic idea of the kernel trick
  • If an input vector x appears only in the form of scalar products, then we can replace those scalar products with some other choice of kernel
• Used widely
  • in support vector machines
  • in developing the non-linear variant of PCA
  • in the kernel Fisher discriminant

Other Forms of Kernel Functions


• Function of the difference between arguments
  k(x,x′) = k(x − x′)
  • Called a stationary kernel since it is invariant to translation
• Radial basis functions
  k(x,x′) = k(||x − x′||)
  • Depend only on the magnitude of the distance between the arguments
• Note that the kernel function is a scalar while x is M-dimensional
• For these to be valid kernel functions they should be shown to have the property
  k(x,x′) = ϕ(x)ᵀϕ(x′)

Dual Representation

• Linear models for regression/classification can be reformulated in terms of a dual representation
  • In which the kernel function arises naturally
  • Plays an important role in SVMs
• Consider the linear regression model
  • Parameters are obtained by minimizing the regularized sum-of-squares error
  J(w) = (1/2) Σ_{n=1}^N {wᵀϕ(x_n) − t_n}² + (λ/2) wᵀw
  where w = (w₀,..,w_{M−1})ᵀ, ϕ = (ϕ₀,..,ϕ_{M−1})ᵀ is the feature vector of M basis functions,
  we have N samples {x₁,..,x_N}, and λ is the regularization coefficient
• Minimum obtained by setting the gradient of J(w) wrt w equal to zero



Solution for w: linear combination of ϕ(x_n)

• Equating the derivative of J(w) wrt w to zero we get
  w = −(1/λ) Σ_{n=1}^N {wᵀϕ(x_n) − t_n} ϕ(x_n)
    = Σ_{n=1}^N a_n ϕ(x_n)
    = Φᵀa
• Solution for w: a linear combination of the ϕ(x_n), whose coefficients are functions of w, where
  • Φ is the design matrix whose nth row is given by ϕ(x_n)ᵀ

  Φ = [ ϕ₀(x₁)   . .  ϕ_{M−1}(x₁)
        .              .
        ϕ₀(x_n)  . .  ϕ_{M−1}(x_n)       is an N × M matrix
        .              .
        ϕ₀(x_N)  . .  ϕ_{M−1}(x_N) ]

  • Vector a = (a₁,..,a_N)ᵀ has the definition
  a_n = −(1/λ) {wᵀϕ(x_n) − t_n}

Transformation from w to a

• Thus we have w = Φᵀa
• Instead of working with the parameter vector w we can reformulate the least-squares algorithm in terms of the parameter vector a
  • giving rise to a dual representation
  a_n = −(1/λ) {wᵀϕ(x_n) − t_n}
• Although the definition of a still includes w, it can be eliminated by the use of the kernel function


Gram Matrix and Kernel Function

• Gram matrix K = ΦΦᵀ is an N × N matrix
  • with elements
  K_nm = ϕ(x_n)ᵀϕ(x_m) = k(x_n, x_m)
  • where the kernel function is k(x,x′) = ϕ(x)ᵀϕ(x′)

  K = [ k(x₁,x₁)   . .  k(x₁,x_N)
        .                .
        .                .
        k(x_N,x₁)  . .  k(x_N,x_N) ]

  Gram matrix definition: given N vectors, it is the matrix of all inner products
• Notes:
  • Φ is N × M and K is N × N (an N × M matrix times an M × N matrix)
  • K is a matrix of similarities of pairs of samples (thus it is symmetric)
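As a concrete illustration, the minimal sketch below (not from the slides; the polynomial basis and data are assumed) builds the design matrix Φ for a few basis functions and verifies that the Gram matrix K = ΦΦᵀ equals the matrix of pairwise kernel values k(x_n,x_m) = ϕ(x_n)ᵀϕ(x_m).

```python
import numpy as np

def phi(x):
    """Feature map: polynomial basis functions (1, x, x^2) for scalar input x."""
    return np.array([1.0, x, x**2])

def k(x, x_prime):
    """Kernel defined through the feature map: k(x, x') = phi(x)^T phi(x')."""
    return phi(x) @ phi(x_prime)

# N = 4 one-dimensional training samples (illustrative)
X = np.array([-1.0, 0.0, 0.5, 2.0])

# Design matrix Phi: nth row is phi(x_n)^T, so Phi is N x M (here 4 x 3)
Phi = np.stack([phi(x) for x in X])

# Gram matrix two ways: K = Phi Phi^T, and element-wise K_nm = k(x_n, x_m)
K_gram = Phi @ Phi.T
K_elem = np.array([[k(xn, xm) for xm in X] for xn in X])

assert np.allclose(K_gram, K_elem)      # same N x N matrix
assert np.allclose(K_gram, K_gram.T)    # symmetric, as expected
print(K_gram)
```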
Error Function in Terms of Gram Matrix
• Error function is
  J(w) = (1/2) Σ_{n=1}^N {wᵀϕ(x_n) − t_n}² + (λ/2) wᵀw
• Substituting w = Φᵀa into J(w) gives
  J(w) = (1/2) aᵀΦΦᵀΦΦᵀa − aᵀΦΦᵀt + (1/2) tᵀt + (λ/2) aᵀΦΦᵀa
  where t = (t₁,..,t_N)ᵀ
• Error function is written in terms of the Gram matrix as
  J(a) = (1/2) aᵀKKa − aᵀKt + (1/2) tᵀt + (λ/2) aᵀKa
• Solving for a by combining w = Φᵀa and a_n = −(1/λ){wᵀϕ(x_n) − t_n} gives
  a = (K + λI_N)⁻¹ t
• Thus the solution for a is expressed entirely in terms of the kernel k(x,x′); from a we can recover w = Φᵀa, a linear combination of the elements of ϕ(x_n), and hence the original formulation in terms of the parameters w
Prediction Function

• Prediction for a new input x
  • We can write a = (K + λI_N)⁻¹ t by combining w = Φᵀa and a_n = −(1/λ){wᵀϕ(x_n) − t_n}
• Substituting back into the linear regression model,
  y(x) = wᵀϕ(x)
       = aᵀΦϕ(x)
       = k(x)ᵀ(K + λI_N)⁻¹ t    where k(x) has elements k_n(x) = k(x_n, x)
• Prediction is a linear combination of the target values from the training set.
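A minimal sketch of this dual prediction, assuming a Gaussian kernel and synthetic 1-D data (the data, σ, and λ are illustrative, not from the slides): it computes a = (K + λI_N)⁻¹ t once, then predicts y(x) = k(x)ᵀa for new inputs.

```python
import numpy as np

def gaussian_kernel(x, x_prime, sigma=0.5):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((x - x_prime) ** 2) / (2 * sigma ** 2))

# Synthetic 1-D regression data (illustrative)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))              # N training inputs
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(20)

lam = 0.1                                        # regularization coefficient lambda
N = X.shape[0]
K = np.array([[gaussian_kernel(xn, xm) for xm in X] for xn in X])   # N x N Gram matrix
a = np.linalg.solve(K + lam * np.eye(N), t)      # a = (K + lambda I_N)^{-1} t

def predict(x_new):
    """y(x) = k(x)^T a, where k(x)_n = k(x_n, x)."""
    k_vec = np.array([gaussian_kernel(xn, x_new) for xn in X])
    return k_vec @ a

print(predict(np.array([0.25])))                 # prediction for a new input
```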

Advantage of Dual Representation


• Solution for a is entirely in terms of k(x,x′)
  • From a we can recover w using w = Φᵀa
• In the parametric formulation, the solution is w_ML = (ΦᵀΦ)⁻¹Φᵀt
  • We invert an M × M matrix, since Φ is N × M
• In the dual formulation, the solution is a = (K + λI_N)⁻¹ t
  • We are inverting an N × N matrix
  • An apparent disadvantage, since typically N >> M
• Advantage of dual: we work with the kernel k(x,x′), so we:
  • avoid working with a feature vector ϕ(x) and
  • avoid problems associated with the very high or infinite dimensionality of the feature space

Constructing Kernels
• To exploit kernel substitution we need valid kernels
• Two methods:
  • (1) from ϕ(x) to k(x,x′)
  • (2) from k(x,x′) to ϕ(x)
• First method
  • Choose ϕ(x) and use it to find the corresponding kernel
  k(x,x′) = ϕ(x)ᵀϕ(x′) = Σ_{i=1}^M ϕ_i(x) ϕ_i(x′)
  • where the ϕ_i(x) are basis functions
• One-dimensional input space is shown next

Kernel Functions from basis functions


[Figure: for a one-dimensional input space, three choices of basis functions ϕ_i(x) — polynomial, Gaussian, and logistic sigmoid (top row) — and the corresponding kernel functions k(x,x′) = ϕ(x)ᵀϕ(x′) plotted as functions of x for a fixed x′ marked by a red cross (bottom row).]
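A small sketch of this first method (illustrative; the Gaussian basis functions on a 1-D grid are assumed): it evaluates k(x,x′) = Σ_i ϕ_i(x)ϕ_i(x′) as a function of x for a fixed x′, which is what the bottom row of the figure shows.

```python
import numpy as np

def phi(x, centers=np.linspace(-1, 1, 11), s=0.2):
    """Gaussian basis functions phi_i(x) centered on a 1-D grid (illustrative choice)."""
    return np.exp(-(x - centers) ** 2 / (2 * s ** 2))

def k(x, x_prime):
    """Method 1: kernel induced by the chosen basis, k(x,x') = sum_i phi_i(x) phi_i(x')."""
    return phi(x) @ phi(x_prime)

x_prime = 0.0                                  # fixed second argument (the "red cross")
xs = np.linspace(-1, 1, 5)
print([round(k(x, x_prime), 4) for x in xs])   # kernel profile k(x, x') as x varies
```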

Method 2: Direct Construction of Kernels


• The kernel has to correspond to a scalar product in some (perhaps infinite-dimensional) space
• Consider the kernel function k(x,z) = (xᵀz)²
• In 2-D space
  k(x,z) = (xᵀz)² = (x₁z₁ + x₂z₂)²
         = x₁²z₁² + 2x₁z₁x₂z₂ + x₂²z₂²
         = (x₁², √2·x₁x₂, x₂²)(z₁², √2·z₁z₂, z₂²)ᵀ
         = ϕ(x)ᵀϕ(z)
• Feature mapping takes the form ϕ(x) = (x₁², √2·x₁x₂, x₂²)
  • Comprises all second-order terms, with a specific weighting
  • Computing the inner product in feature space needs 6 feature values and 9 multiplications
  • The kernel function k(x,z) needs only 2 multiplications and a squaring
• For (xᵀz + c)² we get constant, linear, and second-order terms
• For (xᵀz + c)^M we get all monomials up to order M
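A quick numeric check of this identity (a minimal sketch; the two vectors are arbitrary): the kernel value (xᵀz)² matches the explicit inner product ϕ(x)ᵀϕ(z) with ϕ(x) = (x₁², √2·x₁x₂, x₂²).

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel: (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.5, -0.5])
z = np.array([2.0, 3.0])

k_direct = (x @ z) ** 2          # kernel trick: 2 multiplies and a squaring
k_feature = phi(x) @ phi(z)      # explicit mapping, then inner product

assert np.isclose(k_direct, k_feature)
print(k_direct, k_feature)
```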

Testing whether a function is a valid kernel


• Without constructing ϕ(x), the necessary and sufficient condition for k(x,x′) to be a valid kernel is:
  • the Gram matrix K, with elements k(x_n,x_m), is positive semi-definite for all possible choices of the set {x_n}
  • Positive semi-definite is not the same thing as a matrix whose elements are non-negative
  • It means
  zᵀKz ≥ 0 for all vectors z with real entries,
  i.e., Σ_n Σ_m K_nm z_n z_m ≥ 0 for any real numbers z_n, z_m
• Mercer's theorem: any continuous, symmetric, positive semi-definite kernel function k(x,y) can be expressed as a dot product in a high-dimensional space
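As an illustrative check (a sketch, not from the slides), positive semi-definiteness of a Gram matrix can be tested numerically by inspecting its eigenvalues; the sigmoidal kernel mentioned at the end of these slides is an example whose Gram matrix is not guaranteed to pass this test.

```python
import numpy as np

def is_psd(K, tol=1e-10):
    """A Gram matrix is valid if it is symmetric and positive semi-definite."""
    return np.allclose(K, K.T) and np.all(np.linalg.eigvalsh(K) >= -tol)

X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, -1.0], [0.3, 0.7]])

K_linear = X @ X.T                         # linear kernel Gram matrix: always PSD
K_tanh = np.tanh(2.0 * (X @ X.T) - 1.0)    # sigmoidal kernel: not guaranteed PSD

print(is_psd(K_linear), is_psd(K_tanh))
```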

• New kernels can be constructed from simpler kernels as building blocks

Techniques for Constructing Kernels


• Given valid kernels k₁(x,x′) and k₂(x,x′), the following new kernels will also be valid:
  1. k(x,x′) = c·k₁(x,x′)                       where c > 0 is a constant
  2. k(x,x′) = f(x) k₁(x,x′) f(x′)              where f(·) is any function
  3. k(x,x′) = q(k₁(x,x′))                      where q(·) is a polynomial with non-negative coefficients
  4. k(x,x′) = exp(k₁(x,x′))
  5. k(x,x′) = k₁(x,x′) + k₂(x,x′)
  6. k(x,x′) = k₁(x,x′) k₂(x,x′)
  7. k(x,x′) = k₃(f(x), f(x′))                  where f(x) is a function from x to R^M and k₃ is a valid kernel in R^M
  8. k(x,x′) = xᵀAx′                            where A is a symmetric positive semidefinite matrix
  9. k(x,x′) = k_a(x_a,x_a′) + k_b(x_b,x_b′)    where x_a and x_b are variables with x = (x_a, x_b)
  10. k(x,x′) = k_a(x_a,x_a′) k_b(x_b,x_b′)     where k_a and k_b are valid kernel functions
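An illustrative sketch (the 2-D data and the linear base kernel are assumptions) showing rules 1, 4, and 5 in action: scaling, exponentiating, and summing valid kernels still yields symmetric positive semi-definite Gram matrices.

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix for a kernel function over the rows of X."""
    return np.array([[kernel(a, b) for b in X] for a in X])

k1 = lambda x, z: x @ z                      # linear kernel (valid)
k2 = lambda x, z: (x @ z + 1.0) ** 2         # polynomial kernel (valid, rule 3 applied to k1)

# New kernels built from the construction rules
k_scaled = lambda x, z: 3.0 * k1(x, z)       # rule 1: c * k1
k_exp    = lambda x, z: np.exp(k1(x, z))     # rule 4: exp(k1)
k_sum    = lambda x, z: k1(x, z) + k2(x, z)  # rule 5: k1 + k2

X = np.array([[0.1, 0.2], [1.0, -0.5], [0.7, 0.3]])
for k in (k_scaled, k_exp, k_sum):
    K = gram(k, X)
    assert np.all(np.linalg.eigvalsh(K) >= -1e-10)   # still positive semi-definite
```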

Kernels for specific applications

• Requirements for k(x,x′)
  • It is symmetric
  • Its Gram matrix is positive semidefinite
  • It expresses the appropriate similarity between x and x′ for the intended application
Gaussian Kernel
• Commonly used kernel is
  k(x,x′) = exp(−||x − x′||² / 2σ²)
• It is seen to be a valid kernel by expanding the square
  ||x − x′||² = xᵀx + (x′)ᵀx′ − 2xᵀx′
• To give
  k(x,x′) = exp(−xᵀx / 2σ²) exp(xᵀx′ / σ²) exp(−(x′)ᵀx′ / 2σ²)
• Valid from kernel construction rules 2 and 4
  • together with the validity of the linear kernel k(x,x′) = xᵀx′
• Can be extended to non-Euclidean distances
  k(x,x′) = exp{(−1/2σ²)[κ(x,x) + κ(x′,x′) − 2κ(x,x′)]}
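A small numeric sketch (arbitrary vectors, σ assumed) confirming the factorization used in the validity argument: the Gaussian kernel equals the product of the three exponential factors above.

```python
import numpy as np

x = np.array([0.2, -1.0, 0.5])
z = np.array([1.1, 0.3, -0.4])
sigma = 0.8

k_gauss = np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

# Factorized form: exp(-x.x/2s^2) * exp(x.z/s^2) * exp(-z.z/2s^2)
k_factored = (np.exp(-(x @ x) / (2 * sigma ** 2))
              * np.exp((x @ z) / sigma ** 2)
              * np.exp(-(z @ z) / (2 * sigma ** 2)))

assert np.isclose(k_gauss, k_factored)
print(k_gauss)
```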

Use of a kernel method

[Figure: a 2-D input space x (an oil-slick image) mapped by ϕ(x) into a 3-D feature space, and into the infinite-dimensional feature space induced by a Gaussian kernel.]

Extension of Kernels to Symbolic Inputs


• Important contribution of the kernel viewpoint:
  • Inputs can be symbolic, not vectors of real numbers
  • Kernels have been defined for graphs, sets, strings, and text documents
• If A₁ and A₂ are two subsets of objects
  • A simple kernel is k(A₁,A₂) = 2^|A₁∩A₂|
  • where |A| indicates the number of elements in A
  • A valid kernel, since it can be shown to correspond to an inner product in a feature space
• Example: A = {1,2,3,4,5}
  A₁ = {2,3,4,5}, A₂ = {1,2,4,5}, A₁∩A₂ = {2,4,5}
  Hence k(A₁,A₂) = 2³ = 8
  Example feature vectors:
  ϕ(A₁) = [2 2 2 2] and ϕ(A₂) = [1 1 1 1], such that ϕ(A₁)ᵀϕ(A₂) = 8
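A tiny sketch of this set kernel (the example sets are the ones from the slide):

```python
def set_kernel(A1, A2):
    """k(A1, A2) = 2^{|A1 intersect A2|}."""
    return 2 ** len(A1 & A2)

A1 = {2, 3, 4, 5}
A2 = {1, 2, 4, 5}
print(set_kernel(A1, A2))   # 2^3 = 8
```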

Kernels for Complex Objects


• ML methods studied so far require the input to be represented as a fixed-size feature vector x_i ∈ R^D
• For certain objects it is not clear how best to represent them as feature vectors, e.g.,
  • Text or a protein sequence of variable length
  • Molecular structure with complex 3-D geometry
  • An evolutionary tree, which has variable size and shape
• Solutions
  • Define a generative model whose parameters are features; plug these features into standard models
  • Measure similarity between objects in a way that does not require preprocessing into feature-vector format

Kernels for Comparing Documents


• In IR and document classification, we need a way of comparing documents x_i and x_i′
• Bag-of-words representation: x_ij is the number of times word j occurs in document i, so x_i = [x_i1,..,x_iD]
• Cosine similarity
  k(x_i, x_i′) = x_iᵀ x_i′ / (||x_i||₂ ||x_i′||₂)
• It measures the cosine of the angle between x_i and x_i′
  • Since x_i is a count vector (and hence non-negative), the cosine similarity is between 0 and 1
  • Where 0 means the vectors are orthogonal and therefore have no words in common

Disadvantage of Cosine Similarity


• It does not work well, for two reasons
  1. Stop words
    • If x_i has any word in common with x_i′ it is deemed similar, even though some words, such as "the" and "and", occur commonly in many documents and are not discriminative
  2. High-frequency words in a document
    • If a discriminative word occurs many times in a document, the similarity is artificially boosted
    • even though word usage tends to be bursty, i.e., once a word is used in a document it is very likely to be used again, so repeated occurrences carry less information

TF-IDF
• Cosine similarity performance can be improved with some preprocessing
• Use a feature vector called the TF-IDF representation
  • Term frequency - inverse document frequency
• TF is a log-transform of the count (x_ij is the number of times word j occurs in document i)
  tf(x_ij) ≜ log(1 + x_ij)
• IDF is defined as
  idf(j) ≜ log( N / (1 + Σ_{i=1}^N I(x_ij > 0)) )
  • where N is the total number of documents
  • the denominator counts how many documents contain term j
• Finally we define
  tf-idf(x_i) ≜ [tf(x_ij) × idf(j)]_{j=1}^V
• We use this inside the cosine similarity measure
  k(x_i, x_i′) = ϕ(x_i)ᵀϕ(x_i′) / (||ϕ(x_i)||₂ ||ϕ(x_i′)||₂)
  • where ϕ(x) = tf-idf(x)
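A compact sketch of the TF-IDF kernel (the toy word-count matrix is an assumption; this is an illustration, not a reference implementation):

```python
import numpy as np

# Toy word-count matrix: N = 4 documents, V = 4 vocabulary terms (illustrative)
X = np.array([[3, 0, 1, 0],
              [0, 2, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 2, 1]], dtype=float)
N = X.shape[0]

tf = np.log(1.0 + X)                              # tf(x_ij) = log(1 + x_ij)
idf = np.log(N / (1.0 + (X > 0).sum(axis=0)))     # idf(j) = log(N / (1 + #docs containing j))
phi = tf * idf                                    # tf-idf feature vectors, one row per document

def tfidf_kernel(i, j):
    """Cosine similarity between the tf-idf vectors of documents i and j."""
    return phi[i] @ phi[j] / (np.linalg.norm(phi[i]) * np.linalg.norm(phi[j]))

print(tfidf_kernel(0, 2))
```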

String Kernels
• Real power of kernels: inputs are structured
• Compare two variable-length strings
  • Strings x and x′ of lengths D and D′ defined over an alphabet A
  • E.g., two amino acid sequences defined over the 20-letter alphabet A = {A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V}
  • x is a sequence of length 110
    IPTSALVKETLALLSTHRTLLIANETLRIPVPVHKN……VNQFLDYLQEFLGVMNTEWI
  • x′ is a string of length 153
    PHRRDLCSRSIWLARKIRSDLTALTESYVKHQGLWSELTEAERLQENLQAYRTFHVLLA……..
  • The strings have the substring LQE in common
• We can define the similarity of two strings to be the number of substrings they have in common
  • Definition given next

Mercer Kernel
• If s is a substring of x we can write x = usv for some (possibly empty) strings u and v
• ϕ_s(x) is the number of times substring s appears in x
• Define the kernel between two strings x and x′ as
  k(x,x′) = Σ_{s∈A*} w_s ϕ_s(x) ϕ_s(x′)
  • where w_s ≥ 0 and A* is the set of all strings from alphabet A (the Kleene * operator)
• Can be computed in O(|x| + |x′|) time
• Cases of interest:
  • If w_s = 0 for |s| > 1 we get the bag-of-characters kernel, which defines ϕ(x) as the number of times each character in A occurs in x
  • If we require s to be bordered by white space we get the bag-of-words kernel
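A minimal sketch of the simplest special case, the bag-of-characters kernel (w_s = 1 for single characters, 0 otherwise); the amino-acid alphabet is the one from the slide and the strings are short illustrative fragments:

```python
from collections import Counter

ALPHABET = "ARNDCEQGHILKMFPSTWYV"   # 20-letter amino acid alphabet

def bag_of_chars_kernel(x, x_prime):
    """k(x, x') = sum over single characters s of phi_s(x) * phi_s(x'),
    where phi_s(x) counts occurrences of character s in x."""
    cx, cxp = Counter(x), Counter(x_prime)
    return sum(cx[s] * cxp[s] for s in ALPHABET)

x = "IPTSALVKETLALLSTHRTLLIANETLRIPVPVHKN"
x_prime = "PHRRDLCSRSIWLARKIRSDLTALTESYVKHQGLWSELTEAERLQE"
print(bag_of_chars_kernel(x, x_prime))
```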

Combining Discriminative and Generative Models

• Generative models deal naturally with missing data and, in the case of HMMs, with sequences of varying length
• Discriminative models such as SVMs tend to have better performance on discriminative tasks
• Can use a generative model to define a kernel and then use that kernel in a discriminative approach

Kernels based on Generative Models

• For a model p(x) we define a kernel by
  k(x,x′) = p(x) p(x′)
• A valid kernel, since it is an inner product in the one-dimensional feature space defined by the mapping p(x)
• Two inputs x and x′ are similar if they both have high probabilities

Kernel Functions based on Mixture Densities


• Extension to sums over products of distributions
  k(x,x′) = Σ_i p(x|i) p(x′|i) p(i)
  • where the p(i) are positive weighting coefficients
• A valid kernel, based on two rules of kernel construction:
  k(x,x′) = c·k₁(x,x′) and k(x,x′) = k₁(x,x′) + k₂(x,x′)
• Two inputs x and x′ will give a large value of k, and hence appear similar, if they have significant probability under a range of different components
• Taking the limit of an infinite sum
  k(x,x′) = ∫ p(x|z) p(x′|z) p(z) dz
  • where z is a continuous latent variable
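An illustrative sketch, assuming a small 1-D Gaussian mixture as the generative model (the components and weights are made up): the kernel k(x,x′) = Σ_i p(x|i) p(x′|i) p(i) is large when both inputs are probable under the same components.

```python
import numpy as np
from scipy.stats import norm

# Illustrative 1-D mixture: component means, standard deviations, and weights p(i)
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([0.5, 1.0, 0.7])
weights = np.array([0.3, 0.5, 0.2])

def mixture_kernel(x, x_prime):
    """k(x, x') = sum_i p(x|i) p(x'|i) p(i)."""
    px = norm.pdf(x, means, stds)         # p(x|i) for each component i
    pxp = norm.pdf(x_prime, means, stds)  # p(x'|i) for each component i
    return np.sum(px * pxp * weights)

print(mixture_kernel(0.1, 0.3))   # both near the middle component: relatively large
print(mixture_kernel(-2.0, 3.0))  # probable under different components: small
```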

Kernels for Sequences


• Data consists of ordered sequences of length L
  X = {x₁,..,x_L}
• Generative model for sequences is the HMM
  • Hidden states Z = {z₁,..,z_L}
• Kernel function for measuring the similarity of sequences X and X′ is
  k(X,X′) = Σ_Z p(X|Z) p(X′|Z) p(Z)
• Both observed sequences are generated by the same hidden sequence Z

Fisher Kernel
• An alternative technique for using generative models
  • Used in document retrieval, protein sequences
• Consider a parametric generative model p(x|θ), where θ denotes a vector of parameters
• Goal: find a kernel that measures the similarity of two vectors x and x′ induced by the generative model
• Define the Fisher score as the gradient wrt θ
  g(θ,x) = ∇_θ ln p(x|θ)    (a vector of the same dimensionality as θ; the Fisher score is more generally the gradient of the log-likelihood)
• The Fisher kernel is
  k(x,x′) = g(θ,x)ᵀ F⁻¹ g(θ,x′)
  where F is the Fisher information matrix
  F = E_x[ g(θ,x) g(θ,x)ᵀ ]



Fisher Information Matrix


• The presence of the Fisher information matrix causes the kernel to be invariant under a non-linear re-parametrization of the density model θ → ψ(θ)
• In practice, it is often infeasible to evaluate the Fisher information matrix. Instead use the approximation
  F ≈ (1/N) Σ_{n=1}^N g(θ,x_n) g(θ,x_n)ᵀ
  • This is the covariance matrix of the Fisher scores
  • So the Fisher kernel k(x,x′) = g(θ,x)ᵀ F⁻¹ g(θ,x′) corresponds to a whitening of the Fisher scores
• More simply, omit F and use the non-invariant kernel
  k(x,x′) = g(θ,x)ᵀ g(θ,x′)
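A minimal sketch of the Fisher kernel, assuming the generative model is a 1-D Gaussian with parameters θ = (μ, σ) (this choice of model and data is illustrative, not from the slides): Fisher scores are gradients of ln p(x|θ), F is approximated by the empirical covariance of the scores, and the kernel whitens them.

```python
import numpy as np

mu, sigma = 0.0, 1.0     # parameters theta of an assumed 1-D Gaussian model p(x|theta)

def fisher_score(x):
    """g(theta, x) = gradient of ln p(x|theta) wrt (mu, sigma) for a Gaussian."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma ** 3
    return np.array([d_mu, d_sigma])

# Approximate the Fisher information matrix from a sample of the data
rng = np.random.default_rng(1)
X = rng.normal(mu, sigma, size=500)
G = np.stack([fisher_score(x) for x in X])            # N x dim(theta) matrix of scores
F = G.T @ G / len(X)                                  # F ~ (1/N) sum_n g g^T

def fisher_kernel(x, x_prime):
    """k(x, x') = g(theta, x)^T F^{-1} g(theta, x')."""
    return fisher_score(x) @ np.linalg.solve(F, fisher_score(x_prime))

print(fisher_kernel(0.5, 0.7))
```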

Sigmoidal Kernel

• A link between SVMs and neural networks
  k(x,x′) = tanh(a·xᵀx′ + b)
• Its Gram matrix is in general not positive semidefinite
• But it is used in practice because it gives SVMs a superficial resemblance to neural networks
• A Bayesian neural network with an appropriate prior reduces to a Gaussian process
  • This provides a deeper link between neural networks and kernel methods
