MACHINE LEARNING
Lecture 11 by Nandagopal S A
Assistant Lecturer
Department of M.A.C.S
NIT-K
Regression
INTRODUCTION
▶ The simplest linear model for regression is one that involves a linear combination of the input variables
$y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \cdots + w_D x_D$
where $\mathbf{x} = (x_1, \ldots, x_D)^T$.
▶ This is known as linear regression. The key property of this model is that it is a linear function of the parameters $w_0, \ldots, w_D$.
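A minimal sketch of evaluating this model, assuming NumPy; the weights and inputs below are illustrative, not from the lecture:

```python
import numpy as np

def linear_model(x, w):
    """Evaluate y(x, w) = w0 + w1*x1 + ... + wD*xD.

    x : array of shape (D,)   -- input variables x1..xD
    w : array of shape (D+1,) -- w[0] is the offset w0
    """
    return w[0] + np.dot(w[1:], x)

# Illustrative values for D = 3.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 1.0, -2.0, 0.25])
print(linear_model(x, w))  # 0.5 + 1.0*1 - 2.0*2 + 0.25*3 = -1.75
```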
▶ It is also, however, a linear function of the input variables $x_i$, and this imposes significant limitations on the model. We therefore extend the class of models by considering linear combinations of fixed nonlinear functions of the input variables, of the form
$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x})$
where $\phi_j(\mathbf{x})$ are known as basis functions. By denoting the maximum value of the index $j$ by $M - 1$, the total number of parameters in this model will be $M$.
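A short sketch of this expansion; the polynomial basis $\phi_j(x) = x^j$ is chosen purely for illustration, since the lecture does not fix a particular basis here:

```python
import numpy as np

def basis_model(x, w, phi):
    """Evaluate y(x, w) = w0 + sum_{j=1}^{M-1} w_j * phi_j(x).

    w   : array of shape (M,)
    phi : list of M-1 callables; phi[j-1] implements phi_j
    """
    return w[0] + sum(w[j] * phi[j - 1](x) for j in range(1, len(w)))

# Polynomial basis phi_j(x) = x**j as one illustrative choice (M = 4).
phi = [lambda x, j=j: x ** j for j in range(1, 4)]
w = np.array([1.0, 0.5, -0.25, 0.1])
print(basis_model(2.0, w, phi))  # 1 + 0.5*2 - 0.25*4 + 0.1*8 = 1.8
```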
▶ The parameter $w_0$ allows for any fixed offset in the data and is sometimes called a bias parameter (not to be confused with 'bias' in a statistical sense). It is often convenient to define an additional dummy 'basis function' $\phi_0(\mathbf{x}) = 1$ so that
$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$
▶ where $\mathbf{w} = (w_0, \ldots, w_{M-1})^T$ and $\boldsymbol{\phi} = (\phi_0, \ldots, \phi_{M-1})^T$. In many practical applications, some form of fixed pre-processing or feature extraction is applied to the original data variables, and the resulting features can be represented in terms of the basis functions $\phi_j(\mathbf{x})$.
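The dummy basis function makes the vector form $\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$ direct to implement; a minimal sketch, with the basis functions below assumed for illustration:

```python
import numpy as np

def phi_vector(x, basis_funcs):
    """Build phi(x) = (phi_0(x), ..., phi_{M-1}(x))^T with phi_0(x) = 1."""
    return np.array([1.0] + [f(x) for f in basis_funcs])

# y(x, w) = w^T phi(x); the offset w0 is absorbed by the dummy basis.
basis_funcs = [lambda x: x, lambda x: x ** 2]  # illustrative phi_1, phi_2
w = np.array([1.0, 0.5, -0.25])                # (w0, w1, w2)
print(w @ phi_vector(2.0, basis_funcs))        # 1 + 0.5*2 - 0.25*4 = 1.0
```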
▶ A typical choice is the Gaussian basis function
$\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2s^2}\right)$
where the $\mu_j$ govern the locations of the basis functions in input space, and the parameter $s$ governs their spatial scale.
▶ These are 'Gaussian' basis functions, although it should be noted that they are not required to have a probabilistic interpretation, and in particular the normalization coefficient is unimportant because these basis functions will be multiplied by adaptive parameters $w_j$.
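A minimal sketch of these Gaussian basis functions; the centres and scale below are illustrative assumptions:

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)).

    No normalization coefficient is included, since any constant
    factor would be absorbed by the adaptive parameters w_j.
    """
    return np.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

mus = np.linspace(-1.0, 1.0, 5)     # illustrative basis-function centres
s = 0.5                             # illustrative spatial scale
print(gaussian_basis(0.3, mus, s))  # activation of each basis function
```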
▶ Another possibility is the sigmoidal basis function of the form
$\phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)$
where $\sigma(a)$ is the logistic sigmoid function defined by
$\sigma(a) = \frac{1}{1 + \exp(-a)}$
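The same sketch adapted to the sigmoidal basis, again with assumed centres and scale:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s)."""
    return sigmoid((x - mu) / s)

mus = np.linspace(-1.0, 1.0, 5)     # illustrative centres
print(sigmoid_basis(0.3, mus, 0.5))
```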
▶ Equivalently, we can use the 'tanh' function because this is related to the logistic sigmoid by $\tanh(a) = 2\sigma(2a) - 1$, and so a general linear combination of logistic sigmoid functions is equivalent to a general linear combination of 'tanh' functions; see the numerical check sketched below.
▶ Yet another possible choice of basis function is the Fourier basis, which leads to an expansion in sinusoidal functions.
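A one-line numerical check of the tanh-sigmoid identity, assuming NumPy:

```python
import numpy as np

a = np.linspace(-3.0, 3.0, 13)
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

# tanh(a) = 2*sigma(2a) - 1, so the two basis families span the same models.
print(np.allclose(np.tanh(a), 2.0 * sigma(2.0 * a) - 1.0))  # True
```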