Spline Models: Introduction to CS and NCS, Regression Splines, Smoothing Splines

- Smoothing splines fit a regression spline model with a knot placed at each data point, rather than selecting knots. This avoids knot selection but leads to an overparameterized model.
- To address overfitting, a roughness penalty is added to the residual sum of squares that penalizes the curvature of the spline. This leads to a natural cubic spline (NCS) solution with knots only at the data points.
- The smoothing parameter controls the tradeoff between the residual sum of squares and the roughness penalty, with larger values favoring smoother fits. It is often chosen by cross-validation.


Spline Models

• Introduction to CS and NCS

• Regression splines

• Smoothing splines

Cubic Splines

• Knots: $a < \xi_1 < \xi_2 < \cdots < \xi_m < b$.

• A function g defined on [a, b] is a cubic spline w.r.t. knots $\{\xi_i\}_{i=1}^m$ if:

  1) g is a cubic polynomial in each of the m + 1 intervals,
     $$g(x) = d_i x^3 + c_i x^2 + b_i x + a_i, \qquad x \in [\xi_i, \xi_{i+1}],$$
     where $i = 0:m$, $\xi_0 = a$, and $\xi_{m+1} = b$;

  2) g is continuous up to the 2nd derivative: since g is automatically smooth at any point inside an interval (it is a polynomial there), it suffices to check at the knots that
     $$g^{(k)}(\xi_i^+) = g^{(k)}(\xi_i^-), \qquad k = 0, 1, 2, \quad i = 1:m.$$

(From now on, $x \in \mathbb{R}$ is one-dimensional.)

• How many free parameters do we need to represent g? Answer: m + 4.

  We need 4 parameters $(d_i, c_i, b_i, a_i)$ for each of the m + 1 intervals, but we also have 3 constraints at each of the m knots, so
  $$4(m + 1) - 3m = m + 4.$$

Suppose the knots $\{\xi_i\}_{i=1}^m$ are given.

If $g_1(x)$ and $g_2(x)$ are two cubic splines, so is $a_1 g_1(x) + a_2 g_2(x)$, where $a_1$ and $a_2$ are two constants.

That is, for a set of given knots, the corresponding cubic splines form a linear space (of functions) of dimension m + 4.

• A set of basis functions for cubic splines (w.r.t. knots $\{\xi_i\}_{i=1}^m$) is given by
  $$h_0(x) = 1; \quad h_1(x) = x; \quad h_2(x) = x^2; \quad h_3(x) = x^3;$$
  $$h_{i+3}(x) = (x - \xi_i)_+^3, \qquad i = 1, 2, \ldots, m.$$

• That is, any cubic spline f(x) can be uniquely expressed as
  $$f(x) = \beta_0 + \sum_{j=1}^{m+3} \beta_j h_j(x).$$

• Of course, there are many other choices of the basis functions. For example, R uses the B-spline basis functions.
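A minimal R sketch (simulated x; the knot values 0.3 and 0.7 are arbitrary choices for illustration) that builds the truncated power basis above and confirms it spans an (m + 4)-dimensional space:

  x <- seq(0, 1, length.out = 50)
  knots <- c(0.3, 0.7)                                      # m = 2 hypothetical knots
  Fmat <- cbind(1, x, x^2, x^3,                             # h0, h1, h2, h3
                sapply(knots, function(k) pmax(x - k, 0)^3))  # h4, h5
  dim(Fmat)       # 50 x 6, i.e. p = m + 4 = 6 columns
  qr(Fmat)$rank   # 6: the columns are linearly independent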

Natural Cubic Splines (NCS)

• A cubic spline on [a, b] is a NCS if its second and third derivatives are zero
at a and b.

• That is, a NCS is linear in the two extreme intervals $[a, \xi_1]$ and $[\xi_m, b]$. Note that the linear functions in the two extreme intervals are totally determined by their neighboring intervals.

• The degrees of freedom of NCSs with m knots is m: the four natural boundary constraints remove 4 from the m + 4.

• For a curve estimation problem with data $(x_i, y_i)_{i=1}^n$, if we put n knots at the n data points (assumed to be unique), then we obtain a smooth curve (using NCS) passing through all the y's.
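The last point can be checked directly in R: splinefun with method = "natural" fits exactly such an interpolating NCS (a quick sketch on simulated data):

  set.seed(1)
  x <- sort(runif(8)); y <- rnorm(8)
  f <- splinefun(x, y, method = "natural")  # NCS with knots at all 8 x's
  all.equal(f(x), y)                        # TRUE: the curve passes through every y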

Regression Splines

• A basis expansion approach:
  $$g(x) = \beta_1 h_1(x) + \beta_2 h_2(x) + \cdots + \beta_p h_p(x),$$
  where p = m + 4 for regression with cubic splines and p = m for NCS.

• Represent the model on the observed n data points using matrix notation,
  $$\hat\beta = \arg\min_\beta \|y - F\beta\|^2,$$
  where
  $$\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{n \times 1} = \underbrace{\begin{pmatrix} h_1(x_1) & h_2(x_1) & \cdots & h_p(x_1) \\ h_1(x_2) & h_2(x_2) & \cdots & h_p(x_2) \\ \vdots & \vdots & & \vdots \\ h_1(x_n) & h_2(x_n) & \cdots & h_p(x_n) \end{pmatrix}}_{n \times p} \underbrace{\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}}_{p \times 1}$$

• We can obtain the design matrix F by the commands bs or ns in R, and then call the regression function lm.

• Use K-fold CV to select the number of knots.
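A minimal sketch of this pipeline on simulated data (the two knots follow the quantile convention discussed below):

  library(splines)
  set.seed(1)
  x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
  fit <- lm(y ~ bs(x, knots = quantile(x, c(1/3, 2/3))))  # LS on the spline basis
  plot(x, y, col = "gray")
  lines(sort(x), fitted(fit)[order(x)], lwd = 2)          # fitted regression spline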

Understand how R counts the degrees of freedom (df).

• To generate a cubic spline basis for a given set of $x_i$'s, you can use the command bs.

• You can tell R the location of knots.

• Or you can tell R the df. Recall that a cubic spline with m knots has m + 4 df, so we need m = df − 4 knots. By default, R puts knots at the $1/(m+1), \ldots, m/(m+1)$ quantiles of $x_{1:n}$.

How R counts the df is a little confusing. The df argument of the command bs actually means the number of columns of the design matrix returned by bs. So if the intercept is not included in the design matrix (which is the default), then the df in the command bs is equal to the real df minus 1.

So the following three design matrices (the first two are $n \times 5$ and the last one is $n \times 6$) correspond to the same regression model with cubic splines of df 6.

> bs(x, knots=quantile(x, c(1/3, 2/3)))
> bs(x, df=5)
> bs(x, df=6, intercept=TRUE)
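You can verify the column counts directly (here x is any numeric vector, e.g. the simulated one above):

> ncol(bs(x, knots=quantile(x, c(1/3, 2/3))))   # 5
> ncol(bs(x, df=5))                             # 5
> ncol(bs(x, df=6, intercept=TRUE))             # 6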

• To generate a NCS basis for a given set of $x_i$'s, use the command ns.

• Recall that the linear functions in the two extreme intervals are totally determined by the other pieces of the cubic spline. So data points in the two extreme intervals (i.e., outside the two boundary knots) would be wasted, since they would not affect the fitting. Therefore, by default, R puts the two boundary knots at the min and max of the $x_i$'s.

• You can tell R the location of knots, which are the interior knots. Recall that a NCS with m knots has m df. So the df is equal to the number of (interior) knots plus 2, where the 2 counts the two boundary knots.

• Or you can tell R the df. If intercept = TRUE, then we need m = df − 2 knots; otherwise we need m = df − 1 knots. Again, by default, R puts knots at the $1/(m+1), \ldots, m/(m+1)$ quantiles of $x_{1:n}$.

• The following three design matrices (the first two are $n \times 3$ and the last one is $n \times 4$) correspond to the same regression model with NCS of df 4.

> ns(x, knots=quantile(x, c(1/3, 2/3)))
> ns(x, df=3)
> ns(x, df=4, intercept=TRUE)
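The same check for ns:

> ncol(ns(x, knots=quantile(x, c(1/3, 2/3))))   # 3
> ncol(ns(x, df=3))                             # 3
> ncol(ns(x, df=4, intercept=TRUE))             # 4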

Choice of Knots

• Location of knots: to simplify this problem, we ignore the selection of locations; by default, the knots are located at the quantiles of the $x_i$'s.

• Number of knots: can be formulated as a variable selection problem (an easier version, since there are just p models, not $2^p$).

• AIC/BIC/$R^2_{\mathrm{adj}}$

• m-fold CV (cross-validation)
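A hedged sketch of 10-fold CV over the df, and hence over the number of knots (simulated data; the fold assignment and the candidate grid 4:12 are arbitrary choices; bs may warn when held-out x's fall outside the training range):

  library(splines)
  set.seed(2)
  x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
  folds <- sample(rep(1:10, length.out = length(x)))
  cv_err <- sapply(4:12, function(d) {
    mean(sapply(1:10, function(k) {
      fit <- lm(y ~ bs(x, df = d), subset = folds != k)
      pred <- predict(fit, newdata = data.frame(x = x[folds == k]))
      mean((y[folds == k] - pred)^2)      # held-out MSE for fold k
    }))
  })
  (4:12)[which.min(cv_err)]               # df with the smallest CV error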

Summary: Regression Splines

• Use LS to fit a spline model: specify the df p (not the polynomial degree, but the df of the spline, which is related to the number of knots), and then fit a regression model with a design matrix of p columns (including the intercept).

• How to do it in R?

• How to select the number/location of knots?

Smoothing Splines

• In Regression Splines (let’s use NCS), we need to choose the number and
the location of knots.

• What's a Smoothing Spline? Start with an easy but "horrible" solution: put knots at all the observed data points $(x_1, \ldots, x_n)$:
  $$y_{n \times 1} = F_{n \times n}\, \beta_{n \times 1}.$$

  Instead of selecting knots, let's do ridge-type shrinkage ($\Omega$ will be defined later):
  $$\min_\beta \Big[ \|y - F\beta\|^2 + \lambda\, \beta^t \Omega\, \beta \Big],$$
  where the tuning parameter $\lambda$ is often chosen by CV or GCV.
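In R this entire construction is packaged in stats::smooth.spline; a minimal sketch (simulated data; all.knots = TRUE forces one knot per unique x, matching the setup above, since by default R thins the knots for large n):

  set.seed(1)
  x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
  fit <- smooth.spline(x, y, all.knots = TRUE)  # a knot at every unique x
  fit$lambda                                    # lambda chosen by GCV (the default)
  fit$df                                        # effective degrees of freedom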

• Next we'll see how smoothing splines are derived from a different perspective.

Roughness Penalty Approach

• Let S[a, b] be the space of all “smooth” functions defined on [a, b].

• Among all the functions in S[a, b], look for the minimizer of the following penalized residual sum of squares:
  $$RSS(g, \lambda) = \sum_{i=1}^n [y_i - g(x_i)]^2 + \lambda \int_a^b [g''(x)]^2\, dx, \qquad (1)$$
  where $\lambda$ is a smoothing parameter.

• Theorem. $\hat g = \arg\min_g RSS(g, \lambda)$ is a NCS with knots at the n data points $x_1, \ldots, x_n$ (assumed distinct: $x_i \neq x_j$).

(WLOG, assume $n \geq 2$.) Let g be a function on [a, b] and $\tilde g$ be a NCS with $g(x_i) = \tilde g(x_i)$, $i = 1:n$. (Does such a $\tilde g$ exist? Yes: a NCS with n knots has n df, so it can interpolate any n values.)

Then
$$\int \big(g''\big)^2 \geq \int \big(\tilde g''\big)^2 \qquad (*)$$
with equality only if $\tilde g \equiv g$.

PROOF: Let $h(x) = g(x) - \tilde g(x)$. So $h(x_i) = 0$ for $i = 1, \ldots, n$. Then $(*)$ holds true because
$$\int g''^2 = \int \tilde g''^2 + \int h''^2 + 2 \underbrace{\int \tilde g''\, h''}_{=0}.$$
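Why the underbraced term is zero (a step the slide leaves implicit): integrate by parts, use the NCS boundary conditions $\tilde g''(a) = \tilde g''(b) = 0$, note that $\tilde g'''$ is constant on each interval between successive knots (and zero outside $[x_1, x_n]$), and use $h(x_j) = 0$ at every knot:
$$\int_a^b \tilde g''\, h''\, dx = \big[\tilde g''\, h'\big]_a^b - \int_a^b \tilde g'''\, h'\, dx = -\sum_{j=1}^{n-1} \tilde g'''(x_j^+)\,\big\{h(x_{j+1}) - h(x_j)\big\} = 0.$$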

Smoothing Splines

Write $g(x) = \sum_{i=1}^n \beta_i h_i(x)$, where the $h_i$'s are basis functions for NCS with knots at $x_1, \ldots, x_n$. Then
$$\sum_{i=1}^n [y_i - g(x_i)]^2 = (y - F\beta)^t (y - F\beta),$$
where $F_{n \times n}$ has entries $F_{ij} = h_j(x_i)$, and
$$\int_a^b \big[g''(x)\big]^2 dx = \int \Big[\sum_i \beta_i h_i''(x)\Big]^2 dx = \sum_{i,j} \beta_i \beta_j \int h_i''(x)\, h_j''(x)\, dx = \beta^t \Omega\, \beta,$$
where $\Omega_{n \times n}$ has entries $\Omega_{ij} = \int_a^b h_i''(x)\, h_j''(x)\, dx$.

So
$$RSS(\beta, \lambda) = (y - F\beta)^t (y - F\beta) + \lambda\, \beta^t \Omega\, \beta,$$
and the solution is
$$\hat\beta = \arg\min_\beta RSS(\beta, \lambda) = (F^t F + \lambda \Omega)^{-1} F^t y.$$

• Demmler & Reinsch (1975): a basis with a double orthogonality property, i.e.,
  $$F^t F = I, \qquad \Omega = \mathrm{diag}(d_i),$$
  where $d_1 = d_2 = 0$ (Why?).

• Using this basis, we have
  $$\hat\beta = (F^t F + \lambda \Omega)^{-1} F^t y = \big(I + \lambda\, \mathrm{diag}(d_i)\big)^{-1} F^t y,$$
  i.e.,
  $$\hat\beta_i = \frac{1}{1 + \lambda d_i}\, \hat\beta_i^{(LS)}.$$

• Smoother matrix $S_\lambda$:
  $$\hat y = F \hat\beta = F (F^t F + \lambda \Omega)^{-1} F^t y = S_\lambda\, y.$$

• Using the D&R basis,
  $$S_\lambda = F\, \mathrm{diag}\Big(\frac{1}{1 + \lambda d_i}\Big)\, F^t.$$
  So the columns of F are the eigenvectors of $S_\lambda$, and they do not depend on $\lambda$.

• Effective df of a smoothing spline:
  $$df(\lambda) = \mathrm{tr}\, S_\lambda = \sum_{i=1}^n \frac{1}{1 + \lambda d_i}.$$
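Going the other direction in R: smooth.spline accepts a df argument and solves $\mathrm{tr}\, S_\lambda = df$ for $\lambda$ (a sketch, reusing the simulated x, y from the earlier sketch):

> fit10 <- smooth.spline(x, y, df = 10)   # finds lambda with tr(S_lambda) = 10
> fit10$lambda                            # the implied smoothing parameter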

Choice of λ

• Leave-one-out CV:
  $$CV(\lambda) = \frac{1}{n} \sum_{i=1}^n \big[y_i - \hat g^{[-i]}(x_i)\big]^2 = \frac{1}{n} \sum_{i=1}^n \Big(\frac{y_i - \hat g(x_i)}{1 - S_\lambda(i, i)}\Big)^2.$$

• Generalized CV:
  $$GCV(\lambda) = \frac{1}{n} \sum_{i=1}^n \Big(\frac{y_i - \hat g(x_i)}{1 - \frac{1}{n} \mathrm{tr}\, S_\lambda}\Big)^2.$$
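The shortcut formula can be checked numerically: smooth.spline returns the diagonal of $S_\lambda$ as the lev component, and cv = TRUE makes it select $\lambda$ by leave-one-out CV instead of GCV. A sketch assuming unique x's (reusing the simulated data from above); the computed score should match fit$cv.crit up to weighting details:

  fit <- smooth.spline(x, y, all.knots = TRUE, cv = TRUE)  # lambda by LOO-CV
  res <- fit$yin - fit$y            # y_i - g_hat(x_i) at the (unique) x's
  mean((res / (1 - fit$lev))^2)     # CV(lambda) without refitting n times
  fit$cv.crit                       # the criterion reported by R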

Summary: Smoothing Splines

• Start with a model with the maximum complexity: a NCS with knots at the n (unique) x points.

• Fit a Ridge Regression model on the data. If we parameterize the NCS function space by the D&R basis, then the design matrix is orthogonal and each coefficient is penalized differently: no penalty for the two linear basis functions, and higher penalty for wigglier basis functions.

• How to do it in R?

• How to select the tuning parameter $\lambda$, or equivalently the df?

• What if we have collected two obs at the same location x?

Weighted Smoothing Splines

Suppose the first two obs have the same x value, i.e., $(x_1, y_1), (x_2, y_2)$, where $x_1 = x_2$.

Then
$$\big[y_1 - g(x_1)\big]^2 + \big[y_2 - g(x_1)\big]^2 = \sum_{i=1}^{2} \Big[y_i - \frac{y_1 + y_2}{2} + \frac{y_1 + y_2}{2} - g(x_1)\Big]^2$$
$$= \Big(y_1 - \frac{y_1 + y_2}{2}\Big)^2 + \Big(y_2 - \frac{y_1 + y_2}{2}\Big)^2 + 2\Big[\frac{y_1 + y_2}{2} - g(x_1)\Big]^2,$$
where the cross terms vanish because the two deviations from the average sum to zero.

So we can replace the first two obs by a single one, $(x_1, \frac{y_1 + y_2}{2})$, with weight 2, while the weights for the other obs are 1.
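A sketch of the collapsed, weighted fit in R via smooth.spline's w argument (the duplicated point here is artificial):

  set.seed(3)
  x2 <- c(0.5, 0.5, runif(48))              # first two obs share x = 0.5
  y2 <- rnorm(50)
  xw <- c(0.5, x2[-(1:2)])                  # collapse the pair into one point...
  yw <- c(mean(y2[1:2]), y2[-(1:2)])        # ...at the average response...
  fit_w <- smooth.spline(xw, yw, w = c(2, rep(1, 48)))  # ...with weight 2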

