Spline Models: Introduction to CS and NCS, Regression Splines, Smoothing Splines
• Introduction to cubic splines (CS) and natural cubic splines (NCS)
• Regression splines
• Smoothing splines
Cubic Splines
• How many free parameters do we need to represent g? Answer: m + 4. A cubic spline with m knots consists of m + 1 cubic pieces with 4 coefficients each, subject to 3 continuity constraints (on the function and its first two derivatives) at each of the m knots: 4(m + 1) − 3m = m + 4.
Suppose the knots $\{\xi_i\}_{i=1}^m$ are given.
If $g_1(x)$ and $g_2(x)$ are two cubic splines, so is $a_1 g_1(x) + a_2 g_2(x)$, where $a_1$ and $a_2$ are two constants.
That is, for a set of given knots, the corresponding cubic splines form a linear space (of functions) of dimension $m + 4$.
• A set of basis functions for cubic splines (w.r.t. knots $\{\xi_i\}_{i=1}^m$) is given by
$$h_0(x) = 1, \quad h_1(x) = x, \quad h_2(x) = x^2, \quad h_3(x) = x^3, \quad h_{3+i}(x) = (x - \xi_i)_+^3, \ i = 1, \dots, m,$$
so that any cubic spline can be written as
$$f(x) = \beta_0 + \sum_{j=1}^{m+3} \beta_j h_j(x).$$
• Of course, there are many other choices of basis functions. For example, R uses the B-spline basis functions; see the sketch below.
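As a quick illustration (a minimal sketch with simulated data and arbitrarily chosen knots, not from the slides), the truncated power basis above and the B-spline basis returned by bs() span the same space of cubic splines, so a least-squares fit gives the same fitted values with either design matrix:

    library(splines)
    set.seed(1)
    x <- sort(runif(100)); y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
    knots <- c(0.25, 0.5, 0.75)                       # m = 3 knots, chosen arbitrarily

    # truncated power basis: 1, x, x^2, x^3, (x - xi_i)_+^3  ->  m + 4 = 7 columns
    F1 <- cbind(1, x, x^2, x^3, sapply(knots, function(k) pmax(x - k, 0)^3))
    fit1 <- lm.fit(F1, y)

    # B-spline basis for the same knots (a different basis of the same space)
    fit2 <- lm(y ~ bs(x, knots = knots))
    all.equal(unname(fit1$fitted.values), unname(fitted(fit2)))  # should be TRUE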
Natural Cubic Splines (NCS)
• A cubic spline on $[a, b]$ is an NCS if its second and third derivatives are zero at $a$ and $b$.
• That is, an NCS is linear in the two extreme intervals $[a, \xi_1]$ and $[\xi_m, b]$. Note that the linear functions in the two extreme intervals are totally determined by their neighboring intervals.
• For a curve estimation problem with data $(x_i, y_i)_{i=1}^n$, if we put $n$ knots at the $n$ data points (assumed to be unique), then we obtain a smooth curve (using NCS) passing through all the $y_i$'s.
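In R this can be seen with the base function spline(), which with method = "natural" computes exactly this NCS interpolant; the data below are simulated just for illustration:

    set.seed(1)
    x <- sort(runif(10)); y <- rnorm(10)

    # natural cubic spline interpolation: knots at all (unique) x's
    fit <- spline(x, y, method = "natural", xout = x)
    all.equal(fit$y, y)   # should be TRUE: the curve passes through every y_i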
Regression Splines
• Represent the model on the observed $n$ data points using matrix notation, and estimate $\beta$ by least squares:
$$\hat{\beta} = \arg\min_{\beta} \, \| y - F\beta \|^2,$$
where
$$\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{n \times 1}
= \underbrace{\begin{pmatrix} h_1(x_1) & h_2(x_1) & \cdots & h_p(x_1) \\ h_1(x_2) & h_2(x_2) & \cdots & h_p(x_2) \\ \vdots & \vdots & & \vdots \\ h_1(x_n) & h_2(x_n) & \cdots & h_p(x_n) \end{pmatrix}}_{n \times p}
\underbrace{\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}}_{p \times 1}$$
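This is ordinary least squares with a spline design matrix. A minimal sketch (simulated data; the choice df = 5 is arbitrary) comparing the closed-form solution with lm():

    library(splines)
    set.seed(1)
    x <- sort(runif(100)); y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

    Fmat <- cbind(1, bs(x, df = 5))                      # the n x p design matrix F (p = 6)
    beta_hat <- solve(t(Fmat) %*% Fmat, t(Fmat) %*% y)   # beta-hat = (F^t F)^{-1} F^t y
    fit <- lm(y ~ bs(x, df = 5))                         # same model via lm()
    all.equal(unname(beta_hat[, 1]), unname(coef(fit)))  # should be TRUE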
Understand how R counts the degrees of freedom.
• To generate a cubic spline basis for a given set of $x_i$'s, you can use the command bs.
• Or you can tell R the df. Recall that a cubic spline with $m$ knots has $m + 4$ df, so we need $m = \mathrm{df} - 4$ knots. By default, R puts knots at the $1/(m+1), \dots, m/(m+1)$ quantiles of $x_{1:n}$.
How R counts the df is a little confusing. The df argument in the command bs actually means the number of columns of the design matrix returned by bs. So if the intercept is not included in the design matrix (which is the default), then the df in the command bs is equal to the real df minus 1.
So the following three design matrices (the first two are $n \times 5$ and the last one is $n \times 6$) correspond to the same regression model with cubic splines of df 6; see the sketch below.
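A sketch of those three calls (the data are simulated; df 6 means m = 2 interior knots, which by default sit at the 1/3 and 2/3 quantiles):

    library(splines)
    set.seed(1)
    x <- sort(runif(100)); y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

    B1 <- bs(x, df = 5)                              # n x 5: df argument = real df 6 minus 1
    B2 <- bs(x, knots = quantile(x, c(1/3, 2/3)))    # n x 5: the 2 interior knots given explicitly
    B3 <- bs(x, df = 6, intercept = TRUE)            # n x 6: intercept column included
    c(ncol(B1), ncol(B2), ncol(B3))                  # 5 5 6

    # all three give the same fitted curve once lm() supplies (or not) its own intercept
    all.equal(unname(fitted(lm(y ~ B1))), unname(fitted(lm(y ~ B3 - 1))))  # should be TRUE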
• To generate an NCS basis for a given set of $x_i$'s, use the command ns.
• Recall that the linear functions in the two extreme intervals are totally determined by their neighboring intervals. So data points in the two extreme intervals (i.e., outside the two boundary knots) are wasted, since they do not affect the fitting. Therefore, by default, R puts the two boundary knots at the min and max of the $x_i$'s.
• You can tell R the location of the knots, which are the interior knots. Recall that an NCS with $m$ knots has $m$ df. So the df is equal to the number of (interior) knots plus 2, where 2 counts the two boundary knots.
• Or you can tell R the df. If intercept = TRUE, then we need $m = \mathrm{df} - 2$ knots; otherwise we need $m = \mathrm{df} - 1$ knots. Again, by default, R puts knots at the $1/(m+1), \dots, m/(m+1)$ quantiles of $x_{1:n}$.
• The following three design matrices (the first two are $n \times 3$ and the last one is $n \times 4$) correspond to the same regression model with NCS of df 4; see the sketch below.
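The analogous sketch for ns() (simulated data; df 4 means 2 interior knots plus the 2 boundary knots):

    library(splines)
    set.seed(1)
    x <- sort(runif(100))

    N1 <- ns(x, df = 3)                              # n x 3: df argument = real df 4 minus 1
    N2 <- ns(x, knots = quantile(x, c(1/3, 2/3)))    # n x 3: the 2 interior knots given explicitly
    N3 <- ns(x, df = 4, intercept = TRUE)            # n x 4: intercept column included
    c(ncol(N1), ncol(N2), ncol(N3))                  # 3 3 4

All three span the same function space, so with lm() (which adds its own intercept to N1 and N2) they give the same fitted regression.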
Choice of Knots
• AIC/BIC/$R^2_{\mathrm{adj}}$
• m-fold CV (cross-validation)
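A minimal sketch of the AIC/BIC route (simulated data; the grid of candidate df values is arbitrary): fit an NCS regression for each df and keep the one with the smallest criterion; m-fold CV would instead refit on training folds and score on held-out folds.

    library(splines)
    set.seed(1)
    x <- sort(runif(200)); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

    dfs <- 4:10
    aic <- sapply(dfs, function(d) AIC(lm(y ~ ns(x, df = d))))
    bic <- sapply(dfs, function(d) BIC(lm(y ~ ns(x, df = d))))
    dfs[which.min(aic)]   # df selected by AIC
    dfs[which.min(bic)]   # df selected by BIC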
Summary: Regression Splines
• Use LS to fit a spline model: specify the df $p$, and then fit a regression model with a design matrix of $p$ columns (including the intercept).
• How to do it in R?
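One way to do it (a sketch; the data and df are illustrative): pass bs() or ns() inside the lm() formula, which adds the intercept column, and use predict() for the fitted curve.

    library(splines)
    set.seed(1)
    x <- sort(runif(100)); y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

    fit_cs  <- lm(y ~ bs(x, df = 5))   # cubic regression spline, p = 6 columns with intercept
    fit_ncs <- lm(y ~ ns(x, df = 5))   # natural cubic spline,    p = 6 columns with intercept
    xnew <- seq(min(x), max(x), length.out = 200)
    pred <- predict(fit_ncs, newdata = data.frame(x = xnew))
    plot(x, y); lines(xnew, pred, col = "blue")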
Smoothing Splines
• In regression splines (let's use NCS), we need to choose the number and the location of the knots.
• Next we'll see how smoothing splines are derived from a different perspective.
Roughness Penalty Approach
• Let S[a, b] be the space of all “smooth” functions defined on [a, b].
• Among all the functions in S[a, b], look for the minimizer of the following
penalized residual sum of squares
$$\mathrm{RSS}(g, \lambda) = \sum_{i=1}^{n} [y_i - g(x_i)]^2 + \lambda \int_a^b [g''(x)]^2\, dx. \qquad (1)$$
(WLOG, assume $n \ge 2$.) Let $g$ be a smooth function on $[a, b]$ and $\tilde{g}$ be an NCS with knots at $x_1, \dots, x_n$ that agrees with $g$ there, i.e., $\tilde{g}(x_i) = g(x_i)$ for all $i$. Then
$$\int_a^b [g''(x)]^2\, dx \ \ge\ \int_a^b [\tilde{g}''(x)]^2\, dx. \qquad (*)$$
Smoothing Splines
Write $g(x) = \sum_{i=1}^{n} \beta_i h_i(x)$, where the $h_i$'s are basis functions for NCS with knots at $x_1, \dots, x_n$. Then
$$\sum_{i=1}^{n} [y_i - g(x_i)]^2 = (y - F\beta)^t (y - F\beta), \qquad \int_a^b [g''(x)]^2\, dx = \beta^t \Omega \beta,$$
where $\Omega$ is the $n \times n$ matrix with $\Omega_{ij} = \int_a^b h_i''(x)\, h_j''(x)\, dx$.
So
$$\mathrm{RSS}(\beta, \lambda) = (y - F\beta)^t (y - F\beta) + \lambda\, \beta^t \Omega \beta,$$
which is minimized by
$$\hat{\beta}_\lambda = (F^t F + \lambda \Omega)^{-1} F^t y.$$
• Demmler & Reinsch (1975): a basis with double orthogonality property, i.e.
Ft F = I, ⌦ = diag(di ),
where d1 = d2 = 0 (Why?).
ˆ = (Ft F + ⌦) 1
Ft y
1
= (I + diag(di )) Ft y,
i.e.,
ˆi = 1 ˆ(LS) .
1 + di i
• Smoother matrix $S_\lambda$:
$$\hat{y} = F \hat{\beta}_\lambda = F (F^t F + \lambda \Omega)^{-1} F^t y = S_\lambda\, y.$$
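A sketch of these formulas in code, assuming the basis matrix F and the penalty matrix Omega have already been constructed (both are hypothetical inputs here; the slides do not build them explicitly):

    # Fmat: n x n NCS basis evaluated at the x's; Omega: n x n penalty matrix;
    # y: response vector; lambda: penalty parameter -- all assumed given.
    fit_smoothing_spline <- function(Fmat, Omega, y, lambda) {
      A <- t(Fmat) %*% Fmat + lambda * Omega
      beta_hat <- solve(A, t(Fmat) %*% y)    # (F^t F + lambda Omega)^{-1} F^t y
      S <- Fmat %*% solve(A) %*% t(Fmat)     # smoother matrix S_lambda
      list(beta = beta_hat,
           fitted = Fmat %*% beta_hat,       # y-hat = S_lambda y
           edf = sum(diag(S)))               # effective df = tr(S_lambda)
    }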
Choice of $\lambda$
• Leave-one-out CV:
$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \big[y_i - \hat{g}_\lambda^{[-i]}(x_i)\big]^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{g}_\lambda(x_i)}{1 - S_\lambda(i, i)} \right)^2.$$
• Generalized CV:
$$\mathrm{GCV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{g}_\lambda(x_i)}{1 - \frac{1}{n}\operatorname{tr} S_\lambda} \right)^2.$$
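In R, smooth.spline() can select lambda by either criterion (GCV is the default; cv = TRUE switches to leave-one-out CV). A small sketch with simulated data:

    set.seed(1)
    x <- sort(runif(200)); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

    fit_gcv <- smooth.spline(x, y)              # lambda chosen by GCV (default)
    fit_cv  <- smooth.spline(x, y, cv = TRUE)   # lambda chosen by leave-one-out CV
    c(gcv_df = fit_gcv$df, cv_df = fit_cv$df)   # effective df selected by each criterion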
Summary: Smoothing Splines
• Start with the model of maximum complexity: an NCS with knots at the $n$ (unique) $x$ points.
• How to do it in R?
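One way to do it (a sketch with simulated data): smooth.spline() fits the penalized criterion (1); you can let it pick lambda, or fix the effective df yourself, and then predict on new x values. (Note that for large n, smooth.spline by default uses a reduced set of knots rather than all n unique x's.)

    set.seed(1)
    x <- sort(runif(100)); y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

    fit1 <- smooth.spline(x, y)           # lambda selected automatically
    fit2 <- smooth.spline(x, y, df = 8)   # or fix the effective degrees of freedom
    xnew <- seq(min(x), max(x), length.out = 200)
    pr <- predict(fit2, xnew)             # list with components x and y
    plot(x, y); lines(pr, col = "blue")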
Weighted Smoothing Splines
Suppose the first two observations have the same x value, i.e., $x_1 = x_2$. Then
$$\big[y_1 - g(x_1)\big]^2 + \big[y_2 - g(x_1)\big]^2 = \sum_{i=1}^{2} \Big[y_i - \frac{y_1 + y_2}{2} + \frac{y_1 + y_2}{2} - g(x_1)\Big]^2$$
$$= \Big[y_1 - \frac{y_1 + y_2}{2}\Big]^2 + \Big[y_2 - \frac{y_1 + y_2}{2}\Big]^2 + 2 \Big[\frac{y_1 + y_2}{2} - g(x_1)\Big]^2$$
(the cross terms cancel because the two deviations from the average sum to zero).