Spline Models: Introduction to CS and NCS, Regression Splines, Smoothing Splines

- Smoothing splines fit a regression spline model with a knot placed at each data point, rather than selecting knots. This avoids knot selection but leads to an overparameterized model.
- To address overfitting, a roughness penalty is added to the residual sum of squares that penalizes the curvature of the spline. This leads to a natural cubic spline (NCS) solution with knots only at the data points.
- The smoothing parameter controls the tradeoff between the residual sum of squares and the roughness penalty, with larger values favoring smoother fits. It is often chosen by cross-validation.


Spline Models

• Introduction to CS and NCS

• Regression splines

• Smoothing splines

Cubic Splines

• Knots: $a < \xi_1 < \xi_2 < \cdots < \xi_m < b$.

• A function g defined on [a, b] is a cubic spline w.r.t. knots $\{\xi_i\}_{i=1}^m$ if:

  1) g is a cubic polynomial in each of the m + 1 intervals,
     $$g(x) = d_i x^3 + c_i x^2 + b_i x + a_i, \qquad x \in [\xi_i, \xi_{i+1}],$$
     where $i = 0:m$, $\xi_0 = a$, and $\xi_{m+1} = b$;

  2) g is continuous up to the 2nd derivative: since g is automatically smooth at any point inside an interval (it is a polynomial there), it suffices to check at the knots that
     $$g^{(k)}(\xi_i^+) = g^{(k)}(\xi_i^-), \qquad k = 0, 1, 2, \quad i = 1:m.$$

(From now on, $x \in \mathbb{R}$ is one-dimensional.)

• How many free parameters do we need to represent g? Answer: m + 4.

  We need 4 parameters $(d_i, c_i, b_i, a_i)$ for each of the m + 1 intervals, but we also have 3 constraints at each of the m knots, so
  $$4(m + 1) - 3m = m + 4.$$

Suppose the knots $\{\xi_i\}_{i=1}^m$ are given.

If $g_1(x)$ and $g_2(x)$ are two cubic splines, so is $a_1 g_1(x) + a_2 g_2(x)$, where $a_1$ and $a_2$ are two constants.

That is, for a set of given knots, the corresponding cubic splines form a linear space (of functions) of dimension m + 4.

• A set of basis functions for cubic splines (w.r.t. knots $\{\xi_i\}_{i=1}^m$) is given by
  $$h_0(x) = 1; \quad h_1(x) = x; \quad h_2(x) = x^2; \quad h_3(x) = x^3;$$
  $$h_{i+3}(x) = (x - \xi_i)_+^3, \qquad i = 1, 2, \ldots, m.$$

• That is, any cubic spline f(x) can be uniquely expressed as
  $$f(x) = \beta_0 + \sum_{j=1}^{m+3} \beta_j h_j(x).$$

• Of course, there are many other choices of the basis functions. For example, R uses the B-spline basis functions.
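A minimal R sketch (simulated x; the knot values 0.3 and 0.7 are arbitrary choices for illustration) that builds the truncated power basis above and confirms it spans an (m + 4)-dimensional space:

  x <- seq(0, 1, length.out = 50)
  knots <- c(0.3, 0.7)                                      # m = 2 hypothetical knots
  Fmat <- cbind(1, x, x^2, x^3,                             # h0, h1, h2, h3
                sapply(knots, function(k) pmax(x - k, 0)^3))  # h4, h5
  dim(Fmat)       # 50 x 6, i.e. p = m + 4 = 6 columns
  qr(Fmat)$rank   # 6: the columns are linearly independent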

Natural Cubic Splines (NCS)

• A cubic spline on [a, b] is a NCS if its second and third derivatives are zero
at a and b.

• That is, a NCS is linear in the two extreme intervals $[a, \xi_1]$ and $[\xi_m, b]$. Note that the linear functions in the two extreme intervals are totally determined by their neighboring intervals.

• The degrees of freedom of NCSs with m knots is m: the four natural boundary constraints remove 4 from the m + 4.

• For a curve estimation problem with data $(x_i, y_i)_{i=1}^n$, if we put n knots at the n data points (assumed to be unique), then we obtain a smooth curve (using NCS) passing through all the y's.
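The last point can be checked directly in R: splinefun with method = "natural" fits exactly such an interpolating NCS (a quick sketch on simulated data):

  set.seed(1)
  x <- sort(runif(8)); y <- rnorm(8)
  f <- splinefun(x, y, method = "natural")  # NCS with knots at all 8 x's
  all.equal(f(x), y)                        # TRUE: the curve passes through every y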

Regression Splines

• A basis expansion approach:
  $$g(x) = \beta_1 h_1(x) + \beta_2 h_2(x) + \cdots + \beta_p h_p(x),$$
  where p = m + 4 for regression with cubic splines and p = m for NCS.

• Represent the model on the observed n data points using matrix notation,
  $$\hat\beta = \arg\min_\beta \|y - F\beta\|^2,$$
  where
  $$\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{n \times 1} = \underbrace{\begin{pmatrix} h_1(x_1) & h_2(x_1) & \cdots & h_p(x_1) \\ h_1(x_2) & h_2(x_2) & \cdots & h_p(x_2) \\ \vdots & \vdots & & \vdots \\ h_1(x_n) & h_2(x_n) & \cdots & h_p(x_n) \end{pmatrix}}_{n \times p} \underbrace{\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}}_{p \times 1}$$

• We can obtain the design matrix F by the commands bs or ns in R, and then call the regression function lm.

• Use K-fold CV to select the number of knots.
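A minimal sketch of this pipeline on simulated data (the two knots follow the quantile convention discussed below):

  library(splines)
  set.seed(1)
  x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
  fit <- lm(y ~ bs(x, knots = quantile(x, c(1/3, 2/3))))  # LS on the spline basis
  plot(x, y, col = "gray")
  lines(sort(x), fitted(fit)[order(x)], lwd = 2)          # fitted regression spline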

Understand how R counts the degrees of freedom (df).

• To generate a cubic spline basis for a given set of $x_i$'s, you can use the command bs.

• You can tell R the location of knots.

• Or you can tell R the df. Recall that a cubic spline with m knots has m + 4 df, so we need m = df − 4 knots. By default, R puts knots at the $1/(m+1), \ldots, m/(m+1)$ quantiles of $x_{1:n}$.

How R counts the df is a little confusing. The df argument of the command bs actually means the number of columns of the design matrix returned by bs. So if the intercept is not included in the design matrix (which is the default), then the df in the command bs is equal to the real df minus 1.

So the following three design matrices (the first two are $n \times 5$ and the last one is $n \times 6$) correspond to the same regression model with cubic splines of df 6.

> bs(x, knots=quantile(x, c(1/3, 2/3)))
> bs(x, df=5)
> bs(x, df=6, intercept=TRUE)
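You can verify the column counts directly (here x is any numeric vector, e.g. the simulated one above):

> ncol(bs(x, knots=quantile(x, c(1/3, 2/3))))   # 5
> ncol(bs(x, df=5))                             # 5
> ncol(bs(x, df=6, intercept=TRUE))             # 6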

• To generate a NCS basis for a given set of $x_i$'s, use the command ns.

• Recall that the linear functions in the two extreme intervals are totally determined by the other pieces of the cubic spline. So data points in the two extreme intervals (i.e., outside the two boundary knots) would be wasted, since they would not affect the fitting. Therefore, by default, R puts the two boundary knots at the min and max of the $x_i$'s.

• You can tell R the location of knots, which are the interior knots. Recall that a NCS with m knots has m df. So the df is equal to the number of (interior) knots plus 2, where the 2 counts the two boundary knots.

• Or you can tell R the df. If intercept = TRUE, then we need m = df − 2 knots; otherwise we need m = df − 1 knots. Again, by default, R puts knots at the $1/(m+1), \ldots, m/(m+1)$ quantiles of $x_{1:n}$.

• The following three design matrices (the first two are $n \times 3$ and the last one is $n \times 4$) correspond to the same regression model with NCS of df 4.

> ns(x, knots=quantile(x, c(1/3, 2/3)))
> ns(x, df=3)
> ns(x, df=4, intercept=TRUE)
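The same check for ns:

> ncol(ns(x, knots=quantile(x, c(1/3, 2/3))))   # 3
> ncol(ns(x, df=3))                             # 3
> ncol(ns(x, df=4, intercept=TRUE))             # 4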

Choice of Knots

• Location of knots: to simplify this problem, we ignore the selection of locations; by default, the knots are located at the quantiles of the $x_i$'s.

• Number of knots: can be formulated as a variable selection problem (an easier version, since there are just p models, not $2^p$).

• AIC/BIC/$R^2_{\mathrm{adj}}$

• m-fold CV (cross-validation)
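A hedged sketch of 10-fold CV over the df, and hence over the number of knots (simulated data; the fold assignment and the candidate grid 4:12 are arbitrary choices; bs may warn when held-out x's fall outside the training range):

  library(splines)
  set.seed(2)
  x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
  folds <- sample(rep(1:10, length.out = length(x)))
  cv_err <- sapply(4:12, function(d) {
    mean(sapply(1:10, function(k) {
      fit <- lm(y ~ bs(x, df = d), subset = folds != k)
      pred <- predict(fit, newdata = data.frame(x = x[folds == k]))
      mean((y[folds == k] - pred)^2)      # held-out MSE for fold k
    }))
  })
  (4:12)[which.min(cv_err)]               # df with the smallest CV error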

Summary: Regression Splines

• Use LS to fit a spline model: specify the df p (not the polynomial degree, but the df of the spline, which is related to the number of knots), and then fit a regression model with a design matrix of p columns (including the intercept).

• How to do it in R?

• How to select the number/location of knots?

Smoothing Splines

• In Regression Splines (let’s use NCS), we need to choose the number and
the location of knots.

• What's a Smoothing Spline? Start with an easy but "horrible" solution: put knots at all the observed data points $(x_1, \ldots, x_n)$:
  $$y_{n \times 1} = F_{n \times n}\, \beta_{n \times 1}.$$

  Instead of selecting knots, let's do ridge-type shrinkage ($\Omega$ will be defined later):
  $$\min_\beta \Big[ \|y - F\beta\|^2 + \lambda\, \beta^t \Omega\, \beta \Big],$$
  where the tuning parameter $\lambda$ is often chosen by CV or GCV.
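In R this entire construction is packaged in stats::smooth.spline; a minimal sketch (simulated data; all.knots = TRUE forces one knot per unique x, matching the setup above, since by default R thins the knots for large n):

  set.seed(1)
  x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
  fit <- smooth.spline(x, y, all.knots = TRUE)  # a knot at every unique x
  fit$lambda                                    # lambda chosen by GCV (the default)
  fit$df                                        # effective degrees of freedom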

• Next we'll see how smoothing splines are derived from a different perspective.

Roughness Penalty Approach

• Let S[a, b] be the space of all “smooth” functions defined on [a, b].

• Among all the functions in S[a, b], look for the minimizer of the following penalized residual sum of squares:
  $$RSS(g, \lambda) = \sum_{i=1}^n [y_i - g(x_i)]^2 + \lambda \int_a^b [g''(x)]^2\, dx, \qquad (1)$$
  where $\lambda$ is a smoothing parameter.

• Theorem. $\hat g = \arg\min_g RSS(g, \lambda)$ is a NCS with knots at the n data points $x_1, \ldots, x_n$ (assumed distinct: $x_i \neq x_j$).

(WLOG, assume $n \geq 2$.) Let g be a function on [a, b] and $\tilde g$ be a NCS with $g(x_i) = \tilde g(x_i)$, $i = 1:n$. (Does such a $\tilde g$ exist? Yes: a NCS with n knots has n df, so it can interpolate any n values.)

Then
$$\int \big(g''\big)^2 \geq \int \big(\tilde g''\big)^2 \qquad (*)$$
with equality only if $\tilde g \equiv g$.

PROOF: Let $h(x) = g(x) - \tilde g(x)$. So $h(x_i) = 0$ for $i = 1, \ldots, n$. Then $(*)$ holds true because
$$\int g''^2 = \int \tilde g''^2 + \int h''^2 + 2 \underbrace{\int \tilde g''\, h''}_{=0}.$$
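Why the underbraced term is zero (a step the slide leaves implicit): integrate by parts, use the NCS boundary conditions $\tilde g''(a) = \tilde g''(b) = 0$, note that $\tilde g'''$ is constant on each interval between successive knots (and zero outside $[x_1, x_n]$), and use $h(x_j) = 0$ at every knot:
$$\int_a^b \tilde g''\, h''\, dx = \big[\tilde g''\, h'\big]_a^b - \int_a^b \tilde g'''\, h'\, dx = -\sum_{j=1}^{n-1} \tilde g'''(x_j^+)\,\big\{h(x_{j+1}) - h(x_j)\big\} = 0.$$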

Smoothing Splines

Write $g(x) = \sum_{i=1}^n \beta_i h_i(x)$, where the $h_i$'s are basis functions for NCS with knots at $x_1, \ldots, x_n$. Then
$$\sum_{i=1}^n [y_i - g(x_i)]^2 = (y - F\beta)^t (y - F\beta),$$
where $F_{n \times n}$ has entries $F_{ij} = h_j(x_i)$, and
$$\int_a^b \big[g''(x)\big]^2 dx = \int \Big[\sum_i \beta_i h_i''(x)\Big]^2 dx = \sum_{i,j} \beta_i \beta_j \int h_i''(x)\, h_j''(x)\, dx = \beta^t \Omega\, \beta,$$
where $\Omega_{n \times n}$ has entries $\Omega_{ij} = \int_a^b h_i''(x)\, h_j''(x)\, dx$.

So
$$RSS(\beta, \lambda) = (y - F\beta)^t (y - F\beta) + \lambda\, \beta^t \Omega\, \beta,$$
and the solution is
$$\hat\beta = \arg\min_\beta RSS(\beta, \lambda) = (F^t F + \lambda \Omega)^{-1} F^t y.$$

• Demmler & Reinsch (1975): a basis with a double orthogonality property, i.e.,
  $$F^t F = I, \qquad \Omega = \mathrm{diag}(d_i),$$
  where $d_1 = d_2 = 0$ (Why?).

• Using this basis, we have
  $$\hat\beta = (F^t F + \lambda \Omega)^{-1} F^t y = \big(I + \lambda\, \mathrm{diag}(d_i)\big)^{-1} F^t y,$$
  i.e.,
  $$\hat\beta_i = \frac{1}{1 + \lambda d_i}\, \hat\beta_i^{(LS)}.$$

• Smoother matrix $S_\lambda$:
  $$\hat y = F \hat\beta = F (F^t F + \lambda \Omega)^{-1} F^t y = S_\lambda\, y.$$

• Using the D&R basis,
  $$S_\lambda = F\, \mathrm{diag}\Big(\frac{1}{1 + \lambda d_i}\Big)\, F^t.$$
  So the columns of F are the eigenvectors of $S_\lambda$, and they do not depend on $\lambda$.

• Effective df of a smoothing spline:
  $$df(\lambda) = \mathrm{tr}\, S_\lambda = \sum_{i=1}^n \frac{1}{1 + \lambda d_i}.$$
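Going the other direction in R: smooth.spline accepts a df argument and solves $\mathrm{tr}\, S_\lambda = df$ for $\lambda$ (a sketch, reusing the simulated x, y from the earlier sketch):

> fit10 <- smooth.spline(x, y, df = 10)   # finds lambda with tr(S_lambda) = 10
> fit10$lambda                            # the implied smoothing parameter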

Choice of λ

• Leave-one-out CV:
  $$CV(\lambda) = \frac{1}{n} \sum_{i=1}^n \big[y_i - \hat g^{[-i]}(x_i)\big]^2 = \frac{1}{n} \sum_{i=1}^n \Big(\frac{y_i - \hat g(x_i)}{1 - S_\lambda(i, i)}\Big)^2.$$

• Generalized CV:
  $$GCV(\lambda) = \frac{1}{n} \sum_{i=1}^n \Big(\frac{y_i - \hat g(x_i)}{1 - \frac{1}{n} \mathrm{tr}\, S_\lambda}\Big)^2.$$
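The shortcut formula can be checked numerically: smooth.spline returns the diagonal of $S_\lambda$ as the lev component, and cv = TRUE makes it select $\lambda$ by leave-one-out CV instead of GCV. A sketch assuming unique x's (reusing the simulated data from above); the computed score should match fit$cv.crit up to weighting details:

  fit <- smooth.spline(x, y, all.knots = TRUE, cv = TRUE)  # lambda by LOO-CV
  res <- fit$yin - fit$y            # y_i - g_hat(x_i) at the (unique) x's
  mean((res / (1 - fit$lev))^2)     # CV(lambda) without refitting n times
  fit$cv.crit                       # the criterion reported by R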

Summary: Smoothing Splines

• Start with a model with the maximum complexity: a NCS with knots at the n (unique) x points.

• Fit a Ridge Regression model on the data. If we parameterize the NCS function space by the D&R basis, then the design matrix is orthogonal and each coefficient is penalized differently: no penalty for the two linear basis functions, and higher penalty for wigglier basis functions.

• How to do it in R?

• How to select the tuning parameter $\lambda$, or equivalently the df?

• What if we have collected two obs at the same location x?

Weighted Smoothing Splines

Suppose the first two obs have the same x value, i.e., $(x_1, y_1), (x_2, y_2)$, where $x_1 = x_2$.

Then
$$\big[y_1 - g(x_1)\big]^2 + \big[y_2 - g(x_1)\big]^2 = \sum_{i=1}^{2} \Big[y_i - \frac{y_1 + y_2}{2} + \frac{y_1 + y_2}{2} - g(x_1)\Big]^2$$
$$= \Big(y_1 - \frac{y_1 + y_2}{2}\Big)^2 + \Big(y_2 - \frac{y_1 + y_2}{2}\Big)^2 + 2\Big[\frac{y_1 + y_2}{2} - g(x_1)\Big]^2,$$
where the cross terms vanish because the two deviations from the average sum to zero.

So we can replace the first two obs by a single one, $(x_1, \frac{y_1 + y_2}{2})$, with weight 2, while the weights for the other obs are 1.
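A sketch of the collapsed, weighted fit in R via smooth.spline's w argument (the duplicated point here is artificial):

  set.seed(3)
  x2 <- c(0.5, 0.5, runif(48))              # first two obs share x = 0.5
  y2 <- rnorm(50)
  xw <- c(0.5, x2[-(1:2)])                  # collapse the pair into one point...
  yw <- c(mean(y2[1:2]), y2[-(1:2)])        # ...at the average response...
  fit_w <- smooth.spline(xw, yw, w = c(2, rep(1, 48)))  # ...with weight 2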

