
STAT154 Modern Statistical Prediction and Machine Learning

p-values, F statistics, logistic regression,


Lecturer: Song Mei. GSI: Ruiqi Zhang. Assignment 2 - Due on 09/22/2024

Homework submissions are expected to be in PDF format produced by LaTeX.


For exercises with multiple sub-questions, please also annotate the sub-question id in your solution.
Questions solely for students enrolled in 254 are marked in the question titles. Students enrolled in 154 may
ignore these questions.
For coding exercises, you should use Python. For submission of coding exercises: report the results and
figures produced by the simulations, and also paste the source code.

Theoretical Exercises
Q1 (Converting ⋆-values to p-values)
Let X be a random variable (on R) which has a density p(x). Assume that p(x) > 0 for any x ∈ R. Let
F1(s) = P(X ≤ s) and F2(s) = P(X ≥ s). Show that F1(X) ∼ Unif([0, 1]) and F2(X) ∼ Unif([0, 1]).
(Please avoid using the confusing notation P(X ≤ X).)
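(Optional numerical sanity check, not a proof: for one concrete choice of distribution, one can simulate F1(X) and F2(X) and test them for uniformity. The sketch below assumes X ∼ N(0, 1) and uses SciPy's norm.cdf/norm.sf and a Kolmogorov-Smirnov test; the sample size is arbitrary.)

import numpy as np
from scipy.stats import norm, kstest

# Simulate X ~ N(0, 1) and evaluate F1(X) = P(X <= s) and F2(X) = P(X >= s) at s = X
x = np.random.default_rng(0).normal(size=100_000)
f1 = norm.cdf(x)   # F1(X)
f2 = norm.sf(x)    # F2(X)

# Both samples should be indistinguishable from Unif([0, 1])
print(kstest(f1, "uniform"))
print(kstest(f2, "uniform"))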

Q2 (Deriving the log-likelihood function of logistic regression with label Y ∈ {−1, 1})
Let (xi, yi)i∈[n] ∼iid (X, Y) ∈ R^d × {−1, 1} (this is different from the {0, 1} label in class), where
Pβ(Y = 1|X) = exp(⟨X, β⟩)/(1 + exp(⟨X, β⟩)) and Pβ(Y = −1|X) = 1 − Pβ(Y = 1|X). Let the likelihood
function of the dataset be Ln(β). Please write down and simplify log Ln(β). Calculate the expression of the
gradient ∇β[log Ln(β)] and the Hessian ∇²β[log Ln(β)].
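(Optional sanity check: the likelihood can be evaluated numerically directly from the conditional probabilities stated above, and whatever gradient expression you derive can be compared against a finite-difference approximation. A minimal sketch on synthetic data, assuming only NumPy; the data, beta_true, and helper names are made up for illustration.)

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
p1 = 1.0 / (1.0 + np.exp(-X @ beta_true))        # P(Y = 1 | X)
y = np.where(rng.uniform(size=n) < p1, 1, -1)    # labels in {-1, 1}

def log_likelihood(beta):
    # log Ln(beta) = sum_i log P_beta(Y = y_i | x_i), using the probabilities given in the problem
    eta = X @ beta
    p_plus = np.exp(eta) / (1.0 + np.exp(eta))
    p = np.where(y == 1, p_plus, 1.0 - p_plus)
    return np.sum(np.log(p))

def numerical_grad(beta, h=1e-6):
    # central finite differences, to compare against the closed-form gradient you derive
    g = np.zeros_like(beta)
    for j in range(len(beta)):
        e = np.zeros_like(beta)
        e[j] = h
        g[j] = (log_likelihood(beta + e) - log_likelihood(beta - e)) / (2 * h)
    return g

beta0 = np.zeros(d)
print(log_likelihood(beta0))
print(numerical_grad(beta0))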

Q3 (Projection matrix 1)
Let P_1, P_2 ∈ R^{n×n} be two projection matrices (so that P_i = P_i^T and P_i^2 = P_i) with P_1 P_2 = 0. Let the rank
of P_i be r_i (so that we must have r_1 + r_2 ≤ n). Let D_1 = diag(1_{r_1}^T, 0_{n−r_1}^T), D_2 = diag(0_{r_1}^T, 1_{r_2}^T, 0_{n−r_1−r_2}^T) ∈
R^{n×n} be two diagonal matrices with diagonal elements in {0, 1}. We would like to show that there exists
an orthogonal matrix U ∈ R^{n×n} such that P_1 = U D_1 U^T and P_2 = U D_2 U^T (i.e., P_1 and P_2 are
simultaneously diagonalizable). To show this, one can proceed in the following way.

1. Show that there exist orthogonal matrices V_1, V_2 ∈ R^{n×n} such that P_1 = V_1 D_1 V_1^T and P_2 = V_2 D_2 V_2^T
(hint: use the property that the eigenvalues of projection matrices are either 0 or 1).
2. Let U_1 ∈ R^{n×r_1} be the submatrix of V_1 ∈ R^{n×n} obtained by selecting the first r_1 columns of V_1. Let U_2 ∈ R^{n×r_2}
be the submatrix of V_2 ∈ R^{n×n} obtained by selecting columns r_1 + 1 through r_1 + r_2 of V_2. Show that P_1 = U_1 U_1^T
and P_2 = U_2 U_2^T.
3. Show that U_1^T U_2 = 0_{r_1×r_2} (use the properties P_1 P_2 = 0 and U_i^T U_i = I_{r_i}).
4. Show that there exists U_3 ∈ R^{n×(n−r_1−r_2)} such that, if we define U = [U_1, U_2, U_3] ∈ R^{n×n}, then U
is an orthogonal matrix.
5. Show that P_1 = U D_1 U^T and P_2 = U D_2 U^T. This U is then the matrix we would like to find.

Q4 (Projection matrix 2)
Let X ∈ R^{n×d} with n ≥ d and assume that X has full column rank. Define P = X(X^T X)^{−1} X^T ∈ R^{n×n}.
Let T ⊆ {1, 2, . . . , d} with |T| = t and let X_T ∈ R^{n×t} be the submatrix of X by selecting columns with
indices in T. Let P_T = X_T(X_T^T X_T)^{−1} X_T^T ∈ R^{n×n}. Define P_1 = I_n − P and P_2 = P − P_T. We would
like to show that P_1 and P_2 are projection matrices, rank(P_1) = n − d, rank(P_2) = d − t, and P_1 P_2 = 0.
To show this, one can proceed in the following way.

1. Show that P = P^T, P_T = P_T^T, P^2 = P, and P_T^2 = P_T, so that P and P_T are projection matrices.

2. Show that P_1 = P_1^T and P_1^2 = P_1, so that P_1 is a projection matrix with rank n − d.

3. Show that P X = X so that P X_T = X_T.

4. Show that P P_T = P_T P = P_T.

5. Show that P_2 = P_2^T and P_2^2 = P_2, so that P_2 is a projection matrix with rank d − t.

6. Show that P_1 P_2 = 0 using the properties above.

Q5 (Showing that the F statistic follows the F distribution)


Let (x_i, y_i)_{i∈[n]} ⊆ R^d × R. Let X = [x_1, . . . , x_n]^T ∈ R^{n×d}, and y = (y_1, . . . , y_n)^T ∈ R^n. Assume that
n ≥ d and X has full column rank. Let S ⊆ {1, 2, . . . , d} and S^c = {1, 2, . . . , d} \ S with |S^c| = d_0. Let
X_{S^c} ∈ R^{n×d_0} be the submatrix of X by selecting columns with indices in S^c.
Assume the null hypothesis to be true, so that y_i = ⟨β_{S^c}, x_{i,S^c}⟩ + ε_i, ε_i ∼iid N(0, σ²) (in matrix
form, we have y = X_{S^c} β_{S^c} + ε).
Define RSS_1 = min_{β′ ∈ R^d} ∥y − Xβ′∥_2^2 and RSS_0 = min_{β′_{S^c} ∈ R^{d_0}} ∥y − X_{S^c} β′_{S^c}∥_2^2. We would like to show
that RSS_0 − RSS_1 ∼ σ² · χ²(d − d_0), RSS_1 ∼ σ² · χ²(n − d), and RSS_0 − RSS_1 is independent of RSS_1 (so
that F = [(RSS_0 − RSS_1)/(d − d_0)]/[RSS_1/(n − d)] follows the F distribution).
To show this, one can proceed in the following way.

1. Show that RSS_1 = ε^T(I_n − X(X^T X)^{−1} X^T)ε and RSS_0 = ε^T(I_n − X_{S^c}(X_{S^c}^T X_{S^c})^{−1} X_{S^c}^T)ε, where
ε = (ε_1, . . . , ε_n)^T ∈ R^n.

2. Define P_1 = I_n − X(X^T X)^{−1} X^T and P_2 = X(X^T X)^{−1} X^T − X_{S^c}(X_{S^c}^T X_{S^c})^{−1} X_{S^c}^T; then RSS_1 =
ε^T P_1 ε and RSS_0 − RSS_1 = ε^T P_2 ε. Use Q4 to show that P_1 and P_2 are projection matrices with ranks
n − d and d − d_0 respectively, and that P_1 P_2 = 0.

3. By Q3, there exists an orthogonal matrix U ∈ R^{n×n} such that P_1 = U D_1 U^T and P_2 = U D_2 U^T,
where D_1 = diag(1_{n−d}^T, 0_d^T), D_2 = diag(0_{n−d}^T, 1_{d−d_0}^T, 0_{d_0}^T) ∈ R^{n×n}. Use this to show that

(a) ε^T P_1 ε ∼ σ² χ²(n − d), ε^T P_2 ε ∼ σ² χ²(d − d_0).
(b) ε^T P_1 ε is independent of ε^T P_2 ε (hint: if ε̄ ∼ N(0, I_n), then the coordinates of ε̄ are independent).

Q6 (Fisher information matrix. Question for 254)


Assume Z ∼ p_θ(z) for θ ∈ R^d, for general d ∈ N. The Fisher information matrix is defined as I(θ) =
−E_{Z∼p_θ}[∇²_θ log p_θ(Z)]. Show that
1. I(θ) = E_{Z∼p_θ}[∇_θ log p_θ(Z) ∇_θ log p_θ(Z)^T].
2. Let p_θ(Z) be the Gaussian distribution N(θ, Σ) for Σ ∈ R^{d×d}; give the expression of I(θ).
3. Let p_λ(Z) = (λ^Z/Z!) e^{−λ} be the Poisson distribution (that is, taking θ = λ ∈ R); give the expression of I(λ).

Computational Exercise: Two algorithms to find the least squares solution
Recall that the least squares solution β̂ = arg min_b ∥Xb − y∥_2^2 is given by β̂ = (X^⊤ X)^{−1} X^⊤ y. Generally,
the most numerically taxing component of computing β̂ is computing the inverse (X^⊤ X)^{−1}. This inverse can
fail to exist (if X is not full column rank), be numerically unstable (e.g. if X is "poorly conditioned"),
or be slow to compute (if p is large). In this exercise, we will walk through two alternative algorithms that
can be used to solve the least squares problem in these situations.
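(As a quick illustration of the "poorly conditioned" case, the condition number of X^⊤X can be inspected directly; the toy design matrix below is made up for illustration and assumes only NumPy.)

import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

# A huge condition number means that forming (X^T X)^{-1} explicitly is numerically unreliable
print(np.linalg.cond(X.T @ X))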

Q1
In this problem, we will use the Boston dataset, which we used during the lab, to test the algorithms. In
Python the dataset can be loaded using the sklearn library. Please install sklearn by running "pip install
scikit-learn", and then load the dataset and standardize the features using the following code:

from sklearn.datasets import fetch_openml
boston = fetch_openml(name="Boston", version=1, as_frame=True)
X_raw = boston.data
y_raw = boston.target

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.set_output(transform="pandas")
X_raw = scaler.fit_transform(X_raw)

After running this code, you will get a dataframe "X_raw" with 506 rows and 13 columns, and "y_raw" with
506 rows and 1 column.
Consider a model regressing median home value on six of the available features:

medv = β0 + β1 crim + β2 dis + β3 indus + β4 lstat + β5 tax + β6 rm + ε,

where ε ∼ N(0, σ²). Compute β̂ = (X^⊤ X)^{−1} X^⊤ y, and evaluate the RSS ∥y − X β̂∥_2^2 (remember to append
a column of ones to your data matrix to account for the intercept term).
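(Before applying this to the Boston data, here is a minimal sketch of the computation on synthetic data; the feature matrix and coefficients are made up, and np.linalg.solve is used in place of forming the inverse explicitly.)

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X_feat = rng.normal(size=(n, p))
y = 1.0 + X_feat @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), X_feat])      # append a column of ones for the intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X^T X) beta = X^T y
rss = np.sum((y - X @ beta_hat) ** 2)
print(beta_hat, rss)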

Q2: The QR decomposition and backsubstitution


The QR decomposition of a matrix A ∈ R^{n×p}, where n ≥ p, expresses A = QR, where Q ∈ R^{n×p} has
orthonormal columns (i.e. Q^⊤ Q = I) and R ∈ R^{p×p} is an upper triangular matrix. Furthermore, recall that
setting the gradient ∇_β ∥Xβ − y∥_2^2 = 0 leads to the so-called normal equations:

X ⊤ Xβ = X ⊤ y.

Rather than multiplying either side by the inverse of X ⊤ X, if we plug in the QR decomposition of X, we
obtain

X^⊤ Xβ = X^⊤ y ⇐⇒ R^⊤ Q^⊤ Q Rβ = R^⊤ Q^⊤ y ⇐⇒ Rβ = Q^⊤ y,

where the last step uses Q^⊤ Q = I.

The problem of solving equations of the form Ru = v for u, where R is a triangular matrix, can be done
efficiently using an algorithm called backward substitution. The idea of backward substitution is simple:
suppose we had a system of equations:

2x + y + z = 3 (1)
y − 2z = 1 (2)
2z = 4. (3)

3
Algorithm 1 Backward substitution: find vector u such that Ru = v.
Require: Upper triangular matrix R ∈ Rp×p , vector v ∈ Rp .
Initialize u = 0p (the all zeros vector in Rp ).
for i = p, . . . , 1 do
t ← vi
for j > i do
t ← t − Rij · uj
end for
ui ← t/Rii
end for
return u

A natural way to solve this would be to first solve for z in eqn (3), then plug this into eqn (2) to find y, and
finally plug these both into eqn (1) to find x. This is simple precisely because this system is triangular. The
general algorithm is given in Algorithm 1.
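(For instance, the small triangular system in equations (1)-(3) above can be checked against the pre-implemented routine mentioned in the note below; a quick sketch, assuming SciPy is installed.)

import numpy as np
from scipy.linalg import solve_triangular

# The upper triangular system from equations (1)-(3)
R = np.array([[2.0, 1.0,  1.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0,  2.0]])
v = np.array([3.0, 1.0, 4.0])

u = solve_triangular(R, v, lower=False)  # backward substitution
print(u)   # expect x = -2, y = 5, z = 2  ->  [-2.  5.  2.]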
Use the QR decomposition together with the backward substitution algorithm to find the least squares
estimator β̂_qr, compute the RSS ∥y − X β̂_qr∥_2^2, and compare them to the results you obtained in Q1.
Note: the QR decomposition can be done using the built-in function numpy.linalg.qr() in Python. You
should attempt to implement the backward substitution algorithm yourself, but if you get stuck you can get
some credit if you use a pre-implemented version: in Python, this algorithm is implemented in the function
scipy.linalg.solve_triangular(R, v); in R, it is implemented as backsolve(R, v).

Q3: Gradient descent

Algorithm 2 Gradient descent for linear regression


Require: Data X, y, step-size α > 0, tolerance for gradient norm ϵ.
Randomly initialize β
while ∥X ⊤ Xβ − X ⊤ y∥2 > ϵ do
β ← β − 2α(X ⊤ Xβ − X ⊤ y)
end while
return β

Another method, which can be used to solve a wide variety of optimization problems, is the gradient descent
algorithm, which finds a minimizer x = arg min_{x′} f(x′) of a differentiable function f. The
algorithm starts at a random initialization x^{(0)}, and iteratively updates x using the iteration

x^{(t+1)} = x^{(t)} − α · ∇_x f(x^{(t)})

until we reach some convergence criterion. Here α > 0 is a step-size parameter which needs to be chosen
before running the algorithm. In the context of linear regression, we can use this method to find the solution
β̂ by applying it to the function f(β) = ∥Xβ − y∥_2^2. In particular, for linear regression, the gradient is given
by

∇_β f(β) = 2X^⊤ Xβ − 2X^⊤ y.

The update is repeated until, for example, the norm of the gradient ∥∇_β f(β)∥_2 is sufficiently small. The full
algorithm is given in Algorithm 2.
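(As an illustration of the update rule, not a solution for the Boston data, here is a minimal sketch of Algorithm 2 on synthetic data; the step size and tolerance are made-up values for this toy problem.)

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

alpha, eps = 1e-3, 1e-8          # illustrative step size and gradient-norm tolerance
beta = rng.normal(size=p)        # random initialization
grad = 2 * (X.T @ X @ beta - X.T @ y)
while np.linalg.norm(grad) > eps:
    beta = beta - alpha * grad               # beta <- beta - alpha * gradient
    grad = 2 * (X.T @ X @ beta - X.T @ y)    # recompute the gradient
print(beta)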
Implement and run the gradient descent algorithm to approximately find the least squares coefficients β̂_gd.
Also compute the RSS ∥y − X β̂_gd∥_2^2, and compare your answer with the ones you obtained in the previous two
parts. (Note: you may have to experiment with several values of the step size α, though we recommend starting
with a very small value, e.g. α = 0.0000001, and adjusting it up or down by monitoring the decay of the loss
function or the size of the gradient.)
