Worksheet For Quiz
“Spectral Clustering, PCA, and Linear Regression”
PRML – CS5691 (Jul–Nov 2023)
October 27, 2023
1. (5 marks) Consider the undirected, weighted graph given below. (a) Write down the Laplacian matrix L for this graph; (b) write down the largest set of orthonormal eigenvectors of this graph Laplacian with eigenvalue 0; and (c) redo part (b) for a graph where the edge (3,4) is removed.
Solution: (a)
$$L = \begin{bmatrix}
100 & -100 & 0 & 0 & 0 & 0 \\
-100 & 150 & -50 & 0 & 0 & 0 \\
0 & -50 & 150 & -100 & 0 & 0 \\
0 & 0 & -100 & 110 & -10 & 0 \\
0 & 0 & 0 & -10 & 110 & -100 \\
0 & 0 & 0 & 0 & -100 & 100
\end{bmatrix}$$
(b) The set comprising a single vector, the all-ones vector divided by $\sqrt{6}$, is the solution. Eigenvalue 0 has multiplicity 1 (the number of connected components in the graph), so there is no orthonormal set with more than one eigenvector for eigenvalue 0.
(c) Eigenvalue 0 now has multiplicity 2. The two corresponding eigenvectors are the indicator vectors of the two components in the modified graph (each normalized by the square root of the corresponding component's size), i.e., the vectors $\frac{1}{\sqrt{3}}[1\ 1\ 1\ 0\ 0\ 0]^T$ and $\frac{1}{\sqrt{3}}[0\ 0\ 0\ 1\ 1\ 1]^T$.
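For a quick numerical check, here is a minimal sketch assuming numpy (edge weights are read off the Laplacian above; nodes are 0-indexed and variable names are illustrative):

import numpy as np

# Weighted adjacency matrix; edge weights read off the Laplacian above.
W = np.zeros((6, 6))
for (i, j), w in {(0, 1): 100, (1, 2): 50, (2, 3): 100, (3, 4): 10, (4, 5): 100}.items():
    W[i, j] = W[j, i] = w

L = np.diag(W.sum(axis=1)) - W          # Laplacian L = D - W
vals, vecs = np.linalg.eigh(L)
print(np.round(vals, 6))                # exactly one zero eigenvalue (connected graph)
print(vecs[:, 0])                       # proportional to the all-ones vector / sqrt(6)

W[2, 3] = W[3, 2] = 0                   # part (c): remove edge (3,4)
L2 = np.diag(W.sum(axis=1)) - W
vals2, vecs2 = np.linalg.eigh(L2)
print(np.round(vals2, 6))               # eigenvalue 0 now has multiplicity 2
# vecs2[:, :2] spans the same space as the two normalized component indicator vectors.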
2. (5 marks) Consider the following data matrix, representing four sample points $x_n \in \mathbb{R}^2$:
$$X = \begin{bmatrix} 4 & 1 \\ 2 & 3 \\ 5 & 4 \\ 1 & 0 \end{bmatrix}$$
Use principal component analysis (PCA) to represent the above data in only one direction. Report PC1 of the dataset, the PC1-based representation of the last datapoint $x_4 = [1\ 0]^T$, and the reconstruction error of $x_4$.
Solution:
1. Mean: $\mu = [3\ 2]^T$.
2. Center the data: $X'$ has rows $x_n - \mu$.
3. Covariance (scatter) matrix: $(X')^T X' = \begin{bmatrix} 10 & 6 \\ 6 & 10 \end{bmatrix}$.
4. Eigenvalues: $\lambda = 16, 4$.
5. The eigenvector corresponding to the largest eigenvalue (16) is $PC1 = [\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}]^T$.
Project $x'_4 = [-2\ {-2}]^T$ onto this PC1, and add the mean vector back to find the PC1-based reconstruction/representation of $x_4$. Also calculate the resulting reconstruction error of this datapoint.
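A minimal sketch of these steps, assuming numpy (the unnormalized scatter matrix is used; it has the same eigenvectors as the covariance matrix):

import numpy as np

X = np.array([[4, 1], [2, 3], [5, 4], [1, 0]], dtype=float)
mu = X.mean(axis=0)                     # [3, 2]
Xc = X - mu                             # centered data

S = Xc.T @ Xc                           # [[10, 6], [6, 10]]
vals, vecs = np.linalg.eigh(S)          # eigenvalues 4, 16 (ascending)
pc1 = vecs[:, -1]                       # [1/sqrt(2), 1/sqrt(2)] up to sign

x4c = Xc[-1]                            # [-2, -2]
x4_recon = mu + (x4c @ pc1) * pc1       # PC1-based representation of x_4
err = np.linalg.norm(X[-1] - x4_recon)  # here 0, since x_4 - mu lies along PC1
print(pc1, x4_recon, err)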
3. (6 marks) Consider the dataset of three datapoints below, where we would like to predict y using x:
$$X = \begin{bmatrix} 1 & 2 \\ 1 & 4 \\ 1 & 6 \end{bmatrix}, \qquad Y = \begin{bmatrix} 3 \\ 6 \\ 7 \end{bmatrix}$$
(a) Perform least squares regression and report the resulting regression coefficients $w_{LS}$.
(b) Perform regularised least squares regression and find the optimal regression coefficients $w_{RLS}$, where the weight of the penalty term $\lambda$ is assumed to be 1.
(c) Assume a maximum likelihood approach for linear regression. Under this setting, for a new datapoint $x_{new} = [1\ 1]^T$:
(i) What is the predicted $y_{new}$ value? Show your calculation.
(ii) What is the uncertainty around $y_{new}$ (in terms of the estimated variance of $y_{new} \mid x_{new}$)?
(iii) Can you use this variance to not just report a single predicted value for $y_{new}$, but an interval that has 95% probability of containing the true $y_{new}$ value? If so, specify this interval.
(Note: Assume that the regression coefficients or its distribution estimated from the data
are correct, and use the fact that approximately 95% of the values sampled from a normal
distribution lie within two standard deviations from the mean.)
(d) Answer the three sub-parts of part (c) above when the Bayesian linear regression approach is used instead of the MLE approach. Assume the parameters $\alpha$ (precision of the Gaussian prior on each $w_i$) and $\beta$ (precision of the Gaussian $y \mid x$) to each be 1.
Solution:
(a) $X^T X = \begin{bmatrix} 3 & 12 \\ 12 & 56 \end{bmatrix}$, so $(X^T X)^{-1} = \frac{1}{24}\begin{bmatrix} 56 & -12 \\ -12 & 3 \end{bmatrix}$, and
$w_{LS} = (X^T X)^{-1} X^T Y = \frac{1}{24}\begin{bmatrix} 32 \\ 24 \end{bmatrix} = \begin{bmatrix} 4/3 \\ 1 \end{bmatrix}$.
(b) Repeat the above calculation but with 1 added to each diagonal entry of $X^T X$, i.e., $w_{RLS} = (X^T X + I)^{-1} X^T Y$.
(c) (i) Since $w_{LS} = w_{ML}$, we have $y_{new} = x_{new}^T w_{LS} = [1\ 1]\, w_{LS} = (32 + 24)/24 = 7/3$.
(ii) Use the formula for $1/\hat\beta_{ML}$ from the slides, which is simply the average of the squared residuals of the three datapoints, to calculate this variance.
(iii) Yes, the interval is $[\hat\mu - 2\hat\sigma,\ \hat\mu + 2\hat\sigma]$, where $\hat\mu$ is the predicted $y_{new}$ from sub-part (i) and $\hat\sigma$ is $\sqrt{1/\hat\beta_{ML}}$ from sub-part (ii) above. Substitute these values from the above parts and simplify the interval.
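A minimal sketch of parts (a)–(c), assuming numpy (variable names are illustrative; the interval uses the two-standard-deviation rule stated in the question):

import numpy as np

X = np.array([[1., 2.], [1., 4.], [1., 6.]])
Y = np.array([3., 6., 7.])
x_new = np.array([1., 1.])

# (a) Ordinary least squares: w_LS = (X^T X)^{-1} X^T Y = [4/3, 1].
w_ls = np.linalg.solve(X.T @ X, X.T @ Y)

# (b) Regularised least squares (lambda = 1): add 1 to each diagonal entry of X^T X.
w_rls = np.linalg.solve(X.T @ X + np.eye(2), X.T @ Y)

# (c) ML prediction, noise variance (mean squared residual), and 95% interval.
y_ml = x_new @ w_ls                           # 56/24 = 7/3
var_ml = np.mean((X @ w_ls - Y) ** 2)         # 1 / beta_hat_ML
interval = (y_ml - 2 * np.sqrt(var_ml), y_ml + 2 * np.sqrt(var_ml))
print(w_ls, w_rls, y_ml, var_ml, interval)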
(d) The solution for Bayesian linear regression is similar, with the main difference being that the mean is calculated using $w_{RLS}$ instead of $w_{LS}$, and that the variance of the predictive distribution is a function of $x_{new}$ instead of being the same for all $x$.
(i) Since $w_{RLS} = w_{MAP} = w_{\text{mean-of-posterior}}$, we have $y_{new} = x_{new}^T w_{RLS} = \ldots$ (complete the calculation).
(ii) Refer to the slides for the formulas involving the parameters $m_N, S_N$ of the posterior of $w$ and the results for the posterior predictive distribution. Verify that those formulas applied to the current problem yield $\mathrm{var}(y_{new} \mid x_{new}) = \frac{1}{\beta} + x_{new}^T S_N x_{new} = 1 + x_{new}^T (I + X^T X)^{-1} x_{new}$. Complete the calculation.
(iii) Yes, the interval is $[\hat\mu - 2\hat\sigma,\ \hat\mu + 2\hat\sigma]$, where $\hat\mu$ is the predicted $y_{new}$ from sub-part (i) and $\hat\sigma$ is the square root of the variance from sub-part (ii) above. Substitute these values from the above parts and simplify the interval.
Check how numerically different the Bayesian and MLE-based linear regression answers above are.
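A corresponding sketch for part (d), again assuming numpy (with $\alpha = \beta = 1$ the posterior mean $m_N$ coincides with $w_{RLS}$ above):

import numpy as np

X = np.array([[1., 2.], [1., 4.], [1., 6.]])
Y = np.array([3., 6., 7.])
x_new = np.array([1., 1.])
alpha, beta = 1.0, 1.0

S_N = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)   # posterior covariance of w
m_N = beta * S_N @ X.T @ Y                                # posterior mean (= w_RLS here)

y_bayes = x_new @ m_N                                     # predictive mean
var_bayes = 1.0 / beta + x_new @ S_N @ x_new              # predictive variance
interval = (y_bayes - 2 * np.sqrt(var_bayes), y_bayes + 2 * np.sqrt(var_bayes))
print(m_N, y_bayes, var_bayes, interval)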
4. (2 marks) Prove that a Laplacian matrix of an undirected simple graph is positive semi-definite.
Solution: We have already seen this in class using the adjacency matrix representation A of the graph.
Another way to prove this is using the incidence matrix representation of a graph. If B is the (vertex-by-edge) incidence matrix of an orientation of the edges of G (each edge (i, j) of the undirected graph is represented in only one of the two directions, either (i, j) or (j, i), but not both), then we can show $L = BB^T$. So $x^T L x = \|B^T x\|^2 = \sum_{(i,j) \in E} (x_i - x_j)^2 \geq 0$ for all $x$.
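A small numerical illustration of the incidence-matrix argument, assuming numpy (the example graph and orientation are arbitrary):

import numpy as np

# Path graph 1-2-3-4 with an arbitrary orientation of each edge.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
B = np.zeros((n, len(edges)))            # vertex-by-edge incidence matrix
for k, (i, j) in enumerate(edges):
    B[i, k], B[j, k] = 1, -1

L = B @ B.T                              # equals the unweighted graph Laplacian D - A
x = np.random.randn(n)
print(np.allclose(x @ L @ x, np.sum((B.T @ x) ** 2)))   # True: x^T L x = ||B^T x||^2 >= 0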
Some students have asked for an additional reference on the Laplacian. One such reference is here:
https://www.cs.yale.edu/homes/spielman/561/2009/lect02-09.pdf
5. (4 marks) We would like to reduce the dimensionality of a set of data points $D_N = \{x_n\}_{n=1}^N$ using PCA. Let $u_i$ denote the $i$-th PC of the dataset, and $\bar{x}$ the average of all the datapoints in $D_N$. Let each $x_n \in \mathbb{R}^3$. Now choose all formula(s) from below that will correctly compute the PC1-based reconstruction $\tilde{x}_n$ of $x_n$, and justify your answer.
Solution: F1 and F2. Both give the same answer, because we have proved the more general statement below in class; letting D = 3, M = 1 in this proof shows that F1 and F2 above are the same.
If $x_n \in \mathbb{R}^D$ and $\tilde{x}_n$ is the reconstruction of $x_n$ using only the top-M PCs of the dataset, then, as shown in class:
\begin{align*}
\tilde{x}_n &= \sum_{i=1}^{M} (x_n^T u_i) u_i + \sum_{i=M+1}^{D} (\bar{x}^T u_i) u_i && \text{(reconstruction formula from class slides)} \\
&= \bar{x} - \bar{x} + \sum_{i=1}^{M} (x_n^T u_i) u_i + \sum_{i=M+1}^{D} (\bar{x}^T u_i) u_i && \text{(add and subtract } \bar{x}\text{)} \\
&= \bar{x} - \sum_{i=1}^{D} (\bar{x}^T u_i) u_i + \sum_{i=1}^{M} (x_n^T u_i) u_i + \sum_{i=M+1}^{D} (\bar{x}^T u_i) u_i && \text{(represent } \bar{x} \text{ in terms of the basis } \{u_i\} \text{ of } \mathbb{R}^D\text{)} \\
&= \bar{x} + \sum_{i=1}^{M} \left((x_n - \bar{x})^T u_i\right) u_i
\end{align*}
Geometrically, you can also try to visualize the above proof for D = 2 and M = 1.
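A quick numerical check of the derived identity, assuming numpy (random data in $\mathbb{R}^3$, M = 1, with the PCs taken as eigenvectors of the centered scatter matrix):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                 # N = 20 points in R^3
xbar = X.mean(axis=0)
_, U = np.linalg.eigh((X - xbar).T @ (X - xbar))
U = U[:, ::-1]                               # columns u_1, u_2, u_3 (descending eigenvalue)

M = 1
xn = X[0]
lhs = sum((xn @ U[:, i]) * U[:, i] for i in range(M)) \
    + sum((xbar @ U[:, i]) * U[:, i] for i in range(M, 3))
rhs = xbar + sum(((xn - xbar) @ U[:, i]) * U[:, i] for i in range(M))
print(np.allclose(lhs, rhs))                 # True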
6. (2 marks) Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Gender (1 for
Female and 0 for Male), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA
and Gender. The response is starting salary after graduation (in thousands of dollars). Suppose we
use least squares to fit the model and get β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = −10.
For a fixed value of IQ and GPA, females earn more on average than males provided that the GPA
is high enough. Is this statement correct? Justify.
Solution: False.
The least squares fit is ŷ = 50 + 20 (GPA) + 0.07 (IQ) + 35 (Gender) + 0.01 (GPA × IQ) − 10 (GPA × Gender),
which for males (Gender = 0) becomes ŷ = 50 + 20 (GPA) + 0.07 (IQ) + 0.01 (GPA × IQ),
and for females (Gender = 1) becomes ŷ = 85 + 10 (GPA) + 0.07 (IQ) + 0.01 (GPA × IQ).
So the starting salary for males is higher than for females on average if 50 + 20 (GPA) ≥ 85 + 10 (GPA), which is equivalent to GPA ≥ 3.5. Hence, once the GPA is high enough (above 3.5), it is males, not females, who earn more on average, so the statement is false.
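A tiny sketch of this comparison (the IQ value is arbitrary, since the IQ terms cancel in the difference):

def salary(gpa, iq, female):
    # Fitted model from the solution above.
    return 50 + 20*gpa + 0.07*iq + 35*female + 0.01*gpa*iq - 10*gpa*female

for gpa in (3.0, 3.5, 4.0):
    diff = salary(gpa, 110, female=1) - salary(gpa, 110, female=0)
    print(gpa, round(diff, 2))   # 35 - 10*GPA: +5 at 3.0, 0 at 3.5, -5 at 4.0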
7. (4 marks) Consider the dataset D in the table below.
n    x (or $x_n$)    y (or $y_n$)
#1 1 1
#2 2 2
#3 4 3
#4 5 4
#5 6 4
(a) Find the line y = wx + b that minimizes the squared vertical distance of the datapoints to the
line (i.e., the squared errors in y). Specifically, find the w, b that minimizes
$$\sum_{n=1}^{5} \left((w x_n + b) - y_n\right)^2.$$
(b) Find the line y = mx + c that minimizes the squared perpendicular distance (i.e., shortest
distance) of the datapoints to the line. Specifically, find the m, c that minimizes
$$\sum_{n=1}^{5} \frac{\left((m x_n + c) - y_n\right)^2}{m^2 + 1}.$$
(Note: The point-to-line distance formula from geometry gives each summation term above.)
(c) The two minimization problems above are each related to which ML task seen in class?
Solution: You can verify that the linear regression least-squares solution (the $w_{LS}$ formula seen in class) solves part (a), and the PCA $u_1$ (PC1) formula solves part (b); this also answers part (c): the two problems correspond to linear regression and PCA, respectively. Use matrix notation and the relevant matrix-vector formulas from class to obtain a quick solution to both parts. The usual "setting the gradient to zero" approach to these minimization problems will also work, but may take longer.
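A minimal sketch contrasting the two fits, assuming numpy (the perpendicular-distance line is taken through the mean along PC1, consistent with the remark above):

import numpy as np

x = np.array([1., 2., 4., 5., 6.])
y = np.array([1., 2., 3., 4., 4.])

# (a) Vertical squared error: ordinary least squares on [x, 1].
A = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

# (b) Perpendicular squared error: line through the mean in the PC1 direction.
P = np.column_stack([x, y])
mu = P.mean(axis=0)
_, vecs = np.linalg.eigh((P - mu).T @ (P - mu))
v = vecs[:, -1]                      # PC1 of the centered 2-D points
m, c = v[1] / v[0], mu[1] - (v[1] / v[0]) * mu[0]

print(w, b)   # regression (vertical-distance) line
print(m, c)   # PCA (perpendicular-distance) line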
End Note: Please also go through Assignment 2 questions and tutorials related to Spectral
clustering, PCA, and Linear Regression.