
CS 229: Machine Learning

Problem Set 0

William Ma

July 20, 2017


1 Problem 1
1a Part a
Given $f(x) = \frac{1}{2}x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector, we can calculate $\nabla_x f(x)$ by taking the partial derivative

$$
\begin{aligned}
\frac{\partial f(x)}{\partial x_k}
&= \frac{\partial}{\partial x_k}\Big[\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n x_i A_{ij} x_j + \sum_{i=1}^n b_i x_i\Big] \\
&= \frac{\partial}{\partial x_k}\frac{1}{2}\Big[\sum_{i\neq k}\sum_{j\neq k} A_{ij} x_i x_j + \sum_{i\neq k} A_{ik} x_i x_k + \sum_{j\neq k} A_{kj} x_k x_j + A_{kk} x_k^2\Big] + \frac{\partial}{\partial x_k}\sum_{i=1}^n b_i x_i \\
&= \frac{1}{2}\sum_{i\neq k} A_{ik} x_i + \frac{1}{2}\sum_{j\neq k} A_{kj} x_j + A_{kk} x_k + b_k \\
&= \frac{1}{2}\sum_{i=1}^n A_{ik} x_i + \frac{1}{2}\sum_{j=1}^n A_{kj} x_j + b_k \\
&= \sum_{i=1}^n A_{ki} x_i + b_k
\end{aligned}
$$

where the last step uses the symmetry of $A$ (i.e., $A_{ik} = A_{ki}$). Stacking these entries into a vector, we see that $\nabla_x f(x) = Ax + b$.
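As a quick numerical sanity check (not part of the original solution), the closed form $\nabla_x f(x) = Ax + b$ can be compared against central finite differences; the matrix and vectors below are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M + M.T                         # arbitrary symmetric test matrix
b = rng.standard_normal(n)
x = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x + b @ x

grad_closed = A @ x + b             # the derived formula

# central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_closed, grad_fd, atol=1e-5)
```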

1b Part b
Given $f(x) = g(h(x))$, where $g : \mathbb{R} \to \mathbb{R}$ is differentiable and $h : \mathbb{R}^n \to \mathbb{R}$ is differentiable, we can expand $f(x)$ to arrive at the solution

$$\frac{\partial f(x)}{\partial x_k} = \frac{\partial}{\partial x_k}\, g(h(x))$$

By invoking the chain rule,

$$\frac{\partial f(x)}{\partial x_k} = g'(h(x))\,\frac{\partial h(x)}{\partial x_k}$$

Combining these back into a vector,

$$\nabla f(x) = \begin{bmatrix} g'(h(x))\,\dfrac{\partial h(x)}{\partial x_1} \\ \vdots \\ g'(h(x))\,\dfrac{\partial h(x)}{\partial x_n} \end{bmatrix} = g'(h(x))\,\nabla h(x)$$
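To illustrate (the concrete choices of $g$ and $h$ below are my own, not from the problem set), take $h(x) = \|x\|^2$ and $g(t) = \sin t$, so the formula predicts $\nabla f(x) = \cos(\|x\|^2)\,2x$:

```python
import numpy as np

x = np.array([0.3, -0.7, 0.5])

# concrete choices: h(x) = ||x||^2, g(t) = sin(t), so f(x) = sin(||x||^2)
f = lambda x: np.sin(x @ x)
grad_closed = np.cos(x @ x) * 2 * x     # g'(h(x)) * grad h(x)

eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])
assert np.allclose(grad_closed, grad_fd, atol=1e-5)
```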

1c Part c
Given $f(x) = \frac{1}{2}x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector, we can calculate the Hessian by differentiating the gradient entries found in Part a,

$$\frac{\partial f(x)}{\partial x_k} = \sum_{i=1}^n A_{ki} x_i + b_k,$$

once more:

$$\frac{\partial^2 f(x)}{\partial x_l\, \partial x_k} = \frac{\partial}{\partial x_l}\Big[\sum_{i=1}^n A_{ki} x_i + b_k\Big] = A_{kl}$$

Thus, $\nabla^2 f(x) = A$.
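As with the gradient, $\nabla^2 f(x) = A$ can be checked numerically (a sketch with arbitrary test data, not part of the original solution) by finite-differencing the gradient from Part a column by column:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n))
A = M + M.T                         # arbitrary symmetric test matrix
b = rng.standard_normal(n)
x = rng.standard_normal(n)

grad = lambda x: A @ x + b          # gradient formula from Part a

# finite-difference each column of the Hessian
eps = 1e-6
H = np.column_stack([
    (grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
assert np.allclose(H, A, atol=1e-5)
```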

1d Part d
Given $f(x) = g(a^T x)$, where $g : \mathbb{R} \to \mathbb{R}$ is continuously differentiable and $a \in \mathbb{R}^n$ is a vector, we can calculate $\nabla f(x)$ using the results from Parts a and b:

$$\nabla f(x) = g'(a^T x)\,\nabla(a^T x) = g'(a^T x)\,a$$

For the Hessian, we differentiate each entry of the gradient, apply the chain rule to each term, then recombine into a matrix:

$$\frac{\partial^2 f(x)}{\partial x_i\, \partial x_j} = \frac{\partial}{\partial x_i}\big[g'(a^T x)\,a_j\big] = g''(a^T x)\,a_i a_j$$

$$\nabla^2 f(x) = \begin{bmatrix} g''(a^T x)\,a_1 a_1 & \cdots & g''(a^T x)\,a_1 a_n \\ \vdots & \ddots & \vdots \\ g''(a^T x)\,a_n a_1 & \cdots & g''(a^T x)\,a_n a_n \end{bmatrix} = g''(a^T x)\,aa^T$$

Thus, $\nabla^2 f(x) = g''(a^T x)\,aa^T$.
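Both formulas can be exercised on a concrete example (my own choice, not from the problem set): with $g = \exp$, we have $g' = g'' = \exp$, so $\nabla f = e^{a^T x} a$ and $\nabla^2 f = e^{a^T x} a a^T$.

```python
import numpy as np

a = np.array([0.5, -1.0, 2.0])
x = np.array([0.1, 0.2, -0.3])

# concrete choice: g = exp, so g' = g'' = exp
f = lambda x: np.exp(a @ x)
grad_closed = np.exp(a @ x) * a
hess_closed = np.exp(a @ x) * np.outer(a, a)

eps = 1e-6
I = np.eye(3)
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in I])
assert np.allclose(grad_closed, grad_fd, atol=1e-5)

# finite-difference the gradient to approximate the Hessian
grad_f = lambda x: np.exp(a @ x) * a
hess_fd = np.column_stack([
    (grad_f(x + eps * e) - grad_f(x - eps * e)) / (2 * eps) for e in I
])
assert np.allclose(hess_closed, hess_fd, atol=1e-4)
```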

2 Problem 2
2a Part a
Proof. Given $z \in \mathbb{R}^n$ and $A = zz^T$, we have $A \in S_+^{n \times n}$ if $A = A^T$ and $x^T A x \geq 0$ for all $x$.

Symmetry:
$$A^T = (zz^T)^T = (z^T)^T z^T = zz^T = A$$

Positive semi-definiteness: for any $x \in \mathbb{R}^n$,
$$x^T A x = x^T z z^T x = (z^T x)(z^T x) = (z^T x)^2 \geq 0$$

Thus, since $A = A^T$ and $x^T A x \geq 0$, $A \in S_+^{n \times n}$. $\square$
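A small numerical check (arbitrary $z$, not part of the original proof): $zz^T$ should be symmetric with no negative eigenvalues.

```python
import numpy as np

z = np.array([1.0, -2.0, 0.5])
A = np.outer(z, z)                  # A = z z^T

assert np.allclose(A, A.T)          # symmetric
eigvals = np.linalg.eigvalsh(A)
assert np.all(eigvals >= -1e-9)     # all eigenvalues nonnegative => PSD
```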

2b Part b
Given that $z \in \mathbb{R}^n$ is a non-zero vector and $A = zz^T$, the null space of $A$ is exactly the set of vectors orthogonal to $z$. If $z^T x = 0$, then

$$Ax = zz^T x = z(z^T x) = z \cdot 0 = 0,$$

and conversely, $Ax = 0$ implies $0 = x^T A x = (z^T x)^2$, so $z^T x = 0$. The orthogonal complement of a non-zero vector has dimension $n - 1$, so the nullity of $A$ is $n - 1$. By the rank–nullity theorem, the rank of $A$ is $1$.
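The rank-1 claim is easy to verify numerically (a sketch with an arbitrary non-zero $z$, not part of the original solution):

```python
import numpy as np

z = np.array([1.0, -2.0, 0.5, 3.0])      # non-zero z in R^4
A = np.outer(z, z)
assert np.linalg.matrix_rank(A) == 1     # rank 1, so nullity n - 1 = 3

# any x orthogonal to z lies in the null space
x = np.array([2.0, 1.0, 0.0, 0.0])       # z^T x = 2 - 2 = 0
assert np.allclose(A @ x, 0)
```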

2c Part c
Proof. Given $A \in S_+^{n \times n}$ and arbitrary $B \in \mathbb{R}^{m \times n}$, we must show $BAB^T \in S_+^{m \times m}$.

Symmetry (using $A^T = A$):
$$(BAB^T)^T = (B^T)^T A^T B^T = BAB^T$$

Positive semi-definiteness: for any $x \in \mathbb{R}^m$, let $y = B^T x$. Since $A \in S_+^{n \times n}$, $y^T A y \geq 0$, so
$$x^T BAB^T x = (B^T x)^T A\, (B^T x) = y^T A y \geq 0$$

Thus, since $BAB^T = (BAB^T)^T$ and $x^T BAB^T x \geq 0$, $BAB^T \in S_+^{m \times m}$. $\square$
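Again a quick numerical illustration (arbitrary test matrices, not part of the original proof): a PSD $A$ is built as $ZZ^T$, and $BAB^T$ is checked for symmetry and nonnegative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
Z = rng.standard_normal((n, n))
A = Z @ Z.T                              # symmetric PSD, n x n
B = rng.standard_normal((m, n))          # arbitrary m x n

C = B @ A @ B.T                          # should lie in S_+^{m x m}
assert np.allclose(C, C.T)
assert np.all(np.linalg.eigvalsh(C) >= -1e-8)
```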

3 Problem 3
3a Part a
Proof. Given that $A$ is diagonalizable as $A = T\Lambda T^{-1}$ and $t^{(i)} \in \mathbb{R}^n$ is the $i$-th column of $T$,

$$At^{(i)} = T\Lambda T^{-1} t^{(i)}$$

Since $T^{-1}T = I$, multiplying $T^{-1}$ by $t^{(i)}$, the $i$-th column of $T$, returns the $i$-th column of $I$, i.e. the standard basis vector $e_i$ with

$$(e_i)_j = \begin{cases} 1, & \text{if } j = i \\ 0, & \text{otherwise} \end{cases}$$

Thus,

$$At^{(i)} = T\Lambda T^{-1} t^{(i)} = T\Lambda e_i = T(\lambda_i e_i) = \lambda_i t^{(i)}$$

Thus, $At^{(i)} = \lambda_i t^{(i)}$, where $(t^{(i)}, \lambda_i)$ is an eigenvector/eigenvalue pair of $A$. $\square$
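A tiny concrete example (my own, not from the problem set): build $A = T\Lambda T^{-1}$ from a chosen $T$ and $\Lambda$, then confirm each column of $T$ is an eigenvector with the matching diagonal entry of $\Lambda$ as its eigenvalue.

```python
import numpy as np

T = np.array([[1.0, 1.0],
              [0.0, 1.0]])                   # invertible
Lam = np.diag([2.0, 5.0])
A = T @ Lam @ np.linalg.inv(T)

for i in range(2):
    t = T[:, i]
    assert np.allclose(A @ t, Lam[i, i] * t)  # A t^(i) = lambda_i t^(i)
```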

3b Part b
Proof. Given that $A$ is symmetric, it diagonalizes as $A = U\Lambda U^{-1}$ with $U$ orthogonal; let $u^{(i)} \in \mathbb{R}^n$ be the $i$-th column of $U$. Since $U$ is orthogonal, $U^{-1} = U^T$, so

$$Au^{(i)} = U\Lambda U^T u^{(i)} = U\Lambda U^{-1} u^{(i)}$$

Applying the result from Problem 3a, $Au^{(i)} = \lambda_i u^{(i)}$, where $(u^{(i)}, \lambda_i)$ is an eigenvector/eigenvalue pair of $A$. $\square$

3c Part c
Proof. Given $A \in S_+^{n \times n}$ with eigenvalue $\lambda_i$ and corresponding unit eigenvector $u^{(i)}$ (from Problem 3b), set $x = u^{(i)}$ in the definition of positive semi-definiteness:

$$0 \leq x^T A x = u^{(i)T} A u^{(i)} = u^{(i)T} \big(\lambda_i u^{(i)}\big) = \lambda_i \|u^{(i)}\|^2 = \lambda_i$$

Thus $\lambda_i \geq 0$ for every eigenvalue of $A$. $\square$
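The quadratic-form argument above can be mirrored numerically (a sketch with an arbitrary PSD test matrix): each eigenvalue equals $u^T A u$ for the corresponding unit eigenvector, and none is negative.

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.standard_normal((4, 4))
A = Z @ Z.T                               # symmetric PSD test matrix

eigvals, U = np.linalg.eigh(A)            # eigh returns unit eigenvectors
assert np.all(eigvals >= -1e-8)

# the quadratic-form identity: u^T A u = lambda * ||u||^2 = lambda
for lam, u in zip(eigvals, U.T):
    assert abs(u @ A @ u - lam) < 1e-8
```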
