Lec7matrixnorm Part3
min_{B ∈ R^{n×d} : rank(B) = k} ‖A − B‖
• Recommender systems
Dr. Guangliang Chen | Mathematics & Statistics, San José State University 25/49
Matrix norm and low-rank approximation
Remark. The theorem still holds if the equality constraint rank(B) = k is relaxed to rank(B) ≤ k (which also admits all lower-rank matrices). A proof is available at https://en.wikipedia.org/wiki/Low-rank_approximation.
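As a quick numerical check of the theorem (a sketch using NumPy; the matrix A and rank k below are arbitrary illustrations, not from the lecture), truncating the SVD attains the optimal error under both the spectral and Frobenius norms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # an arbitrary data matrix
k = 2

# Best rank-k approximation: keep the top k singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

assert np.linalg.matrix_rank(A_k) == k
# Spectral-norm error is the (k+1)st singular value;
# Frobenius-norm error is the root sum of the remaining squared singular values.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
```
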
In this problem, the approximation error under either norm (spectral or Frobenius) is the same: ‖X − X_1‖ = σ_2 = 1.
• Image compression
min_S Σ_{i=1}^n ‖x_i − P_S(x_i)‖_2^2

Remark. This problem is different from ordinary linear regression:

• No predictor-response distinction

(Figure: data points x_i, their projections P_S(x_i) onto the plane S, and the orthogonal errors.)
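The contrast with ordinary regression can be made concrete with a small 2-D sketch (synthetic data; the variable names are illustrative, not from the lecture). Least squares measures vertical errors in y, while the orthogonal fit, obtained from the top right singular vector of the centered data, treats both coordinates symmetrically:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 40)
y = 2 * x + 0.3 * rng.standard_normal(40)   # noisy points near the line y = 2x
X = np.column_stack([x, y])

# Ordinary least squares: minimizes vertical errors y - (a + b*x).
b_ols, a_ols = np.polyfit(x, y, 1)

# Orthogonal fit: minimizes perpendicular distances to the line.
xbar = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - xbar, full_matrices=False)
v = Vt[0]                # unit direction of the orthogonal best-fit line
b_tls = v[1] / v[0]      # its slope (sign of v cancels in the ratio)

# Both slopes are near 2, but they generally differ: OLS singles out y
# as the response, while the orthogonal fit has no such distinction.
```
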
Theorem 0.7. An orthogonal best-fit k-dimensional plane to the data X = [x_1, …, x_n]^T ∈ R^{n×d} is given by

x = x̄ + V_k · α, α ∈ R^k,

where V_k is the matrix of top k right singular vectors of the centered data.

Proof. Suppose an arbitrary k-dimensional plane S is used to fit the data, with a fixed point m ∈ R^d and an orthonormal basis

B = [b_1, …, b_k] ∈ R^{d×k}.

That is,

B^T B = I_k, with BB^T the orthogonal projection onto S.

The projection of each data point x_i onto the candidate plane is

P_S(x_i) = m + BB^T(x_i − m).

(Figure: the plane S through the point m with basis vectors b_1, b_2, a data point x_i, and its projection P_S(x_i).)
Using multivariable calculus, we can show that for any fixed B an optimal m is

m^* = (1/n) Σ_{i=1}^n x_i =: x̄.

Plugging in x̄ for m and letting x̃_i = x_i − x̄ reduces the problem to

min_B Σ_{i=1}^n ‖x̃_i − BB^T x̃_i‖^2.
Let X̃ = UΣV^T be the SVD of the centered data matrix, and denote by X̃_k the best rank-k approximation of X̃:

X̃_k = U_k Σ_k V_k^T.

Then a minimizer is the matrix consisting of the top k right singular vectors of X̃, i.e.,

B = V_k ≡ V(:, 1:k).
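The claim can be checked numerically with a small sketch (synthetic data; the names X, Vk, and the random competing plane are illustrations, not from the lecture): the plane through x̄ spanned by the top k right singular vectors yields a total squared orthogonal error no larger than that of any other k-dimensional plane through x̄.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))   # n = 50 synthetic points in R^3
k = 2

xbar = X.mean(axis=0)
Xc = X - xbar                      # centered data: rows are x_i - xbar
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:k].T                      # B = V_k, the top-k right singular vectors

# Orthogonal projections onto the best-fit plane: P_S(x_i) = xbar + Vk Vk^T (x_i - xbar)
P = xbar + Xc @ Vk @ Vk.T
err_opt = np.sum((X - P) ** 2)

# A competing k-dim plane through xbar with a random orthonormal basis Q.
Q, _ = np.linalg.qr(rng.standard_normal((3, k)))
P2 = xbar + Xc @ Q @ Q.T

# The SVD plane is at least as good.
assert err_opt <= np.sum((X - P2) ** 2) + 1e-12
```
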
Verify: If B = V_k, then

X̃BB^T = X̃V_k V_k^T
       = X̃[v_1, …, v_k]V_k^T
       = [σ_1 u_1, …, σ_k u_k]V_k^T
       = [u_1, …, u_k] diag(σ_1, …, σ_k)V_k^T
       = U_k Σ_k V_k^T
       = X̃_k.
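The same identity is easy to confirm numerically (a sketch; Xt stands in for the centered matrix X̃ and is randomly generated here for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
Xt = rng.standard_normal((8, 5))   # stands in for the centered data matrix X-tilde
k = 3

U, s, Vt = np.linalg.svd(Xt, full_matrices=False)
Vk = Vt[:k].T                                # top-k right singular vectors
Xt_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # best rank-k approximation

# The chain of equalities above: X-tilde Vk Vk^T = Uk Sigma_k Vk^T = X-tilde_k
assert np.allclose(Xt @ Vk @ Vk.T, Xt_k)
```
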
Proof of m^* = x̄. For fixed B, write g(m) = Σ_{i=1}^n ‖(I − BB^T)(x_i − m)‖_2^2 for the total squared orthogonal error.
Note that I − BB^T is also an orthogonal projection matrix (onto the orthogonal complement). Thus,

(I − BB^T)^T (I − BB^T) = (I − BB^T)^2 = I − BB^T.

It follows that

∇g(m) = −Σ_{i=1}^n 2(I − BB^T)(x_i − m) = −2(I − BB^T)(Σ_{i=1}^n x_i − nm).

Setting ∇g(m) = 0 yields an equation with infinitely many solutions, but the simplest one is

Σ_{i=1}^n x_i − nm = 0 → m = (1/n) Σ_{i=1}^n x_i.
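A numerical sanity check of this derivation (a sketch with synthetic data; the function g and the basis Q are illustrative stand-ins for the quantities in the proof): the gradient vanishes at m = x̄, and x̄ does at least as well as any other choice of m.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 4))   # synthetic data, n = 30 points in R^4
k = 2
Q, _ = np.linalg.qr(rng.standard_normal((4, k)))   # fixed orthonormal basis B
P_perp = np.eye(4) - Q @ Q.T                       # I - BB^T

def g(m):
    # total squared orthogonal distance to the plane through m with basis Q
    R = (X - m) @ P_perp
    return np.sum(R ** 2)

xbar = X.mean(axis=0)

# gradient formula from the proof: -2 (I - BB^T)(sum x_i - n m), zero at m = xbar
grad = -2 * P_perp @ (X.sum(axis=0) - len(X) * xbar)
assert np.allclose(grad, 0)

# xbar attains no larger an error than a perturbed center
assert g(xbar) <= g(xbar + rng.standard_normal(4))
```
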