Lec7matrixnorm Part3

The document discusses low-rank matrix approximation and its applications. It states that the best rank-k approximation of a matrix A under the Frobenius and spectral norms is its truncated SVD Ak, which is the approximation that minimizes the error. It also provides an example of computing the best rank-1 approximation of a sample matrix. Additionally, it covers orthogonal best-fit subspace problems and proves that the optimal solution is given by the top k right singular vectors of the centered data matrix.


Matrix norm and low-rank approximation

Low-rank approximation of matrices


Problem. For any matrix $A \in \mathbb{R}^{n \times d}$ and integer $k \geq 1$, find the rank-$k$ matrix $B$ that is closest to $A$ (under a given norm such as the Frobenius or spectral norm):

$$\min_{B \in \mathbb{R}^{n \times d}:\ \mathrm{rank}(B) = k} \|A - B\|$$

Remark. This problem arises in a number of tasks, e.g.,

• Orthogonal least squares fitting

• Data compression (and noise reduction)

• Recommender systems


Theorem 0.6 (Eckart–Young–Mirsky). Given $A \in \mathbb{R}^{n \times d}$ and $1 \leq k \leq \mathrm{rank}(A)$, let $A_k$ be the truncated SVD of $A$ with the largest $k$ terms: $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. Then $A_k$ is the best rank-$k$ approximation to $A$ in terms of both the Frobenius and spectral norms:²

$$\min_{B:\ \mathrm{rank}(B) = k} \|A - B\|_F = \|A - A_k\|_F = \sqrt{\sum_{i > k} \sigma_i^2},$$

$$\min_{B:\ \mathrm{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}.$$

Remark. The theorem still holds if the equality constraint $\mathrm{rank}(B) = k$ is relaxed to $\mathrm{rank}(B) \leq k$ (which also admits all matrices of lower rank).

² Proof available at https://en.wikipedia.org/wiki/Low-rank_approximation
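
As a quick numerical check of the theorem (not part of the original slides), the following NumPy sketch compares the truncated-SVD errors against the two formulas above; the matrix A and the rank k are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # an arbitrary matrix A in R^{n x d}
k = 2                             # target rank (arbitrary, 1 <= k <= rank(A))

# Full SVD: A = U diag(s) V^T with singular values in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD A_k = sum_{i=1}^k sigma_i u_i v_i^T
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error equals sqrt of the sum of the discarded squared singular values
print(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))

# Spectral error equals sigma_{k+1}
print(np.linalg.norm(A - A_k, 2), s[k])
```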


Example 0.5. For the matrix

$$X = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix},$$

the best rank-1 approximation is

$$X_1 = \sigma_1 u_1 v_1^T = \sqrt{3} \begin{pmatrix} \tfrac{2}{\sqrt{6}} \\ -\tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{6}} \end{pmatrix} \begin{pmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -\tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix}.$$

In this problem, the approximation error under either norm (spectral or Frobenius) is the same: $\|X - X_1\| = \sigma_2 = 1$.
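
The computation in Example 0.5 can be reproduced numerically; a small sketch (not from the slides) using NumPy:

```python
import numpy as np

X = np.array([[1., -1.],
              [0.,  1.],
              [1.,  0.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s)   # approx [1.732, 1.0], i.e., sigma_1 = sqrt(3), sigma_2 = 1

# Best rank-1 approximation X_1 = sigma_1 u_1 v_1^T
X1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(X1)  # approx [[1, -1], [-0.5, 0.5], [0.5, -0.5]]

# Error under either norm is sigma_2 = 1
print(np.linalg.norm(X - X1, 2), np.linalg.norm(X - X1, 'fro'))
```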


Applications of low-rank approximation

• Orthogonal least-squares fitting

• Image compression


Orthogonal Best-Fit Subspace

Problem: Given data $x_1, \ldots, x_n \in \mathbb{R}^d$ and an integer $0 < k < d$, find the $k$-dimensional orthogonal "best-fit" plane $S$ by solving

$$\min_{S} \sum_{i=1}^{n} \|x_i - P_S(x_i)\|_2^2$$

Remark. This problem is different from ordinary linear regression:

• No predictor-response distinction

• Orthogonal (not vertical) fitting errors

[Figure: data points $x_i$ and their orthogonal projections $P_S(x_i)$ onto a plane $S$]


Theorem 0.7. An orthogonal best-fit $k$-dimensional plane to the data $X = [x_1, \ldots, x_n]^T \in \mathbb{R}^{n \times d}$ is given by

$$x = \bar{x} + V_k \alpha,$$

where $\bar{x}$ is the center of the data set,

$$\bar{x} = \frac{1}{n} \sum_{i} x_i,$$

and $V_k = [v_1, \ldots, v_k]$ is a $d \times k$ matrix whose columns are the top $k$ right singular vectors of the centered data matrix

$$\tilde{X} = [x_1 - \bar{x}, \ldots, x_n - \bar{x}]^T = X - \mathbf{1}\bar{x}^T.$$

[Figure: best-fit plane $S$ through $\bar{x}$, spanned by $v_1$ and $v_2$, with data points and their orthogonal projections]
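
A minimal NumPy sketch (not part of the slides) of Theorem 0.7: center the data, take the top $k$ right singular vectors, and project. The synthetic data and the choice k = 2 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))   # n = 100 points in R^d, d = 5 (synthetic)
k = 2                               # dimension of the fitted plane

xbar = X.mean(axis=0)               # center of the data set
Xc = X - xbar                       # centered data matrix X_tilde

# Top-k right singular vectors form an orthonormal basis of the best-fit plane
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:k, :].T                    # d x k

# Orthogonal projection of each point onto the fitted plane x = xbar + V_k alpha
P = xbar + (Xc @ Vk) @ Vk.T

# Total squared orthogonal error equals the sum of the discarded sigma_i^2
print(np.sum((X - P) ** 2), np.sum(s[k:] ** 2))
```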

Proof. Suppose an arbitrary $k$-dimensional plane $S$ is used to fit the data, with a fixed point $m \in \mathbb{R}^d$ and an orthonormal basis

$$B = [b_1, \ldots, b_k] \in \mathbb{R}^{d \times k}.$$

That is,

$$B^T B = I_k, \qquad BB^T:\ \text{orthogonal projection onto } S.$$

The projection of each data point $x_i$ onto the candidate plane is

$$P_S(x_i) = m + BB^T (x_i - m).$$

[Figure: candidate plane $S$ through $m$, spanned by $b_1$ and $b_2$, with a data point $x_i$ and its projection $P_S(x_i)$]
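
For illustration (not from the slides), the projection formula can be coded directly; the point m and the basis B below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 4, 2
m = rng.standard_normal(d)                        # a fixed point on the plane
B, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal basis: B^T B = I_k

x = rng.standard_normal(d)                        # a data point
p = m + B @ (B.T @ (x - m))                       # its projection P_S(x)

# The residual x - p is orthogonal to the plane (to every column of B)
print(B.T @ (x - p))                              # approximately the zero vector
```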


Accordingly, we may rewrite the original problem as

$$\min_{\substack{m \in \mathbb{R}^d,\ B \in \mathbb{R}^{d \times k} \\ B^T B = I_k}} \sum_{i=1}^{n} \|x_i - m - BB^T (x_i - m)\|^2.$$

Using multivariable calculus, we can show that for any fixed $B$ an optimal $m$ is

$$m^* = \frac{1}{n} \sum_{i} x_i \overset{\mathrm{def}}{=} \bar{x}.$$

Plugging in $\bar{x}$ for $m$ and letting $\tilde{x}_i = x_i - \bar{x}$ gives

$$\min_{B} \sum_{i} \|\tilde{x}_i - BB^T \tilde{x}_i\|^2.$$

In matrix notation, this becomes

$$\min_{B} \|\tilde{X} - \tilde{X} BB^T\|_F^2, \quad \text{where } \tilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_n]^T \in \mathbb{R}^{n \times d}.$$


Let the full SVD of the centered data matrix $\tilde{X}$ be

$$\tilde{X} = U \Sigma V^T.$$

Denote by $\tilde{X}_k$ the best rank-$k$ approximation of $\tilde{X}$:

$$\tilde{X}_k = U_k \Sigma_k V_k^T.$$

Since $\tilde{X} BB^T$ always has rank at most $k$, Theorem 0.6 implies that the minimum is attained when

$$\tilde{X} BB^T = \tilde{X}_k,$$

and a minimizer is the matrix consisting of the top $k$ right singular vectors of $\tilde{X}$, i.e.,

$$B = V_k \equiv V(:, 1{:}k).$$


Verify: If $B = V_k$, then

$$\tilde{X} BB^T = \tilde{X} V_k V_k^T = \tilde{X}[v_1, \ldots, v_k] V_k^T = [\sigma_1 u_1, \ldots, \sigma_k u_k] V_k^T = [u_1, \ldots, u_k] \,\mathrm{diag}(\sigma_1, \ldots, \sigma_k)\, V_k^T = U_k \Sigma_k V_k^T = \tilde{X}_k.$$
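
The same chain of equalities can be confirmed numerically; a short sketch (not from the slides) with synthetic centered data:

```python
import numpy as np

rng = np.random.default_rng(3)
Xc = rng.standard_normal((8, 5))
Xc -= Xc.mean(axis=0)               # centered data matrix X_tilde
k = 2

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:k, :].T                    # top-k right singular vectors

Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation X_tilde_k
print(np.allclose(Xc @ Vk @ Vk.T, Xk))       # True
```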


Proof of $m^* = \bar{x}$:

First, rewrite the above objective function as

$$g(m) = \sum_{i=1}^{n} \|x_i - m - BB^T (x_i - m)\|^2 = \sum_{i=1}^{n} \|(I - BB^T)(x_i - m)\|^2$$

and apply the formula

$$\frac{\partial}{\partial x} \|Ax\|^2 = 2 A^T A x$$

to find its gradient:

$$\nabla g(m) = -\sum_{i} 2 (I - BB^T)^T (I - BB^T)(x_i - m).$$


Note that $I - BB^T$ is also an orthogonal projection matrix (onto the orthogonal complement of $S$). Thus,

$$(I - BB^T)^T (I - BB^T) = (I - BB^T)^2 = I - BB^T.$$

It follows that

$$\nabla g(m) = -\sum_{i} 2 (I - BB^T)(x_i - m) = -2 (I - BB^T) \Big( \sum_{i} x_i - n m \Big).$$

Any minimizer $m$ must satisfy

$$2 (I - BB^T) \Big( \sum_{i} x_i - n m \Big) = 0.$$

This equation has infinitely many solutions, but the simplest one is

$$\sum_{i} x_i - n m = 0 \;\Longrightarrow\; m = \frac{1}{n} \sum_{i} x_i.$$
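
A quick numerical sanity check (not from the slides) that the center minimizes $g$; the synthetic data and the basis B are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 50, 4, 2
X = rng.standard_normal((n, d))
B, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal basis
P_perp = np.eye(d) - B @ B.T                      # projection onto the complement

def g(m):
    R = (X - m) @ P_perp   # rows are (I - BB^T)(x_i - m), since P_perp is symmetric
    return np.sum(R ** 2)

xbar = X.mean(axis=0)
# g at the center is never larger than g at perturbed points
print(g(xbar), min(g(xbar + 0.1 * rng.standard_normal(d)) for _ in range(5)))
```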
