
EECS 16B Designing Information Devices and Systems II


Spring 2021 UC Berkeley Homework 14
This homework is due on Sunday, May 2, 2021, at 11:00PM. Self-grades and
HW Resubmission are due on Tuesday, May 4, 2021, at 11:00PM.

1. Reading Lecture Notes


Staying up to date with lectures is an important part of the learning process in this course. Here are links to
the notes that you need to read for this week: Note 16, Note 17

(a) Write out an N × N orthonormal matrix U whose columns represent the DFT basis.
Solution:

$$U = \frac{1}{\sqrt{N}}\begin{bmatrix}
e^{j\frac{2\pi(0)(0)}{N}} & e^{j\frac{2\pi(0)(1)}{N}} & e^{j\frac{2\pi(0)(2)}{N}} & \cdots & e^{j\frac{2\pi(0)(N-1)}{N}} \\
e^{j\frac{2\pi(1)(0)}{N}} & e^{j\frac{2\pi(1)(1)}{N}} & e^{j\frac{2\pi(1)(2)}{N}} & \cdots & e^{j\frac{2\pi(1)(N-1)}{N}} \\
e^{j\frac{2\pi(2)(0)}{N}} & e^{j\frac{2\pi(2)(1)}{N}} & e^{j\frac{2\pi(2)(2)}{N}} & \cdots & e^{j\frac{2\pi(2)(N-1)}{N}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
e^{j\frac{2\pi(N-1)(0)}{N}} & e^{j\frac{2\pi(N-1)(1)}{N}} & e^{j\frac{2\pi(N-1)(2)}{N}} & \cdots & e^{j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix}$$

In other words, the $km$-th entry is $U_{km} = \frac{1}{\sqrt{N}}e^{j\frac{2\pi km}{N}}$.
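As a quick numerical sanity check (a sketch, assuming numpy is available), we can build this $U$ for a small $N$ and verify that its columns are orthonormal, i.e. that $U^*U = I$:

```python
import numpy as np

def dft_basis(N):
    # U[k, m] = exp(j * 2*pi*k*m / N) / sqrt(N), matching the matrix above.
    k = np.arange(N).reshape(-1, 1)   # row index
    m = np.arange(N).reshape(1, -1)   # column index
    return np.exp(2j * np.pi * k * m / N) / np.sqrt(N)

U = dft_basis(8)
# Orthonormal columns: the conjugate transpose acts as the inverse.
assert np.allclose(U.conj().T @ U, np.eye(8))
```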


2. Linearization to help classification: discovering logistic regression and how to solve it


You can, in spirit, reduce the problem of linear multi-class classification to that of binary classification by picking, for each category "X", a vector that separates the examples in "X" from all the other examples, which are lumped into a hybrid synthetic category of "not-X". This gives rise to one vector per category, with the winning category selected by seeing which vector scores highest. However, we will focus here on the binary problem since that is the conceptual heart of this approach.
As was discussed in lecture, the naive straightforward way of picking the decision boundary (by looking
at the mean example of each category and drawing the perpendicular bisector) is not always the best. The
included Jupyter Notebook includes synthetic examples that illustrate the different things that can happen so
that you can better appreciate the pathway that leads us to discover logistic regression as a natural approach
to solve this problem based on the conceptual building blocks that you have already seen.
It is no exaggeration to say that logistic regression is the default starting method for doing classification in
machine learning contexts, the same way that straightforward linear regression is the default starting method
for doing regression. A lot of other approaches can be viewed as being built on top of logistic regression.
Consequently, getting to logistic regression is a nice ending-point for this part of the 16AB story as pertains
to classification.
Let's start by giving things some names. Consider trying to classify a set of measurements $\vec{x}_i$ with given labels $\ell_i$. For the binary case of interest here, we will think of the labels as being "+" and "−". For expository convenience, and because we don't want to have to carry it around separately, we will fold our threshold implicitly into the weights by augmenting our given measurements with the constant "1" in the first position of each $\vec{x}_i$. Now, the classification rule becomes simple. We want to learn a vector of weights $\vec{w}$ so that we can deem any point with $\vec{x}_i^\top\vec{w} > 0$ as being a member of the "+" category and anything with $\vec{x}_i^\top\vec{w} < 0$ as being a member of the "−" category.
The way that we will do this is to do a minimization in the spirit of least squares. Except, instead of necessarily using some sort of squared loss function, we will just consider a generic cost function that can depend on the label and the prediction for the point. For the $i$-th data point in our training data, we will incur a cost $c(\vec{x}_i^\top\vec{w}, \ell_i)$ for a total cost that we want to minimize:

$$\arg\min_{\vec{w}}\, c_{\text{total}}(\vec{w}) = \sum_{i=1}^{m} c(\vec{x}_i^\top\vec{w}, \ell_i) \tag{1}$$

Because this can be a nonlinear function, our goal is to solve this iteratively as a sequence of least-squares
problems that we know how to solve.
Consider the following algorithm:
1: $\vec{w} = \vec{0}$    ▷ Initialize the weights to $\vec{0}$
2: while Not done do    ▷ Iterate towards solution
3:    Compute $\vec{w}^\top\vec{x}_i$    ▷ Generate current estimated labels
4:    Compute $\frac{d}{d\vec{w}}c(\vec{w}^\top\vec{x}_i, \ell_i)$    ▷ Generate derivatives with respect to $\vec{w}$ of the cost for the update step
5:    Compute $\frac{d^2}{d\vec{w}^2}c(\vec{w}^\top\vec{x}_i, \ell_i)$    ▷ Generate second derivatives of the cost for the update step
6:    $\vec{\delta w} = \text{LeastSquares}(\cdot, \cdot)$    ▷ We will derive what to call least squares on
7:    $\vec{w} = \vec{w} + \vec{\delta w}$    ▷ Update parameters
8: end while
9: Return $\vec{w}$
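The loop structure translates directly into Python. Below is a minimal sketch (not the official notebook code); the helpers `cost_grad`, `cost_hess`, and `build_A_b` are hypothetical placeholders whose contents are exactly what the remaining parts of this problem derive.

```python
import numpy as np

def fit(X, labels, cost_grad, cost_hess, build_A_b, num_iters=10):
    # X holds one augmented data point x_i per row.
    w = np.zeros(X.shape[1])            # 1: initialize the weights to 0
    for _ in range(num_iters):          # 2: "while Not done", with a fixed cap
        p = X @ w                       # 3: current estimated labels w^T x_i
        g = cost_grad(p, labels)        # 4: first derivatives c'(x_i^T w, l_i)
        h = cost_hess(p, labels)        # 5: second derivatives c''(x_i^T w, l_i)
        A, b = build_A_b(X, g, h)       # 6: arguments derived in parts (d)-(e)
        dw, *_ = np.linalg.lstsq(A, b, rcond=None)
        w = w + dw                      # 7: update parameters
    return w                            # 9: return w
```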


The key step above is figuring out with what arguments to call LeastSquares while only having the labels `i
and the points ~xi .
When a function $\vec{f}(\vec{x}, \vec{y}): \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}^m$ takes in vectors and outputs a vector, the relevant derivatives for linearization are also represented by matrices:

$$D_{\vec{x}}\vec{f} = \begin{bmatrix} \frac{\partial f_1}{\partial x[1]} & \cdots & \frac{\partial f_1}{\partial x[n]} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x[1]} & \cdots & \frac{\partial f_m}{\partial x[n]} \end{bmatrix}
\qquad
D_{\vec{y}}\vec{f} = \begin{bmatrix} \frac{\partial f_1}{\partial y[1]} & \cdots & \frac{\partial f_1}{\partial y[k]} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial y[1]} & \cdots & \frac{\partial f_m}{\partial y[k]} \end{bmatrix}$$

where

$$\vec{x} = \begin{bmatrix} x[1] \\ \vdots \\ x[n] \end{bmatrix} \qquad \vec{y} = \begin{bmatrix} y[1] \\ \vdots \\ y[k] \end{bmatrix}.$$

Then, the linearization (first-order expansion) becomes

$$\vec{f}(\vec{x}, \vec{y}) \approx \vec{f}(\vec{x}_0, \vec{y}_0) + D_{\vec{x}}\vec{f}\cdot(\vec{x} - \vec{x}_0) + D_{\vec{y}}\vec{f}\cdot(\vec{y} - \vec{y}_0). \tag{2}$$

(a) Now, suppose we wanted to approximate the cost for each data point

$$c_i(\vec{w}) = c(\vec{x}_i^\top\vec{w}, \ell_i) \tag{3}$$

where

$$\vec{w} = \begin{bmatrix} w[1] \\ \vdots \\ w[n] \end{bmatrix}$$

in the neighborhood of a weight vector $\vec{w}_*$. Our goal is to write out the first-order expression for approximating the cost function $c_i(\vec{w}_* + \vec{\delta w})$. This should be something in vector/matrix form like you have seen for the approximation of nonlinear systems by linear systems. We don't want to take any second derivatives just yet, only first derivatives. We have outlined a skeleton for the derivation with some parts missing. Follow the guidelines in each sub-section.
i) Comparing to eq. (2), we know that $c_i(\vec{w}_* + \vec{\delta w}) \approx c_i(\vec{w}_*) + \frac{d}{d\vec{w}}c_i(\vec{w}_*)\vec{\delta w}$. Write out the vector form of $\frac{d}{d\vec{w}}c_i(\vec{w}_*)$.
Solution:

$$\frac{d}{d\vec{w}}c_i(\vec{w}_*) = \begin{bmatrix} \frac{\partial c_i(\vec{w}_*)}{\partial w[1]} & \cdots & \frac{\partial c_i(\vec{w}_*)}{\partial w[n]} \end{bmatrix}$$

ii) Write out the partial derivatives of $c_i(\vec{w})$ with respect to $w[g]$, the $g$-th component of $\vec{w}$. (HINT: Use the linearity of derivatives and sums to compute the partial derivatives with respect to each of the $w[g]$ terms. Don't forget the chain rule and the fact that $\vec{x}_i^\top\vec{w} = \sum_{j=1}^{n} x_i[j]w[j] = x_i[g]w[g] + \sum_{j\neq g} x_i[j]w[j]$.)
Solution:


Using the hint, we calculate the partial derivative with respect to each $w[g]$ term. Using the chain rule,

$$\begin{aligned}
\frac{d}{d\vec{w}}c_i(\vec{w})[g] &= \frac{\partial}{\partial w[g]}c_i(\vec{w}) \\
&= \frac{\partial}{\partial w[g]}c(\vec{x}_i^\top\vec{w}, \ell_i) \\
&= \frac{d}{d(\vec{x}_i^\top\vec{w})}c(\vec{x}_i^\top\vec{w}, \ell_i)\,\frac{\partial}{\partial w[g]}(\vec{x}_i^\top\vec{w}) \\
&= c'(\vec{x}_i^\top\vec{w}, \ell_i)\,\frac{\partial}{\partial w[g]}\left(x_i[g]w[g] + \sum_{j\neq g} x_i[j]w[j]\right) \\
&= c'(\vec{x}_i^\top\vec{w}, \ell_i)\,x_i[g]
\end{aligned}$$

Note that $c'(\vec{x}_i^\top\vec{w}, \ell_i) = \frac{d}{d(\vec{x}_i^\top\vec{w})}c(\vec{x}_i^\top\vec{w}, \ell_i)$.
iii) With what you had above, can you fill in the missing part to express the row vector $\frac{d}{d\vec{w}}c_i(\vec{w})$?

$$\frac{d}{d\vec{w}}c_i(\vec{w}) = c'(\vec{x}_i^\top\vec{w}, \ell_i)\,\underline{\hspace{3em}}$$

Solution:

$$\frac{d}{d\vec{w}}c_i(\vec{w}) = c'(\vec{x}_i^\top\vec{w}, \ell_i)\,\vec{x}_i^\top$$
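To build confidence in this formula, here is a small finite-difference check (a sketch, using a hypothetical squared-error cost $c(p, \ell) = (p - \ell)^2$, so that $c'(p, \ell) = 2(p - \ell)$):

```python
import numpy as np

c  = lambda p, l: (p - l) ** 2      # hypothetical example cost
dc = lambda p, l: 2 * (p - l)       # its derivative in the first argument

rng = np.random.default_rng(0)
x, w, l, eps = rng.normal(size=4), rng.normal(size=4), 1.0, 1e-6

grad = dc(x @ w, l) * x             # the formula c'(x^T w, l) x^T
for g in range(4):                  # compare each entry to a finite difference
    w_plus = w.copy()
    w_plus[g] += eps
    fd = (c(x @ w_plus, l) - c(x @ w, l)) / eps
    assert abs(fd - grad[g]) < 1e-4
```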
(b) Now, we want a better approximation that includes second derivatives. For a general function, we would look for

$$f(\vec{x}_0 + \vec{\delta x}) \approx f(\vec{x}_0) + f'(\vec{x}_0)\vec{\delta x} + \frac{1}{2}\vec{\delta x}^\top f''(\vec{x}_0)\vec{\delta x} \tag{4}$$

where $f'(\vec{x}_0)$ is an appropriate row vector and, as you've seen in the note, $f''(\vec{x}_0)$ is called the Hessian, which represents the second derivatives.
i) Comparing to eq. (4), we know that

$$c_i(\vec{w}_* + \vec{\delta w}) \approx c_i(\vec{w}_*) + \frac{d}{d\vec{w}}c_i(\vec{w}_*)\vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top\frac{d^2}{d\vec{w}^2}c_i(\vec{w}_*)\vec{\delta w}$$

Write out the matrix form of $\frac{d^2}{d\vec{w}^2}c_i(\vec{w}_*)$.
Solution:

$$\frac{d^2}{d\vec{w}^2}c_i(\vec{w}_*) = \begin{bmatrix}
\frac{\partial^2 c_i(\vec{w}_*)}{\partial w[1]\partial w[1]} & \cdots & \frac{\partial^2 c_i(\vec{w}_*)}{\partial w[1]\partial w[n]} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 c_i(\vec{w}_*)}{\partial w[n]\partial w[1]} & \cdots & \frac{\partial^2 c_i(\vec{w}_*)}{\partial w[n]\partial w[n]}
\end{bmatrix}$$

ii) Take the second derivatives of the cost $c_i(\vec{w})$, i.e. solve for $\frac{\partial^2 c_i(\vec{w})}{\partial w[g]\partial w[h]}$.
(HINT: You should use the answer to part (a) and just take another derivative. Once again, use the linearity of derivatives and sums to compute the partial derivatives with respect to each of the $w[h]$ terms. This will give you $\frac{\partial^2}{\partial w[g]\partial w[h]}$. Don't forget the chain rule and again use the fact that $\vec{x}_i^\top\vec{w} = \sum_{j=1}^{n} x_i[j]w[j] = x_i[h]w[h] + \sum_{j\neq h} x_i[j]w[j]$.)


Solution: Proceeding in a similar manner as above, let us find $\frac{\partial^2 c_i(\vec{w})}{\partial w[g]\partial w[h]}$.

$$\begin{aligned}
\frac{d^2}{d\vec{w}^2}c_i(\vec{w})[g, h] &= \frac{\partial^2}{\partial w[g]\partial w[h]}c_i(\vec{w}) = \frac{\partial}{\partial w[h]}\left(\frac{d}{d\vec{w}}c_i(\vec{w})[g]\right) \\
&= \frac{\partial}{\partial w[h]}\left(c'(\vec{x}_i^\top\vec{w}, \ell_i)\,x_i[g]\right) \\
&= c''(\vec{x}_i^\top\vec{w}, \ell_i)\,\frac{\partial}{\partial w[h]}(\vec{x}_i^\top\vec{w})\,x_i[g] \\
&= c''(\vec{x}_i^\top\vec{w}, \ell_i)\,x_i[g]\,x_i[h]
\end{aligned}$$

Note that $c''(\vec{x}_i^\top\vec{w}, \ell_i) = \frac{d^2}{d(\vec{x}_i^\top\vec{w})^2}c(\vec{x}_i^\top\vec{w}, \ell_i)$.
iii) The expression in part (ii) is for the $[g, h]$-th component of the second derivative. $\frac{1}{2}$ times this times $\vec{\delta w}[g]$ times $\vec{\delta w}[h]$ would give us that component's contribution to the second-derivative term in the approximation, and we have to sum this up over all $g$ and $h$ to get the total contribution of the second-derivative term in the approximation. Now, we want to group terms to restructure this into matrix-vector form by utilizing the outer-product form of matrix multiplication. What should the space in the following expression be filled with?

$$\frac{d^2}{d\vec{w}^2}c_i(\vec{w}) = c''(\vec{x}_i^\top\vec{w}, \ell_i)\,\underline{\hspace{3em}}$$

Solution:

$$\frac{d^2}{d\vec{w}^2}c_i(\vec{w}) = c''(\vec{x}_i^\top\vec{w}, \ell_i)\,\vec{x}_i\vec{x}_i^\top$$
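The same kind of numerical check works for the outer-product form of the Hessian (again a sketch with the hypothetical squared-error cost, whose second derivative is the constant 2):

```python
import numpy as np

dc  = lambda p, l: 2 * (p - l)      # c'(p, l) for c(p, l) = (p - l)^2
d2c = lambda p, l: 2.0              # c''(p, l) is constant for this cost

rng = np.random.default_rng(1)
x, w, l, eps = rng.normal(size=4), rng.normal(size=4), -1.0, 1e-6

hess = d2c(x @ w, l) * np.outer(x, x)    # the formula c''(x^T w, l) x x^T
for h in range(4):                       # finite differences of the gradient
    w_plus = w.copy()
    w_plus[h] += eps
    fd_col = (dc(x @ w_plus, l) - dc(x @ w, l)) / eps * x
    assert np.allclose(fd_col, hess[:, h], atol=1e-4)
```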

(c) Now we have successfully expressed the second order approximation of $c_i(\vec{w}_* + \vec{\delta w})$. Since we eventually want to minimize the total cost $c_{\text{total}}(\vec{w}) = \sum_{i=1}^{m} c_i(\vec{w})$, can you write out the second order approximation of $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ using results from (a) and (b)?
Solution: From previous parts, we get

$$c_i(\vec{w}_* + \vec{\delta w}) \approx c_i(\vec{w}_*) + \frac{d}{d\vec{w}}c_i(\vec{w}_*)\vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top\frac{d^2}{d\vec{w}^2}c_i(\vec{w}_*)\vec{\delta w} = c_i(\vec{w}_*) + c'(\vec{x}_i^\top\vec{w}_*, \ell_i)\vec{x}_i^\top\vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top c''(\vec{x}_i^\top\vec{w}_*, \ell_i)\vec{x}_i\vec{x}_i^\top\vec{\delta w}$$

Based on the linearity of derivatives, to get the second order approximation of $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$, we just sum up the second order approximation for each $c_i(\vec{w}_* + \vec{\delta w})$:

$$c_{\text{total}}(\vec{w}_* + \vec{\delta w}) = \sum_{i=1}^{m} c_i(\vec{w}_* + \vec{\delta w}) \approx \sum_{i=1}^{m}\left(c_i(\vec{w}_*) + c'(\vec{x}_i^\top\vec{w}_*, \ell_i)\vec{x}_i^\top\vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top c''(\vec{x}_i^\top\vec{w}_*, \ell_i)\vec{x}_i\vec{x}_i^\top\vec{\delta w}\right)$$

(d) Now in this part, we want to re-write $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ in the form of $C + \sum_{i=1}^{m}\left(\vec{q}_i^\top\vec{\delta w} - b_i\right)^2$.
i) Let's first rewrite a general second order polynomial $f(x) = ax^2 + bx + c$ in the form $f(x) = r + (px + q)^2$. Find $p, q, r$ in terms of $a, b, c$. This procedure is called "completing the square". Then, use this to argue that

$$\arg\min_x\, ax^2 + bx + c = \arg\min_x\,(px + q)^2$$


Solution: $ax^2 + bx + c = c - \frac{b^2}{4a} + \left(\sqrt{a}\,x + \frac{b}{2\sqrt{a}}\right)^2$, assuming $a > 0$ (which holds here, since our second derivatives are positive). Therefore $p = \sqrt{a}$, $q = \frac{b}{2\sqrt{a}}$, and $r = c - \frac{b^2}{4a}$. Since $r$ is a constant, $\arg\min_x\left(r + (px + q)^2\right) = \arg\min_x\,(px + q)^2$, hence we have

$$\arg\min_x\, ax^2 + bx + c = \arg\min_x\,(px + q)^2$$

ii) Now rewrite $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ in the form of $C + \sum_{i=1}^{m}\left(\vec{q}_i^\top\vec{\delta w} - b_i\right)^2$. What are $C$, $\vec{q}_i$, and $b_i$?
Solution:

$$c_{\text{total}}(\vec{w}_* + \vec{\delta w}) \approx \sum_{i=1}^{m}\left(c_i(\vec{w}_*) + c'(\vec{x}_i^\top\vec{w}_*, \ell_i)\vec{x}_i^\top\vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top c''(\vec{x}_i^\top\vec{w}_*, \ell_i)\vec{x}_i\vec{x}_i^\top\vec{\delta w}\right)$$

Completing the square term by term in $\vec{x}_i^\top\vec{\delta w}$,

$$= \sum_{i=1}^{m}\left[c_i(\vec{w}_*) - \frac{\left(c'(\vec{x}_i^\top\vec{w}_*, \ell_i)\right)^2}{2c''(\vec{x}_i^\top\vec{w}_*, \ell_i)} + \left(\sqrt{\frac{1}{2}c''(\vec{x}_i^\top\vec{w}_*, \ell_i)}\,\vec{x}_i^\top\vec{\delta w} - \frac{-c'(\vec{x}_i^\top\vec{w}_*, \ell_i)}{\sqrt{2c''(\vec{x}_i^\top\vec{w}_*, \ell_i)}}\right)^2\right]$$

Comparing to the expected format, we can see that

$$C = \sum_{i=1}^{m}\left(c_i(\vec{w}_*) - \frac{\left(c'(\vec{x}_i^\top\vec{w}_*, \ell_i)\right)^2}{2c''(\vec{x}_i^\top\vec{w}_*, \ell_i)}\right), \qquad \vec{q}_i = \sqrt{\frac{1}{2}c''(\vec{x}_i^\top\vec{w}_*, \ell_i)}\,\vec{x}_i, \qquad b_i = \frac{-c'(\vec{x}_i^\top\vec{w}_*, \ell_i)}{\sqrt{2c''(\vec{x}_i^\top\vec{w}_*, \ell_i)}}$$

(e) Consider a least squares problem:

$$\vec{x}^\star = \arg\min_{\vec{x}}\left\|A\vec{x} - \vec{b}\right\|^2$$

Show that:

$$\left\|A\vec{x} - \vec{b}\right\|^2 = \sum_{i=1}^{m}\left(\vec{a}_i^\top\vec{x} - b_i\right)^2$$

where

$$A = \begin{bmatrix} -\ \vec{a}_1^\top\ - \\ -\ \vec{a}_2^\top\ - \\ \vdots \\ -\ \vec{a}_m^\top\ - \end{bmatrix}.$$

Use this to interpret your expression from Part (d) as a standard least squares problem. What are the rows of $A$?
Solution: $\left\|A\vec{x} - \vec{b}\right\|^2$ is by definition equal to the sum of the squared entries of the vector $A\vec{x} - \vec{b}$. Therefore $\left\|A\vec{x} - \vec{b}\right\|^2 = \sum_{i=1}^{m}\left(\vec{a}_i^\top\vec{x} - b_i\right)^2$. Matching terms with our expression of $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ in


Part (d), we get

$$A = \begin{bmatrix}
\sqrt{\frac{1}{2}c''(\vec{x}_1^\top\vec{w}_*, \ell_1)}\,\vec{x}_1^\top \\
\sqrt{\frac{1}{2}c''(\vec{x}_2^\top\vec{w}_*, \ell_2)}\,\vec{x}_2^\top \\
\vdots \\
\sqrt{\frac{1}{2}c''(\vec{x}_m^\top\vec{w}_*, \ell_m)}\,\vec{x}_m^\top
\end{bmatrix}.$$
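Putting parts (c) through (e) together, one update step of the algorithm can be sketched in numpy as follows. This is a hedged sketch, not the official notebook code: it assumes labels are encoded as $\pm 1$ and uses the logistic cost, whose derivatives are computed in part (f) below ($c'(p, \ell) = -\ell\,\sigma(-\ell p)$ and $c''(p, \ell) = \sigma(p)\sigma(-p)$, where $\sigma$ is the sigmoid).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_ls_step(X, labels, w):
    """One iteration: build A and b from parts (d)-(e), then least squares.
    X has one augmented point x_i per row; labels[i] is +1 or -1."""
    p = X @ w
    c1 = -labels * sigmoid(-labels * p)    # c'(x_i^T w, l_i) for logistic cost
    c2 = sigmoid(p) * sigmoid(-p)          # c''(x_i^T w, l_i) for logistic cost
    A = np.sqrt(c2 / 2.0)[:, None] * X     # rows are sqrt(c''/2) x_i^T
    b = -c1 / np.sqrt(2.0 * c2)            # b_i = -c' / sqrt(2 c'')
    dw, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w + dw
```

Starting from `w = np.zeros(X.shape[1])` and repeating `w = newton_ls_step(X, labels, w)` a handful of times reproduces the loop from the start of the problem.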
(f) Consider the following cost functions:
squared error: $c^+_{\mathrm{sq}}(p) = (p-1)^2$, $c^-_{\mathrm{sq}}(p) = (p+1)^2$;
exponential: $c^+_{\exp}(p) = e^{-p}$, $c^-_{\exp}(p) = e^{p}$;
and logistic: $c^+_{\mathrm{logistic}}(p) = \ln\left(1 + e^{-p}\right)$, $c^-_{\mathrm{logistic}}(p) = \ln\left(1 + e^{p}\right)$.
Compute the first and second derivatives of the above expressions with respect to p.
Solution: First Derivatives:

$$\frac{d}{dp}c^+_{\mathrm{sq}}(p) = 2(p-1) \qquad \frac{d}{dp}c^-_{\mathrm{sq}}(p) = 2(p+1)$$

$$\frac{d}{dp}c^+_{\exp}(p) = -e^{-p} \qquad \frac{d}{dp}c^-_{\exp}(p) = e^{p}$$

$$\frac{d}{dp}c^+_{\mathrm{logistic}}(p) = -\frac{e^{-p}}{1+e^{-p}} \qquad \frac{d}{dp}c^-_{\mathrm{logistic}}(p) = \frac{e^{p}}{1+e^{p}}$$

Notice that these are pretty cheap to compute, given that we have to compute the original loss functions
in the first place.
Second Derivatives:

$$\frac{d^2}{dp^2}c^+_{\mathrm{sq}}(p) = 2 \qquad \frac{d^2}{dp^2}c^-_{\mathrm{sq}}(p) = 2$$

$$\frac{d^2}{dp^2}c^+_{\exp}(p) = e^{-p} \qquad \frac{d^2}{dp^2}c^-_{\exp}(p) = e^{p}$$

$$\frac{d^2}{dp^2}c^+_{\mathrm{logistic}}(p) = \frac{e^{-p}}{\left(1+e^{-p}\right)^2} \qquad \frac{d^2}{dp^2}c^-_{\mathrm{logistic}}(p) = \frac{e^{p}}{\left(1+e^{p}\right)^2}$$


Notice that all of these second derivatives are positive. Moreover, calculating them takes essentially no more work than getting the first derivatives. In particular, it is useful to note that

$$\frac{d^2}{dp^2}c^+_{\mathrm{logistic}}(p) = \frac{e^{-p}}{\left(1+e^{-p}\right)^2} = \left|\frac{d}{dp}c^+_{\mathrm{logistic}}(p)\right|\left(1 - \left|\frac{d}{dp}c^+_{\mathrm{logistic}}(p)\right|\right)$$

$$\frac{d^2}{dp^2}c^-_{\mathrm{logistic}}(p) = \frac{e^{p}}{\left(1+e^{p}\right)^2} = \left|\frac{d}{dp}c^-_{\mathrm{logistic}}(p)\right|\left(1 - \left|\frac{d}{dp}c^-_{\mathrm{logistic}}(p)\right|\right)$$

so the basic nature of the logistic loss' second derivative becomes more clear. If the first derivative has magnitude $\frac{1}{2}$, this second derivative is maximized; that happens when the prediction $p$ is 0, i.e. maximally uncertain. The second derivative for logistic loss shrinks away from there.
When you take 70, this particular form p(1 − p) will start to ring a bell. That sound is the gateway to
understanding a different reason for why logistic regression is popular in practice — but for that, you
need to understand Probability. It is a glorious coincidence that something so natural from an optimiza-
tion point of view also turns out to have useful interpretations involving the language of probability.
You will understand this after 126.
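A two-line numerical check of this identity (a sketch, assuming numpy):

```python
import numpy as np

p = np.linspace(-5, 5, 101)
d1 = -np.exp(-p) / (1 + np.exp(-p))       # first derivative of c+_logistic
d2 = np.exp(-p) / (1 + np.exp(-p)) ** 2   # second derivative of c+_logistic
assert np.allclose(d2, np.abs(d1) * (1 - np.abs(d1)))
```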
(g) Run the Jupyter Notebook and answer the following questions.
i) In Example 2, why does mean classification fail?
Solution: The mean classifier misclassifies some points because there is some information in the
distribution that can’t be accurately captured by the mean of the data points of each category.
ii) In Example 3, for what data distributions does ordinary least squares fail?
Solution: When there are extreme outliers in the dataset, the decision boundary in this case will be
"pulled" away from the desired location, thus least squares would fail.
iii) Run the code cells in Example 4. By performing updates to w ~ according to what you derived
in previous parts of the question, how many iterations does it take for exponential and logistic
regression to converge?
Solution: You should see that the decision boundaries are almost fixed after 3-4 iterations. These
iterated least-squares approaches are very fast to converge in general. This is why in practice, logistic
or exponential regression costs almost the same to run as ordinary least squares.
Congratulations! You now know the basic optimization-theoretic perspective on logistic regression.
After you understand the probability material in 70 and 126, you can understand the probabilistic
perspective on it as well. After you understand the optimization material in 127, you will understand
even more about the optimization-theoretic perspective on the problem including why this approach
actually converges.


3. Extending Orthonormality to Complex Vectors


So far in the course, we have only dealt with real vectors. However, it is often useful to also think about com-
plex vectors, as you’ll soon see with the DFT. In this problem, we will extend several important properties
of orthonormal matrices to the complex case.
The main difference is that the normal Euclidean inner product is no longer a valid metric, and we must define a new complex inner product as

$$\langle\vec{u}, \vec{v}\rangle = \overline{\vec{v}}^{\,T}\vec{u} = \vec{v}^*\vec{u} = \sum_{i=1}^{k} u_i\overline{v_i},$$

where the $*$ operation complex conjugates and transposes its argument (the order doesn't matter) and is aptly called the conjugate transpose. Note that for real vectors, the complex inner product simplifies to the real inner product. In all the theorems you've seen in this class, you can replace every inner product with the complex inner product to show an analogous result for complex vectors: least squares becomes $\hat{x} = (A^*A)^{-1}A^*\vec{b}$, upper triangularization becomes $A = UTU^*$, the Spectral Theorem becomes $A = U\Lambda U^*$, and the SVD becomes $A = U\Sigma V^*$.
*" # " #+
1+j −3 − j
(a) To get some practice computing complex inner products, what is , and
2 2+j
*" # " #+
−3 − j 1+j
, ? Does the order of the vectors in the complex inner product matter i.e. is it
2+j 2
commutative?
Solution:

$$\left\langle\begin{bmatrix}1+j\\2\end{bmatrix}, \begin{bmatrix}-3-j\\2+j\end{bmatrix}\right\rangle = (1+j)\overline{(-3-j)} + (2)\overline{(2+j)} = (1+j)(-3+j) + 2(2-j) = -3 + j - 3j - 1 + 4 - 2j = -4j$$

$$\left\langle\begin{bmatrix}-3-j\\2+j\end{bmatrix}, \begin{bmatrix}1+j\\2\end{bmatrix}\right\rangle = (-3-j)\overline{(1+j)} + (2+j)\overline{(2)} = (-3-j)(1-j) + (2+j)(2) = -3 + 3j - j - 1 + 4 + 2j = 4j$$

The two inner products are different so clearly the complex inner product is not commutative and the
order of the vectors matters. In fact, when you swap the arguments for the inner product, you will get
the complex conjugate result, which you see an example of above.
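In numpy, this inner product convention matches `np.vdot`, which conjugates its first argument, so $\langle\vec{u}, \vec{v}\rangle = \vec{v}^*\vec{u}$ is `np.vdot(v, u)`. A quick check of the two computations above (a sketch):

```python
import numpy as np

u = np.array([1 + 1j, 2])
v = np.array([-3 - 1j, 2 + 1j])
print(np.vdot(v, u))   # <u, v> = v* u : gives -4j
print(np.vdot(u, v))   # <v, u> = u* v : gives +4j, the complex conjugate
```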
(b) Let $U = \begin{bmatrix}\vec{u}_1 & \cdots & \vec{u}_n\end{bmatrix}$ be an $n \times n$ complex matrix, where its columns $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n$ form an orthonormal basis for $\mathbb{C}^n$, i.e.

$$\vec{u}_i^*\vec{u}_j = \begin{cases}1 & \text{if } i = j\\0 & \text{if } i \neq j\end{cases}$$

Such a complex matrix is called unitary in the math literature, to distinguish it from real orthonormal matrices. Show that $U^{-1} = U^*$, where $U^*$ is the conjugate transpose of $U$.


Solution: By definition, $U^{-1}U = I$. We want to show that $U^*$ satisfies this, so let's write down $U^*$ first:

$$U^* = \begin{bmatrix}\vec{u}_1^* \\ \vdots \\ \vec{u}_n^*\end{bmatrix}, \tag{5}$$

where $\vec{u}_1, \ldots, \vec{u}_n$ are the column vectors of $U$. Then, the entry at the $i$-th row and $j$-th column of $U^*U$ should be $\vec{u}_i^*\vec{u}_j$. If we write down the general form for each element of $U^*U$:

$$(U^*U)_{ij} = \vec{u}_i^*\vec{u}_j = \begin{cases}1 & \text{if } i = j\\0 & \text{if } i \neq j\end{cases},$$

which is the identity matrix, since $\vec{u}_1, \ldots, \vec{u}_n$ is an orthonormal basis.
Now for any square matrices $A, B$ such that $AB = I$, right multiplying by $B^{-1}$ gives $ABB^{-1} = IB^{-1}$, so $A = B^{-1}$. $B^{-1}$ must exist since $\det(A)\det(B) = \det(I) \neq 0$, so $\det(B) \neq 0$. Thus, since we showed $U^*U = I$, we have $U^* = U^{-1}$.
(c) Show that $U$ preserves complex inner products, i.e. if $\vec{v}, \vec{w}$ are vectors of length $n$, then

$$\langle\vec{v}, \vec{w}\rangle = \langle U\vec{v}, U\vec{w}\rangle.$$

HINT: Note that $(AB)^* = B^*A^*$. This is since $(AB)^* = \overline{(AB)^T} = \overline{B^T A^T} = \overline{B^T}\,\overline{A^T} = B^*A^*$.
Solution: For this question, we want to show that:

$$\langle\vec{v}, \vec{w}\rangle = \vec{w}^*\vec{v} = \langle U\vec{v}, U\vec{w}\rangle$$

Using the definition of complex inner products we can write:

$$\langle U\vec{v}, U\vec{w}\rangle = (U\vec{w})^*U\vec{v}$$

Using the form for the conjugate transpose of a matrix-vector product as stated in the hint:

$$(U\vec{w})^*U\vec{v} = \vec{w}^*U^*U\vec{v}$$

From the previous part we know that $U^*U = I$. Therefore:

$$\langle U\vec{v}, U\vec{w}\rangle = \vec{w}^*\vec{v} = \langle\vec{v}, \vec{w}\rangle.$$

(d) Show that if $\vec{u}_1, \ldots, \vec{u}_n$ are the columns of a unitary matrix $U$, they must be linearly independent.
(Hint: Suppose $\vec{w} = \sum_{i=1}^{n}\alpha_i\vec{u}_i$, then first show that $\alpha_i = \langle\vec{w}, \vec{u}_i\rangle$. From here ask yourself whether a nonzero linear combination of the $\{\vec{u}_i\}$ could ever be identically zero.)
This basic fact shows how orthogonality is a very nice special case of linear independence.
Solution: Suppose they are not linearly independent. Then there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ such that $\vec{w} = \sum_{i=1}^{n}\alpha_i\vec{u}_i = \vec{0}$ while at least one of the $\alpha_i$ is non-zero. We can then take the inner product of both sides with $\vec{u}_j$, for all $j$:

$$\langle\vec{w}, \vec{u}_j\rangle = \left\langle\sum_{i=1}^{n}\alpha_i\vec{u}_i, \vec{u}_j\right\rangle = \sum_{i=1}^{n}\alpha_i\langle\vec{u}_i, \vec{u}_j\rangle = \alpha_j.$$


Since ~u1 , . . . , ~un form an orthonormal basis, we know that h~ui , ~uj i will be 1 when i = j and 0
otherwise, which is why only αj survives in the above summation. Since w ~ = ~0, then αj should be 0
for all inner products h~uj , wi.
~ However, this is a contradiction to our assumption that at least one of
the αi is non-zero. Therefore, ~u1 , . . . , ~un are linearly independent.
This confirms what we know — that orthonormality is a particularly robust guarantee of linear inde-
pendence.
(e) Now let $V$ be another $n \times n$ matrix, where the columns of the matrix form an orthonormal basis for $\mathbb{C}^n$, i.e. $V$ is unitary. Show that the columns of the product $UV$ also form an orthonormal basis for $\mathbb{C}^n$.
Solution: Since $V$ is a unitary matrix, we have $V^*V = I$. To show that the columns of $UV$ also form an orthonormal basis, we can write down its conjugate transpose, $(UV)^*$, and apply it to $UV$:

$$(UV)^*(UV) = V^*U^*UV = V^*V = I,$$

which means the columns of $UV$ form an orthonormal basis for $\mathbb{C}^n$.


(f) We can also extend the idea of symmetric matrices to complex vectors, though we again will need to replace the transpose with the conjugate transpose. If $M = M^*$, then $M$ is called a Hermitian matrix, and the Spectral Theorem says it can be diagonalized by a unitary $U$, i.e. $M = U\Lambda U^*$, where $\Lambda$ is a diagonal matrix with the eigenvalues along the diagonal. Show that $M$ has real eigenvalues.
HINT: Use the fact that $M^* = M$.
Solution: We first calculate $M^*$:

$$M^* = (U\Lambda U^*)^* = U\Lambda^*U^*.$$

Since $M^* = M = U\Lambda U^*$, this means that $\Lambda^* = \Lambda$. The only case where this is true is when all the elements of $\Lambda$ are real, which means the eigenvalues of $M$ are always real.
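A quick numerical illustration (a sketch, assuming numpy): build a random Hermitian matrix and confirm its eigenvalues have no imaginary part.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
M = B + B.conj().T                    # M equals its conjugate transpose: Hermitian
eigvals = np.linalg.eigvals(M)        # generic eigensolver, returns complex values
assert np.allclose(eigvals.imag, 0)   # the imaginary parts are numerically zero
```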


4. Roots of Unity
An $N$-th root of unity is a complex number $\omega$ satisfying the equation $\omega^N = 1$ (or equivalently $\omega^N - 1 = 0$). In this problem we explore some properties of the roots of unity, as they end up being essential for the DFT.

(a) Show that the polynomial $z^N - 1$ factors as

$$z^N - 1 = (z - 1)\left(\sum_{k=0}^{N-1} z^k\right).$$

Solution:

$$(z - 1)\left(\sum_{k=0}^{N-1} z^k\right) = \sum_{k=1}^{N} z^k - \sum_{k=0}^{N-1} z^k = z^N - z^0 = z^N - 1$$


(b) Show that any complex number of the form $\omega_N^k = e^{j\frac{2\pi}{N}k}$ for $k \in \mathbb{Z}$ is an $N$-th root of unity. From here on, let $\omega_N = e^{j\frac{2\pi}{N}}$.
Solution:

$$\left(\omega_N^k\right)^N = \left(e^{j\frac{2\pi}{N}k}\right)^N = e^{j2\pi k} = 1$$

This means that the $N$ numbers $\omega_N^0, \omega_N^1, \omega_N^2, \ldots, \omega_N^{N-1}$ are the solutions to the equation $z^N = 1$ and hence roots of the polynomial $z^N - 1$.
(c) For a given integer $N \geq 2$, using the previous parts, give the complex roots of the polynomial $1 + z + z^2 + \ldots + z^{N-1}$.
Solution: $(z - 1)(1 + z + z^2 + \ldots + z^{N-1}) = z^N - 1$, and we just showed that the roots of $z^N - 1$ are the $\omega_N^k = e^{j\frac{2\pi k}{N}}$ for $k = 0, \ldots, N-1$; therefore the roots of $z \mapsto 1 + z + z^2 + \ldots + z^{N-1}$ are the $\omega_N^k = e^{j\frac{2\pi k}{N}}$ for $k = 1, \ldots, N-1$.
We can see this by factoring and matching factors together. A polynomial with a leading coefficient of 1 can be factored into its roots. So $z^N - 1 = \left(z - \omega_N^0\right)\left(z - \omega_N^1\right)\left(z - \omega_N^2\right)\cdots\left(z - \omega_N^{N-1}\right) = \prod_{k=0}^{N-1}\left(z - \omega_N^k\right) = (z - 1)\prod_{k=1}^{N-1}\left(z - \omega_N^k\right)$. Dividing both sides by $z - 1$ gives us what we want.
(d) What are the fourth roots of unity? Draw the fourth roots of unity in the complex plane. Where do they lie in relation to the unit circle?
Solution: Using the formula for the roots of unity from part (b), $\omega_4^0 = e^0 = 1$, $\omega_4^1 = e^{j\frac{\pi}{2}} = j$, $\omega_4^2 = e^{j\pi} = -1$, $\omega_4^3 = e^{j\frac{3\pi}{2}} = -j$. All roots of unity must be on the unit circle since they have magnitude 1. From the definition of the roots of unity, we know that each root of unity is $2\pi/N = \pi/2$ radians apart, and this can be seen graphically in the plot below.
[Figure: the four fourth roots of unity $1, j, -1, -j$, evenly spaced on the unit circle.]


(e) What are the fifth roots of unity? Draw the fifth roots of unity in the complex plane.
Solution: Using the formula for the roots of unity from part (b), $\omega_5^0 = e^0 = 1$, $\omega_5^1 = e^{j\frac{2\pi}{5}}$, $\omega_5^2 = e^{j\frac{4\pi}{5}}$, $\omega_5^3 = e^{j\frac{6\pi}{5}}$, $\omega_5^4 = e^{j\frac{8\pi}{5}}$. Again, we know that each root of unity should be $2\pi/5$ radians or $72°$ apart by the definition, and it can be seen graphically below.
[Figure: the five fifth roots of unity, evenly spaced $72°$ apart on the unit circle starting from 1.]


(f) For $N = 5$, $\omega_5 = e^{j\frac{2\pi}{5}}$, simplify $\omega_5^{42}$ such that the exponent is less than 5 and greater than 0.
Solution:

$$\omega_5^{42} = \omega_5^{8\cdot 5}\,\omega_5^2 = \left(\omega_5^5\right)^8\omega_5^2 = 1^8\,\omega_5^2 = \omega_5^2$$

Every 5 powers of the 5th root of unity multiply to 1, so $\omega_5^{42}$ simplifies to $\omega_5$ raised to the remainder of 42 divided by 5.
(g) Let's generalize what you saw in the previous part. Prove that $\omega_N^{k+N} = \omega_N^k$ for all integers $k$, both positive and negative. This shows that the roots of unity have a periodic structure.


Solution:

$$\omega_N^{k+N} = \omega_N^k\,\omega_N^N = \omega_N^k \cdot 1 = \omega_N^k$$

(h) What is the complex conjugate of $\omega_5$ in terms of the 5th roots of unity? What is the complex conjugate of $\omega_5^{42}$ in terms of the 5th roots of unity? What is the complex conjugate of $\omega_5^4$ in terms of the 5th roots of unity?
Solution: Using part (g),

$$\overline{\omega_5} = \omega_5^{-1} = \omega_5^4$$

$$\overline{\omega_5^{42}} = \omega_5^{-42} = \omega_5^3$$

$$\overline{\omega_5^4} = \omega_5^{-4} = \omega_5$$

Notice here that we can think about going around the circle of roots of unity, since they are periodic
and wrap around to where they started.
This is something called modulo arithmetic and is naturally connected to cycles like this. It is tradition-
ally viewed as taking the remainder (in this case after dividing by N = 5), however some people get
confused when asked to take the remainder of a negative number divided by a positive one. Here, we
just remember that we can turn a negative number between −(N − 1) and −1 into a positive number
by adding N to it.
You will learn a lot more about modulo arithmetic in CS 70 — here in 16B, you just get a teaser
because it emerges naturally when thinking about the roots of unity and their powers.
(i) Compute $\sum_{m=0}^{N-1}\omega_N^{km}$ where $\omega_N$ is an $N$-th root of unity. Does the answer make sense in terms of the plot you drew?
Solution: If $\omega_N^k = 1$, then this is easy: we have

$$\sum_{m=0}^{N-1}\omega_N^{km} = \sum_{m=0}^{N-1} 1 = N.$$

This happens whenever $k$ is a multiple of $N$. For all other $k$, we know $\omega_N^k \neq 1$. Consequently, we can use the formula we found in part (a), which is a geometric series in $z = \omega_N^k$, to write

$$\sum_{m=0}^{N-1}\omega_N^{km} = \frac{\omega_N^{kN} - 1}{\omega_N^k - 1} = 0$$

since $\omega_N$ is a root of unity, so $\omega_N^N = 1$ and hence $\omega_N^{kN} = 1$ as well. This makes intuitive sense because all the roots of unity are spaced evenly around the circle. Therefore summing them up, we have to get zero by symmetry.
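A numerical check of both cases (a sketch, assuming numpy):

```python
import numpy as np

N = 7
m = np.arange(N)
for k in range(-3, 10):
    total = np.exp(2j * np.pi * k * m / N).sum()
    expected = N if k % N == 0 else 0     # N when k is a multiple of N, else 0
    assert np.isclose(total, expected)
```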


5. Discrete Fourier Transform (DFT)


In order to get practice with calculating the Discrete Fourier Transform (DFT), this problem will have you
calculate the DFT for a few variations on a cosine signal.
Consider a sampled signal that is a function of discrete time $x[t]$. We can represent it as a vector of discrete samples over time $\vec{x}$, of length $N$:

$$\vec{x} = \begin{bmatrix} x[0] & \ldots & x[N-1] \end{bmatrix}^T \tag{6}$$

We can represent the DFT basis with the matrix $U$, and it is given by

$$U = \frac{1}{\sqrt{N}}\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{j\frac{2\pi}{N}} & e^{j\frac{2\pi(2)}{N}} & \cdots & e^{j\frac{2\pi(N-1)}{N}} \\
1 & e^{j\frac{2\pi(2)}{N}} & e^{j\frac{2\pi(4)}{N}} & \cdots & e^{j\frac{2\pi 2(N-1)}{N}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & e^{j\frac{2\pi(N-1)}{N}} & e^{j\frac{2\pi 2(N-1)}{N}} & \cdots & e^{j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix}.$$

In other words, the $ij$-th entry of $U$ is $U_{ij} = \frac{1}{\sqrt{N}}e^{j\frac{2\pi ij}{N}}$. From this, we can see that the DFT basis matrix is symmetric, so $U = U^T$. Another very important property of the DFT basis is that $U$ is orthonormal, so $U^*U = I$. We want to find the coordinates of $\vec{x}$ in the DFT basis, and we know these coordinates are given by

$$\vec{X} = U^{-1}\vec{x}.$$

We call the components of $\vec{X}$ the DFT coefficients of the time-domain signal $\vec{x}$. We can think of the components of $\vec{X}$ as weights that represent $\vec{x}$ in the DFT basis. As we will explore in the problem, each coefficient can be thought of as a measurement for which frequency is present in the signal.
You can use Numpy or other calculation tools to evaluate cosines or do matrix multiplication, but you will
not get credit if you directly calculate the DFT using a function in Numpy. You must show your work to get
credit.

(a) What is $U^{-1}$?
Solution: Since $U$ is orthonormal, $U^{-1} = U^*$, which means we transpose and complex conjugate $U$. However, note that $U$ is symmetric, so we just need to complex conjugate the matrix, meaning every exponent of the exponential becomes negative. Thus,

$$U^{-1} = U^* = \frac{1}{\sqrt{N}}\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{-j\frac{2\pi}{N}} & e^{-j\frac{2\pi(2)}{N}} & \cdots & e^{-j\frac{2\pi(N-1)}{N}} \\
1 & e^{-j\frac{2\pi(2)}{N}} & e^{-j\frac{2\pi(2)(2)}{N}} & \cdots & e^{-j\frac{2\pi 2(N-1)}{N}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & e^{-j\frac{2\pi(N-1)}{N}} & e^{-j\frac{2\pi 2(N-1)}{N}} & \cdots & e^{-j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix}$$

(b) Let the columns of $U$ be $\begin{bmatrix}\vec{u}_0 & \vec{u}_1 & \ldots & \vec{u}_{N-1}\end{bmatrix}$. Prove that $\overline{\vec{u}_k} = \vec{u}_{N-k}$ for $k = 1, 2, \ldots, N-1$.
Solution:

$$\overline{\vec{u}_k[m]} = \overline{\tfrac{1}{\sqrt{N}}e^{j\frac{2\pi mk}{N}}} = \tfrac{1}{\sqrt{N}}e^{-j\frac{2\pi mk}{N}} \tag{7}$$


Using the fact that any multiple of $2\pi$ won't affect the exponent,

$$\tfrac{1}{\sqrt{N}}e^{-j\frac{2\pi mk}{N}} = \tfrac{1}{\sqrt{N}}e^{j\left(2\pi m - \frac{2\pi mk}{N}\right)} \tag{8}$$

$$= \tfrac{1}{\sqrt{N}}e^{j\frac{2\pi m(N-k)}{N}} = \vec{u}_{N-k}[m] \tag{9}$$

Since this holds for all $m$, we have $\overline{\vec{u}_k} = \vec{u}_{N-k}$ for $k = 1, \ldots, N-1$. It doesn't hold for other $k$, since then $N - k$ would not be a valid column index of $U$.
(c) Decompose $\cos\left(\frac{2\pi}{7}n\right)$ into a sum of complex exponentials.
Solution: From the inverse Euler formula,

$$\cos\left(\frac{2\pi}{7}n\right) = \frac{1}{2}e^{j\frac{2\pi n}{7}} + \frac{1}{2}e^{-j\frac{2\pi n}{7}}$$

(d) If $x_1[n] = \cos\left(\frac{2\pi n}{7}\right)$, write $x_1[n]$ as a linear combination of $y_+[n] = e^{j\frac{2\pi n}{7}}$ and $y_-[n] = e^{-j\frac{2\pi n}{7}}$.
Solution: Note that we can replace both complex exponentials with the given $y$ functions, so

$$x_1[n] = \frac{1}{2}y_+[n] + \frac{1}{2}y_-[n]$$

(e) Now think of $\vec{x}_1$ as a length $N = 7$ vector which is $x_1[n]$ sampled at $n = 0, 1, \ldots, 6$. We do the same to similarly get $\vec{y}_+, \vec{y}_-$. Write $\vec{y}_+$ and $\vec{y}_-$ in terms of the columns of $U = \begin{bmatrix}\vec{u}_0 & \vec{u}_1 & \ldots & \vec{u}_6\end{bmatrix}$.
HINT: Use part (b).
Solution: We can notice that $\vec{u}_1$ is exactly $\frac{1}{\sqrt{N}}e^{j\frac{2\pi n}{7}}$ evaluated at the 7 sample points. This is exactly $\vec{y}_+$ up to scale, so $\vec{y}_+ = \sqrt{N}\vec{u}_1 = \sqrt{7}\vec{u}_1$. From part (b), we know that $\vec{u}_6 = \overline{\vec{u}_1}$, which is $\frac{1}{\sqrt{N}}e^{-j\frac{2\pi n}{7}}$ evaluated at the sample points. This is exactly a scaling of $\vec{y}_-$, so $\vec{y}_- = \sqrt{N}\vec{u}_6 = \sqrt{7}\vec{u}_6$.
(f) Using the last 3 parts, compute the DFT coefficients $\vec{X}_1$ for signal $\vec{x}_1$.
Solution: Combining the last 3 parts, we know $\vec{x}_1 = \frac{\sqrt{7}}{2}\vec{u}_1 + \frac{\sqrt{7}}{2}\vec{u}_6$. We also know that since the columns of $U$ are $\vec{u}_0, \ldots, \vec{u}_6$, the rows of $U^*$ are their conjugate transposes:

$$U^* = \begin{bmatrix}\vec{u}_0^* \\ \vec{u}_1^* \\ \vdots \\ \vec{u}_{N-1}^*\end{bmatrix} \tag{10}$$


Then to find our DFT coefficients,

$$\vec{X}_1 = U^*\vec{x}_1 \tag{11}$$

$$= \begin{bmatrix}\vec{u}_0^* \\ \vec{u}_1^* \\ \vdots \\ \vec{u}_6^*\end{bmatrix}\left(\frac{\sqrt{7}}{2}\vec{u}_1 + \frac{\sqrt{7}}{2}\vec{u}_6\right) \tag{12}$$

$$= \frac{\sqrt{7}}{2}\begin{bmatrix}0\\1\\0\\0\\0\\0\\0\end{bmatrix} + \frac{\sqrt{7}}{2}\begin{bmatrix}0\\0\\0\\0\\0\\0\\1\end{bmatrix} = \begin{bmatrix}0\\\frac{\sqrt{7}}{2}\\0\\0\\0\\0\\\frac{\sqrt{7}}{2}\end{bmatrix} \tag{13}$$

In the last equation, we use the orthonormality property of the DFT basis vectors, i.e. that

$$\vec{u}_i^*\vec{u}_j = \begin{cases}1 & i = j\\0 & i \neq j.\end{cases} \tag{14}$$

Equivalently, we can write the result as

$$X_1[k] = \begin{cases}\frac{\sqrt{7}}{2} & k = 1, 6\\0 & k \neq 1, 6.\end{cases} \tag{15}$$
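We can confirm this against a direct matrix computation of $U^*\vec{x}_1$ (a sketch, assuming numpy; this is a check of the derivation rather than a substitute for it):

```python
import numpy as np

N = 7
n = np.arange(N)
x1 = np.cos(2 * np.pi * n / N)
U = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # DFT basis
X1 = U.conj().T @ x1                                       # X = U^{-1} x = U* x
print(np.round(X1, 6))   # sqrt(7)/2 at k = 1 and k = 6, zero elsewhere
```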

(g) Plot the time domain representation of $x_1[n]$. Plot the magnitude, $|X_1[k]|$, and plot the phase, $\angle X_1[k]$, for the DFT representation $\vec{X}_1$. You should notice that the DFT coefficients correspond to which frequencies were present in the original signal, which is why the DFT coefficients are called the frequency domain representation.
Solution:
[Figure: time-domain plot of $x_1[n]$ for $n = 0, \ldots, 6$; frequency-domain magnitude $|X_1[k]|$, which is $\frac{\sqrt{7}}{2}$ at $k = 1, 6$ and zero elsewhere; frequency-domain phase $\angle X_1[k]$.]
Because the coefficients $\vec{X}_1$ in this case are all real-valued, the phase is zero-valued.
 
(h) We define $x_2[n] = \cos\left(\frac{4\pi}{7}n\right)$ for $N = 7$ samples $n \in \{0, 1, \ldots, 6\}$. Compute the DFT coefficients $\vec{X}_2$ for signal $\vec{x}_2$.
Solution: Following from the above, we can write $\vec{x}_2$, the vector of samples of the function, in terms of the columns of our DFT basis matrix $U$:

$$x_2[n] = \cos\left(\frac{2\pi(2)}{7}n\right) = \frac{1}{2}e^{j\frac{2\pi(2)}{7}n} + \frac{1}{2}e^{-j\frac{2\pi(2)}{7}n} \tag{16}$$

$$\vec{x}_2 = \frac{1}{2}\left(\sqrt{N}\vec{u}_2 + \sqrt{N}\overline{\vec{u}_2}\right) = \frac{\sqrt{7}}{2}(\vec{u}_2 + \vec{u}_5) \tag{17}$$

From this, we can see that our DFT coefficients vector $\vec{X}_2$ will only have non-zero values for rows $k = 2$ and $k = 5$. The elements of $\vec{X}_2$ can be written as

$$X_2[k] = \begin{cases}\frac{\sqrt{7}}{2} & k = 2, 5\\0 & k \neq 2, 5.\end{cases} \tag{18}$$


(i) Plot the time domain representation of $x_2[n]$. Plot the magnitude, $|X_2[k]|$, and plot the phase, $\angle X_2[k]$, for the DFT representation $\vec{X}_2$.
Solution:
[Figure: time-domain plot of $x_2[n]$ for $n = 0, \ldots, 6$; frequency-domain magnitude $|X_2[k]|$, which is $\frac{\sqrt{7}}{2}$ at $k = 2, 5$ and zero elsewhere; frequency-domain phase $\angle X_2[k]$, again zero-valued.]

(j) To generalize this result, say we have some $p \in \{1, 2, 3\}$ which scales the frequency of our signal $\vec{x}_p$,


 
which we define as $x_p[n] = \cos\left(\frac{2\pi}{7}pn\right)$ for $N = 7$ samples $n \in \{0, 1, \ldots, 6\}$. Compute the DFT coefficients $\vec{X}_p$ for signal $\vec{x}_p$ in terms of this scalar $p$.
Solution: As in the last two problems, we can represent $\vec{x}_p$ in terms of the columns of the DFT basis matrix:

$$x_p[n] = \cos\left(\frac{2\pi(p)}{7}n\right) = \frac{1}{2}e^{j\frac{2\pi(p)}{7}n} + \frac{1}{2}e^{-j\frac{2\pi(p)}{7}n} \tag{19}$$

$$\vec{x}_p = \frac{1}{2}\left(\sqrt{7}\vec{u}_p + \sqrt{7}\overline{\vec{u}_p}\right) = \frac{\sqrt{7}}{2}(\vec{u}_p + \vec{u}_{7-p}) \tag{20}$$

The only nonzero entries in $\vec{X}_p$ will occur at $k = p$ and $k = 7 - p$:

$$X_p[k] = \begin{cases}\frac{\sqrt{7}}{2} & k = p, 7-p\\0 & k \neq p, 7-p.\end{cases} \tag{21}$$

(k) Let's see what happens when we have an even number of samples. We define $\vec{s} = \begin{bmatrix}1 & 0 & 1 & 0 & 1 & 0\end{bmatrix}^T$, which has $N = 6$ samples. Compute the DFT coefficients $\vec{S}$ for signal $\vec{s}$.
Hint: Write $\vec{s}$ as $a + b\cos\left(\frac{2\pi}{6}pn\right)$ for some constants $a, b, p$, and use the fact that this signal has period 2.
Solution: We have $N = 6$ samples, which we will denote $n \in \{0, 1, \ldots, 5\}$. The signal repeats after a period $T = 2$, so the cosine function should also have period 2. This will occur when $p = 3$. Then to account for the scaling and shifting of the cosine from $[-1, 1]$ to $[0, 1]$, we need $a = \frac{1}{2}$, $b = \frac{1}{2}$. This gives

$$s[n] = \frac{1}{2} + \frac{1}{2}\cos\left(\frac{2\pi}{6}(3)n\right) \tag{22}$$

We can decompose $s[n]$ as

$$s[n] = \frac{1}{2}e^{(0)n} + \frac{1}{4}e^{-j\frac{2\pi(3)}{6}n} + \frac{1}{4}e^{j\frac{2\pi(3)}{6}n} \tag{23}$$

$$\vec{s} = \frac{1}{2}\sqrt{N}\vec{u}_0 + \frac{1}{4}\left(\sqrt{N}\vec{u}_3 + \sqrt{N}\overline{\vec{u}_3}\right) \tag{24}$$

$$= \frac{\sqrt{6}}{2}\vec{u}_0 + \frac{\sqrt{6}}{4}(\vec{u}_3 + \vec{u}_{6-3}) \tag{25}$$

$$= \frac{\sqrt{6}}{2}\vec{u}_0 + \frac{\sqrt{6}}{2}\vec{u}_3. \tag{26}$$

From part (b), $\overline{\vec{u}_3} = \vec{u}_{6-3} = \vec{u}_3$. Therefore both of the $\vec{u}_3$ and $\overline{\vec{u}_3}$ terms are added together and contribute to the $k = \frac{N}{2} = 3$rd term of $\vec{S}$. Additionally, we have a term from $k = 0$ because of the constant offset of the signal with 0 frequency. Thus,

$$S[k] = \begin{cases}\frac{\sqrt{6}}{2} & k = 0, 3\\0 & k \neq 0, 3.\end{cases} \tag{27}$$
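The same numerical check works for this even-length example (a sketch, assuming numpy):

```python
import numpy as np

N = 6
n = np.arange(N)
s = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
U = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # DFT basis
S = U.conj().T @ s
print(np.round(S, 6))   # sqrt(6)/2 at k = 0 and k = 3, zero elsewhere
```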


Having $\overline{\vec{u}_3} = \vec{u}_3$ shows us an interesting result of using an even number of samples. When $N$ is even, the $\vec{u}_{N/2}$ column will be entirely real because

$$\vec{u}_{N/2}[l] = \tfrac{1}{\sqrt{N}}e^{j\frac{2\pi}{N}\cdot\frac{N}{2}\cdot l} = \tfrac{1}{\sqrt{N}}e^{j\pi l} = \tfrac{(-1)^l}{\sqrt{N}},$$

and so the entries will alternate between $-1$ and $1$ (up to the common scale factor $\frac{1}{\sqrt{N}}$).


6. Survey
Please fill out the survey here. As long as you submit it, you will get credit. Thank you!

7. Homework Process and Study Group


Citing sources and collaborators are an important part of life, including being a student!
We also want to understand what resources you find helpful and how much time homework is taking, so we
can change things in the future if possible.

(a) What sources (if any) did you use as you worked through the homework?
(b) If you worked with someone on this homework, who did you work with?
List names and student ID’s. (In case of homework party, you can also just describe the group.)

Contributors:

• Kourosh Hakhamaneshi.

• Kuan-Yun Lee.

• Nathan Lambert.

• Sidney Buchbinder.

• Gaoyue Zhou.

• Anant Sahai.

• Ashwin Vangipuram.

• John Maidens.

• Geoffrey Négiar.

• Yen-Sheng Ho.

• Harrison Wang.

• Regina Eckert.
