sol14
(a) Write out an N × N orthonormal matrix U whose columns represent the DFT basis.
Solution:
$$U = \frac{1}{\sqrt{N}}\begin{bmatrix}
e^{j\frac{2\pi(0)(0)}{N}} & e^{j\frac{2\pi(0)(1)}{N}} & e^{j\frac{2\pi(0)(2)}{N}} & \cdots & e^{j\frac{2\pi(0)(N-1)}{N}} \\
e^{j\frac{2\pi(1)(0)}{N}} & e^{j\frac{2\pi(1)(1)}{N}} & e^{j\frac{2\pi(1)(2)}{N}} & \cdots & e^{j\frac{2\pi(1)(N-1)}{N}} \\
e^{j\frac{2\pi(2)(0)}{N}} & e^{j\frac{2\pi(2)(1)}{N}} & e^{j\frac{2\pi(2)(2)}{N}} & \cdots & e^{j\frac{2\pi\cdot 2(N-1)}{N}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
e^{j\frac{2\pi(N-1)(0)}{N}} & e^{j\frac{2\pi(N-1)(1)}{N}} & e^{j\frac{2\pi(N-1)(2)}{N}} & \cdots & e^{j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix}$$
In other words, the $km$-th entry is $U_{km} = \frac{1}{\sqrt{N}}\, e^{j\frac{2\pi km}{N}}$.
$$\arg\min_{\vec{w}}\; c_{\text{total}}(\vec{w}), \qquad c_{\text{total}}(\vec{w}) = \sum_{i=1}^{m} c(\vec{x}_i^\top \vec{w},\, \ell_i) \tag{1}$$
Because this can be a nonlinear function, our goal is to solve this iteratively as a sequence of least-squares
problems that we know how to solve.
Consider the following algorithm:
1: $\vec{w} = \vec{0}$ ▷ Initialize the weights to $\vec{0}$
2: while Not done do ▷ Iterate towards solution
3:  Compute $\vec{w}^\top \vec{x}_i$ ▷ Generate current estimated labels
4:  Compute $\frac{d}{d\vec{w}} c(\vec{w}^\top \vec{x}_i, \ell_i)$ ▷ Generate derivatives with respect to $\vec{w}$ of the cost for update step
5:  Compute $\frac{d^2}{d\vec{w}^2} c(\vec{w}^\top \vec{x}_i, \ell_i)$ ▷ Generate second derivatives of the cost for update step
6:  $\vec{\delta w} = \text{LeastSquares}(\cdot, \cdot)$ ▷ We will derive what to call least squares on
7:  $\vec{w} = \vec{w} + \vec{\delta w}$ ▷ Update parameters
8: end while
9: Return $\vec{w}$
The key step above is figuring out with what arguments to call LeastSquares while only having the labels $\ell_i$ and the points $\vec{x}_i$.
When the function $\vec{f}(\vec{x}, \vec{y}) : \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}^m$ takes in vectors and outputs a vector, the relevant derivatives for linearization are also represented by matrices:
$$D_{\vec{x}} \vec{f} = \begin{bmatrix} \frac{\partial f_1}{\partial x[1]} & \cdots & \frac{\partial f_1}{\partial x[n]} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x[1]} & \cdots & \frac{\partial f_m}{\partial x[n]} \end{bmatrix}
\qquad
D_{\vec{y}} \vec{f} = \begin{bmatrix} \frac{\partial f_1}{\partial y[1]} & \cdots & \frac{\partial f_1}{\partial y[k]} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial y[1]} & \cdots & \frac{\partial f_m}{\partial y[k]} \end{bmatrix}$$
where
$$\vec{x} = \begin{bmatrix} x[1] \\ \vdots \\ x[n] \end{bmatrix} \qquad \vec{y} = \begin{bmatrix} y[1] \\ \vdots \\ y[k] \end{bmatrix}$$
$$\vec{f}(\vec{x}, \vec{y}) \approx \vec{f}(\vec{x}_0, \vec{y}_0) + D_{\vec{x}}\vec{f} \cdot (\vec{x} - \vec{x}_0) + D_{\vec{y}}\vec{f} \cdot (\vec{y} - \vec{y}_0). \tag{2}$$
(a) Now, suppose we wanted to approximate the cost for each data point
$$c_i(\vec{w}) = c(\vec{x}_i^\top \vec{w}, \ell_i) \tag{3}$$
where
$$\vec{w} = \begin{bmatrix} w[1] \\ \vdots \\ w[n] \end{bmatrix}$$
in the neighborhood of a weight vector $\vec{w}_*$. Our goal is to write out the first-order expression for approximating the cost function $c_i(\vec{w}_* + \vec{\delta w})$. This should be something in vector/matrix form like you have seen for the approximation of nonlinear systems by linear systems. We don't want to take any second derivatives just yet, only first derivatives. We have outlined a skeleton for the derivation with some parts missing. Follow the guidelines in each sub-section.
i) Comparing to eq. (2), we know that $c_i(\vec{w}_* + \vec{\delta w}) \approx c_i(\vec{w}_*) + \frac{d}{d\vec{w}} c_i(\vec{w}_*)\, \vec{\delta w}$. Write out the vector form of $\frac{d}{d\vec{w}} c_i(\vec{w}_*)$.
Solution:
$$\frac{d}{d\vec{w}} c_i(\vec{w}_*) = \begin{bmatrix} \frac{\partial c_i(\vec{w}_*)}{\partial w[1]} & \cdots & \frac{\partial c_i(\vec{w}_*)}{\partial w[n]} \end{bmatrix}$$
ii) Write out the partial derivatives of $c_i(\vec{w})$ with respect to $w[g]$, the $g$-th component of $\vec{w}$. (HINT: Use the linearity of derivatives and sums to compute the partial derivatives with respect to each of the $w[g]$ terms. Don't forget the chain rule and the fact that $\vec{x}_i^\top \vec{w} = \sum_{j=1}^{n} x_i[j]\,w[j] = x_i[g]\,w[g] + \sum_{j \neq g} x_i[j]\,w[j]$.)
Solution:
Using the hint, we calculate the partial derivative with respect to each $w[g]$ term. Using the chain rule,
$$\begin{aligned}
\frac{d}{d\vec{w}} c_i(\vec{w})[g] &= \frac{\partial}{\partial w[g]} c_i(\vec{w}) \\
&= \frac{\partial}{\partial w[g]} c(\vec{x}_i^\top \vec{w}, \ell_i) \\
&= \frac{d}{d(\vec{x}_i^\top \vec{w})} c(\vec{x}_i^\top \vec{w}, \ell_i)\, \frac{\partial}{\partial w[g]} (\vec{x}_i^\top \vec{w}) \\
&= c'(\vec{x}_i^\top \vec{w}, \ell_i)\, \frac{\partial}{\partial w[g]} \Big( x_i[g]\,w[g] + \sum_{j \neq g} x_i[j]\,w[j] \Big) \\
&= c'(\vec{x}_i^\top \vec{w}, \ell_i)\, x_i[g]
\end{aligned}$$
What should the space in the following expression be filled with?
$$\frac{d}{d\vec{w}} c_i(\vec{w}) = c'(\vec{x}_i^\top \vec{w}, \ell_i)\; \underline{\hspace{1.5cm}}$$
Solution:
$$\frac{d}{d\vec{w}} c_i(\vec{w}) = c'(\vec{x}_i^\top \vec{w}, \ell_i)\, \vec{x}_i^\top$$
(b) Now, we want a better approximation that includes second derivatives. For a general function, we would look for
$$f(\vec{x}_0 + \vec{\delta x}) \approx f(\vec{x}_0) + f'(\vec{x}_0)\,\vec{\delta x} + \frac{1}{2}\vec{\delta x}^\top f''(\vec{x}_0)\,\vec{\delta x} \tag{4}$$
where $f'(\vec{x}_0)$ is an appropriate row vector and, as you've seen in the note, $f''(\vec{x}_0)$ is called the Hessian and represents the second derivatives.
i) Comparing to eq. (4), we know that
$$c_i(\vec{w}_* + \vec{\delta w}) \approx c_i(\vec{w}_*) + \frac{d}{d\vec{w}} c_i(\vec{w}_*)\, \vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top \frac{d^2}{d\vec{w}^2} c_i(\vec{w}_*)\, \vec{\delta w}$$
Write out the matrix form of $\frac{d^2}{d\vec{w}^2} c_i(\vec{w}_*)$.
Solution:
$$\frac{d^2}{d\vec{w}^2} c_i(\vec{w}_*) = \begin{bmatrix} \frac{\partial^2 c_i(\vec{w}_*)}{\partial w[1]\partial w[1]} & \cdots & \frac{\partial^2 c_i(\vec{w}_*)}{\partial w[1]\partial w[n]} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 c_i(\vec{w}_*)}{\partial w[n]\partial w[1]} & \cdots & \frac{\partial^2 c_i(\vec{w}_*)}{\partial w[n]\partial w[n]} \end{bmatrix}$$
ii) Take the second derivatives of the cost $c_i(\vec{w})$, i.e. solve for $\frac{\partial^2 c_i(\vec{w})}{\partial w[g]\partial w[h]}$.
(HINT: You should use the answer to part (a) and just take another derivative. Once again, use the linearity of derivatives and sums to compute the partial derivatives with respect to each of the $w[h]$ terms. This will give you $\frac{\partial^2}{\partial w[g]\partial w[h]}$. Don't forget the chain rule and again use the fact that $\vec{x}_i^\top \vec{w} = \sum_{j=1}^{n} x_i[j]\,w[j] = x_i[h]\,w[h] + \sum_{j \neq h} x_i[j]\,w[j]$.)
Solution: Proceeding in a similar manner as above, let us find $\frac{\partial^2 c_i(\vec{w})}{\partial w[g]\partial w[h]}$.
$$\begin{aligned}
\frac{d^2}{d\vec{w}^2} c_i(\vec{w})[g, h] &= \frac{\partial^2}{\partial w[g]\partial w[h]} c_i(\vec{w}) = \frac{\partial}{\partial w[h]} \frac{d}{d\vec{w}} c_i(\vec{w})[g] \\
&= \frac{\partial}{\partial w[h]}\, c'(\vec{x}_i^\top \vec{w}, \ell_i)\, x_i[g] \\
&= c''(\vec{x}_i^\top \vec{w}, \ell_i)\, \frac{\partial}{\partial w[h]}(\vec{x}_i^\top \vec{w})\, x_i[g] \\
&= c''(\vec{x}_i^\top \vec{w}, \ell_i)\, x_i[g]\, x_i[h]
\end{aligned}$$
Note that $c''(\vec{x}_i^\top \vec{w}, \ell_i) = \frac{d^2}{d(\vec{x}_i^\top \vec{w})^2} c(\vec{x}_i^\top \vec{w}, \ell_i)$.
iii) The expression in part (ii) is for the $[g, h]$-th component of the second derivative. $\frac{1}{2}$ times this times $\delta w[g]$ times $\delta w[h]$ would give us that component's contribution to the second-derivative term in the approximation, and we have to sum this up over all $g$ and $h$ to get the total contribution of the second-derivative term in the approximation. Now, we want to group terms to restructure this into matrix-vector form by utilizing the outer-product form of matrix multiplication. What should the space in the following expression be filled with?
$$\frac{d^2}{d\vec{w}^2} c_i(\vec{w}) = c''(\vec{x}_i^\top \vec{w}, \ell_i)\; \underline{\hspace{1.5cm}}$$
Solution:
$$\frac{d^2}{d\vec{w}^2} c_i(\vec{w}) = c''(\vec{x}_i^\top \vec{w}, \ell_i)\, \vec{x}_i \vec{x}_i^\top$$
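As a quick numerical sanity check of these two formulas, we can compare the closed-form gradient $c'(\vec{x}^\top\vec{w}, \ell)\,\vec{x}$ and Hessian $c''(\vec{x}^\top\vec{w}, \ell)\,\vec{x}\vec{x}^\top$ against finite differences. The snippet below is our own illustration, not part of the original solution; it picks the logistic loss $c(p) = \log(1 + e^{-p})$ as a concrete cost, and all names are hypothetical.

```python
import numpy as np

# Concrete cost c(p) = log(1 + exp(-p)) (logistic loss); names are illustrative.
def c(p):        return np.log1p(np.exp(-p))
def c_prime(p):  return -1.0 / (1.0 + np.exp(p))            # dc/dp
def c_pprime(p): return np.exp(-p) / (1.0 + np.exp(-p))**2  # d^2 c/dp^2

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w = rng.standard_normal(4)
h = 1e-4
I = np.eye(4)

# Closed-form gradient and Hessian from the derivation above.
grad = c_prime(x @ w) * x
hess = c_pprime(x @ w) * np.outer(x, x)

# Central-difference gradient and forward-difference Hessian for comparison.
grad_fd = np.array([(c(x @ (w + h*e)) - c(x @ (w - h*e))) / (2*h) for e in I])
hess_fd = np.array([[(c(x @ (w + h*(ei + ej))) - c(x @ (w + h*ei))
                      - c(x @ (w + h*ej)) + c(x @ w)) / h**2
                     for ej in I] for ei in I])

print(np.allclose(grad, grad_fd, atol=1e-6))   # True
print(np.allclose(hess, hess_fd, atol=1e-3))   # True
```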
(c) Now we have successfully expressed the second-order approximation of $c_i(\vec{w}_* + \vec{\delta w})$. Since we eventually want to minimize the total cost $c_{\text{total}}(\vec{w}) = \sum_{i=1}^{m} c_i(\vec{w})$, can you write out the second-order approximation of $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ using results from (a) and (b)?
Solution: From previous parts, we get
$$\begin{aligned}
c_i(\vec{w}_* + \vec{\delta w}) &\approx c_i(\vec{w}_*) + \frac{d}{d\vec{w}} c_i(\vec{w}_*)\, \vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top \frac{d^2}{d\vec{w}^2} c_i(\vec{w}_*)\, \vec{\delta w} \\
&= c_i(\vec{w}_*) + c'(\vec{x}_i^\top \vec{w}_*, \ell_i)\, \vec{x}_i^\top \vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top c''(\vec{x}_i^\top \vec{w}_*, \ell_i)\, \vec{x}_i \vec{x}_i^\top \vec{\delta w}
\end{aligned}$$
Based on the linearity of derivatives, to get the second-order approximation of $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$, we just sum up the second-order approximation for each $c_i(\vec{w}_* + \vec{\delta w})$:
$$c_{\text{total}}(\vec{w}_* + \vec{\delta w}) = \sum_{i=1}^{m} c_i(\vec{w}_* + \vec{\delta w}) \approx \sum_{i=1}^{m} \left( c_i(\vec{w}_*) + c'(\vec{x}_i^\top \vec{w}_*, \ell_i)\, \vec{x}_i^\top \vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top c''(\vec{x}_i^\top \vec{w}_*, \ell_i)\, \vec{x}_i \vec{x}_i^\top \vec{\delta w} \right)$$
(d) Now in this part, we want to re-write $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ in the form $C + \sum_{i=1}^{m} (\vec{q}_i^\top \vec{\delta w} - b_i)^2$.
i) Let's first rewrite a general second-order polynomial $f(x) = ax^2 + bx + c$ in the form $f(x) = r + (px + q)^2$. Find $p, q, r$ in terms of $a, b, c$. This procedure is called "completing the square". Then, use this to argue that
$$\arg\min_x\; ax^2 + bx + c = \arg\min_x\; (px + q)^2$$
Solution: $ax^2 + bx + c = c - \frac{b^2}{4a} + \left(\sqrt{a}\,x + \frac{b}{2\sqrt{a}}\right)^2$. Therefore $p = \sqrt{a}$, $q = \frac{b}{2\sqrt{a}}$, $r = c - \frac{b^2}{4a}$. Since $r$ is a constant, $\arg\min_x\; r + (px + q)^2 = \arg\min_x\; (px + q)^2$, hence we have $\arg\min_x\; ax^2 + bx + c = \arg\min_x\; (px + q)^2$.
ii) Now rewrite $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ in the form $C + \sum_{i=1}^{m} (\vec{q}_i^\top \vec{\delta w} - b_i)^2$. What are $C$, $\vec{q}_i$, and $b_i$?
Solution:
$$\begin{aligned}
c_{\text{total}}(\vec{w}_* + \vec{\delta w}) &\approx \sum_{i=1}^{m} \left( c_i(\vec{w}_*) + c'(\vec{x}_i^\top \vec{w}_*, \ell_i)\, \vec{x}_i^\top \vec{\delta w} + \frac{1}{2}\vec{\delta w}^\top c''(\vec{x}_i^\top \vec{w}_*, \ell_i)\, \vec{x}_i \vec{x}_i^\top \vec{\delta w} \right) \\
&= \sum_{i=1}^{m} \left[ \left( c_i(\vec{w}_*) - \frac{c'(\vec{x}_i^\top \vec{w}_*, \ell_i)^2}{4 \cdot \frac{1}{2}\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)} \right) + \left( \sqrt{\tfrac{1}{2}\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)}\; \vec{x}_i^\top \vec{\delta w} - \frac{-c'(\vec{x}_i^\top \vec{w}_*, \ell_i)}{\sqrt{2\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)}} \right)^2 \right] \\
&= \sum_{i=1}^{m} \left( c_i(\vec{w}_*) - \frac{c'(\vec{x}_i^\top \vec{w}_*, \ell_i)^2}{2\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)} \right) + \sum_{i=1}^{m} \left( \sqrt{\tfrac{1}{2}\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)}\; \vec{x}_i^\top \vec{\delta w} - \frac{-c'(\vec{x}_i^\top \vec{w}_*, \ell_i)}{\sqrt{2\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)}} \right)^2
\end{aligned}$$
Matching terms, $C = \sum_{i=1}^{m} \left( c_i(\vec{w}_*) - \frac{c'(\vec{x}_i^\top \vec{w}_*, \ell_i)^2}{2\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)} \right)$, $\vec{q}_i = \sqrt{\tfrac{1}{2}\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)}\; \vec{x}_i$, and $b_i = \frac{-c'(\vec{x}_i^\top \vec{w}_*, \ell_i)}{\sqrt{2\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)}}$.
Show that:
$$\left\| A\vec{x} - \vec{b} \right\|^2 = \sum_{i=1}^{m} (\vec{a}_i^\top \vec{x} - b_i)^2$$
where
$$A = \begin{bmatrix} -\; \vec{a}_1^\top\; - \\ -\; \vec{a}_2^\top\; - \\ \vdots \\ -\; \vec{a}_m^\top\; - \end{bmatrix}.$$
Use this to interpret your expression from Part (d) as a standard least squares problem. What are the rows of $A$?
Solution: $\left\| A\vec{x} - \vec{b} \right\|^2$ is by definition equal to the sum of the squared entries of the vector $A\vec{x} - \vec{b}$. Since the $i$-th entry of $A\vec{x} - \vec{b}$ is $\vec{a}_i^\top \vec{x} - b_i$, we have $\left\| A\vec{x} - \vec{b} \right\|^2 = \sum_{i=1}^{m} (\vec{a}_i^\top \vec{x} - b_i)^2$. Matching terms with our expression of $c_{\text{total}}(\vec{w}_* + \vec{\delta w})$ from Part (d), minimizing over $\vec{\delta w}$ is a standard least squares problem whose $i$-th row of $A$ is $\vec{q}_i^\top = \sqrt{\tfrac{1}{2}\, c''(\vec{x}_i^\top \vec{w}_*, \ell_i)}\; \vec{x}_i^\top$, with the corresponding entry of $\vec{b}$ being $b_i$.
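Putting the pieces together, here is a minimal sketch (our own illustration, with hypothetical function names, not the notebook's code) of how the algorithm at the start of this problem can be implemented in Numpy: each iteration builds the rows $\vec{q}_i^\top$ and targets $b_i$ from the current weights and solves an ordinary least-squares problem for $\vec{\delta w}$.

```python
import numpy as np

def iterated_least_squares(X, labels, c_prime, c_pprime, num_iters=10):
    """Minimize sum_i c(x_i^T w, l_i) by repeatedly solving least-squares problems.

    c_prime(p, l) and c_pprime(p, l) are the first and second derivatives of the
    scalar loss c(p, l) with respect to p (hypothetical, user-supplied functions).
    """
    m, n = X.shape
    w = np.zeros(n)                              # initialize weights to 0
    for _ in range(num_iters):
        p = X @ w                                # current predictions x_i^T w
        c1 = c_prime(p, labels)                  # c'(x_i^T w, l_i)
        c2 = c_pprime(p, labels)                 # c''(x_i^T w, l_i) > 0
        A = np.sqrt(c2 / 2.0)[:, None] * X       # rows q_i^T = sqrt(c''_i / 2) x_i^T
        b = -c1 / np.sqrt(2.0 * c2)              # b_i = -c'_i / sqrt(2 c''_i)
        dw, *_ = np.linalg.lstsq(A, b, rcond=None)
        w = w + dw                               # update parameters
    return w

# Example: logistic loss with labels l_i in {+1, -1}, c(p, l) = log(1 + exp(-l p)).
logistic_prime  = lambda p, l: -l / (1.0 + np.exp(l * p))
logistic_pprime = lambda p, l: np.exp(l * p) / (1.0 + np.exp(l * p))**2
```

Solving the normal equations of this least-squares problem is exactly the Newton-style update derived above, which is why only a handful of iterations are typically needed.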
Notice that these are pretty cheap to compute, given that we have to compute the original loss functions
in the first place.
Second Derivatives:
$$\frac{d^2}{dp^2} c^{+}_{\text{sq}}(p) = 2 \qquad\qquad \frac{d^2}{dp^2} c^{-}_{\text{sq}}(p) = 2$$
$$\frac{d^2}{dp^2} c^{+}_{\text{exp}}(p) = e^{-p} \qquad\qquad \frac{d^2}{dp^2} c^{-}_{\text{exp}}(p) = e^{p}$$
$$\frac{d^2}{dp^2} c^{+}_{\text{logistic}}(p) = \frac{e^{-p}}{(1 + e^{-p})^2} \qquad\qquad \frac{d^2}{dp^2} c^{-}_{\text{logistic}}(p) = \frac{e^{p}}{(1 + e^{p})^2}$$
Notice that all of these second derivatives are positive. Moreover, calculating them takes essentially
no more work than getting the first derivatives. In particular, it is useful to note that
$$\frac{d^2}{dp^2} c^{+}_{\text{logistic}}(p) = \frac{e^{-p}}{(1 + e^{-p})^2} = \left| \frac{d}{dp} c^{+}_{\text{logistic}}(p) \right| \left( 1 - \left| \frac{d}{dp} c^{+}_{\text{logistic}}(p) \right| \right)$$
$$\frac{d^2}{dp^2} c^{-}_{\text{logistic}}(p) = \frac{e^{p}}{(1 + e^{p})^2} = \left| \frac{d}{dp} c^{-}_{\text{logistic}}(p) \right| \left( 1 - \left| \frac{d}{dp} c^{-}_{\text{logistic}}(p) \right| \right)$$
so the basic nature of the logistic loss' second derivative becomes more clear. If the first derivative has magnitude $\frac{1}{2}$, this second derivative is maximized; that happens when the prediction $p$ is 0, or maximally uncertain. The second derivative for the logistic loss shrinks away from there.
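As a small numerical check of this identity (our own snippet; it takes $c^{+}_{\text{logistic}}(p) = \log(1 + e^{-p})$, whose first derivative is $-e^{-p}/(1+e^{-p})$):

```python
import numpy as np

p = np.linspace(-5, 5, 101)
d1 = -np.exp(-p) / (1 + np.exp(-p))       # d/dp c+_logistic(p), lies in (-1, 0)
d2 = np.exp(-p) / (1 + np.exp(-p))**2     # d^2/dp^2 c+_logistic(p)

# The identity |c'| (1 - |c'|) reproduces c'', and it peaks where |c'| = 1/2, i.e. p = 0.
print(np.allclose(d2, np.abs(d1) * (1 - np.abs(d1))))   # True
print(p[np.argmax(d2)])                                  # 0.0
```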
When you take 70, this particular form p(1 − p) will start to ring a bell. That sound is the gateway to
understanding a different reason for why logistic regression is popular in practice — but for that, you
need to understand Probability. It is a glorious coincidence that something so natural from an optimiza-
tion point of view also turns out to have useful interpretations involving the language of probability.
You will understand this after 126.
(g) Run the Jupyter Notebook and answer the following questions.
i) In Example 2, why does mean classification fail?
Solution: The mean classifier misclassifies some points because there is some information in the
distribution that can’t be accurately captured by the mean of the data points of each category.
ii) In Example 3, for what data distributions does ordinary least squares fail?
Solution: When there are extreme outliers in the dataset, the decision boundary in this case will be
"pulled" away from the desired location, thus least squares would fail.
iii) Run the code cells in Example 4. By performing updates to w ~ according to what you derived
in previous parts of the question, how many iterations does it take for exponential and logistic
regression to converge?
Solution: You should see that the decision boundaries are almost fixed after 3-4 iterations. These
iterated least-squares approaches are very fast to converge in general. This is why in practice, logistic
or exponential regression costs almost the same to run as ordinary least squares.
Congratulations! You now know the basic optimization-theoretic perspective on logistic regression.
After you understand the probability material in 70 and 126, you can understand the probabilistic
perspective on it as well. After you understand the optimization material in 127, you will understand
even more about the optimization-theoretic perspective on the problem including why this approach
actually converges.
where the $*$ operation will complex conjugate and transpose its argument (order doesn't matter), and is aptly called the conjugate transpose. Note that for real numbers, the complex inner product simplifies to the real inner product. In all the theorems you've seen in this class, you can replace every inner product with the complex inner product to show an analogous result for complex vectors: least squares becomes $\hat{x} = (A^* A)^{-1} A^* \vec{b}$, upper triangularization becomes $A = U T U^*$, the Spectral Theorem becomes $A = U \Lambda U^*$, and the SVD becomes $A = U \Sigma V^*$.
*" # " #+
1+j −3 − j
(a) To get some practice computing complex inner products, what is , and
2 2+j
*" # " #+
−3 − j 1+j
, ? Does the order of the vectors in the complex inner product matter i.e. is it
2+j 2
commutative?
Solution:
*" # " #+
1+j −3 − j
, = (1 + j)(−3 − j) + (2)(2 + j)
2 2+j
= (1 + j)(−3 + j) + 2(2 − j)
= −3 + j − 3j − 1 + 4 − 2j = −4j
*" # " #+
−3 − j 1+j
, = (−3 − j)(1 + j) + (2 + j)(2)
2+j 2
= (−3 − j)(1 − j) + (2 + j)(2)
= −3 + 3j − j − 1 + 4 + 2j = 4j
The two inner products are different so clearly the complex inner product is not commutative and the
order of the vectors matters. In fact, when you swap the arguments for the inner product, you will get
the complex conjugate result, which you see an example of above.
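If you want to check such computations with Numpy, note that with the convention $\langle \vec{v}, \vec{w} \rangle = \vec{w}^* \vec{v}$ used here, the matching call is np.vdot(w, v), since np.vdot conjugates its first argument (a small illustrative check of our own):

```python
import numpy as np

v = np.array([1 + 1j, 2 + 0j])
w = np.array([-3 - 1j, 2 + 1j])

print(np.vdot(w, v))   # -4j : <v, w> = w* v
print(np.vdot(v, w))   # +4j : swapping the arguments conjugates the result
```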
(b) Let $U = \begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_n \end{bmatrix}$ be an $n$ by $n$ complex matrix, where its columns $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n$ form an orthonormal basis for $\mathbb{C}^n$, i.e.
$$\vec{u}_i^* \vec{u}_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
Such a complex matrix is called unitary in math literature, to distinguish from real orthonormal matrices. Show that $U^{-1} = U^*$, where $U^*$ is the conjugate transpose of $U$.
Solution: Consider the product $U^* U$, where $\vec{u}_1, \ldots, \vec{u}_n$ are the column vectors of $U$. Then, the entry at the $i$-th row and $j$-th column of $U^* U$ should be $\vec{u}_i^* \vec{u}_j$. If we write down the general form for each element of $U^* U$:
$$(U^* U)_{ij} = \vec{u}_i^* \vec{u}_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases},$$
so $U^* U = I$ and therefore $U^{-1} = U^*$.
$$\langle \vec{v}, \vec{w} \rangle = \langle U\vec{v}, U\vec{w} \rangle.$$
$$\langle U\vec{v}, U\vec{w} \rangle = (U\vec{w})^* U\vec{v}$$
Using the form for the complex conjugate of a matrix-vector product as stated in the problem:
$$(U\vec{w})^* U\vec{v} = \vec{w}^* U^* U \vec{v} = \vec{w}^* \vec{v} = \langle \vec{v}, \vec{w} \rangle,$$
since $U^* U = I$ from the previous part.
(d) Show that if $\vec{u}_1, \ldots, \vec{u}_n$ are the columns of a unitary matrix $U$, they must be linearly independent.
(Hint: Suppose $\vec{w} = \sum_{i=1}^{n} \alpha_i \vec{u}_i$, then first show that $\alpha_i = \langle \vec{w}, \vec{u}_i \rangle$. From here ask yourself whether a nonzero linear combination of the $\{\vec{u}_i\}$ could ever be identically zero.)
This basic fact shows how orthogonality is a very nice special case of linear independence.
Solution: Suppose they are not linearly independent; then there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ such that $\vec{w} = \sum_{i=1}^{n} \alpha_i \vec{u}_i = \vec{0}$, while at least one of the $\alpha_i$ is non-zero. We can then take the inner product of both sides with $\vec{u}_j$, for all $j$:
$$\langle \vec{w}, \vec{u}_j \rangle = \left\langle \sum_{i=1}^{n} \alpha_i \vec{u}_i, \vec{u}_j \right\rangle = \sum_{i=1}^{n} \alpha_i \langle \vec{u}_i, \vec{u}_j \rangle = \alpha_j.$$
Since $\vec{u}_1, \ldots, \vec{u}_n$ form an orthonormal basis, we know that $\langle \vec{u}_i, \vec{u}_j \rangle$ will be 1 when $i = j$ and 0 otherwise, which is why only $\alpha_j$ survives in the above summation. Since $\vec{w} = \vec{0}$, then $\alpha_j$ should be 0 for all inner products $\langle \vec{u}_j, \vec{w} \rangle$. However, this is a contradiction to our assumption that at least one of the $\alpha_i$ is non-zero. Therefore, $\vec{u}_1, \ldots, \vec{u}_n$ are linearly independent.
This confirms what we know: orthonormality is a particularly robust guarantee of linear independence.
(e) Now let $V$ be another $n \times n$ matrix, where the columns of the matrix form an orthonormal basis for $\mathbb{C}^n$, i.e. $V$ is unitary. Show that the columns of the product $UV$ also form an orthonormal basis for $\mathbb{C}^n$.
Solution: Since $V$ is a unitary matrix, we have $V^* V = I$. To show that the columns of $UV$ also form an orthonormal basis, we could write down its conjugate transpose, $(UV)^*$, and apply it to $UV$:
$$(UV)^*(UV) = V^* U^* U V = V^* V = I,$$
so the columns of $UV$ are orthonormal and hence form an orthonormal basis for $\mathbb{C}^n$.
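This is also easy to confirm numerically; the snippet below (our own illustration) builds two random unitary matrices from QR decompositions of random complex matrices and checks that their product has orthonormal columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_unitary(n):
    # QR of a random complex matrix gives a unitary Q.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return Q

U, V = random_unitary(n), random_unitary(n)
UV = U @ V
print(np.allclose(UV.conj().T @ UV, np.eye(n)))   # True: (UV)*(UV) = I
```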
$$M^* = (U \Lambda U^*)^* = U \Lambda^* U^*.$$
Since $M^* = M = U \Lambda U^*$, this means that $\Lambda^* = \Lambda$. The only case where this is true is when all the elements of $\Lambda$ are real, which means the eigenvalues of $M$ are always real.
4. Roots of Unity
An N th root of unity is a complex number ω satisfying the equation ω N = 1 (or equivalently ω N − 1 = 0).
In this problem we explore some properties of the roots of unity, as they end up being essential for the DFT.
Solution:
$$(z - 1)\sum_{k=0}^{N-1} z^k = \sum_{k=1}^{N} z^k - \sum_{k=0}^{N-1} z^k = z^N - z^0 = z^N - 1$$
(b) Show that any complex number of the form $\omega_N^k = e^{j\frac{2\pi}{N}k}$ for $k \in \mathbb{Z}$ is an $N$-th root of unity. From here on, let $\omega_N = e^{j\frac{2\pi}{N}}$.
Solution:
$$\left(\omega_N^k\right)^N = \left(e^{j\frac{2\pi}{N}k}\right)^N = e^{j 2\pi k} = 1$$
This means that the $N$ numbers $\omega_N^0, \omega_N^1, \omega_N^2, \ldots, \omega_N^{N-1}$ are the solutions to the equation $z^N = 1$ and hence roots of the polynomial $z^N - 1$.
(c) For a given integer $N \geq 2$, using the previous parts, give the complex roots of the polynomial $1 + z + z^2 + \ldots + z^{N-1}$.
Solution: $(z - 1)(1 + z + z^2 + \ldots + z^{N-1}) = z^N - 1$, and we just showed that the roots of $z^N - 1$ are the $\omega_N^k = e^{j\frac{2\pi k}{N}}$ for $k = 0, \ldots, N-1$; therefore the roots of $z \mapsto 1 + z + z^2 + \ldots + z^{N-1}$ are the $\omega_N^k = e^{j\frac{2\pi k}{N}}$ for $k = 1, \ldots, N-1$.
We can see this by factoring and matching factors together. A polynomial with a leading coefficient of 1 can be factored into its roots. So $z^N - 1 = (z - \omega_N^0)(z - \omega_N^1)(z - \omega_N^2)\cdots(z - \omega_N^{N-1}) = \prod_{k=0}^{N-1}(z - \omega_N^k) = (z - 1)\prod_{k=1}^{N-1}(z - \omega_N^k)$. Dividing both sides by $z - 1$ gives us what we want.
(d) What are the fourth roots of unity? Draw the fourth roots of unity in the complex plane. Where do they lie in relation to the unit circle?
Solution: Using the formula for the roots of unity from part (b), $\omega_4^0 = e^0 = 1$, $\omega_4^1 = e^{j\frac{\pi}{2}} = j$, $\omega_4^2 = e^{j\pi} = -1$, $\omega_4^3 = e^{j\frac{3\pi}{2}} = -j$. All roots of unity must be on the unit circle since they have magnitude 1. From the definition of the roots of unity, we know that each root of unity is $2\pi/N = \pi/2$ radians apart, and this can be seen graphically in the plot below.
(e) What are the fifth roots of unity? Draw the fifth roots of unity in the complex plane.
Solution: Using the formula for the roots of unity from part (b), $\omega_5^0 = e^0 = 1$, $\omega_5^1 = e^{j\frac{2\pi}{5}}$, $\omega_5^2 = e^{j\frac{4\pi}{5}}$, $\omega_5^3 = e^{j\frac{6\pi}{5}}$, $\omega_5^4 = e^{j\frac{8\pi}{5}}$. Again, we know that each root of unity should be $2\pi/5$ radians or $72^\circ$ apart by the definition, and it can be seen graphically below.
(f) For $N = 5$, $\omega_5 = e^{j\frac{2\pi}{5}}$, simplify $\omega_5^{42}$ such that the exponent is less than 5 and greater than 0.
Solution:
$$\omega_5^{42} = \omega_5^{8 \cdot 5}\, \omega_5^2 = (\omega_5^5)^8\, \omega_5^2 = 1^8\, \omega_5^2 = \omega_5^2$$
Every 5 powers of the 5th root of unity will be 1, so $\omega_5^{42}$ simplifies to $\omega_5$ raised to the remainder of 42 divided by 5.
(g) Let's generalize what you saw in the previous part. Prove that $\omega_N^{k+N} = \omega_N^{k}$ for all integers $k$, both positive and negative. This shows that the roots of unity have a periodic structure.
Solution:
$$\omega_N^{k+N} = \omega_N^{k}\, \omega_N^{N} = \omega_N^{k} \cdot 1 = \omega_N^{k}$$
(h) What is the complex conjugate of $\omega_5$ in terms of the 5th roots of unity? What is the complex conjugate of $\omega_5^{42}$ in terms of the 5th roots of unity? What is the complex conjugate of $\omega_5^{4}$ in terms of the 5th roots of unity?
Solution: Using part (g), $\overline{\omega_5} = \omega_5^{-1} = \omega_5^{4}$, $\overline{\omega_5^{42}} = \overline{\omega_5^{2}} = \omega_5^{-2} = \omega_5^{3}$, and $\overline{\omega_5^{4}} = \omega_5^{-4} = \omega_5$.
Notice here that we can think about going around the circle of roots of unity, since they are periodic
and wrap around to where they started.
This is something called modular arithmetic and is naturally connected to cycles like this. It is traditionally viewed as taking the remainder (in this case after dividing by $N = 5$); however, some people get confused when asked to take the remainder of a negative number divided by a positive one. Here, we just remember that we can turn a negative number between $-(N-1)$ and $-1$ into a positive number by adding $N$ to it.
You will learn a lot more about modular arithmetic in CS 70; here in 16B, you just get a teaser because it emerges naturally when thinking about the roots of unity and their powers.
(i) Compute $\sum_{m=0}^{N-1} \omega_N^{km}$ where $\omega_N$ is an $N$-th root of unity. Does the answer make sense in terms of the plot you drew?
Solution: If $\omega_N^k = 1$, then this is easy: we have
$$\sum_{m=0}^{N-1} \omega_N^{0} = \sum_{m=0}^{N-1} 1 = N.$$
This happens whenever $k$ is a multiple of $N$. For all other $k$, we know $\omega_N^k \neq 1$. Consequently, we can use the formula we found in part (a) to write
$$\sum_{m=0}^{N-1} \omega_N^{km} = \frac{\omega_N^{kN} - 1}{\omega_N^{k} - 1} = 0$$
since $\omega_N$ is a root of unity and so $\omega_N^{N} = 1$ and so $\omega_N^{kN} = 1$ as well. This makes intuitive sense because all the roots of unity are spaced evenly around the circle. Therefore, summing them up, we have to get zero by symmetry.
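Numerically, this is easy to see as well (an illustrative snippet of our own):

```python
import numpy as np

N = 8
omega = np.exp(2j * np.pi / N)
m = np.arange(N)

for k in [0, 3, 8]:
    total = np.sum(omega ** (k * m))
    print(k, np.round(total, 10))   # N for k = 0 and k = 8 (multiples of N), 0 for k = 3
```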
We can represent the DFT basis with the matrix $U$, and it is given by
$$U = \frac{1}{\sqrt{N}}\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{j\frac{2\pi}{N}} & e^{j\frac{2\pi(2)}{N}} & \cdots & e^{j\frac{2\pi(N-1)}{N}} \\
1 & e^{j\frac{2\pi(2)}{N}} & e^{j\frac{2\pi(4)}{N}} & \cdots & e^{j\frac{2\pi\cdot 2(N-1)}{N}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & e^{j\frac{2\pi(N-1)}{N}} & e^{j\frac{2\pi\cdot 2(N-1)}{N}} & \cdots & e^{j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix}.$$
In other words, for the $ij$-th entry of $U$, $U_{ij} = \frac{1}{\sqrt{N}} e^{j\frac{2\pi ij}{N}}$. From this, we can see that the DFT basis matrix is symmetric, so $U = U^T$. Another very important property of the DFT basis is that $U$ is orthonormal, so $U^* U = I$. We want to find the coordinates of $\vec{x}$ in the DFT basis, and we know these coordinates are given by
$$\vec{X} = U^{-1} \vec{x}.$$
We call the components of $\vec{X}$ the DFT coefficients of the time-domain signal $\vec{x}$. We can think of the components of $\vec{X}$ as weights that represent $\vec{x}$ in the DFT basis. As we will explore in the problem, each coefficient can be thought of as a measurement for which frequency is present in the signal.
You can use Numpy or other calculation tools to evaluate cosines or do matrix multiplication, but you will
not get credit if you directly calculate the DFT using a function in Numpy. You must show your work to get
credit.
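For reference, one way to construct $U$ with Numpy and confirm the symmetry and orthonormality properties stated above (our own sketch; you still need to show the hand computation for credit):

```python
import numpy as np

def dft_basis(N):
    """DFT basis matrix with entries U[i, j] = exp(j 2 pi i j / N) / sqrt(N)."""
    n = np.arange(N)
    return np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

N = 7
U = dft_basis(N)
print(np.allclose(U, U.T))                      # True: U is symmetric, U = U^T
print(np.allclose(U.conj().T @ U, np.eye(N)))   # True: U is orthonormal, U* U = I
```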
(a) What is $U^{-1}$?
Solution: Since $U$ is orthonormal, $U^{-1} = U^*$, which means we transpose and complex conjugate $U$. However, note that $U$ is symmetric, so we just need to complex conjugate the matrix, meaning every exponent of the exponential becomes negative. Thus,
$$U^{-1} = U^* = \frac{1}{\sqrt{N}}\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{-j\frac{2\pi}{N}} & e^{-j\frac{2\pi(2)}{N}} & \cdots & e^{-j\frac{2\pi(N-1)}{N}} \\
1 & e^{-j\frac{2\pi(2)}{N}} & e^{-j\frac{2\pi(2)(2)}{N}} & \cdots & e^{-j\frac{2\pi\cdot 2(N-1)}{N}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & e^{-j\frac{2\pi(N-1)}{N}} & e^{-j\frac{2\pi\cdot 2(N-1)}{N}} & \cdots & e^{-j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix}$$
(b) Let the columns of $U = \begin{bmatrix} \vec{u}_0 & \vec{u}_1 & \ldots & \vec{u}_{N-1} \end{bmatrix}$. Prove that $\overline{\vec{u}_k} = \vec{u}_{N-k}$ for $k = 1, 2, \ldots, N-1$.
Solution:
$$\overline{\vec{u}_k[m]} = \overline{\tfrac{1}{\sqrt{N}}\, e^{j\frac{2\pi mk}{N}}} = \tfrac{1}{\sqrt{N}}\, e^{-j\frac{2\pi mk}{N}} \tag{7}$$
Using the fact that any multiple of $2\pi$ won't affect the exponent,
$$\tfrac{1}{\sqrt{N}}\, e^{-j\frac{2\pi mk}{N}} = \tfrac{1}{\sqrt{N}}\, e^{j\left(2\pi m - \frac{2\pi mk}{N}\right)} \tag{8}$$
$$= \tfrac{1}{\sqrt{N}}\, e^{j\frac{2\pi m(N-k)}{N}} = \vec{u}_{N-k}[m] \tag{9}$$
Since this holds for all $m$, then $\overline{\vec{u}_k} = \vec{u}_{N-k}$ when $k = 1, \ldots, N-1$. It doesn't hold for other $k$ since then the indices $k$ and $N-k$ for the columns of $U$ wouldn't be valid.
(c) Decompose $\cos\left(\frac{2\pi}{7} n\right)$ into a sum of complex exponentials.
Solution: From the inverse Euler formula,
$$\cos\left(\frac{2\pi}{7} n\right) = \frac{1}{2}\, e^{j\frac{2\pi n}{7}} + \frac{1}{2}\, e^{-j\frac{2\pi n}{7}}$$
(d) If $x_1[n] = \cos\left(\frac{2\pi n}{7}\right)$, write $x_1[n]$ as a linear combination of $y_+[n] = e^{j\frac{2\pi n}{7}}$ and $y_-[n] = e^{-j\frac{2\pi n}{7}}$.
Solution: Note that we can replace both complex exponentials with the given $y$ functions, so
$$x_1[n] = \frac{1}{2}\, y_+[n] + \frac{1}{2}\, y_-[n]$$
(e) Now think of $\vec{x}_1$ as a length $N = 7$ vector which is $x_1[n]$ sampled at $n = 0, 1, \ldots, 6$. We do the same to similarly get $\vec{y}_+, \vec{y}_-$. Write $\vec{y}_+$ and $\vec{y}_-$ in terms of the columns of $U = \begin{bmatrix} \vec{u}_0 & \vec{u}_1 & \ldots & \vec{u}_6 \end{bmatrix}$.
HINT: Use part (b).
Solution: We can notice that $\vec{u}_1$ is exactly $\frac{1}{\sqrt{N}}\, e^{j\frac{2\pi n}{7}}$ evaluated at the 7 sample points. This is exactly $\vec{y}_+$ up to scale, so $\vec{y}_+ = \sqrt{N}\,\vec{u}_1 = \sqrt{7}\,\vec{u}_1$. From part (b), we know that $\vec{u}_6 = \overline{\vec{u}_1}$, which is $\frac{1}{\sqrt{N}}\, e^{-j\frac{2\pi n}{7}}$ evaluated at the sample points. This is exactly a scaling of $\vec{y}_-$, so $\vec{y}_- = \sqrt{N}\,\vec{u}_6 = \sqrt{7}\,\vec{u}_6$.
(f) Using the last 3 parts, compute the DFT coefficients $\vec{X}_1$ for signal $\vec{x}_1$.
Solution: Combining the last 3 parts, we know $\vec{x}_1 = \frac{\sqrt{7}}{2}\vec{u}_1 + \frac{\sqrt{7}}{2}\vec{u}_6$. We also know that since the columns of $U$ are $\vec{u}_0, \ldots, \vec{u}_6$, then
$$U^* = \frac{1}{\sqrt{N}}\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & e^{-j\frac{2\pi}{N}} & \cdots & e^{-j\frac{2\pi(N-1)}{N}} \\
1 & e^{-j\frac{2\pi(2)}{N}} & \cdots & e^{-j\frac{2\pi\cdot 2(N-1)}{N}} \\
\vdots & \vdots & \ddots & \vdots \\
1 & e^{-j\frac{2\pi(N-1)}{N}} & \cdots & e^{-j\frac{2\pi(N-1)(N-1)}{N}}
\end{bmatrix} = \begin{bmatrix} \vec{u}_0^* \\ \vec{u}_1^* \\ \vec{u}_2^* \\ \vdots \\ \vec{u}_{N-1}^* \end{bmatrix} \tag{10}$$
Therefore
$$\vec{X}_1 = U^* \vec{x}_1 = \frac{\sqrt{7}}{2}\, U^* \vec{u}_1 + \frac{\sqrt{7}}{2}\, U^* \vec{u}_6, \qquad X_1[k] = \begin{cases} \frac{\sqrt{7}}{2} & k = 1, 6 \\ 0 & k \neq 1, 6 \end{cases}$$
In the last equation, we use the orthonormality property of the DFT basis vectors, i.e. that
$$\vec{u}_i^* \vec{u}_j = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{14}$$
(g) Plot the time domain representation of $x_1[n]$. Plot the magnitude, $|X_1[k]|$, and plot the phase, $\angle X_1[k]$, for the DFT representation $\vec{X}_1$. You should notice that the DFT coefficients correspond to which frequencies were present in the original signal, which is why the DFT coefficients are called the frequency domain representation.
Solution: [Plots omitted: time-domain plot of $x_1[n]$; frequency-domain magnitude $|X_1[k]|$; frequency-domain phase $\angle X_1[k]$.]
Because the coefficients $\vec{X}_1$ in this case are all real-valued, the phase is zero-valued.
(h) We define $x_2[n] = \cos\left(\frac{4\pi}{7} n\right)$ for $N = 7$ samples $n \in \{0, 1, \ldots, 6\}$. Compute the DFT coefficients $\vec{X}_2$ for signal $\vec{x}_2$.
Solution: Following from the above, we can write $\vec{x}_2$, the vector of samples of the function, in terms of the columns of our DFT basis matrix $U$:
$$x_2[n] = \cos\left(\frac{2\pi(2)}{7} n\right) = \frac{1}{2}\, e^{j\frac{2\pi(2)}{7} n} + \frac{1}{2}\, e^{-j\frac{2\pi(2)}{7} n} \tag{16}$$
$$\vec{x}_2 = \frac{1}{2}\left(\sqrt{N}\,\vec{u}_2 + \sqrt{N}\,\vec{u}_5\right) = \frac{\sqrt{7}}{2}\left(\vec{u}_2 + \vec{u}_5\right) \tag{17}$$
From this, we can see that our DFT coefficients vector $\vec{X}_2$ will only have non-zero values for rows $k = 2$ and $k = 5$. The elements of $\vec{X}_2$ can be written as
$$X_2[k] = \begin{cases} \frac{\sqrt{7}}{2} & k = 2, 5 \\ 0 & k \neq 2, 5 \end{cases} \tag{18}$$
(i) Plot the time domain representation of $x_2[n]$. Plot the magnitude, $|X_2[k]|$, and plot the phase, $\angle X_2[k]$, for the DFT representation $\vec{X}_2$.
Solution: [Plots omitted: time-domain plot of $x_2[n]$; frequency-domain magnitude $|X_2[k]|$; frequency-domain phase $\angle X_2[k]$.]
(j) To generalize this result, say we have some $p \in \{1, 2, 3\}$ which scales the frequency of our signal $\vec{x}_p$, which we define as $x_p[n] = \cos\left(\frac{2\pi}{7} p n\right)$ for $N = 7$ samples $n \in \{0, 1, \ldots, 6\}$. Compute the DFT coefficients $\vec{X}_p$ for signal $\vec{x}_p$ in terms of this scalar $p$.
Solution: As in the last two problems, we can represent $\vec{x}_p$ in terms of the columns of the DFT basis matrix:
$$x_p[n] = \cos\left(\frac{2\pi p}{7} n\right) = \frac{1}{2}\, e^{j\frac{2\pi p}{7} n} + \frac{1}{2}\, e^{-j\frac{2\pi p}{7} n} \tag{19}$$
$$\vec{x}_p = \frac{1}{2}\left(\sqrt{7}\,\vec{u}_p + \sqrt{7}\,\vec{u}_{7-p}\right) = \frac{\sqrt{7}}{2}\left(\vec{u}_p + \vec{u}_{7-p}\right) \tag{20}$$
The only nonzero entries in $\vec{X}_p$ will occur at $k = p$ and $k = 7 - p$:
$$X_p[k] = \begin{cases} \frac{\sqrt{7}}{2} & k = p,\, 7-p \\ 0 & k \neq p,\, 7-p \end{cases} \tag{21}$$
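A quick Numpy check of this general result, reusing the dft_basis helper sketched earlier (our own illustration, not part of the assigned solution):

```python
import numpy as np

def dft_basis(N):
    n = np.arange(N)
    return np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

N = 7
U = dft_basis(N)
for p in [1, 2, 3]:
    x_p = np.cos(2 * np.pi * p * np.arange(N) / N)
    X_p = U.conj().T @ x_p                # X_p = U^{-1} x_p = U* x_p
    print(p, np.round(np.abs(X_p), 3))    # sqrt(7)/2 ~ 1.323 at k = p and k = N - p, 0 elsewhere
```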
(k) Let's see what happens when we have an even number of samples. We define $\vec{s} = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}^T$, which has $N = 6$ samples. Compute the DFT coefficients $\vec{S}$ for signal $\vec{s}$.
Hint: Write $\vec{s}$ as $a + b\cos\left(\frac{2\pi}{6} p n\right)$ for some constants $a, b, p$, and use the fact that this signal has period 2.
Solution: We have $N = 6$ samples, which we will denote $n \in \{0, 1, \ldots, 5\}$. The signal repeats after a period $T = 2$, so the cosine function should also have period 2. This will occur when $p = 3$. Then to account for the scaling and shifting of the cosine from $[-1, 1]$ to $[0, 1]$, we need $a = \frac{1}{2}$, $b = \frac{1}{2}$. This gives
$$s[n] = \frac{1}{2} + \frac{1}{2}\cos\left(\frac{2\pi}{6}(3) n\right) \tag{22}$$
From part (b), $\overline{\vec{u}_3} = \vec{u}_3$. Therefore both of the $\vec{u}_3$ and $\overline{\vec{u}_3}$ terms are added together and contribute to the $k = \frac{N}{2} = 3$rd term of $\vec{S}$. Additionally, we have a term from $k = 0$ because of the constant offset of the signal with 0 frequency. Thus,
$$S[k] = \begin{cases} \frac{\sqrt{6}}{2} & k = 0, 3 \\ 0 & k \neq 0, 3 \end{cases} \tag{27}$$
Having $\overline{\vec{u}_3} = \vec{u}_3$ shows us an interesting result of using an even number of samples. When $N$ is even, the $\vec{u}_{N/2}$ column will be entirely real because
$$u_{N/2}[l] = \frac{1}{\sqrt{N}}\, e^{j\frac{2\pi}{N}\frac{N}{2} l} = \frac{1}{\sqrt{N}}\, e^{j\pi l} = \frac{(-1)^l}{\sqrt{N}},$$
which is real for every integer $l$.
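And a check of the even-$N$ case (our own snippet, again reusing the dft_basis helper from above):

```python
import numpy as np

def dft_basis(N):
    n = np.arange(N)
    return np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

N = 6
U = dft_basis(N)
s = np.array([1.0, 0, 1, 0, 1, 0])

print(np.allclose(U[:, N // 2].imag, 0))   # True: the u_{N/2} column is entirely real
S = U.conj().T @ s                         # S = U* s
print(np.round(np.abs(S), 3))              # sqrt(6)/2 ~ 1.225 at k = 0 and k = 3, 0 elsewhere
```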
6. Survey
Please fill out the survey here. As long as you submit it, you will get credit. Thank you!
(a) What sources (if any) did you use as you worked through the homework?
(b) If you worked with someone on this homework, who did you work with?
List names and student ID’s. (In case of homework party, you can also just describe the group.)
Contributors:
• Kourosh Hakhamaneshi.
• Kuan-Yun Lee.
• Nathan Lambert.
• Sidney Buchbinder.
• Gaoyue Zhou.
• Anant Sahai.
• Ashwin Vangipuram.
• John Maidens.
• Geoffrey Négiar.
• Yen-Sheng Ho.
• Harrison Wang.
• Regina Eckert.