EE263s Homework 3
1. Linearizing range measurements. Consider a single (scalar) measurement y of the distance or range of
x ∈ R^n to a fixed point or beacon at a, i.e., y = ‖x − a‖.
(a) Show that the linearized model near x0 can be expressed as δy = k^T δx, where k is the unit vector
(i.e., with length one) pointing from a to x0. Derive this analytically, and also draw a picture
(for n = 2) to demonstrate it.
(b) Consider the error e of the linearized approximation, i.e.,

e = ‖x0 + δx − a‖ − ‖x0 − a‖ − k^T δx.

The relative error of the approximation is given by η = e/‖x0 − a‖. We know, of course, that the
absolute value of the relative error is very small provided δx is small. In many specific applications,
it is possible and useful to make a stronger statement, for example, to derive a bound on how
large the error can be. You will do that here. In fact you will prove that

0 ≤ η ≤ α²/2,

where α = ‖δx‖/‖x0 − a‖ is the relative size of δx. For example, for a relative displacement of
α = 1%, we have η ≤ 0.00005, i.e., the linearized model is accurate to about 0.005%. To prove
this bound you can proceed as follows:
• Show that η = −1 + √(1 + α² + 2β) − β, where β = k^T δx/‖x0 − a‖.
• Verify that |β| ≤ α.
• Consider the function g(β) = −1 + √(1 + α² + 2β) − β with |β| ≤ α. By maximizing and
minimizing g over the interval −α ≤ β ≤ α, show that

0 ≤ η ≤ α²/2.
Solution.
(a) For the linearized model we have

δy = (∂y/∂x) δx,

so all we have to do is compute the matrix ∂y/∂x. Since y = ‖x − a‖ we have y² = (x − a)^T (x − a),
and differentiating both sides with respect to x gives

2y (∂y/∂x) = 2(x − a)^T,

and therefore

∂y/∂x = (x − a)^T / y = (x − a)^T / ‖x − a‖.

Evaluating at x = x0 gives δy = k^T δx with k = (x0 − a)/‖x0 − a‖. Clearly, k points from a to x0
and is of length one, since

k^T k = (x0 − a)^T (x0 − a) / ‖x0 − a‖² = 1.
[Figure (n = 2): the beacon a; the point x0 at distance ‖x0 − a‖ from a; the unit vector k pointing from a to x0; the displacement δx ending at x, at distance ‖x − a‖ from a; and the projection k^T δx of δx onto k.]
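(As a quick numerical sanity check of part (a), not part of the original solution: the following Python sketch compares the true range change with the linearized prediction k^T δx; the beacon a, point x0, and step δx are arbitrary choices.)

import numpy as np

a = np.array([0.0, 0.0])                 # beacon (hypothetical test value)
x0 = np.array([3.0, 4.0])                # linearization point
dx = 1e-3 * np.array([1.0, -2.0])        # small displacement

k = (x0 - a) / np.linalg.norm(x0 - a)    # unit vector pointing from a to x0
dy_true = np.linalg.norm(x0 + dx - a) - np.linalg.norm(x0 - a)
dy_lin = k @ dx                          # linearized model: dy = k^T dx
print(dy_true, dy_lin)                   # the two values agree to first order in dx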
(b) • First we show that η = −1 + √(1 + α² + 2β) − β, where β = k^T δx/‖x0 − a‖. Note that

e = ‖x0 + δx − a‖ − ‖x0 − a‖ − k^T δx
  = ‖x0 − a‖ ( ‖ (x0 − a)/‖x0 − a‖ + δx/‖x0 − a‖ ‖ − 1 − k^T δx/‖x0 − a‖ ),

and after dividing both sides by ‖x0 − a‖ and using k = (x0 − a)/‖x0 − a‖, β = k^T δx/‖x0 − a‖,
and η = e/‖x0 − a‖, we get

η = ‖ k + δx/‖x0 − a‖ ‖ − 1 − β.   (1)

But

‖ k + δx/‖x0 − a‖ ‖ = √( (k + δx/‖x0 − a‖)^T (k + δx/‖x0 − a‖) )
                    = √( ‖k‖² + 2 k^T δx/‖x0 − a‖ + ‖δx‖²/‖x0 − a‖² )
                    = √(1 + 2β + α²),

so substituting into (1) gives η = √(1 + 2β + α²) − 1 − β, as claimed.
• It is easy to see that |β| ≤ α: simply apply the Cauchy–Schwarz inequality to the vectors
k and δx/‖x0 − a‖, i.e.,

|β| = |k^T δx| / ‖x0 − a‖ ≤ ‖k‖ ‖δx‖ / ‖x0 − a‖ = α,

since ‖k‖ = 1.
• At this point, all we need to do to derive a bound on how large the error can be is to
maximize and minimize the function g(β) = √(1 + 2β + α²) − 1 − β over the interval |β| ≤ α,
i.e., −α ≤ β ≤ α. The maximum or minimum of a smooth function g over a given interval
−α ≤ β ≤ α can only occur at the endpoints of the interval (β = ±α) or at stationary points
(points β with g′(β) = 0). For g(β) we have:
– Value at endpoint β = α:

g(α) = √(1 + 2α + α²) − 1 − α
     = √((1 + α)²) − 1 − α    (note: 1 + α > 0 since α ≥ 0)
     = 1 + α − 1 − α
     = 0.
– Value at endpoint β = −α:

g(−α) = √(1 − 2α + α²) − 1 + α
      = √((1 − α)²) − 1 + α
      = |1 − α| − (1 − α)
      = { 0,          0 ≤ α ≤ 1
        { 2(α − 1),   α > 1.

Therefore g(−α) ≥ 0 for all α, because 2(α − 1) > 0 for α > 1.
– Value at the stationary point:

g′(β) = 1/√(1 + 2β + α²) − 1.

Setting g′(β) = 0 we get √(1 + 2β + α²) = 1, or 1 + 2β + α² = 1, and therefore β_ex = −α²/2.
The function value at the stationary point β_ex = −α²/2 is

g(β_ex) = √(1 − α² + α²) − 1 + α²/2
        = α²/2.
Clearly, g(β) ≥ 0 for all β satisfying |β| ≤ α, because the values of g at the endpoints β = ±α
and at the stationary point β_ex = −α²/2 are all non-negative. Thus we have established the lower
bound on the relative error η, i.e., we have shown that η ≥ 0. For the upper bound we need to be
a bit more careful. The maximum of g is either g(α), g(−α), or g(β_ex). First note that
g(α) = 0 is always less than or equal to g(β_ex) = α²/2 ≥ 0, so g(α) is immediately
ruled out as the maximum of g. Now consider g(−α) and g(β_ex). For 0 ≤ α ≤ 1 we obviously
have g(β_ex) ≥ g(−α) = 0. For α > 1 we also have g(β_ex) ≥ g(−α) = 2(α − 1), because
α²/2 ≥ 2(α − 1) is equivalent to α² − 4α + 4 ≥ 0, which is true since α² − 4α + 4 = (α − 2)²
is a perfect square. Thus we obtain the upper bound g(β) ≤ α²/2 for all β satisfying |β| ≤ α.
Therefore we have shown that η ≤ α²/2 and we are done.
(Note: when β_ex falls outside the interval |β| ≤ α, it is possible to obtain a tighter upper
bound for g. In this case, the maximum of g over |β| ≤ α is attained at the endpoint
β = −α. The stationary point β_ex = −α²/2 falls outside |β| ≤ α when α²/2 > α, i.e., α > 2.
Therefore, a tighter upper bound on η for α > 2 is η ≤ g(−α) = 2(α − 1).)
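(As a further check, not part of the original solution: this Python sketch draws random displacements and verifies the bound 0 ≤ η ≤ α²/2 numerically; the dimension, seed, and step size are arbitrary choices.)

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)
x0 = rng.standard_normal(3)
k = (x0 - a) / np.linalg.norm(x0 - a)

for _ in range(1000):
    dx = 0.05 * rng.standard_normal(3)            # random small displacement
    e = np.linalg.norm(x0 + dx - a) - np.linalg.norm(x0 - a) - k @ dx
    eta = e / np.linalg.norm(x0 - a)              # relative error of the linearization
    alpha = np.linalg.norm(dx) / np.linalg.norm(x0 - a)
    assert -1e-12 <= eta <= alpha**2 / 2 + 1e-12  # 0 <= eta <= alpha^2/2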
2. Halfspace. Suppose a, b ∈ R^n are two given points. Show that the set of points in R^n that are closer
to a than b is a halfspace, i.e.,

{ x | ‖x − a‖ ≤ ‖x − b‖ } = { x | c^T x ≤ d }

for appropriate c ∈ R^n and d ∈ R. Give c and d explicitly, and draw a picture showing a, b, c, and the
halfspace.
Solution. It is easy to see geometrically what is going on: the hyperplane that goes right between a
and b splits R^n into two parts: the points closer to a (than b) and the points closer to b (than a). More
precisely, the hyperplane is normal to the line through a and b, and intersects that line at the midpoint
between a and b. Now that we have the idea, let's derive it algebraically. Let x belong to the
set of points in R^n that are closer to a than b. Then ‖x − a‖ ≤ ‖x − b‖, or equivalently ‖x − a‖² ≤ ‖x − b‖², so

x^T x − x^T a − a^T x + a^T a ≤ x^T x − x^T b − b^T x + b^T b,

or

−2a^T x + a^T a ≤ −2b^T x + b^T b,

and finally

(b − a)^T x ≤ (1/2)(b^T b − a^T a).   (2)

Thus (2) is in the form c^T x ≤ d with c = b − a and d = (1/2)(b^T b − a^T a), and therefore we have shown that
the set of points in R^n that are closer to a than b is a halfspace. Note that the hyperplane c^T x = d is
perpendicular to c = b − a.
[Figure: the hyperplane c^T x = d bisects the segment from a to b perpendicularly; the halfspace c^T x ≤ d (containing a) is the set of points closer to a, and c^T x > d is the set of points closer to b.]
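(For a numerical illustration, not part of the original solution: the sketch below forms c and d as derived above for an arbitrary pair a, b, and checks on random points that the norm condition and the halfspace condition agree.)

import numpy as np

a = np.array([1.0, 2.0])             # arbitrary test points
b = np.array([4.0, 0.0])
c = b - a                            # normal of the separating hyperplane
d = 0.5 * (b @ b - a @ a)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = 5 * rng.standard_normal(2)
    closer_to_a = np.linalg.norm(x - a) <= np.linalg.norm(x - b)
    assert closer_to_a == (c @ x <= d)   # the two descriptions coincide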
3. Some properties of the product of two matrices. For each of the following statements, either show that
it is true, or give a (specific) counterexample.
• If AB is full rank then A and B are full rank.
• If A and B are full rank then AB is full rank.
• If A and B have zero nullspace, then so does AB.
• If A and B are onto, then so is AB.
You can assume that A ∈ R^{m×n} and B ∈ R^{n×p}. Some of the false statements above become true
under certain assumptions on the dimensions of A and B. As a trivial example, all of the statements
above are true when A and B are scalars, i.e., n = m = p = 1. For each of the statements above, find
conditions on n, m, and p that make them true. Try to find the most general conditions you can. You
can give your conditions as inequalities involving n, m, and p, or you can use more informal language
such as “A and B are both skinny.”
Solution. First note that an m × n matrix is full rank if and only if the maximum number of
independent columns or rows is equal to min{m, n}.
• If AB is full rank then A and B are full rank. False. Consider the following counterexample:

A = [ 1  0 ],   B = [ 1  1 ],   AB = [ 1  1 ].
                    [ 0  0 ]

Here AB and A are full rank, but B is not. We now look for conditions on the dimensions under
which the statement does hold, by considering the possible shape combinations.
• A fat, B skinny, AB fat (or A fat, B skinny, AB skinny). The statement is not true in this case.
Consider:

A = [ 1  0  0 ],   B = [ 1  1 ],   AB = [ 1  1 ],
                       [ 1  1 ]
                       [ 0  0 ]

where A and AB are full rank but B is not.
However, if we add the constraint that AB is square, then the statement becomes correct. To
show this we use the facts that for a full rank fat matrix A all rows are independent, so x^T A = 0
implies x = 0, and for a full rank skinny matrix B all columns are independent, so Bx = 0 implies
x = 0. We first prove by contradiction that AB full rank implies that A is full rank. If A
(fat) is not full rank, then there exists an x ≠ 0 such that x^T A = 0, and therefore x^T AB = 0.
This implies that the rows of AB (a square matrix) are dependent, which is impossible since AB
is full rank, and we are done. Now we prove that B must be full rank as well. If B (skinny) is
not full rank, then Bx = 0 for some x ≠ 0, which implies that ABx = 0, i.e., the columns of AB (a full
rank square matrix) are dependent, which is a contradiction. Hence B is full rank too and we are
done.
• A skinny, B fat, AB fat (or A skinny, B fat, AB skinny). The statement is true in this case.
First note that if AB is full rank then A must be square. We have

rank(AB) ≤ min{rank(A), rank(B)},

and since A is skinny and B is fat, rank(A) ≤ n and rank(B) ≤ n, and therefore

rank(AB) ≤ n.
Now since AB is full rank and fat, rank(AB) = m, so m ≤ n. However, A is skinny, so m ≥ n,
and therefore we can only have m = n, i.e., A is square. Now it is easy to prove that AB full
rank implies that A and B are full rank. We first prove that A is full rank by contradiction.
Suppose that A (square) is not full rank, so there exists a non-trivial linear combination of its
rows that is equal to zero, i.e., x ≠ 0 with x^T A = 0. Therefore x^T AB = 0, which implies that a
linear combination of the rows of AB (a fat matrix) is zero, which is impossible because AB is full
rank. This shows that A must be full rank. Now we show that B must be full rank as well.
Since A is full rank and square, A⁻¹ exists, so B = A⁻¹(AB). Suppose that B (fat) is not
full rank, so there exists an x ≠ 0 such that x^T B = 0 and therefore x^T A⁻¹(AB) = 0. But x^T A⁻¹
is nonzero because x is nonzero and A⁻¹ is invertible, which implies that a linear combination of
the rows of AB (a full rank fat matrix) is zero. This is of course impossible, and we have shown
by contradiction that B must be full rank, and we are done.
• A fat, B fat, AB skinny (or A skinny, B skinny, AB fat.) If A is fat, B is fat and AB is skinny,
then A, B and AB can only be square matrices. A being fat implies that m ≤ n and B being fat
implies that n ≤ p and we get p ≥ m. However, p ≤ m because AB is skinny, so we can only have
m = p, and therefore m = n as well. In other words, A, B and AB are square. As a result, this
case (A square, B square, AB square) falls into the previous category (A skinny, B fat, AB fat)
and hence the statement is true.
To summarize, the most general conditions for the statement to be true are:
• A fat, B skinny, AB square,
• A square, B fat, AB fat,
• A skinny, B square, AB skinny.
Comment: Another way to do this part. The following inequalities are always true, regardless of the sizes of A, B, and AB:

rank(AB) ≤ min{rank(A), rank(B)},   rank(A) ≤ min{m, n},   rank(B) ≤ min{n, p}.

Now, with the three numbers m, n, and p, there are six different orderings. However, as mentioned before,
we only need to check three cases, since the other three can be obtained by taking transposes. Using
the above inequalities in each case (assuming AB is full rank), we get:
• m ≤ n ≤ p: rank(A) = m, m ≤ rank(B) ≤ n
Thus in this case A will be full rank, but we can’t say anything about B. The only way to be
able to infer that B is also full rank is to have m = n. So the claim will be true if m = n ≤ p.
• m ≤ p ≤ n: rank(A) = m, m ≤ rank(B) ≤ p
Similar to the previous case, to be able to infer both A and B are full rank, we should have m = p.
So the condition in this case will be m = p ≤ n.
• n ≤ m ≤ p: m ≤ rank(A) ≤ n, but n ≤ m, so we must have m = n ≤ p, yielding rank(A) =
rank(B) = m.
Therefore, the most general conditions where the claim is true are:
m = n ≤ p, n = p ≤ m, m = p ≤ n
These are the same conditions as the ones obtained before.
Now we consider the second statement: “If A and B are full rank then AB is full rank.” Again we
consider different cases:
• A fat, B fat, AB fat (or A skinny, B skinny, AB skinny). The statement is true in this case.
Since AB is fat, we need to prove that x^T AB = 0 implies x = 0. But this is easy: x^T AB = 0
implies x^T A = 0 (because B is fat and full rank), and x^T A = 0 implies x = 0 (because
A is fat and full rank), and we are done.
• A fat, B skinny, AB fat (or A fat, B skinny, AB skinny). The statement is not true in this case.
Consider the counterexample:

A = [ 1  0 ],   B = [ 0 ],   AB = [ 0 ],
                    [ 1 ]

where A and B are full rank but AB = [0] is not.
• A skinny, B fat, AB fat (or A skinny, B fat, AB skinny). The statement is not true in this case.
Consider:

A = [ 1 ],   B = [ 1  0 ],   AB = [ 1  0 ],
    [ 0 ]                         [ 0  0 ]

where A and B are full rank but AB is not.
• A fat, B fat, AB skinny (or A skinny, B skinny, AB fat.) As shown previously, if A is fat, B is
fat and AB is skinny, then A, B and AB can only be square matrices. Therefore, this case falls
into the category of A fat, B fat, AB fat for which the statement is true.
To summarize, the statement is true exactly in the following cases:
• A fat, B fat, AB fat,
• A skinny, B skinny, AB skinny.
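(The following Python sketch, not part of the original solution, verifies the counterexamples above by computing ranks numerically.)

import numpy as np

def full_rank(M):
    # A matrix is full rank when its rank equals min(#rows, #cols).
    return np.linalg.matrix_rank(M) == min(M.shape)

# AB full rank does not imply B full rank:
A = np.array([[1.0, 0.0]])
B = np.array([[1.0, 1.0], [0.0, 0.0]])
print(full_rank(A @ B), full_rank(A), full_rank(B))   # True True False

# A and B full rank does not imply AB full rank (A fat, B skinny):
A = np.array([[1.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(full_rank(A), full_rank(B), full_rank(A @ B))   # True True False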
4. Some true/false questions. Determine if the following statements are true or false. No justification
or discussion is needed for your answers. What we mean by “true” is that the statement is true for
all values of the matrices and vectors given. You can’t assume anything about the dimensions of
the matrices (unless it’s explicitly stated), but you can assume that the dimensions are such that all
expressions make sense. For example, the statement “A + B = B + A” is true, because no matter what
the dimensions of A and B (which must, however, be the same), and no matter what values A and
B have, the statement holds. As another example, the statement A² = A is false, because there are
(square) matrices for which this doesn’t hold. (There are also matrices for which it does hold, e.g., an
identity matrix. But that doesn’t make the statement true.)
a. If all coefficients (i.e., entries) of the matrix A are positive, then A is full rank.
Solution. False. The matrix

A = [ 1  1 ]
    [ 1  1 ]

has all entries positive and is singular, hence not full rank.
b. If A and B are onto, then A + B must be onto.
Solution. False. The 1 × 1 matrix A = 1 is onto, and so is the matrix B = −1. But
A + B = 0 (the 1 × 1 zero matrix), which is not onto.
c. If A and B are onto, then so is the matrix

[ A  C ]
[ 0  B ].

Solution. True. To show this matrix is onto, we need to show that we can solve the equations

[ y1 ]   [ A  C ] [ x1 ]
[ y2 ] = [ 0  B ] [ x2 ]

for any y1 and y2. (These are all vectors.) The bottom block row is y2 = Bx2. Using the fact
that B is onto, we can find at least one x2 such that y2 = Bx2. The top block row is

y1 = Ax1 + Cx2.

Since A is onto, we can find at least one x1 such that Ax1 = y1 − Cx2. Together, (x1, x2) solves
the equations, so the matrix is onto.
d. If A and B are onto, then so is the matrix

[ A ]
[ B ].

Solution. False. Let A and B both be the 1 × 1 matrix 1. These are each onto, but

[ A ]   [ 1 ]
[ B ] = [ 1 ]

is not.
e. If the matrix

[ A ]
[ B ]

is onto, then so are the matrices A and B.

Solution. True. To say that

[ A ]
[ B ]

is onto means that for any vector y, we can find at least one x that satisfies

y = [ A ] x.
    [ B ]

Let's use this to show that A and B are both onto. First let's consider the equation z = Au. We
can solve this by finding an x that satisfies

[ z ]   [ A ]
[ 0 ] = [ B ] x.
This means that Ax = z (and Bx = 0, which we do not need), so u = x solves z = Au, and A is
onto. The same argument applied to the right-hand side [ 0 ; z ] shows that B is onto.

f. If A has zero nullspace, then so does the matrix

[ A ]
[ B ].

Solution. True. Suppose

[ A ] x = 0.
[ B ]

This means that Ax = 0 and Bx = 0. From the first, we conclude that x = 0. This shows that

[ A ]
[ B ]

has zero nullspace, i.e., is full rank.
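(The following Python sketch, not part of the original solution, checks parts b and d numerically, testing onto-ness via the rank; the 1 × 1 examples are the ones used above.)

import numpy as np

def is_onto(M):
    # M is onto iff its rows are independent, i.e., rank(M) = #rows.
    return np.linalg.matrix_rank(M) == M.shape[0]

A = np.array([[1.0]])
B = np.array([[-1.0]])
print(is_onto(A), is_onto(B), is_onto(A + B))      # True True False (part b)

B = np.array([[1.0]])
print(is_onto(np.vstack([A, B])))                  # False: [A; B] = [1; 1] (part d)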
5. Orthogonal complement. Suppose V ⊆ R^n is a subspace. The orthogonal complement of V is defined as

V⊥ = { x | ⟨x, y⟩ = 0, ∀y ∈ V }.

(a) Verify that V⊥ is a subspace of R^n.
(b) Suppose V = range(V) for a matrix V with columns v1, . . . , vk. Express V⊥ in terms of V.
(c) Show that every x ∈ R^n can be decomposed uniquely as x = v + v⊥ with v ∈ V and v⊥ ∈ V⊥.
(d) Show that dim V + dim V⊥ = n.
(e) Show that if V ⊆ U, then U⊥ ⊆ V⊥.
Solution.
(a) We do not need to check all properties of a vector space for V⊥, since many of them hold
automatically because V⊥ ⊆ R^n and the vector sum and scalar product over V⊥ and R^n are
the same. We only need to verify the following properties:
• 0 ∈ V⊥;
• ∀x1, x2 ∈ V⊥: x1 + x2 ∈ V⊥;
• ∀α ∈ R, ∀x ∈ V⊥: αx ∈ V⊥.
The first property comes from the fact that ⟨0, y⟩ = 0 for all y ∈ R^n, and therefore 0 ∈ V⊥.
To verify the second property, we pick two arbitrary elements x1 and x2 in V⊥ and show that
x1 + x2 ∈ V⊥. Let y be any vector in V. We have

⟨x1 + x2, y⟩ = ⟨x1, y⟩ + ⟨x2, y⟩
             = 0 + 0    (since x1, x2 ∈ V⊥)
             = 0,

so x1 + x2 ∈ V⊥. For the third property, let x ∈ V⊥, α ∈ R, and let y be any vector in V. Then

⟨αx, y⟩ = α⟨x, y⟩
        = α · 0    (since x ∈ V⊥)
        = 0,

so αx ∈ V⊥.
(b) A vector x is in V⊥ if and only if x is orthogonal to each column vi of V, i.e., vi^T x = 0 for
i = 1, . . . , k, or V^T x = 0. Therefore V⊥ = N(V^T).
(c) Suppose that w1, w2, . . . , wk is an orthonormal basis for V. Consider the projection of x on V,
i.e.,

v := (w1^T x)w1 + (w2^T x)w2 + · · · + (wk^T x)wk.

Clearly, v ∈ V because it is a linear combination of the basis vectors wi. Now we show that x − v
(the projection error) is an element of V⊥. To do this we only have to verify that x − v ⊥ wi, or
wi^T (x − v) = 0, for i = 1, . . . , k. This is easy because

wi^T (x − v) = wi^T x − wi^T v = wi^T x − wi^T x = 0,

where wi^T v = wi^T x follows from orthonormality (wi^T wj = 1 for i = j and 0 otherwise). To show
uniqueness, suppose x = v1 + v1⊥ = v2 + v2⊥ with v1, v2 ∈ V and v1⊥, v2⊥ ∈ V⊥. Then
v1 − v2 = v2⊥ − v1⊥ belongs to both V and V⊥, and is therefore orthogonal to itself. Hence

(v1 − v2)^T (v1 − v2) = ‖v1 − v2‖² = 0

and

(v1⊥ − v2⊥)^T (v1⊥ − v2⊥) = ‖v1⊥ − v2⊥‖² = 0,

so v1 = v2 and v1⊥ = v2⊥, i.e., the decomposition is unique.
(d) This follows from the previous part. In part (c) we showed that any vector in R^n can be expressed
as the sum of an element of V and an element of V⊥. Therefore, if {w1, . . . , wk} is a basis for V and
{u1, . . . , ul} is a basis for V⊥, then for arbitrary x ∈ R^n there exist scalars αi and βi such that

x = α1 w1 + · · · + αk wk + β1 u1 + · · · + βl ul,

i.e., the set of vectors {w1, . . . , wk, u1, . . . , ul} spans R^n. Moreover, each wi is orthogonal to each
uj by the definition of V⊥, so the combined set {w1, . . . , wk, u1, . . . , ul} is independent. Since this
set spans R^n and is independent, we get

dim V + dim V⊥ = k + l = n.
(e) To show that U⊥ ⊆ V⊥, we take an arbitrary element x ∈ U⊥ and prove that x ∈ V⊥. Since
x ∈ U⊥, we have x ⊥ y for all y ∈ U. But V ⊆ U, so in particular x ⊥ y for all y ∈ V. By the
definition of V⊥, this says precisely that x ∈ V⊥, and we are done.
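(As a numerical illustration of parts (b)-(d), not part of the original solution: the sketch below computes an orthonormal basis of V⊥ as N(V^T) and forms the projection from part (c); the matrix V is an arbitrary random choice.)

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
V = rng.standard_normal((5, 2))    # columns span a 2-dimensional subspace V of R^5
U = null_space(V.T)                # orthonormal basis of V-perp = N(V^T), part (b)

print(np.allclose(V.T @ U, 0))     # every basis vector of V-perp is orthogonal to V
print(V.shape[1] + U.shape[1])     # dim V + dim V-perp = 5, part (d)

x = rng.standard_normal(5)
Q, _ = np.linalg.qr(V)             # orthonormal basis w_i of V
v = Q @ (Q.T @ x)                  # projection of x onto V, part (c)
print(np.allclose(V.T @ (x - v), 0))   # the residual x - v lies in V-perp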
6. Temperatures in a multi-core processor. We are concerned with the temperature of a processor at two
critical locations. These temperatures, denoted T = (T1, T2) (in degrees C), are affine functions of the
power dissipated by three processor cores, denoted P = (P1, P2, P3) (in W). We make 4 measurements.
In the first, all cores are idling, and dissipate 10 W. In the next three measurements, one of the cores
is set to full power, 100 W, and the other two are idling. In each experiment we measure and note the
temperatures at the two critical locations.
P1      P2      P3      T1    T2
10 W    10 W    10 W    27°   29°
100 W   10 W    10 W    45°   37°
10 W    100 W   10 W    41°   49°
10 W    10 W    100 W   35°   55°
Suppose we operate all cores at the same power, p. How large can we make p, without T1 or T2
exceeding 70°?
You must fully explain your reasoning and method, in addition to providing the numerical solution.
Solution. The temperature vector T is an affine function of the power vector P, i.e., we have T =
AP + b for some matrix A ∈ R^{2×3} and some vector b ∈ R^2. Once we find A and b, we can predict the
temperature T for any value of P.
The first approach is to (somewhat laboriously) write equations describing the measurements in terms
of the elements of A. Let aij denote the (i, j) entry of A. We can write out the relations T = AP + b
for the 4 experiments listed above as the set of 8 equations
10a11 + 10a12 + 10a13 + b1 = 27,     10a21 + 10a22 + 10a23 + b2 = 29,
100a11 + 10a12 + 10a13 + b1 = 45,    100a21 + 10a22 + 10a23 + b2 = 37,
10a11 + 100a12 + 10a13 + b1 = 41,    10a21 + 100a22 + 10a23 + b2 = 49,
10a11 + 10a12 + 100a13 + b1 = 35,    10a21 + 10a22 + 100a23 + b2 = 55.
Next, we define a vector of unknowns, x = (a11, a12, a13, a21, a22, a23, b1, b2) ∈ R^8. We rewrite the 8
equations above as Cx = d, where

    [  10  10  10    0   0   0   1  0 ]        [ 27 ]
    [   0   0   0   10  10  10   0  1 ]        [ 29 ]
    [ 100  10  10    0   0   0   1  0 ]        [ 45 ]
C = [   0   0   0  100  10  10   0  1 ],   d = [ 37 ].
    [  10 100  10    0   0   0   1  0 ]        [ 41 ]
    [   0   0   0   10 100  10   0  1 ]        [ 49 ]
    [  10  10 100    0   0   0   1  0 ]        [ 35 ]
    [   0   0   0   10  10 100   0  1 ]        [ 55 ]
We solve for x as x = C⁻¹d. (It turns out that C is invertible.) Putting the entries of x into the
appropriate places in A and b, we have

A = [ 0.200  0.156  0.089 ],   b = [ 22.6 ].
    [ 0.089  0.222  0.289 ]        [ 23.0 ]
At this point we can predict T for any P (assuming we trust the affine model).
Substituting P = (p, p, p) into T = AP + b, we get

T1 = 0.444p + 22.6,   T2 = 0.600p + 23.0.

Both of these temperatures are increasing in p (it would be quite surprising if this were not the case).
The value of p for which T1 = 70 is p = (70 − 22.6)/0.444 = 106.8 W. The value of p for which T2 = 70
is p = (70 − 23)/0.6 = 78.3 W. Thus, the maximum value of p for which neither temperature
exceeds 70° is p = 78.3 W.
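(For reference, not part of the original solution: this Python sketch solves the same linear system in an equivalent, more compact 4 × 4 form rather than the 8 × 8 system Cx = d, and recovers A, b, and the maximum power.)

import numpy as np

# Powers for the 4 experiments (rows) and the measured temperatures.
P = np.array([[10, 10, 10], [100, 10, 10], [10, 100, 10], [10, 10, 100]], float)
T = np.array([[27, 29], [45, 37], [41, 49], [35, 55]], float)

# Row i of T satisfies T_i = P_i A^T + b^T, i.e., [P 1] @ [A^T; b^T] = T.
M = np.hstack([P, np.ones((4, 1))])   # 4 x 4, invertible for this data
Ab = np.linalg.solve(M, T)            # rows: a_1j, a_2j, a_3j, then b_j
A, b = Ab[:3].T, Ab[3]

# With all cores at power p, T = p * A.sum(axis=1) + b; keep both below 70.
p_max = np.min((70 - b) / A.sum(axis=1))
print(A.round(3), b.round(1), p_max)  # p_max is about 78.3 W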
Alternative solution. Another way of solving this problem is to directly exploit the fact that T is
an affine function of P . This means that if we form any linear combination of the power vectors used
in the experiment, with the coefficients summing to one, the temperature vector will also be the same
linear combination of the temperatures.
By averaging the last three experiments we find that if the powers are P = (40, 40, 40), then the temperature
vector is T = (40.33, 47.00). (Note that this is really a prediction, based on the observed experimental
data and the affineness assumption; it is not a new experiment!)
Now we form a new power vector of the form

P = (1 − θ)(10, 10, 10) + θ(40, 40, 40) = (10 + 30θ, 10 + 30θ, 10 + 30θ),

where θ ∈ R. The coefficients 1 − θ and θ sum to one, so since T is affine, we find that the corresponding
temperature vector is

T = (1 − θ)(27, 29) + θ(40.33, 47.00) = (27 + 13.33θ, 29 + 18θ),

just as above. The first component hits 70 at θ = 3.22; the second component hits 70 at θ = 2.28. Thus,
θ can be as large as θ = 2.28. This corresponds to the powers P = (78.3, 78.3, 78.3).