A First Course in Optimization: Answers To Selected Exercises
Charles L. Byrne
Department of Mathematical Sciences
University of Massachusetts Lowell
Lowell, MA 01854
Contents

1 Preface
2 Introduction
   2.1 Exercise
3 Optimization Without Calculus
   3.1 Exercises
4 Geometric Programming
   4.1 Exercise
5 Convex Sets
   5.1 References Used in the Exercises
   5.2 Exercises
6 Linear Programming
   6.1 Needed References
   6.2 Exercises
7 Matrix Games and Optimization
8 Differentiation
   8.1 Exercises
9 Convex Functions
   9.1 Exercises
10 Fenchel Duality
   10.1 Exercises
11 Convex Programming
   11.1 Referenced Results
   11.2 Exercises
12 Iterative Optimization
   12.1 Referenced Results
   12.2 Exercises
13 Modified-Gradient Algorithms
14 Quadratic Programming
15 Solving Systems of Linear Equations
   15.2 Exercises
16 Conjugate-Direction Methods
   16.1 Proofs of Lemmas
   16.2 Exercises
17 Sequential Unconstrained Minimization Algorithms
   17.1 Exercises
18 Likelihood Maximization
19 Operators
   19.1 Referenced Results
   19.2 Exercises
20 Convex Feasibility and Related Problems
   20.1 A Lemma
   20.2 Exercises
Preface
In the chapters that follow you will find solutions to many, but not all,
of the exercises in the text. Some chapters in the text have no exercises,
but those chapters are included here to keep the chapter numbers the same
as in the text. Please note that the numbering of exercises within each
chapter may differ from the numbering in the text itself, because certain
exercises have been skipped.
Chapter 2
Introduction
2.1 Exercise
2.1 For n = 1, 2, ..., let

A_n = {x | ‖x − a‖ ≤ 1/n},

and let α_n and β_n be defined by

α_n = inf{f(x) | x ∈ A_n}

and

β_n = sup{f(x) | x ∈ A_n},

with α = lim_{n→∞} α_n and β = lim_{n→∞} β_n.

• d) Show that, if {x_m} converges to a and {f(x_m)} converges to γ, then

α ≤ γ ≤ β.
• e) Show that

α = lim inf_{x→a} f(x)

and

β = lim sup_{x→a} f(x).
According to the hint in a), we use the fact that A_{n+1} ⊆ A_n. For b), since α_n is the infimum, there must be a member of A_n, call it x_n, such that

α_n ≤ f(x_n) ≤ α_n + 1/n.

Then the sequence {x_n} converges to a and the sequence {f(x_n)} converges to α.
Now suppose that {x_m} converges to a and {f(x_m)} converges to γ. We don't know that each x_m is in A_m, but we do know that, for each n, there must be a term of the sequence, call it x_{m_n}, that is within 1/n of a, that is, x_{m_n} is in A_n. From c) we have

α_n ≤ f(x_{m_n}) ≤ β_n.

Letting n → ∞, we obtain

α ≤ γ ≤ β.
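As a concrete numerical illustration (my own, not from the text), take f(x) = sin(1/x) and a = 0, for which the lim inf is −1 and the lim sup is 1; sampling A_n shows the infima and suprema approaching these values:

    import numpy as np

    f = lambda x: np.sin(1.0 / x)

    for n in [1, 10, 100]:
        # sample the positive half of A_n = {x : |x - 0| <= 1/n}, avoiding x = 0;
        # since 1/x sweeps many periods, the samples reveal the inf and sup
        x = np.linspace(1e-6, 1.0 / n, 200001)
        print(n, f(x).min(), f(x).max())   # close to alpha_n = -1 and beta_n = +1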
Chapter 3

Optimization Without Calculus

3.1 Exercises
3.1 Let A be the arithmetic mean of a finite set of positive numbers, with x the smallest of these numbers, and y the largest. Show that

xy ≤ A(x + y − A).

Since x ≤ A ≤ y, we have

A² − A(x + y) + xy = (A − x)(A − y) ≤ 0,

which rearranges to xy ≤ A(x + y − A).
3.2 Minimize the function

f(x) = x² + 1/x² + 4x + 4/x,

over positive x. Note that the minimum value of f(x) cannot be found by a straight-forward application of the AGM Inequality to the four terms taken together. Try to find a way of rewriting f(x), perhaps using more than four terms, so that the AGM Inequality can be applied to all the terms.
The product of the four terms is 16, whose fourth root is 2, so the AGM Inequality tells us that f(x) ≥ 8, with equality if and only if all four terms are equal. But it is not possible for all four terms to be equal. Instead, we can write

f(x) = x² + 1/x² + x + 1/x + x + 1/x + x + 1/x + x + 1/x.

These ten terms have a product of 1, which tells us that

f(x) ≥ 10,

with equality if and only if all ten terms equal 1, that is, if and only if x = 1; indeed, f(1) = 10.
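A quick numerical check of this minimum (an illustration only, not part of the text):

    import numpy as np

    x = np.linspace(0.2, 5.0, 200001)
    f = x**2 + 1/x**2 + 4*x + 4/x
    print(f.min(), x[f.argmin()])   # approximately 10.0, attained near x = 1.0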
Write

f(x, y) = x·x·y = \frac{1}{45}(3x)(3x)(5y),

and

45 = 6x + 5y = 3x + 3x + 5y.

By the AGM Inequality,

(3x)(3x)(5y) ≤ \Big(\frac{3x + 3x + 5y}{3}\Big)³ = 15³,

so that f(x, y) ≤ 15³/45 = 75, with equality if and only if 3x = 3x = 5y = 15, that is, x = 5 and y = 3.
with equality if and only if (x, y) = c(3, −1) for some number c. Then solve
for c.
over non-negative x.
When we multiply everything out, we find that there are N ones, and N(N − 1)/2 pairs of the form

\frac{a_m}{a_n} + \frac{a_n}{a_m},

for m ≠ n. Each of these pairs has the form x + 1/x, which is always greater than or equal to two. Therefore, the entire product is greater than or equal to N + N(N − 1) = N², with equality if and only if all the a_n are the same.
and

S^{-1} = \frac{1}{λ_1}u^1(u^1)^T + \frac{1}{λ_2}u^2(u^2)^T + ... + \frac{1}{λ_N}u^N(u^N)^T. (3.2)
With the eigenvalues of Q ordered as λ_1 ≥ ... ≥ λ_N > 0, we must show that x^T Qx ≤ λ_1 whenever

1 = ‖x‖² = x^T x = x^T Ix.
We can write

\begin{pmatrix} 4 & 6 & 12 \\ 6 & 9 & 18 \\ 12 & 18 & 36 \end{pmatrix} = 49 (2/7, 3/7, 6/7)^T (2/7, 3/7, 6/7),

so that

(2x + 3y + 6z)² = |(x, y, z)(2, 3, 6)^T|² = (x, y, z) \begin{pmatrix} 4 & 6 & 12 \\ 6 & 9 & 18 \\ 12 & 18 & 36 \end{pmatrix} (x, y, z)^T.

The only positive eigenvalue of the matrix is 49, and the corresponding normalized eigenvector is (2/7, 3/7, 6/7)^T. Then use Exercise 3.11.
3.13 For positive numbers x and y, and p, q > 1 with 1/p + 1/q = 1, show that

xy ≤ \frac{x^p}{p} + \frac{y^q}{q},

with equality if and only if x^p = y^q. Hint: use the GAGM Inequality.
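One way to carry out the hint (a sketch, applying the GAGM Inequality with weights 1/p and 1/q, which sum to one, to the numbers x^p and y^q):

    xy = (x^p)^{1/p}(y^q)^{1/q} ≤ \frac{1}{p}x^p + \frac{1}{q}y^q,

with equality precisely when x^p = y^q.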
3.14 For given constants c and d, find the largest and smallest values of cx + dy taken over all points (x, y) of the ellipse

\frac{x²}{a²} + \frac{y²}{b²} = 1.

3.15 Find the largest and smallest values of 2x + y on the circle x² + y² = 1. Where do these values occur? What does this have to do with eigenvectors and eigenvalues?
Show that the dot product vec(A)·vec(B) = vec(B)^T vec(A) can be obtained by

vec(A)·vec(B) = trace(AB^T) = trace(B^T A).

Easy.
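A quick numerical sanity check of the trace identities (an illustration only; the test matrices are my own):

    import numpy as np

    rng = np.random.default_rng(0)
    A, B = rng.random((3, 4)), rng.random((3, 4))
    lhs = A.ravel() @ B.ravel()     # vec(A) . vec(B)
    print(np.isclose(lhs, np.trace(A @ B.T)),
          np.isclose(lhs, np.trace(B.T @ A)))   # True True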
Chapter 4
Geometric Programming
4.1 Exercise
4.1 Show that there is no solution to the problem of minimizing the function

g(t_1, t_2) = \frac{2}{t_1 t_2} + t_1 t_2 + t_1, (4.1)

over t_1 > 0, t_2 > 0. Can g(t_1, t_2) ever be smaller than 2√2?
Show that the only vector δ that works must have δ_3 = 0, which is not allowed.

To answer the second part, begin by showing that if t_2 ≤ 1 then g(t_1, t_2) ≥ 4, and if t_1 ≥ 1, then g(t_1, t_2) ≥ 3. We want to consider the case in which t_1 → 0, t_2 → ∞, and both t_1 t_2 and (t_1 t_2)^{-1} remain bounded. For example, we try

t_2 = \frac{1}{t_1}(ae^{-t_1} + 1),

for some positive a. Then g(t_1, t_2) → \frac{2}{a + 1} + a + 1 as t_1 → 0. Which a gives the smallest value? Minimizing the limit with respect to a gives a = √2 − 1, for which the limit is 2√2.
More generally, let

t_2 = \frac{f(t_1)}{t_1},

for some positive function f(t_1) such that f(t_1) does not go to zero as t_1 goes to zero. Then we have

g(t_1, t_2) = \frac{2}{f(t_1)} + f(t_1) + t_1,

which converges to \frac{2}{f(0)} + f(0) as t_1 → 0. The minimum of this limit, as a function of f(0), occurs when f(0) = √2, and the minimum limit is 2√2.
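A numerical illustration (my own, not from the text) of the limit along the curve used above, with a = √2 − 1:

    import numpy as np

    a = np.sqrt(2) - 1
    g = lambda t1, t2: 2/(t1*t2) + t1*t2 + t1

    for t1 in [1e-1, 1e-3, 1e-5]:
        t2 = (a*np.exp(-t1) + 1) / t1
        print(t1, g(t1, t2))   # decreases toward 2*sqrt(2) = 2.8284...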
Chapter 5
Convex Sets
5.2 Exercises
5.1 Let C ⊆ R^J, and let x^n, n = 1, ..., N, be members of C. For n = 1, ..., N, let α_n > 0, with α_1 + ... + α_N = 1. Show that, if C is convex, then the convex combination

α_1 x^1 + α_2 x^2 + ... + α_N x^N

is in C.
5.2 Prove Proposition 5.1. Hint: show that the set C is convex.

Note that it is not obvious that C is convex; we need to prove this fact. We need to show that if we take two convex combinations of members of S, say x and y, and form z = (1 − α)x + αy, then z is also a member of C. To be concrete, let

x = \sum_{n=1}^{N} β_n x^n,

and

y = \sum_{m=1}^{M} γ_m y^m,

where the x^n and y^m are in S and the positive β_n and γ_m both sum to one. Then

z = \sum_{n=1}^{N} (1 − α)β_n x^n + \sum_{m=1}^{M} αγ_m y^m,

whose coefficients (1 − α)β_n and αγ_m are positive and sum to (1 − α) + α = 1. So z is again a convex combination of members of S, and therefore lies in C.
5.3 Show that the subset of R^J consisting of all vectors x with ‖x‖_2 = 1 is not convex.

Don't make the problem harder than it is; all we need is one counterexample. Take x and −x, both with norm one. Then their convex combination ½(x + (−x)) = 0 does not have norm one.

Now x and y are given, and z = ½(x + y) is their midpoint. From the Parallelogram Law,

‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²,

we have

‖x + y‖² + ‖x − y‖² = 4,

so that

‖z‖² + ¼‖x − y‖² = 1,

from which we conclude that ‖z‖ < 1, unless x = y.
Suppose we remove some x with ‖x‖ = 1. In order for the set without x to fail to be convex it is necessary that there be some y ≠ x and z ≠ x in C for which x = (y + z)/2. But we know that if ‖y‖ = ‖z‖ = 1, then ‖x‖ < 1, unless y = z, in which case we would also have y = z = x. So we cannot have ‖y‖ = ‖z‖ = 1. But what if their norms are less than one? In that case, the norm of the midpoint of y and z is not greater than the larger of the two norms, hence less than one, so the midpoint cannot be x.
5.6 Prove that every subspace of R^J is convex, and every linear manifold is convex.

Show that

H(a, γ) = H(a, 0) + \frac{γ}{‖a‖²}a.
For (a), let x_0 be the center of the circle and x the point on its circumference. Then the normal cone at x is the set of all z of the form z = x + γ(x − x_0), for γ ≥ 0. For (b), if the point x lies on one of the sides of the rectangle, then the normal cone consists of the points along the outward ray through x normal to that side. If the point x is a corner, then the normal cone consists of all points in the intersection of the half-spaces external to the rectangle whose boundary lines are the two sides that meet at x.
Using

P_H x = x + (γ − ⟨a, x⟩)a,

we find that

P_H(x + y) = x + y + (γ − ⟨a, x + y⟩)a = [x + (γ − ⟨a, x⟩)a] + [y + (γ − ⟨a, y⟩)a] − γa.

We also have

b = γ_1 a^1 + γ_2 a^2 − γ_1⟨a^2, a^1⟩a^2.
We can then associate with the linear operator L the matrix
5.10 Let C be a convex set and f : C ⊆ R^J → (−∞, ∞]. Prove that f(x) is a convex function, according to Definition 5.1, if and only if, for all x and y in C, and for all 0 < α < 1, we have

f((1 − α)x + αy) ≤ (1 − α)f(x) + αf(y).
5.11 Given a point s in a convex set C, where are the points x for which s = P_C x?

These are precisely the points x satisfying

⟨x − s, c − s⟩ ≤ 0,

for all c in C.

The support function σ_C is defined by

σ_C(a) = \sup_{c ∈ C} ⟨a, c⟩.

Hints: Consider the unit vector \frac{1}{d}(x − P_C x), and use Cauchy's Inequality and Proposition 5.2.

Using that unit vector and Proposition 5.2, we find that d ≤ m.

From Cauchy's Inequality, we know that

⟨a, x − P_C x⟩ ≤ ‖x − P_C x‖‖a‖ = d,

for any vector a with norm one. Therefore, for any a with norm one, we have

d ≥ ⟨a, x⟩ − ⟨a, P_C x⟩ ≥ ⟨a, x⟩ − σ_C(a).

Therefore, we can conclude that d ≥ m, and so d = m.
• (b) Take A = {0, 1}, B = {0, 2} and C = {−1, 0, 1}. Then we have

A + C = {−1, 0, 1, 2},

and

B + C = {−1, 0, 1, 2, 3},

so that A + C ⊆ B + C, even though A is not a subset of B.

For the next part, note that

2A + C = (A + A) + C = A + (A + C) ⊆ A + (B + C) = (A + C) + B ⊆ (B + C) + B = (B + B) + C = 2B + C.

Continuing in this way, show that

nA + C ⊆ nB + C,

or

A + \frac{1}{n}C ⊆ B + \frac{1}{n}C,

for n = 1, 2, 3, .... Now select a point a in A; we show that a is in B. Since C is a bounded set, we know that there is a constant K > 0 such that ‖c‖ ≤ K, for all c in C. For each n there are c^n and d^n in C and b^n in B such that

a + \frac{1}{n}c^n = b^n + \frac{1}{n}d^n.

Then we have

b^n = a + \frac{1}{n}(c^n − d^n). (5.7)

Since C is bounded, we know that \frac{1}{n}(c^n − d^n) converges to zero, as n → ∞. Then, taking limits on both sides of Equation (5.7), we find that {b^n} → a, which must be in B, since B is closed.
Chapter 6
Linear Programming
6.2 Exercises
6.1 Prove Theorem 6.1 and its corollaries.
Since y is feasible, we know that A^T y ≤ c, and since x is feasible, we know that x ≥ 0 and Ax = b. Therefore, we have

w = b^T y = (Ax)^T y = x^T A^T y ≤ x^T c = c^T x = z.

This tells us that, if there are feasible x and y, then z = c^T x is bounded below by any w = b^T y, as we run over all feasible x. Consequently, if z is not bounded below, there cannot be any such w, so there can be no feasible y.

If both x and y are feasible and z = c^T x = b^T y = w, then we cannot decrease z by replacing x, nor can we increase w by replacing y. Therefore, x and y are the best we can do.
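A small numerical illustration of this duality (my own example, not from the text), with primal min c^T x subject to Ax = b, x ≥ 0, and dual max b^T y subject to A^T y ≤ c:

    import numpy as np
    from scipy.optimize import linprog

    A, b, c = np.array([[1.0, 1.0, 1.0]]), np.array([1.0]), np.array([1.0, 2.0, 3.0])
    primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)
    dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)])  # max b.y as min -b.y
    print(primal.fun, -dual.fun)   # both 1.0: z = w at the optimum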
6.3 Show that, when the simplex method has reached the optimal solution for the primal problem PS, the vector y with y^T = c_B^T B^{-1} becomes a feasible vector for the dual problem and is therefore the optimal solution for DS. Hint: Clearly, we have

z = c^T x = c_B^T B^{-1}b = y^T b = w,

so it suffices to show that y is feasible for DS, that is, that

c_N ≥ N^T y.

Then we have

A^T y = \begin{pmatrix} B^T \\ N^T \end{pmatrix} y = \begin{pmatrix} B^T y \\ N^T y \end{pmatrix} = \begin{pmatrix} B^T(B^T)^{-1}c_B \\ N^T y \end{pmatrix} ≤ \begin{pmatrix} c_B \\ c_N \end{pmatrix} = c,

so y is feasible for DS, and therefore optimal.
Chapter 7

Matrix Games and Optimization

Since

ẑ = c^T x̂ = μ = b^T ŷ = ŵ,

it follows that c^T p̂ = b^T q̂ = 1. We also have

x̂^T Aŷ ≤ x̂^T c = μ,

and

x̂^T Aŷ = (A^T x̂)^T ŷ ≥ b^T ŷ = μ,

so that

x̂^T Aŷ = μ,

and

p̂^T Aq̂ = \frac{1}{μ}.
For any probabilities p and q we have

p^T Aq̂ = \frac{1}{μ}p^T Aŷ ≤ \frac{1}{μ}p^T c = \frac{1}{μ},

and

p̂^T Aq = \frac{1}{μ}(A^T x̂)^T q ≥ \frac{1}{μ}b^T q = \frac{1}{μ}.
Chapter 8

Differentiation

8.1 Exercises
8.1 Let Q be a real, positive-definite symmetric matrix. Define the Q-inner product on R^J to be

⟨x, y⟩_Q = ⟨Qx, y⟩ = y^T Qx.

8.2 Let

f(x, y) = \frac{x^a y^b}{x^p + y^q},

with f(0, 0) = 0. In each of the five cases below, determine if the function is continuous, Gâteaux differentiable, Fréchet differentiable, or continuously differentiable at (0, 0).
• 1) a = 2, b = 3, p = 2, and q = 4;
• 2) a = 1, b = 3, p = 2, and q = 4;
• 3) a = 2, b = 4, p = 4, and q = 8;
• 4) a = 1, b = 2, p = 2, and q = 2;
• 5) a = 1, b = 2, p = 2, and q = 4.
Chapter 9
Convex Functions
9.1 Exercises
9.1 Prove Proposition 9.1.
The key idea here is to use α = (x − a)/(b − a), so that x = (1 − α)a + αb.
9.3 Show that, if x̂ minimizes the function g(x) over all x in R^J, then the vector 0 is in the sub-differential ∂g(x̂).
This is easy.
9.4 If f (x) and g(x) are convex functions on RJ , is f (x) + g(x) convex?
Is f (x)g(x) convex?
It is easy to show that the sum of two convex functions is again convex. The product of two is, however, not always convex; take f(x) = −1 and g(x) = x², for example: both are convex, but f(x)g(x) = −x² is not.
9.5 Let ι_C(x) be the indicator function of the closed convex set C, that is,

ι_C(x) = \begin{cases} 0, & \text{if } x ∈ C; \\ +∞, & \text{if } x ∉ C. \end{cases}
9.6 Let g(t) be a strictly convex function for t > 0. For x > 0 and y > 0, define the function

f(x, y) = xg\Big(\frac{y}{x}\Big).

Use induction to prove that

\sum_{n=1}^{N} f(x_n, y_n) ≥ f(x_+, y_+),

for any positive numbers x_n and y_n, where x_+ = \sum_{n=1}^{N} x_n and y_+ = \sum_{n=1}^{N} y_n. Also show that equality obtains if and only if the finite sequences {x_n} and {y_n} are proportional.
We show this for the case of N = 2; the more general case is similar. We need to show that

f(x_1, y_1) + f(x_2, y_2) ≥ f(x_+, y_+).

Write

f(x_1, y_1) + f(x_2, y_2) = x_1 g\Big(\frac{y_1}{x_1}\Big) + x_2 g\Big(\frac{y_2}{x_2}\Big) = (x_1 + x_2)\Big[\frac{x_1}{x_1 + x_2}g\Big(\frac{y_1}{x_1}\Big) + \frac{x_2}{x_1 + x_2}g\Big(\frac{y_2}{x_2}\Big)\Big]

≥ (x_1 + x_2)g\Big(\frac{x_1}{x_1 + x_2}\frac{y_1}{x_1} + \frac{x_2}{x_1 + x_2}\frac{y_2}{x_2}\Big) = x_+ g\Big(\frac{y_+}{x_+}\Big) = f(x_+, y_+),

by the convexity of g.
9.7 Use the result in Exercise 9.6 to obtain Cauchy's Inequality. Hint: let g(t) = −√t.

Using the result in Exercise 9.6, and the choice of g(t) = −√t, we obtain

−\sum_{n=1}^{N} x_n\sqrt{\frac{y_n}{x_n}} ≥ −x_+\sqrt{\frac{y_+}{x_+}},

so that

\sum_{n=1}^{N} \sqrt{x_n}\sqrt{y_n} ≤ \sqrt{x_+}\sqrt{y_+}.

Setting x_n = a_n² and y_n = b_n² now gives Cauchy's Inequality,

\sum_{n=1}^{N} |a_n b_n| ≤ \sqrt{\sum_{n=1}^{N} a_n²}\sqrt{\sum_{n=1}^{N} b_n²}.
Hint: let g(t) = −\frac{t}{1 + t}.
9.9 For x > 0 and y > 0, let f(x, y) be the Kullback-Leibler function,

f(x, y) = KL(x, y) = x\log\frac{x}{y} + y − x.

Use Exercise 9.6 to show that

\sum_{n=1}^{N} KL(x_n, y_n) ≥ KL(x_+, y_+).
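A quick numerical spot-check of this inequality (an illustration only, with arbitrary positive data):

    import numpy as np

    rng = np.random.default_rng(1)
    x, y = rng.random(5) + 0.1, rng.random(5) + 0.1
    KL = lambda a, b: a * np.log(a / b) + b - a
    print(KL(x, y).sum() >= KL(x.sum(), y.sum()))   # True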
Chapter 10

Fenchel Duality

10.1 Exercises
10.1 Let A be a real symmetric positive-definite matrix and

f(x) = ½⟨Ax, x⟩.

Show that

f*(a) = ½⟨A^{-1}a, a⟩.

Hints: Find ∇f(x) and use Equation (10.1).

We have ∇f(x) = Ax, so the supremum in f*(a) = \sup_x [⟨a, x⟩ − f(x)] is attained where a = Ax, that is, at x = A^{-1}a. Therefore,

f*(a) = ⟨a, A^{-1}a⟩ − ½⟨A(A^{-1}a), A^{-1}a⟩ = ½⟨A^{-1}a, a⟩.
Chapter 11
Convex Programming
11.1 Referenced Results

The saddle-point condition is that

K(x̂, y) ≤ K(x̂, ŷ) ≤ K(x, ŷ),

for all x and y, so that the maximum value of g(y) and the minimum value of f(x) are both equal to K(x̂, ŷ).
11.2 Exercises
11.1 Prove Theorem 11.1.
We have

f(x̂) = \sup_y K(x̂, y) = K(x̂, ŷ),

and

f(x) = \sup_y K(x, y) ≥ K(x, ŷ) ≥ K(x̂, ŷ) = f(x̂),

so x̂ minimizes f(x); the argument that ŷ maximizes g(y) is similar.
11.2 Apply the gradient form of the KKT Theorem to minimize the function f(x, y) = (x + 1)² + y² over all x ≥ 0 and y ≥ 0.
subject to

x + y ≤ 0.

Show that the function MP(z) is not differentiable at z = 0.
Minimize the function

f(x, y) = −2x − y,

subject to

x + y ≤ 1,

0 ≤ x ≤ 1,

and

y ≥ 0.
With g_1(x, y) = x + y − 1, g_2(x, y) = x − 1, g_3(x, y) = −x, and g_4(x, y) = −y, the KKT conditions include

0 = −2 + λ_1 + λ_2 − λ_3,

and

0 = −1 + λ_1 − λ_4.

Since λ_i ≥ 0 for each i, it follows that λ_1 ≠ 0, so that 0 = g_1(x, y) = x + y − 1, or x + y = 1.

If λ_4 = 0, then λ_1 = 1 and λ_2 − λ_3 = 1. Therefore, we cannot have λ_2 = 0, which tells us that g_2(x, y) = 0, or x = 1, and then y = 0. If λ_4 > 0, then y = 0, and so x = 1 again. In any case, the answer must be x = 1 and y = 0, so that the minimum is f(1, 0) = −2.
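A quick numerical confirmation (an illustration only) using scipy's linear-programming solver:

    from scipy.optimize import linprog

    # minimize -2x - y subject to x + y <= 1, 0 <= x <= 1, y >= 0
    res = linprog(c=[-2, -1], A_ub=[[1, 1]], b_ub=[1], bounds=[(0, 1), (0, None)])
    print(res.x, res.fun)   # [1. 0.] and -2.0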
L(x*, λ) = \frac{λ²}{2}a^T Q^{-1}a − λ²a^T Q^{-1}a − λc = −\Big(\frac{λ²}{2}a^T Q^{-1}a + λc\Big).

Now we maximize L(x*, λ) over λ ≥ 0. We get

0 = −λa^T Q^{-1}a − c,

so that

λ* = −\frac{c}{a^T Q^{-1}a},

and the optimal x* is

x* = \frac{cQ^{-1}a}{a^T Q^{-1}a}.
11.6 Use Theorem 11.2 to prove that any real N by N symmetric matrix
has N mutually orthonormal eigenvectors.
0 = Qx − λ_1 x + λ_2 x^N.

Therefore,

0 = x^T Qx − λ_1 x^T x + λ_2 x^T x^N = x^T Qx − λ_1,

or

λ_1 = x^T Qx.

We also have (x^N)^T x = 0 and

(x^N)^T Q = γ_N(x^N)^T,

so that

0 = γ_N(x^N)^T x − λ_1(x^N)^T x + λ_2 = λ_2,

so λ_2 = 0. Therefore, we have

Qx = λ_1 x,

and x is another eigenvector of Q, orthogonal to x^N.
Chapter 12

Iterative Optimization

12.2 Exercises
12.1 Prove Lemma 12.1.
Use the Chain Rule to calculate the derivative of the function f(γ) given by

f(γ) = g(x^k − γ∇g(x^k)),

and then set this derivative equal to zero.
This is easy.
and

Z^T Qx^0 = \begin{pmatrix} −68 \\ 56 \end{pmatrix}.

We have

Z^T QZ = \begin{pmatrix} 520 & −448 \\ −448 & 392 \end{pmatrix},

so that

(Z^T QZ)^{-1} = \frac{1}{3136}\begin{pmatrix} 392 & 448 \\ 448 & 520 \end{pmatrix}.

One step of the reduced Newton-Raphson algorithm, beginning with v^0 = 0, gives

v^1 = −(Z^T QZ)^{-1}Z^T Qx^0 = \begin{pmatrix} 0.5 \\ 0.4286 \end{pmatrix},

so that

x^1 = x^0 + Zv^1 = \begin{pmatrix} 0.6428 \\ 0.2858 \\ 0.5 \\ 0.4286 \end{pmatrix}.

When we check, we find that Z^T Qx^1 = 0, so we are finished.
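A short numpy check of this step (only the reduced quantities quoted above are needed):

    import numpy as np

    ZtQx0 = np.array([-68.0, 56.0])
    ZtQZ = np.array([[520.0, -448.0], [-448.0, 392.0]])
    v1 = -np.linalg.solve(ZtQZ, ZtQx0)   # one reduced Newton-Raphson step from v^0 = 0
    print(v1)   # [0.5 0.42857...]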
12.7 Use the reduced steepest descent method with an exact line search to
solve the problem in the previous exercise.
Do as a computer problem.
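A minimal sketch of such a computer solution (my own, assuming only the reduced quantities Z^T QZ and Z^T Qx^0 from the previous exercise): reduced steepest descent with the exact line search for a quadratic.

    import numpy as np

    ZtQZ = np.array([[520.0, -448.0], [-448.0, 392.0]])
    ZtQx0 = np.array([-68.0, 56.0])

    v = np.zeros(2)
    for _ in range(20000):
        g = ZtQx0 + ZtQZ @ v                 # gradient of the reduced objective at v
        if np.linalg.norm(g) < 1e-10:
            break
        alpha = (g @ g) / (g @ ZtQZ @ g)     # exact line-search step for a quadratic
        v = v - alpha * g
    print(v)   # converges (slowly) to [0.5, 0.4286], matching Exercise 12.6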
Chapter 13
Modified-Gradient Algorithms
Chapter 14
Quadratic Programming
Chapter 15

Solving Systems of Linear Equations

We begin at x^0 = 0 and v^0 = 0. Then the limit vector has for its upper component x^∞ = x̂_ε, and v^∞ = b − Ax̂_ε.
15.2 Exercises
15.1 Show that the two algorithms associated with Equations (15.1) and
(15.2), respectively, do actually perform as claimed.
provided that the matrix AA^T is invertible, which it usually is in the underdetermined case.
The solution of Ax = b for which kx − pk is minimized can be found in
a similar way. We let z = x − p, so that x = z + p and Az = b − Ap. Now
we find the minimum two-norm solution of the system Az = b − Ap; our
final solution is x = z + p.
The regularized solution x̂_ε that we seek minimizes the function

f(x) = ½‖Ax − b‖² + \frac{ε²}{2}‖x‖²,

and therefore can be written explicitly as

x̂_ε = (A^T A + ε²I)^{-1}A^T b.
When we use the ART algorithm, we find the solution closest to where we began the iteration. We begin at

\begin{pmatrix} u^0 \\ v^0 \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix},

so that the limit of the ART iteration minimizes

‖u − b‖² + ‖v‖²,

from which it follows that −z = x̂_ε. Since the lower part of the minimum two-norm solution is t = z, the assertion concerning this algorithm is established.
The second method has us solve the system

[ A  εI ] \begin{pmatrix} x \\ v \end{pmatrix} = b, (15.4)

for its minimum two-norm solution, which has the form x = A^T z, v = εz.

Therefore,

(AA^T + ε²I)z = b,

and

x = A^T z = A^T(AA^T + ε²I)^{-1}b.

Consequently, x = x̂_ε, and this x is the upper part of the limit vector of the ART iteration.
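A small numpy check (an illustration only, with arbitrary test data) that the two expressions for x̂_ε agree:

    import numpy as np

    rng = np.random.default_rng(2)
    A, b, eps = rng.random((3, 5)), rng.random(3), 0.1
    x1 = A.T @ np.linalg.solve(A @ A.T + eps**2 * np.eye(3), b)   # via the system (15.4)
    x2 = np.linalg.solve(A.T @ A + eps**2 * np.eye(5), A.T @ b)   # direct regularized solution
    print(np.allclose(x1, x2))   # True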
Chapter 16
Conjugate-Direction Methods

16.1 Proofs of Lemmas
∇f(x^k) · d^k = 0. (16.1)

Proof: Differentiate the function f(x^{k−1} + αd^k) with respect to the variable α, and then set this derivative equal to zero. According to the Chain Rule, we have

∇f(x^k) · d^k = 0.

α_k = \frac{r^k · d^k}{d^k · Qd^k}, (16.2)

where r^k = c − Qx^{k−1}.

Proof: We have

\frac{d}{dα}f(x^{k−1} + αd^k) = ∇f(x^{k−1} + αd^k) · d^k = −r^k · d^k + α d^k · Qd^k,

so that

0 = ∇f(x^k) · d^k = −r^k · d^k + α_k d^k · Qd^k,
so that α_k = r^k · d^k / d^k · Qd^k, as asserted.
Lemma 16.4 A conjugate set that does not contain zero is linearly independent. If p^n ≠ 0 for n = 1, ..., J, then the least-squares vector x̂ can be written as

x̂ = a_1 p^1 + ... + a_J p^J,

with a_j = c · p^j / p^j · Qp^j for each j.

Proof: Suppose that

0 = c_1 p^1 + ... + c_n p^n.

Taking the dot product with Qp^m gives c_m p^m · Qp^m = 0, so that each c_m = 0 and the set is linearly independent. Writing x̂ as above and using Qx̂ = c, we find that

p^m · c = p^m · Qx̂ = a_m p^m · Qp^m,

so that

a_m = \frac{p^m · c}{p^m · Qp^m}.

• a) r^n · r^j = 0;

• b) r^n · p^j = 0; and

• c) p^n · Qp^j = 0.
16.2 Exercises
16.1 There are several lemmas in this chapter whose proofs are only sketched. Complete the proofs of these lemmas.
r^{j+1} = r^j − α_j Qp^j.

We have x^{j+1} = x^j + α_j p^j, so that

r^{j+1} − r^j = −α_j Qp^j.

Show that r^{n+1} · r^n = 0 by showing that

r^n · p^n = r^n · r^n,

and then use the induction hypothesis to show that

r^n · Qp^n = p^n · Qp^n.
We know that

r^n = p^n + β_{n−1}p^{n−1},

where

β_{n−1} = \frac{r^n · Qp^{n−1}}{p^{n−1} · Qp^{n−1}}.

Since r^n · p^{n−1} = 0, it follows that

r^n · p^n = r^n · r^n.

Therefore, we have

α_n = \frac{r^n · r^n}{p^n · Qp^n}.

From the induction hypothesis, we have p^{n−1} · Qp^n = 0, so that

r^n · Qp^n = p^n · Qp^n.

Using

r^{n+1} = r^n − α_n Qp^n,

we find that

r^{n+1} · r^n = r^n · r^n − α_n r^n · Qp^n = 0.
16.4 Show that r^{n+1} · r^j = 0, for j = 1, ..., n − 1. Hints: 1. show that r^{n+1} is in the span of the vectors {r^{j+1}, Qp^{j+1}, Qp^{j+2}, ..., Qp^n}; and 2. show that r^j is in the span of {p^j, p^{j−1}}. Then use the induction hypothesis.

We begin with

r^{n+1} = r^n − α_n Qp^n.

We then use

r^n = r^{n−1} − α_{n−1}Qp^{n−1},

and so on, to get that r^{n+1} is in the span of the vectors {r^{j+1}, Qp^{j+1}, Qp^{j+2}, ..., Qp^n}. We then use

r^j · r^{j+1} = 0,

and

r^j = p^j + β_{j−1}p^{j−1},

along with the induction hypothesis, to get

r^j · Qp^m = 0,

for m = j + 1, ..., n.
Write

p^j = r^j − β_{j−1}p^{j−1},

and repeat this for p^{j−1}, p^{j−2}, and so on, down to p^1 = r^1, to show that p^j is in the span of {r^j, r^{j−1}, ..., r^1}. Then use the previous exercises.

We have

p^{n+1} = r^{n+1} − β_n p^n,

where

β_n = \frac{r^{n+1} · Qp^n}{p^n · Qp^n}.
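For readers who want to experiment with these recursions, here is a minimal conjugate-gradient sketch (my own illustration, not from the text); the test matrix Q and vector c are arbitrary, and the direction update uses the Fletcher–Reeves ratio r^{n+1}·r^{n+1} / r^n·r^n, which in exact arithmetic equals the β quotient quoted above:

    import numpy as np

    def conjugate_gradient(Q, c):
        # Minimize f(x) = (1/2) x.Qx - c.x, i.e. solve Qx = c, for symmetric
        # positive-definite Q, using the recursions of this chapter.
        x = np.zeros_like(c)
        r = c - Q @ x              # initial residual r^1
        p = r.copy()               # first direction p^1 = r^1
        for _ in range(len(c)):
            alpha = (r @ r) / (p @ Q @ p)    # alpha_n = r.r / p.Qp
            x = x + alpha * p
            r_new = r - alpha * (Q @ p)      # r^{n+1} = r^n - alpha_n Q p^n
            p = r_new + ((r_new @ r_new) / (r @ r)) * p   # next conjugate direction
            r = r_new
        return x

    rng = np.random.default_rng(3)
    M = rng.random((4, 4))
    Q = M.T @ M + 4 * np.eye(4)    # symmetric positive-definite test matrix
    c = rng.random(4)
    print(np.allclose(conjugate_gradient(Q, c), np.linalg.solve(Q, c)))   # True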
Chapter 17

Sequential Unconstrained Minimization Algorithms

Lemma 17.1 For any non-negative vectors x and z, with z_+ = \sum_{j=1}^{J} z_j > 0, we have

KL(x, z) = KL(x_+, z_+) + KL\Big(x, \frac{x_+}{z_+}z\Big). (17.1)
17.1 Exercises
17.1 Prove Lemma 17.1.
This is easy.
17.2 Minimize the function

f(x, y) = x − 2y,

subject to 1 + x − y² ≥ 0 and y ≥ 0, using the logarithmic barrier method; at each step we minimize

x − 2y − k log(1 + x − y²) − k log(y).

Setting the partial derivatives to zero gives

k = 1 + x − y²,

and

2 = k\Big(\frac{2y}{1 + x − y²} − \frac{1}{y}\Big).

Therefore,

2y² − 2y − k = 0,

so that

y = ½ + ½√(1 + 2k).

As k → 0, we have y → 1 and x → 0, so the optimal solution is x = 0 and y = 1, which can be checked using the KKT Theorem.
17.3 Minimize the function

f(x, y) = −xy,

subject to

x + 2y − 4 = 0,

using the quadratic-penalty method; at each step we minimize −xy + k(x + 2y − 4)². Setting the partial derivatives to zero gives

x = 4k(x + 2y − 4),

and

y = 2k(x + 2y − 4),

and so x = 2y and

y = \frac{2x − 8}{−4 + 1/k}.

Solving for x, we get

x = \frac{16}{8 − 1/k}

and

y = \frac{8}{8 − 1/k}.

As k → ∞, x approaches 2 and y approaches 1. We can check this result by substituting x = 4 − 2y into f(x, y) and minimizing it as a function of y alone.
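A numerical look at the penalty iterates (my own illustration), minimizing −xy + k(x + 2y − 4)² for increasing k:

    import numpy as np
    from scipy.optimize import minimize

    for k in [1, 10, 100, 1000]:
        pen = lambda v, k=k: -v[0] * v[1] + k * (v[0] + 2 * v[1] - 4) ** 2
        res = minimize(pen, x0=[1.0, 1.0])
        print(k, res.x)   # approaches [2, 1] as k grows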
Chapter 18
Likelihood Maximization
Chapter 19
Operators
19.1 Referenced Results

x = \sum_{j=1}^{J} a_j u^j. (19.1)

Then let

‖x‖ = \sum_{j=1}^{J} |a_j|. (19.2)

‖x − y‖_2² − ‖Tx − Ty‖_2² = 2⟨Gx − Gy, x − y⟩ − ‖Gx − Gy‖_2². (19.3)
19.2 Exercises
19.1 Show that a strict contraction can have at most one fixed point.
A strict contraction satisfies

‖Tx − Ty‖ ≤ r‖x − y‖,

for some r ∈ (0, 1). If x and y are both fixed points, then

‖x − y‖ = ‖Tx − Ty‖ ≤ r‖x − y‖,

which forces ‖x − y‖ = 0, so that x = y.
From

‖x^{k+1} − x^{k+2}‖ = ‖Tx^k − Tx^{k+1}‖ ≤ r‖x^k − x^{k+1}‖,

we have

‖x^k − x^{k+1}‖ ≤ r^k‖x^0 − x^1‖.

From this we conclude that, given any ε > 0, we can find k > 0 so that, for any n > 0,

‖x^k − x^{k+n}‖ ≤ ε,

so that the sequence {x^k} is a Cauchy sequence and therefore converges.
Let 0 < x < z. From the Mean Value Theorem we know that, for t > 0,
Then

B(x − y) = \sum_{j=1}^{J} λ_j c_j u^j,

and

‖B(x − y)‖ = \sum_{j=1}^{J} |λ_j||c_j| ≤ ρ(B)\sum_{j=1}^{J} |c_j| = ρ(B)‖x − y‖.
19.5 Show that, if the operator T is α-av and 1 > β > α, then T is β-av.

Clearly, if

⟨Gx − Gy, x − y⟩ ≥ \frac{1}{2α}‖Gx − Gy‖_2², (19.7)

and 1 > β > α, then

⟨Gx − Gy, x − y⟩ ≥ \frac{1}{2β}‖Gx − Gy‖_2².
⟨Tx − Ty, x − y⟩ − ‖Tx − Ty‖_2² =

‖Tx − y‖ < ‖x − y‖,

and, since y is a fixed point of T,

‖Tx − y‖ = ‖Tx − Ty‖ = ‖B(x − y)‖.

Suppose that

x − y = \sum_{j=1}^{J} a_j u^j,

while

‖B(x − y)‖ = \sum_{j=1}^{J} |λ_j||a_j|.
19.9 Show that, if B is a linear av operator, then |λ| < 1 for all eigenvalues λ of B that are not equal to one.

Since B is av, we can write

B = (1 − α)I + αN,

for some α ∈ (0, 1) and some nonexpansive operator N. If λ is an eigenvalue of B, then (λ + α − 1)/α is an eigenvalue of N, so that

|λ + α − 1| ≤ α.
Since B is ne, all its eigenvalues must have |λ| ≤ 1. We consider the
cases of λ real and λ complex and not real separately.
If λ is real, then we need only show that we cannot have λ = −1. But if this were the case, we would have

|−2 + α| = 2 − α ≤ α,

so that α ≥ 1, which is false.

If λ = a + bi is complex and not real, then |λ + α − 1| ≤ α gives

|(a + α − 1) + bi|² ≤ α²,

or

(a + (α − 1))² + b² ≤ α².

Then

a² + b² + 2a(α − 1) + (α − 1)² ≤ α².

Suppose that a² + b² = 1 and b ≠ 0. Then we have

1 + 2a(α − 1) + (α − 1)² ≤ α²,

or

2 ≤ 2(1 − α)a + 2α,

so that

1 ≤ (1 − α)a + α(1),

which exhibits 1 as bounded by a convex combination of the real numbers a and 1. Since α ∈ (0, 1), this forces a ≥ 1; but a² + b² = 1 with b ≠ 0 implies |a| < 1, a contradiction. Therefore, we conclude that a² + b² < 1.
Chapter 20

Convex Feasibility and Related Problems

20.1 A Lemma

For i = 1, ..., I, let C_i be a non-empty, closed convex set in R^J. Let C = ∩_{i=1}^{I} C_i be the non-empty intersection of the C_i.

Lemma 20.1 If c ∈ C and x = c + \sum_{i=1}^{I} p^i, where, for each i, c = P_{C_i}(c + p^i), then c = P_C x.
20.2 Exercises

20.1 Prove Lemma 20.1.

Since c = P_{C_i}(c + p^i), we have, for every c_i in C_i,

⟨c − (c + p^i), c_i − c⟩ = ⟨−p^i, c_i − c⟩ ≥ 0.

In particular, since C ⊆ C_i, for every d in C we have

⟨−p^i, d − c⟩ ≥ 0.

Therefore,

⟨c − x, d − c⟩ = ⟨−\sum_{i=1}^{I} p^i, d − c⟩ = \sum_{i=1}^{I} ⟨−p^i, d − c⟩ ≥ 0,

for every d in C, and this inequality characterizes c = P_C x.