
A First Course in Optimization:

Answers to Selected Exercises

Charles L. Byrne
Department of Mathematical Sciences
University of Massachusetts Lowell
Lowell, MA 01854

November 29, 2009

(The most recent draft is available as a pdf file at

http://faculty.uml.edu/cbyrne/cbyrne.html)
Contents

1 Preface
2 Introduction
  2.1 Exercise
3 Optimization Without Calculus
  3.1 Exercises
4 Geometric Programming
  4.1 Exercise
5 Convex Sets
  5.1 References Used in the Exercises
  5.2 Exercises
6 Linear Programming
  6.1 Needed References
  6.2 Exercises
7 Matrix Games and Optimization
  7.1 Some Exercises
8 Differentiation
  8.1 Exercises
9 Convex Functions
  9.1 Exercises
10 Fenchel Duality
  10.1 Exercises
11 Convex Programming
  11.1 Referenced Results
  11.2 Exercises
12 Iterative Optimization
  12.1 Referenced Results
  12.2 Exercises
13 Modified-Gradient Algorithms
14 Quadratic Programming
15 Solving Systems of Linear Equations
  15.1 Regularizing the ART
  15.2 Exercises
16 Conjugate-Direction Methods
  16.1 Proofs of Lemmas
  16.2 Exercises
17 Sequential Unconstrained Minimization Algorithms
  17.1 Exercises
18 Likelihood Maximization
19 Operators
  19.1 Referenced Results
  19.2 Exercises
20 Convex Feasibility and Related Problems
  20.1 A Lemma
  20.2 Exercises
Chapter 1

Preface

In the chapters that follow you will find solutions to many, but not all,
of the exercises in the text. Some chapters in the text have no exercises,
but those chapters are included here to keep the chapter numbers the same
as in the text. Please note that the numbering of exercises within each
chapter may differ from the numbering in the text itself, because certain
exercises have been skipped.

Chapter 2

Introduction

2.1 Exercise
2.1 For n = 1, 2, ..., let

An = {x | kx − ak ≤ 1/n},
and let αn and βn be defined by

αn = inf{f (x)| x ∈ An },

and
βn = sup{f (x)| x ∈ An }.

• a) Show that the sequence {αn } is increasing, bounded above by f (a)


and converges to some α, while the sequence {βn } is decreasing,
bounded below by f (a) and converges to some β. Hint: use the
fact that, if A ⊆ B, where A and B are sets of real numbers, then
inf(A) ≥ inf(B).

• b) Show that α and β are in S. Hint: prove that there is a sequence


{xn } with xn in An and f (xn ) ≤ αn + 1/n.

• c) Show that, if {xm } is any sequence converging to a, then there is


a subsequence, denoted {xmn }, such that xmn is in An , for each n,
and so
αn ≤ f (xmn ) ≤ βn .

• d) Show that, if {f (xm )} converges to γ, then

α ≤ γ ≤ β.


• e) Show that
α = lim inf f (x)
x→a

and
β = lim sup f (x).
x→a

According to the hint in a), we use the fact that An+1 ⊆ An . For b),
since αn is the infimum, there must be a member of An , call it xn , such
that
αn ≤ f (xn ) ≤ αn + 1/n.
Then the sequence {xn } converges to a and the sequence {f (xn )} converges
to α.
Now suppose that {xm } converges to a and {f (xm )} converges to γ.
We don’t know that each xm is in Am , but we do know that, for each n,
there must be a term of the sequence, call it xmn , that is within 1/n of a,
that is, xmn is in An . From c) we have

αn ≤ f (xmn ) ≤ βn .

The sequence {f (xmn )} also converges to γ, so that

α ≤ γ ≤ β.

Finally, since we have shown that α is the smallest number in S it follows


that
α = lim inf f (x).
x→a

The argument for β is similar.


Chapter 3

Optimization Without
Calculus

3.1 Exercises
3.1 Let A be the arithmetic mean of a finite set of positive numbers, with
x the smallest of these numbers, and y the largest. Show that

xy ≤ A(x + y − A),

with equality if and only if x = y = A.

This is equivalent to showing that

A2 − A(x + y) + xy = (A − x)(A − y) ≤ 0,

which is true, since A − x ≥ 0 and A − y ≤ 0.

3.2 Minimize the function

f (x) = x^2 + 1/x^2 + 4x + 4/x,

over positive x. Note that the minimum value of f (x) cannot be found
by a straightforward application of the AGM Inequality to the four terms
taken together. Try to find a way of rewriting f (x), perhaps using more
than four terms, so that the AGM Inequality can be applied to all the terms.

The product of the four terms is 16, whose fourth root is 2, so the AGM
Inequality tells us that f (x) ≥ 8, with equality if and only if all four terms
are equal. But it is not possible for all four terms to be equal. Instead, we
can write

f (x) = x^2 + 1/x^2 + x + 1/x + x + 1/x + x + 1/x + x + 1/x.
These ten terms have a product of 1, which tells us that

f (x) ≥ 10,

with equality if and only if x = 1.
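
A quick numerical check of this bound (my own addition, plain Python with no external libraries) evaluates f on a grid of positive x and confirms that the smallest sampled value is 10, attained at x = 1.

# Check numerically that f(x) = x^2 + 1/x^2 + 4x + 4/x >= 10 for x > 0,
# with equality only at x = 1.
def f(x):
    return x**2 + 1.0/x**2 + 4.0*x + 4.0/x

xs = [k / 100.0 for k in range(1, 1001)]    # grid of positive x values
print(min(f(x) for x in xs))                # 10.0
print(min(xs, key=f))                       # 1.0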

3.3 Find the maximum value of f (x, y) = x^2 y, if x and y are restricted
to positive real numbers for which 6x + 5y = 45.

Write

f (x, y) = x · x · y = (1/45)(3x)(3x)(5y),

and

45 = 6x + 5y = 3x + 3x + 5y.

By the AGM Inequality, the product (3x)(3x)(5y) is largest when 3x = 3x = 5y = 15,
so the maximum occurs at x = 5, y = 3, where f (5, 3) = 75.

3.4 Find the smallest value of

f (x) = 5x + 16/x + 21,

over positive x.

Just focus on the first two terms: by the AGM Inequality, 5x + 16/x ≥ 2√(5 · 16) = 8√5,
with equality when 5x = 16/x, so the smallest value is 21 + 8√5.

3.5 Find the smallest value of the function

f (x, y) = √(x^2 + y^2 ),

among those values of x and y satisfying 3x − y = 20.

By Cauchy’s Inequality, we know that

20 = 3x − y = (x, y) · (3, −1) ≤ √(x^2 + y^2 ) √(3^2 + (−1)^2 ) = √10 √(x^2 + y^2 ),

with equality if and only if (x, y) = c(3, −1) for some number c. Then solve
for c: we get c = 2, so the minimum is attained at (x, y) = (6, −2) and equals 2√10.

3.6 Find the maximum and minimum values of the function

f (x) = √(100 + x^2 ) − x

over non-negative x.

The derivative of f (x) is negative, so f (x) is decreasing and attains its


maximum of 10 at x = 0. As x → ∞, f (x) goes to zero, but never reaches
zero.

3.7 Multiply out the product

(x + y + z)(1/x + 1/y + 1/z)

and deduce that the least value of this product, over non-negative x, y, and
z, is 9. Use this to find the least value of the function

f (x, y, z) = 1/x + 1/y + 1/z,

over non-negative x, y, and z having a constant sum c.

See the more general Exercise 3.9.

3.8 The harmonic mean of positive numbers a1 , ..., aN is

H = [(1/a1 + ... + 1/aN )/N ]−1 .

Prove that the geometric mean G is not less than H.

Apply the AGM Inequality to the reciprocals 1/a1 , ..., 1/aN to show that H −1 ≥ G−1 .

3.9 Prove that

(1/a1 + ... + 1/aN )(a1 + ... + aN ) ≥ N^2 ,

with equality if and only if a1 = ... = aN .

When we multiply everything out, we find that there are N ones, and
N (N − 1)/2 pairs of the form am /an + an /am , for m ≠ n. Each of these pairs has
the form x + 1/x, which is always greater than or equal to two. Therefore,
the entire product is greater than or equal to N + N (N − 1) = N^2 , with
equality if and only if all the an are the same.

3.10 Show that the Equation S = U LU T , can be written as

S = λ1 u1 (u1 )T + λ2 u2 (u2 )T + ... + λN uN (uN )T , (3.1)

and

S −1 = (1/λ1 ) u1 (u1 )T + (1/λ2 ) u2 (u2 )T + ... + (1/λN ) uN (uN )T . (3.2)

This is just algebra.

3.11 Let Q be positive-definite, with positive eigenvalues

λ1 ≥ ... ≥ λN > 0

and associated mutually orthogonal norm-one eigenvectors un . Show that

xT Qx ≤ λ1 ,

for all vectors x with kxk = 1, with equality if x = u1 . Hints: use

1 = kxk2 = xT x = xT Ix,

I = u1 (u1 )T + ... + uN (uN )T ,


and Equation (3.1).

Use the inequality

xT Qx = λ1 |xT u1 |2 + ... + λN |xT uN |2 ≤ λ1 (|xT u1 |2 + ... + |xT uN |2 ) = λ1 .

3.12 Relate Example 4 to eigenvectors and eigenvalues.

We can write

    [  4   6  12 ]
M = [  6   9  18 ] = 49 (2/7, 3/7, 6/7)T (2/7, 3/7, 6/7),
    [ 12  18  36 ]

so that

(2x + 3y + 6z)2 = |(x, y, z)(2, 3, 6)T |2 = (x, y, z) M (x, y, z)T .

The only positive eigenvalue of M is 49, and the corresponding normalized
eigenvector is (2/7, 3/7, 6/7)T . Then use Exercise 3.11: the maximum of
(2x + 3y + 6z)2 over unit vectors is 49, so the maximum of 2x + 3y + 6z is 7.
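
As a numerical sanity check (not part of the original solution; it assumes numpy is available), the following sketch confirms that 49 is the only nonzero eigenvalue of M, that the corresponding unit eigenvector is (2/7, 3/7, 6/7)T, and that the maximum of 2x + 3y + 6z over the unit sphere is therefore 7.

import numpy as np

M = np.array([[4.0, 6.0, 12.0],
              [6.0, 9.0, 18.0],
              [12.0, 18.0, 36.0]])       # the rank-one matrix (2,3,6)^T (2,3,6)

vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
print(np.round(vals, 8))                 # [0, 0, 49]
u = vecs[:, -1]
print(np.round(u / np.sign(u[0]), 6))    # [2/7, 3/7, 6/7], up to sign
print(np.sqrt(vals[-1]))                 # 7.0, the maximum of 2x + 3y + 6z on the unit sphere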

3.13 Young’s Inequality Suppose that p and q are positive numbers
greater than one such that 1/p + 1/q = 1. If x and y are positive numbers, then

xy ≤ x^p /p + y^q /q,

with equality if and only if x^p = y^q . Hint: use the GAGM Inequality.

This one is pretty easy: apply the GAGM Inequality to the numbers x^p and y^q , with weights 1/p and 1/q, to get xy = (x^p )^{1/p} (y^q )^{1/q} ≤ x^p /p + y^q /q.



3.14 For given constants c and d, find the largest and smallest values of
cx + dy taken over all points (x, y) of the ellipse

x^2 /a^2 + y^2 /b^2 = 1.
3.15 Find the largest and smallest values of 2x+y on the circle x2 +y 2 = 1.
Where do these values occur? What does this have to do with eigenvectors
and eigenvalues?

This one is similar to Exercise 3.12.

3.16 When a real M by N matrix A is stored in the computer it is usually


vectorized; that is, the matrix

    [ A11  A12  ...  A1N ]
A = [ A21  A22  ...  A2N ]
    [ ...  ...  ...  ... ]
    [ AM1  AM2  ...  AMN ]

becomes

vec(A) = (A11 , A21 , ..., AM 1 , A12 , A22 , ..., AM 2 , ..., AM N )T .

Show that the dot product vec(A)· vec(B) = vec(B)T vec(A) can be ob-
tained by
vec(A)· vec(B) = trace (AB T ) = trace (B T A).

Easy.
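
For completeness, here is a small numerical check (my own illustration, assuming numpy) that the column-by-column vectorization satisfies vec(A) · vec(B) = trace(AB T ) = trace(B T A).

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

vecA = A.flatten(order='F')              # column-by-column, as in the exercise
vecB = B.flatten(order='F')

print(np.allclose(vecA @ vecB, np.trace(A @ B.T)))   # True
print(np.allclose(vecA @ vecB, np.trace(B.T @ A)))   # True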
Chapter 4

Geometric Programming

4.1 Exercise
4.1 Show that there is no solution to the problem of minimizing the func-
tion

g(t1 , t2 ) = 2/(t1 t2 ) + t1 t2 + t1 , (4.1)

over t1 > 0, t2 > 0. Can g(t1 , t2 ) ever be smaller than 2√2?

Show that the only vector δ that works must have δ3 = 0, which is not
allowed.
To answer the second part, begin by showing that if t2 ≤ 1 then
g(t1 , t2 ) ≥ 4, and if t1 ≥ 1, then g(t1 , t2 ) ≥ 3. We want to consider
the case in which t1 → 0, t2 → ∞, and both t1 t2 and (t1 t2 )−1 remain
bounded. For example, we try
t2 = (a e^{−t1 } + 1)/t1 ,

for some positive a. Then g(t1 , t2 ) → 2/(a + 1) + a + 1 as t1 → 0. Which a
gives the smallest value? Minimizing the limit, with respect to a, gives
a = √2 − 1, for which the limit is 2√2.
More generally, let

t2 = f (t1 )/t1 ,

for some positive function f (t1 ) such that f (t1 ) does not go to zero as t1
goes to zero. Then we have

g(t1 , t2 ) = 2/f (t1 ) + f (t1 ) + t1 ,

which converges to 2/f (0) + f (0), as t1 → 0. The minimum of this limit, as a
function of f (0), occurs when f (0) = √2, and the minimum limit is 2√2.
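
A numerical illustration of the two claims (my own sketch, not part of the argument above): evaluating g along the path t2 = (a e^{−t1} + 1)/t1 with a = √2 − 1 shows the values decreasing toward 2√2 ≈ 2.828 as t1 → 0, while g never attains that value.

import math

def g(t1, t2):
    return 2.0 / (t1 * t2) + t1 * t2 + t1

a = math.sqrt(2.0) - 1.0
for t1 in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    t2 = (a * math.exp(-t1) + 1.0) / t1
    print(t1, g(t1, t2))
print("infimum:", 2.0 * math.sqrt(2.0))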
Chapter 5

Convex Sets

5.1 References Used in the Exercises


Proposition 5.1 The convex hull of a set S is the set C of all convex
combinations of members of S.
Proposition 5.2 For a given x, a vector z in C is PC x if and only if
hc − z, z − xi ≥ 0, (5.1)
for all c in the set C.
Lemma 5.1 For H = H(a, γ), z = PH x is the vector
z = PH x = x + (γ − ha, xi)a. (5.2)
Lemma 5.2 For H = H(a, γ), H0 = H(a, 0), and any x and y in RJ , we
have
PH (x + y) = PH x + PH y − PH 0, (5.3)
so that
PH0 (x + y) = PH0 x + PH0 y, (5.4)
that is, the operator PH0 is an additive operator. In addition,
PH0 (αx) = αPH0 x, (5.5)
so that PH0 is a linear operator.
Lemma 5.3 For any hyperplane H = H(a, γ) and H0 = H(a, 0),
PH x = PH0 x + PH 0, (5.6)
so PH is an affine linear operator.


Lemma 5.4 For i = 1, ..., I let Hi be the hyperplane Hi = H(ai , γi ),


Hi0 = H(ai , 0), and Pi and Pi0 the orthogonal projections onto Hi and
Hi0 , respectively. Let T be the operator T = PI PI−1 · · · P2 P1 . Then
T x = Bx + d, for some square matrix B and vector d; that is, T is an
affine linear operator.

Definition 5.1 Let S be a subset of RJ and f : S → [−∞, ∞] a function


defined on S. The subset of RJ+1 defined by

epi(f ) = {(x, γ)|f (x) ≤ γ}

is the epi-graph of f . Then we say that f is convex if its epi-graph is a


convex set.

5.2 Exercises
5.1 Let C ⊆ RJ , and let xn , n = 1, ..., N be members of C. For n =
1, ..., N , let αn > 0, with α1 + ... + αN = 1. Show that, if C is convex, then
the convex combination

α1 x1 + α2 x2 + ... + αN xN

is in C.

We know that the convex combination of any two members of C is again


in C. We prove the apparently more general result by induction. Suppose
that it is true that any convex combination of N − 1 or fewer members of
C is again in C. Now we show it is true for N members of C. To see this,
merely write
α1 x1 + α2 x2 + ... + αN xN = (1 − αN )[(α1 /(1 − αN ))x1 + ... + (αN −1 /(1 − αN ))xN −1 ] + αN xN .

5.2 Prove Proposition 5.1. Hint: show that the set C is convex.

Note that it is not obvious that C is convex; we need to prove this fact.
We need to show that if we take two convex combinations of members of
S, say x and y, and form z = (1 − α)x + αy, then z is also a member of C.
To be concrete, let

x = β1 x1 + ... + βN xN ,

and

y = γ1 y 1 + ... + γM y M ,

where the xn and y m are in S and the positive βn and γm both sum to
one. Then

z = (1 − α)β1 x1 + ... + (1 − α)βN xN + αγ1 y 1 + ... + αγM y M ,

which is a convex combination of N + M members of S, since

(1 − α)β1 + ... + (1 − α)βN + αγ1 + ... + αγM = 1.

5.3 Show that the subset of RJ consisting of all vectors x with ||x||2 = 1
is not convex.

Don’t make the problem harder than it is; all we need is one counter-
example. Take x and −x, both with norm one. Then (1/2)(x + (−x)) = 0.

5.4 Let kxk2 = kyk2 = 1 and z = (1/2)(x + y) in RJ . Show that kzk2 < 1


unless x = y. Show that this conclusion does not hold if the two-norm k · k2
is replaced by the one-norm, defined by
kxk1 = |x1 | + ... + |xJ |.

Now x and y are given, and z is their midpoint. From the Parallelogram
Law,
kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 ,
we have
kx + yk2 + kx − yk2 = 4,
so that

kzk2 + (1/4)kx − yk2 = 1,

from which we conclude that kzk < 1, unless x = y.

5.5 Let C be the set of all vectors x in RJ with kxk2 ≤ 1. Let K be a


subset of C obtained by removing from C any number of its members for
which kxk2 = 1. Show that K is convex. Consequently, every x in C with
kxk2 = 1 is an extreme point of C.

Suppose we remove some x with kxk = 1. In order for the set without
x to fail to be convex it is necessary that there be some y ≠ x and z ≠ x in
C for which x = (y + z)/2. But we know that if kyk = kzk = 1, then kxk < 1,
unless y = z, in which case we would also have y = z = x. So we cannot
have kyk = kzk = 1. But what if their norms are less than one? In that
case, the norm of the midpoint of y and z is not greater than the larger of

the two norms, so again, it cannot be one. So no x on the boundary of C


is a convex combination of two distinct members of C, and removing any
number of such points leaves the remaining set convex.

5.6 Prove that every subspace of RJ is convex, and every linear manifold
is convex.

For the first part, replace β in the definition of a subspace with 1 − α,


and require that 0 ≤ α ≤ 1. For the second part, let M = S + b, where S
is a subspace, and b is a fixed vector. Let x = s + b and y = t + b, where s
and t are in S. Then

(1 − α)x + αy = [(1 − α)s + αt] + b,

which is then also in M .

5.7 Prove that every hyperplane H(a, γ) is a linear manifold.

Show that
H(a, γ) = H(a, 0) + (γ/kak2 ) a.

5.8 (a) Let C be a circular region in R2 . Determine the normal cone


for a point on its circumference. (b) Let C be a rectangular region in R2 .
Determine the normal cone for a point on its boundary.

For (a), let x0 be the center of the circle and x be the point on its
circumference. Then the normal cone at x is the set of all z of the form
z = x + γ(x − x0 ), for γ ≥ 0. For (b), if the point x lies on one of the sides
of the rectangle, then the normal cone consists of the points along the line
segment through x formed by the outward normal to the side. If the point
x is a corner, then the normal cone consists of all points in the intersection
of the half-spaces external to the rectangle whose boundary lines are the
two sides that meet at x.

5.9 Prove Lemmas 5.2, 5.3 and 5.4.

Applying Equation (5.2),

PH x = x + (γ − ha, xi)a,

we find that

PH (x+y) = x+y +(γ −ha, x+yi)a = x+(γ −ha, xi)a+y +(γ −ha, yi)a−γa

= PH (x) + PH (y) − PH (0).



We also have

PH (x) = x + γa − ha, xia = γa + x − ha, xia = PH (0) + PH0 (x).

Finally, consider the case of I = 2. Then we have

T x = P2 P1 (x) = P2 (x + (γ1 − ha1 , xi)a1 )

= [x − ha1 , xia1 − ha2 , xia2 + ha1 , xiha2 , a1 ia2 ] + γ1 a1 + γ2 a2 − γ1 ha2 , a1 ia2


= Lx + b,
where L is the linear operator

Lx = x − ha1 , xia1 − ha2 , xia2 + ha1 , xiha2 , a1 ia2 ,

and
b = γ1 a1 + γ2 a2 − γ1 ha2 , a1 ia2 .
We can then associate with the linear operator L the matrix

B = I − a1 (a1 )T − a2 (a2 )T + ha2 , a1 ia2 (a1 )T .

5.10 Let C be a convex set and f : C ⊆ RJ → (−∞, ∞]. Prove that f (x)
is a convex function, according to Definition 5.1, if and only if, for all x
and y in C, and for all 0 < α < 1, we have

f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y).

Suppose that the epigraph of f is a convex set. Then

α(x, f (x)) + (1 − α)(y, f (y)) = (αx + (1 − α)y, αf (x) + (1 − α)f (y))

is a member of the epigraph, which then tells us that

f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y).

Conversely, let (x, γ) and (y, δ) be members of the epigraph of f , so that


f (x) ≤ γ, and f (y) ≤ δ. Let α be in [0, 1]. We want to show that

α(x, γ) + (1 − α)(y, δ) = (αx + (1 − α)y, αγ + (1 − α)δ)

is in the epigraph. But this follows from the inequality

f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y) ≤ αγ + (1 − α)δ.

5.11 Given a point s in a convex set C, where are the points x for which
s = PC x?

Suppose that s = PC x. From the inequality

hx − s, c − si ≤ 0,

for all c in C, it follows that x − s is in NC (s). So s = PC x if and only if


the vector x − s is in NC (s).

5.12 Let C be a closed non-empty convex set in RJ , x a vector not in C,


and d > 0 the distance from x to C. Let

σC (a) = sup{ha, ci | c ∈ C} ,

the support function of C. Show that

d = m = max{ha, xi − σC (a) : ||a|| ≤ 1}.

Hints: Consider the unit vector (1/d)(x − PC x), and use Cauchy’s Inequality
and Proposition 5.2.

Let a0 = (1/d)(x − PC x). First, show that

σC (a0 ) = ha0 , PC xi.

Then

d = ha0 , x − PC xi = ha0 , xi − ha0 , PC xi = ha0 , xi − σC (a0 ) ≤ m;

therefore, d ≤ m.
From Cauchy’s Inequality, we know that

ha, x − PC xi ≤ kx − PC xkkak = d,

for any vector a with norm one. Therefore, for any a with norm one, we
have
d ≥ ha, xi − ha, PC xi ≥ ha, xi − σC (a).
Therefore, we can conclude that d ≥ m, and so d = m.

5.13 (Rådström Cancellation)

• (a) Show that, for any subset S of RJ , we have 2S ⊆ S + S, and


2S = S + S if S is convex.

• (b) Find three finite subsets of R, say A, B, and C, with A not


contained in B, but with the property that A + C ⊆ B + C. Hint: try
to find an example where the set C is C = {−1, 0, 1}.

• (c) Show that, if A and B are convex, B is closed, and C is bounded,


then A + C ⊆ B + C implies that A ⊆ B. Hint: Note that, under
these assumptions, 2A + C = A + (A + C) ⊆ 2B + C.

• (a) If x and y are in S, then x + y is in S + S. If S is convex, then
(x + y)/2 is also in S, or

x + y = 2((x + y)/2)

is in 2S; therefore S + S = 2S.

• (b) Take A = {0, 1}, B = {0, 2} and C = {−1, 0, 1}. Then we have

A + C = {−1, 0, 1, 2},

and
B + C = {−1, 0, 1, 2, 3}.

• (c) Begin with

2A + C = (A + A) + C = A + (A + C) ⊆ A + (B + C) = (A + C) + B

⊆ (B + C) + B = (B + B) + C = 2B + C.
Continuing in this way, show that

nA + C ⊆ nB + C,

or
A + (1/n)C ⊆ B + (1/n)C,
for n = 1, 2, 3, .... Now select a point a in A; we show that a is in B.
Since C is a bounded set, we know that there is a constant K > 0
such that kck ≤ K, for all c in C. For each n there are cn and dn in
C and bn in B such that
a + (1/n)cn = bn + (1/n)dn .

Then we have

bn = a + (1/n)(cn − dn ). (5.7)

Since C is bounded, we know that (1/n)(cn − dn ) converges to zero, as
n → ∞. Then, taking limits on both sides of Equation (5.7), we find
that {bn } → a, which must be in B, since B is closed.
Chapter 6

Linear Programming

6.1 Needed References


Theorem 6.1 Let x and y be feasible vectors. Then
z = cT x ≥ bT y = w. (6.1)

Corollary 6.1 If z is not bounded below, then there are no feasible y.


Corollary 6.2 If x and y are both feasible, and z = w, then both x and y
are optimal for their respective problems.
Lemma 6.1 Let W = {w1 , ..., wN } be a spanning set for a subspace S
in RI , and V = {v 1 , ..., v M } a linearly independent subset of S. Then
M ≤ N.

6.2 Exercises
6.1 Prove Theorem 6.1 and its corollaries.
Since y is feasible, we know that AT y ≤ c, and since x is feasible, we
know that x ≥ 0 and Ax = b. Therefore, we have
w = bT y = (Ax)T y = xT AT y ≤ xT c = cT x = z.
This tells us that, if there are feasible x and y, then z = cT x is bounded
below, as we run over all feasible x, by any w. Consequently, if z is not
bounded, there cannot be any w, so there can be no feasible y.
If both x and y are feasible and z = cT x = bT y = w, then we cannot
decrease z by replacing x, nor can we increase w by replacing y. Therefore,
x and y are the best we can do.


6.2 Let W = {w1 , ..., wN } be a spanning set for a subspace S in RI , and


V = {v 1 , ..., v M } a linearly independent subset of S. Let A be the matrix
whose columns are the v m , B the matrix whose columns are the wn . Show
that there is an N by M matrix C such that A = BC. Prove Lemma
6.1 by showing that, if M > N , then there is a non-zero vector x with
Cx = Ax = 0.

If C is any M by N matrix with M > N , then, using Gauss elimination,


we can show that there must be non-zero solutions of Cx = 0. Now, since
W is a spanning set, each column of A is a linear combination of the
columns of B, which means that we can find some C such that A = BC. If
there is a non-zero x for which Cx = 0, then Ax = 0 also. But the columns
of A are linearly independent, which tells us that Ax = 0 has only the zero
solution. We conclude that M ≤ N .

6.3 Show that when the simplex method has reached the optimal solution
for the primal problem PS, the vector y with y T = cTB B −1 becomes a feasible
vector for the dual problem and is therefore the optimal solution for DS.
Hint: Clearly, we have

z = cT x = cTB B −1 b = y T b = w,

so we need only show that AT y ≤ c.

We know that the simplex algorithm halts when the vector

rT = (cTN − cTB B −1 N ) = cTN − y T N

has only non-negative entries, or when

cN ≥ N T y.

Then we have

AT y = [B T ; N T ]y = [B T y ; N T y] = [(B T )(B T )−1 cB ; N T y] = [cB ; N T y] ≤ [cB ; cN ] = c,

so y is feasible, and therefore optimal.


Chapter 7

Matrix Games and


Optimization

7.1 Some Exercises


7.1 Show that the vectors p̂ = (1/µ)x̂ and q̂ = (1/µ)ŷ are probability vectors and
are optimal randomized strategies for the matrix game.

Since
ẑ = cT x̂ = µ = bT ŷ = ŵ,
it follows that cT p̂ = bT q̂ = 1. We also have

x̂T Aŷ ≤ x̂T c = µ,

and
x̂T Aŷ = (AT x̂)T ŷ ≥ bT ŷ = µ,
so that
x̂T Aŷ = µ,
and
p̂T Aq̂ = 1/µ .

For any probabilities p and q we have

pT Aq̂ = (1/µ) pT Aŷ ≤ (1/µ) pT c = 1/µ ,

and

p̂T Aq = (AT p̂)T q = (1/µ)(AT x̂)T q ≥ (1/µ) bT q = 1/µ .


7.2 Given an arbitrary I by J matrix A, there is α > 0 so that the matrix


B with entries Bij = Aij + α has only positive entries. Show that any
optimal randomized probability vectors for the game with pay-off matrix B
are also optimal for the game with pay-off matrix A.

This one is easy: for probability vectors p and q we have pT Bq = pT Aq + α, so the two games have the same optimal randomized strategies, with the value shifted by α.


Chapter 8

Differentiation

8.1 Exercises
8.1 Let Q be a real, positive-definite symmetric matrix. Define the Q-
inner product on RJ to be

hx, yiQ = xT Qy = hx, Qyi,

and the Q-norm to be

||x||Q = √(hx, xiQ ) .
Show that, if ∇f (a) is the Fréchet derivative of f (x) at x = a, for the
usual Euclidean norm, then Q−1 ∇f (a) is the Fréchet derivative of f (x) at
x = a, for the Q-norm. Hint: use the inequality
√(λJ ) ||h||2 ≤ ||h||Q ≤ √(λ1 ) ||h||2 ,

where λ1 and λJ denote the greatest and smallest eigenvalues of Q, respec-


tively.

8.2 For (x, y) not equal to (0, 0), let

f (x, y) = x^a y^b /(x^p + y^q ),
with f (0, 0) = 0. In each of the five cases below, determine if the function
is continuous, Gâteaux, Fréchet or continuously differentiable at (0, 0).
• 1) a = 2, b = 3, p = 2, and q = 4;
• 2) a = 1, b = 3, p = 2, and q = 4;
• 3) a = 2, b = 4, p = 4, and q = 8;


• 4) a = 1, b = 2, p = 2, and q = 2;
• 5) a = 1, b = 2, p = 2, and q = 4.
Chapter 9

Convex Functions

Proposition 9.1 The following are equivalent:


1) the epi-graph of g(x) is convex;
2) for all points a < x < b
g(x) ≤ [(g(b) − g(a))/(b − a)](x − a) + g(a); (9.1)

3) for all points a < x < b

g(x) ≤ [(g(b) − g(a))/(b − a)](x − b) + g(b); (9.2)
4) for all points a and b in R and for all α in the interval (0, 1)

g((1 − α)a + αb) ≤ (1 − α)g(a) + αg(b). (9.3)

Lemma 9.1 A firmly non-expansive operator on RJ is non-expansive.

9.1 Exercises
9.1 Prove Proposition 9.1.
The key idea here is to use α = (x − a)/(b − a), so that x = (1 − α)a + αb.

9.2 Prove Lemma 9.1.

From the definition, F is firmly non-expansive if

hF (x) − F (y), x − yi ≥ kF (x) − F (y)k22 . (9.4)

Now use the Cauchy Inequality on the left side.


9.3 Show that, if x̂ minimizes the function g(x) over all x in RJ , then
x = 0 is in the sub-differential ∂g(x̂).

This is easy.

9.4 If f (x) and g(x) are convex functions on RJ , is f (x) + g(x) convex?
Is f (x)g(x) convex?

It is easy to show that the sum of two convex functions is again convex.
The product of two is, however, not always convex; take f (x) = −1 and
g(x) = x2 , for example.

9.5 Let ιC (x) be the indicator function of the closed convex set C, that is,
ιC (x) = 0, if x ∈ C, and ιC (x) = +∞, if x ∉ C.

Show that the sub-differential of the function ιC at a point c in C is the


normal cone to C at the point c, that is, ∂ιC (c) = NC (c), for all c in C.

This follows immediately from the definitions.

9.6 Let g(t) be a strictly convex function for t > 0. For x > 0 and y > 0,
define the function
f (x, y) = x g(y/x).

Use induction to prove that

f (x1 , y1 ) + ... + f (xN , yN ) ≥ f (x+ , y+ ),

for any positive numbers xn and yn , where x+ = x1 + ... + xN , with y+ defined similarly. Also show
that equality obtains if and only if the finite sequences {xn } and {yn } are
proportional.

We show this for the case of N = 2; the more general case is similar.
We need to show that

f (x1 , y1 ) + f (x2 , y2 ) ≥ f (x1 + x2 , y1 + y2 ).

Write

f (x1 , y1 ) + f (x2 , y2 ) = x1 g(y1 /x1 ) + x2 g(y2 /x2 )
= (x1 + x2 )[ (x1 /(x1 + x2 )) g(y1 /x1 ) + (x2 /(x1 + x2 )) g(y2 /x2 ) ]
≥ (x1 + x2 ) g( (x1 /(x1 + x2 ))(y1 /x1 ) + (x2 /(x1 + x2 ))(y2 /x2 ) )
= x+ g(y+ /x+ ) = f (x+ , y+ ),

using the convexity of g.

9.7 Use the result in Exercise 9.6 to obtain Cauchy’s Inequality. Hint: let
g(t) = −√t.

Using the result in Exercise 9.6, and the choice of g(t) = −√t, we obtain

−x1 √(y1 /x1 ) − ... − xN √(yN /xN ) ≥ −x+ √(y+ /x+ ),

so that

√x1 √y1 + ... + √xN √yN ≤ √x+ √y+ ,

which is Cauchy’s Inequality.

9.8 Use the result in Exercise 9.6 to obtain Milne’s Inequality:


x+ y+ ≥ [(x1 + y1 ) + ... + (xN + yN )] [x1 y1 /(x1 + y1 ) + ... + xN yN /(xN + yN )].

Hint: let g(t) = −t/(1 + t).

This is a direct application of the result in Exercise 9.6.

9.9 For x > 0 and y > 0, let f (x, y) be the Kullback-Leibler function,
f (x, y) = KL(x, y) = x log(x/y) + y − x.

Use Exercise 9.6 to show that

KL(x1 , y1 ) + ... + KL(xN , yN ) ≥ KL(x+ , y+ ).

Use g(t) = − log t; the extra terms y − x on each side add up to y+ − x+ .


Chapter 10

Fenchel Duality

10.1 Exercises
10.1 Let A be a real symmetric positive-definite matrix and
f (x) = (1/2)hAx, xi.

Show that

f ∗ (a) = (1/2)hA−1 a, ai.

Hints: Find ∇f (x) and use Equation (10.1).

We have

f ∗ (a) = ha, (∇f )−1 (a)i − f ((∇f )−1 (a)). (10.1)

Since ∇f (x) = Ax, it follows from Equation (10.1) that


f ∗ (a) = ha, A−1 ai − (1/2)hA(A−1 a), A−1 ai = (1/2)ha, A−1 ai.

Chapter 11

Convex Programming

11.1 Referenced Results


Theorem 11.1 Let (x̂, ŷ) be a saddle point for K(x, y). Then x̂ solves the
primal problem, that is, x̂ minimizes f (x), over all x in X, and ŷ solves
the dual problem, that is, ŷ maximizes g(y), over all y in Y . In addition,
we have

g(y) ≤ K(x̂, ŷ) ≤ f (x), (11.1)

for all x and y, so that the maximum value of g(y) and the minimum value
of f (x) are both equal to K(x̂, ŷ).

Theorem 11.2 Let x∗ be a regular point. If x∗ is a local constrained


minimizer of f (x), then there is a vector λ∗ such that
• 1) λ∗i ≥ 0, for i = 1, ..., K;
• 2) λ∗i gi (x∗ ) = 0, for i = 1, ..., K;
• 3) ∇f (x∗ ) + λ∗1 ∇g1 (x∗ ) + ... + λ∗K ∇gK (x∗ ) = 0.

11.2 Exercises
11.1 Prove Theorem 11.1.

We have
f (x̂) = sup K(x̂, y) = K(x̂, ŷ),
y

and
f (x) = sup K(x, y) ≥ K(x, ŷ) ≥ K(x̂, ŷ) = f (x̂).
y


From f (x) ≥ f (x̂) we conclude that x = x̂ minimizes f (x). In a similar


way, we can show that y = ŷ maximizes g(y).

11.2 Apply the gradient form of the KKT Theorem to minimize the func-
tion f (x, y) = (x + 1)2 + y 2 over all x ≥ 0 and y ≥ 0.

Setting the x-gradient of the Lagrangian to zero, we obtain the equa-


tions
2(x + 1) − λ1 = 0,
and
2y − λ2 = 0.
Since x ≥ 0, we cannot have λ1 = 0, consequently g1 (x, y) = −x = 0, so
x = 0. We also have that y = 0 whether or not λ2 = 0. Therefore, the
answer is x = 0, y = 0.

11.3 Minimize the function


f (x, y) = √(x^2 + y^2 ),

subject to
x + y ≤ 0.
Show that the function M P (z) is not differentiable at z = 0.

Equivalently, we minimize f (x, y)^2 = x^2 + y^2 , subject to the constraint
g(x, y) = x + y ≤ z. If z ≥ 0, the optimal point is (0, 0) and M P (z) = 0. If
z < 0, the optimal point is x = y = z/2, and M P (z) = −z/√2. The directional
derivative of M P (z) at z = 0, in the positive direction, is zero, while in the
negative direction it is 1/√2, so M P (z) is not differentiable at z = 0.

11.4 Minimize the function

f (x, y) = −2x − y,

subject to
x + y ≤ 1,
0 ≤ x ≤ 1,
and
y ≥ 0.

We write g1 (x, y) = x + y − 1 ≤ 0, g2 (x, y) = −x ≤ 0, g3 (x, y) = x − 1 ≤ 0,
and g4 (x, y) = −y ≤ 0. Setting the partial derivatives of the Lagrangian
to zero, we get

0 = −2 + λ1 − λ2 + λ3 ,

and

0 = −1 + λ1 − λ4 .

Since λi ≥ 0 for each i, the second equation gives λ1 = 1 + λ4 ≥ 1, so λ1 ≠ 0,
and therefore 0 = g1 (x, y) = x + y − 1, or x + y = 1.
If λ4 = 0, then λ1 = 1 and λ3 − λ2 = 1. Therefore, we cannot have
λ3 = 0, which tells us that g3 (x, y) = 0, or x = 1, and then y = 1 − x = 0.
If λ4 > 0, then y = 0 by complementary slackness, and so x = 1 again. In
any case, the answer must be x = 1 and y = 0, so that the minimum is
f (1, 0) = −2.

11.5 Apply the theory of convex programming to the primal Quadratic


Programming Problem (QP), which is to minimize the function
f (x) = (1/2) xT Qx,

subject to

aT x ≤ c,

where a ≠ 0 is in RJ , c < 0 is real, and Q is symmetric and positive-
definite.

With g(x) = aT x − c ≤ 0, we set the x-gradient of L(x, λ) equal to zero,


obtaining
Qx∗ + λa = 0,
or
x∗ = −λQ−1 a.
So, for any λ ≥ 0, ∇x L(x, λ) = 0 has a solution, so all λ ≥ 0 are feasible
for the dual problem. We can then write

L(x∗ , λ) = (λ^2 /2) aT Q−1 a − λ^2 aT Q−1 a − λc = −((λ^2 /2) aT Q−1 a + λc).

Now we maximize L(x∗ , λ) over λ ≥ 0.
We get

0 = −λ aT Q−1 a − c,

so that

λ∗ = −c/(aT Q−1 a),

and the optimal x∗ is

x∗ = cQ−1 a/(aT Q−1 a).
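
The closed-form answer is easy to check numerically; the sketch below (my own illustration, with an arbitrary positive-definite Q, nonzero a, and c < 0, assuming numpy) verifies that x∗ and λ∗ satisfy the KKT conditions Qx∗ + λ∗ a = 0, aT x∗ = c, and λ∗ ≥ 0.

import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # positive-definite
a = np.array([1.0, -2.0])
c = -5.0

Qinv_a = np.linalg.solve(Q, a)
denom = a @ Qinv_a                       # a^T Q^{-1} a > 0
lam = -c / denom                         # λ* = -c / (a^T Q^{-1} a) >= 0 since c < 0
x = c * Qinv_a / denom                   # x* = c Q^{-1} a / (a^T Q^{-1} a)

print(np.allclose(Q @ x + lam * a, 0.0)) # stationarity: True
print(np.isclose(a @ x, c))              # the constraint is active: True
print(lam >= 0.0)                        # dual feasibility: True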
11.6 Use Theorem 11.2 to prove that any real N by N symmetric matrix
has N mutually orthonormal eigenvectors.

Let Q be symmetric. First, we minimize f (x) = (1/2) xT Qx subject to


g(x) = 1 − xT x = 0. The feasible set is closed and bounded, and f (x)
is continuous, so there must be a minimum. From the KKT Theorem we
have
Qx − λx = 0,
so that Qx = λx and x is an eigenvector of Q; call x = xN , and λ = γN .
Note that the optimal value of f (x) is γN /2.
Next, minimize f (x), subject to g1 (x) = 1 − xT x = 0 and xT xN = 0.
The Lagrangian is
L(x, λ) = (1/2) xT Qx + λ1 (1 − xT x) + λ2 xT xN .
Setting the x-gradient of L(x, λ) to zero, we get

0 = Qx − λ1 x + λ2 xN .

Therefore,

0 = xT Qx − λ1 xT x + λ2 xT xN = xT Qx − λ1 ,

or
λ1 = xT Qx.
We also have

0 = (xN )T Qx − λ1 (xN )T x + λ2 (xN )T xN ,

and
(xN )T Q = γN (xN )T ,
so that
0 = γN (xN )T x − λ1 (xN )T x + λ2 = λ2 ,
so λ2 = 0. Therefore, we have

Qx = λ1 x,

so x is another eigenvector of Q, with associated eigenvalue λ1 ; we write


x = xN −1 , and λ1 = γN −1 . Note that the optimal value of f (x) is now
γN −1 /2. Since the second optimization problem has more constraints than
the first one, we must conclude that γN −1 ≥ γN .
We continue in this manner, each time including one more orthogonal-
ity constraint, which thereby increases the optimal value. When we have
performed N optimizations, and are about to perform one more, we find
that the constraint conditions now require x to be orthogonal to each of
the previously calculated xn . But these vectors are linearly independent in
RN , and so the only vector orthogonal to all of them is zero, and we are
finished.
Chapter 12

Iterative Optimization

12.1 Referenced Results


Lemma 12.1 Suppose that xk+1 is chosen using the optimal value of γk ,
as described by Equation (12.1),

g(xk − γk ∇g(xk )) ≤ g(xk − γ∇g(xk )). (12.1)

Then

h∇g(xk+1 ), ∇g(xk )i = 0. (12.2)

12.2 Exercises
12.1 Prove Lemma 12.1.

Use the Chain Rule to calculate the derivative of the function f (γ) given
by
f (γ) = g(xk − γ∇g(xk )),
and then set this derivative equal to zero.

12.2 Apply the Newton-Raphson method to obtain an iterative procedure


for finding √a, for any positive a. For which x0 does the method converge?
There are two answers, of course; how does the choice of x0 determine
which square root becomes the limit?

This is best done as a computer exercise.
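
One way to set up the computer exercise is sketched below (the function names are mine). Newton-Raphson applied to f(x) = x^2 − a gives the iteration x ← x − (x^2 − a)/(2x) = (x + a/x)/2; starting with x0 > 0 the iterates converge to √a, and starting with x0 < 0 they converge to −√a.

def newton_sqrt(a, x0, tol=1e-12, max_iter=100):
    # Newton-Raphson for f(x) = x^2 - a.
    x = x0
    for _ in range(max_iter):
        x_new = 0.5 * (x + a / x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(newton_sqrt(2.0, 1.0))     # ~ 1.414213...,  x0 > 0 gives +sqrt(a)
print(newton_sqrt(2.0, -1.0))    # ~ -1.414213..., x0 < 0 gives -sqrt(a)

For Exercise 12.3 the analogous update is x ← x − (x^3 − a)/(3x^2); with x0 of the same sign as a, the iterates converge to the real cube root a^{1/3}.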

12.3 Apply the Newton-Raphson method to obtain an iterative procedure


for finding a^{1/3} , for any real a. For which x0 does the method converge?


Another computer exercise.

12.4 Extend the Newton-Raphson method to complex variables. Redo the


previous exercises for the case of complex a. For the complex case, a has
two square roots and three cube roots. How does the choice of x0 affect
the limit? Warning: The case of the cube root is not as simple as it may
appear, and has a close connection to fractals and chaos.

Consult Schroeder’s book before trying this exercise.

12.5 (The Sherman-Morrison-Woodbury Identity) Let A be an in-


vertible matrix. Show that, if ω = 1 + v T A−1 u 6= 0, then A + uv T is
invertible and
1 −1 T −1
(A + uv T )−1 = A−1 − A uv A . (12.3)
ω

This is easy.
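
The identity is also easy to confirm numerically; the following check (my own, assuming numpy) compares the right-hand side of Equation (12.3) with a directly computed inverse.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)   # comfortably invertible
u = rng.standard_normal(4)
v = rng.standard_normal(4)

Ainv = np.linalg.inv(A)
omega = 1.0 + v @ Ainv @ u                           # ω = 1 + v^T A^{-1} u

smw = Ainv - np.outer(Ainv @ u, v @ Ainv) / omega    # A^{-1} - (1/ω) A^{-1} u v^T A^{-1}
direct = np.linalg.inv(A + np.outer(u, v))
print(np.allclose(smw, direct))                      # True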

12.6 Use the reduced Newton-Raphson method to minimize the function
(1/2) xT Qx, subject to Ax = b, where

    [  0  −13   −6  −3 ]
Q = [ −13  23   −9   3 ] ,
    [ −6   −9  −12   1 ]
    [ −3    3    1  −1 ]

A = [ 2  1  2   1 ]
    [ 1  1  3  −1 ] ,

and

b = (3, 2)T .

Start with

x0 = (1, 1, 0, 0)T .

We begin by finding a basis for the null space of A, which we do by


using Gauss elimination to solve Ax = 0. We find that (1, −4, 1, 0)T and
(−2, 3, 0, 1)T do the job, so the matrix Z is
    [  1  −2 ]
Z = [ −4   3 ] .
    [  1   0 ]
    [  0   1 ]
12.2. EXERCISES 41

The matrix Z T Q is then

Z T Q = [  46  −114   18  −14 ]
        [ −42    98  −14   14 ] ,

and

Z T Qx0 = (−68, 56)T .
We have

Z T QZ = [  520  −448 ]
         [ −448   392 ] ,

so that

(Z T QZ)−1 = (1/3136) [ 392  448 ]
                      [ 448  520 ] .
One step of the reduced Newton-Raphson algorithm, beginning with v 0 =
0, gives

v 1 = −(Z T QZ)−1 Z T Qx0 = (0.5, 0.4286)T ,

so that

x1 = x0 + Zv 1 = (0.6428, 0.2858, 0.5, 0.4286)T .
When we check, we find that Z T Qx1 = 0, so we are finished.
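
The arithmetic above can be verified with a few lines of numpy (a check I added; it is not part of the original solution).

import numpy as np

Q = np.array([[0.0, -13.0, -6.0, -3.0],
              [-13.0, 23.0, -9.0, 3.0],
              [-6.0, -9.0, -12.0, 1.0],
              [-3.0, 3.0, 1.0, -1.0]])
A = np.array([[2.0, 1.0, 2.0, 1.0],
              [1.0, 1.0, 3.0, -1.0]])
b = np.array([3.0, 2.0])
x0 = np.array([1.0, 1.0, 0.0, 0.0])
Z = np.array([[1.0, -2.0],
              [-4.0, 3.0],
              [1.0, 0.0],
              [0.0, 1.0]])                 # null-space basis of A found above

v1 = -np.linalg.solve(Z.T @ Q @ Z, Z.T @ Q @ x0)
x1 = x0 + Z @ v1
print(np.round(v1, 4))                     # [0.5, 0.4286]
print(np.round(x1, 4))                     # [0.6429, 0.2857, 0.5, 0.4286]
print(np.allclose(A @ x1, b))              # the constraint Ax = b still holds: True
print(np.round(Z.T @ Q @ x1, 8))           # the reduced gradient is (numerically) zero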

12.7 Use the reduced steepest descent method with an exact line search to
solve the problem in the previous exercise.

Do as a computer problem.
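
A possible way to carry out this computer problem (a sketch under my own naming, assuming numpy, and reusing Q, x0, and Z from Exercise 12.6): perform steepest descent on the reduced objective φ(v) = (1/2)(x0 + Zv)T Q(x0 + Zv), whose gradient is Hv + g0 with H = Z T QZ and g0 = Z T Qx0 ; for a quadratic, the exact line-search step along a direction d is α = (d · d)/(d · Hd).

import numpy as np

Q = np.array([[0.0, -13.0, -6.0, -3.0],
              [-13.0, 23.0, -9.0, 3.0],
              [-6.0, -9.0, -12.0, 1.0],
              [-3.0, 3.0, 1.0, -1.0]])
x0 = np.array([1.0, 1.0, 0.0, 0.0])
Z = np.array([[1.0, -2.0],
              [-4.0, 3.0],
              [1.0, 0.0],
              [0.0, 1.0]])

H = Z.T @ Q @ Z                        # reduced Hessian (positive-definite here)
g0 = Z.T @ Q @ x0                      # reduced gradient at v = 0

v = np.zeros(2)
for k in range(50000):
    grad = H @ v + g0
    if np.linalg.norm(grad) < 1e-8:
        break
    d = -grad
    alpha = (d @ d) / (d @ H @ d)      # exact line search for a quadratic
    v = v + alpha * d

print(k, np.round(x0 + Z @ v, 4))      # the same x as in Exercise 12.6, after many steps

Because the reduced Hessian here is badly conditioned, steepest descent with exact line search needs thousands of iterations to reach the point that the reduced Newton-Raphson method found in a single step.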
Chapter 13

Modified-Gradient
Algorithms

There are no exercises in this chapter.

Chapter 14

Quadratic Programming

There are no exercises in this chapter.

Chapter 15

Solving Systems of Linear


Equations

15.1 Regularizing the ART


In our first method we use ART to solve the system of equations given in
matrix form by

[ AT  εI ] [u ; v] = 0. (15.1)

We begin with u0 = b and v 0 = 0. Then, the lower component of the limit
vector is v ∞ = −εx̂ε .
The method of Eggermont et al. is similar. In their method we use
ART to solve the system of equations given in matrix form by

[ A  εI ] [x ; v] = b. (15.2)

We begin at x0 = 0 and v 0 = 0. Then, the limit vector has for its upper
component x∞ = x̂ε , and εv ∞ = b − Ax̂ε .

15.2 Exercises
15.1 Show that the two algorithms associated with Equations (15.1) and
(15.2), respectively, do actually perform as claimed.

We begin by recalling that, in the under-determined case, the minimum


two-norm solution of a system of equations Ax = b has the form x = AT z,
for some z, so that the minimum two-norm solution is x = AT (AAT )−1 b,


provided that the matrix AAT is invertible, which it usually is in the under-
determined case.
The solution of Ax = b for which kx − pk is minimized can be found in
a similar way. We let z = x − p, so that x = z + p and Az = b − Ap. Now
we find the minimum two-norm solution of the system Az = b − Ap; our
final solution is x = z + p.
The regularized solution x̂ε that we seek minimizes the function

f (x) = (1/2)kAx − bk2 + (1/2)ε2 kxk2 ,

and therefore can be written explicitly as

x̂ε = (AT A + ε2 I)−1 AT b.

For large systems, it is too expensive and time-consuming to calculate x̂ε


this way; therefore, we seek iterative methods, in particular, ones that do
not require the calculation of the matrix AT A.
The first of the two methods offered in the chapter has us solve the
system
 
[ AT  εI ] [u ; v] = 0. (15.3)

When we use the ART algorithm, we find the solution closest to where we
began the iteration. We begin at
[u0 ; v 0 ] = [b ; 0],

so we are finding the solution of Equation (15.1) for which

ku − bk2 + kvk2

is minimized. From our previous discussion, we see that we need to find


the solution of

[ AT  εI ] [s ; t] = −AT b

for which

ksk2 + ktk2

is minimized. This minimum two-norm solution must have the form

[s ; t] = [ AT  εI ]T z = [Az ; εz].

This tells us that

(AT A + ε2 I)z = −AT b,

or that −z = x̂ε . Since the lower part of the minimum two-norm solution
is t = εz = −εx̂ε , the assertion concerning this algorithm is established.
The second method has us solve the system

[ A  εI ] [x ; v] = b, (15.4)

using the ART algorithm and beginning at

[x0 ; v 0 ] = [0 ; 0],

so we are seeking a minimum two-norm solution of the system in Equation
(15.2). We know that this minimum two-norm solution must have the form

[x ; v] = [ A  εI ]T z = [AT z ; εz].

Therefore,

(AAT + ε2 I)z = b,

and

AT (AAT + ε2 I)z = (AT A + ε2 I)AT z = (AT A + ε2 I)x = AT b.

Consequently, x = x̂ε , and this x is the upper part of the limit vector of
the ART iteration.
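
The claims about the two augmented systems can be checked numerically without running ART itself, because for a consistent system ART converges to the solution closest to its starting vector, and that solution can be computed with a minimum-norm least-squares solve. The sketch below (my own check, assuming numpy; the test matrix and ε are arbitrary) confirms that both constructions recover x̂ε = (AT A + ε2 I)−1 AT b.

import numpy as np

rng = np.random.default_rng(3)
I_dim, J_dim = 3, 5                     # under-determined: more unknowns than equations
A = rng.standard_normal((I_dim, J_dim))
b = rng.standard_normal(I_dim)
eps = 0.1

x_hat = np.linalg.solve(A.T @ A + eps**2 * np.eye(J_dim), A.T @ b)

# Method 1: solution of [A^T  eps*I][u; v] = 0 closest to the start (b, 0).
M1 = np.hstack([A.T, eps * np.eye(J_dim)])
start = np.concatenate([b, np.zeros(J_dim)])
shift = np.linalg.lstsq(M1, -M1 @ start, rcond=None)[0]   # minimum-norm correction
v_limit = (start + shift)[I_dim:]
print(np.allclose(v_limit, -eps * x_hat))                 # True

# Method 2 (Eggermont et al.): minimum-norm solution of [A  eps*I][x; v] = b.
M2 = np.hstack([A, eps * np.eye(I_dim)])
x_v = np.linalg.lstsq(M2, b, rcond=None)[0]
print(np.allclose(x_v[:J_dim], x_hat))                    # True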
Chapter 16

Conjugate-Direction
Methods

16.1 Proofs of Lemmas


Lemma 16.1 When xk is constructed using the optimal α, we have

∇f (xk ) · dk = 0. (16.1)

Proof: Differentiate the function f (xk−1 +αdk ) with respect to the variable
α, and then set it to zero. According to the Chain Rule, we have

∇f (xk ) · dk = 0.

Lemma 16.2 The optimal αk is

αk = (rk · dk )/(dk · Qdk ), (16.2)

where rk = c − Qxk−1 .

Proof: We have

∇f (xk ) = Qxk − c = Q(xk−1 + αk dk ) − c = Qxk−1 − c + αk Qdk ,

so that
0 = ∇f (xk ) · dk = −rk · dk + αk dk · Qdk .


Lemma 16.3 Let ||x||2Q = x · Qx denote the square of the Q-norm of x.


Then
||x̂ − xk−1 ||2Q − ||x̂ − xk ||2Q = (rk · dk )2 /dk · Qdk ≥ 0
for any direction vectors dk .

Proof: We use c = Qx̂. Then we have

(x̂−xk )·Q(x̂−xk ) = (x̂−xk−1 )·Q(x̂−xk−1 )−2αk dk ·Q(x̂−xk−1 )+αk2 dk ·Qdk ,

so that

kx̂ − xk−1 k2Q − kx̂ − xk k2Q = 2αk dk · (c − Qxk−1 ) − αk2 dk · Qdk .

Now use rk = c − Qxk−1 and the value of αk .

Lemma 16.4 A conjugate set that does not contain zero is linearly inde-
pendent. If pn 6= 0 for n = 1, ..., J, then the least-squares vector x̂ can be
written as
x̂ = a1 p1 + ... + aJ pJ ,
with aj = c · pj /pj · Qpj for each j.

Proof: Suppose that we have

0 = c1 p1 + ... + cn pn ,

for some constants c1 , ..., cn . Then, for each m = 1, ..., n we have

0 = c1 pm · Qp1 + ... + cn pm · Qpn = cm pm · Qpm ,

from which it follows that cm = 0 or pm = 0.


Now suppose that the set {p1 , ..., pJ } is a conjugate basis for RJ . Then
we can write
x̂ = a1 p1 + ... + aJ pJ ,
for some aj . Then for each m we have

pm · c = pm · Qx̂ = am pm · Qpm ,

so that
am = (pm · c)/(pm · Qpm ).

Lemma 16.5 Whenever pn+1 = 0, we also have rn+1 = 0 , in which case


we have c = Qxn , so that xn is the least-squares solution.

Proof: If pn+1 = 0, then rn+1 is a multiple of pn . But, rn+1 is orthogonal


to pn , so rn+1 = 0.

Theorem 16.1 For n = 1, 2, ..., J and j = 1, ..., n − 1 we have

• a) rn · rj = 0;

• b) rn · pj = 0; and

• c) pn · Qpj = 0.

16.2 Exercises
16.1 There are several lemmas in this chapter whose proofs are only
sketched. Complete the proofs of these lemmas.

The proof of Theorem 16.1 uses induction on the number n. Throughout


the following exercises assume that the statements in the theorem hold for
some n < J. We prove that they hold also for n + 1.

16.2 Use the fact that

rj+1 = rj − αj Qpj ,

to show that Qpj is in the span of the vectors rj and rj+1 .

We have

rj+1 = c − Qxj = c − Qxj−1 − αj Qpj = rj − αj Qpj ,

so that
rj+1 − rj = −αj Qpj .

16.3 Show that rn+1 · rn = 0. Hints: establish that


αn = (rn · rn )/(pn · Qpn ),

by showing that

rn · pn = rn · rn ,

and then use the induction hypothesis to show that

rn · Qpn = pn · Qpn .

We know that
rn = pn + βn−1 pn−1 ,
where
βn−1 = (rn · Qpn−1 )/(pn−1 · Qpn−1 ).
Since rn · pn−1 = 0, it follows that

r n · pn = r n · r n .

Therefore, we have
αn = (rn · rn )/(pn · Qpn ).
From the induction hypothesis, we have

(rn − pn ) · Qpn = βn−1 pn−1 · Qpn = 0,

so that
rn · Qpn = pn · Qpn .
Using
rn+1 = rn − αn Qpn ,
we find that
rn+1 · rn = rn · rn − αn rn · Qpn = 0.

16.4 Show that rn+1 ·rj = 0, for j = 1, ..., n−1. Hints: 1. show that rn+1
is in the span of the vectors {rj+1 , Qpj+1 , Qpj+2 , ..., Qpn }; and 2. show that
rj is in the span of {pj , pj−1 }. Then use the induction hypothesis.

We begin with

rn+1 = rn − αn Qpn = rn−1 − αn−1 Qpn−1 − αn Qpn .

We then use
rn−1 = rn−2 − αn−2 Qpn−2 ,
and so on, to get that rn+1 is in the span of the vectors {rj+1 , Qpj+1 , Qpj+2 , ..., Qpn }.
We then use
rj · rj+1 = 0,
and
rj = pj + βj−1 pj−1 ,
along with the induction hypothesis, to get

rj · Qpm = 0,

for m = j + 1, ..., n.

16.5 Show that rn+1 · pj = 0, for j = 1, ..., n. Hint: show that pj is


in the span of the vectors {rj , rj−1 , ..., r1 }, and then use the previous two
exercises.

Write
pj = rj − βj−1 pj−1
and repeat this for pj−1 , pj−2 , and so on, and use p1 = r1 to show that pj
is in the span of {rj , rj−1 , ..., r1 }. Then use the previous exercises.

16.6 Show that pn+1 · Qpj = 0, for j = 1, ..., n − 1. Hint: use

Qpj = αj−1 (rj − rj+1 ).

We have

pn+1 · Qpj = rn+1 · Qpj = rn+1 · (αj−1 (rj − rj+1 )) = 0.

The final step in the proof is contained in the following exercise.


16.7 Show that pn+1 · Qpn = 0. Hints: write

pn+1 = rn+1 − βn pn ,

where
βn = (rn+1 · Qpn )/(pn · Qpn ).
We have

pn+1 · Qpn = rn+1 · Qpn − βn pn · Qpn = 0,

from the definition of βn .
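
The relations in Theorem 16.1 can be observed numerically by running the conjugate-direction recursion of this chapter on a small positive-definite system. The sketch below (my own illustration, assuming numpy, and using the updates p^{n+1} = r^{n+1} − βn p^n and αn = (r^n · p^n)/(p^n · Qp^n)) checks that the residuals are mutually orthogonal, the directions are mutually Q-conjugate, and the exact solution is reached after J steps.

import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
Q = B @ B.T + 6.0 * np.eye(6)           # a positive-definite matrix
c = rng.standard_normal(6)

x = np.zeros(6)
r = c - Q @ x                           # r^1
p = r.copy()                            # p^1 = r^1
rs, ps = [], []
for n in range(6):
    rs.append(r.copy()); ps.append(p.copy())
    alpha = (r @ p) / (p @ Q @ p)
    x = x + alpha * p
    r = r - alpha * (Q @ p)             # r^{n+1} = r^n - alpha_n Q p^n
    beta = (r @ Q @ p) / (p @ Q @ p)
    p = r - beta * p                    # p^{n+1} = r^{n+1} - beta_n p^n

print(max(abs(rs[i] @ rs[j]) for i in range(6) for j in range(i)))      # ~ 0
print(max(abs(ps[i] @ Q @ ps[j]) for i in range(6) for j in range(i)))  # ~ 0
print(np.allclose(Q @ x, c))                                            # True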


Chapter 17

Sequential Unconstrained
Minimization Algorithms
Lemma 17.1 For any non-negative vectors x and z, with z+ = z1 + ... + zJ > 0, we have

KL(x, z) = KL(x+ , z+ ) + KL(x, (x+ /z+ )z). (17.1)

17.1 Exercises
17.1 Prove Lemma 17.1.

This is easy.
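
Before writing out the algebra, one can convince oneself of the identity numerically; the check below (my own, assuming numpy) compares the two sides of Equation (17.1) for random positive vectors.

import numpy as np

def KL(x, z):
    # KL(x, z) = sum of x_j log(x_j / z_j) + z_j - x_j
    return float(np.sum(x * np.log(x / z) + z - x))

rng = np.random.default_rng(4)
x = rng.random(5) + 0.1
z = rng.random(5) + 0.1
x_plus, z_plus = x.sum(), z.sum()

lhs = KL(x, z)
rhs = x_plus * np.log(x_plus / z_plus) + z_plus - x_plus + KL(x, (x_plus / z_plus) * z)
print(np.isclose(lhs, rhs))    # True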

17.2 Use the logarithmic barrier method to minimize the function

f (x, y) = x − 2y,

subject to the constraints


1 + x − y 2 ≥ 0,
and
y ≥ 0.

For k > 0, we minimize

x − 2y − k log(1 + x − y 2 ) − k log(y).

Setting the gradient to zero, we get

k = 1 + x − y2 ,


and

2 = k(2y/(1 + x − y 2 ) − 1/y).
Therefore,
2y 2 − 2y − k = 0,
so that
y = 1/2 + (1/2)√(1 + 2k).
As k → 0, we have y → 1 and x → 0, so the optimal solution is x = 0 and
y = 1, which can be checked using the KKT Theorem.
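
The barrier path derived above can be tabulated directly (a small illustration I added): with y(k) = 1/2 + (1/2)√(1 + 2k) and x(k) = k − 1 + y(k)^2, the minimizers approach (0, 1) as k decreases to zero.

import math

for k in [1.0, 0.1, 0.01, 0.001, 1e-5]:
    y = 0.5 + 0.5 * math.sqrt(1.0 + 2.0 * k)
    x = k - 1.0 + y * y            # from the condition k = 1 + x - y^2
    print(k, round(x, 6), round(y, 6))
# The points (x, y) tend to the solution (0, 1).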

17.3 Use the quadratic-loss penalty method to minimize the function

f (x, y) = −xy,

subject to the equality constraint

x + 2y − 4 = 0.

For each k > 0 we minimize the function

−xy + k(x + 2y − 4)2 .

Setting the gradient equal to zero, we obtain

x = 4k(x + 2y − 4),

y = 2k(x + 2y − 4),
and so x = 2y and
y = (2x − 8)/(−4 + 1/k).

Solving for x, we get

x = 16/(8 − 1/k)

and

y = 8/(8 − 1/k).
As k → ∞, x approaches 2 and y approaches 1. We can check this result
by substituting x = 4 − 2y into f (x, y) and minimizing it as a function of
y alone.
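
As with the barrier example, the penalty path can be tabulated (my own illustration): printing x(k) = 16/(8 − 1/k) and y(k) = 8/(8 − 1/k) for increasing k shows the approach to the solution (2, 1).

for k in [1.0, 10.0, 100.0, 1e4, 1e6]:
    x = 16.0 / (8.0 - 1.0 / k)
    y = 8.0 / (8.0 - 1.0 / k)
    print(k, round(x, 6), round(y, 6))
# As k grows, (x, y) -> (2, 1); substituting x = 4 - 2y into f gives 2y^2 - 4y,
# whose minimizer y = 1, x = 2 confirms the limit.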
Chapter 18

Likelihood Maximization

There are no exercises in this chapter.

Chapter 19

Operators

19.1 Referenced Results


Suppose that B is a diagonalizable matrix, that is, there is a basis for RJ
consisting of eigenvectors of B. Let {u1 , ..., uJ } be such a basis, and let
Buj = λj uj , for each j = 1, ..., J. For each x in RJ , there are unique
coefficients aj so that

x = a1 u1 + ... + aJ uJ . (19.1)

Then let

||x|| = |a1 | + ... + |aJ |. (19.2)

Lemma 19.1 The expression || · || in Equation (19.2) defines a norm on


RJ . If ρ(B) < 1, then the affine operator T is sc, with respect to this norm.

Lemma 19.2 Let T be an arbitrary operator T on RJ and G = I − T .


Then

||x − y||22 − ||T x − T y||22 = 2(hGx − Gy, x − yi) − ||Gx − Gy||22 . (19.3)

Lemma 19.3 Let T be an arbitrary operator T on RJ and G = I − T .


Then
hT x − T y, x − yi − ||T x − T y||22 =

hGx − Gy, x − yi − ||Gx − Gy||22 . (19.4)


Lemma 19.4 An operator T is ne if and only if its complement G = I −T


is (1/2)-ism, and T is fne if and only if G is 1-ism, and if and only if G is
fne. Also, T is ne if and only if F = (I + T )/2 is fne. If G is ν-ism and
γ > 0 then the operator γG is (ν/γ)-ism.

Proposition 19.1 An operator F is firmly non-expansive if and only if


F = (1/2)(I + N ), for some non-expansive operator N .

Proposition 19.2 Let T be an affine linear operator whose linear part B


is diagonalizable, and |λ| < 1 for all eigenvalues λ of B that are not equal to
one. Then the operator T is pc, with respect to the norm given by Equation
(19.2).

19.2 Exercises
19.1 Show that a strict contraction can have at most one fixed point.

Suppose that T is strict contraction, with

kT x − T yk ≤ rkx − yk,

for some r ∈ (0, 1) and for all x and y. If T x = x and T y = y, then

kx − yk = kT x − T yk ≤ rkx − yk,

which implies that kx − yk = 0.

19.2 Let T be sc. Show that the sequence {T k x0 } is a Cauchy sequence.

From

||xk − xk+n || ≤ ||xk − xk+1 || + ... + ||xk+n−1 − xk+n ||. (19.5)

and

||xk+m − xk+m+1 || ≤ rm ||xk − xk+1 || (19.6)

we have

||xk − xk+n || ≤ (1 + r + r2 + ... + rn−1 )kxk − xk+1 k.

From
kxk − xk+1 k ≤ rk kx0 − x1 k
we conclude that, given any ε > 0, we can find k > 0 so that, for any n > 0

kxk − xk+n k ≤ ε,

so that the sequence {T k x0 } is a Cauchy sequence.


Since {xk } is a Cauchy sequence, it has a limit, say x̂. Let ek = x̂ − xk .
From
ek = T k x̂ − T k x0
we have
kek k ≤ rk kx̂ − x0 k,
so that {ek } converges to zero, and {xk } converges to x̂. Since the sequence
{xk+1 } is just the sequence {xk }, but starting at x1 instead of x0 , the
sequence {xk+1 } also converges to x̂. But we have {xk+1 } = {T xk }, which
converges to T x̂, by the continuity of T . Therefore, T x̂ = x̂.

19.3 Suppose that we want to solve the equation


x = (1/2) e−x .

Let T x = (1/2) e−x for x in R. Show that T is a strict contraction, when re-
stricted to non-negative values of x, so that, provided we begin with x0 > 0,
the sequence {xk = T xk−1 } converges to the unique solution of the equa-
tion.

Let 0 < x < z. From the Mean Value Theorem we know that, for t > 0,

e−t = e−0 − e−s t,

for some s in the interval (0, t). Therefore,

1 − e−(z−x) = e−c (z − x),

for some c in the interval (0, z − x). Then


|T x − T z| = (1/2) e−x (1 − e−(z−x) ) ≤ (1/2) e−x e−c (z − x) ≤ (1/2)(z − x).

Therefore, T is a strict contraction, with r = 1/2.
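
The resulting fixed-point iteration is easy to run (a sketch I added): iterating x ← (1/2)e^{−x} from any x0 > 0 converges to the unique solution of x = (1/2)e^{−x}, approximately 0.3517.

import math

x = 1.0                        # any x0 > 0 works
for _ in range(60):
    x = 0.5 * math.exp(-x)     # x_{k+1} = T x_k
print(x)                       # ~ 0.35173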

19.4 Prove Lemma 19.1.

Showing that it is a norm is easy.


Since T x − T y = Bx − By, we know that kT x − T yk = kB(x − y)k.
Suppose that
x − y = c1 u1 + ... + cJ uJ .

Then

B(x − y) = λ1 c1 u1 + ... + λJ cJ uJ ,

and

kB(x − y)k = |λ1 ||c1 | + ... + |λJ ||cJ | ≤ ρ(B)(|c1 | + ... + |cJ |) = ρ(B)kx − yk.

Therefore, T is a strict contraction in this norm.

19.5 Show that, if the operator T is α-av and 1 > β > α, then T is β-av.

Clearly, if

hGx − Gy, x − yi ≥ (1/(2α)) ||Gx − Gy||22 , (19.7)

and 1 > β > α, then

hGx − Gy, x − yi ≥ (1/(2β)) ||Gx − Gy||22 .

19.6 Prove Lemma 19.4.

Clearly, the left side of the equation

||x − y||22 − ||T x − T y||22 = 2(hGx − Gy, x − yi) − ||Gx − Gy||22

is non-negative if and only if the right side is non-negative, which is equiv-


alent to
hGx − Gy, x − yi ≥ (1/2) ||Gx − Gy||22 ,

which means that G is (1/2)-ism. Similarly, the left side of the equation

hT x − T y, x − yi − ||T x − T y||22 =

hGx − Gy, x − yi − ||Gx − Gy||22


is non-negative if and only if the right side is, which says that T is fne if
and only if G is fne if and only if G is 1-ism.

19.7 Prove Proposition 19.1.

Note that F = (1/2)(I + N ) is equivalent to F = I − (1/2)G, for G = I − N .


Therefore, if N is ne, then G is (1/2)-ism and (1/2)G is 1-ism, or fne. Therefore,
N is ne if and only if F is the complement of a 1-ism operator, and so is
itself a 1-ism operator and fne.

19.8 Prove Proposition 19.2.



Suppose that T y = y. We need to show that

kT x − yk < kx − yk,

unless T x = x. Since T y = y, we can write

kT x − yk = kT x − T yk = kB(x − y)k.

Suppose that
J
X
x−y = aj uj ,
j=1

and suppose that T x 6= x. Then


J
X
kx − yk = |aj |,
j=1

while
J
X
kB(x − y)k = |λj ||aj |.
j=1

If λj = 1 for all j such that aj ≠ 0, then B(x − y) = x − y, which is not


the case, since T x − T y ≠ x − y. Therefore, at least one |λj | is less than
one, and so
kT x − yk < kx − yk.

19.9 Show that, if B is a linear av operator, then |λ| < 1 for all eigenval-
ues λ of B that are not equal to one.

We know that B is av if and only if there is α ∈ (0, 1) such that

B = (1 − α)I + αN.

From Bu = λu it follows that


N u = ((λ + α − 1)/α) u.
Since N is ne, we must have
|(λ + α − 1)/α| ≤ 1,

or
|λ + α − 1| ≤ α.
Since B is ne, all its eigenvalues must have |λ| ≤ 1. We consider the
cases of λ real and λ complex and not real separately.

If λ is real, then we need only show that we cannot have λ = −1. But
if this were the case, we would have

| − 2 + α| = 2 − α ≤ α,

or α ≥ 1, which is false.


If λ = a + bi, with a2 + b2 = 1 and b ≠ 0, then we have

|(a + α − 1) + bi|2 ≤ α2 ,

or
(a + (α − 1))2 + b2 ≤ α2 .
Then
a2 + b2 + 2a(α − 1) + (α − 1)2 ≤ α2 .
Suppose that a2 + b2 = 1 and b ≠ 0. Then we have

1 ≤ α2 + 2a(1 − α) − (1 − α)2 = 2a(1 − α) − 1 + 2α,

or
2 ≤ 2(1 − α)a + 2α,
so that
1 ≤ (1 − α)a + α(1),
which exhibits 1 as bounded above by a convex combination of the real numbers a and 1. Since
α ∈ (0, 1) and a < 1 (because a2 + b2 = 1 and b ≠ 0), that convex combination is strictly less
than 1, a contradiction. Therefore no eigenvalue of B on the unit circle can have b ≠ 0, and we
conclude that a2 + b2 < 1 for every eigenvalue λ = a + bi of B that is not equal to one.
Chapter 20

Convex Feasibility and


Related Problems

20.1 A Lemma
For i = 1, ..., I let Ci be a non-empty, closed convex set in RJ . Let C =
C1 ∩ ... ∩ CI be the non-empty intersection of the Ci .

Lemma 20.1 If c ∈ C and x = c + p1 + ... + pI , where, for each i, c =
PCi (c + pi ), then c = PC x.

20.2 Exercises
20.1 Prove Lemma 20.1.

For each i we have

hc − (c + pi ), ci − ci = h−pi , ci − ci ≥ 0,

for all ci in Ci . Then, for any d in C = C1 ∩ ... ∩ CI , we have

h−pi , d − ci ≥ 0.

Therefore,
hc − x, d − ci = h−(p1 + ... + pI ), d − ci = h−p1 , d − ci + ... + h−pI , d − ci ≥ 0,

from which we conclude that c = PC x.
