
CntrlEngg (Optimization) ConvexAnalysisAndOptimization Solutions DimitriBertsekas

The document provides corrections for the book 'Convex Analysis and Optimization' by Dimitri P. Bertsekas. It lists page numbers and describes corrections to be made to the text, such as changing words, adding or removing text. The corrections are minor edits to improve accuracy and clarity.


Corrections for the book CONVEX ANALYSIS AND OPTIMIZATION, Athena Scientific, 2003, by Dimitri P. Bertsekas

Last Changed: 5/3/04

p. 3 (+22) Change “as the union of the closures of all line segments” to
“as the closure of the union of all line segments”
p. 37 (-2) Change “Every $x$” to “Every $x \neq 0$”
p. 38 (+1) Change “Every $x$ in” to “Every $x \notin X$ that belongs to”
p. 38 (+19) Change “i.e.,” to “with $x_1, \ldots, x_m \in \Re^n$ and $m \ge 2$, i.e.,”
p. 63 (+4, +6, +7, +19) Change four times “$c'y$” to “$a'y$”
p. 67 (+3) Change “y ∈ AC” to “y ∈ AC”
p. 70 (+9) Change “[BeN02]” to “[NeB02]”
p. 110 (+3 after the figure caption) Change “... does not belong to
the interior of C” to “... does not belong to the interior of C and hence
does not belong to the interior of cl(C) [cf. Prop. 1.4.3(b)]”
   
p. 148 (-8) Change “$\{x \mid r(x) \le \gamma\}$” to “$\{z \mid r(z) \le \gamma\}$”
p. 213 (-6) Change “remaining vectors $v_j$, $j \neq i$.” to “vectors $v_j$ with $v_j \neq v_i$.”
p. 219 (+3) Change “$f_i : C \to \Re$” to “$f_i : \Re^n \to \Re$”
p. 265 (+10) Change “$d/\|d\|$” to “$-d/\|d\|$”
p. 268 (-3) Change “$j \in A(x^*)$” to “$j \notin A(x^*)$”
p. 338 (+17) Change “Section 5.2” to “Section 5.3”
p. 384 (+6) Change “convex, possibly nonsmooth functions” to “smooth
functions, and convex (possibly nonsmooth) functions”
p. 446 (+6 and +8) Interchange “... constrained problem (7.16)” and
“... penalized problem (7.19)”
p. 458 (+13) Change “... as well real-valued” to “... as well as real-valued”
p. 458 (-10) Change “We will focus on this ... dual functions.” to “In
this case, the dual problem can be solved using gradient-like algorithms for
differentiable optimization (see e.g., Bertsekas [Ber99a]).”

Convex Analysis and
Optimization
Chapter 1 Solutions

Dimitri P. Bertsekas

with

Angelia Nedić and Asuman E. Ozdaglar

Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts


http://www.athenasc.com
LAST UPDATE March 24, 2004

CHAPTER 1: SOLUTION MANUAL

1.1

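Assume that $C$ is convex. Then, clearly $(\lambda_1 + \lambda_2)C \subset \lambda_1 C + \lambda_2 C$; this is true even if $C$ is not convex. To show the reverse inclusion, note that a vector $x$ in $\lambda_1 C + \lambda_2 C$ is of the form $x = \lambda_1 x_1 + \lambda_2 x_2$, where $x_1, x_2 \in C$. By convexity of $C$, we have

$$\frac{\lambda_1}{\lambda_1 + \lambda_2}\, x_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2}\, x_2 \in C,$$

and it follows that

$$x = \lambda_1 x_1 + \lambda_2 x_2 \in (\lambda_1 + \lambda_2)C.$$

Hence $\lambda_1 C + \lambda_2 C \subset (\lambda_1 + \lambda_2)C$.

For a counterexample when $C$ is not convex, let $C$ be a set in $\Re^n$ consisting of two vectors, $0$ and $x \neq 0$, and let $\lambda_1 = \lambda_2 = 1$. Then evidently $C$ is not convex, and $(\lambda_1 + \lambda_2)C = 2C = \{0, 2x\}$ while $\lambda_1 C + \lambda_2 C = C + C = \{0, x, 2x\}$, showing that $(\lambda_1 + \lambda_2)C \neq \lambda_1 C + \lambda_2 C$.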
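The counterexample can be reproduced numerically. The following is a minimal sketch, assuming a scalar stand-in $x = 1$ for the nonzero vector; the helper names `scale` and `minkowski_sum` are illustrative and not part of the original solution.

```python
def scale(lam, C):
    """Return the scaled set lam * C = {lam * c : c in C}."""
    return {lam * c for c in C}


def minkowski_sum(A, B):
    """Return the vector sum A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}


x = 1.0            # assumption: any nonzero scalar stands in for the vector x
C = {0.0, x}       # the two-point set {0, x}, which is not convex
lam1 = lam2 = 1

lhs = scale(lam1 + lam2, C)                           # (lam1 + lam2) C = 2C
rhs = minkowski_sum(scale(lam1, C), scale(lam2, C))   # lam1 C + lam2 C = C + C

print(sorted(lhs))
print(sorted(rhs))
```

The first set is a strict subset of the second: the point $x$ belongs to $C + C$ but not to $2C$.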

1.2 (Properties of Cones)

(a) Let x ∈ ∩i∈I Ci and let α be a positive scalar. Since x ∈ Ci for all i ∈ I and
each Ci is a cone, the vector αx belongs to Ci for all i ∈ I. Hence, αx ∈ ∩i∈I Ci ,
showing that ∩i∈I Ci is a cone.
(b) Let x ∈ C1 × C2 and let α be a positive scalar. Then x = (x1 , x2 ) for some
x1 ∈ C1 and x2 ∈ C2 , and since C1 and C2 are cones, it follows that αx1 ∈ C1
and αx2 ∈ C2 . Hence, αx = (αx1 , αx2 ) ∈ C1 × C2 , showing that C1 × C2 is a
cone.
(c) Let x ∈ C1 + C2 and let α be a positive scalar. Then, x = x1 + x2 for some
x1 ∈ C1 and x2 ∈ C2 , and since C1 and C2 are cones, αx1 ∈ C1 and αx2 ∈ C2 .
Hence, αx = αx1 + αx2 ∈ C1 + C2 , showing that C1 + C2 is a cone.
(d) Let x ∈ cl(C) and let α be a positive scalar. Then, there exists a sequence
{xk } ⊂ C such that xk → x, and since C is a cone, αxk ∈ C for all k. Further-
more, αxk → αx, implying that αx ∈ cl(C). Hence, cl(C) is a cone.
(e) First we prove that A·C is a cone, where A is a linear transformation and A·C
is the image of C under A. Let z ∈ A · C and let α be a positive scalar. Then,
Ax = z for some x ∈ C, and since C is a cone, αx ∈ C. Because A(αx) = αz,
the vector αz is in A · C, showing that A · C is a cone.
Next we prove that the inverse image $A^{-1} \cdot C$ of $C$ under $A$ is a cone. Let $x \in A^{-1} \cdot C$ and let $\alpha$ be a positive scalar. Then $Ax \in C$, and since $C$ is a cone, $\alpha Ax \in C$. Thus, the vector $A(\alpha x)$ is in $C$, implying that $\alpha x \in A^{-1} \cdot C$, and showing that $A^{-1} \cdot C$ is a cone.

1.3 (Lower Semicontinuity under Composition)

(a) Let $\{x_k\} \subset \Re^n$ be a sequence of vectors converging to some $x \in \Re^n$. By continuity of $f$, it follows that $\{f(x_k)\} \subset \Re^m$ converges to $f(x) \in \Re^m$, so that by lower semicontinuity of $g$, we have

$$\liminf_{k \to \infty} g\bigl(f(x_k)\bigr) \ge g\bigl(f(x)\bigr).$$

Hence, $h$ is lower semicontinuous.


(b) Assume, to arrive at a contradiction, that $h$ is not lower semicontinuous at some $x \in \Re^n$. Then, there exists a sequence $\{x_k\} \subset \Re^n$ converging to $x$ such that

$$\liminf_{k \to \infty} g\bigl(f(x_k)\bigr) < g\bigl(f(x)\bigr).$$

Let $\{x_k\}_K$ be a subsequence attaining the above limit inferior, i.e.,

$$\lim_{k \to \infty,\ k \in K} g\bigl(f(x_k)\bigr) = \liminf_{k \to \infty} g\bigl(f(x_k)\bigr) < g\bigl(f(x)\bigr). \tag{1.1}$$

Without loss of generality, we may assume that

$$g\bigl(f(x_k)\bigr) < g\bigl(f(x)\bigr), \qquad \forall\ k \in K.$$

Since $g$ is monotonically nondecreasing, it follows that

$$f(x_k) < f(x), \qquad \forall\ k \in K,$$

which together with the fact $\{x_k\}_K \to x$ and the lower semicontinuity of $f$ yields

$$f(x) \le \liminf_{k \to \infty,\ k \in K} f(x_k) \le \limsup_{k \to \infty,\ k \in K} f(x_k) \le f(x),$$

showing that $\{f(x_k)\}_K \to f(x)$. By our choice of the sequence $\{x_k\}_K$ and by lower semicontinuity of $g$, it follows that

$$\lim_{k \to \infty,\ k \in K} g\bigl(f(x_k)\bigr) = \liminf_{k \to \infty,\ k \in K} g\bigl(f(x_k)\bigr) \ge g\bigl(f(x)\bigr),$$

contradicting Eq. (1.1). Hence, $h$ is lower semicontinuous.


As an example showing that the assumption that $g$ is monotonically nondecreasing is essential, consider the functions

$$f(x) = \begin{cases} 0 & \text{if } x \le 0, \\ 1 & \text{if } x > 0, \end{cases}$$

and $g(x) = -x$. Then

$$g\bigl(f(x)\bigr) = \begin{cases} 0 & \text{if } x \le 0, \\ -1 & \text{if } x > 0, \end{cases}$$

which is not lower semicontinuous at 0.
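The failure of lower semicontinuity at 0 can be seen by evaluating the composition along a sequence decreasing to 0. This is a small illustrative sketch; the sequence $x_k = 10^{-k}$ is an arbitrary choice made here.

```python
def f(x):
    """Lower semicontinuous step function from the example."""
    return 0.0 if x <= 0 else 1.0


def g(t):
    """Continuous but monotonically decreasing, so Exercise 1.3(b) does not apply."""
    return -t


def h(x):
    """Composition h(x) = g(f(x))."""
    return g(f(x))


# A sequence x_k -> 0 from the right (arbitrary illustrative choice).
xs = [10.0 ** (-k) for k in range(1, 8)]
vals = [h(x) for x in xs]

print(vals)      # h(x_k) along the sequence
print(h(0.0))    # value of h at the limit point
```

Along the sequence the composition equals $-1$, while its value at the limit point is $0$, so $\liminf_k h(x_k) < h(0)$.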

1.4 (Convexity under Composition)

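(a) Let $x, y \in C$ and let $\alpha \in [0, 1]$. Then we have

$$\begin{aligned}
h\bigl(\alpha x + (1-\alpha)y\bigr) &= g\Bigl(f\bigl(\alpha x + (1-\alpha)y\bigr)\Bigr) \\
&\le g\bigl(\alpha f(x) + (1-\alpha)f(y)\bigr) \\
&\le \alpha g\bigl(f(x)\bigr) + (1-\alpha) g\bigl(f(y)\bigr) \\
&= \alpha h(x) + (1-\alpha)h(y),
\end{aligned}$$

where the first inequality above follows from the convexity of $f$ and the monotonicity of $g$, while the second inequality follows from the convexity of $g$. If $g$ is monotonically increasing and $f$ is strictly convex, then the first inequality in the preceding relation is strict whenever $x \neq y$ and $\alpha \in (0, 1)$, showing that $h$ is strictly convex.

(b) Let $x, y \in \Re^n$ and let $\alpha \in [0, 1]$. Then, by the definitions of $h$ and $f$, we have

$$\begin{aligned}
h\bigl(\alpha x + (1-\alpha)y\bigr) &= g\Bigl(f\bigl(\alpha x + (1-\alpha)y\bigr)\Bigr) \\
&= g\Bigl(f_1\bigl(\alpha x + (1-\alpha)y\bigr), \ldots, f_m\bigl(\alpha x + (1-\alpha)y\bigr)\Bigr) \\
&\le g\bigl(\alpha f_1(x) + (1-\alpha)f_1(y), \ldots, \alpha f_m(x) + (1-\alpha)f_m(y)\bigr) \\
&= g\Bigl(\alpha \bigl(f_1(x), \ldots, f_m(x)\bigr) + (1-\alpha)\bigl(f_1(y), \ldots, f_m(y)\bigr)\Bigr) \\
&\le \alpha g\bigl(f_1(x), \ldots, f_m(x)\bigr) + (1-\alpha) g\bigl(f_1(y), \ldots, f_m(y)\bigr) \\
&= \alpha g\bigl(f(x)\bigr) + (1-\alpha) g\bigl(f(y)\bigr) \\
&= \alpha h(x) + (1-\alpha)h(y),
\end{aligned}$$

where the first inequality follows by convexity of each $f_i$ and monotonicity of $g$, while the second inequality follows by convexity of $g$.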

1.5 (Examples of Convex Functions)

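(a) It can be seen that $f_1$ is twice continuously differentiable over $X$ and its Hessian matrix is given by

$$\nabla^2 f_1(x) = \frac{f_1(x)}{n^2}
\begin{bmatrix}
\dfrac{1-n}{x_1^2} & \dfrac{1}{x_1 x_2} & \cdots & \dfrac{1}{x_1 x_n} \\
\dfrac{1}{x_2 x_1} & \dfrac{1-n}{x_2^2} & \cdots & \dfrac{1}{x_2 x_n} \\
\vdots & & \ddots & \vdots \\
\dfrac{1}{x_n x_1} & \dfrac{1}{x_n x_2} & \cdots & \dfrac{1-n}{x_n^2}
\end{bmatrix}$$

for all $x = (x_1, \ldots, x_n) \in X$. From this, direct computation shows that for all $z = (z_1, \ldots, z_n) \in \Re^n$ and $x = (x_1, \ldots, x_n) \in X$, we have

$$z' \nabla^2 f_1(x) z = \frac{f_1(x)}{n^2} \left( \Bigl( \sum_{i=1}^n \frac{z_i}{x_i} \Bigr)^2 - n \sum_{i=1}^n \Bigl( \frac{z_i}{x_i} \Bigr)^2 \right).$$

Note that this quadratic form is nonnegative for all $z \in \Re^n$ and $x \in X$, since $f_1(x) < 0$, and for any real numbers $\alpha_1, \ldots, \alpha_n$, we have

$$(\alpha_1 + \cdots + \alpha_n)^2 \le n(\alpha_1^2 + \cdots + \alpha_n^2),$$

in view of the fact that $2\alpha_j \alpha_k \le \alpha_j^2 + \alpha_k^2$. Hence, $\nabla^2 f_1(x)$ is positive semidefinite for all $x \in X$, and it follows from Prop. 1.2.6(a) that $f_1$ is convex.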
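(b) We show that the Hessian of $f_2$ is positive semidefinite at all $x \in \Re^n$. Let $\beta(x) = e^{x_1} + \cdots + e^{x_n}$. Then a straightforward calculation yields

$$z' \nabla^2 f_2(x) z = \frac{1}{2\beta(x)^2} \sum_{i=1}^n \sum_{j=1}^n e^{(x_i + x_j)} (z_i - z_j)^2 \ge 0, \qquad \forall\ z \in \Re^n,$$

where the factor $1/2$ accounts for each unordered pair $\{i, j\}$ appearing twice in the double sum. Hence by Prop. 1.2.6, $f_2$ is convex.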
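As a sanity check on the quadratic form in part (b), the sketch below compares two expressions at random points. It assumes the standard Hessian of $\ln(e^{x_1} + \cdots + e^{x_n})$, namely $\mathrm{diag}(p) - pp'$ with $p_i = e^{x_i}/\beta(x)$, and compares it with the pairwise double sum normalized by $2\beta(x)^2$ (each unordered pair is counted twice in a sum over all $i, j$). The function names are illustrative.

```python
import math
import random


def hessian_quadratic_form(x, z):
    """z' H z with H = diag(p) - p p', p_i = exp(x_i) / beta(x)."""
    beta = sum(math.exp(xi) for xi in x)
    p = [math.exp(xi) / beta for xi in x]
    mean = sum(pi * zi for pi, zi in zip(p, z))
    return sum(pi * zi * zi for pi, zi in zip(p, z)) - mean * mean


def pairwise_form(x, z):
    """Double sum over all ordered pairs (i, j), divided by 2 * beta(x)^2."""
    beta = sum(math.exp(xi) for xi in x)
    n = len(x)
    total = sum(
        math.exp(x[i] + x[j]) * (z[i] - z[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (2.0 * beta ** 2)


random.seed(0)
for _ in range(5):
    x = [random.uniform(-2, 2) for _ in range(4)]
    z = [random.uniform(-2, 2) for _ in range(4)]
    print(hessian_quadratic_form(x, z), pairwise_form(x, z))
```

The two columns agree up to rounding and are nonnegative, which is the positive semidefiniteness used in the proof.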


 
(c) The function $f_3(x) = \|x\|^p$ can be viewed as a composition $g\bigl(f(x)\bigr)$ of the scalar function $g(t) = t^p$ with $p \ge 1$ and the function $f(x) = \|x\|$. In this case, $g$ is convex and monotonically increasing over the nonnegative axis, the set of values that $f$ can take, while $f$ is convex over $\Re^n$ (since any vector norm is convex, see the discussion preceding Prop. 1.2.4). Using Exercise 1.4, it follows that the function $f_3(x) = \|x\|^p$ is convex over $\Re^n$.

(d) The function $f_4(x) = \frac{1}{f(x)}$ can be viewed as a composition $g\bigl(h(x)\bigr)$ of the function $g(t) = -\frac{1}{t}$ for $t < 0$ and the function $h(x) = -f(x)$ for $x \in \Re^n$. In this case, $g$ is convex and monotonically increasing in the set $\{t \mid t < 0\}$, while $h$ is convex over $\Re^n$. Using Exercise 1.4, it follows that the function $f_4(x) = \frac{1}{f(x)}$ is convex over $\Re^n$.

(e) The function $f_5(x) = \alpha f(x) + \beta$ can be viewed as a composition $g\bigl(f(x)\bigr)$ of the function $g(t) = \alpha t + \beta$, where $t \in \Re$, and the function $f(x)$ for $x \in \Re^n$. In this case, $g$ is convex and monotonically nondecreasing over $\Re$ (since $\alpha \ge 0$), while $f$ is convex over $\Re^n$. Using Exercise 1.4, it follows that the function $f_5(x) = \alpha f(x) + \beta$ is convex over $\Re^n$.

(f) The function $f_6(x) = e^{\beta x'Ax}$ can be viewed as a composition $g\bigl(f(x)\bigr)$ of the function $g(t) = e^{\beta t}$ for $t \in \Re$ and the function $f(x) = x'Ax$ for $x \in \Re^n$. In this case, $g$ is convex and monotonically increasing over $\Re$, while $f$ is convex over $\Re^n$ (since $A$ is positive semidefinite). Using Exercise 1.4, it follows that the function $f_6(x) = e^{\beta x'Ax}$ is convex over $\Re^n$.

(g) This part is straightforward using the definition of a convex function.

1.6 (Ascent/Descent Behavior of a Convex Function)

(a) Let $x_1, x_2, x_3$ be three scalars such that $x_1 < x_2 < x_3$. Then we can write $x_2$ as a convex combination of $x_1$ and $x_3$ as follows

$$x_2 = \frac{x_3 - x_2}{x_3 - x_1}\, x_1 + \frac{x_2 - x_1}{x_3 - x_1}\, x_3,$$

so that by convexity of $f$, we obtain

$$f(x_2) \le \frac{x_3 - x_2}{x_3 - x_1}\, f(x_1) + \frac{x_2 - x_1}{x_3 - x_1}\, f(x_3).$$

This relation and the fact

$$f(x_2) = \frac{x_3 - x_2}{x_3 - x_1}\, f(x_2) + \frac{x_2 - x_1}{x_3 - x_1}\, f(x_2),$$

imply that

$$\frac{x_3 - x_2}{x_3 - x_1} \bigl( f(x_2) - f(x_1) \bigr) \le \frac{x_2 - x_1}{x_3 - x_1} \bigl( f(x_3) - f(x_2) \bigr).$$

By multiplying the preceding relation by $x_3 - x_1$ and dividing it by $(x_3 - x_2)(x_2 - x_1)$, we obtain

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \frac{f(x_3) - f(x_2)}{x_3 - x_2}.$$
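The slope inequality of part (a) is easy to test on sample functions. The sketch below is illustrative only; the helper names and the test points are choices made here, and a small tolerance is assumed to absorb rounding.

```python
import math


def slope(f, a, b):
    """Difference quotient (f(b) - f(a)) / (b - a)."""
    return (f(b) - f(a)) / (b - a)


def check_three_slopes(f, x1, x2, x3, tol=1e-12):
    """Return True if slope over [x1, x2] <= slope over [x2, x3] (up to tol)."""
    assert x1 < x2 < x3
    return slope(f, x1, x2) <= slope(f, x2, x3) + tol


print(check_three_slopes(math.exp, -1.0, 0.5, 2.0))            # convex function
print(check_three_slopes(lambda t: -t * t, -1.0, 0.5, 2.0))    # concave function
```

For the convex exponential the difference quotients are ordered as in part (a); for the concave function $-t^2$ the ordering is reversed, so the check fails.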

(b) Let $\{x_k\}$ be an increasing scalar sequence, i.e., $x_1 < x_2 < x_3 < \cdots$. Then according to part (a), we have for all $k$

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \frac{f(x_3) - f(x_2)}{x_3 - x_2} \le \cdots \le \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}. \tag{1.2}$$

Since $\bigl(f(x_k) - f(x_{k-1})\bigr)/(x_k - x_{k-1})$ is monotonically nondecreasing, we have

$$\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} \to \gamma, \tag{1.3}$$

where $\gamma$ is either a real number or $\infty$. Furthermore,

$$\frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k} \le \gamma, \qquad \forall\ k. \tag{1.4}$$

We now show that $\gamma$ is independent of the sequence $\{x_k\}$. Let $\{y_j\}$ be any increasing scalar sequence. For each $j$, choose $x_{k_j}$ such that $y_j < x_{k_j}$ and $x_{k_1} < x_{k_2} < \cdots < x_{k_j}$, so that we have $y_j < y_{j+1} < x_{k_{j+1}} < x_{k_{j+2}}$. By part (a), it follows that

$$\frac{f(y_{j+1}) - f(y_j)}{y_{j+1} - y_j} \le \frac{f(x_{k_{j+2}}) - f(x_{k_{j+1}})}{x_{k_{j+2}} - x_{k_{j+1}}},$$

and letting $j \to \infty$ yields

$$\lim_{j \to \infty} \frac{f(y_{j+1}) - f(y_j)}{y_{j+1} - y_j} \le \gamma.$$

Similarly, by exchanging the roles of $\{x_k\}$ and $\{y_j\}$, we can show that

$$\lim_{j \to \infty} \frac{f(y_{j+1}) - f(y_j)}{y_{j+1} - y_j} \ge \gamma.$$

Thus the limit in Eq. (1.3) is independent of the choice for $\{x_k\}$, and Eqs. (1.2) and (1.4) hold for any increasing scalar sequence $\{x_k\}$.
We consider separately each of the three possibilities $\gamma < 0$, $\gamma = 0$, and $\gamma > 0$. First, suppose that $\gamma < 0$, and let $\{x_k\}$ be any increasing sequence. By using Eq. (1.4), we obtain

$$\begin{aligned}
f(x_k) &= \sum_{j=1}^{k-1} \frac{f(x_{j+1}) - f(x_j)}{x_{j+1} - x_j}\,(x_{j+1} - x_j) + f(x_1) \\
&\le \sum_{j=1}^{k-1} \gamma (x_{j+1} - x_j) + f(x_1) \\
&= \gamma (x_k - x_1) + f(x_1),
\end{aligned}$$

and since $\gamma < 0$ and $x_k \to \infty$, it follows that $f(x_k) \to -\infty$. To show that $f$ decreases monotonically, pick any $x$ and $y$ with $x < y$, and consider the sequence $x_1 = x$, $x_2 = y$, and $x_k = y + k$ for all $k \ge 3$. By using Eq. (1.4) with $k = 1$, we have

$$\frac{f(y) - f(x)}{y - x} \le \gamma < 0,$$

so that $f(y) - f(x) < 0$. Hence $f$ decreases monotonically to $-\infty$, corresponding to case (1).
Suppose now that $\gamma = 0$, and let $\{x_k\}$ be any increasing sequence. Then, by Eq. (1.4), we have $f(x_{k+1}) - f(x_k) \le 0$ for all $k$. If $f(x_{k+1}) - f(x_k) < 0$ for all $k$, then $f$ decreases monotonically. To show this, pick any $x$ and $y$ with $x < y$, and consider a new sequence given by $y_1 = x$, $y_2 = y$, and $y_k = x_{K+k-3}$ for all $k \ge 3$, where $K$ is large enough so that $y < x_K$. By using Eqs. (1.2) and (1.4) with $\{y_k\}$, we have

$$\frac{f(y) - f(x)}{y - x} \le \frac{f(x_{K+1}) - f(x_K)}{x_{K+1} - x_K} < 0,$$

implying that $f(y) - f(x) < 0$. Hence $f$ decreases monotonically, and it may decrease to $-\infty$ or to a finite value, corresponding to cases (1) or (2), respectively.

If for some $K$ we have $f(x_{K+1}) - f(x_K) = 0$, then by Eqs. (1.2) and (1.4) where $\gamma = 0$, we obtain $f(x_k) = f(x_K)$ for all $k \ge K$. To show that $f$ stays at the value $f(x_K)$ for all $x \ge x_K$, choose any $x$ such that $x > x_K$, and define $\{y_k\}$ as $y_1 = x_K$, $y_2 = x$, and $y_k = x_{N+k-3}$ for all $k \ge 3$, where $N$ is large enough so that $x < x_N$. By using Eqs. (1.2) and (1.4) with $\{y_k\}$, we have

$$\frac{f(x) - f(x_K)}{x - x_K} \le \frac{f(x_N) - f(x)}{x_N - x} \le 0,$$

so that $f(x) \le f(x_K)$ and $f(x_N) \le f(x)$. Since $f(x_K) = f(x_N)$, we have $f(x) = f(x_K)$. Hence $f(x) = f(x_K)$ for all $x \ge x_K$, corresponding to case (3).
Finally, suppose that $\gamma > 0$, and let $\{x_k\}$ be any increasing sequence. Since $\bigl(f(x_k) - f(x_{k-1})\bigr)/(x_k - x_{k-1})$ is nondecreasing and tends to $\gamma$ [cf. Eqs. (1.3) and (1.4)], there is a positive integer $K$ and a positive scalar $\epsilon$ with $\epsilon < \gamma$ such that

$$\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} \ge \epsilon, \qquad \forall\ k \ge K. \tag{1.5}$$

Therefore, for all $k > K$

$$f(x_k) = \sum_{j=K}^{k-1} \frac{f(x_{j+1}) - f(x_j)}{x_{j+1} - x_j}\,(x_{j+1} - x_j) + f(x_K) \ge \epsilon (x_k - x_K) + f(x_K),$$

implying that $f(x_k) \to \infty$. To show that $f(x)$ increases monotonically to $\infty$ for all $x \ge x_K$, pick any $x < y$ satisfying $x_K < x < y$, and consider a sequence given by $y_1 = x_K$, $y_2 = x$, $y_3 = y$, and $y_k = x_{N+k-4}$ for $k \ge 4$, where $N$ is large enough so that $y < x_N$. By part (a) and Eq. (1.5), the difference quotient over $[x, y]$ is no smaller than the one over $[x_{K-1}, x_K]$, so that

$$\frac{f(y) - f(x)}{y - x} \ge \epsilon.$$

Thus $f(x)$ increases monotonically to $\infty$ for all $x \ge x_K$, corresponding to case (4) with $\bar{x} = x_K$.

1.7 (Characterization of Differentiable Convex Functions)

If $f$ is convex, then by Prop. 1.2.5(a), we have

$$f(y) \ge f(x) + \nabla f(x)'(y - x), \qquad \forall\ x, y \in C.$$

By exchanging the roles of $x$ and $y$ in this relation, we obtain

$$f(x) \ge f(y) + \nabla f(y)'(x - y), \qquad \forall\ x, y \in C,$$

and by adding the preceding two inequalities, it follows that

$$\bigl( \nabla f(y) - \nabla f(x) \bigr)'(y - x) \ge 0. \tag{1.6}$$

Conversely, let Eq. (1.6) hold, and let $x$ and $y$ be two points in $C$. Define the function $h : \Re \to \Re$ by

$$h(t) = f\bigl(x + t(y - x)\bigr).$$

Consider some $t, t' \in [0, 1]$ such that $t < t'$. By convexity of $C$, we have that $x + t(y - x)$ and $x + t'(y - x)$ belong to $C$. Using the chain rule and Eq. (1.6), we have

$$\begin{aligned}
\left( \frac{dh(t')}{dt} - \frac{dh(t)}{dt} \right)(t' - t)
&= \Bigl( \nabla f\bigl(x + t'(y - x)\bigr) - \nabla f\bigl(x + t(y - x)\bigr) \Bigr)'(y - x)(t' - t) \\
&\ge 0.
\end{aligned}$$

Thus, $dh/dt$ is nondecreasing on $[0, 1]$ and for any $t \in (0, 1)$, we have

$$\frac{h(t) - h(0)}{t} = \frac{1}{t} \int_0^t \frac{dh(\tau)}{d\tau}\, d\tau \le \frac{dh(t)}{dt} \le \frac{1}{1 - t} \int_t^1 \frac{dh(\tau)}{d\tau}\, d\tau = \frac{h(1) - h(t)}{1 - t}.$$

Equivalently,

$$t h(1) + (1 - t) h(0) \ge h(t),$$

and from the definition of $h$, we obtain

$$t f(y) + (1 - t) f(x) \ge f\bigl(ty + (1 - t)x\bigr).$$

Since this inequality has been proved for arbitrary $t \in [0, 1]$ and $x, y \in C$, we conclude that $f$ is convex.
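The gradient monotonicity condition, written as $\bigl(\nabla f(x) - \nabla f(y)\bigr)'(x - y) \ge 0$, can be probed numerically. The following sketch uses two hand-picked functions chosen here for illustration: the convex $f(x) = \sum_i x_i^4$ and the concave $f(x) = -\sum_i x_i^2$.

```python
import random


def gradient_gap(grad, x, y):
    """Return (grad f(x) - grad f(y))' (x - y)."""
    gx, gy = grad(x), grad(y)
    return sum((a - b) * (u - v) for a, b, u, v in zip(gx, gy, x, y))


def grad_quartic(x):
    """Gradient of the convex function f(x) = sum_i x_i^4."""
    return [4.0 * t ** 3 for t in x]


def grad_concave(x):
    """Gradient of the concave function f(x) = -sum_i x_i^2."""
    return [-2.0 * t for t in x]


random.seed(0)
x = [random.uniform(-3, 3) for _ in range(3)]
y = [random.uniform(-3, 3) for _ in range(3)]
print(gradient_gap(grad_quartic, x, y))      # nonnegative for the convex function
print(gradient_gap(grad_concave, [1.0], [0.0]))
```

The gap is nonnegative for the convex quartic at every pair of points, while the concave quadratic produces a negative gap, consistent with the characterization.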

1.8 (Characterization of Twice Continuously Differentiable
Convex Functions)

Suppose that $f : \Re^n \to \Re$ is convex over $C$. We first show that for all $x \in \mathrm{ri}(C)$ and $y \in S$, we have $y' \nabla^2 f(x) y \ge 0$. Assume, to arrive at a contradiction, that there exists some $\bar{x} \in \mathrm{ri}(C)$ such that for some $y \in S$, we have

$$y' \nabla^2 f(\bar{x}) y < 0.$$

Without loss of generality, we may assume that $\|y\| = 1$. Using the continuity of $\nabla^2 f$, we see that there is an open ball $B(\bar{x}, \epsilon)$ centered at $\bar{x}$ with radius $\epsilon$ such that $B(\bar{x}, \epsilon) \cap \mathrm{aff}(C) \subset C$ [since $\bar{x} \in \mathrm{ri}(C)$], and

$$y' \nabla^2 f(x) y < 0, \qquad \forall\ x \in B(\bar{x}, \epsilon). \tag{1.7}$$

By Prop. 1.1.13(a), for all positive scalars $\alpha$ with $\alpha < \epsilon$, we have

$$f(\bar{x} + \alpha y) = f(\bar{x}) + \alpha \nabla f(\bar{x})' y + \frac{\alpha^2}{2}\, y' \nabla^2 f(\bar{x} + \bar{\alpha} y) y,$$

for some $\bar{\alpha} \in [0, \alpha]$. Furthermore, $\|(\bar{x} + \bar{\alpha} y) - \bar{x}\| < \epsilon$ [since $\|y\| = 1$ and $\bar{\alpha} < \epsilon$]. Hence, from Eq. (1.7), it follows that

$$f(\bar{x} + \alpha y) < f(\bar{x}) + \alpha \nabla f(\bar{x})' y, \qquad \forall\ \alpha \in (0, \epsilon).$$

On the other hand, by the choice of $\epsilon$ and the assumption that $y \in S$, the vectors $\bar{x} + \alpha y$ are in $C$ for all $\alpha$ with $\alpha \in [0, \epsilon)$, which is a contradiction in view of the convexity of $f$ over $C$. Hence, we have $y' \nabla^2 f(x) y \ge 0$ for all $y \in S$ and all $x \in \mathrm{ri}(C)$.

Next, let $x$ be a point in $C$ that is not in the relative interior of $C$. Then, by the Line Segment Principle, there is a sequence $\{x_k\} \subset \mathrm{ri}(C)$ such that $x_k \to x$. As seen above, $y' \nabla^2 f(x_k) y \ge 0$ for all $y \in S$ and all $k$, which together with the continuity of $\nabla^2 f$ implies that

$$y' \nabla^2 f(x) y = \lim_{k \to \infty} y' \nabla^2 f(x_k) y \ge 0, \qquad \forall\ y \in S.$$

It follows that $y' \nabla^2 f(x) y \ge 0$ for all $x \in C$ and $y \in S$.

Conversely, assume that $y' \nabla^2 f(x) y \ge 0$ for all $x \in C$ and $y \in S$. By Prop. 1.1.13(a), for all $x, z \in C$ we have

$$f(z) = f(x) + (z - x)' \nabla f(x) + \tfrac{1}{2} (z - x)' \nabla^2 f\bigl(x + \alpha(z - x)\bigr)(z - x)$$

for some $\alpha \in [0, 1]$. Since $x, z \in C$, we have that $(z - x) \in S$, and using the convexity of $C$ and our assumption, it follows that

$$f(z) \ge f(x) + (z - x)' \nabla f(x), \qquad \forall\ x, z \in C.$$

From Prop. 1.2.5(a), we conclude that $f$ is convex over $C$.

1.9 (Strong Convexity)

(a) Fix some $x, y \in \Re^n$ such that $x \neq y$, and define the function $h : \Re \to \Re$ by $h(t) = f\bigl(x + t(y - x)\bigr)$. Consider scalars $t$ and $s$ such that $t < s$. Using the chain rule and the equation

$$\bigl( \nabla f(x) - \nabla f(y) \bigr)'(x - y) \ge \alpha \|x - y\|^2, \qquad \forall\ x, y \in \Re^n, \tag{1.8}$$

for some $\alpha > 0$, we have

$$\begin{aligned}
\left( \frac{dh(s)}{dt} - \frac{dh(t)}{dt} \right)(s - t)
&= \Bigl( \nabla f\bigl(x + s(y - x)\bigr) - \nabla f\bigl(x + t(y - x)\bigr) \Bigr)'(y - x)(s - t) \\
&\ge \alpha (s - t)^2 \|x - y\|^2 \\
&> 0.
\end{aligned}$$

Thus, $dh/dt$ is strictly increasing and for any $t \in (0, 1)$, we have

$$\frac{h(t) - h(0)}{t} = \frac{1}{t} \int_0^t \frac{dh(\tau)}{d\tau}\, d\tau < \frac{1}{1 - t} \int_t^1 \frac{dh(\tau)}{d\tau}\, d\tau = \frac{h(1) - h(t)}{1 - t}.$$

Equivalently, $t h(1) + (1 - t) h(0) > h(t)$. The definition of $h$ yields $t f(y) + (1 - t) f(x) > f\bigl(ty + (1 - t)x\bigr)$. Since this inequality has been proved for arbitrary $t \in (0, 1)$ and $x \neq y$, we conclude that $f$ is strictly convex.
(b) Suppose now that $f$ is twice continuously differentiable and Eq. (1.8) holds. Let $c$ be a scalar. We use Prop. 1.1.13(b) twice to obtain

$$f(x + cy) = f(x) + c y' \nabla f(x) + \frac{c^2}{2}\, y' \nabla^2 f(x + tcy) y,$$

and

$$f(x) = f(x + cy) - c y' \nabla f(x + cy) + \frac{c^2}{2}\, y' \nabla^2 f(x + scy) y,$$

for some $t$ and $s$ belonging to $[0, 1]$. Adding these two equations and using Eq. (1.8), we obtain

$$\frac{c^2}{2}\, y' \bigl( \nabla^2 f(x + scy) + \nabla^2 f(x + tcy) \bigr) y = \bigl( \nabla f(x + cy) - \nabla f(x) \bigr)'(cy) \ge \alpha c^2 \|y\|^2.$$

We divide both sides by $c^2$ and then take the limit as $c \to 0$ to conclude that $y' \nabla^2 f(x) y \ge \alpha \|y\|^2$. Since this inequality is valid for every $y \in \Re^n$, it follows that $\nabla^2 f(x) - \alpha I$ is positive semidefinite.

For the converse, assume that $\nabla^2 f(x) - \alpha I$ is positive semidefinite for all $x \in \Re^n$. Consider the function $g : \Re \to \Re$ defined by

$$g(t) = \nabla f\bigl(tx + (1 - t)y\bigr)'(x - y).$$

Using the Mean Value Theorem (Prop. 1.1.12), we have

$$\bigl( \nabla f(x) - \nabla f(y) \bigr)'(x - y) = g(1) - g(0) = \frac{dg(t)}{dt}$$

for some $t \in [0, 1]$. On the other hand,

$$\frac{dg(t)}{dt} = (x - y)' \nabla^2 f\bigl(tx + (1 - t)y\bigr)(x - y) \ge \alpha \|x - y\|^2,$$

where the last inequality holds because $\nabla^2 f\bigl(tx + (1 - t)y\bigr) - \alpha I$ is positive semidefinite. Combining the last two relations, it follows that $f$ is strongly convex with coefficient $\alpha$.
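For a concrete instance of the equivalence in part (b), consider a quadratic with a diagonal Hessian. The sketch below assumes the hypothetical example $f(x) = \tfrac{1}{2}(2x_1^2 + 5x_2^2)$, whose Hessian is $\mathrm{diag}(2, 5)$, so that $\nabla^2 f(x) - \alpha I$ is positive semidefinite for $\alpha = 2$.

```python
import random

Q_DIAG = [2.0, 5.0]      # hypothetical diagonal Hessian, chosen for illustration
ALPHA = min(Q_DIAG)      # smallest eigenvalue: Hessian - ALPHA * I is PSD


def grad(x):
    """Gradient of f(x) = 0.5 * sum_i Q_DIAG[i] * x_i^2."""
    return [q * t for q, t in zip(Q_DIAG, x)]


def gradient_gap(x, y):
    """Return (grad f(x) - grad f(y))' (x - y)."""
    gx, gy = grad(x), grad(y)
    return sum((a - b) * (u - v) for a, b, u, v in zip(gx, gy, x, y))


def squared_distance(x, y):
    """Return ||x - y||^2."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2)]
y = [random.uniform(-1, 1) for _ in range(2)]
print(gradient_gap(x, y), ALPHA * squared_distance(x, y))
```

The first printed number dominates the second, which is the strong convexity inequality (1.8) with coefficient $\alpha = 2$ for this example.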

1.10 (Posynomials)

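(a) Consider the following posynomial, for which we have $n = m = 1$, coefficient $\beta_1 = 1$, and exponent $a_{11} = \frac{1}{2}$:

$$g(y) = y^{\frac{1}{2}}, \qquad \forall\ y > 0.$$

This function is not convex, since its second derivative $-\frac{1}{4} y^{-\frac{3}{2}}$ is negative for all $y > 0$.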


(b) Consider the following change of variables, where we set

$$f(x) = \ln\bigl(g(y_1, \ldots, y_n)\bigr), \qquad b_i = \ln \beta_i, \quad \forall\ i, \qquad x_j = \ln y_j, \quad \forall\ j.$$

With this change of variables, $f(x)$ can be written as

$$f(x) = \ln\left( \sum_{i=1}^m e^{b_i + a_{i1} x_1 + \cdots + a_{in} x_n} \right).$$

Note that $f(x)$ can also be represented as

$$f(x) = \ln\exp(Ax + b), \qquad \forall\ x \in \Re^n,$$

where $\ln\exp(z) = \ln\bigl(e^{z_1} + \cdots + e^{z_m}\bigr)$ for all $z \in \Re^m$, $A$ is an $m \times n$ matrix with entries $a_{ij}$, and $b \in \Re^m$ is a vector with components $b_i$. Let $f_2(z) = \ln(e^{z_1} + \cdots + e^{z_m})$. This function is convex by Exercise 1.5(b). With this identification, $f(x)$ can be viewed as the composition $f(x) = f_2(Ax + b)$, which is convex by Exercise 1.5(g).

(c) Consider the function $g : \Re^n \to \Re$ of the form

$$g(y) = g_1(y)^{\gamma_1} \cdots g_r(y)^{\gamma_r},$$

where $g_k$ is a posynomial and $\gamma_k > 0$ for all $k$. Using a change of variables similar to part (b), we see that we can represent the function $f(x) = \ln g(y)$ as

$$f(x) = \sum_{k=1}^r \gamma_k \ln\exp(A_k x + b_k),$$

with the matrix $A_k$ and the vector $b_k$ being associated with the posynomial $g_k$ for each $k$. Since $f(x)$ is a linear combination of convex functions with nonnegative coefficients [part (b)], it follows from Prop. 1.2.4(a) that $f(x)$ is convex.
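Both halves of this exercise can be illustrated with a midpoint test. The sketch below first shows that $y^{1/2}$ violates midpoint convexity, and then checks midpoint convexity of the log-transformed function for a hypothetical two-term posynomial in two variables; the coefficients and exponents are arbitrary choices made here, not taken from the exercise.

```python
import math
import random


def sqrt_posynomial(y):
    """The posynomial y^(1/2) from part (a)."""
    return y ** 0.5


def posynomial(y1, y2):
    """Hypothetical posynomial: 2 * y1^0.5 * y2^-1 + 3 * y1^2 * y2^1.5."""
    return 2.0 * y1 ** 0.5 * y2 ** -1.0 + 3.0 * y1 ** 2.0 * y2 ** 1.5


def log_transformed(x1, x2):
    """f(x) = ln g(y) with y_j = exp(x_j), as in part (b)."""
    return math.log(posynomial(math.exp(x1), math.exp(x2)))


# Part (a): the midpoint value exceeds the average of the endpoint values.
a, b = 1.0, 9.0
print(sqrt_posynomial((a + b) / 2), (sqrt_posynomial(a) + sqrt_posynomial(b)) / 2)

# Part (b): the midpoint value never exceeds the average for the transformed function.
random.seed(0)
u = [random.uniform(-2, 2) for _ in range(2)]
v = [random.uniform(-2, 2) for _ in range(2)]
mid = [(s + t) / 2 for s, t in zip(u, v)]
print(log_transformed(*mid), (log_transformed(*u) + log_transformed(*v)) / 2)
```

A midpoint test is only a necessary condition for convexity, so the second check supports the result of part (b) without replacing its proof.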

1.11 (Arithmetic-Geometric Mean Inequality)

Consider the function $f(x) = -\ln(x)$. Since $\nabla^2 f(x) = 1/x^2 > 0$ for all $x > 0$, the function $-\ln(x)$ is strictly convex over $(0, \infty)$. Therefore, for all positive scalars $x_1, \ldots, x_n \in (0, \infty)$ and $\alpha_1, \ldots, \alpha_n$ with $\sum_{i=1}^n \alpha_i = 1$, we have

$$-\ln(\alpha_1 x_1 + \cdots + \alpha_n x_n) \le -\alpha_1 \ln(x_1) - \cdots - \alpha_n \ln(x_n),$$

which is equivalent to

$$e^{\ln(\alpha_1 x_1 + \cdots + \alpha_n x_n)} \ge e^{\alpha_1 \ln(x_1) + \cdots + \alpha_n \ln(x_n)} = e^{\alpha_1 \ln(x_1)} \cdots e^{\alpha_n \ln(x_n)},$$

or

$$\alpha_1 x_1 + \cdots + \alpha_n x_n \ge x_1^{\alpha_1} \cdots x_n^{\alpha_n},$$

as desired. Since $-\ln(x)$ is strictly convex, the above inequality is satisfied with equality if and only if $x_1, \ldots, x_n$ are all equal.
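A quick numerical check of the weighted arithmetic-geometric mean inequality follows. The weights and data are arbitrary illustrative values, and the function names are chosen here.

```python
import math


def weighted_arithmetic_mean(alpha, x):
    """Return alpha_1 x_1 + ... + alpha_n x_n."""
    return sum(a * xi for a, xi in zip(alpha, x))


def weighted_geometric_mean(alpha, x):
    """Return x_1^alpha_1 * ... * x_n^alpha_n, computed through logarithms."""
    return math.exp(sum(a * math.log(xi) for a, xi in zip(alpha, x)))


alpha = [0.2, 0.3, 0.5]          # positive weights summing to 1
x = [1.0, 4.0, 9.0]              # positive scalars, not all equal

print(weighted_arithmetic_mean(alpha, x), weighted_geometric_mean(alpha, x))

x_equal = [3.0, 3.0, 3.0]        # equality case: all entries equal
print(weighted_arithmetic_mean(alpha, x_equal), weighted_geometric_mean(alpha, x_equal))
```

In the first case the arithmetic mean is strictly larger; in the second the two means coincide up to rounding, matching the equality condition.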

1.12 (Young and Holder Inequalities)

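According to Exercise 1.11, we have

$$u^{\frac{1}{p}} v^{\frac{1}{q}} \le \frac{u}{p} + \frac{v}{q}, \qquad \forall\ u > 0, \quad \forall\ v > 0,$$

where $1/p + 1/q = 1$, $p > 0$, and $q > 0$. The above relation also holds if $u = 0$ or $v = 0$. By setting $u = x^p$ and $v = y^q$, we obtain Young's inequality

$$xy \le \frac{x^p}{p} + \frac{y^q}{q}, \qquad \forall\ x \ge 0, \quad \forall\ y \ge 0.$$

To show Holder's inequality, note that it holds if $x_1 = \cdots = x_n = 0$ or $y_1 = \cdots = y_n = 0$. If $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are such that $(x_1, \ldots, x_n) \neq 0$ and $(y_1, \ldots, y_n) \neq 0$, then by using

$$x = \frac{|x_i|}{\left( \sum_{j=1}^n |x_j|^p \right)^{1/p}} \qquad \text{and} \qquad y = \frac{|y_i|}{\left( \sum_{j=1}^n |y_j|^q \right)^{1/q}}$$

in Young's inequality, we have for all $i = 1, \ldots, n$,

$$\frac{|x_i|}{\left( \sum_{j=1}^n |x_j|^p \right)^{1/p}} \cdot \frac{|y_i|}{\left( \sum_{j=1}^n |y_j|^q \right)^{1/q}} \le \frac{|x_i|^p}{p \sum_{j=1}^n |x_j|^p} + \frac{|y_i|^q}{q \sum_{j=1}^n |y_j|^q}.$$

By adding these inequalities over $i = 1, \ldots, n$, we obtain

$$\frac{\sum_{i=1}^n |x_i| \cdot |y_i|}{\left( \sum_{j=1}^n |x_j|^p \right)^{1/p} \left( \sum_{j=1}^n |y_j|^q \right)^{1/q}} \le \frac{1}{p} + \frac{1}{q} = 1,$$

which implies Holder's inequality.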
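Holder's inequality, $\sum_i |x_i y_i| \le \|x\|_p \|y\|_q$ with $1/p + 1/q = 1$, can be checked on sample vectors. This is a minimal sketch; the helper names and the choice $p = 3$ are illustrative.

```python
def p_norm(v, p):
    """Return (sum_j |v_j|^p)^(1/p)."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def holder_gap(x, y, p):
    """Return ||x||_p * ||y||_q - sum_i |x_i y_i|, where 1/p + 1/q = 1 and p > 1."""
    q = p / (p - 1.0)
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    return p_norm(x, p) * p_norm(y, q) - lhs


x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, -1.0]
print(holder_gap(x, y, 3.0))      # nonnegative gap for p = 3, q = 1.5
print(holder_gap(x, x, 2.0))      # p = q = 2 with y = x: gap is zero up to rounding
```

The gap is nonnegative in general and vanishes (up to rounding) in the Cauchy-Schwarz case with proportional vectors.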

1.13

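Let $(x, w)$ and $(y, v)$ be two vectors in $\mathrm{epi}(f)$. Then $f(x) \le w$ and $f(y) \le v$, implying that there exist sequences $\{(x, w_k)\} \subset C$ and $\{(y, v_k)\} \subset C$ such that for all $k$,

$$w_k \le w + \frac{1}{k}, \qquad v_k \le v + \frac{1}{k}.$$

By the convexity of $C$, we have for all $\alpha \in [0, 1]$ and all $k$,

$$\bigl( \alpha x + (1 - \alpha)y,\ \alpha w_k + (1 - \alpha)v_k \bigr) \in C,$$

so that for all $k$,

$$f\bigl(\alpha x + (1 - \alpha)y\bigr) \le \alpha w_k + (1 - \alpha)v_k \le \alpha w + (1 - \alpha)v + \frac{1}{k}.$$

Taking the limit as $k \to \infty$, we obtain

$$f\bigl(\alpha x + (1 - \alpha)y\bigr) \le \alpha w + (1 - \alpha)v,$$

so that $\alpha(x, w) + (1 - \alpha)(y, v) \in \mathrm{epi}(f)$. Hence, $\mathrm{epi}(f)$ is convex, implying that $f$ is convex.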

1.14

The elements of $X$ belong to $\mathrm{conv}(X)$, so all their convex combinations belong to $\mathrm{conv}(X)$ since $\mathrm{conv}(X)$ is a convex set. On the other hand, consider any two convex combinations of elements of $X$, $x = \lambda_1 x_1 + \cdots + \lambda_m x_m$ and $y = \mu_1 y_1 + \cdots + \mu_r y_r$, where $x_i \in X$ and $y_j \in X$. The vector

$$(1 - \alpha)x + \alpha y = (1 - \alpha)(\lambda_1 x_1 + \cdots + \lambda_m x_m) + \alpha(\mu_1 y_1 + \cdots + \mu_r y_r),$$

where $0 \le \alpha \le 1$, is another convex combination of elements of $X$. Thus, the set of convex combinations of elements of $X$ is itself a convex set. It contains $X$, and is contained in $\mathrm{conv}(X)$, so it must coincide with $\mathrm{conv}(X)$.

1.15

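Let $y \in \mathrm{cone}(C)$. If $y = 0$, then $y \in \cup_{x \in C} \{\gamma x \mid \gamma \ge 0\}$ and we are done. If $y \neq 0$, then by definition of $\mathrm{cone}(C)$, we have

$$y = \sum_{i=1}^m \lambda_i x_i,$$

for some positive integer $m$, nonnegative scalars $\lambda_i$, and vectors $x_i \in C$. Since $y \neq 0$, we cannot have all $\lambda_i$ equal to zero, implying that $\sum_{i=1}^m \lambda_i > 0$. Because $x_i \in C$ for all $i$ and $C$ is convex, the vector

$$x = \sum_{i=1}^m \frac{\lambda_i}{\sum_{j=1}^m \lambda_j}\, x_i$$

belongs to $C$. For this vector, we have

$$y = \left( \sum_{i=1}^m \lambda_i \right) x,$$

with $\sum_{i=1}^m \lambda_i > 0$, implying that $y \in \cup_{x \in C} \{\gamma x \mid \gamma \ge 0\}$ and showing that

$$\mathrm{cone}(C) \subset \cup_{x \in C} \{\gamma x \mid \gamma \ge 0\}.$$

The reverse inclusion follows directly from the definition of $\mathrm{cone}(C)$.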

1.16 (Convex Cones)

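(a) Let $x \in C$ and let $\lambda$ be a positive scalar. Then

$$a_i'(\lambda x) = \lambda a_i' x \le 0, \qquad \forall\ i \in I,$$

showing that $\lambda x \in C$ and that $C$ is a cone. Let $x, y \in C$ and let $\lambda \in [0, 1]$. Then

$$a_i'\bigl(\lambda x + (1 - \lambda)y\bigr) = \lambda a_i' x + (1 - \lambda) a_i' y \le 0, \qquad \forall\ i \in I,$$

showing that $\lambda x + (1 - \lambda)y \in C$ and that $C$ is convex. Let a sequence $\{x_k\} \subset C$ converge to some $\bar{x} \in \Re^n$. Then

$$a_i' \bar{x} = \lim_{k \to \infty} a_i' x_k \le 0, \qquad \forall\ i \in I,$$

showing that $\bar{x} \in C$ and that $C$ is closed.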


(b) Let $C$ be a cone such that $C + C \subset C$, and let $x, y \in C$ and $\alpha \in [0, 1]$. Then since $C$ is a cone, $\alpha x \in C$ and $(1 - \alpha)y \in C$, so that $\alpha x + (1 - \alpha)y \in C + C \subset C$, showing that $C$ is convex. Conversely, let $C$ be a convex cone and let $x, y \in C$. Then, since $C$ is a cone, $2x \in C$ and $2y \in C$, so that by the convexity of $C$, $x + y = \frac{1}{2}(2x + 2y) \in C$, showing that $C + C \subset C$.
(c) First we prove that $C_1 + C_2 \subset \mathrm{conv}(C_1 \cup C_2)$. Choose any $x \in C_1 + C_2$. Since $C_1 + C_2$ is a cone [see Exercise 1.2(c)], the vector $2x$ is in $C_1 + C_2$, so that $2x = x_1 + x_2$ for some $x_1 \in C_1$ and $x_2 \in C_2$. Therefore,

$$x = \tfrac{1}{2} x_1 + \tfrac{1}{2} x_2,$$

showing that $x \in \mathrm{conv}(C_1 \cup C_2)$.

Next, we show that $\mathrm{conv}(C_1 \cup C_2) \subset C_1 + C_2$. Since $0 \in C_1$ and $0 \in C_2$, it follows that

$$C_i = C_i + 0 \subset C_1 + C_2, \qquad i = 1, 2,$$

implying that

$$C_1 \cup C_2 \subset C_1 + C_2.$$

By taking the convex hull of both sides in the above inclusion and by using the convexity of $C_1 + C_2$, we obtain

$$\mathrm{conv}(C_1 \cup C_2) \subset \mathrm{conv}(C_1 + C_2) = C_1 + C_2.$$

We finally show that

$$C_1 \cap C_2 = \bigcup_{\alpha \in [0, 1]} \bigl( \alpha C_1 \cap (1 - \alpha) C_2 \bigr).$$

We claim that for all $\alpha$ with $0 < \alpha < 1$, we have

$$\alpha C_1 \cap (1 - \alpha) C_2 = C_1 \cap C_2.$$

Indeed, if $x \in C_1 \cap C_2$, it follows that $x \in C_1$ and $x \in C_2$. Since $C_1$ and $C_2$ are cones and $0 < \alpha < 1$, we have $x \in \alpha C_1$ and $x \in (1 - \alpha) C_2$. Conversely, if $x \in \alpha C_1 \cap (1 - \alpha) C_2$, we have

$$\frac{x}{\alpha} \in C_1,$$

and

$$\frac{x}{1 - \alpha} \in C_2.$$

Since $C_1$ and $C_2$ are cones, it follows that $x \in C_1$ and $x \in C_2$, so that $x \in C_1 \cap C_2$. If $\alpha = 0$ or $\alpha = 1$, we obtain

$$\alpha C_1 \cap (1 - \alpha) C_2 = \{0\} \subset C_1 \cap C_2,$$

since $C_1$ and $C_2$ contain the origin. Thus, the result follows.

1.17

By Exercise 1.14, $C$ is the set of all convex combinations $x = \alpha_1 y_1 + \cdots + \alpha_m y_m$, where $m$ is a positive integer, and the vectors $y_1, \ldots, y_m$ belong to the union of the sets $C_i$. Actually, we can get $C$ just by taking those combinations in which the vectors are taken from different sets $C_i$. Indeed, if two of the vectors, $y_1$ and $y_2$, belong to the same $C_i$, then the term $\alpha_1 y_1 + \alpha_2 y_2$ can be replaced by $\alpha y$, where $\alpha = \alpha_1 + \alpha_2$ and

$$y = (\alpha_1/\alpha) y_1 + (\alpha_2/\alpha) y_2 \in C_i.$$

Thus, $C$ is the union of the vector sums of the form

$$\alpha_1 C_{i_1} + \cdots + \alpha_m C_{i_m},$$

with

$$\alpha_i \ge 0, \quad \forall\ i = 1, \ldots, m, \qquad \sum_{i=1}^m \alpha_i = 1,$$

and the indices $i_1, \ldots, i_m$ are all different, proving our claim.

1.18 (Convex Hulls, Affine Hulls, and Generated Cones)

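(a) We first show that $X$ and $\mathrm{cl}(X)$ have the same affine hull. Since $X \subset \mathrm{cl}(X)$, there holds

$$\mathrm{aff}(X) \subset \mathrm{aff}\bigl(\mathrm{cl}(X)\bigr).$$

Conversely, because $X \subset \mathrm{aff}(X)$ and $\mathrm{aff}(X)$ is closed, we have $\mathrm{cl}(X) \subset \mathrm{aff}(X)$, implying that

$$\mathrm{aff}\bigl(\mathrm{cl}(X)\bigr) \subset \mathrm{aff}(X).$$

We now show that $X$ and $\mathrm{conv}(X)$ have the same affine hull. By using a translation argument if necessary, we assume without loss of generality that $X$ contains the origin, so that both $\mathrm{aff}(X)$ and $\mathrm{aff}\bigl(\mathrm{conv}(X)\bigr)$ are subspaces. Since $X \subset \mathrm{conv}(X)$, evidently $\mathrm{aff}(X) \subset \mathrm{aff}\bigl(\mathrm{conv}(X)\bigr)$. To show the reverse inclusion, let the dimension of $\mathrm{aff}\bigl(\mathrm{conv}(X)\bigr)$ be $m$, and let $x_1, \ldots, x_m$ be linearly independent vectors in $\mathrm{conv}(X)$ that span $\mathrm{aff}\bigl(\mathrm{conv}(X)\bigr)$. Then every $x \in \mathrm{aff}\bigl(\mathrm{conv}(X)\bigr)$ is a linear combination of the vectors $x_1, \ldots, x_m$, i.e., there exist scalars $\beta_1, \ldots, \beta_m$ such that

$$x = \sum_{i=1}^m \beta_i x_i.$$

By the definition of convex hull, each $x_i$ is a convex combination of vectors in $X$, so that $x$ is a linear combination of vectors in $X$, implying that $x \in \mathrm{aff}(X)$. Hence, $\mathrm{aff}\bigl(\mathrm{conv}(X)\bigr) \subset \mathrm{aff}(X)$.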
 
(b) Since $X \subset \mathrm{conv}(X)$, clearly $\mathrm{cone}(X) \subset \mathrm{cone}\bigl(\mathrm{conv}(X)\bigr)$. Conversely, let $x \in \mathrm{cone}\bigl(\mathrm{conv}(X)\bigr)$. Then $x$ is a nonnegative combination of some vectors in $\mathrm{conv}(X)$, i.e., for some positive integer $p$, vectors $x_1, \ldots, x_p \in \mathrm{conv}(X)$, and nonnegative scalars $\alpha_1, \ldots, \alpha_p$, we have

$$x = \sum_{i=1}^p \alpha_i x_i.$$

Each $x_i$ is a convex combination of some vectors in $X$, so that $x$ is a nonnegative combination of some vectors in $X$, implying that $x \in \mathrm{cone}(X)$. Hence $\mathrm{cone}\bigl(\mathrm{conv}(X)\bigr) \subset \mathrm{cone}(X)$.
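(c) Since $\mathrm{conv}(X)$ is the set of all convex combinations of vectors in $X$, and $\mathrm{cone}(X)$ is the set of all nonnegative combinations of vectors in $X$, it follows that $\mathrm{conv}(X) \subset \mathrm{cone}(X)$. Therefore

$$\mathrm{aff}\bigl(\mathrm{conv}(X)\bigr) \subset \mathrm{aff}\bigl(\mathrm{cone}(X)\bigr).$$

As an example showing that the above inclusion can be strict, consider the set $X = \{(1, 1)\}$ in $\Re^2$. Then $\mathrm{conv}(X) = X$, so that

$$\mathrm{aff}\bigl(\mathrm{conv}(X)\bigr) = X = \{(1, 1)\},$$

and the dimension of $\mathrm{conv}(X)$ is zero. On the other hand, $\mathrm{cone}(X) = \{(\alpha, \alpha) \mid \alpha \ge 0\}$, so that

$$\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr) = \{(x_1, x_2) \mid x_1 = x_2\},$$

and the dimension of $\mathrm{cone}(X)$ is one.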
(d) In view of parts (a) and (c), it suffices to show that

$$\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr) \subset \mathrm{aff}\bigl(\mathrm{conv}(X)\bigr) = \mathrm{aff}(X).$$

It is always true that $0 \in \mathrm{cone}(X)$, so $\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr)$ is a subspace. Let the dimension of $\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr)$ be $m$, and let $x_1, \ldots, x_m$ be linearly independent vectors in $\mathrm{cone}(X)$ that span $\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr)$. Since every vector in $\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr)$ is a linear combination of $x_1, \ldots, x_m$, and since each $x_i$ is a nonnegative combination of some vectors in $X$, it follows that every vector in $\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr)$ is a linear combination of some vectors in $X$. In view of the assumption that $0 \in \mathrm{conv}(X)$, the affine hull of $\mathrm{conv}(X)$ is a subspace, which implies by part (a) that the affine hull of $X$ is a subspace. Hence, every vector in $\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr)$ belongs to $\mathrm{aff}(X)$, showing that $\mathrm{aff}\bigl(\mathrm{cone}(X)\bigr) \subset \mathrm{aff}(X)$.

1.19

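By definition, $f(x)$ is the infimum of the values of $w$ such that $(x, w) \in C$, where $C$ is the convex hull of the union of the nonempty convex sets $\mathrm{epi}(f_i)$. By Exercise 1.17, $(x, w) \in C$ if and only if $(x, w)$ can be expressed as a convex combination of the form

$$(x, w) = \sum_{i \in \bar{I}} \alpha_i (x_i, w_i) = \left( \sum_{i \in \bar{I}} \alpha_i x_i,\ \sum_{i \in \bar{I}} \alpha_i w_i \right),$$

where $\bar{I} \subset I$ is a finite set and $(x_i, w_i) \in \mathrm{epi}(f_i)$ for all $i \in \bar{I}$. Thus, $f(x)$ can be expressed as

$$f(x) = \inf \left\{ \sum_{i \in \bar{I}} \alpha_i w_i \ \middle|\ (x, w) = \sum_{i \in \bar{I}} \alpha_i (x_i, w_i),\ (x_i, w_i) \in \mathrm{epi}(f_i),\ \alpha_i \ge 0,\ \forall\ i \in \bar{I},\ \sum_{i \in \bar{I}} \alpha_i = 1 \right\}.$$

Since the set $\bigl\{ \bigl(x_i, f_i(x_i)\bigr) \mid x_i \in \Re^n \bigr\}$ is contained in $\mathrm{epi}(f_i)$, we obtain

$$f(x) \le \inf \left\{ \sum_{i \in \bar{I}} \alpha_i f_i(x_i) \ \middle|\ x = \sum_{i \in \bar{I}} \alpha_i x_i,\ x_i \in \Re^n,\ \alpha_i \ge 0,\ \forall\ i \in \bar{I},\ \sum_{i \in \bar{I}} \alpha_i = 1 \right\}.$$

On the other hand, by the definition of $\mathrm{epi}(f_i)$, for each $(x_i, w_i) \in \mathrm{epi}(f_i)$ we have $w_i \ge f_i(x_i)$, implying that

$$f(x) \ge \inf \left\{ \sum_{i \in \bar{I}} \alpha_i f_i(x_i) \ \middle|\ x = \sum_{i \in \bar{I}} \alpha_i x_i,\ x_i \in \Re^n,\ \alpha_i \ge 0,\ \forall\ i \in \bar{I},\ \sum_{i \in \bar{I}} \alpha_i = 1 \right\}.$$

By combining the last two relations, we obtain

$$f(x) = \inf \left\{ \sum_{i \in \bar{I}} \alpha_i f_i(x_i) \ \middle|\ x = \sum_{i \in \bar{I}} \alpha_i x_i,\ x_i \in \Re^n,\ \alpha_i \ge 0,\ \forall\ i \in \bar{I},\ \sum_{i \in \bar{I}} \alpha_i = 1 \right\},$$

where the infimum is taken over all representations of $x$ as a convex combination of elements $x_i$ such that only finitely many coefficients $\alpha_i$ are nonzero.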

1.20 (Convexification of Nonconvex Functions)


 
(a) Since conv epi(f ) is a convex set, it follows from Exercise 1.13 that F is con-
 
vex over conv(X). By Caratheodory’s Theorem, it can be seen that conv epi(f )
is the set of all convex combinations of elements of epi(f ), so that

  

F (x) = inf αi wi  (x, w) = αi (xi , wi ),
i i


(xi , wi ) ∈ epi(f ), αi ≥ 0, αi = 1 ,
i

where the infimum is taken over all representations of x as a convex combination
of elements of X. Since the set {(z, f(z)) | z ∈ X} is contained in epi(f), we
obtain

$$
F(x) \le \inf \Big\{ \sum_i \alpha_i f(x_i) \ \Big|\ x = \sum_i \alpha_i x_i,\ x_i \in X,\ \alpha_i \ge 0,\ \sum_i \alpha_i = 1 \Big\}.
$$

On the other hand, by the definition of epi(f ), for each (xi , wi ) ∈ epi(f ) we have
wi ≥ f (xi ), implying that

  

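$$
\begin{aligned}
F(x) &\ge \inf \Big\{ \sum_i \alpha_i f(x_i) \ \Big|\ (x, w) = \sum_i \alpha_i (x_i, w_i),\ (x_i, w_i) \in \mathrm{epi}(f),\ \alpha_i \ge 0,\ \sum_i \alpha_i = 1 \Big\} \\
&= \inf \Big\{ \sum_i \alpha_i f(x_i) \ \Big|\ x = \sum_i \alpha_i x_i,\ x_i \in X,\ \alpha_i \ge 0,\ \sum_i \alpha_i = 1 \Big\},
\end{aligned}
$$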

which combined with the preceding inequality implies the desired relation.
(b) By using part (a), we have for every x ∈ X

F (x) ≤ f (x),


since f(x) corresponds to the value of the function Σ_i α_i f(x_i) for a particular
representation of x as a finite convex combination of elements of X, namely
x = 1 · x. Therefore, we have

$$
\inf_{x \in X} F(x) \le \inf_{x \in X} f(x),
$$

and since X ⊂ conv(X), it follows that

$$
\inf_{x \in \mathrm{conv}(X)} F(x) \le \inf_{x \in X} f(x).
$$

Let f* = inf_{x∈X} f(x). If inf_{x∈conv(X)} F(x) < f*, then there exists z ∈
conv(X) with F(z) < f*. According to part (a), there exist points x_i ∈ X and
nonnegative scalars α_i with Σ_i α_i = 1 such that z = Σ_i α_i x_i and

$$
F(z) \le \sum_i \alpha_i f(x_i) < f^*,
$$

implying that

$$
\sum_i \alpha_i \big( f(x_i) - f^* \big) < 0.
$$

Since each αi is nonnegative, for this inequality to hold, we must have f (xi )−f ∗ <
0 for some i, but this cannot be true because xi ∈ X and f ∗ is the optimal value
of f over X. Therefore

$$
\inf_{x \in \mathrm{conv}(X)} F(x) = \inf_{x \in X} f(x).
$$

(c) If x* ∈ X is a global minimum of f over X, then x* also belongs to conv(X),
and by part (b)

$$
\inf_{x \in \mathrm{conv}(X)} F(x) = \inf_{x \in X} f(x) = f(x^*) \ge F(x^*),
$$

showing that x∗ is also a global minimum of F over conv(X).
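
A small numerical check of parts (a) and (b) can be sketched for a finite set X on the real line. The data below are made up for illustration, and the sketch assumes the standard fact that in one dimension the infimum in part (a) is attained by a combination of at most two points of X, so that only pairs need to be enumerated.

```python
from itertools import combinations


def convexified(points: dict, x: float) -> float:
    """F(x) for a finite X on the real line; `points` maps each x_i in X to f(x_i)."""
    best = float("inf")
    if x in points:
        best = points[x]  # trivial representation x = 1 * x
    for a, b in combinations(sorted(points), 2):
        if a <= x <= b:
            t = (x - a) / (b - a)  # x = (1 - t) * a + t * b
            best = min(best, (1 - t) * points[a] + t * points[b])
    return best


# Hypothetical nonconvex data: X = {0, 1, 2, 3, 4} with these values of f.
f_values = {0.0: 3.0, 1.0: 1.0, 2.0: 4.0, 3.0: 0.5, 4.0: 2.0}
grid = [i / 10 for i in range(41)]  # sample of conv(X) = [0, 4]
min_F = min(convexified(f_values, x) for x in grid)
min_f = min(f_values.values())
```

On this data `min_F` and `min_f` coincide, and F lies below f at every point of X, as parts (a) and (b) predict.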

1.21 (Minimization of Linear Functions)

Let f : X → ℜ be the function f(x) = c'x, and define

$$
F(x) = \inf \big\{ w \mid (x, w) \in \mathrm{conv}\big(\mathrm{epi}(f)\big) \big\},
$$

as in Exercise 1.20. According to this exercise, we have

$$
\inf_{x \in \mathrm{conv}(X)} F(x) = \inf_{x \in X} f(x),
$$

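and

$$
\begin{aligned}
F(x) &= \inf \Big\{ \sum_i \alpha_i c'x_i \ \Big|\ \sum_i \alpha_i x_i = x,\ x_i \in X,\ \sum_i \alpha_i = 1,\ \alpha_i \ge 0 \Big\} \\
&= \inf \Big\{ c' \Big( \sum_i \alpha_i x_i \Big) \ \Big|\ \sum_i \alpha_i x_i = x,\ x_i \in X,\ \sum_i \alpha_i = 1,\ \alpha_i \ge 0 \Big\} \\
&= c'x,
\end{aligned}
$$

showing that

$$
\inf_{x \in \mathrm{conv}(X)} c'x = \inf_{x \in X} c'x.
$$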

According to Exercise 1.20(c), if inf_{x∈X} c'x is attained at some x* ∈ X,
then inf_{x∈conv(X)} c'x is also attained at x*. Suppose now that inf_{x∈conv(X)} c'x is
attained at some x* ∈ conv(X), i.e., there is x* ∈ conv(X) such that

$$
\inf_{x \in \mathrm{conv}(X)} c'x = c'x^*.
$$

Then, by Caratheodory's Theorem, there exist vectors x_1, . . . , x_{n+1} in X and
nonnegative scalars α_1, . . . , α_{n+1} with Σ_{i=1}^{n+1} α_i = 1 such that x* = Σ_{i=1}^{n+1} α_i x_i,
implying that

$$
c'x^* = \sum_{i=1}^{n+1} \alpha_i c'x_i.
$$

Since x_i ∈ X ⊂ conv(X) for all i and c'x ≥ c'x* for all x ∈ conv(X), it follows
that

$$
c'x^* = \sum_{i=1}^{n+1} \alpha_i c'x_i \ge \sum_{i=1}^{n+1} \alpha_i c'x^* = c'x^*,
$$

implying that c'x_i = c'x* for all i corresponding to α_i > 0. Hence, inf_{x∈X} c'x is
attained at the x_i's corresponding to α_i > 0.
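
The equality of the two infima can be checked numerically on a small instance. The sketch below uses three made-up points in the plane and a made-up cost vector c, and samples conv(X) on a grid of convex weights; none of these values come from the exercise.

```python
def dot(c, x):
    # Inner product c'x.
    return sum(ci * xi for ci, xi in zip(c, x))


def min_over_points(c, points):
    # Minimum of c'x over the finite set X.
    return min(dot(c, p) for p in points)


def min_over_hull_grid(c, points, steps=20):
    """Minimum of c'x over a grid of convex combinations of three points."""
    p0, p1, p2 = points
    best = float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            a0 = i / steps
            a1 = j / steps
            a2 = (steps - i - j) / steps  # weights are nonnegative and sum to 1
            x = tuple(a0 * u + a1 * v + a2 * w for u, v, w in zip(p0, p1, p2))
            best = min(best, dot(c, x))
    return best


# Hypothetical data for illustration.
X = [(0.0, 0.0), (4.0, 1.0), (1.0, 3.0)]
c = (1.0, -2.0)
```

With these values the minimum over the sampled hull matches the minimum over X, which is attained at the point (1, 3).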

1.22 (Extension of Caratheodory’s Theorem)

The proof will be an application of Caratheodory's Theorem [part (a)] to the
subset of ℜ^{n+1} given by

$$
Y = \big\{ (x, 1) \mid x \in X_1 \big\} \cup \big\{ (y, 0) \mid y \in X_2 \big\}.
$$

If x ∈ X, then

$$
x = \sum_{i=1}^{k} \gamma_i x_i + \sum_{i=k+1}^{m} \gamma_i y_i,
$$

where the vectors x_1, . . . , x_k belong to X_1, the vectors y_{k+1}, . . . , y_m belong to X_2,
and the scalars γ_1, . . . , γ_m are nonnegative with γ_1 + · · · + γ_k = 1. Equivalently,
(x, 1) ∈ cone(Y). By Caratheodory's Theorem part (a), we have that

$$
(x, 1) = \sum_{i=1}^{k} \alpha_i (x_i, 1) + \sum_{i=k+1}^{m} \alpha_i (y_i, 0),
$$

for some positive scalars α_1, . . . , α_m and vectors

(x_1, 1), . . . , (x_k, 1), (y_{k+1}, 0), . . . , (y_m, 0),

which are linearly independent (implying that m ≤ n + 1) or equivalently,

$$
x = \sum_{i=1}^{k} \alpha_i x_i + \sum_{i=k+1}^{m} \alpha_i y_i, \qquad 1 = \sum_{i=1}^{k} \alpha_i.
$$

Finally, to show that the vectors x_2 − x_1, . . . , x_k − x_1, y_{k+1}, . . . , y_m are linearly
independent, assume, to arrive at a contradiction, that there exist λ_2, . . . , λ_m, not
all 0, such that

$$
\sum_{i=2}^{k} \lambda_i (x_i - x_1) + \sum_{i=k+1}^{m} \lambda_i y_i = 0.
$$

Equivalently, defining λ_1 = −(λ_2 + · · · + λ_k), we have

$$
\sum_{i=1}^{k} \lambda_i (x_i, 1) + \sum_{i=k+1}^{m} \lambda_i (y_i, 0) = 0,
$$

which contradicts the linear independence of the vectors

(x_1, 1), . . . , (x_k, 1), (y_{k+1}, 0), . . . , (y_m, 0).

1.23

The set cl(X) is compact since X is bounded by assumption. Hence, by Prop.
1.3.2, its convex hull, conv(cl(X)), is compact, and it follows that

cl(conv(X)) ⊂ cl(conv(cl(X))) = conv(cl(X)).

It is also true in general that

conv(cl(X)) ⊂ conv(cl(conv(X))) = cl(conv(X)),

since by Prop. 1.2.1(d), the closure of a convex set is convex. Hence, the result
follows.

1.24 (Radon’s Theorem)

Consider the system of n + 1 equations in the m unknowns λ_1, . . . , λ_m

$$
\sum_{i=1}^{m} \lambda_i x_i = 0, \qquad \sum_{i=1}^{m} \lambda_i = 0.
$$

Since m > n + 1, there exists a nonzero solution, call it λ*. Let

I = {i | λ*_i ≥ 0}, J = {j | λ*_j < 0},

and note that I and J are nonempty, and that

$$
\sum_{k \in I} \lambda_k^* = \sum_{k \in J} (-\lambda_k^*) > 0.
$$

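Consider the vector

$$
x^* = \sum_{i \in I} \alpha_i x_i,
$$

where

$$
\alpha_i = \frac{\lambda_i^*}{\sum_{k \in I} \lambda_k^*}, \qquad i \in I.
$$

In view of the equations Σ_{i=1}^{m} λ*_i x_i = 0 and Σ_{i=1}^{m} λ*_i = 0, we also have

$$
x^* = \sum_{j \in J} \alpha_j x_j,
$$

where

$$
\alpha_j = \frac{-\lambda_j^*}{\sum_{k \in J} (-\lambda_k^*)}, \qquad j \in J.
$$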
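It is seen that the α_i and α_j are nonnegative, and that

$$
\sum_{i \in I} \alpha_i = \sum_{j \in J} \alpha_j = 1,
$$

so x* belongs to the intersection

$$
\mathrm{conv}\big( \{ x_i \mid i \in I \} \big) \cap \mathrm{conv}\big( \{ x_j \mid j \in J \} \big).
$$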
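
The construction in this proof is directly computable. The following sketch, with function names chosen here for illustration, finds a nonzero solution λ* of the linear system by exact Gaussian elimination over rationals, splits the indices by the sign of λ*, and returns the common point computed from each side of the partition.

```python
from fractions import Fraction


def null_vector(rows):
    """Return a nonzero v with rows * v = 0; assumes more unknowns than equations."""
    a = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        p = next((i for i in range(r, n_rows) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [v / piv for v in a[r]]
        for i in range(n_rows):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [vi - factor * vr for vi, vr in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    free = next(c for c in range(n_cols) if c not in pivots)
    v = [Fraction(0)] * n_cols
    v[free] = Fraction(1)  # set one free variable to 1, the others to 0
    for row, pc in zip(a, pivots):
        v[pc] = -row[free]
    return v


def radon_partition(points):
    """Return (I, J, x_from_I, x_from_J) for m >= n + 2 points in R^n."""
    n = len(points[0])
    # n equations sum(lam_i * x_i) = 0 plus the equation sum(lam_i) = 0.
    rows = [[p[d] for p in points] for d in range(n)] + [[1] * len(points)]
    lam = null_vector(rows)
    I = [i for i, v in enumerate(lam) if v >= 0]
    J = [j for j, v in enumerate(lam) if v < 0]
    total = sum(lam[i] for i in I)  # positive, equals the sum of -lam[j] over J
    from_I = tuple(sum(lam[i] * points[i][d] for i in I) / total for d in range(n))
    from_J = tuple(sum(-lam[j] * points[j][d] for j in J) / total for d in range(n))
    return I, J, from_I, from_J
```

For the four corners of a square, `radon_partition([(0, 0), (2, 0), (0, 2), (2, 2)])` pairs opposite corners, and both sides of the partition yield the center of the square.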

1.25 (Helly’s Theorem [Hel21])

Let B_j be defined as in the hint, and for each j, let x_j be a vector in B_j. Since
M + 1 ≥ n + 2, we can apply Radon's Theorem to the vectors x_1, . . . , x_{M+1}.
Thus, there exist nonempty and disjoint index subsets I and J such that I ∪ J =
{1, . . . , M + 1}, nonnegative scalars α_1, . . . , α_{M+1}, and a vector x* such that

$$
x^* = \sum_{i \in I} \alpha_i x_i = \sum_{j \in J} \alpha_j x_j, \qquad \sum_{i \in I} \alpha_i = \sum_{j \in J} \alpha_j = 1.
$$

It can be seen that for every i ∈ I, a vector in B_i belongs to the intersection
∩_{j∈J} C_j. Therefore, since x* is a convex combination of vectors in B_i, i ∈ I, x*
also belongs to the intersection ∩_{j∈J} C_j. Similarly, by reversing the role of I and
J, we see that x* belongs to the intersection ∩_{i∈I} C_i. Thus, x* belongs to the
intersection of the entire collection C_1, . . . , C_{M+1}.
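
In dimension n = 1 the convex sets are intervals, and the theorem says that if every two of them intersect, then all of them do. The sketch below checks this on made-up closed intervals; the helper names are chosen here for illustration.

```python
from itertools import combinations


def intersect(intervals):
    """Intersection of closed intervals [a, b], or None if it is empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo <= hi else None


def every_pair_meets(intervals):
    # Hypothesis of Helly's Theorem for n = 1: every n + 1 = 2 sets intersect.
    return all(intersect([s, t]) is not None for s, t in combinations(intervals, 2))


# Hypothetical data for illustration.
meeting = [(0, 5), (2, 7), (4, 9), (3, 4.5)]
not_meeting = [(0, 1), (2, 3), (0, 3)]
```

For `meeting`, every pair intersects and so does the whole family; for `not_meeting`, the hypothesis fails and the common intersection is empty.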

1.26

Assume the contrary, i.e., that for every index set I ⊂ {1, . . . , M}, which contains
no more than n + 1 indices, we have

$$
\inf_{x \in \Re^n} \Big\{ \max_{i \in I} f_i(x) \Big\} < f^*.
$$

This means that for every such I, the intersection ∩_{i∈I} X_i is nonempty, where

$$
X_i = \big\{ x \mid f_i(x) < f^* \big\}.
$$

From Helly's Theorem, it follows that the entire collection {X_i | i = 1, . . . , M}
has nonempty intersection, thereby implying that

$$
\inf_{x \in \Re^n} \Big\{ \max_{i = 1, \ldots, M} f_i(x) \Big\} < f^*.
$$

This contradicts the definition of f ∗ . Note: The result of this exercise relates to
the following question: what is the minimal number of functions fi that we need
to include in the cost function maxi fi (x) in order to attain the optimal value f ∗ ?
According to the result, the number is no more than n + 1. For applications of
this result in structural design and Chebyshev approximation, see Ben Tal and
Nemirovski [BeN01].

1.27

Let x be an arbitrary vector in cl(C). If f(x) = ∞, then we are done, so assume
that f(x) is finite. Let x̄ be a point in the relative interior of C. By the Line
Segment Principle, all the points on the line segment connecting x̄ and x, except
possibly x, belong to ri(C) and therefore, belong to C. From this, the given
property of f, and the convexity of f, we obtain for all α ∈ (0, 1],

$$
\alpha f(\bar x) + (1 - \alpha) f(x) \ge f\big( \alpha \bar x + (1 - \alpha) x \big) \ge \gamma.
$$

By letting α → 0, it follows that f(x) ≥ γ. Hence, f(x) ≥ γ for all x ∈ cl(C).

1.28

From Prop. 1.4.5(b), we have that for any vector a ∈ ℜ^n, ri(C + a) = ri(C) + a.
Therefore, we can assume without loss of generality that 0 ∈ C, and aff(C)
coincides with S. We need to show that

ri(C) = int(C + S ⊥ ) ∩ C.

Let x ∈ ri(C). By definition, this implies that x ∈ C and there exists some
open ball B(x, ε) centered at x with radius ε > 0 such that

B(x, ε) ∩ S ⊂ C. (1.9)

We now show that B(x, ε) ⊂ C + S⊥. Let z be a vector in B(x, ε). Then,
we can express z as z = x + αy for some vector y ∈ ℜ^n with ‖y‖ = 1, and
some α ∈ [0, ε). Since S and S⊥ are orthogonal subspaces, y can be uniquely
decomposed as y = y_S + y_{S⊥}, where y_S ∈ S and y_{S⊥} ∈ S⊥. Since ‖y‖ = 1, this
implies that ‖y_S‖ ≤ 1 (Pythagorean Theorem), and using Eq. (1.9), we obtain

x + αy_S ∈ B(x, ε) ∩ S ⊂ C,

from which it follows that the vector z = x + αy belongs to C + S⊥, implying
that B(x, ε) ⊂ C + S⊥. This shows that x ∈ int(C + S⊥) ∩ C.
Conversely, let x ∈ int(C + S⊥) ∩ C. We have that x ∈ C and there exists
some open ball B(x, ε) centered at x with radius ε > 0 such that B(x, ε) ⊂ C + S⊥.
Since C is a subset of S, it can be seen that (C + S⊥) ∩ S = C. Therefore,

B(x, ε) ∩ S ⊂ C,

implying that x ∈ ri(C).

1.29

(a) Let C be the given convex set. The convex hull of any subset of C is contained
in C. Therefore, the maximum dimension of the various simplices contained in
C is the largest m for which C contains m + 1 vectors x0 , . . . , xm such that
x1 − x0 , . . . , xm − x0 are linearly independent.
Let K = {x_0, . . . , x_m} be such a set with m maximal, and let aff(K) denote
the affine hull of set K. Then, we have dim(aff(K)) = m, and since K ⊂ C, it
follows that aff(K) ⊂ aff(C).
We claim that C ⊂ aff(K). To see this, assume that there exists some
x ∈ C, which does not belong to aff(K). This implies that the set {x, x0 , . . . , xm }
is a set of m + 2 vectors in C such that x − x0 , x1 − x0 , . . . , xm − x0 are linearly
independent, contradicting the maximality of m. Hence, we have C ⊂ aff(K),
and it follows that
aff(K) = aff(C),
thereby implying that dim(C) = m.
(b) We first consider the case where C is n-dimensional with n > 0 and show that
the interior of C is not empty. By part (a), an n-dimensional convex set contains
an n-dimensional simplex. We claim that such a simplex S has a nonempty
interior. Indeed, applying an affine transformation if necessary, we can assume
that the vertices of S are the vectors (0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (0, 0, . . . , 1),
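i.e.,

$$
S = \Big\{ (x_1, \ldots, x_n) \ \Big|\ x_i \ge 0,\ \forall\, i = 1, \ldots, n,\ \sum_{i=1}^{n} x_i \le 1 \Big\}.
$$

The interior of the simplex S,

$$
\mathrm{int}(S) = \Big\{ (x_1, \ldots, x_n) \ \Big|\ x_i > 0,\ \forall\, i = 1, \ldots, n,\ \sum_{i=1}^{n} x_i < 1 \Big\},
$$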

is nonempty, which in turn implies that int(C) is nonempty.
For the case where dim(C) < n, consider the n-dimensional set C + S ⊥ ,
where S ⊥ is the orthogonal complement of the subspace parallel to aff(C). Since
C + S ⊥ is a convex set, it follows from the above argument that int(C + S ⊥ ) is
nonempty. Let x ∈ int(C + S ⊥ ). We can represent x as x = xC + xS ⊥ , where
xC ∈ C and xS ⊥ ∈ S ⊥ . It can be seen that xC ∈ int(C + S ⊥ ). Since

ri(C) = int(C + S ⊥ ) ∩ C,

(cf. Exercise 1.28), it follows that x_C ∈ ri(C), so ri(C) is nonempty.

1.30
 
(a) Let C1 be the segment {(x1, x2) | 0 ≤ x1 ≤ 1, x2 = 0} and let C2 be the box
{(x1, x2) | 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1}. We have

ri(C1) = {(x1, x2) | 0 < x1 < 1, x2 = 0},

ri(C2) = {(x1, x2) | 0 < x1 < 1, 0 < x2 < 1}.

Thus C1 ⊂ C2, while ri(C1) ∩ ri(C2) = Ø.


(b) Let x ∈ ri(C1), and consider an open ball B centered at x such that B ∩
aff(C1 ) ⊂ C1 . Since aff(C1 ) = aff(C2 ) and C1 ⊂ C2 , it follows that B ∩ aff(C2 ) ⊂
C2 , so x ∈ ri(C2 ). Hence ri(C1 ) ⊂ ri(C2 ).
(c) Because C1 ⊂ C2 , we have

ri(C1 ) = ri(C1 ∩ C2 ).

Since ri(C1) ∩ ri(C2) ≠ Ø, there holds

ri(C1 ∩ C2) = ri(C1) ∩ ri(C2)

[Prop. 1.4.5(a)]. Combining the preceding two relations, we obtain ri(C1) ⊂
ri(C2).
(d) Let x2 be in the intersection of C1 and ri(C2 ), and let x1 be in the relative
interior of C1 [ri(C1) is nonempty by Prop. 1.4.1(b)]. If x1 = x2, then we are
done, so assume that x1 ≠ x2. By the Line Segment Principle, all the points
on the line segment connecting x1 and x2 , except possibly x2 , belong to the
relative interior of C1 . Since C1 ⊂ C2 , the vector x1 is in C2 , so that by the
Line Segment Principle, all the points on the line segment connecting x1 and x2 ,
except possibly x1 , belong to the relative interior of C2 . Hence, all the points on
the line segment connecting x1 and x2 , except possibly x1 and x2 , belong to the
intersection ri(C1 ) ∩ ri(C2 ), showing that ri(C1 ) ∩ ri(C2 ) is nonempty.

1.31

(a) Let x ∈ ri(C). We will show that for every x̄ ∈ aff(C), there exists a γ > 1
such that x + (γ − 1)(x − x̄) ∈ C. This is true if x̄ = x, so assume that x̄ ≠ x.
Since x ∈ ri(C), there exists ε > 0 such that

{z | ‖z − x‖ < ε} ∩ aff(C) ⊂ C.

Choose a point x_ε ∈ C, different from x, in the intersection of the ray
{x + α(x̄ − x) | α ≥ 0} and the set {z | ‖z − x‖ < ε} ∩ aff(C). Then, for some
positive scalar α_ε,

x − x_ε = α_ε (x − x̄).

Since x ∈ ri(C) and x_ε ∈ C, by Prop. 1.4.1(c), there is γ_ε > 1 such that

x + (γ_ε − 1)(x − x_ε) ∈ C,

which in view of the preceding relation implies that

x + (γ_ε − 1)α_ε (x − x̄) ∈ C.

The result follows by letting γ = 1 + (γ_ε − 1)α_ε and noting that γ > 1, since
(γ_ε − 1)α_ε > 0. The converse assertion follows from the fact C ⊂ aff(C) and
Prop. 1.4.1(c).
(b) The inclusion cone(C) ⊂ aff(C) always holds if 0 ∈ C. To show the reverse
inclusion, we note that by part (a) with x = 0, for every x̄ ∈ aff(C), there exists
γ > 1 such that x̃ = (γ − 1)(−x̄) ∈ C. By using part (a) again with x = 0, for
x̃ ∈ C ⊂ aff(C), we see that there is γ̃ > 1 such that z = (γ̃ − 1)(−x̃) ∈ C, which
combined with x̃ = (γ − 1)(−x̄) yields z = (γ̃ − 1)(γ − 1)x̄ ∈ C. Hence

$$
\bar x = \frac{1}{(\tilde\gamma - 1)(\gamma - 1)}\, z
$$

with z ∈ C and (γ̃ − 1)(γ − 1) > 0, implying that x̄ ∈ cone(C) and showing that
aff(C) ⊂ cone(C).
(c) This follows by part (b), where C = conv(X), and the fact

cone(conv(X)) = cone(X)

[Exercise 1.18(b)].

1.32

(a) If 0 ∈ C, then 0 ∈ ri(C) since 0 is not on the relative boundary of C.
By Exercise 1.31(b), it follows that cone(C) coincides with aff(C), which is a
closed set. If 0 ∉ C, let y be in the closure of cone(C) and let {y_k} ⊂ cone(C)
be a sequence converging to y. By Exercise 1.15, for every y_k, there exists a
nonnegative scalar α_k and a vector x_k ∈ C such that y_k = α_k x_k. Since {y_k} → y,
the sequence {y_k} is bounded, implying that

$$
\alpha_k \|x_k\| \le \sup_{m \ge 0} \|y_m\| < \infty, \qquad \forall\, k.
$$

We have inf_{m≥0} ‖x_m‖ > 0, since {x_k} ⊂ C and C is a compact set not containing
the origin, so that

$$
0 \le \alpha_k \le \frac{\sup_{m \ge 0} \|y_m\|}{\inf_{m \ge 0} \|x_m\|} < \infty, \qquad \forall\, k.
$$

Thus, the sequence {(αk , xk )} is bounded and has a limit point (α, x) such that
α ≥ 0 and x ∈ C. By taking a subsequence of {(αk , xk )} that converges to (α, x),
and by using the facts yk = αk xk for all k and {yk } → y, we see that y = αx
with α ≥ 0 and x ∈ C. Hence, y ∈ cone(C), showing that cone(C) is closed.
(b) To see that the assertion in part (a) fails when C is unbounded, let C be the
line {(x1, x2) | x1 = 1, x2 ∈ ℜ} in ℜ^2 not passing through the origin. Then,
cone(C) is the nonclosed set {(x1, x2) | x1 > 0, x2 ∈ ℜ} ∪ {(0, 0)}.
To see that the assertion in part (a) fails when C contains the origin on its
relative boundary, let C be the closed ball {(x1, x2) | (x1 − 1)^2 + x2^2 ≤ 1} in ℜ^2.
Then, cone(C) is the nonclosed set {(x1, x2) | x1 > 0, x2 ∈ ℜ} ∪ {(0, 0)} (see
Fig. 1.3.2).
(c) Since C is compact, the convex hull of C is compact (cf. Prop. 1.3.2). Because
conv(C) does not contain the origin on its relative boundary, by part (a), the cone
generated by conv(C) is closed. By Exercise 1.18(b), cone(conv(C)) coincides
with cone(C), implying that cone(C) is closed.
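
The closed-ball example in part (b) can be made concrete with exact rational arithmetic. In the sketch below (helper names are chosen here for illustration), `cone_witness` returns a scalar α ≥ 0 and a point x of the ball with y = αx whenever y lies in cone(C). The points (1/k, 1) all lie in cone(C), yet their limit (0, 1) does not.

```python
from fractions import Fraction


def in_ball(x):
    # Membership in the closed ball C = {(x1 - 1)^2 + x2^2 <= 1}.
    return (x[0] - 1) ** 2 + x[1] ** 2 <= 1


def cone_witness(y):
    """Return (alpha, x) with alpha >= 0, x in C, y = alpha * x, or None if none exists."""
    y1, y2 = Fraction(y[0]), Fraction(y[1])
    if y1 == 0 and y2 == 0:
        return Fraction(0), (Fraction(1), Fraction(0))
    if y1 <= 0:
        # Every nonzero point of C has x1 > 0, so y cannot be alpha * x.
        return None
    t = y1 / (y1 * y1 + y2 * y2)  # scaling that places t * y inside the ball
    return 1 / t, (t * y1, t * y2)
```

The choice t = y1 / ‖y‖^2 is one convenient scaling; any t in (0, 2·y1 / ‖y‖^2] would also place t·y in the ball.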

1.33

(a) By Prop. 1.4.1(b), the relative interior of a convex set is a convex set. We
only need to show that ri(C) is a cone. Let y ∈ ri(C). Then, y ∈ C and since C
is a cone, αy ∈ C for all α > 0. By the Line Segment Principle, all the points on
the line segment connecting y and αy, except possibly αy, belong to ri(C). Since
this is true for every α > 0, it follows that αy ∈ ri(C) for all α > 0, showing that
ri(C) is a cone.
(b) Consider the linear transformation A that maps (α_1, . . . , α_m) ∈ ℜ^m into
Σ_{i=1}^{m} α_i x_i ∈ ℜ^n. Note that C is the image of the nonempty convex set

{(α_1, . . . , α_m) | α_1 ≥ 0, . . . , α_m ≥ 0}

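under the linear transformation A. Therefore, by using Prop. 1.4.3(d), we have

$$
\begin{aligned}
\mathrm{ri}(C) &= \mathrm{ri}\big( A \cdot \{ (\alpha_1, \ldots, \alpha_m) \mid \alpha_1 \ge 0, \ldots, \alpha_m \ge 0 \} \big) \\
&= A \cdot \mathrm{ri}\big( \{ (\alpha_1, \ldots, \alpha_m) \mid \alpha_1 \ge 0, \ldots, \alpha_m \ge 0 \} \big) \\
&= A \cdot \{ (\alpha_1, \ldots, \alpha_m) \mid \alpha_1 > 0, \ldots, \alpha_m > 0 \} \\
&= \Big\{ \sum_{i=1}^{m} \alpha_i x_i \ \Big|\ \alpha_1 > 0, \ldots, \alpha_m > 0 \Big\}.
\end{aligned}
$$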

1.34

Define the sets

$$
D = \Re^n \times C, \qquad S = \big\{ (x, Ax) \mid x \in \Re^n \big\}.
$$

Let T be the linear transformation that maps (x, y) ∈ ℜ^{n+m} into x ∈ ℜ^n. Then
it can be seen that

$$
A^{-1} \cdot C = T \cdot (D \cap S). \tag{1.10}
$$

The relative interior of D is given by ri(D) = ℜ^n × ri(C), and the relative interior
of S is equal to S (since S is a subspace). Hence,

$$
A^{-1} \cdot \mathrm{ri}(C) = T \cdot \big( \mathrm{ri}(D) \cap S \big). \tag{1.11}
$$

In view of the assumption that A^{-1} · ri(C) is nonempty, we have that the
intersection ri(D) ∩ S is nonempty. Therefore, it follows from Props. 1.4.3(d) and
1.4.5(a) that

$$
\mathrm{ri}\big( T \cdot (D \cap S) \big) = T \cdot \big( \mathrm{ri}(D) \cap S \big). \tag{1.12}
$$

Combining Eqs. (1.10)-(1.12), we obtain

$$
\mathrm{ri}(A^{-1} \cdot C) = A^{-1} \cdot \mathrm{ri}(C).
$$

Next, we show the second relation. We have

$$
A^{-1} \cdot \mathrm{cl}(C) = \big\{ x \mid Ax \in \mathrm{cl}(C) \big\} = T \cdot \big\{ (x, Ax) \mid Ax \in \mathrm{cl}(C) \big\} = T \cdot \big( \mathrm{cl}(D) \cap S \big).
$$

Since the intersection ri(D) ∩ S is nonempty, it follows from Prop. 1.4.5(a) that
cl(D) ∩ S = cl(D ∩ S). Furthermore, since T is continuous, we obtain

$$
A^{-1} \cdot \mathrm{cl}(C) = T \cdot \mathrm{cl}(D \cap S) \subset \mathrm{cl}\big( T \cdot (D \cap S) \big),
$$

which combined with Eq. (1.10) yields

$$
A^{-1} \cdot \mathrm{cl}(C) \subset \mathrm{cl}(A^{-1} \cdot C).
$$

To show the reverse inclusion, cl(A^{-1} · C) ⊂ A^{-1} · cl(C), let x be some vector in
cl(A^{-1} · C). This implies that there exists some sequence {x_k} converging to x
such that Ax_k ∈ C for all k. Since x_k converges to x, we have that Ax_k converges
to Ax, thereby implying that Ax ∈ cl(C), or equivalently, x ∈ A^{-1} · cl(C).

1.35 (Closure of a Convex Function)

(a) Let g : ℜ^n → [−∞, ∞] be such that g(x) ≤ f(x) for all x ∈ ℜ^n. Choose
any x ∈ dom(cl f). Since epi(cl f) = cl(epi(f)), we can choose a sequence
{(x_k, w_k)} ⊂ epi(f) such that x_k → x, w_k → (cl f)(x). Since g is lower
semicontinuous at x, we have

$$
g(x) \le \liminf_{k \to \infty} g(x_k) \le \liminf_{k \to \infty} f(x_k) \le \liminf_{k \to \infty} w_k = (\mathrm{cl}\, f)(x).
$$

Note also that since epi(f) ⊂ epi(cl f), we have (cl f)(x) ≤ f(x) for all x ∈ ℜ^n.
(b) For the proof of this part and the next, we will use the easily shown fact that
for any convex function f , we have
     
$$
\mathrm{ri}\big( \mathrm{epi}(f) \big) = \big\{ (x, w) \mid x \in \mathrm{ri}\big( \mathrm{dom}(f) \big),\ f(x) < w \big\}.
$$

Let x ∈ ri(dom(f)), and consider the vertical line L = {(x, w) | w ∈ ℜ}.
Then there exists ŵ such that (x, ŵ) ∈ L ∩ ri(epi(f)). Let w be such that (x, w) ∈
L ∩ cl(epi(f)). Then, by Prop. 1.4.5(a), we have L ∩ cl(epi(f)) = cl(L ∩ epi(f)),
so that (x, w) ∈ cl(L ∩ epi(f)). It follows from the Line Segment Principle that
the vector (x, ŵ + α(w − ŵ)) belongs to epi(f) for all α ∈ [0, 1). Taking the
limit as α → 1, we see that f(x) ≤ w for all w such that (x, w) ∈ L ∩ cl(epi(f)),
implying that f(x) ≤ (cl f)(x). On the other hand, since epi(f) ⊂ epi(cl f), we
have (cl f)(x) ≤ f(x) for all x ∈ ℜ^n, so f(x) = (cl f)(x).
We know that a closed convex function that is improper cannot take a finite
value at any point. Since cl f is closed and convex, and takes a finite value at all
points of the nonempty set ri(dom(f)), it follows that cl f must be proper.
(c) Since the function cl f is closed and is majorized by f , we have
   
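$$
(\mathrm{cl}\, f)(y) \le \liminf_{\alpha \downarrow 0}\, (\mathrm{cl}\, f)\big( y + \alpha(x - y) \big) \le \liminf_{\alpha \downarrow 0} f\big( y + \alpha(x - y) \big).
$$

To show the reverse inequality, let w be such that f(x) < w. Then, (x, w) ∈
ri(epi(f)), while (y, (cl f)(y)) ∈ cl(epi(f)). From the Line Segment Principle, it
follows that

$$
\big( \alpha x + (1 - \alpha) y,\ \alpha w + (1 - \alpha)(\mathrm{cl}\, f)(y) \big) \in \mathrm{ri}\big( \mathrm{epi}(f) \big), \qquad \forall\, \alpha \in (0, 1].
$$

Hence,

$$
f\big( \alpha x + (1 - \alpha) y \big) < \alpha w + (1 - \alpha)(\mathrm{cl}\, f)(y), \qquad \forall\, \alpha \in (0, 1].
$$

By taking the limit as α → 0, we obtain

$$
\liminf_{\alpha \downarrow 0} f\big( y + \alpha(x - y) \big) \le (\mathrm{cl}\, f)(y),
$$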

thus completing the proof.


 
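(d) Let x ∈ ∩_{i=1}^{m} ri(dom(f_i)). Since by Prop. 1.4.5(a), we have

$$
\mathrm{ri}\big( \mathrm{dom}(f) \big) = \cap_{i=1}^{m} \mathrm{ri}\big( \mathrm{dom}(f_i) \big),
$$

it follows that x ∈ ri(dom(f)). By using part (c), we have for every y ∈ dom(cl f),

$$
(\mathrm{cl}\, f)(y) = \lim_{\alpha \downarrow 0} f\big( y + \alpha(x - y) \big) = \sum_{i=1}^{m} \lim_{\alpha \downarrow 0} f_i\big( y + \alpha(x - y) \big) = \sum_{i=1}^{m} (\mathrm{cl}\, f_i)(y).
$$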

1.36

Assume first that C is closed. Since C ∩ M is bounded, by part (c) of the
Recession Cone Theorem (cf. Prop. 1.5.1), R_{C∩M} = {0}. This and the fact
R_{C∩M} = R_C ∩ R_M imply that R_C ∩ R_M = {0}. Let S be a subspace such that
M = x + S for some x ∈ M. Then R_M = S, so that R_C ∩ S = {0}. For every
affine set M̄ that is parallel to M, we have R_M̄ = S, so that

R_{C∩M̄} = R_C ∩ R_M̄ = R_C ∩ S = {0}.

Therefore, by part (c) of the Recession Cone Theorem, C ∩ M̄ is bounded.
In the general case where C is not closed, the assumption that C ∩ M
is nonempty and bounded implies that cl(C) ∩ M is nonempty and bounded.
Therefore, by what has already been proved, cl(C) ∩ M̄ is bounded, implying
that C ∩ M̄ is bounded.

1.37 (Properties of Cartesian Products)

(a) We first show that the convex hull of X is equal to the Cartesian product of
the convex hulls of the sets Xi , i = 1, . . . , m. Let y be a vector that belongs to
conv(X). Then, by definition, for some k, we have

$$
y = \sum_{i=1}^{k} \alpha_i y_i, \qquad \text{with } \alpha_i \ge 0,\ i = 1, \ldots, k, \qquad \sum_{i=1}^{k} \alpha_i = 1,
$$

where y_i ∈ X for all i. Since y_i ∈ X, we have that y_i = (x_1^i, . . . , x_m^i) for all i,
with x_1^i ∈ X_1, . . . , x_m^i ∈ X_m. It follows that

$$
y = \sum_{i=1}^{k} \alpha_i (x_1^i, \ldots, x_m^i) = \Big( \sum_{i=1}^{k} \alpha_i x_1^i,\ \ldots,\ \sum_{i=1}^{k} \alpha_i x_m^i \Big),
$$

thereby implying that y ∈ conv(X_1) × · · · × conv(X_m).


To prove the reverse inclusion, assume that y is a vector in conv(X_1) × · · · ×
conv(X_m). Then, we can represent y as y = (y_1, . . . , y_m) with y_i ∈ conv(X_i),
i.e., for all i = 1, . . . , m, we have

$$
y_i = \sum_{j=1}^{k_i} \alpha_j^i x_j^i, \qquad x_j^i \in X_i,\ \forall\, j, \qquad \alpha_j^i \ge 0,\ \forall\, j, \qquad \sum_{j=1}^{k_i} \alpha_j^i = 1.
$$

First, consider the vectors

$$
(x_1^1, x_{r_1}^2, \ldots, x_{r_{m-1}}^m),\ (x_2^1, x_{r_1}^2, \ldots, x_{r_{m-1}}^m),\ \ldots,\ (x_{k_1}^1, x_{r_1}^2, \ldots, x_{r_{m-1}}^m),
$$

for all possible values of r_1, . . . , r_{m−1}, i.e., we fix all components except the
first one, and vary the first component over all possible x_j^1's used in the convex

combination that yields y_1. Since all these vectors belong to X, their convex
combination given by

$$
\Big( \sum_{j=1}^{k_1} \alpha_j^1 x_j^1,\ x_{r_1}^2,\ \ldots,\ x_{r_{m-1}}^m \Big)
$$

belongs to the convex hull of X for all possible values of r_1, . . . , r_{m−1}. Now,
consider the vectors

$$
\Big( \sum_{j=1}^{k_1} \alpha_j^1 x_j^1,\ x_1^2,\ \ldots,\ x_{r_{m-1}}^m \Big),\ \ldots,\ \Big( \sum_{j=1}^{k_1} \alpha_j^1 x_j^1,\ x_{k_2}^2,\ \ldots,\ x_{r_{m-1}}^m \Big),
$$

i.e., fix all components except the second one, and vary the second component
over all possible x_j^2's used in the convex combination that yields y_2. Since all
these vectors belong to conv(X), their convex combination given by

$$
\Big( \sum_{j=1}^{k_1} \alpha_j^1 x_j^1,\ \sum_{j=1}^{k_2} \alpha_j^2 x_j^2,\ \ldots,\ x_{r_{m-1}}^m \Big)
$$

belongs to the convex hull of X for all possible values of r_2, . . . , r_{m−1}. Proceeding
in this way, we see that the vector given by

$$
\Big( \sum_{j=1}^{k_1} \alpha_j^1 x_j^1,\ \sum_{j=1}^{k_2} \alpha_j^2 x_j^2,\ \ldots,\ \sum_{j=1}^{k_m} \alpha_j^m x_j^m \Big)
$$

belongs to conv(X), thus proving our claim.


Next, we show the corresponding result for the closure of X. Assume that
y = (x_1, . . . , x_m) ∈ cl(X). This implies that there exists some sequence {y^k} ⊂ X
such that y^k → y. Since y^k ∈ X, we have that y^k = (x_1^k, . . . , x_m^k) with x_i^k ∈ X_i
for each i and k. Since y^k → y, it follows that x_i ∈ cl(X_i) for each i, and
hence y ∈ cl(X_1) × · · · × cl(X_m). Conversely, suppose that y = (x_1, . . . , x_m) ∈
cl(X_1) × · · · × cl(X_m). This implies that there exist sequences {x_i^k} ⊂ X_i such
that x_i^k → x_i for each i = 1, . . . , m. Since x_i^k ∈ X_i for each i and k, we have that
y^k = (x_1^k, . . . , x_m^k) ∈ X and {y^k} converges to y = (x_1, . . . , x_m), implying that
y ∈ cl(X).
Finally, we show the corresponding result for the affine hull of X. Let’s
assume, by using a translation argument if necessary, that all the Xi ’s contain
the origin, so that aff(X1 ), . . . , aff(Xm ) as well as aff(X) are all subspaces.
Assume that y ∈ aff(X). Let the dimension of aff(X) be r, and let
y^1, . . . , y^r be linearly independent vectors in X that span aff(X). Thus, we
can represent y as

$$
y = \sum_{i=1}^{r} \beta^i y^i,
$$

where β^1, . . . , β^r are scalars. Since y^i ∈ X, we have that y^i = (x_1^i, . . . , x_m^i) with
x_j^i ∈ X_j. Thus,

$$
y = \sum_{i=1}^{r} \beta^i (x_1^i, \ldots, x_m^i) = \Big( \sum_{i=1}^{r} \beta^i x_1^i,\ \ldots,\ \sum_{i=1}^{r} \beta^i x_m^i \Big),
$$

implying that y ∈ aff(X_1) × · · · × aff(X_m). Now, assume that y ∈ aff(X_1) ×
· · · × aff(X_m). Let the dimension of aff(X_i) be r_i, and let x_i^1, . . . , x_i^{r_i} be linearly
independent vectors in X_i that span aff(X_i). Thus, we can represent y as

$$
y = \Big( \sum_{j=1}^{r_1} \beta_1^j x_1^j,\ \ldots,\ \sum_{j=1}^{r_m} \beta_m^j x_m^j \Big).
$$

Since each X_i contains the origin, we have that the vectors

$$
\Big( \sum_{j=1}^{r_1} \beta_1^j x_1^j,\ 0,\ \ldots,\ 0 \Big),\ \Big( 0,\ \sum_{j=1}^{r_2} \beta_2^j x_2^j,\ 0,\ \ldots,\ 0 \Big),\ \ldots,\ \Big( 0,\ \ldots,\ \sum_{j=1}^{r_m} \beta_m^j x_m^j \Big),
$$

belong to aff(X), and so does their sum, which is the vector y. Thus, y ∈ aff(X),
concluding the proof.

(b) Assume that y ∈ cone(X). We can represent y as

$$
y = \sum_{i=1}^{r} \alpha^i y^i,
$$

for some r, where α^1, . . . , α^r are nonnegative scalars and y^i ∈ X for all i. Since
y^i ∈ X, we have that y^i = (x_1^i, . . . , x_m^i) with x_j^i ∈ X_j. Thus,

$$
y = \sum_{i=1}^{r} \alpha^i (x_1^i, \ldots, x_m^i) = \Big( \sum_{i=1}^{r} \alpha^i x_1^i,\ \ldots,\ \sum_{i=1}^{r} \alpha^i x_m^i \Big),
$$

implying that y ∈ cone(X_1) × · · · × cone(X_m).
Conversely, assume that y ∈ cone(X_1) × · · · × cone(X_m). Then, we can
represent y as

$$
y = \Big( \sum_{j=1}^{r_1} \alpha_1^j x_1^j,\ \ldots,\ \sum_{j=1}^{r_m} \alpha_m^j x_m^j \Big),
$$

where x_i^j ∈ X_i and α_i^j ≥ 0 for each i and j. Since each X_i contains the origin,
we have that the vectors

$$
\Big( \sum_{j=1}^{r_1} \alpha_1^j x_1^j,\ 0,\ \ldots,\ 0 \Big),\ \Big( 0,\ \sum_{j=1}^{r_2} \alpha_2^j x_2^j,\ 0,\ \ldots,\ 0 \Big),\ \ldots,\ \Big( 0,\ \ldots,\ \sum_{j=1}^{r_m} \alpha_m^j x_m^j \Big),
$$

belong to cone(X), and so does their sum, which is the vector y. Thus,
y ∈ cone(X), concluding the proof.
Finally, consider the example where

X_1 = {0, 1} ⊂ ℜ, X_2 = {1} ⊂ ℜ.

For this example, cone(X_1) × cone(X_2) is given by the nonnegative quadrant,
whereas cone(X) is given by the two halflines α(0, 1) and α(1, 1) for α ≥ 0 and
the region that lies between them.

(c) We first show that

ri(X) = ri(X1 ) × · · · × ri(Xm ).

Let x = (x_1, . . . , x_m) ∈ ri(X). Then, by Prop. 1.4.1(c), we have that for all
x̄ = (x̄_1, . . . , x̄_m) ∈ X, there exists some γ > 1 such that

x + (γ − 1)(x − x̄) ∈ X.

Therefore, for all x̄_i ∈ X_i, there exists some γ > 1 such that

x_i + (γ − 1)(x_i − x̄_i) ∈ X_i,

which, by Prop. 1.4.1(c), implies that x_i ∈ ri(X_i), i.e., x ∈ ri(X_1) × · · · × ri(X_m).
Conversely, let x = (x_1, . . . , x_m) ∈ ri(X_1) × · · · × ri(X_m). The above argument
can be reversed through the use of Prop. 1.4.1(c), to show that x ∈ ri(X). Hence,
the result follows.
Finally, let us show that

R_X = R_{X_1} × · · · × R_{X_m}.

Let y = (y_1, . . . , y_m) ∈ R_X. By definition, this implies that for all x ∈ X and
α ≥ 0, we have x + αy ∈ X. From this, it follows that for all x_i ∈ X_i and α ≥ 0,
x_i + αy_i ∈ X_i, so that y_i ∈ R_{X_i}, implying that y ∈ R_{X_1} × · · · × R_{X_m}. Conversely,
let y = (y_1, . . . , y_m) ∈ R_{X_1} × · · · × R_{X_m}. By definition, for all x_i ∈ X_i and α ≥ 0,
we have x_i + αy_i ∈ X_i. From this, we get for all x ∈ X and α ≥ 0, x + αy ∈ X,
thus showing that y ∈ R_X.
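
The counterexample at the end of part (b), with X1 = {0, 1} and X2 = {1}, can be checked by a short membership test. Here X = X1 × X2 = {(0, 1), (1, 1)}, and since these two vectors are linearly independent, the coefficients of any y with respect to them are unique. The function names are chosen here for illustration.

```python
def cone_X_coefficients(y):
    """Solve y = a*(0, 1) + b*(1, 1); y is in cone(X) exactly when a >= 0 and b >= 0."""
    b = y[0]
    a = y[1] - y[0]
    return a, b


def in_cone_X(y):
    a, b = cone_X_coefficients(y)
    return a >= 0 and b >= 0


def in_product_of_cones(y):
    # cone(X1) and cone(X2) are both the nonnegative halfline.
    return y[0] >= 0 and y[1] >= 0
```

The vector (1, 0) passes `in_product_of_cones` but fails `in_cone_X`, which shows that the assumption that each X_i contains the origin cannot be dropped from the converse inclusion.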

1.38 (Recession Cones of Nonclosed Sets)

(a) Let y ∈ RC . Then, by the definition of RC , x + αy ∈ C for every x ∈ C and


every α ≥ 0. Since C ⊂ cl(C), it follows that x + αy ∈ cl(C) for some x ∈ cl(C)
and every α ≥ 0, which, in view of part (b) of the Recession Cone Theorem (cf.
Prop. 1.5.1), implies that y ∈ Rcl(C) . Hence

RC ⊂ Rcl(C) .

By taking closures in this relation and by using the fact that Rcl(C) is closed [part
(a) of the Recession Cone Theorem], we obtain cl(RC ) ⊂ Rcl(C) .

To see that the inclusion cl(RC ) ⊂ Rcl(C) can be strict, consider the set
   
C = {(x1, x2) | 0 ≤ x1, 0 ≤ x2 < 1} ∪ {(0, 1)},

whose closure is

cl(C) = {(x1, x2) | 0 ≤ x1, 0 ≤ x2 ≤ 1}.

The recession cones of C and its closure are

R_C = {(0, 0)}, R_cl(C) = {(x1, x2) | 0 ≤ x1, x2 = 0}.

Thus, cl(R_C) = {(0, 0)}, and cl(R_C) is a strict subset of R_cl(C).
(b) Let y ∈ R_C and let x be a vector in C. Then we have x + αy ∈ C for all
α ≥ 0. Thus for the vector x, which belongs to C̄, we have x + αy ∈ C̄ for all
α ≥ 0, and it follows from part (b) of the Recession Cone Theorem (cf. Prop.
1.5.1) that y ∈ R_C̄. Hence, R_C ⊂ R_C̄.
To see that the inclusion R_C ⊂ R_C̄ can fail when C̄ is not closed, consider
the sets

C = {(x1, x2) | x1 ≥ 0, x2 = 0}, C̄ = {(x1, x2) | x1 ≥ 0, 0 ≤ x2 < 1} ∪ {(0, 1)}.

Their recession cones are

R_C = C = {(x1, x2) | x1 ≥ 0, x2 = 0}, R_C̄ = {(0, 0)},

showing that R_C is not a subset of R_C̄. [The point (0, 1) is included in C̄, as in
the example of part (a); without it, the direction (1, 0) would be a direction of
recession of C̄.]
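
The example of part (a) can be probed numerically. The sketch below encodes membership in C and in cl(C) and tests whether a halfline from a given start point stays inside the set for a few sample step sizes. The helper names and the sample step sizes are assumptions, and a finite sample can only refute, not prove, that a direction is a direction of recession.

```python
def in_C(p):
    # C = {0 <= x1, 0 <= x2 < 1} together with the single point (0, 1).
    x1, x2 = p
    return (x1 >= 0 and 0 <= x2 < 1) or (x1 == 0 and x2 == 1)


def in_cl_C(p):
    # cl(C) = {0 <= x1, 0 <= x2 <= 1}.
    x1, x2 = p
    return x1 >= 0 and 0 <= x2 <= 1


def stays_inside(member, start, direction, alphas):
    """Check start + a * direction against `member` for each sampled a."""
    return all(
        member((start[0] + a * direction[0], start[1] + a * direction[1]))
        for a in alphas
    )


alphas = [0, 0.5, 1, 10]
```

Starting from (0, 1), the direction (1, 0) leaves C immediately but stays in cl(C), which is why (1, 0) belongs to R_cl(C) and not to R_C.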

1.39 (Recession Cones of Relative Interiors)

(a) The inclusion Rri(C) ⊂ Rcl(C) follows from Exercise 1.38(b).


Conversely, let y ∈ Rcl(C) , so that by the definition of Rcl(C) , x+αy ∈ cl(C)
for every x ∈ cl(C) and every α ≥ 0. In particular, x + αy ∈ cl(C) for every
x ∈ ri(C) and every α ≥ 0. By the Line Segment Principle, all points on the
line segment connecting x and x + αy, except possibly x + αy, belong to ri(C),
implying that x + αy ∈ ri(C) for every x ∈ ri(C) and every α ≥ 0. Hence,
y ∈ Rri(C) , showing that Rcl(C) ⊂ Rri(C) .
(b) If y ∈ Rri(C) , then by the definition of Rri(C) for every vector x ∈ ri(C) and
α ≥ 0, the vector x + αy is in ri(C), which holds in particular for some x ∈ ri(C)
[note that ri(C) is nonempty by Prop. 1.4.1(b)].
Conversely, let y be such that there exists a vector x ∈ ri(C) with x + αy ∈
ri(C) for all α ≥ 0. Hence, there exists a vector x ∈ cl(C) with x + αy ∈ cl(C) for
all α ≥ 0, which, by part (b) of the Recession Cone Theorem (cf. Prop. 1.5.1),
implies that y ∈ Rcl(C) . Using part (a), it follows that y ∈ Rri(C) , completing the
proof.
(c) Using Exercise 1.38(b) and the assumption that C ⊂ C̄ [which implies that
C ⊂ cl(C̄)], we have

R_C ⊂ R_cl(C̄) = R_ri(C̄) = R_C̄,

where the equalities follow from part (a) and the assumption that C̄ = ri(C̄).
To see that the inclusion R_C ⊂ R_C̄ can fail when C̄ ≠ ri(C̄), consider the
sets

C = {(x1, x2) | x1 ≥ 0, 0 < x2 < 1}, C̄ = {(x1, x2) | x1 ≥ 0, 0 ≤ x2 < 1} ∪ {(0, 1)},

for which we have C ⊂ C̄ and

R_C = {(x1, x2) | x1 ≥ 0, x2 = 0}, R_C̄ = {(0, 0)},

showing that R_C is not a subset of R_C̄. [The point (0, 1) is included in C̄ so that
the direction (1, 0) is not a direction of recession of C̄.]

1.40

For each k, consider the set C̄_k = X_k ∩ C_k. Note that {C̄_k} is a sequence of
nonempty closed convex sets and X is specified by linear inequality constraints.
We will show that, under the assumptions given in this exercise, the assumptions
of Prop. 1.5.6 are satisfied, thus showing that the intersection X ∩ (∩_{k=0}^∞ C̄_k)
[which is equal to the intersection ∩_{k=0}^∞ (X_k ∩ C_k)] is nonempty.
Since X_{k+1} ⊂ X_k and C_{k+1} ⊂ C_k for all k, it follows that

C̄_{k+1} ⊂ C̄_k, ∀ k,

showing that assumption (1) of Prop. 1.5.6 is satisfied. Similarly, since by
assumption X_k ∩ C_k is nonempty for all k, we have that, for all k, the set

X ∩ C̄_k = X ∩ X_k ∩ C_k = X_k ∩ C_k,

is nonempty, showing that assumption (2) is satisfied. Finally, let R̄ denote the
set R̄ = ∩_{k=0}^∞ R_{C̄_k}. Since by assumption C̄_k is nonempty for all k, we have, by
part (e) of the Recession Cone Theorem, that R_{C̄_k} = R_{X_k} ∩ R_{C_k}, implying that

$$
\begin{aligned}
\bar R &= \cap_{k=0}^{\infty} R_{\bar C_k} \\
&= \cap_{k=0}^{\infty} (R_{X_k} \cap R_{C_k}) \\
&= \big( \cap_{k=0}^{\infty} R_{X_k} \big) \cap \big( \cap_{k=0}^{\infty} R_{C_k} \big) \\
&= R_X \cap R_C.
\end{aligned}
$$

Similarly, letting L̄ denote the set L̄ = ∩_{k=0}^∞ L_{C̄_k}, it can be seen that L̄ = L_X ∩ L_C.
Since, by assumption R_X ∩ R_C ⊂ L_C, it follows that

R_X ∩ R̄ = R_X ∩ R_C ⊂ L_C,

which, in view of the assumption that R_X = L_X, implies that

R_X ∩ R̄ ⊂ L_C ∩ L_X = L̄,

showing that assumption (3) of Prop. 1.5.6 is satisfied, and thus proving that the
intersection X ∩ (∩_{k=0}^∞ C̄_k) is nonempty.

1.41

Let y be in the closure of A · C. We will show that y = Ax for some x ∈ cl(C).
For every ε > 0, the set

C_ε = cl(C) ∩ {x | ‖y − Ax‖ ≤ ε}

is closed. Since A · C ⊂ A · cl(C) and y ∈ cl(A · C), it follows that y is in the closure
of A · cl(C), so that C_ε is nonempty for every ε > 0. Furthermore, the recession
cone of the set {x | ‖Ax − y‖ ≤ ε} coincides with the null space N(A), so that
R_{C_ε} = R_cl(C) ∩ N(A). By assumption we have R_cl(C) ∩ N(A) = {0}, and by part
(c) of the Recession Cone Theorem (cf. Prop. 1.5.1), it follows that C_ε is bounded
for every ε > 0. Now, since the sets C_ε are nested nonempty compact sets, their
intersection ∩_{ε>0} C_ε is nonempty. For any x in this intersection, we have x ∈ cl(C)
and Ax − y = 0, showing that y ∈ A · cl(C). Hence, cl(A · C) ⊂ A · cl(C). The
converse A · cl(C) ⊂ cl(A · C) is clear, since for any x ∈ cl(C) and sequence
{x_k} ⊂ C converging to x, we have Ax_k → Ax, showing that Ax ∈ cl(A · C).
Therefore,

cl(A · C) = A · cl(C). (1.13)

We now show that A · Rcl(C) = RA·cl(C) . Let y ∈ A · Rcl(C) . Then, there


exists a vector u ∈ Rcl(C) such that Au = y, and by the definition of Rcl(C) ,
there is a vector x ∈ cl(C) such that x + αu ∈ cl(C) for every α ≥ 0. Therefore,
Ax + αAu ∈ A · cl(C) for every α ≥ 0, which, together with Ax ∈ A · cl(C) and
Au = y, implies that y is a direction of recession of the closed set A · cl(C) [cf.
Eq. (1.13)]. Hence, A · Rcl(C) ⊂ RA·cl(C) .
Conversely, let $y \in R_{A \cdot \mathrm{cl}(C)}$. We will show that $y \in A \cdot R_{\mathrm{cl}(C)}$. This is true if $y = 0$, so assume that $y \ne 0$. By definition of direction of recession, there is a vector $z \in A \cdot \mathrm{cl}(C)$ such that $z + \alpha y \in A \cdot \mathrm{cl}(C)$ for every $\alpha \ge 0$. Let $x \in \mathrm{cl}(C)$ be such that $Ax = z$, and for every positive integer $k$, let $x_k \in \mathrm{cl}(C)$ be such that $Ax_k = z + ky$. Since $y \ne 0$, the sequence $\{Ax_k\}$ is unbounded, implying that $\{x_k\}$ is also unbounded (if $\{x_k\}$ were bounded, then $\{Ax_k\}$ would be bounded, a contradiction). Because $x_k \ne x$ for all $k$, we can define

$$u_k = \frac{x_k - x}{\|x_k - x\|}, \qquad \forall\ k.$$

Let $u$ be a limit point of $\{u_k\}$, and note that $u \ne 0$. It can be seen that $u$ is a direction of recession of $\mathrm{cl}(C)$ [this can be done similar to the proof of part (c) of the Recession Cone Theorem (cf. Prop. 1.5.1)]. By taking an appropriate subsequence if necessary, we may assume without loss of generality that $\lim_{k \to \infty} u_k = u$. Then, by the choices of $u_k$ and $x_k$, we have

$$Au = \lim_{k \to \infty} Au_k = \lim_{k \to \infty} \frac{Ax_k - Ax}{\|x_k - x\|} = \lim_{k \to \infty} \frac{k}{\|x_k - x\|}\, y,$$

implying that $\lim_{k \to \infty} \frac{k}{\|x_k - x\|}$ exists. Denote this limit by $\lambda$. If $\lambda = 0$, then $u$ is in the null space $N(A)$, implying that $u \in R_{\mathrm{cl}(C)} \cap N(A)$. By the given condition $R_{\mathrm{cl}(C)} \cap N(A) = \{0\}$, we have $u = 0$, contradicting the fact $u \ne 0$. Thus, $\lambda$ is positive and $Au = \lambda y$, so that $A(u/\lambda) = y$. Since $R_{\mathrm{cl}(C)}$ is a cone [part (a) of the Recession Cone Theorem] and $u \in R_{\mathrm{cl}(C)}$, the vector $u/\lambda$ is in $R_{\mathrm{cl}(C)}$, so that $y$ belongs to $A \cdot R_{\mathrm{cl}(C)}$. Hence, $R_{A \cdot \mathrm{cl}(C)} \subset A \cdot R_{\mathrm{cl}(C)}$, completing the proof.
As an example showing that $A \cdot R_{\mathrm{cl}(C)}$ and $R_{A \cdot \mathrm{cl}(C)}$ may differ when $R_{\mathrm{cl}(C)} \cap N(A) \ne \{0\}$, consider the set

$$C = \{(x_1, x_2) \mid x_1 \in \Re,\ x_2 \ge x_1^2\},$$

and the linear transformation $A$ that maps $(x_1, x_2) \in \Re^2$ into $x_1 \in \Re$. Then, $C$ is closed and its recession cone is

$$R_C = \{(x_1, x_2) \mid x_1 = 0,\ x_2 \ge 0\},$$

so that $A \cdot R_C = \{0\}$, where 0 is scalar. On the other hand, $A \cdot C$ coincides with $\Re$, so that $R_{A \cdot C} = \Re \ne A \cdot R_C$.
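The counterexample above can be spot-checked numerically. The following Python sketch is only illustrative: the helper names `in_C` and `stays_in_C` and the sampled points are assumptions made for this check, and sampling gives evidence rather than a proof.

```python
def in_C(x1, x2):
    """Membership test for C = {(x1, x2) | x2 >= x1**2}."""
    return x2 >= x1 ** 2

def stays_in_C(point, direction, alphas):
    """True if point + alpha*direction lies in C for every sampled alpha >= 0."""
    return all(in_C(point[0] + a * direction[0], point[1] + a * direction[1])
               for a in alphas)

alphas = [0.0, 0.5, 1.0, 10.0, 100.0, 1e4]
base_points = [(0.0, 0.0), (1.0, 1.0), (-2.0, 5.0)]

# (0, 1) behaves like a recession direction of C: moving up never leaves C.
up_ok = all(stays_in_C(p, (0.0, 1.0), alphas) for p in base_points)

# (1, 0) does not: starting at (0, 0), the point (alpha, 0) leaves C for alpha > 0.
right_ok = stays_in_C((0.0, 0.0), (1.0, 0.0), alphas)

# Every sampled real t is the image under A of (t, t**2), which lies in C,
# consistent with A·C being all of the real line.
image_covers = all(in_C(t, t ** 2) for t in [-1e3, -1.0, 0.0, 2.5, 1e3])
```

So the image of the recession cone is {0}, while the image set itself recedes in every direction of the real line.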

1.42

Let S be defined by
S = Rcl(C) ∩ N (A),

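and note that $S$ is a subspace of $L_{\mathrm{cl}(C)}$ by the given assumption. Then, by Lemma 1.5.4, we have

$$\mathrm{cl}(C) = \big(\mathrm{cl}(C) \cap S^\perp\big) + S,$$

so that the images of $\mathrm{cl}(C)$ and $\mathrm{cl}(C) \cap S^\perp$ under $A$ coincide [since $S \subset N(A)$], i.e.,

$$A \cdot \mathrm{cl}(C) = A \cdot \big(\mathrm{cl}(C) \cap S^\perp\big). \qquad (1.14)$$

Because $A \cdot C \subset A \cdot \mathrm{cl}(C)$, we have

$$\mathrm{cl}(A \cdot C) \subset \mathrm{cl}\big(A \cdot \mathrm{cl}(C)\big),$$

which in view of Eq. (1.14) gives

$$\mathrm{cl}(A \cdot C) \subset \mathrm{cl}\Big(A \cdot \big(\mathrm{cl}(C) \cap S^\perp\big)\Big).$$

Define

$$\bar C = \mathrm{cl}(C) \cap S^\perp$$

so that the preceding relation becomes

$$\mathrm{cl}(A \cdot C) \subset \mathrm{cl}(A \cdot \bar C). \qquad (1.15)$$

The recession cone of $\bar C$ is given by

$$R_{\bar C} = R_{\mathrm{cl}(C)} \cap S^\perp, \qquad (1.16)$$

[cf. part (e) of the Recession Cone Theorem, Prop. 1.5.1], for which, since $S = R_{\mathrm{cl}(C)} \cap N(A)$, we have

$$R_{\bar C} \cap N(A) = S \cap S^\perp = \{0\}.$$

Therefore, by Prop. 1.5.8, the set $A \cdot \bar C$ is closed, implying that $\mathrm{cl}(A \cdot \bar C) = A \cdot \bar C$. By the definition of $\bar C$, we have $A \cdot \bar C \subset A \cdot \mathrm{cl}(C)$, implying that $\mathrm{cl}(A \cdot \bar C) \subset A \cdot \mathrm{cl}(C)$, which together with Eq. (1.15) yields $\mathrm{cl}(A \cdot C) \subset A \cdot \mathrm{cl}(C)$. The converse $A \cdot \mathrm{cl}(C) \subset \mathrm{cl}(A \cdot C)$ is clear, since for any $x \in \mathrm{cl}(C)$ and sequence $\{x_k\} \subset C$ converging to $x$, we have $Ax_k \to Ax$, showing that $Ax \in \mathrm{cl}(A \cdot C)$. Therefore,

$$\mathrm{cl}(A \cdot C) = A \cdot \mathrm{cl}(C). \qquad (1.17)$$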

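We next show that $A \cdot R_{\mathrm{cl}(C)} = R_{A \cdot \mathrm{cl}(C)}$. Let $y \in A \cdot R_{\mathrm{cl}(C)}$. Then, there exists a vector $u \in R_{\mathrm{cl}(C)}$ such that $Au = y$, and by the definition of $R_{\mathrm{cl}(C)}$, there is a vector $x \in \mathrm{cl}(C)$ such that $x + \alpha u \in \mathrm{cl}(C)$ for every $\alpha \ge 0$. Therefore, $Ax + \alpha Au \in A \cdot \mathrm{cl}(C)$ for every $\alpha \ge 0$, which together with $Ax \in A \cdot \mathrm{cl}(C)$ and $Au = y$ implies that $y$ is a recession direction of the closed set $A \cdot \mathrm{cl}(C)$ [cf. Eq. (1.17)]. Hence, $A \cdot R_{\mathrm{cl}(C)} \subset R_{A \cdot \mathrm{cl}(C)}$.
Conversely, in view of Eq. (1.14) and the definition of $\bar C$, we have

$$R_{A \cdot \mathrm{cl}(C)} = R_{A \cdot \bar C}.$$

Since $R_{\bar C} \cap N(A) = \{0\}$ and $\bar C$ is closed, by Exercise 1.41, it follows that

$$R_{A \cdot \bar C} = A \cdot R_{\bar C},$$

which combined with Eq. (1.16) implies that

$$A \cdot R_{\bar C} \subset A \cdot R_{\mathrm{cl}(C)}.$$

The preceding three relations yield $R_{A \cdot \mathrm{cl}(C)} \subset A \cdot R_{\mathrm{cl}(C)}$, completing the proof.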

1.43 (Recession Cones of Vector Sums)

(a) Let $C$ be the Cartesian product $C_1 \times \cdots \times C_m$. Then, by Exercise 1.37, $C$ is closed, and its recession cone and lineality space are given by

$$R_C = R_{C_1} \times \cdots \times R_{C_m}, \qquad L_C = L_{C_1} \times \cdots \times L_{C_m}.$$

Let $A$ be a linear transformation that maps $(x_1, \ldots, x_m) \in \Re^{mn}$ into $x_1 + \cdots + x_m \in \Re^n$. The null space of $A$ is the set of all $(y_1, \ldots, y_m)$ such that $y_1 + \cdots + y_m = 0$. The intersection $R_C \cap N(A)$ consists of points $(y_1, \ldots, y_m)$ such that $y_1 + \cdots + y_m = 0$ with $y_i \in R_{C_i}$ for all $i$. By the given condition, every vector $(y_1, \ldots, y_m)$ in the intersection $R_C \cap N(A)$ is such that $y_i \in L_{C_i}$ for all $i$, implying that $(y_1, \ldots, y_m)$ belongs to the lineality space $L_C$. Thus, $R_C \cap N(A) \subset L_C \cap N(A)$. On the other hand, by definition of the lineality space, we have $L_C \subset R_C$, so that $L_C \cap N(A) \subset R_C \cap N(A)$. Therefore, $R_C \cap N(A) = L_C \cap N(A)$, implying that $R_C \cap N(A)$ is a subspace of $L_C$. By Exercise 1.42, the set $A \cdot C$ is closed and $R_{A \cdot C} = A \cdot R_C$. Since $A \cdot C = C_1 + \cdots + C_m$, the assertions of part (a) follow.
(b) The proof is similar to that of part (a). Let C be the Cartesian product
C1 × · · · × Cm . Then, by Exercise 1.37(a),
cl(C) = cl(C1 ) × · · · × cl(Cm ), (1.18)
and its recession cone and lineality space are given by
Rcl(C) = Rcl(C1 ) × · · · × Rcl(Cm ) , (1.19)
Lcl(C) = Lcl(C1 ) × · · · × Lcl(Cm ) .
Let $A$ be a linear transformation that maps $(x_1, \ldots, x_m) \in \Re^{mn}$ into $x_1 + \cdots + x_m \in \Re^n$. Then, the intersection $R_{\mathrm{cl}(C)} \cap N(A)$ consists of points $(y_1, \ldots, y_m)$
such that y1 + · · · + ym = 0 with yi ∈ Rcl(Ci ) for all i. By the given condition,
every vector (y1 , . . . , ym ) in the intersection Rcl(C) ∩N (A) is such that yi ∈ Lcl(Ci )
for all i, implying that (y1 , . . . , ym ) belongs to the lineality space Lcl(C) . Thus,
Rcl(C) ∩ N (A) ⊂ Lcl(C) ∩ N (A). On the other hand by definition of the lineality
space, we have Lcl(C) ⊂ Rcl(C) , so that Lcl(C) ∩ N (A) ⊂ Rcl(C) ∩ N (A). Hence,
Rcl(C) ∩ N (A) = Lcl(C) ∩ N (A), implying that Rcl(C) ∩ N (A) is a subspace of
Lcl(C) . By Exercise 1.42, we have cl(A · C) = A · cl(C) and RA·cl(C) = A · Rcl(C) ,
from which by using the relation A · C = C1 + · · · + Cm , and Eqs. (1.18) and
(1.19), we obtain
cl(C1 + · · · + Cm ) = cl(C1 ) + · · · + cl(Cm ),
Rcl(C1 +···+Cm ) = Rcl(C1 ) + · · · + Rcl(Cm ) .

1.44

Let $C$ be the Cartesian product $C_1 \times \cdots \times C_m$ viewed as a subset of $\Re^{mn}$, and let $A$ be the linear transformation that maps a vector $(x_1, \ldots, x_m) \in \Re^{mn}$ into $x_1 + \cdots + x_m$. Note that set $C$ can be written as

$$C = \big\{x = (x_1, \ldots, x_m) \mid x'Q_{ij}x + a_{ij}'x + b_{ij} \le 0,\ i = 1, \ldots, m,\ j = 1, \ldots, r_i\big\},$$

where the $Q_{ij}$ are appropriately defined symmetric positive semidefinite $mn \times mn$ matrices and the $a_{ij}$ are appropriately defined vectors in $\Re^{mn}$. Hence, the set $C$ is specified by convex quadratic inequalities. Thus, we can use Prop. 1.5.8(c) to assert that the set $AC = C_1 + \cdots + C_m$ is closed.

1.45 (Set Intersection and Helly’s Theorem)

Helly's Theorem implies that the sets $\bar C_k$ defined in the hint are nonempty. These sets are also nested and satisfy the assumptions of Props. 1.5.5 and 1.5.6. Therefore, the intersection $\cap_{i=1}^\infty \bar C_i$ is nonempty. Since

$$\cap_{i=1}^\infty \bar C_i \subset \cap_{i=1}^\infty C_i,$$

the result follows.

Convex Analysis and
Optimization
Chapter 2 Solutions

Dimitri P. Bertsekas

with

Angelia Nedić and Asuman E. Ozdaglar

Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts


http://www.athenasc.com
LAST UPDATE September 27, 2004

CHAPTER 2: SOLUTION MANUAL

2.1

(a) If $x^*$ is a global minimum of $f$, evidently it is also a global minimum of $f$ along any line passing through $x^*$.
Conversely, let $x^*$ be a global minimum of $f$ along any line passing through $x^*$, i.e., for all $d \in \Re^n$, the function $g : \Re \to \Re$, defined by $g(\alpha) = f(x^* + \alpha d)$, has $\alpha^* = 0$ as its global minimum. Assume, to arrive at a contradiction, that $x^*$ is not a global minimum of $f$. This implies that there exists some $x \in \Re^n$ such that $f(x) < f(x^*)$. Let $d = x - x^*$. In view of the assumption that $g(\alpha)$ has $\alpha^* = 0$ as its global minimum, it follows that $g(0) \le g(\tfrac12)$, or equivalently,

$$f(x^*) \le f\big(x^* + \tfrac12 (x - x^*)\big) = f\big(\tfrac12 x + \tfrac12 x^*\big) \le \tfrac12 f(x) + \tfrac12 f(x^*) < f(x^*),$$

which is a contradiction. [Note that in the above chain of relations, the second inequality follows by the convexity of $f$, and the strict inequality follows from the assumption that $f(x) < f(x^*)$.] Hence, it follows that $x^*$ is a global minimum of $f$.
(b) Consider the function $f(x_1, x_2) = (x_2 - p x_1^2)(x_2 - q x_1^2)$, where $0 < p < q$, and let $x^* = (0, 0)$.
We first show that $g(\alpha) = f(x^* + \alpha d)$ is minimized at $\alpha = 0$ for all $d \in \Re^2$. We have

$$g(\alpha) = f(x^* + \alpha d) = (\alpha d_2 - p\alpha^2 d_1^2)(\alpha d_2 - q\alpha^2 d_1^2) = \alpha^2 (d_2 - p\alpha d_1^2)(d_2 - q\alpha d_1^2).$$

Also,

$$g'(\alpha) = 2\alpha (d_2 - p\alpha d_1^2)(d_2 - q\alpha d_1^2) + \alpha^2 (-p d_1^2)(d_2 - q\alpha d_1^2) + \alpha^2 (d_2 - p\alpha d_1^2)(-q d_1^2).$$

Thus $g'(0) = 0$. Furthermore,

$$\begin{aligned}
g''(\alpha) ={}& 2(d_2 - p\alpha d_1^2)(d_2 - q\alpha d_1^2) + 2\alpha(-p d_1^2)(d_2 - q\alpha d_1^2) \\
&+ 2\alpha(d_2 - p\alpha d_1^2)(-q d_1^2) + 2\alpha(-p d_1^2)(d_2 - q\alpha d_1^2) + \alpha^2(-p d_1^2)(-q d_1^2) \\
&+ 2\alpha(d_2 - p\alpha d_1^2)(-q d_1^2) + \alpha^2(-p d_1^2)(-q d_1^2).
\end{aligned}$$

Thus $g''(0) = 2 d_2^2$, which is greater than 0 if $d_2 \ne 0$. If $d_2 = 0$, $g(\alpha) = pq\alpha^4 d_1^4$, which is clearly minimized at $\alpha = 0$. Therefore, $(0, 0)$ is a local minimum of $f$ along every line that passes through $(0, 0)$.
Let us now show that if $p < m < q$, then $f(y, my^2) < 0$ if $y \ne 0$ and $f(y, my^2) = 0$ otherwise. Consider a point of the form $(y, my^2)$. We have $f(y, my^2) = y^4 (m - p)(m - q)$. Clearly, $f(y, my^2) < 0$ if and only if $p < m < q$ and $y \ne 0$. In any $\epsilon$-neighborhood of $(0, 0)$, there exists a $y \ne 0$ such that for some $m \in (p, q)$, $(y, my^2)$ also belongs to the neighborhood. Since $f(0, 0) = 0$, we see that $(0, 0)$ is not a local minimum.
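The two claims of part (b) can be illustrated numerically. The sketch below is a hedged check, not part of the original solution: the values p = 1, q = 3, m = 2, the sampled directions, and the step sizes are all assumptions chosen for illustration.

```python
def f(x1, x2, p=1.0, q=3.0):
    """f(x1, x2) = (x2 - p*x1^2)(x2 - q*x1^2) with 0 < p < q."""
    return (x2 - p * x1 ** 2) * (x2 - q * x1 ** 2)

def g(alpha, d, p=1.0, q=3.0):
    """Restriction of f to the line through the origin with direction d."""
    return f(alpha * d[0], alpha * d[1], p, q)

# Along each sampled line, g(alpha) >= g(0) = 0 for small |alpha|.
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (1.0, 2.0)]
small_alphas = [k * 1e-3 for k in range(-50, 51)]
line_min_ok = all(g(a, d) >= 0.0 for d in directions for a in small_alphas)

# Along the parabola x2 = m*x1^2 with p < m < q, f is negative arbitrarily
# close to the origin, so (0, 0) is not a local minimum of f.
m = 2.0
curve_values = [f(y, m * y ** 2) for y in (0.1, 0.01, 0.001)]
```

The interval of alpha on which g stays nonnegative shrinks as the direction approaches the x1-axis, which is consistent with the origin failing to be a local minimum even though it is a minimum along every line.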

2.2 (Lipschitz Continuity of Convex Functions)

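Let $\epsilon$ be a positive scalar and let $C_\epsilon$ be the set given by

$$C_\epsilon = \big\{z \mid \|z - x\| \le \epsilon, \text{ for some } x \in \mathrm{cl}(X)\big\}.$$

We claim that the set $C_\epsilon$ is compact. Indeed, since $X$ is bounded, so is its closure, which implies that $\|z\| \le \max_{x \in \mathrm{cl}(X)} \|x\| + \epsilon$ for all $z \in C_\epsilon$, showing that $C_\epsilon$ is bounded. To show the closedness of $C_\epsilon$, let $\{z_k\}$ be a sequence in $C_\epsilon$ converging to some $z$. By the definition of $C_\epsilon$, there is a corresponding sequence $\{x_k\}$ in $\mathrm{cl}(X)$ such that

$$\|z_k - x_k\| \le \epsilon, \qquad \forall\ k. \qquad (2.1)$$

Because $\mathrm{cl}(X)$ is compact, $\{x_k\}$ has a subsequence converging to some $x \in \mathrm{cl}(X)$. Without loss of generality, we may assume that $\{x_k\}$ converges to $x \in \mathrm{cl}(X)$. By taking the limit in Eq. (2.1) as $k \to \infty$, we obtain $\|z - x\| \le \epsilon$ with $x \in \mathrm{cl}(X)$, showing that $z \in C_\epsilon$. Hence, $C_\epsilon$ is closed.
We now show that $f$ has the Lipschitz property over $X$. Let $x$ and $y$ be two distinct points in $X$. Then, by the definition of $C_\epsilon$, the point

$$z = y + \frac{\epsilon}{\|y - x\|}(y - x)$$

is in $C_\epsilon$. Thus

$$y = \frac{\|y - x\|}{\|y - x\| + \epsilon}\, z + \frac{\epsilon}{\|y - x\| + \epsilon}\, x,$$

showing that $y$ is a convex combination of $z \in C_\epsilon$ and $x \in C_\epsilon$. By convexity of $f$, we have

$$f(y) \le \frac{\|y - x\|}{\|y - x\| + \epsilon}\, f(z) + \frac{\epsilon}{\|y - x\| + \epsilon}\, f(x),$$

implying that

$$f(y) - f(x) \le \frac{\|y - x\|}{\|y - x\| + \epsilon}\big(f(z) - f(x)\big) \le \frac{\|y - x\|}{\epsilon}\Big(\max_{u \in C_\epsilon} f(u) - \min_{v \in C_\epsilon} f(v)\Big),$$

where in the last inequality we use Weierstrass' theorem ($f$ is continuous over $\Re^n$ by Prop. 1.4.6 and $C_\epsilon$ is compact). By switching the roles of $x$ and $y$, we similarly obtain

$$f(x) - f(y) \le \frac{\|x - y\|}{\epsilon}\Big(\max_{u \in C_\epsilon} f(u) - \min_{v \in C_\epsilon} f(v)\Big),$$

which combined with the preceding relation yields $|f(x) - f(y)| \le L\|x - y\|$, where $L = \big(\max_{u \in C_\epsilon} f(u) - \min_{v \in C_\epsilon} f(v)\big)/\epsilon$.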
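As a sanity check of the constant L obtained above, the following Python sketch evaluates it for one concrete instance. The choices f(x) = x^2, X = [-1, 1], and epsilon = 1 are assumptions made purely for illustration, and the sets are replaced by finite grids.

```python
import itertools

def f(x):
    return x * x  # a convex function on the real line

eps = 1.0
X = [-1.0 + 0.05 * k for k in range(41)]      # grid samples of X = [-1, 1]
C_eps = [-2.0 + 0.05 * k for k in range(81)]  # grid samples of C_eps = [-2, 2]

# L = (max of f over C_eps - min of f over C_eps) / eps, as in the solution.
L = (max(f(u) for u in C_eps) - min(f(v) for v in C_eps)) / eps

# Check |f(x) - f(y)| <= L * |x - y| on all sampled pairs (small float slack).
bound_holds = all(abs(f(x) - f(y)) <= L * abs(x - y) + 1e-12
                  for x, y in itertools.combinations(X, 2))
```

Here L comes out close to 4, while the smallest Lipschitz constant of x^2 on [-1, 1] is 2, so the bound from the proof is valid but not tight.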

2.3 (Exact Penalty Functions)

(a) By assumption, $x^*$ minimizes $f$ over $X$, so that $x^* \in X$, and we have for all $c > L$, $y \in X$, and $x \in Y$,

$$F_c(x^*) = f(x^*) \le f(y) \le f(x) + L\|y - x\| \le f(x) + c\|y - x\|,$$

where we use the Lipschitz continuity of $f$ to get the second inequality. Taking the infimum over all $y \in X$, we obtain

$$F_c(x^*) \le f(x) + c \inf_{y \in X} \|y - x\| = F_c(x), \qquad \forall\ x \in Y.$$

Hence, $x^*$ minimizes $F_c(x)$ over $Y$ for all $c > L$. (Note that the infimum in the preceding relation is attained by Weierstrass' Theorem, since $X$ is closed by assumption, and $\|\cdot\|$ is a continuous function that has compact level sets.)
(b) Suppose, to arrive at a contradiction, that $x^*$ minimizes $F_c(x)$ over $Y$, but $x^* \notin X$.
We have that $F_c(x^*) = f(x^*) + c \min_{y \in X} \|y - x^*\|$. Using the argument given earlier, the minimum of $\|y - x^*\|$ over $y \in X$ is attained at some $\tilde x \in X$, which is not equal to $x^*$, and therefore,

$$\begin{aligned}
F_c(x^*) &= f(x^*) + c\|\tilde x - x^*\| \\
&> f(x^*) + L\|\tilde x - x^*\| \\
&\ge f(\tilde x) \\
&= F_c(\tilde x),
\end{aligned}$$

which contradicts the fact that $x^*$ minimizes $F_c(x)$ over $Y$. (Note that the first inequality follows from $c > L$ and $\tilde x \ne x^*$. The second inequality follows from the Lipschitz continuity of function $f$.) Hence, if $x^*$ minimizes $F_c(x)$ over $Y$, it follows that $x^* \in X$.
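The role of the condition c > L can be seen on a one-dimensional instance. The sketch below is an illustration under assumed data, not the exercise's general setting: f(x) = x (Lipschitz constant L = 1), X = [0, 1], and Y approximated by a grid on [-3, 3].

```python
def dist_to_X(x, lo=0.0, hi=1.0):
    """Distance from x to the interval X = [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

def F(x, c):
    """Penalized cost F_c(x) = f(x) + c * dist(x, X) with f(x) = x."""
    return x + c * dist_to_X(x)

Y = [k / 100.0 for k in range(-300, 301)]  # grid standing in for Y

# With c = 2 > L, the minimizer of F_c over Y is x* = 0, the minimizer of f over X.
x_star_large_c = min(Y, key=lambda x: F(x, 2.0))

# With c = 0.5 < L, the penalty is too weak and the minimizer leaves X.
x_star_small_c = min(Y, key=lambda x: F(x, 0.5))
```

With the larger penalty the unconstrained minimizer coincides with the constrained one; with the smaller penalty it drifts to the boundary of the grid, outside X.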

2.4 (Ekeland’s Variational Principle [Eke74])

For some $\delta > 0$, define the function $F : \Re^n \to (-\infty, \infty]$ by

$$F(x) = f(x) + \delta\|x - \bar x\|.$$

The function $F$ is closed in view of the assumption that $f$ is closed. Hence, by Prop. 1.2.2(b), it follows that all the level sets of $F$ are closed. The level sets are also bounded, since for all $\gamma > f^*$, we have

$$\big\{x \mid F(x) \le \gamma\big\} \subset \big\{x \mid f^* + \delta\|x - \bar x\| \le \gamma\big\} = B\Big(\bar x, \frac{\gamma - f^*}{\delta}\Big), \qquad (2.2)$$

where $B\big(\bar x, (\gamma - f^*)/\delta\big)$ denotes the closed ball centered at $\bar x$ with radius $(\gamma - f^*)/\delta$. Hence, it follows by Weierstrass' Theorem that $F$ attains a minimum over $\Re^n$, i.e., the set $\arg\min_{x \in \Re^n} F(x)$ is nonempty and compact.
Consider now minimizing $f$ over the set $\arg\min_{x \in \Re^n} F(x)$. Since $f$ is closed by assumption, we conclude by using Weierstrass' Theorem that $f$ attains a minimum at some $\tilde x$ over the set $\arg\min_{x \in \Re^n} F(x)$. Hence, we have

$$f(\tilde x) \le f(x), \qquad \forall\ x \in \arg\min_{x \in \Re^n} F(x). \qquad (2.3)$$

Since $\tilde x \in \arg\min_{x \in \Re^n} F(x)$, it follows that $F(\tilde x) \le F(x)$ for all $x \in \Re^n$, and

$$F(\tilde x) < F(x), \qquad \forall\ x \notin \arg\min_{x \in \Re^n} F(x),$$

which by using the triangle inequality implies that

$$\begin{aligned}
f(\tilde x) &< f(x) + \delta\|x - \bar x\| - \delta\|\tilde x - \bar x\| \\
&\le f(x) + \delta\|x - \tilde x\|, \qquad \forall\ x \notin \arg\min_{x \in \Re^n} F(x).
\end{aligned} \qquad (2.4)$$

Using Eqs. (2.3) and (2.4), we see that

$$f(\tilde x) < f(x) + \delta\|x - \tilde x\|, \qquad \forall\ x \ne \tilde x,$$

thereby implying that $\tilde x$ is the unique optimal solution of the problem of minimizing $f(x) + \delta\|x - \tilde x\|$ over $\Re^n$.
Moreover, since $F(\tilde x) \le F(x)$ for all $x \in \Re^n$, we have $F(\tilde x) \le F(\bar x)$, which implies that

$$f(\tilde x) \le f(\bar x) - \delta\|\tilde x - \bar x\| \le f(\bar x),$$

and also

$$F(\tilde x) \le F(\bar x) = f(\bar x) \le f^* + \epsilon.$$

Using Eq. (2.2), it follows that $\tilde x \in B(\bar x, \epsilon/\delta)$, proving the desired result.
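The construction in this proof can be traced on a small example. The following Python sketch is illustrative only: the function f(x) = exp(-x) (whose infimum 0 is not attained), the values of epsilon and delta, and the grid search over [0, 10] are all assumptions, and the grid minimizer only approximates the exact minimizer of F.

```python
import math

def f(x):
    return math.exp(-x)  # closed, bounded below, infimum f* = 0 is not attained

f_star = 0.0
eps, delta = 0.1, 0.05
x_bar = -math.log(eps)   # chosen so that f(x_bar) = eps = f* + eps

def F(x):
    """Perturbed function F(x) = f(x) + delta * |x - x_bar| from the proof."""
    return f(x) + delta * abs(x - x_bar)

# Approximate the minimizer of F by a grid search on [0, 10] (step 1e-3).
grid = [k * 1e-3 for k in range(0, 10001)]
x_tilde = min(grid, key=F)
```

Setting the derivative of F to zero for x > x_bar gives the exact minimizer log(1/delta) = log 20, and the grid result lies within the ball of radius eps/delta = 2 around x_bar, as the exercise asserts.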

2.5 (Approximate Minima of Convex Functions)

(a) Let $\epsilon > 0$ be given. Assume, to arrive at a contradiction, that for any sequence $\{\delta_k\}$ with $\delta_k \downarrow 0$, there exists a sequence $\{x_k\} \subset X$ such that for all $k$

$$f(x_k) \le f^* + \delta_k, \qquad \min_{x^* \in X^*} \|x_k - x^*\| \ge \epsilon.$$

It follows that, for some $\bar k$, $x_k$ belongs to the set $\{x \in X \mid f(x) \le f^* + \delta_{\bar k}\}$, for all $k \ge \bar k$. Since by assumption $f$ and $X$ have no common nonzero direction of recession, by the Recession Cone Theorem, we have that the closed convex set $\{x \in X \mid f(x) \le f^* + \delta_{\bar k}\}$ is bounded. Therefore, the sequence $\{x_k\}$ is bounded and has a limit point $\bar x \in X$, which, in view of the preceding relations, satisfies

$$f(\bar x) \le f^*, \qquad \|\bar x - x^*\| \ge \epsilon, \quad \forall\ x^* \in X^*,$$

which is a contradiction. This proves that, for every $\epsilon > 0$, there exists a $\delta > 0$ such that every vector $x \in X$ with $f(x) \le f^* + \delta$ satisfies $\min_{x^* \in X^*} \|x - x^*\| < \epsilon$.
(b) Fix $\epsilon > 0$. By part (a), there exists some $\delta_\epsilon > 0$ such that every vector $x \in X$ with $f(x) \le f^* + \delta_\epsilon$ satisfies $\min_{x^* \in X^*} \|x - x^*\| \le \epsilon$. Since $f(x_k) \to f^*$, there exists some $K_\epsilon$ such that

$$f(x_k) \le f^* + \delta_\epsilon, \qquad \forall\ k \ge K_\epsilon.$$

By part (a), this implies that $\{x_k\}_{k \ge K_\epsilon} \subset X^* + \epsilon B$. Since $X^*$ is nonempty and compact (cf. Prop. 2.3.2), it follows that every such sequence $\{x_k\}$ is bounded.
Let $\bar x$ be a limit point of the sequence $\{x_k\} \subset X$ satisfying $f(x_k) \to f^*$. By lower semicontinuity of the function $f$, we get that

$$f(\bar x) \le \liminf_{k \to \infty} f(x_k) = f^*.$$

Because $\{x_k\} \subset X$ and $X$ is closed, we have $\bar x \in X$, which in view of the preceding relation implies that $f(\bar x) = f^*$, i.e., $\bar x \in X^*$.

2.6 (Directions Along Which a Function is Flat)

(a) Let $d \in R_X \cap R_f$. If $d \notin F_f$, then we must have $\lim_{\alpha \to \infty} f(x + \alpha d) = -\infty$, for some $x \in \mathrm{dom}(f) \cap X$, implying that for each $k$, $x + \alpha d \in C_k$ for all sufficiently large $\alpha$. Thus $d$ is a direction of recession of $C_k$ and hence also of $f$, i.e., $d \in R_f$. Therefore, we must have $R_X \cap F_f = R_X \cap R_f$, so using the hypothesis, we have $R_X \cap R_f \subset L_f$. From Prop. 1.5.6 [condition (3)], it follows that $X \cap \big(\cap_{k=0}^\infty C_k\big)$ is nonempty.
(b) Let $d \in R_X \cap R_f$. If $d \notin F_f$, then we must have $\lim_{\alpha \to \infty} f(x + \alpha d) = -\infty$, for some $x \in \mathrm{dom}(f) \cap X$. Since $d \in R_X$, we have $x + \alpha d \in X$ for all $x \in X$ and $\alpha \ge 0$. It follows that $\inf_{x \in X} f(x) = -\infty$, a contradiction. Therefore, we must have $R_X \cap F_f = R_X \cap R_f$, so using the hypothesis, we have $R_X \cap R_f \subset L_f$. From Prop. 2.3.3 [condition (2)], it follows that there exists at least one optimal solution.
(c) Let $X = \Re$ and $f(x) = x$. Then

$$F_f = L_f = \{y \mid y = 0\},$$

so the condition $R_X \cap F_f \subset L_f$ is satisfied. However, we have $\inf_{x \in X} f(x) = -\infty$ and $f$ does not attain a minimum over $X$. Note that Prop. 2.3.3 [under condition (2)] does not apply here, because the relation $R_X \cap R_f \subset L_f$ is not satisfied.

2.7 (Bidirectionally Flat Functions)

(a) As a first step, we will show that either $\cap_{k=1}^\infty C_k \ne \emptyset$ or else

there exists $\bar j \in \{1, \ldots, r\}$ and $y \in \cap_{j=0}^r R_{g_j}$ with $y \notin F_{g_{\bar j}}$.

Let $x$ be a vector in $C_0$, and for each $k \ge 1$, let $x_k$ be the projection of $x$ on $C_k$. If $\{x_k\}$ is bounded, then since the $g_j$ are closed, any limit point $\tilde x$ of $\{x_k\}$ satisfies

$$g_j(\tilde x) \le \liminf_{k \to \infty} g_j(x_k) \le 0,$$

so $\tilde x \in \cap_{k=1}^\infty C_k$, and $\cap_{k=1}^\infty C_k \ne \emptyset$. If $\{x_k\}$ is unbounded, let $y$ be a limit point of the sequence $\big\{(x_k - x)/\|x_k - x\| \mid x_k \ne x\big\}$, and without loss of generality, assume that

$$\frac{x_k - x}{\|x_k - x\|} \to y.$$

We claim that

$$y \in \cap_{j=0}^r R_{g_j}.$$

Indeed, if for some $j$, we have $y \notin R_{g_j}$, then there exists $\alpha > 0$ such that $g_j(x + \alpha y) > w_0$. Let

$$z_k = x + \alpha\, \frac{x_k - x}{\|x_k - x\|},$$

and note that for sufficiently large $k$, $z_k$ lies in the line segment connecting $x$ and $x_k$, so that $g_j(z_k) \le w_0$. On the other hand, we have $z_k \to x + \alpha y$, so using the closedness of $g_j$, we must have

$$g_j(x + \alpha y) \le \liminf_{k \to \infty} g_j(z_k) \le w_0,$$

which contradicts the choice of $\alpha$ to satisfy $g_j(x + \alpha y) > w_0$.
If $y \in \cap_{j=0}^r F_{g_j}$, since all the $g_j$ are bidirectionally flat, we have $y \in \cap_{j=0}^r L_{g_j}$. If the vectors $x$ and $x_k$, $k \ge 1$, all lie in the same line [which must be the line $\{x + \alpha y \mid \alpha \in \Re\}$], we would have $g_j(x) = g_j(x_k)$ for all $k$ and $j$. Then it follows that $x$ and $x_k$ all belong to $\cap_{k=1}^\infty C_k$. Otherwise, there must be some $x_k$, with $k$ large enough, and such that, by the Projection Theorem, the vector $x_k - \alpha y$ makes an angle greater than $\pi/2$ with $x_k - x$. Since the $g_j$ are constant on the line $\{x_k - \alpha y \mid \alpha \in \Re\}$, all vectors on the line belong to $C_k$, which contradicts the fact that $x_k$ is the projection of $x$ on $C_k$.
Finally, if $y \in R_{g_0}$ but $y \notin F_{g_0}$, we have $g_0(x + \alpha y) \to -\infty$ as $\alpha \to \infty$, so that $\cap_{k=1}^\infty C_k \ne \emptyset$. This completes the proof that

$$\cap_{k=1}^\infty C_k = \emptyset \ \Rightarrow\ \text{there exists } \bar j \in \{1, \ldots, r\} \text{ and } y \in \cap_{j=0}^r R_{g_j} \text{ with } y \notin F_{g_{\bar j}}. \qquad (1)$$

We now use induction on $r$. For $r = 0$, the preceding proof shows that $\cap_{k=1}^\infty C_k \ne \emptyset$. Assume that $\cap_{k=1}^\infty C_k \ne \emptyset$ for all cases where $r < \bar r$. We will show that $\cap_{k=1}^\infty C_k \ne \emptyset$ for $r = \bar r$. Assume the contrary. Then, by Eq. (1), there exists $\bar j \in \{1, \ldots, \bar r\}$ and $y \in \cap_{j=0}^{\bar r} R_{g_j}$ with $y \notin F_{g_{\bar j}}$. Let us consider the sets

$$\bar C_k = \big\{x \mid g_0(x) \le w_k,\ g_j(x) \le 0,\ j = 1, \ldots, \bar r,\ j \ne \bar j\big\}.$$

Since these sets are nonempty, by the induction hypothesis, $\cap_{k=1}^\infty \bar C_k \ne \emptyset$. For any $\tilde x \in \cap_{k=1}^\infty \bar C_k$, the vector $\tilde x + \alpha y$ belongs to $\cap_{k=1}^\infty \bar C_k$ for all $\alpha > 0$, since $y \in \cap_{j=0}^{\bar r} R_{g_j}$. Since $g_0(\tilde x) \le 0$, we have $\tilde x \in \mathrm{dom}(g_{\bar j})$, by the hypothesis regarding the domains of the $g_j$. Since $y \in \cap_{j=0}^{\bar r} R_{g_j}$ with $y \notin F_{g_{\bar j}}$, it follows that $g_{\bar j}(\tilde x + \alpha y) \to -\infty$ as $\alpha \to \infty$. Hence, for sufficiently large $\alpha$, we have $g_{\bar j}(\tilde x + \alpha y) \le 0$, so $\tilde x + \alpha y$ belongs to $\cap_{k=1}^\infty C_k$.

Note: To see that the assumption

$$\big\{x \mid g_0(x) \le 0\big\} \subset \cap_{j=1}^r \mathrm{dom}(g_j)$$

is essential for the result to hold, consider an example in $\Re^2$. Let

$$g_0(x_1, x_2) = x_1, \qquad g_1(x_1, x_2) = \phi(x_1) - x_2,$$

where the function $\phi : \Re \to (-\infty, \infty]$ is convex, closed, and coercive with $\mathrm{dom}(\phi) = (0, 1)$ [for example, $\phi(t) = -\ln t - \ln(1 - t)$ for $0 < t < 1$]. Then it can be verified that $C_k \ne \emptyset$ for every $k$ and sequence $\{w_k\} \subset (0, 1)$ with $w_k \downarrow 0$ [take $x_1 \downarrow 0$ and $x_2 \ge \phi(x_1)$]. On the other hand, we have $\cap_{k=0}^\infty C_k = \emptyset$. The difficulty here is that the set $\{x \mid g_0(x) \le 0\}$, which is equal to

$$\{x \mid x_1 \le 0,\ x_2 \in \Re\},$$

is not contained in $\mathrm{dom}(g_1)$, which is equal to

$$\{x \mid 0 < x_1 < 1,\ x_2 \in \Re\}$$

(in fact the two sets are disjoint).


(b) We will use part (a) and the line of proof of Prop. 1.5.8(c). In particular, let $\{y_k\}$ be a sequence in $AC$ converging to some $y \in \Re^n$. We will show that $y \in AC$. We let

$$g_0(x) = \|Ax - y\|^2, \qquad w_k = \|y_k - y\|^2,$$

and

$$C_k = \big\{x \mid g_0(x) \le w_k,\ g_j(x) \le 0,\ j = 1, \ldots, r\big\}.$$

The functions involved in the definition of $C_k$ are bidirectionally flat, and each $C_k$ is nonempty by construction. By applying part (a), we see that the intersection $\cap_{k=0}^\infty C_k$ is nonempty. For any $x$ in this intersection, we have $Ax = y$ (since $y_k \to y$), showing that $y \in AC$.
(c) We will use part (a) and the line of proof of Prop. 2.3.3 [condition (3)]. Denote

$$f^* = \inf_{x \in C} f(x),$$

and assume without loss of generality that $f^* = 0$ [otherwise, we replace $f(x)$ by $f(x) - f^*$]. We choose a scalar sequence $\{w_k\}$ such that $w_k \downarrow f^*$, and we consider the (nonempty) sets

$$C_k = \big\{x \in \Re^n \mid f(x) \le w_k,\ g_j(x) \le 0,\ j = 1, \ldots, r\big\}.$$

By part (a), it follows that $\cap_{k=0}^\infty C_k$, the set of minimizers of $f$ over $C$, is nonempty.

(d) Use the line of proof of Prop. 2.3.9.

2.8 (Minimization of Quasiconvex Functions)

(a) Let $x^*$ be a local minimum of $f$ over $X$ and assume, to arrive at a contradiction, that there exists a vector $\bar x \in X$ such that $f(\bar x) < f(x^*)$. Then, $\bar x$ and $x^*$ belong to the set $X \cap V_{\gamma^*}$, where $\gamma^* = f(x^*)$. Since this set is convex, the line segment connecting $x^*$ and $\bar x$ belongs to the set, implying that

$$f\big(\alpha \bar x + (1 - \alpha)x^*\big) \le f(x^*), \qquad \forall\ \alpha \in [0, 1]. \qquad (1)$$

For each integer $k \ge 1$, there exists an $\alpha_k \in (0, 1/k]$ such that

$$f\big(\alpha_k \bar x + (1 - \alpha_k)x^*\big) < f(x^*); \qquad (2)$$

otherwise, in view of Eq. (1), we would have that $f(x)$ is constant for $x$ on the line segment connecting $x^*$ and $(1/k)\bar x + \big(1 - (1/k)\big)x^*$. Equation (2) contradicts the local optimality of $x^*$.
(b) We consider the level sets

$$V_\gamma = \big\{x \mid f(x) \le \gamma\big\}$$

for $\gamma > f^*$. Let $\{\gamma^k\}$ be a scalar sequence such that $\gamma^k \downarrow f^*$. Using the fact that for two nonempty closed convex sets $C$ and $D$ such that $C \subset D$, we have $R_C \subset R_D$, it can be seen that

$$R_f = \cap_{\gamma \in \Gamma} R_\gamma = \cap_{k=1}^\infty R_{\gamma^k}.$$

Similarly, $L_f$ can be written as

$$L_f = \cap_{\gamma \in \Gamma} L_\gamma = \cap_{k=1}^\infty L_{\gamma^k}.$$

Under each of the conditions (1)-(4), we show that the set of minima of $f$ over $X$, which is given by

$$X^* = \cap_{k=1}^\infty (X \cap V_{\gamma^k}),$$

is nonempty.
Let condition (1) hold. The sets $X \cap V_{\gamma^k}$ are nonempty, closed, convex, and nested. Furthermore, for each $k$, their recession cone is given by $R_X \cap R_{\gamma^k}$ and their lineality space is given by $L_X \cap L_{\gamma^k}$. We have that

$$\cap_{k=1}^\infty (R_X \cap R_{\gamma^k}) = R_X \cap R_f,$$

and

$$\cap_{k=1}^\infty (L_X \cap L_{\gamma^k}) = L_X \cap L_f,$$

while by assumption $R_X \cap R_f = L_X \cap L_f$. Then it follows by Prop. 1.5.5 that $X^*$ is nonempty.
Let condition (2) hold. The sets $V_{\gamma^k}$ are nested and the intersection $X \cap V_{\gamma^k}$ is nonempty for all $k$. We also have by assumption that $R_X \cap R_f \subset L_f$ and $X$ is specified by linear inequalities. By Prop. 1.5.6, it follows that $X^*$ is nonempty.
Let condition (3) hold. The sets $V_{\gamma^k}$ have the form

$$V_{\gamma^k} = \{x \in \Re^n \mid x'Qx + c'x + b(\gamma^k) \le 0\}.$$

In view of the assumption that $b(\gamma)$ is bounded for $\gamma \in (f^*, \bar\gamma]$, we can consider a subsequence $\{b(\gamma^k)\}_K$ that converges to a scalar. Furthermore, $X$ is specified by convex quadratic inequalities, and the intersection $X \cap V_{\gamma^k}$ is nonempty for all $k \in K$. By Prop. 1.5.7, it follows that $X^*$ is nonempty.
Similarly, under condition (4), the result follows using Exercise 2.7(a).

2.9 (Partial Minimization)

(a) The epigraph of $f$ is given by

$$\mathrm{epi}(f) = \big\{(x, w) \mid f(x) \le w\big\}.$$

If $(x, w) \in E_f$, then it follows that $(x, w) \in \mathrm{epi}(f)$, showing that $E_f \subset \mathrm{epi}(f)$. Next, assume that $(x, w) \in \mathrm{epi}(f)$, i.e., $f(x) \le w$. Let $\{w_k\}$ be a sequence with $w_k > w$ for all $k$, and $w_k \to w$. Then we have $f(x) < w_k$ for all $k$, implying that $(x, w_k) \in E_f$ for all $k$, and that the limit $(x, w) \in \mathrm{cl}(E_f)$. Thus we have the desired relations,

$$E_f \subset \mathrm{epi}(f) \subset \mathrm{cl}(E_f). \qquad (2.5)$$

We next show that $f$ is convex if and only if $E_f$ is convex. By definition, $f$ is convex if and only if $\mathrm{epi}(f)$ is convex. Assume that $\mathrm{epi}(f)$ is convex. Suppose, to arrive at a contradiction, that $E_f$ is not convex. This implies the existence of vectors $(x_1, w_1) \in E_f$, $(x_2, w_2) \in E_f$, and a scalar $\alpha \in (0, 1)$ such that $\alpha(x_1, w_1) + (1 - \alpha)(x_2, w_2) \notin E_f$, from which we get

$$\begin{aligned}
f\big(\alpha x_1 + (1 - \alpha)x_2\big) &\ge \alpha w_1 + (1 - \alpha)w_2 \\
&> \alpha f(x_1) + (1 - \alpha)f(x_2),
\end{aligned} \qquad (2.6)$$

where the second inequality follows from the fact that $(x_1, w_1)$ and $(x_2, w_2)$ belong to $E_f$. We have $\big(x_1, f(x_1)\big) \in \mathrm{epi}(f)$ and $\big(x_2, f(x_2)\big) \in \mathrm{epi}(f)$. In view of the convexity assumption of $\mathrm{epi}(f)$, this yields $\alpha\big(x_1, f(x_1)\big) + (1 - \alpha)\big(x_2, f(x_2)\big) \in \mathrm{epi}(f)$ and therefore,

$$f\big(\alpha x_1 + (1 - \alpha)x_2\big) \le \alpha f(x_1) + (1 - \alpha)f(x_2).$$

Combined with Eq. (2.6), the preceding relation yields a contradiction, thus showing that $E_f$ is convex.
Next assume that $E_f$ is convex. We show that $\mathrm{epi}(f)$ is convex. Let $(x_1, w_1)$ and $(x_2, w_2)$ be arbitrary vectors in $\mathrm{epi}(f)$. Consider sequences of vectors $(x_1, w_1^k)$ and $(x_2, w_2^k)$ such that $w_1^k > w_1$, $w_2^k > w_2$, and $w_1^k \to w_1$, $w_2^k \to w_2$. It follows that for each $k$, $(x_1, w_1^k)$ and $(x_2, w_2^k)$ belong to $E_f$. Since $E_f$ is convex by assumption, this implies that for each $\alpha \in [0, 1]$ and all $k$, the vector $\big(\alpha x_1 + (1 - \alpha)x_2,\ \alpha w_1^k + (1 - \alpha)w_2^k\big) \in E_f$, i.e., we have for each $k$

$$f\big(\alpha x_1 + (1 - \alpha)x_2\big) < \alpha w_1^k + (1 - \alpha)w_2^k.$$

Taking the limit in the preceding relation, we get

$$f\big(\alpha x_1 + (1 - \alpha)x_2\big) \le \alpha w_1 + (1 - \alpha)w_2,$$

showing that $\big(\alpha x_1 + (1 - \alpha)x_2,\ \alpha w_1 + (1 - \alpha)w_2\big) \in \mathrm{epi}(f)$. Hence $\mathrm{epi}(f)$ is convex.
 
(b) Let $T$ denote the projection of the set $\big\{(x, z, w) \mid F(x, z) < w\big\}$ on the space of $(x, w)$. We show that $E_f = T$. Let $(x, w) \in E_f$. By definition, we have

$$\inf_{z \in \Re^m} F(x, z) < w,$$

which implies that there exists some $z \in \Re^m$ such that

$$F(x, z) < w,$$

showing that $(x, z, w)$ belongs to the set $\big\{(x, z, w) \mid F(x, z) < w\big\}$, and $(x, w) \in T$. Conversely, let $(x, w) \in T$. This implies that there exists some $z$ such that $F(x, z) < w$, from which we get

$$f(x) = \inf_{z \in \Re^m} F(x, z) < w,$$

showing that $(x, w) \in E_f$, and completing the proof.

(c) Let $F$ be a convex function. Using part (a), the convexity of $F$ implies that the set $\big\{(x, z, w) \mid F(x, z) < w\big\}$ is convex. Since the projection mapping is linear, and hence preserves convexity, we have, using part (b), that the set $E_f$ is convex, which implies by part (a) that $f$ is convex.

2.10 (Partial Minimization of Nonconvex Functions)

(a) For each $u \in \Re^m$, let $f_u(x) = f(x, u)$. There are two cases; either $f_u \equiv \infty$, or $f_u$ is lower semicontinuous with bounded level sets. The first case, which corresponds to $p(u) = \infty$, cannot hold for every $u$, since $f$ is not identically equal to $\infty$. Therefore, $\mathrm{dom}(p) \ne \emptyset$, and for each $u \in \mathrm{dom}(p)$, we have by Weierstrass' Theorem that $p(u) = \inf_x f_u(x)$ is finite [i.e., $p(u) > -\infty$ for all $u \in \mathrm{dom}(p)$] and the set $P(u) = \arg\min_x f_u(x)$ is nonempty and compact.
We now show that $p$ is lower semicontinuous. By assumption, for all $\bar u \in \Re^m$ and for all $\alpha \in \Re$, there exists a neighborhood $N$ of $\bar u$ such that the set $\big\{(x, u) \mid f(x, u) \le \alpha\big\} \cap (\Re^n \times N)$ is bounded in $\Re^n \times \Re^m$. We can choose a smaller closed set $\bar N$ containing $\bar u$ such that the set $\big\{(x, u) \mid f(x, u) \le \alpha\big\} \cap (\Re^n \times \bar N)$ is closed (since $f$ is lower semicontinuous) and bounded. In view of the assumption that $f_u$ is lower semicontinuous with bounded level sets, it follows using Weierstrass' Theorem that for any scalar $\alpha$,

$p(u) \le \alpha$ if and only if there exists $x$ such that $f(x, u) \le \alpha$.

Hence, the set $\big\{u \mid p(u) \le \alpha\big\} \cap \bar N$ is the image of the set $\big\{(x, u) \mid f(x, u) \le \alpha\big\} \cap (\Re^n \times \bar N)$ under the continuous mapping $(x, u) \mapsto u$. Since the image of a compact set under a continuous mapping is compact [cf. Prop. 1.1.9(d)], we see that $\big\{u \mid p(u) \le \alpha\big\} \cap \bar N$ is closed.
Thus, each $\bar u \in \Re^m$ is contained in a closed set whose intersection with $\big\{u \mid p(u) \le \alpha\big\}$ is closed, so that the set $\big\{u \mid p(u) \le \alpha\big\}$ itself is closed for all scalars $\alpha$. It follows from Prop. 1.2.2 that $p$ is lower semicontinuous.
(b) Consider the following example

$$f(x, u) = \begin{cases} \min\big\{|x - 1/u|,\ 1 + |x|\big\} & \text{if } u \ne 0,\ x \in \Re, \\ 1 + |x| & \text{if } u = 0,\ x \in \Re, \end{cases}$$

where $x$ and $u$ are scalars. This function is continuous in $(x, u)$ and the level sets are bounded in $x$ for each $u$, but not locally uniformly in $u$, i.e., there does not exist a neighborhood $N$ of $u = 0$ such that the set $\big\{(x, u) \mid u \in N,\ f(x, u) \le \alpha\big\}$ is bounded for some $\alpha > 0$.
For this function, we have

$$p(u) = \begin{cases} 0 & \text{if } u \ne 0, \\ 1 & \text{if } u = 0. \end{cases}$$

Hence, the function $p$ is not lower semicontinuous at 0.
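The jump of p at u = 0 in this example can be reproduced directly. The Python sketch below is illustrative; the candidate-point shortcut in `p` is an assumption justified in the comments, not a general minimization routine.

```python
def f(x, u):
    """The example function of part (b); x and u are scalars."""
    if u != 0.0:
        return min(abs(x - 1.0 / u), 1.0 + abs(x))
    return 1.0 + abs(x)

def p(u):
    """p(u) = inf over x of f(x, u).

    Since f >= 0 everywhere, for u != 0 the infimum is attained at x = 1/u
    (where f = 0); for u = 0 it is attained at x = 0 (where f = 1).
    Evaluating f at these candidates therefore gives the exact infimum.
    """
    candidates = [0.0] if u == 0.0 else [0.0, 1.0 / u]
    return min(f(x, u) for x in candidates)

# p vanishes along u_k -> 0 with u_k != 0, yet p(0) = 1.
values_near_zero = [p(10.0 ** -k) for k in range(1, 8)]
value_at_zero = p(0.0)
```

The limit inferior of p along the sequence is 0, strictly below p(0) = 1, so lower semicontinuity fails at 0. The minimizing points 1/u escape to infinity as u tends to 0, which is the failure of locally uniform level boundedness.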


(c) Let $\{u_k\}$ be a sequence such that $u_k \to u^*$ for some $u^* \in \mathrm{dom}(p)$, and also $p(u_k) \to p(u^*)$. Let $\alpha$ be any scalar such that $p(u^*) < \alpha$. Since $p(u_k) \to p(u^*)$, we obtain

$$f(x_k, u_k) = p(u_k) < \alpha, \qquad (2.7)$$

for all $k$ sufficiently large, where we use the fact that $x_k \in P(u_k)$ for all $k$. We take $N$ to be a closed neighborhood of $u^*$ as in part (a). Since $u_k \to u^*$, using Eq. (2.7), we see that for all $k$ sufficiently large, the pair $(x_k, u_k)$ lies in the compact set

$$\big\{(x, u) \mid f(x, u) \le \alpha\big\} \cap (\Re^n \times N).$$

Hence, the sequence $\{x_k\}$ is bounded, and therefore has a limit point, call it $x^*$. It follows that

$$(x^*, u^*) \in \big\{(x, u) \mid f(x, u) \le \alpha\big\}.$$

Since this is true for arbitrary $\alpha > p(u^*)$, we see that $f(x^*, u^*) \le p(u^*)$, which, by the definition of $p(u)$, implies that $x^* \in P(u^*)$.
(d) By definition, we have $p(u) \le f(x^*, u)$ for all $u$ and $p(u^*) = f(x^*, u^*)$. Since $f(x^*, \cdot)$ is continuous at $u^*$, we have for any sequence $\{u_k\}$ converging to $u^*$

$$\limsup_{k \to \infty} p(u_k) \le \limsup_{k \to \infty} f(x^*, u_k) = f(x^*, u^*) = p(u^*),$$

thereby implying that $p$ is upper semicontinuous at $u^*$. Since $p$ is also lower semicontinuous at $u^*$ by part (a), we conclude that $p$ is continuous at $u^*$.

2.11 (Projection on a Nonconvex Set)

We define the function $f$ by

$$f(w, x) = \begin{cases} \|w - x\| & \text{if } w \in C, \\ \infty & \text{if } w \notin C. \end{cases}$$

With this identification, we get

$$d_C(x) = \inf_w f(w, x), \qquad P_C(x) = \arg\min_w f(w, x).$$

We now show that $f(w, x)$ satisfies the assumptions of Exercise 2.10, so that we can apply the results of this exercise to this problem.
Since the set $C$ is closed by assumption, it follows that $f(w, x)$ is lower semicontinuous. Moreover, by Weierstrass' Theorem, we see that $f(w, x) > -\infty$ for all $x$ and $w$. Since the set $C$ is nonempty by assumption, we also have that $\mathrm{dom}(f)$ is nonempty. It is also straightforward to see that the function $\|\cdot\|$, and therefore the function $f$, satisfies the locally uniform level-boundedness assumption of Exercise 2.10.
(a) Since the function $\|\cdot\|$ is lower semicontinuous and the set $C$ is closed, it follows from Weierstrass' Theorem that for all $x^* \in \Re^n$, the infimum in $\inf_w f(w, x^*)$ is attained at some $w^*$, i.e., $P(x^*)$ is nonempty. Hence, we see that for all $x^* \in \Re^n$, there exists some $w^* \in P(x^*)$ such that $f(w^*, \cdot)$ is continuous at $x^*$, which follows by continuity of the function $\|\cdot\|$. Hence, the function $f(w, x)$ satisfies the sufficiency condition given in Exercise 2.10(d), and it follows that $d_C(x)$ depends continuously on $x$.
(b) This part follows from part (a) of Exercise 2.10.
(c) This part follows from part (c) of Exercise 2.10.
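A two-point set already shows the behavior described in this exercise: the projection can be set-valued, while the distance function stays continuous. The Python sketch below uses the assumed set C = {-1, 1} on the real line; the function names `d_C` and `P_C` mirror the notation of the exercise but are otherwise illustrative.

```python
def d_C(x, C=(-1.0, 1.0)):
    """Distance from x to the closed, nonconvex set C."""
    return min(abs(w - x) for w in C)

def P_C(x, C=(-1.0, 1.0)):
    """Set of points of C nearest to x (the projection may be multivalued)."""
    d = d_C(x, C)
    return {w for w in C if abs(w - x) == d}

both_sides = P_C(0.0)     # x = 0 is equidistant from -1 and 1
right_only = P_C(0.25)    # slightly to the right, only 1 is nearest
gap = abs(d_C(0.1) - d_C(-0.1))  # the distance changes continuously across x = 0
```

At x = 0 the projection contains both points of C, and it collapses to a single point as soon as x moves off zero, whereas the distance is the same on either side.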

2.12 (Convergence of Penalty Methods [RoW98])

(a) We set s = 1/c and consider the function g : ℜ^n × ℜ → (−∞, ∞] defined
by

    g(x, s) = f(x) + θ̃(F(x), s),

with the function θ̃ given by

    θ̃(u, s) = θ(u, 1/s)   if s ∈ (0, s̄],
              δD(u)        if s = 0,
              ∞            if s < 0 or s > s̄,

where

    δD(u) = 0   if u ∈ D,
            ∞   if u ∉ D.

We identify the original problem with that of minimizing g(x, 0) in x ∈ ℜ^n, and
the approximate problem for parameter s ∈ (0, s̄] with that of minimizing g(x, s)
in x ∈ ℜ^n, where s = 1/c. With the notation introduced in Exercise 2.10, the
optimal value of the original problem is given by p(0) and the optimal value of
the approximate problem is given by p(s). Hence, we have

    p(s) = inf_{x∈ℜ^n} g(x, s).

We now show that, for the function g(x, s), the assumptions of Exercise 2.10 are
satisfied.
We have that g(x, s) > −∞ for all (x, s), since by assumption f (x) > −∞
for all x and θ(u, s) > −∞ for all (u, s). The function θ̃ is such that θ̃(u, s) < ∞
at least for one vector (u, s), since the set D is nonempty. Therefore, it follows
that g(x, s) < ∞ for at least one vector (x, s), unless g ≡ ∞, in which case all
the results of this exercise follow trivially.
We now show that the function θ̃ is lower semicontinuous. This is easily
seen at all points where s ≠ 0 in view of the assumption that the function θ is
lower semicontinuous on ℜ^m × (0, ∞). We next consider points where s = 0. We
claim that for any α ∈ ℜ,

    {u | θ̃(u, 0) ≤ α} = ∩_{s∈(0,s̄]} {u | θ̃(u, s) ≤ α}.     (2.8)

To see this, assume that θ̃(u, 0) ≤ α. Since θ̃(u, s) ↑ θ̃(u, 0) as s ↓ 0, we have
θ̃(u, s) ≤ α for all s ∈ (0, s̄]. Conversely, assume that θ̃(u, s) ≤ α for all s ∈ (0, s̄].
By definition of θ̃, this implies that

    θ(u, 1/s) ≤ α,     ∀ s ∈ (0, s̄].

Taking the limit as s → 0 in the preceding relation, we get

    lim_{s→0} θ(u, 1/s) = δD(u) = θ̃(u, 0) ≤ α,

thus proving the relation in (2.8). Note that for all α ∈ ℜ and all s ∈ (0, s̄], the
set

    {u | θ̃(u, s) ≤ α} = {u | θ(u, 1/s) ≤ α}

is closed by the lower semicontinuity of the function θ. Hence, the relation
in Eq. (2.8) implies that the set {u | θ̃(u, 0) ≤ α} is closed for all α ∈ ℜ,
thus showing that the function θ̃ is lower semicontinuous everywhere (cf. Prop.
1.2.2). Together with the assumptions that f is lower semicontinuous and F is
continuous, it follows that g is lower semicontinuous.
Finally, we show that g satisfies the locally uniform level-boundedness
property given in Exercise 2.10, i.e., for all s∗ ∈ ℜ and for all α ∈ ℜ, there
exists a neighborhood N of s∗ such that the set {(x, s) | s ∈ N, g(x, s) ≤ α} is
bounded. By assumption, we have that the level sets of the function g(x, s̄) =
f(x) + θ(F(x), 1/s̄) are bounded. The definition of θ̃, together with the fact that
θ̃(u, s) is monotonically increasing as s ↓ 0, implies that g is indeed level-bounded
in x locally uniformly in s.
Therefore, all the assumptions of Exercise 2.10 are satisfied and we get
that the function p is lower semicontinuous in s. Since θ̃(u, s) is monotonically
increasing as s ↓ 0, it follows that p is monotonically nondecreasing as s ↓ 0. This
implies that
p(s) → p(0), as s ↓ 0.
Defining sk = 1/ck for all k, where {ck } is the given sequence of parameter values,
we get
p(sk ) → p(0),
thus proving that the optimal value of the approximate problem converges to the
optimal value of the original problem.
(b) We have by assumption that sk → 0 with xk ∈ P1/sk . It follows from part (a)
that p(sk ) → p(0), so Exercise 2.10(b) implies that the sequence {xk } is bounded
and all its limit points are optimal solutions of the original problem.
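The convergence p(sk) → p(0) can be observed on a one-dimensional instance. The sketch below assumes f(x) = x², F(x) = 1 − x, D = (−∞, 0], and the quadratic penalty θ(u, c) = c·max(0, u)²; these choices and the helper names are made here purely for illustration. For this instance the penalized minimum works out to c/(1 + c), attained at x = c/(1 + c), which increases to the constrained optimal value p(0) = 1 as c → ∞.

```python
def penalized(x, c):
    """f(x) = x^2 plus a quadratic penalty for violating the constraint x >= 1."""
    return x * x + c * max(0.0, 1.0 - x) ** 2


def minimize_convex(fun, lo, hi, iters=200):
    """Ternary search; valid here because fun is convex on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) <= fun(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, fun(x)


results = []
for c in (1.0, 10.0, 100.0, 1000.0):
    x_c, p_c = minimize_convex(lambda x: penalized(x, c), -2.0, 2.0)
    results.append((c, x_c, p_c))
    print(c, x_c, p_c)
```

The printed optimal values increase with c and stay below 1, and the minimizers approach the constrained solution x = 1, in line with parts (a) and (b).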

2.13 (Approximation by Envelope Functions [RoW98])

(a) We fix a c0 ∈ (0, cf) and consider the function

    h(w, x, c) = f(w) + (1/(2c))‖w − x‖²   if c ∈ (0, c0],
                 f(x)                        if c = 0 and w = x,
                 ∞                           otherwise.

We consider the problem of minimizing h(w, x, c) in w. With this identification
and using the notation introduced in Exercise 2.10, for some c ∈ (0, c0), we obtain

    ec f(x) = p(x, c) = inf_w h(w, x, c),

and

    Pc f(x) = P(x, c) = arg min_w h(w, x, c).

We now show that, for the function h(w, x, c), the assumptions given in Exercise
2.10 are satisfied.
We have that h(w, x, c) > −∞ for all (w, x, c), since by assumption f(x) >
−∞ for all x ∈ ℜ^n. Furthermore, h(w, x, c) < ∞ for at least one vector (w, x, c),
since by assumption f(x) < ∞ for at least one vector x ∈ ℜ^n.
We next show that the function h is lower semicontinuous in (w, x, c). This
is easily seen at all points where c ∈ (0, c0] in view of the assumption that f is
lower semicontinuous and the function ‖·‖² is lower semicontinuous. We now
consider points where c = 0 and w ≠ x. Let {(wk, xk, ck)} be a sequence that
converges to some (w, x, 0) with w ≠ x. We can assume without loss of generality
that wk ≠ xk for all k. Note that for each k, we have

    h(wk, xk, ck) = ∞                               if ck = 0,
                    f(wk) + (1/(2ck))‖wk − xk‖²    if ck > 0.

Taking the limit as k → ∞, we have

    lim_{k→∞} h(wk, xk, ck) = ∞ ≥ h(w, x, 0),

since w ≠ x by assumption. This shows that h is lower semicontinuous at points
where c = 0 and w ≠ x. We finally consider points where c = 0 and w = x. At
these points, we have h(w, x, c) = f(x). Let {(wk, xk, ck)} be a sequence that
converges to some (w, x, 0) with w = x. Considering all possibilities, we see that
the limit inferior of the sequence {h(wk, xk, ck)} cannot be less than f(x), thus
showing that h is also lower semicontinuous at points where c = 0 and w = x.
Finally, we show that h satisfies the locally uniform level-boundedness
property given in Exercise 2.10, i.e., for all (x∗, c∗) and for all α ∈ ℜ, there exists a
neighborhood N of (x∗, c∗) such that the set {(w, x, c) | (x, c) ∈ N, h(w, x, c) ≤ α}
is bounded. Assume, to arrive at a contradiction, that there exists a sequence
{(wk, xk, ck)} such that

    h(wk, xk, ck) ≤ α < ∞,     (2.9)

for some scalar α, with (xk, ck) → (x∗, c∗), and ‖wk‖ → ∞. Then, for sufficiently
large k, we have wk ≠ xk, which in view of Eq. (2.9) and the definition of the
function h, implies that ck ∈ (0, c0] and

    f(wk) + (1/(2ck))‖wk − xk‖² ≤ α,

for all sufficiently large k. In particular, since ck ≤ c0, it follows from the
preceding relation that

    f(wk) + (1/(2c0))‖wk − xk‖² ≤ α.     (2.10)

The choice of c0 ensures, through the definition of cf, the existence of some
c1 > c0, some x̄ ∈ ℜ^n, and some scalar β such that

    f(w) ≥ −(1/(2c1))‖w − x̄‖² + β,     ∀ w.

Together with Eq. (2.10), this implies that

    −(1/(2c1))‖wk − x̄‖² + (1/(2c0))‖wk − xk‖² ≤ α − β,

for all sufficiently large k. Dividing this relation by ‖wk‖² and taking the limit
as k → ∞, we get

    −1/(2c1) + 1/(2c0) ≤ 0,

from which it follows that c1 ≤ c0. This is a contradiction by our choice of c1.
Hence, the function h(w, x, c) satisfies all the assumptions of Exercise 2.10.
By assumption, we have that f(x̄) < ∞ for some x̄ ∈ ℜ^n. Using the
definition of ec f(x), this implies that

    ec f(x) = inf_w { f(w) + (1/(2c))‖w − x‖² }
            ≤ f(x̄) + (1/(2c))‖x̄ − x‖² < ∞,     ∀ x ∈ ℜ^n,

where the inequality is obtained by setting w = x̄ in f(w) + (1/(2c))‖w − x‖².
Together with Exercise 2.10(a), this shows that for every c ∈ (0, c0) and all
x ∈ ℜ^n, the function ec f(x) is finite, and the set Pc f(x) is nonempty and compact.
Furthermore, it can be seen from the definition of h(w, x, c), that for all c ∈ (0, c0),
h(w, x, c) is continuous in (x, c). Therefore, it follows from Exercise 2.10(d) that
for all c ∈ (0, c0), ec f(x) is continuous in (x, c). In particular, since ec f(x) is a
monotonically nonincreasing function of c, it follows that

    ec f(x) = p(x, c) ↑ p(x, 0) = f(x)   as c ↓ 0,     ∀ x.

This concludes the proof for part (a).


(b) Directly follows from Exercise 2.10(c).
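The role of the threshold cf can be seen on a nonconvex example. The sketch below assumes f(w) = −w² on ℜ, a choice made here for illustration and not taken from the exercise. For this f, the function being minimized has w² coefficient 1/(2c) − 1, so the envelope is finite for c < 1/2 and equals −∞ for c > 1/2, i.e., cf = 1/2. For 0 < c < 1/2, setting the derivative to zero gives w = x/(1 − 2c) and ec f(x) = −x²/(1 − 2c); this closed form was derived for this example only. The code compares it with a brute-force grid minimization and shows the values increasing to f(x) as c ↓ 0.

```python
def envelope_grid(x, c, n=50_000, step=1e-3):
    """Approximate e_c f(x) = inf_w { -w^2 + (w - x)^2 / (2c) } on a grid of w values.

    The grid covers [-n*step, n*step]; it is only meaningful for c < 1/2,
    where the objective is strictly convex in w.
    """
    return min(
        -(i * step) ** 2 + (i * step - x) ** 2 / (2.0 * c)
        for i in range(-n, n + 1)
    )


def envelope_closed_form(x, c):
    """Closed form for f(w) = -w^2, valid for 0 < c < 1/2 (derived for this example)."""
    return -x * x / (1.0 - 2.0 * c)


x = 1.0
for c in (0.4, 0.25, 0.1, 0.05):
    print(c, envelope_grid(x, c), envelope_closed_form(x, c))
```

As c decreases, the printed values rise toward f(1) = −1, which is the monotone convergence stated at the end of part (a).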

2.14 (Envelopes and Proximal Mappings under Convexity [RoW98])

We consider the function gc defined by

    gc(x, w) = f(w) + (1/(2c))‖w − x‖².

In view of the assumption that f is lower semicontinuous, it follows that gc(x, w)
is lower semicontinuous. We also have that gc(x, w) > −∞ for all (x, w) and
gc(x, w) < ∞ for at least one vector (x, w). Moreover, since f is convex by
assumption, gc(x, w) is convex in (x, w), even strictly convex in w.
Note that by definition, we have

    ec f(x) = inf_w gc(x, w),

    Pc f(x) = arg min_w gc(x, w).

(a) In order to show that cf is ∞, it suffices to show that ec f(0) > −∞ for all
c > 0. This will follow from Weierstrass’ Theorem, once we show the boundedness
of the level sets of gc(0, ·). Assume the contrary, i.e., there exists some α ∈ ℜ
and a sequence {xk} such that ‖xk‖ → ∞ and

    gc(0, xk) = f(xk) + (1/(2c))‖xk‖² ≤ α,     ∀ k.     (2.11)

Assume without loss of generality that ‖xk‖ > 1 for all k. We fix an x0 with
f(x0) < ∞. We define

    τk = 1/‖xk‖ ∈ (0, 1),

and

    x̄k = (1 − τk)x0 + τk xk.

Since ‖xk‖ → ∞, it follows that τk → 0. Using Eq. (2.11) and the convexity of
f, we obtain

    f(x̄k) ≤ (1 − τk)f(x0) + τk f(xk)
          ≤ (1 − τk)f(x0) + τk α − (τk/(2c))‖xk‖².

Since τk‖xk‖² = ‖xk‖ → ∞, taking the limit as k → ∞ in the above relation, we
see that f(x̄k) → −∞. It follows from the definitions of τk and x̄k that

    ‖x̄k‖ ≤ (1 − τk)‖x0‖ + τk‖xk‖
         ≤ ‖x0‖ + 1.

Therefore, the sequence {x̄k} is bounded. Since f is lower semicontinuous,
Weierstrass’ Theorem implies that f is bounded from below on every bounded subset
of ℜ^n. Since the sequence {x̄k} is bounded, this implies that the sequence {f(x̄k)}
is bounded from below, which contradicts the fact that f(x̄k) → −∞. This proves
that the level sets of the function gc(0, ·) are bounded. Therefore, using
Weierstrass’ Theorem, we have that the infimum in ec f(0) = inf_w gc(0, w) is attained,
and ec f(0) > −∞ for every c > 0. This shows that the supremum cf of all c > 0,
such that ec f(x) > −∞ for some x ∈ ℜ^n, is ∞.
(b) Since the value cf is equal to ∞ by part (a), it follows that ec f and Pc f have
all the properties given in Exercise 2.13 for all c > 0: The set Pc f (x) is nonempty
and compact, and the function ec f (x) is finite for all x, and is continuous in (x, c).
Consider a sequence {wk } with wk ∈ Pck f (xk ) for some sequences xk → x∗ and
ck → c∗ > 0. Then, it follows from Exercise 2.13(b) that the sequence {wk }
is bounded and all its limit points belong to the set Pc∗ f (x∗ ). Since gc (x, w) is
strictly convex in w, it follows from Prop. 2.1.2 that the proximal mapping Pc f is
single-valued. Hence, we have that Pc f (x) → Pc∗ f (x∗ ) whenever (x, c) → (x∗ , c∗ )
with c∗ > 0.
(c) The envelope function ec f is convex by Exercise 2.15 [since gc (x, w) is convex
in (x, w)], and continuous by Exercise 2.13. We now prove that it is differentiable.
Consider any point x, and let w = Pc f(x). We will show that ec f is differentiable
at x with

    ∇ec f(x) = (x − w)/c.

Equivalently, we will show that the function h given by

    h(u) = ec f(x + u) − ec f(x) − ((x − w)/c)′u     (2.12)

is differentiable at 0 with ∇h(0) = 0. Since w = Pc f(x), we have

    ec f(x) = f(w) + (1/(2c))‖w − x‖²,

whereas

    ec f(x + u) ≤ f(w) + (1/(2c))‖w − (x + u)‖²,     ∀ u,

so that

    h(u) ≤ (1/(2c))‖w − (x + u)‖² − (1/(2c))‖w − x‖² − (1/c)(x − w)′u = (1/(2c))‖u‖²,     ∀ u.     (2.13)

Since ec f is convex, it follows from Eq. (2.12) that h is convex, and therefore,

    0 = h(0) = h((1/2)u + (1/2)(−u)) ≤ (1/2)h(u) + (1/2)h(−u),

which implies that h(u) ≥ −h(−u). From Eq. (2.13), we obtain

    −h(−u) ≥ −(1/(2c))‖−u‖² = −(1/(2c))‖u‖²,     ∀ u,

which together with the preceding relation yields

    h(u) ≥ −(1/(2c))‖u‖²,     ∀ u.

Thus, we have

    |h(u)| ≤ (1/(2c))‖u‖²,     ∀ u,

which implies that h is differentiable at 0 with ∇h(0) = 0. From the formula
for ∇ec f(·) and the continuity of Pc f(·), it also follows that ec f is continuously
differentiable.
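The gradient formula ∇ec f(x) = (x − Pc f(x))/c can be checked numerically on a standard convex example. The sketch below assumes f(w) = |w| on ℜ, a choice made here for illustration; its proximal mapping is the soft-thresholding operation and its envelope is the Huber function. The function names are hypothetical.

```python
def prox_abs(x, c):
    """P_c f(x) for f(w) = |w|: soft-thresholding with threshold c."""
    if x > c:
        return x - c
    if x < -c:
        return x + c
    return 0.0


def envelope_abs(x, c):
    """e_c f(x) = f(w) + (w - x)^2 / (2c), evaluated at the minimizer w = P_c f(x)."""
    w = prox_abs(x, c)
    return abs(w) + (w - x) ** 2 / (2.0 * c)


def grad_formula(x, c):
    """Gradient of the envelope according to part (c): (x - P_c f(x)) / c."""
    return (x - prox_abs(x, c)) / c


def grad_fd(x, c, h=1e-6):
    """Central finite-difference approximation of the envelope's derivative."""
    return (envelope_abs(x + h, c) - envelope_abs(x - h, c)) / (2.0 * h)


c = 0.5
for x in (-2.0, -0.3, 0.0, 0.3, 2.0):
    print(x, grad_formula(x, c), grad_fd(x, c))
```

Although f itself is not differentiable at 0, the two columns agree at every test point, including x = 0, which illustrates the smoothing effect of the envelope.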

2.15

(a) In view of the assumption that int(C1) and C2 are disjoint and convex [cf.
Prop. 1.2.1(d)], it follows from the Separating Hyperplane Theorem that there
exists a vector a ≠ 0 such that

    a′x1 ≤ a′x2,     ∀ x1 ∈ int(C1), ∀ x2 ∈ C2.

Let b = inf_{x2∈C2} a′x2. Then, from the preceding relation, we have

    a′x ≤ b,     ∀ x ∈ int(C1).     (2.14)

We claim that the closed halfspace {x | a′x ≥ b}, which contains C2, does not
intersect int(C1).
Assume, to arrive at a contradiction, that there exists some x1 ∈ int(C1)
such that a′x1 ≥ b. Since x1 ∈ int(C1), we have that there exists some ε > 0
such that x1 + εa ∈ int(C1), and

    a′(x1 + εa) ≥ b + ε‖a‖² > b.

This contradicts Eq. (2.14). Hence, we have

    int(C1) ⊂ {x | a′x < b}.

(b) Consider the sets

    C1 = {(x1, x2) | x1 = 0},
    C2 = {(x1, x2) | x1 > 0, x2 x1 ≥ 1}.

These two sets are convex and C2 is disjoint from ri(C1), which is equal to C1. The
only separating hyperplane is the x2 axis, which corresponds to having a = (1, 0)
in the notation of part (a). For this example, there does not exist a closed halfspace
that contains C2 but is disjoint from ri(C1).

2.16

If there exists a hyperplane H with the properties stated, the condition M ∩


ri(C) = Ø clearly holds. Conversely, if M ∩ ri(C) = Ø, then M and C can be
properly separated by Prop. 2.4.5. This hyperplane can be chosen to contain
M since M is affine. If this hyperplane contains a point in ri(C), then it must
contain all of C by Prop. 1.4.2. This contradicts the proper separation property,
thus showing that ri(C) is contained in one of the open halfspaces.

2.17 (Strong Separation)

(a) We first show that (i) implies (ii). Suppose that C1 and C2 can be separated
strongly. By definition, this implies that for some nonzero vector a ∈ ℜ^n, b ∈ ℜ,
and ε > 0, we have

    C1 + εB ⊂ {x | a′x > b},
    C2 + εB ⊂ {x | a′x < b},

where B denotes the closed unit ball. Since a ≠ 0, we also have

    inf{a′y | y ∈ B} < 0,     sup{a′y | y ∈ B} > 0.

Therefore, it follows from the preceding relations that

    b ≤ inf{a′x + εa′y | x ∈ C1, y ∈ B} < inf{a′x | x ∈ C1},

    b ≥ sup{a′x + εa′y | x ∈ C2, y ∈ B} > sup{a′x | x ∈ C2}.

Thus, there exists a vector a ∈ ℜ^n such that

    inf_{x∈C1} a′x > sup_{x∈C2} a′x,

proving (ii).
Next, we show that (ii) implies (iii). Suppose that (ii) holds, i.e., there
exists some vector a ∈ ℜ^n such that

    inf_{x∈C1} a′x > sup_{x∈C2} a′x.     (2.15)

Using the Schwarz inequality, we see that

    0 < inf_{x∈C1} a′x − sup_{x∈C2} a′x
      = inf_{x1∈C1, x2∈C2} a′(x1 − x2)
      ≤ inf_{x1∈C1, x2∈C2} ‖a‖ ‖x1 − x2‖.

It follows that

    inf_{x1∈C1, x2∈C2} ‖x1 − x2‖ > 0,

thus proving (iii).


Finally, we show that (iii) implies (i). If (iii) holds, we have for some ε > 0,

    inf_{x1∈C1, x2∈C2} ‖x1 − x2‖ > 2ε > 0.

From this we obtain for all x1 ∈ C1, all x2 ∈ C2, and for all y1, y2 with ‖y1‖ ≤ ε,
‖y2‖ ≤ ε,

    ‖(x1 + y1) − (x2 + y2)‖ ≥ ‖x1 − x2‖ − ‖y1‖ − ‖y2‖ > 0,

which implies that 0 ∉ (C1 + εB) − (C2 + εB). Therefore, the convex sets C1 + εB
and C2 + εB are disjoint. By the Separating Hyperplane Theorem, we see that
C1 + εB and C2 + εB can be separated, i.e., C1 + εB and C2 + εB lie in opposite
closed halfspaces associated with the hyperplane that separates them. Then,
the sets C1 + (ε/2)B and C2 + (ε/2)B lie in opposite open halfspaces, which by
definition implies that C1 and C2 can be separated strongly.
(b) Since C1 and C2 are disjoint, we have 0 ∉ (C1 − C2). Any one of conditions
(2)-(5) of Prop. 2.4.3 implies condition (1) of that proposition (see the discussion
in the proof of Prop. 2.4.3), which states that the set C1 − C2 is closed, i.e.,

    cl(C1 − C2) = C1 − C2.

Hence, we have 0 ∉ cl(C1 − C2), which implies that

    inf_{x1∈C1, x2∈C2} ‖x1 − x2‖ > 0.

From part (a), it follows that there exists a hyperplane separating C1 and C2
strongly.
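The chain (ii) ⇒ (iii) can be made concrete with two disjoint closed balls, for which every quantity has a closed form. The sketch below is an illustration under assumptions made here: C1 and C2 are Euclidean balls with centers p1, p2 and radii r1, r2, and the separating vector is taken to be a = p1 − p2. For a ball with center p and radius r, the infimum and supremum of a′x are a′p − r‖a‖ and a′p + r‖a‖, respectively.

```python
import math


def gap_and_distance(p1, r1, p2, r2):
    """Return (gap, dist, norm_a) for two disjoint balls, with a = p1 - p2.

    gap  = inf over C1 of a'x minus sup over C2 of a'x   (condition (ii))
    dist = inf of ||x1 - x2|| over x1 in C1, x2 in C2    (condition (iii))
    """
    a = [u - v for u, v in zip(p1, p2)]
    norm_a = math.sqrt(sum(t * t for t in a))
    inf_c1 = sum(ai * pi for ai, pi in zip(a, p1)) - r1 * norm_a
    sup_c2 = sum(ai * pi for ai, pi in zip(a, p2)) + r2 * norm_a
    dist = norm_a - r1 - r2  # distance between the balls when they are disjoint
    return inf_c1 - sup_c2, dist, norm_a


gap, dist, norm_a = gap_and_distance((3.0, 0.0), 1.0, (-3.0, 0.0), 1.0)
print(gap, dist, norm_a * dist)
```

For this choice of a the Schwarz bound gap ≤ ‖a‖ · dist holds with equality; for a general vector a it is only an inequality.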

2.18

(a) If C1 and C2 can be separated properly, we have from the Proper Separation
Theorem that there exists a vector a ≠ 0 such that

    inf_{x∈C1} a′x ≥ sup_{x∈C2} a′x,     (2.16)

    sup_{x∈C1} a′x > inf_{x∈C2} a′x.     (2.17)

Let

    b = sup_{x∈C2} a′x,     (2.18)

and consider the hyperplane

    H = {x | a′x = b}.

Since C2 is a cone, we have

    λa′x = a′(λx) ≤ b < ∞,     ∀ x ∈ C2, ∀ λ > 0.

This relation implies that a′x ≤ 0 for all x ∈ C2, since otherwise it is possible to
choose λ large enough and violate the above inequality for some x ∈ C2. Hence,
it follows from Eq. (2.18) that b ≤ 0. Also, by letting λ → 0 in the preceding
relation, we see that b ≥ 0. Therefore, we have that b = 0 and the hyperplane H
contains the origin.
(b) If C1 and C2 can be separated strictly, we have by definition that there exists
a vector a ≠ 0 and a scalar β such that

    a′x2 < β < a′x1,     ∀ x1 ∈ C1, ∀ x2 ∈ C2.     (2.19)

We choose b to be

    b = sup_{x∈C2} a′x,     (2.20)

and consider the closed halfspace

    K = {x | a′x ≤ b},

which contains C2. By Eq. (2.19), we have

    b ≤ β < a′x,     ∀ x ∈ C1,

so the closed halfspace K does not intersect C1.
Since C2 is a cone, an argument similar to the one in part (a) shows that
b = 0, and hence the hyperplane associated with the closed halfspace K passes
through the origin, and has the desired properties.

2.19 (Separation Properties of Cones)

(a) C is contained in the intersection of the homogeneous closed halfspaces that
contain C, so we focus on proving the reverse inclusion. Let x ∉ C. Since C is
closed and convex by assumption, by using the Strict Separation Theorem, we
see that the sets C and {x} can be separated strictly. From Exercise 2.18(b), this
implies that there exists a hyperplane that passes through the origin such that
one of the associated closed halfspaces contains C, but does not contain x. Hence,
if x ∉ C, then x cannot belong to the intersection of the homogeneous closed
halfspaces containing C, proving that C contains that intersection.
(b) A homogeneous halfspace is in particular a closed convex cone containing
the origin, and such a cone includes X if and only if it includes cl(cone(X)).
Hence, the intersection of all closed homogeneous halfspaces containing X and
the intersection of all closed homogeneous halfspaces containing cl(cone(X))
coincide. From what has been proved in part (a), the latter intersection is equal to
cl(cone(X)).

2.20 (Convex System Alternatives)

(a) Consider the set

    C = {u | there exists an x ∈ X such that gj(x) ≤ uj, j = 1, . . . , r},

which may be viewed as the projection of the set

    M = {(x, u) | x ∈ X, gj(x) ≤ uj, j = 1, . . . , r}

on the space of u. Let us denote this linear transformation by A. It can be seen
that

    RM ∩ N(A) = {(y, 0) | y ∈ RX ∩ Rg1 ∩ · · · ∩ Rgr},

where RM denotes the recession cone of set M. Similarly, we have

    LM ∩ N(A) = {(y, 0) | y ∈ LX ∩ Lg1 ∩ · · · ∩ Lgr},

where LM denotes the lineality space of set M. Under conditions (1), (2), and
(3), it follows from Prop. 1.5.8 that the set AM = C is closed. Similarly, under
condition (4), it follows from Exercise 2.7(b) that the set AM = C is closed.
By assumption, there is no vector x ∈ X such that

    g1(x) ≤ 0, . . . , gr(x) ≤ 0.

This implies that the origin does not belong to C. Therefore, by the Strict
Separation Theorem, it follows that there exists a hyperplane that strictly separates
the origin and the set C, i.e., there exists a vector µ such that

    0 < ε ≤ µ′u,     ∀ u ∈ C.     (2.21)

This equation implies that µ ≥ 0, since for each u ∈ C, we have that (u1, . . . , uj +
γ, . . . , ur) ∈ C for all j and γ > 0. Since (g1(x), . . . , gr(x)) ∈ C for all x ∈ X,
Eq. (2.21) yields

    µ1 g1(x) + · · · + µr gr(x) ≥ ε,     ∀ x ∈ X.     (2.22)

(b) Assume that there is no vector x ∈ X such that

g1 (x) ≤ 0, . . . , gr (x) ≤ 0.

This implies by part (a) that there exists a positive scalar ε, and a vector µ ∈ ℜ^r
with µ ≥ 0, such that

    µ1 g1(x) + · · · + µr gr(x) ≥ ε,     ∀ x ∈ X.

Let x be an arbitrary vector in X and let j(x) be the smallest index that satisfies
j(x) = arg max_{j=1,...,r} gj(x). Then Eq. (2.22) implies that for all x ∈ X

    ε ≤ Σ_{j=1}^r µj gj(x) ≤ Σ_{j=1}^r µj g_{j(x)}(x) = g_{j(x)}(x) Σ_{j=1}^r µj.

Hence, for all x ∈ X, there exists some j(x) such that

    g_{j(x)}(x) ≥ ε / (Σ_{j=1}^r µj) > 0.

This contradicts the statement that for every ε > 0, there exists a vector x ∈ X
such that

    g1(x) < ε, . . . , gr(x) < ε,

and concludes the proof.

2.21

C is contained in the intersection of the closed halfspaces that contain C and
correspond to nonvertical hyperplanes, so we focus on proving the reverse
inclusion. Let x ∉ C. Since by assumption C does not contain any vertical lines, we
can apply Prop. 2.5.1, and we see that there exists a closed halfspace that
corresponds to a nonvertical hyperplane, containing C but not containing x. Hence,
if x ∉ C, then x cannot belong to the intersection of the closed halfspaces
containing C and corresponding to nonvertical hyperplanes, proving that C contains
that intersection.

2.22 (Min Common/Max Crossing Duality)

(a) Let us denote the optimal values of the min common point problem and the
max crossing point problem corresponding to conv(M) by w∗_{conv(M)} and q∗_{conv(M)},
respectively. In view of the assumption that M is compact, it follows from Prop.
1.3.2 that the set conv(M) is compact. Therefore, by Weierstrass’ Theorem,
w∗_{conv(M)}, defined by

    w∗_{conv(M)} = inf_{(0,w)∈conv(M)} w,

is finite. It can also be seen that the set

    \overline{conv(M)} = {(u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ conv(M)}

is convex. Indeed, we consider vectors (u, w) ∈ \overline{conv(M)} and (ũ, w̃) ∈ \overline{conv(M)},
and we show that their convex combinations lie in \overline{conv(M)}. The definition of
\overline{conv(M)} implies that there exist some wM and w̃M such that

    wM ≤ w,     (u, wM) ∈ conv(M),

    w̃M ≤ w̃,     (ũ, w̃M) ∈ conv(M).

For any α ∈ [0, 1], we multiply these relations with α and (1 − α), respectively,
and add. We obtain

    αwM + (1 − α)w̃M ≤ αw + (1 − α)w̃.

In view of the convexity of conv(M), we have α(u, wM) + (1 − α)(ũ, w̃M) ∈
conv(M), so these equations imply that the convex combination of (u, w) and
(ũ, w̃) belongs to \overline{conv(M)}. This proves the convexity of \overline{conv(M)}.
Using the compactness of conv(M), it can be shown that for every sequence
{(uk, wk)} ⊂ conv(M) with uk → 0, there holds w∗_{conv(M)} ≤ lim inf_{k→∞} wk.
Let {(uk, wk)} ⊂ conv(M) be a sequence with uk → 0. Since conv(M) is
compact, the sequence {(uk, wk)} has a subsequence that converges to some
(0, w̄) ∈ conv(M). Assume without loss of generality that {(uk, wk)} converges
to (0, w̄). Since (0, w̄) ∈ conv(M), we get

    w∗_{conv(M)} ≤ w̄ = lim inf_{k→∞} wk.

Therefore, by Min Common/Max Crossing Theorem I, we have

    w∗_{conv(M)} = q∗_{conv(M)}.     (2.23)

Let q∗ be the optimal value of the max crossing point problem corresponding to
M, i.e.,

    q∗ = sup_{µ∈ℜ^n} q(µ),

where for all µ ∈ ℜ^n

    q(µ) = inf_{(u,w)∈M} {w + µ′u}.

We will show that q∗ = w∗_{conv(M)}. For every µ ∈ ℜ^n, q(µ) can be expressed as
q(µ) = inf_{x∈M} c′x, where c = (µ, 1) and x = (u, w). From Exercise 2.23, it follows
that minimization of a linear function over a set is equivalent to minimization
over its convex hull. In particular, we have

    q(µ) = inf_{x∈M} c′x = inf_{x∈conv(M)} c′x,

from which using Eq. (2.23), we get

    q∗ = q∗_{conv(M)} = w∗_{conv(M)},

proving the desired claim.
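The key step above, that a linear function has the same infimum over a set and over its convex hull, can be checked on a small finite set. The sketch below is an illustration under assumptions made here: M is a hypothetical four-point subset of ℜ², and conv(M) is only sampled through convex combinations of pairs of points, so the check is numerical evidence rather than a proof.

```python
# Hypothetical finite (hence compact) set M of points (u, w) in the plane.
M = [(-1.0, 2.0), (0.5, -1.0), (2.0, 0.5), (0.0, 3.0)]


def q(mu, points):
    """Crossing function q(mu) = min of w + mu*u over the given points."""
    return min(w + mu * u for u, w in points)


def hull_samples(points, steps=10):
    """Convex combinations of every pair of points: a sampled subset of conv(M).

    The original points are included (they correspond to t = 1).
    """
    out = []
    for u1, w1 in points:
        for u2, w2 in points:
            for k in range(steps + 1):
                t = k / steps
                out.append((t * u1 + (1 - t) * u2, t * w1 + (1 - t) * w2))
    return out


for mu in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(mu, q(mu, M), q(mu, hull_samples(M)))
```

For each µ the two printed values agree up to rounding: adding convex combinations never lowers the minimum of the linear function w + µu.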


(b) The function f is convex by the result of Exercise 2.23. Furthermore, for all
x ∈ dom(f), the infimum in the definition of f(x) is attained. The reason is that,
for x ∈ dom(f), the set {w | (x, w) ∈ M} is closed and bounded below, since
M is closed and does not contain a halfline of the form {(x, w + α) | α ≤ 0}.
Thus, we have f(x) > −∞ for all x ∈ dom(f), while dom(f) is nonempty, since
M is nonempty in the min common/max crossing framework. It follows that f
is proper. Furthermore, by its definition, M is the epigraph of f. Finally, to
show that f is closed, we argue by contradiction. If f is not closed, there exists
a vector x and a sequence {xk} that converges to x and is such that

    f(x) > lim_{k→∞} f(xk).

We claim that lim_{k→∞} f(xk) is finite, i.e., that lim_{k→∞} f(xk) > −∞. Indeed, by
Prop. 2.5.1, the epigraph of f is contained in the upper halfspace of a nonvertical
hyperplane of ℜ^{n+1}. Since {xk} converges to x, the limit of {f(xk)} cannot be
equal to −∞. Thus the sequence {(xk, f(xk))}, which belongs to M, converges to
(x, lim_{k→∞} f(xk)). Therefore, since M is closed, (x, lim_{k→∞} f(xk)) ∈ M. By the
definition of f, this implies that f(x) ≤ lim_{k→∞} f(xk), contradicting our earlier
hypothesis.
(c) We prove this result by showing that all the assumptions of Min Common/Max
Crossing Theorem I are satisfied. By assumption, w∗ < ∞ and
the set M is convex. Therefore, we only need to show that for every sequence
{(uk, wk)} ⊂ M with uk → 0, there holds w∗ ≤ lim inf_{k→∞} wk.
Consider a sequence {(uk, wk)} ⊂ M with uk → 0. If lim inf_{k→∞} wk = ∞,
then we are done, so assume that lim inf_{k→∞} wk = w̃ for some scalar w̃. Since
M ⊂ M̄ and M̄ is closed by assumption, it follows that (0, w̃) ∈ M̄. By the
definition of the set M̄, this implies that there exists some w̄ with w̄ ≤ w̃ and
(0, w̄) ∈ M. Hence we have

    w∗ = inf_{(0,w)∈M} w ≤ w̄ ≤ w̃ = lim inf_{k→∞} wk,

proving the desired result, and thus showing that q∗ = w∗.

2.23 (An Example of Lagrangian Duality)

(a) The corresponding max crossing problem is given by

    q∗ = sup_{µ∈ℜ^m} q(µ),

where q(µ) is given by

    q(µ) = inf_{(u,w)∈M} {w + µ′u} = inf_{x∈X} { f(x) + Σ_{i=1}^m µi (ei′x − di) }.

(b) Consider the set

    M = {(u1, . . . , um, w) | ∃ x ∈ X such that ei′x − di = ui, ∀ i, f(x) ≤ w}.

We show that M is convex. To this end, we consider vectors (u, w) ∈ M and
(ũ, w̃) ∈ M, and we show that their convex combinations lie in M. The definition
of M implies that for some x ∈ X and x̃ ∈ X, we have

    f(x) ≤ w,     ei′x − di = ui,     i = 1, . . . , m,
    f(x̃) ≤ w̃,     ei′x̃ − di = ũi,     i = 1, . . . , m.

For any α ∈ [0, 1], we multiply these relations with α and 1 − α, respectively, and
add. By using the convexity of f, we obtain

    f(αx + (1 − α)x̃) ≤ αf(x) + (1 − α)f(x̃) ≤ αw + (1 − α)w̃,

    ei′(αx + (1 − α)x̃) − di = αui + (1 − α)ũi,     i = 1, . . . , m.

In view of the convexity of X, we have αx + (1 − α)x̃ ∈ X, so these equations imply
that the convex combination of (u, w) and (ũ, w̃) belongs to M, thus proving that
M is convex.
(c) We prove this result by showing that all the assumptions of Min Common/Max
Crossing Theorem I are satisfied. By assumption, w∗ is finite. It
follows from part (b) that the set M is convex. Therefore, we only need to
show that for every sequence {(uk, wk)} ⊂ M with uk → 0, there holds w∗ ≤
lim inf_{k→∞} wk.
Consider a sequence {(uk, wk)} ⊂ M with uk → 0. Since X is compact
and f is convex by assumption (which implies that f is continuous by Prop.
1.4.6), it follows from Prop. 1.1.9(c) that the set M is compact. Hence, the sequence
{(uk, wk)} has a subsequence that converges to some (0, w̄) ∈ M. Assume without
loss of generality that {(uk, wk)} converges to (0, w̄). Since (0, w̄) ∈ M, we
get

    w∗ = inf_{(0,w)∈M} w ≤ w̄ = lim inf_{k→∞} wk,

proving the desired result, and thus showing that q∗ = w∗.


(d) We prove this result by showing that all the assumptions of Min Common/Max
Crossing Theorem II are satisfied. By assumption, w∗ is finite. It
follows from part (b) that the set M is convex. Therefore, we only need to show
that the set

    D = {(e1′x − d1, . . . , em′x − dm) | x ∈ X}

contains the origin in its relative interior. The set D can equivalently be written
as

    D = E · X − d,

where E is a matrix whose rows are the vectors ei′, i = 1, . . . , m, and d is a
vector with entries equal to di, i = 1, . . . , m. By Prop. 1.4.4 and Prop. 1.4.5(b),
it follows that

    ri(D) = E · ri(X) − d.

Hence the assumption that there exists a vector x ∈ ri(X) such that Ex − d = 0
implies that 0 belongs to the relative interior of D, thus showing that q∗ = w∗
and that the max crossing problem has an optimal solution.
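The equality q∗ = w∗ of part (d) can be observed on a scalar instance. The sketch below assumes f(x) = x², X = [−2, 2], and the single constraint x − 1 = 0 (so e1 = 1, d1 = 1); these data are chosen here for illustration only. The feasible point x = 1 lies in ri(X), the primal value is w∗ = f(1) = 1, and the inner minimization defining q(µ) has the closed-form solution x = −µ/2 clipped to X.

```python
def q(mu):
    """Dual function q(mu) = min over x in [-2, 2] of x^2 + mu*(x - 1)."""
    x = min(2.0, max(-2.0, -mu / 2.0))  # minimizer of the convex quadratic on X
    return x * x + mu * (x - 1.0)


def maximize_concave(fun, lo, hi, iters=200):
    """Ternary search; valid here because q is concave in mu."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) >= fun(m2):
            hi = m2
        else:
            lo = m1
    mu = 0.5 * (lo + hi)
    return mu, fun(mu)


w_star = 1.0  # the only feasible point is x = 1, and f(1) = 1
mu_star, q_star = maximize_concave(q, -10.0, 10.0)
print(mu_star, q_star, w_star)
```

The search returns a multiplier close to −2 with q(µ) close to 1, so there is no duality gap and the max crossing problem attains its maximum for this instance.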

2.24 (Saddle Points in Two Dimensions)

We consider a function φ of two real variables x and z taking values in compact
intervals X and Z, respectively. We assume that for each z ∈ Z, the function
φ(·, z) is minimized over X at a unique point denoted x̂(z), and for each x ∈ X,
the function φ(x, ·) is maximized over Z at a unique point denoted ẑ(x),

    x̂(z) = arg min_{x∈X} φ(x, z),     ẑ(x) = arg max_{z∈Z} φ(x, z).

Consider the composite function f : X → X given by

    f(x) = x̂(ẑ(x)),

which is a continuous function in view of the assumption that the functions x̂(z)
and ẑ(x) are continuous over Z and X, respectively. Assume that the compact
interval X is given by [a, b]. We now show that the function f has a fixed point,
i.e., there exists some x∗ ∈ [a, b] such that

    f(x∗) = x∗.

Define the function g : X → ℜ by

    g(x) = f(x) − x.

Since f maps [a, b] into itself, we have f(a) ≥ a and f(b) ≤ b. Assume that
f(a) > a and f(b) < b, since otherwise a or b is a fixed point and we are done.
We have

    g(a) = f(a) − a > 0,

    g(b) = f(b) − b < 0.

Since g is a continuous function, the preceding relations imply that there exists
some x∗ ∈ (a, b) such that g(x∗) = 0, i.e., f(x∗) = x∗. Hence, we have

    x̂(ẑ(x∗)) = x∗.

Denoting ẑ(x∗) by z∗, we get

    x∗ = x̂(z∗),     z∗ = ẑ(x∗).     (2.24)

By definition, a pair (x̄, z̄) is a saddle point if and only if

    max_{z∈Z} φ(x̄, z) = φ(x̄, z̄) = min_{x∈X} φ(x, z̄),

or equivalently, if x̄ = x̂(z̄) and z̄ = ẑ(x̄). Therefore, from Eq. (2.24), we see that
(x∗, z∗) is a saddle point of φ.
We now consider the function φ(x, z) = x² + z² over X = [0, 1] and Z =
[0, 1]. For each z ∈ [0, 1], the function φ(·, z) is minimized over [0, 1] at a unique
point x̂(z) = 0, and for each x ∈ [0, 1], the function φ(x, ·) is maximized over
[0, 1] at a unique point ẑ(x) = 1. These two curves intersect at (x∗, z∗) = (0, 1),
which is the unique saddle point of φ.
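The fixed-point argument is constructive: the sign change of g(x) = f(x) − x can be located by bisection. The sketch below applies it to a hypothetical function chosen here for illustration, φ(x, z) = x² − z² + xz − x + z on X = Z = [0, 1], which is strictly convex in x and strictly concave in z. For this φ the best responses are x̂(z) = (1 − z)/2 and ẑ(x) = (1 + x)/2, clipped to [0, 1], and the saddle point works out to (0.2, 0.6).

```python
def phi(x, z):
    """Illustrative convex-concave function (an assumption, not from the exercise)."""
    return x * x - z * z + x * z - x + z


def clip01(t):
    return min(1.0, max(0.0, t))


def x_hat(z):
    """Unique minimizer of phi(., z) over [0, 1]."""
    return clip01((1.0 - z) / 2.0)


def z_hat(x):
    """Unique maximizer of phi(x, .) over [0, 1]."""
    return clip01((1.0 + x) / 2.0)


def fixed_point(a=0.0, b=1.0, tol=1e-12):
    """Bisection on g(x) = x_hat(z_hat(x)) - x, which satisfies g(a) >= 0 >= g(b)."""
    g = lambda x: x_hat(z_hat(x)) - x
    if g(a) == 0.0:
        return a
    if g(b) == 0.0:
        return b
    lo, hi = a, b  # invariant: g(lo) > 0 and g(hi) <= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_star = fixed_point()
z_star = z_hat(x_star)
print(x_star, z_star, phi(x_star, z_star))
```

The returned pair satisfies x∗ = x̂(z∗) and z∗ = ẑ(x∗) up to the bisection tolerance, which is the characterization of a saddle point given by Eq. (2.24).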

2.25 (Saddle Points of Quadratic Functions)

Let X and Z be closed and convex sets. Then, for each z ∈ Z, the function
tz : ℜ^n → (−∞, ∞] defined by

    tz(x) = φ(x, z)   if x ∈ X,
            ∞         otherwise,

is closed and convex in view of the assumption that Q is a positive semidefinite
symmetric matrix. Similarly, for each x ∈ X, the function rx : ℜ^m → (−∞, ∞]
defined by

    rx(z) = −φ(x, z)   if z ∈ Z,
            ∞          otherwise,

is closed and convex in view of the assumption that R is a positive semidefinite
symmetric matrix. Hence, Assumption 2.6.1 is satisfied. Let also Assumptions
2.6.2 and 2.6.3 hold, i.e.,

    inf_{x∈X} sup_{z∈Z} φ(x, z) < ∞,

and

    −∞ < sup_{z∈Z} inf_{x∈X} φ(x, z).

By the positive semidefiniteness of Q, it can be seen that, for each z ∈ Z, the
recession cone of the function tz is given by

    Rtz = RX ∩ N(Q) ∩ {y | y′Dz ≤ 0},

where RX is the recession cone of the convex set X and N(Q) is the null space
of the matrix Q. Similarly, for each z ∈ Z, the constancy space of the function
tz is given by

    Ltz = LX ∩ N(Q) ∩ {y | y′Dz = 0},

where LX is the lineality space of the set X. By the positive semidefiniteness of
R, for each x ∈ X, it can be seen that the recession cone of the function rx is
given by

    Rrx = RZ ∩ N(R) ∩ {y | x′Dy ≥ 0},

where RZ is the recession cone of the convex set Z and N(R) is the null space of
the matrix R. Similarly, for each x ∈ X, the constancy space of the function rx
is given by

    Lrx = LZ ∩ N(R) ∩ {y | x′Dy = 0},

where LZ is the lineality space of the set Z.
If

    ∩_{z∈Z} Rtz = {0},   and   ∩_{x∈X} Rrx = {0},     (2.25)

then it follows from the Saddle Point Theorem part (a), that the set of saddle
points of φ is nonempty and compact. [In particular, the condition given in Eq.
(2.25) holds when Q and R are positive definite matrices, or if X and Z are
compact.]
Similarly, if

    ∩_{z∈Z} Rtz = ∩_{z∈Z} Ltz,   and   ∩_{x∈X} Rrx = ∩_{x∈X} Lrx,

then it follows from the Saddle Point Theorem part (b), that the set of saddle
points of φ is nonempty.

Convex Analysis and
Optimization
Chapter 3 Solutions

Dimitri P. Bertsekas

with

Angelia Nedić and Asuman E. Ozdaglar

Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts


http://www.athenasc.com
LAST UPDATE April 3, 2004

CHAPTER 3: SOLUTION MANUAL

3.1 (Cone Decomposition Theorem)

(a) Let x̂ be the projection of x on C, which exists and is unique since C is closed
and convex. By the Projection Theorem (Prop. 2.2.1), we have

    (x − x̂)′(y − x̂) ≤ 0,     ∀ y ∈ C.

Since C is a cone, we have (1/2)x̂ ∈ C and 2x̂ ∈ C, and by taking y = (1/2)x̂
and y = 2x̂ in the preceding relation, it follows that

    (x − x̂)′x̂ = 0.

By combining the preceding two relations, we obtain

    (x − x̂)′y ≤ 0,     ∀ y ∈ C,

implying that x − x̂ ∈ C∗.
Conversely, if x̂ ∈ C, (x − x̂)′x̂ = 0, and x − x̂ ∈ C∗, then it follows that

    (x − x̂)′(y − x̂) ≤ 0,     ∀ y ∈ C,

and by the Projection Theorem, x̂ is the projection of x on C.


(b) Suppose that property (i) holds, i.e., $x_1$ and $x_2$ are the projections of $x$ on $C$ and $C^*$, respectively. Then, by part (a), we have

$$x_1 \in C, \qquad (x - x_1)'x_1 = 0, \qquad x - x_1 \in C^*.$$

Let $y = x - x_1$, so that the preceding relation can equivalently be written as

$$x - y \in C = (C^*)^*, \qquad y'(x - y) = 0, \qquad y \in C^*.$$

By using part (a), we conclude that $y$ is the projection of $x$ on $C^*$. Since by the Projection Theorem, the projection of a vector on a closed convex set is unique, it follows that $y = x_2$. Thus, we have $x = x_1 + x_2$ and in view of the preceding two relations, we also have $x_1 \in C$, $x_2 \in C^*$, and $x_1'x_2 = 0$. Hence, property (ii) holds.
Conversely, suppose that property (ii) holds, i.e., $x = x_1 + x_2$ with $x_1 \in C$, $x_2 \in C^*$, and $x_1'x_2 = 0$. Then, evidently the relations

$$x_1 \in C, \qquad (x - x_1)'x_1 = 0, \qquad x - x_1 \in C^*,$$

$$x_2 \in C^*, \qquad (x - x_2)'x_2 = 0, \qquad x - x_2 \in C$$

are satisfied, so that by part (a), $x_1$ and $x_2$ are the projections of $x$ on $C$ and $C^*$, respectively. Hence, property (i) holds.
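As a small numerical illustration of the decomposition in part (b), the sketch below takes one concrete cone that is not part of the original solution: the nonnegative orthant $C = \{x \mid x \ge 0\}$, whose polar is the nonpositive orthant. For this choice (an assumption made only for the example) both projections are componentwise clipping, and the helper names are hypothetical.

```python
def project_orthant(x):
    """Projection of x on the nonnegative orthant C = {x | x >= 0}."""
    return [max(xi, 0.0) for xi in x]


def project_polar_orthant(x):
    """Projection of x on the polar cone C* = {y | y <= 0}."""
    return [min(xi, 0.0) for xi in x]


def dot(u, v):
    """Inner product u'v."""
    return sum(a * b for a, b in zip(u, v))


x = [3.0, -1.5, 0.0, 2.0, -4.0]
x1 = project_orthant(x)         # candidate projection on C
x2 = project_polar_orthant(x)   # candidate projection on C*

# Property (ii): x = x1 + x2 with x1 in C, x2 in C*, and x1'x2 = 0.
reconstructed = [a + b for a, b in zip(x1, x2)]
```

Here `reconstructed` equals `x` and `dot(x1, x2)` is zero, which is exactly property (ii) for this particular cone.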

3.2
 
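If $a \in C^* + \{x \mid \|x\| \le \gamma/\beta\}$, then

$$a = \hat{a} + \bar{a} \qquad \text{with } \hat{a} \in C^* \text{ and } \|\bar{a}\| \le \gamma/\beta.$$

Since $C$ is a closed convex cone, by the Polar Cone Theorem (Prop. 3.1.1), we have $(C^*)^* = C$, implying that for all $x$ in $C$ with $\|x\| \le \beta$,

$$\hat{a}'x \le 0 \qquad \text{and} \qquad \bar{a}'x \le \|\bar{a}\| \cdot \|x\| \le \gamma.$$

Hence,

$$a'x = (\hat{a} + \bar{a})'x \le \gamma, \qquad \forall\ x \in C \text{ with } \|x\| \le \beta,$$

thus implying that

$$\max_{\|x\| \le \beta,\ x \in C} a'x \le \gamma.$$

Conversely, assume that $a'x \le \gamma$ for all $x \in C$ with $\|x\| \le \beta$. Let $\hat{a}$ and $\bar{a}$ be the projections of $a$ on $C^*$ and $C$, respectively. By the Cone Decomposition Theorem (cf. Exercise 3.1), we have $a = \hat{a} + \bar{a}$ with $\hat{a} \in C^*$, $\bar{a} \in C$, and $\hat{a}'\bar{a} = 0$. If $\bar{a} = 0$, then $a = \hat{a} \in C^*$ and the claim is immediate, so assume that $\bar{a} \ne 0$. Since $a'x \le \gamma$ for all $x \in C$ with $\|x\| \le \beta$ and $\bar{a} \in C$, we obtain

$$a'\frac{\bar{a}}{\|\bar{a}\|}\beta = (\hat{a} + \bar{a})'\frac{\bar{a}}{\|\bar{a}\|}\beta = \|\bar{a}\|\beta \le \gamma,$$

implying that $\|\bar{a}\| \le \gamma/\beta$, and showing that $a \in C^* + \{x \mid \|x\| \le \gamma/\beta\}$.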

3.3

Note that $\mathrm{aff}(C)$ is a subspace of $\Re^n$ because $C$ is a cone in $\Re^n$. We first show that

$$L_{C^*} = \big(\mathrm{aff}(C)\big)^\perp.$$

Let $y \in L_{C^*}$. Then, by the definition of the lineality space (see Chapter 1), both vectors $y$ and $-y$ belong to the recession cone $R_{C^*}$. Since $0 \in C^*$, it follows that $0 + y$ and $0 - y$ belong to $C^*$. Therefore,

$$y'x \le 0, \qquad (-y)'x \le 0, \qquad \forall\ x \in C,$$

implying that

$$y'x = 0, \qquad \forall\ x \in C. \tag{3.1}$$

Let the dimension of the subspace $\mathrm{aff}(C)$ be $m$. By Prop. 1.4.1, there exist vectors $x_0, x_1, \ldots, x_m$ in $\mathrm{ri}(C)$ such that $x_1 - x_0, \ldots, x_m - x_0$ span $\mathrm{aff}(C)$. Thus, for any $z \in \mathrm{aff}(C)$, there exist scalars $\beta_1, \ldots, \beta_m$ such that

$$z = \sum_{i=1}^m \beta_i (x_i - x_0).$$

By using this relation and Eq. (3.1), for any $z \in \mathrm{aff}(C)$, we obtain

$$y'z = \sum_{i=1}^m \beta_i\, y'(x_i - x_0) = 0,$$

implying that $y \in \big(\mathrm{aff}(C)\big)^\perp$. Hence, $L_{C^*} \subset \big(\mathrm{aff}(C)\big)^\perp$.
Conversely, let $y \in \big(\mathrm{aff}(C)\big)^\perp$, so that in particular, we have

$$y'x = 0, \qquad (-y)'x = 0, \qquad \forall\ x \in C.$$

Therefore, $0 + \alpha y \in C^*$ and $0 + \alpha(-y) \in C^*$ for all $\alpha \ge 0$, and since $C^*$ is a closed convex set, by the Recession Cone Theorem(b) [Prop. 1.5.1(b)], it follows that $y$ and $-y$ belong to the recession cone $R_{C^*}$. Hence, $y$ belongs to the lineality space of $C^*$, showing that $\big(\mathrm{aff}(C)\big)^\perp \subset L_{C^*}$ and completing the proof of the equality $L_{C^*} = \big(\mathrm{aff}(C)\big)^\perp$.
By definition, we have $\dim(C) = \dim\big(\mathrm{aff}(C)\big)$ and since $L_{C^*} = \big(\mathrm{aff}(C)\big)^\perp$, we have $\dim(L_{C^*}) = \dim\big(\big(\mathrm{aff}(C)\big)^\perp\big)$. This implies that

$$\dim(C) + \dim(L_{C^*}) = n.$$

By replacing $C$ with $C^*$ in the preceding relation, and by using the Polar Cone Theorem (Prop. 3.1.1), we obtain

$$\dim(C^*) + \dim\big(L_{(C^*)^*}\big) = \dim(C^*) + \dim\big(L_{\mathrm{cl}(\mathrm{conv}(C))}\big) = n.$$

Furthermore, since

$$L_{\mathrm{conv}(C)} \subset L_{\mathrm{cl}(\mathrm{conv}(C))},$$

it follows that

$$\dim(C^*) + \dim\big(L_{\mathrm{conv}(C)}\big) \le \dim(C^*) + \dim\big(L_{\mathrm{cl}(\mathrm{conv}(C))}\big) = n.$$

3.4 (Polar Cone Operations)

(a) It suffices to consider the case where $m = 2$. Let $(y_1, y_2) \in (C_1 \times C_2)^*$. Then, we have $(y_1, y_2)'(x_1, x_2) \le 0$ for all $(x_1, x_2) \in C_1 \times C_2$, or equivalently

$$y_1'x_1 + y_2'x_2 \le 0, \qquad \forall\ x_1 \in C_1,\ \forall\ x_2 \in C_2.$$

Since $C_2$ is a cone, 0 belongs to its closure, so by letting $x_2 \to 0$ in the preceding relation, we obtain $y_1'x_1 \le 0$ for all $x_1 \in C_1$, showing that $y_1 \in C_1^*$. Similarly, we obtain $y_2 \in C_2^*$, and therefore $(y_1, y_2) \in C_1^* \times C_2^*$, implying that $(C_1 \times C_2)^* \subset C_1^* \times C_2^*$.
Conversely, let $y_1 \in C_1^*$ and $y_2 \in C_2^*$. Then, we have

$$(y_1, y_2)'(x_1, x_2) = y_1'x_1 + y_2'x_2 \le 0, \qquad \forall\ x_1 \in C_1,\ \forall\ x_2 \in C_2,$$

implying that $(y_1, y_2) \in (C_1 \times C_2)^*$, and showing that $C_1^* \times C_2^* \subset (C_1 \times C_2)^*$.
(b) A vector $y$ belongs to the polar cone of $\cup_{i \in I} C_i$ if and only if $y'x \le 0$ for all $x \in C_i$ and all $i \in I$, which is equivalent to having $y \in C_i^*$ for every $i \in I$. Hence, $y$ belongs to $\big(\cup_{i \in I} C_i\big)^*$ if and only if $y$ belongs to $\cap_{i \in I} C_i^*$.
(c) Let $y \in (C_1 + C_2)^*$, so that

$$y'(x_1 + x_2) \le 0, \qquad \forall\ x_1 \in C_1,\ \forall\ x_2 \in C_2. \tag{3.2}$$

Since the zero vector is in the closures of $C_1$ and $C_2$, by letting $x_2 \to 0$ with $x_2 \in C_2$ in Eq. (3.2), we obtain

$$y'x_1 \le 0, \qquad \forall\ x_1 \in C_1,$$

and similarly, by letting $x_1 \to 0$ with $x_1 \in C_1$ in Eq. (3.2), we obtain

$$y'x_2 \le 0, \qquad \forall\ x_2 \in C_2.$$

Thus, $y \in C_1^* \cap C_2^*$, showing that $(C_1 + C_2)^* \subset C_1^* \cap C_2^*$.
Conversely, let $y \in C_1^* \cap C_2^*$. Then, we have

$$y'x_1 \le 0, \qquad \forall\ x_1 \in C_1,$$

$$y'x_2 \le 0, \qquad \forall\ x_2 \in C_2,$$

implying that

$$y'(x_1 + x_2) \le 0, \qquad \forall\ x_1 \in C_1,\ \forall\ x_2 \in C_2.$$

Hence $y \in (C_1 + C_2)^*$, showing that $C_1^* \cap C_2^* \subset (C_1 + C_2)^*$.


(d) Since $C_1$ and $C_2$ are closed convex cones, by the Polar Cone Theorem (Prop. 3.1.1) and by part (c), it follows that

$$C_1 \cap C_2 = (C_1^*)^* \cap (C_2^*)^* = (C_1^* + C_2^*)^*.$$

By taking the polars and by using the Polar Cone Theorem, we obtain

$$(C_1 \cap C_2)^* = \big((C_1^* + C_2^*)^*\big)^* = \mathrm{cl}\big(\mathrm{conv}(C_1^* + C_2^*)\big).$$

The cone $C_1^* + C_2^*$ is convex, so that

$$(C_1 \cap C_2)^* = \mathrm{cl}(C_1^* + C_2^*).$$

Suppose now that $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2) \ne \emptyset$. We will show that $C_1^* + C_2^*$ is closed by using Exercise 1.43. According to this exercise, for nonempty closed convex sets $\bar{C}_1$ and $\bar{C}_2$ in $\Re^n$, if the equality $y_1 + y_2 = 0$ with $y_1 \in R_{\bar{C}_1}$ and $y_2 \in R_{\bar{C}_2}$ implies that $y_1$ and $y_2$ belong to the lineality spaces of $\bar{C}_1$ and $\bar{C}_2$, respectively, then the vector sum $\bar{C}_1 + \bar{C}_2$ is closed.
Let $y_1 + y_2 = 0$ with $y_1 \in R_{C_1^*}$ and $y_2 \in R_{C_2^*}$. Because $C_1^*$ and $C_2^*$ are closed convex cones, we have $R_{C_1^*} = C_1^*$ and $R_{C_2^*} = C_2^*$, so that $y_1 \in C_1^*$ and $y_2 \in C_2^*$. The lineality space of a cone is the set of vectors $y$ such that $y$ and $-y$ belong to the cone, so that in view of the preceding discussion, to show that $C_1^* + C_2^*$ is closed, it suffices to prove that $-y_1 \in C_1^*$ and $-y_2 \in C_2^*$.
Since $y_1 = -y_2$ and $y_1 \in C_1^*$, it follows that

$$y_2'x \ge 0, \qquad \forall\ x \in C_1, \tag{3.3}$$

and because $y_2 \in C_2^*$, we have

$$y_2'x \le 0, \qquad \forall\ x \in C_2,$$

which combined with the preceding relation yields

$$y_2'x = 0, \qquad \forall\ x \in C_1 \cap C_2. \tag{3.4}$$

In view of the fact $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2) \ne \emptyset$, and Eqs. (3.3) and (3.4), it follows that the linear function $y_2'x$ attains its minimum over the convex set $C_1$ at a point in the relative interior of $C_1$, implying that $y_2'x = 0$ for all $x \in C_1$ (cf. Prop. 1.4.2). Therefore, $y_2 \in C_1^*$ and since $y_2 = -y_1$, we have $-y_1 \in C_1^*$. By exchanging the roles of $y_1$ and $y_2$ in the preceding analysis, we similarly show that $-y_2 \in C_2^*$, completing the proof.
(e) By drawing the cones $C_1$ and $C_2$, it can be seen that $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2) = \emptyset$ and

$$C_1 \cap C_2 = \big\{(x_1, x_2, x_3) \mid x_1 = 0,\ x_2 = -x_3,\ x_3 \le 0\big\},$$

$$C_1^* = \big\{(y_1, y_2, y_3) \mid y_1^2 + y_2^2 \le y_3^2,\ y_3 \ge 0\big\},$$

$$C_2^* = \big\{(z_1, z_2, z_3) \mid z_1 = 0,\ z_2 = z_3\big\}.$$

Clearly, $x_1 + x_2 + x_3 = 0$ for all $x \in C_1 \cap C_2$, implying that $(1, 1, 1) \in (C_1 \cap C_2)^*$. Suppose that $(1, 1, 1) \in C_1^* + C_2^*$, so that $(1, 1, 1) = (y_1, y_2, y_3) + (z_1, z_2, z_3)$ for some $(y_1, y_2, y_3) \in C_1^*$ and $(z_1, z_2, z_3) \in C_2^*$, implying that $y_1 = 1$, $y_2 = 1 - z_2$, $y_3 = 1 - z_2$ for some $z_2 \in \Re$. However, this point does not belong to $C_1^*$, since $y_1^2 + y_2^2 = 1 + (1 - z_2)^2 > (1 - z_2)^2 = y_3^2$, which is a contradiction. Therefore, $(1, 1, 1)$ is not in $C_1^* + C_2^*$. Hence, when $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2) = \emptyset$, the relation

$$(C_1 \cap C_2)^* = C_1^* + C_2^*$$

may fail.
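The counterexample in part (e) can also be explored numerically. The sketch below uses only the descriptions of $C_1^*$ and $C_2^*$ given above; the parametrization by a scalar `s` and the function names are illustrative assumptions. It builds points $y \in C_1^*$ and $z \in C_2^*$ whose sum is $(1, 1, 1 + \mathrm{gap}(s))$ with $\mathrm{gap}(s) > 0$ shrinking to zero, which is consistent with $(1, 1, 1)$ lying in $\mathrm{cl}(C_1^* + C_2^*)$ from part (d) but not in $C_1^* + C_2^*$ itself.

```python
import math


def approx_point(s):
    """Return y on the boundary of C1* and z in C2* for a parameter s > 0.

    y = (1, s, sqrt(1 + s^2)) satisfies y1^2 + y2^2 = y3^2 with y3 >= 0,
    and z = (0, 1 - s, 1 - s) satisfies z1 = 0, z2 = z3.
    """
    y = (1.0, s, math.hypot(1.0, s))
    z = (0.0, 1.0 - s, 1.0 - s)
    return y, z


def gap(s):
    """Distance from y + z to (1, 1, 1): sqrt(1 + s^2) - s, computed stably."""
    return 1.0 / (math.hypot(1.0, s) + s)


for s in (1.0, 10.0, 100.0, 1000.0):
    y, z = approx_point(s)
    total = tuple(a + b for a, b in zip(y, z))
    print(s, total, gap(s))
```

The gap behaves like $1/(2s)$ for large `s`, so it is never zero for a finite `s`, matching the contradiction derived in the proof.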

3.5 (Linear Transformations and Polar Cones)

We have $y \in (AC)^*$ if and only if $y'Ax \le 0$ for all $x \in C$, which is equivalent to $(A'y)'x \le 0$ for all $x \in C$. This is in turn equivalent to $A'y \in C^*$. Hence, $y \in (AC)^*$ if and only if $y \in (A')^{-1} \cdot C^*$, showing that

$$(AC)^* = (A')^{-1} \cdot C^*. \tag{3.5}$$

We next show that for a closed convex cone $K \subset \Re^m$, we have

$$\big(A^{-1} \cdot K\big)^* = \mathrm{cl}(A'K^*).$$

Let $y \in \big(A^{-1} \cdot K\big)^*$ and to arrive at a contradiction, assume that $y \notin \mathrm{cl}(A'K^*)$. By the Strict Separation Theorem (Prop. 2.4.3), the closed convex cone $\mathrm{cl}(A'K^*)$ and the vector $y$ can be strictly separated, i.e., there exist a vector $a \in \Re^n$ and a scalar $b$ such that

$$a'x < b < a'y, \qquad \forall\ x \in \mathrm{cl}(A'K^*).$$

If $a'x > 0$ for some $x \in \mathrm{cl}(A'K^*)$, then since $\mathrm{cl}(A'K^*)$ is a cone, we would have $\lambda x \in \mathrm{cl}(A'K^*)$ for all $\lambda > 0$, implying that $a'(\lambda x) \to \infty$ when $\lambda \to \infty$, which contradicts the preceding relation. Thus, we must have $a'x \le 0$ for all $x \in \mathrm{cl}(A'K^*)$, and since $0 \in \mathrm{cl}(A'K^*)$, it follows that

$$\sup_{x \in \mathrm{cl}(A'K^*)} a'x = 0 \le b < a'y. \tag{3.6}$$

Therefore, $a \in \big(\mathrm{cl}(A'K^*)\big)^*$, and since $\big(\mathrm{cl}(A'K^*)\big)^* \subset (A'K^*)^*$, it follows that $a \in (A'K^*)^*$. In view of Eq. (3.5) and the Polar Cone Theorem (Prop. 3.1.1), we have

$$(A'K^*)^* = A^{-1} \cdot (K^*)^* = A^{-1} \cdot K,$$

implying that $a \in A^{-1} \cdot K$. Because $y \in \big(A^{-1} \cdot K\big)^*$, it follows that $y'a \le 0$, contradicting Eq. (3.6). Hence, we must have $y \in \mathrm{cl}(A'K^*)$, showing that

$$\big(A^{-1} \cdot K\big)^* \subset \mathrm{cl}(A'K^*).$$

To show the reverse inclusion, let $y \in A'K^*$ and assume, to arrive at a contradiction, that $y \notin (A^{-1} \cdot K)^*$. By the Strict Separation Theorem (Prop. 2.4.3), the closed convex cone $(A^{-1} \cdot K)^*$ and the vector $y$ can be strictly separated, i.e., there exist a vector $a \in \Re^n$ and a scalar $b$ such that

$$a'x < b < a'y, \qquad \forall\ x \in (A^{-1} \cdot K)^*.$$

Similar to the preceding analysis, since $(A^{-1} \cdot K)^*$ is a cone, it can be seen that

$$\sup_{x \in (A^{-1} \cdot K)^*} a'x = 0 \le b < a'y, \tag{3.7}$$

implying that $a \in \big((A^{-1} \cdot K)^*\big)^*$. Since $K$ is a closed convex cone and $A$ is a linear (and therefore continuous) transformation, the set $A^{-1} \cdot K$ is a closed convex cone. Furthermore, by the Polar Cone Theorem, we have that $\big((A^{-1} \cdot K)^*\big)^* = A^{-1} \cdot K$. Therefore, $a \in A^{-1} \cdot K$, implying that $Aa \in K$. Since $y \in A'K^*$, we have $y = A'v$ for some $v \in K^*$, and it follows that

$$y'a = (A'v)'a = v'Aa \le 0,$$

contradicting Eq. (3.7). Hence, we must have $y \in (A^{-1} \cdot K)^*$, implying that

$$A'K^* \subset (A^{-1} \cdot K)^*.$$

Taking the closure of both sides of this relation, we obtain

$$\mathrm{cl}(A'K^*) \subset (A^{-1} \cdot K)^*,$$

completing the proof.


Suppose that $\mathrm{ri}(K) \cap R(A) \ne \emptyset$. We will show that the cone $A'K^*$ is closed by using Exercise 1.42. According to this exercise, if $R_{K^*} \cap N(A')$ is a subspace of the lineality space $L_{K^*}$ of $K^*$, then

$$\mathrm{cl}(A'K^*) = A'K^*.$$

Thus, it suffices to verify that $R_{K^*} \cap N(A')$ is a subspace of $L_{K^*}$. Indeed, we will show that $R_{K^*} \cap N(A') = L_{K^*} \cap N(A')$.
Let $y \in K^* \cap N(A')$ (recall that $R_{K^*} = K^*$, since $K^*$ is a closed convex cone). Because $y \in K^*$, we obtain

$$(-y)'x \ge 0, \qquad \forall\ x \in K. \tag{3.8}$$

For $y \in N(A')$, we have $-y \in N(A')$ and since $N(A') = R(A)^\perp$, it follows that

$$(-y)'z = 0, \qquad \forall\ z \in R(A). \tag{3.9}$$

In view of the relation $\mathrm{ri}(K) \cap R(A) \ne \emptyset$, and Eqs. (3.8) and (3.9), the linear function $(-y)'x$ attains its minimum over the convex set $K$ at a point in the relative interior of $K$, implying that $(-y)'x = 0$ for all $x \in K$ (cf. Prop. 1.4.2). Hence $(-y) \in K^*$, so that $y \in L_{K^*}$ and because $y \in N(A')$, we see that $y \in L_{K^*} \cap N(A')$. The reverse inclusion follows directly from the relation $L_{K^*} \subset R_{K^*}$, thus completing the proof.

3.6 (Pointed Cones and Bases)

(a) ⇒ (b) Since $C$ is a pointed cone, $C \cap (-C) = \{0\}$, so that

$$\big(C \cap (-C)\big)^* = \Re^n.$$

On the other hand, by Exercise 3.4, it follows that

$$\big(C \cap (-C)\big)^* = \mathrm{cl}(C^* - C^*),$$

which when combined with the preceding relation yields $\mathrm{cl}(C^* - C^*) = \Re^n$.
(b) ⇒ (c) Since $C$ is a closed convex cone, by the polar cone operations of Exercise 3.4, it follows that

$$\big(C \cap (-C)\big)^* = \mathrm{cl}(C^* - C^*) = \Re^n.$$

By taking the polars and using the Polar Cone Theorem (Prop. 3.1.1), we obtain

$$\Big(\big(C \cap (-C)\big)^*\Big)^* = C \cap (-C) = \{0\}. \tag{3.10}$$

Now, to arrive at a contradiction assume that there is a vector $\hat{x} \in \Re^n$ such that $\hat{x} \notin C^* - C^*$. Then, by the Separating Hyperplane Theorem (Prop. 2.4.2), there exists a nonzero vector $a \in \Re^n$ such that

$$a'\hat{x} \ge a'x, \qquad \forall\ x \in C^* - C^*.$$

If $a'x > 0$ for some $x \in C^* - C^*$, then since $C^* - C^*$ is a cone, the right-hand side of the preceding relation can be arbitrarily large, a contradiction. Thus, we have $a'x \le 0$ for all $x \in C^* - C^*$, implying that $a \in (C^* - C^*)^*$. By the polar cone operations of Exercise 3.4(c) and the Polar Cone Theorem, it follows that

$$(C^* - C^*)^* = (C^*)^* \cap (-C^*)^* = C \cap (-C).$$

Thus, $a \in C \cap (-C)$ with $a \ne 0$, contradicting Eq. (3.10). Hence, we must have $C^* - C^* = \Re^n$.
(c) ⇒ (d) Because $C^* \subset \mathrm{aff}(C^*)$ and $-C^* \subset \mathrm{aff}(C^*)$, we have $C^* - C^* \subset \mathrm{aff}(C^*)$ and since $C^* - C^* = \Re^n$, it follows that $\mathrm{aff}(C^*) = \Re^n$, showing that $C^*$ has nonempty interior.
(d) ⇒ (e) Let $v$ be a vector in the interior of $C^*$. Then, there exists a positive scalar $\delta$ such that the vector $v + \delta \frac{y}{\|y\|}$ is in $C^*$ for all $y \in \Re^n$ with $y \ne 0$, i.e.,

$$\Big(v + \delta \frac{y}{\|y\|}\Big)'x \le 0, \qquad \forall\ x \in C,\ \forall\ y \in \Re^n,\ y \ne 0.$$

By taking $y = x$, it follows that

$$\Big(v + \delta \frac{x}{\|x\|}\Big)'x \le 0, \qquad \forall\ x \in C,\ x \ne 0,$$

implying that

$$v'x + \delta\|x\| \le 0, \qquad \forall\ x \in C,\ x \ne 0.$$

Clearly, this relation holds for $x = 0$, so that

$$v'x \le -\delta\|x\|, \qquad \forall\ x \in C.$$

Multiplying the preceding relation with $-1$ and letting $\hat{x} = -v$, we obtain

$$\hat{x}'x \ge \delta\|x\|, \qquad \forall\ x \in C.$$

(e) ⇒ (f) Let

$$D = \big\{y \in C \mid \hat{x}'y = 1\big\}.$$

Then, $D$ is a closed convex set since it is the intersection of the closed convex cone $C$ and the closed convex set $\{y \mid \hat{x}'y = 1\}$. Obviously, $0 \notin D$. Thus, to show that $D$ is a base for $C$, it remains to prove that $C = \mathrm{cone}(D)$. Take any $x \in C$. If $x = 0$, then $x \in \mathrm{cone}(D)$ and we are done, so assume that $x \ne 0$. We have by hypothesis

$$\hat{x}'x \ge \delta\|x\| > 0, \qquad \forall\ x \in C,\ x \ne 0,$$

so we may define $\hat{y} = \frac{x}{\hat{x}'x}$. Clearly, $\hat{y} \in D$ and $x = (\hat{x}'x)\hat{y}$ with $\hat{x}'x > 0$, showing that $x \in \mathrm{cone}(D)$ and that $C \subset \mathrm{cone}(D)$. Since $D \subset C$, the inclusion $\mathrm{cone}(D) \subset C$ is obvious. Thus, $C = \mathrm{cone}(D)$ and $D$ is a base for $C$. Furthermore, for every $y$ in $D$, since $y$ is also in $C$, we have

$$1 = \hat{x}'y \ge \delta\|y\|,$$

showing that $D$ is bounded and completing the proof.


(f) ⇒ (a) Since $C$ has a bounded base, $C = \mathrm{cone}(D)$ for some bounded convex set $D$ with $0 \notin \mathrm{cl}(D)$. To arrive at a contradiction, we assume that the cone $C$ is not pointed, so that there exists a nonzero vector $d \in C \cap (-C)$, implying that $d$ and $-d$ are in $C$. Let $\{\lambda_k\}$ be a sequence of positive scalars. Since $\lambda_k d \in C$ for all $k$ and $D$ is a base for $C$, there exist a sequence $\{\mu_k\}$ of positive scalars and a sequence $\{y_k\}$ of vectors in $D$ such that

$$\lambda_k d = \mu_k y_k, \qquad \forall\ k.$$

Therefore, $y_k = \frac{\lambda_k}{\mu_k} d \in D$ for all $k$ and because $D$ is bounded, the sequence $\{y_k\}$ has a subsequence converging to some $y \in \mathrm{cl}(D)$. Without loss of generality, we may assume that $y_k \to y$, which in view of $y_k = \frac{\lambda_k}{\mu_k} d$ for all $k$, implies that $y = \alpha d$ and $\alpha d \in \mathrm{cl}(D)$ for some $\alpha \ge 0$. Furthermore, by the definition of base, we have $0 \notin \mathrm{cl}(D)$, so that $\alpha > 0$. Similar to the preceding, by replacing $d$ with $-d$, we can show that $\tilde{\alpha}(-d) \in \mathrm{cl}(D)$ for some positive scalar $\tilde{\alpha}$. Therefore, $\alpha d \in \mathrm{cl}(D)$ and $\tilde{\alpha}(-d) \in \mathrm{cl}(D)$ with $\alpha > 0$ and $\tilde{\alpha} > 0$. Since $D$ is convex, its closure $\mathrm{cl}(D)$ is also convex, implying that $0 \in \mathrm{cl}(D)$, contradicting the definition of a base. Hence, the cone $C$ must be pointed.
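To make the construction in (e) ⇒ (f) concrete, the following sketch uses an example that is not in the original solution: the nonnegative orthant in $\Re^3$ with $\hat{x} = (1, 1, 1)$, for which $\hat{x}'x \ge \|x\|$ holds with $\delta = 1$. These choices and the helper names are assumptions made purely for illustration. The code scales a nonzero $x \in C$ to the point $\hat{y} = x/(\hat{x}'x)$ of the base $D$.

```python
import math


def dot(u, v):
    """Inner product u'v."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of u."""
    return math.sqrt(dot(u, u))


def base_point(x, x_hat):
    """Scale a nonzero x in C onto D = {y in C | x_hat'y = 1}.

    Returns (y_hat, s) with s = x_hat'x > 0 and x = s * y_hat.
    """
    s = dot(x_hat, x)
    return [xi / s for xi in x], s


x_hat = [1.0, 1.0, 1.0]   # assumed vector with x_hat'x >= ||x|| on the orthant
x = [2.0, 0.0, 6.0]       # a nonzero point of the nonnegative orthant
y_hat, s = base_point(x, x_hat)
```

For this data `y_hat` lies on the unit simplex, and `norm(y_hat)` does not exceed $1/\delta = 1$, in line with the boundedness of $D$ shown in the proof.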

3.7

Let the closed convex cone $C$ be polyhedral, and of the form

$$C = \big\{x \mid a_j'x \le 0,\ j = 1, \ldots, r\big\},$$

for some vectors $a_j$ in $\Re^n$. By Farkas' Lemma [Prop. 3.2.1(b)], we have

$$C^* = \mathrm{cone}\big(\{a_1, \ldots, a_r\}\big),$$

so the polar cone of a polyhedral cone is finitely generated. Conversely, using the Polar Cone Theorem, we have

$$\Big(\mathrm{cone}\big(\{a_1, \ldots, a_r\}\big)\Big)^* = \big\{x \mid a_j'x \le 0,\ j = 1, \ldots, r\big\},$$

so the polar of a finitely generated cone is polyhedral. Thus, a closed convex cone
is polyhedral if and only if its polar cone is finitely generated. By the Minkowski-
Weyl Theorem [Prop. 3.2.1(c)], a cone is finitely generated if and only if it is
polyhedral. Therefore, a closed convex cone is polyhedral if and only if its polar
cone is polyhedral.

3.8

(a) We first show that $C$ is a subset of $R_P$, the recession cone of $P$. Let $y \in C$, and choose any $\alpha \ge 0$ and $x \in P$ of the form $x = \sum_{j=1}^m \mu_j v_j$. Since $C$ is a cone, $\alpha y \in C$, so that $x + \alpha y \in P$ for all $\alpha \ge 0$. It follows that $y \in R_P$. Hence $C \subset R_P$.
Conversely, to show that $R_P \subset C$, let $y \in R_P$ and take any $x \in P$. Then $x + ky \in P$ for all $k \ge 1$. Since $P = V + C$, where $V = \mathrm{conv}\big(\{v_1, \ldots, v_m\}\big)$, it follows that

$$x + ky = v^k + y^k, \qquad \forall\ k \ge 1,$$

with $v^k \in V$ and $y^k \in C$ for all $k \ge 1$. Because $V$ is compact, the sequence $\{v^k\}$ has a limit point $v \in V$, and without loss of generality, we may assume that $v^k \to v$. Then

$$\lim_{k \to \infty} \|ky - y^k\| = \lim_{k \to \infty} \|v^k - x\| = \|v - x\|,$$

implying that

$$\lim_{k \to \infty} \big\|y - (1/k)y^k\big\| = 0.$$

Therefore, the sequence $\{(1/k)y^k\}$ converges to $y$. Since $y^k \in C$ for all $k \ge 1$, the sequence $\{(1/k)y^k\}$ is in $C$, and by the closedness of $C$, it follows that $y \in C$. Hence, $R_P \subset C$.
(b) Any point in $P$ has the form $v + y$ with $v \in \mathrm{conv}\big(\{v_1, \ldots, v_m\}\big)$ and $y \in C$, or equivalently

$$v + y = \frac{1}{2}v + \frac{1}{2}(v + 2y),$$

with $v$ and $v + 2y$ being two distinct points in $P$ if $y \ne 0$. Therefore, none of the points $v + y$, with $v \in \mathrm{conv}\big(\{v_1, \ldots, v_m\}\big)$ and $y \in C$, is an extreme point of $P$ if $y \ne 0$. Hence, an extreme point of $P$ must be in the set $\{v_1, \ldots, v_m\}$. Since by definition, an extreme point of $P$ is not a convex combination of points in $P$, an extreme point of $P$ must be equal to some $v_i$ that cannot be expressed as a convex combination of the remaining vectors $v_j$, $j \ne i$.

3.9 (Polyhedral Cones and Sets under Linear Transformations)

(a) Let $A$ be an $m \times n$ matrix and let $C$ be a polyhedral cone in $\Re^n$. By the Minkowski-Weyl Theorem [Prop. 3.2.1(c)], $C$ is finitely generated, so that

$$C = \Big\{x \ \Big|\ x = \sum_{j=1}^r \mu_j a_j,\ \mu_j \ge 0,\ j = 1, \ldots, r\Big\},$$

for some vectors $a_1, \ldots, a_r$ in $\Re^n$. The image of $C$ under $A$ is given by

$$AC = \{y \mid y = Ax,\ x \in C\} = \Big\{y \ \Big|\ y = \sum_{j=1}^r \mu_j Aa_j,\ \mu_j \ge 0,\ j = 1, \ldots, r\Big\},$$

showing that $AC$ is a finitely generated cone in $\Re^m$. By the Minkowski-Weyl Theorem, the cone $AC$ is polyhedral.
Let now $K$ be a polyhedral cone in $\Re^m$ given by

$$K = \big\{y \mid d_j'y \le 0,\ j = 1, \ldots, r\big\},$$

for some vectors $d_1, \ldots, d_r$ in $\Re^m$. Then, the inverse image of $K$ under $A$ is

$$A^{-1} \cdot K = \{x \mid Ax \in K\} = \big\{x \mid d_j'Ax \le 0,\ j = 1, \ldots, r\big\} = \big\{x \mid (A'd_j)'x \le 0,\ j = 1, \ldots, r\big\},$$

showing that $A^{-1} \cdot K$ is a polyhedral cone in $\Re^n$.


(b) Let $P$ be a polyhedral set in $\Re^n$ with Minkowski-Weyl Representation

$$P = \Big\{x \ \Big|\ x = \sum_{j=1}^m \mu_j v_j + y,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \ge 0,\ j = 1, \ldots, m,\ y \in C\Big\},$$

where $v_1, \ldots, v_m$ are some vectors in $\Re^n$ and $C$ is a finitely generated cone in $\Re^n$ (cf. Prop. 3.2.2). The image of $P$ under $A$ is given by

$$AP = \{z \mid z = Ax,\ x \in P\} = \Big\{z \ \Big|\ z = \sum_{j=1}^m \mu_j Av_j + Ay,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \ge 0,\ j = 1, \ldots, m,\ Ay \in AC\Big\}.$$

By setting $Av_j = w_j$ and $Ay = u$, we obtain

$$AP = \Big\{z \ \Big|\ z = \sum_{j=1}^m \mu_j w_j + u,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \ge 0,\ j = 1, \ldots, m,\ u \in AC\Big\} = \mathrm{conv}\big(\{w_1, \ldots, w_m\}\big) + AC,$$

where $w_1, \ldots, w_m \in \Re^m$. By part (a), the cone $AC$ is polyhedral, implying by the Minkowski-Weyl Theorem [Prop. 3.2.1(c)] that $AC$ is finitely generated. Hence, the set $AP$ has a Minkowski-Weyl representation and therefore, it is polyhedral (cf. Prop. 3.2.2).
Let also $Q$ be a polyhedral set in $\Re^m$ given by

$$Q = \big\{y \mid d_j'y \le b_j,\ j = 1, \ldots, r\big\},$$

for some vectors $d_1, \ldots, d_r$ in $\Re^m$. Then, the inverse image of $Q$ under $A$ is

$$A^{-1} \cdot Q = \{x \mid Ax \in Q\} = \big\{x \mid d_j'Ax \le b_j,\ j = 1, \ldots, r\big\} = \big\{x \mid (A'd_j)'x \le b_j,\ j = 1, \ldots, r\big\},$$

showing that $A^{-1} \cdot Q$ is a polyhedral set in $\Re^n$.
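The inverse-image formula above is easy to check on small data. The sketch below is only an illustration under assumed inputs: the matrix `A`, the rows `D` (the vectors $d_j$), the right-hand sides `b`, and all function names are made up for the example. It forms the vectors $A'd_j$ and compares membership of $Ax$ in $Q$ with membership of $x$ in the set defined by the transformed inequalities.

```python
def dot(u, v):
    """Inner product u'v."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, x):
    """Matrix-vector product Mx, with M given as a list of rows."""
    return [dot(row, x) for row in M]


def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def inverse_image_rows(A, D):
    """Vectors A'd_j describing A^{-1}Q = {x | (A'd_j)'x <= b_j}."""
    At = transpose(A)
    return [mat_vec(At, d) for d in D]


def in_polyhedron(rows, b, x):
    """Check r_j'x <= b_j for every row r_j."""
    return all(dot(r, x) <= bj for r, bj in zip(rows, b))


A = [[1, 2, 0], [0, 1, -1]]   # a 2 x 3 matrix (m = 2, n = 3)
D = [[1, 1], [-1, 2]]         # vectors d_1, d_2 in R^2
b = [4, 3]
rows = inverse_image_rows(A, D)

samples = [[0, 0, 0], [1, 1, 1], [5, 0, 0], [-3, 2, 4], [0, 3, -2]]
agree = all(
    in_polyhedron(D, b, mat_vec(A, x)) == in_polyhedron(rows, b, x)
    for x in samples
)
```

Integer data is used so that both membership tests are exact and the comparison is not affected by rounding.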

3.10

It suffices to show the assertions for m = 2.


(a) Let $C_1$ and $C_2$ be polyhedral cones in $\Re^{n_1}$ and $\Re^{n_2}$, respectively, given by

$$C_1 = \big\{x_1 \in \Re^{n_1} \mid a_j'x_1 \le 0,\ j = 1, \ldots, r_1\big\},$$

$$C_2 = \big\{x_2 \in \Re^{n_2} \mid \tilde{a}_j'x_2 \le 0,\ j = 1, \ldots, r_2\big\},$$

where $a_1, \ldots, a_{r_1}$ and $\tilde{a}_1, \ldots, \tilde{a}_{r_2}$ are some vectors in $\Re^{n_1}$ and $\Re^{n_2}$, respectively. Define

$$\bar{a}_j = (a_j, 0), \qquad \forall\ j = 1, \ldots, r_1,$$

$$\bar{a}_j = (0, \tilde{a}_{j - r_1}), \qquad \forall\ j = r_1 + 1, \ldots, r_1 + r_2.$$

We have $(x_1, x_2) \in C_1 \times C_2$ if and only if

$$a_j'x_1 \le 0, \qquad \forall\ j = 1, \ldots, r_1,$$

$$\tilde{a}_j'x_2 \le 0, \qquad \forall\ j = 1, \ldots, r_2,$$

or equivalently

$$\bar{a}_j'(x_1, x_2) \le 0, \qquad \forall\ j = 1, \ldots, r_1 + r_2.$$

Therefore,

$$C_1 \times C_2 = \big\{x \in \Re^{n_1 + n_2} \mid \bar{a}_j'x \le 0,\ j = 1, \ldots, r_1 + r_2\big\},$$

showing that $C_1 \times C_2$ is a polyhedral cone in $\Re^{n_1 + n_2}$.
(b) Let $C_1$ and $C_2$ be polyhedral cones in $\Re^n$. Then, straightforwardly from the definition of a polyhedral cone, it follows that the cone $C_1 \cap C_2$ is polyhedral.
By part (a), the Cartesian product $C_1 \times C_2$ is a polyhedral cone in $\Re^{n+n}$. Under the linear transformation $A$ that maps $(x_1, x_2) \in \Re^{n+n}$ into $x_1 + x_2 \in \Re^n$, the image $A \cdot (C_1 \times C_2)$ is the set $C_1 + C_2$, which is a polyhedral cone by Exercise 3.9(a).
(c) Let $P_1$ and $P_2$ be polyhedral sets in $\Re^{n_1}$ and $\Re^{n_2}$, respectively, given by

$$P_1 = \big\{x_1 \in \Re^{n_1} \mid a_j'x_1 \le b_j,\ j = 1, \ldots, r_1\big\},$$

$$P_2 = \big\{x_2 \in \Re^{n_2} \mid \tilde{a}_j'x_2 \le \tilde{b}_j,\ j = 1, \ldots, r_2\big\},$$

where $a_1, \ldots, a_{r_1}$ and $\tilde{a}_1, \ldots, \tilde{a}_{r_2}$ are some vectors in $\Re^{n_1}$ and $\Re^{n_2}$, respectively, and $b_j$ and $\tilde{b}_j$ are some scalars. By defining

$$\bar{a}_j = (a_j, 0), \qquad \bar{b}_j = b_j, \qquad \forall\ j = 1, \ldots, r_1,$$

$$\bar{a}_j = (0, \tilde{a}_{j - r_1}), \qquad \bar{b}_j = \tilde{b}_{j - r_1}, \qquad \forall\ j = r_1 + 1, \ldots, r_1 + r_2,$$

similar to the proof of part (a), we see that

$$P_1 \times P_2 = \big\{x \in \Re^{n_1 + n_2} \mid \bar{a}_j'x \le \bar{b}_j,\ j = 1, \ldots, r_1 + r_2\big\},$$

showing that $P_1 \times P_2$ is a polyhedral set in $\Re^{n_1 + n_2}$.
(d) Let $P_1$ and $P_2$ be polyhedral sets in $\Re^n$. Then, using the definition of a polyhedral set, it follows that the set $P_1 \cap P_2$ is polyhedral.
By part (c), the set $P_1 \times P_2$ is polyhedral. Furthermore, under the linear transformation $A$ that maps $(x_1, x_2) \in \Re^{n+n}$ into $x_1 + x_2 \in \Re^n$, the image $A \cdot (P_1 \times P_2)$ is the set $P_1 + P_2$, which is polyhedral by Exercise 3.9(b).

3.11

We give two proofs. The first is based on the Minkowski-Weyl Representation of a polyhedral set $P$ (cf. Prop. 3.2.2), while the second is based on a representation of $P$ by a system of linear inequalities.
Let $P$ be a polyhedral set with Minkowski-Weyl representation

$$P = \Big\{x \ \Big|\ x = \sum_{j=1}^m \mu_j v_j + y,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \ge 0,\ j = 1, \ldots, m,\ y \in C\Big\},$$

where $v_1, \ldots, v_m$ are some vectors in $\Re^n$ and $C$ is a finitely generated cone in $\Re^n$. Let $C$ be given by

$$C = \Big\{y \ \Big|\ y = \sum_{i=1}^r \lambda_i a_i,\ \lambda_i \ge 0,\ i = 1, \ldots, r\Big\},$$

where $a_1, \ldots, a_r$ are some vectors in $\Re^n$, so that

$$P = \Big\{x \ \Big|\ x = \sum_{j=1}^m \mu_j v_j + \sum_{i=1}^r \lambda_i a_i,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \ge 0,\ \forall\ j,\ \lambda_i \ge 0,\ \forall\ i\Big\}.$$

We claim that

$$\mathrm{cone}(P) = \mathrm{cone}\big(\{v_1, \ldots, v_m, a_1, \ldots, a_r\}\big).$$

Since $P \subset \mathrm{cone}\big(\{v_1, \ldots, v_m, a_1, \ldots, a_r\}\big)$, it follows that

$$\mathrm{cone}(P) \subset \mathrm{cone}\big(\{v_1, \ldots, v_m, a_1, \ldots, a_r\}\big).$$

Conversely, let $y \in \mathrm{cone}\big(\{v_1, \ldots, v_m, a_1, \ldots, a_r\}\big)$. Then, we have

$$y = \sum_{j=1}^m \mu_j v_j + \sum_{i=1}^r \lambda_i a_i,$$

with $\mu_j \ge 0$ and $\lambda_i \ge 0$ for all $i$ and $j$. If $\mu_j = 0$ for all $j$, then $y = \sum_{i=1}^r \lambda_i a_i \in C$, and since $C = R_P$ (cf. Exercise 3.8), it follows that $y \in R_P$. Because the origin belongs to $P$ and $y \in R_P$, we have $0 + y \in P$, implying that $y \in P$, and consequently $y \in \mathrm{cone}(P)$. If $\mu_j > 0$ for some $j$, then by setting $\mu = \sum_{j=1}^m \mu_j$, $\bar{\mu}_j = \mu_j/\mu$ for all $j$, and $\bar{\lambda}_i = \lambda_i/\mu$ for all $i$, we obtain

$$y = \mu \Big(\sum_{j=1}^m \bar{\mu}_j v_j + \sum_{i=1}^r \bar{\lambda}_i a_i\Big),$$

where $\mu > 0$, $\bar{\mu}_j \ge 0$ with $\sum_{j=1}^m \bar{\mu}_j = 1$, and $\bar{\lambda}_i \ge 0$. Therefore $y = \mu \bar{x}$ with $\bar{x} \in P$ and $\mu > 0$, implying that $y \in \mathrm{cone}(P)$ and showing that

$$\mathrm{cone}\big(\{v_1, \ldots, v_m, a_1, \ldots, a_r\}\big) \subset \mathrm{cone}(P).$$

We now give an alternative proof using the representation of $P$ by a system of linear inequalities. Let $P$ be given by

$$P = \big\{x \mid a_j'x \le b_j,\ j = 1, \ldots, r\big\},$$

where $a_1, \ldots, a_r$ are vectors in $\Re^n$ and $b_1, \ldots, b_r$ are scalars. Since $P$ contains the origin, it follows that $b_j \ge 0$ for all $j$. Define the index set $J$ as follows

$$J = \{j \mid b_j = 0\}.$$

We consider separately the two cases where $J \ne \emptyset$ and $J = \emptyset$. If $J \ne \emptyset$, then we will show that

$$\mathrm{cone}(P) = \big\{x \mid a_j'x \le 0,\ j \in J\big\}.$$

To see this, note that since $P \subset \big\{x \mid a_j'x \le 0,\ j \in J\big\}$, we have

$$\mathrm{cone}(P) \subset \big\{x \mid a_j'x \le 0,\ j \in J\big\}.$$

Conversely, let $x \in \big\{x \mid a_j'x \le 0,\ j \in J\big\}$. We will show that $x \in \mathrm{cone}(P)$. If $x \in P$, then $x \in \mathrm{cone}(P)$ and we are done, so assume that $x \notin P$, implying that the set

$$\bar{J} = \{j \notin J \mid a_j'x > b_j\} \tag{3.11}$$

is nonempty. By the definition of $J$, we have $b_j > 0$ for all $j \notin J$, so let

$$\mu = \min_{j \in \bar{J}} \frac{b_j}{a_j'x},$$

and note that $0 < \mu < 1$. We have

$$a_j'(\mu x) \le 0, \qquad \forall\ j \in J,$$

$$a_j'(\mu x) \le b_j, \qquad \forall\ j \in \bar{J}.$$

For $j \notin J \cup \bar{J}$ and $a_j'x \le 0 < b_j$, since $\mu > 0$, we still have $a_j'(\mu x) \le 0 < b_j$. For $j \notin J \cup \bar{J}$ and $0 < a_j'x \le b_j$, since $\mu < 1$, we have $0 < a_j'(\mu x) < b_j$. Therefore, $\mu x \in P$, implying that $x = \frac{1}{\mu}(\mu x) \in \mathrm{cone}(P)$. It follows that

$$\big\{x \mid a_j'x \le 0,\ j \in J\big\} \subset \mathrm{cone}(P),$$

and hence, $\mathrm{cone}(P) = \big\{x \mid a_j'x \le 0,\ j \in J\big\}$.
If $J = \emptyset$, then we will show that $\mathrm{cone}(P) = \Re^n$. To see this, take any $x \in \Re^n$. If $x \in P$, then clearly $x \in \mathrm{cone}(P)$, so assume that $x \notin P$, implying that the set $\bar{J}$ as defined in Eq. (3.11) is nonempty. Note that $b_j > 0$ for all $j$, since $J$ is empty. The rest of the proof is similar to the preceding case.
As an example where $\mathrm{cone}(P)$ is not polyhedral when $P$ does not contain the origin, consider the polyhedral set $P \subset \Re^2$ given by

$$P = \big\{(x_1, x_2) \mid x_1 \ge 0,\ x_2 = 1\big\}.$$

Then, we have

$$\mathrm{cone}(P) = \big\{(x_1, x_2) \mid x_1 > 0,\ x_2 > 0\big\} \cup \big\{(x_1, x_2) \mid x_1 = 0,\ x_2 \ge 0\big\},$$

which is not closed and therefore not polyhedral.
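The failure of closedness in this example can be seen with a direct membership test. The sketch below encodes the formula for $\mathrm{cone}(P)$ stated above; the function name and the sampled sequence $(1, 1/k)$ are illustrative choices, not part of the original solution.

```python
def in_cone_P(x1, x2):
    """Membership in cone(P) for P = {(x1, x2) | x1 >= 0, x2 = 1}.

    A point with x2 > 0 equals x2 * (x1 / x2, 1), so it is in cone(P)
    exactly when x1 >= 0. With x2 <= 0, only the origin qualifies.
    """
    if x2 > 0:
        return x1 >= 0
    return x1 == 0 and x2 == 0


# The points (1, 1/k) all lie in cone(P) ...
sequence_inside = all(in_cone_P(1.0, 1.0 / k) for k in range(1, 1000))
# ... but their limit (1, 0) does not.
limit_inside = in_cone_P(1.0, 0.0)
```

Here `sequence_inside` is true while `limit_inside` is false, so a convergent sequence in $\mathrm{cone}(P)$ has its limit outside the set.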

3.12 (Properties of Polyhedral Functions)

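(a) Let $f_1$ and $f_2$ be polyhedral functions such that $\mathrm{dom}(f_1) \cap \mathrm{dom}(f_2) \ne \emptyset$. By Prop. 3.2.3, $\mathrm{dom}(f_1)$ and $\mathrm{dom}(f_2)$ are polyhedral sets in $\Re^n$, and

$$f_1(x) = \max\big\{a_1'x + b_1, \ldots, a_m'x + b_m\big\}, \qquad \forall\ x \in \mathrm{dom}(f_1),$$

$$f_2(x) = \max\big\{\bar{a}_1'x + \bar{b}_1, \ldots, \bar{a}_{\bar{m}}'x + \bar{b}_{\bar{m}}\big\}, \qquad \forall\ x \in \mathrm{dom}(f_2),$$

where $a_i$ and $\bar{a}_i$ are vectors in $\Re^n$, and $b_i$ and $\bar{b}_i$ are scalars. The domain of $f_1 + f_2$ coincides with $\mathrm{dom}(f_1) \cap \mathrm{dom}(f_2)$, which is polyhedral by Exercise 3.10(d). Furthermore, we have for all $x \in \mathrm{dom}(f_1 + f_2)$,

$$\begin{aligned}
f_1(x) + f_2(x) &= \max\big\{a_1'x + b_1, \ldots, a_m'x + b_m\big\} + \max\big\{\bar{a}_1'x + \bar{b}_1, \ldots, \bar{a}_{\bar{m}}'x + \bar{b}_{\bar{m}}\big\} \\
&= \max_{1 \le i \le m,\ 1 \le j \le \bar{m}} \big\{a_i'x + b_i + \bar{a}_j'x + \bar{b}_j\big\} \\
&= \max_{1 \le i \le m,\ 1 \le j \le \bar{m}} \big\{(a_i + \bar{a}_j)'x + (b_i + \bar{b}_j)\big\}.
\end{aligned}$$

Therefore, by Prop. 3.2.3, the function $f_1 + f_2$ is polyhedral.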


(b) Since $g : \Re^m \to (-\infty, \infty]$ is a polyhedral function, by Prop. 3.2.3, $\mathrm{dom}(g)$ is a polyhedral set in $\Re^m$ and $g$ is given by

$$g(y) = \max\big\{a_1'y + b_1, \ldots, a_m'y + b_m\big\}, \qquad \forall\ y \in \mathrm{dom}(g),$$

for some vectors $a_i$ in $\Re^m$ and scalars $b_i$. The domain of $f$ can be expressed as

$$\mathrm{dom}(f) = \big\{x \mid f(x) < \infty\big\} = \big\{x \mid g(Ax) < \infty\big\} = \big\{x \mid Ax \in \mathrm{dom}(g)\big\}.$$

Thus, $\mathrm{dom}(f)$ is the inverse image of the polyhedral set $\mathrm{dom}(g)$ under the linear transformation $A$. By the assumption that $\mathrm{dom}(g)$ contains a point in the range of $A$, it follows that $\mathrm{dom}(f)$ is nonempty, while by Exercise 3.9(b), the set $\mathrm{dom}(f)$ is polyhedral. Furthermore, for all $x \in \mathrm{dom}(f)$, we have

$$\begin{aligned}
f(x) &= g(Ax) \\
&= \max\big\{a_1'Ax + b_1, \ldots, a_m'Ax + b_m\big\} \\
&= \max\big\{(A'a_1)'x + b_1, \ldots, (A'a_m)'x + b_m\big\}.
\end{aligned}$$

Thus, by Prop. 3.2.3, it follows that the function $f$ is polyhedral.
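The identity used in part (a), that a sum of two maxima of affine functions is the maximum over all pairwise sums, can be checked on a small example. In the sketch below each function is represented by a list of `(a, b)` pieces; this representation, the sample functions, and the helper names are assumptions made for illustration, and domains are taken to be the whole space.

```python
from itertools import product


def poly_eval(pieces, x):
    """Evaluate max_i (a_i'x + b_i) for pieces = [(a_i, b_i), ...]."""
    return max(sum(ai * xi for ai, xi in zip(a, x)) + b for a, b in pieces)


def poly_sum(p1, p2):
    """Pieces (a_i + abar_j, b_i + bbar_j) over all pairs (i, j)."""
    return [
        ([u + v for u, v in zip(a1, a2)], b1 + b2)
        for (a1, b1), (a2, b2) in product(p1, p2)
    ]


f1 = [([1, 0], 0), ([-1, 0], 0)]                 # f1(x) = |x1|
f2 = [([0, 1], 1), ([0, -1], -1), ([0, 0], 0)]   # three affine pieces in x2
f_sum = poly_sum(f1, f2)

points = [[0, 0], [2, -3], [-1, 4], [5, 5], [-2, -2]]
matches = all(
    poly_eval(f_sum, x) == poly_eval(f1, x) + poly_eval(f2, x) for x in points
)
```

The combined description has $m \bar{m}$ pieces (six here), which is the price of writing the sum as a single maximum.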

3.13 (Partial Minimization of Polyhedral Functions)

As shown at the end of Section 2.3, we have

$$P\big(\mathrm{epi}(F)\big) \subset \mathrm{epi}(f) \subset \mathrm{cl}\Big(P\big(\mathrm{epi}(F)\big)\Big).$$

Since the function $F$ is polyhedral, its epigraph

$$\mathrm{epi}(F) = \big\{(x, z, w) \mid F(x, z) \le w,\ (x, z) \in \mathrm{dom}(F)\big\}$$

is a polyhedral set in $\Re^{n+m+1}$. The set $P\big(\mathrm{epi}(F)\big)$ is the image of the polyhedral set $\mathrm{epi}(F)$ under the linear transformation $P$, and therefore, by Exercise 3.9(b), the set $P\big(\mathrm{epi}(F)\big)$ is polyhedral. Furthermore, a polyhedral set is always closed, and hence

$$P\big(\mathrm{epi}(F)\big) = \mathrm{cl}\Big(P\big(\mathrm{epi}(F)\big)\Big).$$

The preceding two relations yield

$$\mathrm{epi}(f) = P\big(\mathrm{epi}(F)\big),$$

implying that the function $f$ is polyhedral.

3.14 (Existence of Minima of Polyhedral Functions)

If the set of minima of $f$ over $P$ is nonempty, then evidently $\inf_{x \in P} f(x)$ must be finite.
Conversely, suppose that $\inf_{x \in P} f(x)$ is finite. Since $f$ is a polyhedral function, by Prop. 3.2.3, we have

$$f(x) = \max\big\{a_1'x + b_1, \ldots, a_m'x + b_m\big\}, \qquad \forall\ x \in \mathrm{dom}(f),$$

where $\mathrm{dom}(f)$ is a polyhedral set. Therefore,

$$\inf_{x \in P} f(x) = \inf_{x \in P \cap \mathrm{dom}(f)} f(x) = \inf_{x \in P \cap \mathrm{dom}(f)} \max\big\{a_1'x + b_1, \ldots, a_m'x + b_m\big\}.$$

Let $\bar{P} = P \cap \mathrm{dom}(f)$ and note that $\bar{P}$ is nonempty by assumption. Since $\bar{P}$ is the intersection of the polyhedral sets $P$ and $\mathrm{dom}(f)$, the set $\bar{P}$ is polyhedral. The problem

$$\begin{aligned}
&\text{minimize} && \max\big\{a_1'x + b_1, \ldots, a_m'x + b_m\big\} \\
&\text{subject to} && x \in \bar{P}
\end{aligned}$$

is equivalent to the following linear program

$$\begin{aligned}
&\text{minimize} && y \\
&\text{subject to} && a_j'x + b_j \le y,\ j = 1, \ldots, m, \quad x \in \bar{P}, \quad y \in \Re.
\end{aligned}$$

By introducing the variable $z = (x, y) \in \Re^{n+1}$, the vector $c = (0, \ldots, 0, 1) \in \Re^{n+1}$, and the set

$$\hat{P} = \big\{(x, y) \mid a_j'x + b_j \le y,\ j = 1, \ldots, m,\ x \in \bar{P},\ y \in \Re\big\},$$

we see that the original problem is equivalent to

$$\begin{aligned}
&\text{minimize} && c'z \\
&\text{subject to} && z \in \hat{P},
\end{aligned}$$

where $\hat{P}$ is polyhedral ($\hat{P} \ne \emptyset$ since $\bar{P} \ne \emptyset$). Furthermore, because $\inf_{x \in P} f(x)$ is finite, it follows that $\inf_{z \in \hat{P}} c'z$ is also finite. Thus, by Prop. 2.3.4 of Chapter 2, the set $Z^*$ of minimizers of $c'z$ over $\hat{P}$ is nonempty, and the nonempty set $\big\{x \mid z = (x, y),\ z \in Z^*\big\}$ is the set of minimizers of $f$ over $P$.

3.15 (Existence of Solutions of Quadratic Nonconvex Programs [FrW56])

We use induction on the dimension of the set $X$. Suppose that the dimension of $X$ is 0. Then, $X$ consists of a single point, which is the global minimum of $f$ over $X$.
Assume that, for some $l < n$, $f$ attains its minimum over every set $\bar{X}$ of dimension less than or equal to $l$ that is specified by linear inequality constraints, and is such that $f$ is bounded over $\bar{X}$. Let $X$ be of the form

$$X = \{x \mid a_j'x \le b_j,\ j = 1, \ldots, r\},$$

have dimension $l + 1$, and be such that $f$ is bounded over $X$. We will show that $f$ attains its minimum over $X$.
If $X$ is a bounded polyhedral set, $f$ attains a minimum over $X$ by Weierstrass' Theorem. We thus assume that $X$ is unbounded. Using the Minkowski-Weyl representation, we can write $X$ as

$$X = \{x \mid x = v + \alpha y,\ v \in V,\ y \in C,\ \alpha \ge 0\},$$

where $V$ is the convex hull of finitely many vectors and $C$ is the intersection of a finitely generated cone with the surface of the unit sphere $\{x \mid \|x\| = 1\}$. Then, for any $x \in X$ and $y \in C$, the vector $x + \alpha y$ belongs to $X$ for every positive scalar $\alpha$ and

$$f(x + \alpha y) = f(x) + \alpha(c' + x'Q)y + \alpha^2 y'Qy.$$

In view of the assumption that $f$ is bounded over $X$, this implies that $y'Qy \ge 0$ for all $y \in C$.
If $y'Qy > 0$ for all $y \in C$, then, since $C$ and $V$ are compact, there exist some $\delta > 0$ and $\gamma > 0$ such that $y'Qy > \delta$ for all $y \in C$, and $(c' + v'Q)y > -\gamma$ for all $v \in V$ and $y \in C$. It follows that for all $v \in V$, $y \in C$, and $\alpha \ge \gamma/\delta$, we have

$$\begin{aligned}
f(v + \alpha y) &= f(v) + \alpha(c' + v'Q)y + \alpha^2 y'Qy \\
&> f(v) + \alpha(-\gamma + \alpha\delta) \\
&\ge f(v),
\end{aligned}$$

which implies that

$$\inf_{x \in X} f(x) = \inf_{\substack{x \in (V + \alpha C) \\ 0 \le \alpha \le \gamma/\delta}} f(x).$$

Since the minimization on the right-hand side is over a compact set, it follows from Weierstrass' Theorem and the preceding relation that the minimum of $f$ over $X$ is attained.
Next, assume that there exists some $y \in C$ such that $y'Qy = 0$. From Exercise 3.8, it follows that $y$ belongs to the recession cone of $X$, denoted by $R_X$. If $y$ is in the lineality space of $X$, denoted by $L_X$, the vector $x + \alpha y$ belongs to $X$ for every $x \in X$ and every scalar $\alpha$, and we have

$$f(x + \alpha y) = f(x) + \alpha(c' + x'Q)y.$$

This relation together with the boundedness of $f$ over $X$ implies that

$$(c' + x'Q)y = 0, \qquad \forall\ x \in X. \tag{3.12}$$

Let $S = \{\gamma y \mid \gamma \in \Re\}$ be the subspace generated by $y$ and consider the following decomposition of $X$:

$$X = S + (X \cap S^\perp),$$

(cf. Prop. 1.5.4). Then, we can write any $x \in X$ as $x = z + \alpha y$ for some $z \in X \cap S^\perp$ and some scalar $\alpha$, and it follows from Eq. (3.12) that $f(x) = f(z)$, which implies that

$$\inf_{x \in X} f(x) = \inf_{x \in X \cap S^\perp} f(x).$$

It can be seen that the dimension of the set $X \cap S^\perp$ is smaller than the dimension of the set $X$. To see this, note that $S^\perp$ contains the subspace parallel to the affine hull of $X \cap S^\perp$. Therefore, $y$ does not belong to the subspace parallel to the affine hull of $X \cap S^\perp$. On the other hand, $y$ belongs to the subspace parallel to the affine hull of $X$, hence showing that the dimension of the set $X \cap S^\perp$ is smaller than the dimension of the set $X$. Since $X \cap S^\perp \subset X$, $f$ is bounded over $X \cap S^\perp$, so by using the induction hypothesis, it follows that $f$ attains its minimum over $X \cap S^\perp$, which, in view of the preceding relation, is also the minimum of $f$ over $X$.
Finally, assume that y is not in LX , i.e., y ∈ RX , but −y ∈ / RX . The
recession cone of X is of the form

RX = {y | aj y ≤ 0, j = 1, . . . , r}.

Since y ∈ RX , we have

aj y ≤ 0, ∀ j = 1, . . . , r,

and since −y ∈
/ RX , the index set

J = {j | aj y < 0}

is nonempty.
Let {xk } be a minimizing sequence, i.e.,

f (xk ) → f ∗ ,

where f ∗ = inf x inX f (x). Suppose that for each k, we start at xk and move
along −y as far as possible without leaving the set X, up to the point where we
encounter the vector
xk = xk − βk y,
where βk is the nonnegative scalar given by

aj xk − bj
βk = min .
j∈J aj y

Since y ∈ RX and f is bounded over X, we have (c + x Q)y ≥ 0 for all x ∈ X,


which implies that
f (xk ) ≤ f (xk ), ∀ k. (3.13)

20
By construction of the sequence $\{\bar{x}_k\}$, it follows that there exists some $j_0 \in J$ such
that $a_{j_0}'\bar{x}_k = b_{j_0}$ for all $k$ in an infinite index set $K \subset \{0, 1, \ldots\}$. By reordering
the linear inequalities if necessary, we can assume that $j_0 = 1$, i.e.,

$$a_1'\bar{x}_k = b_1, \quad \forall\ k \in K.$$

To apply the induction hypothesis, consider the set

$$\bar{X} = \{x \mid a_1'x = b_1,\ a_j'x \le b_j,\ j = 2, \ldots, r\},$$

and note that $\{\bar{x}_k\}_K \subset \bar{X}$. The dimension of $\bar{X}$ is smaller than the dimension
of $X$. To see this, note that the set $\{x \mid a_1'x = b_1\}$ contains $\bar{X}$, so that $a_1$ is
orthogonal to the subspace $S_{\bar{X}}$ that is parallel to $\mathrm{aff}(\bar{X})$. Since $a_1'y < 0$, it follows
that $y \notin S_{\bar{X}}$. On the other hand, $y$ belongs to $S_X$, the subspace that is parallel
to $\mathrm{aff}(X)$, since for all $k$, we have $x_k \in X$ and $x_k - \beta_k y \in X$.
Since $\bar{X} \subset X$, $f$ is also bounded over $\bar{X}$, so it follows from the induction
hypothesis that $f$ attains its minimum over $\bar{X}$ at some $x^*$. Because $\{\bar{x}_k\}_K \subset \bar{X}$,
and using also Eq. (3.13), we have

$$f(x^*) \le f(\bar{x}_k) \le f(x_k), \quad \forall\ k \in K.$$

Since $f(x_k) \to f^*$, we obtain

$$f(x^*) \le \lim_{k \to \infty,\ k \in K} f(x_k) = f^*,$$

and since $x^* \in \bar{X} \subset X$, this implies that $f$ attains the minimum over $X$ at $x^*$,
concluding the proof.
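
The step from a point of the minimizing sequence back to the boundary of X can be illustrated numerically. The following is only a sketch under stated assumptions: the function name `step_to_boundary` and the list-of-rows representation of the inequalities are hypothetical choices made for illustration, and y is assumed to be a recession direction with a nonempty index set J.

```python
from fractions import Fraction

def dot(u, v):
    """Exact inner product of two vectors."""
    return sum(Fraction(ui) * Fraction(vi) for ui, vi in zip(u, v))

def step_to_boundary(A, b, x, y):
    """Return (beta, x_bar) with x_bar = x - beta*y.

    Assumes x satisfies a_j'x <= b_j for all rows a_j of A, and that
    a_j'y <= 0 for all j with a_j'y < 0 for at least one j.
    beta is the ratio min over J of (a_j'x - b_j) / (a_j'y).
    """
    J = [j for j, a in enumerate(A) if dot(a, y) < 0]
    if not J:
        raise ValueError("index set J is empty")
    beta = min((dot(A[j], x) - b[j]) / dot(A[j], y) for j in J)
    x_bar = [Fraction(xi) - beta * Fraction(yi) for xi, yi in zip(x, y)]
    return beta, x_bar

# Nonnegative orthant in the plane: -x1 <= 0, -x2 <= 0.
A = [[-1, 0], [0, -1]]
b = [0, 0]
beta, x_bar = step_to_boundary(A, b, x=[3, 5], y=[1, 1])
```

In this example the first inequality becomes active at the returned point, which mirrors the role of the index j0 in the argument above.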

3.16

Assume that $P$ has an extreme point, say $v$. Then, by Prop. 3.3.3(a), the set

$$A_v = \bigl\{a_j \mid a_j'v = b_j,\ j \in \{1, \ldots, r\}\bigr\}$$

contains $n$ linearly independent vectors, so the set of vectors $\{a_j \mid j = 1, \ldots, r\}$
contains a subset of $n$ linearly independent vectors.
Assume now that the set $\{a_j \mid j = 1, \ldots, r\}$ contains a subset of $n$ linearly
independent vectors. Suppose, to obtain a contradiction, that $P$ does not have
any extreme points. Then, by Prop. 3.3.1, $P$ contains a line

$$L = \{x + \lambda d \mid \lambda \in \Re\},$$

where $x \in P$ and $d \in \Re^n$ is a nonzero vector. Since $L \subset P$, it follows that $a_j'd = 0$
for all $j = 1, \ldots, r$. Since $d \ne 0$, this implies that the set $\{a_1, \ldots, a_r\}$ cannot
contain a subset of $n$ linearly independent vectors, a contradiction.
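
The criterion just proved reduces the existence of an extreme point to a rank computation. The sketch below assumes that P = {x | Ax ≤ b} is nonempty; the helper names `rank` and `has_extreme_point` are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        # Find a pivot in column c among the rows not yet used.
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [u - f * v for u, v in zip(M[i], M[r])]
        r += 1
    return r

def has_extreme_point(A):
    """For a nonempty P = {x | Ax <= b}: True iff the rows a_j of A
    contain n linearly independent vectors (n = number of columns)."""
    return rank(A) == len(A[0])

strip = [[0, 1], [0, -1]]                      # 0 <= x2 <= 1, contains a line
square = [[1, 0], [-1, 0], [0, 1], [0, -1]]    # unit square
```

The strip contains the line along the first coordinate axis and has no extreme points, while the square has four.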

3.17

Suppose that $x$ is not an extreme point of $C$. Then $x = \alpha x_1 + (1 - \alpha)x_2$ for
some $x_1, x_2 \in C$ with $x_1 \ne x$ and $x_2 \ne x$, and a scalar $\alpha \in (0, 1)$, so that
$Ax = \alpha Ax_1 + (1 - \alpha)Ax_2$. Since the columns of $A$ are linearly independent, we
have $Ay_1 = Ay_2$ if and only if $y_1 = y_2$. Therefore, $Ax_1 \ne Ax$ and $Ax_2 \ne Ax$,
implying that $Ax$ is a convex combination of two distinct points in $AC$, i.e., $Ax$
is not an extreme point of $AC$.
Suppose now that $Ax$ is not an extreme point of $AC$, so that $Ax = \alpha Ax_1 +
(1 - \alpha)Ax_2$ for some $x_1, x_2 \in C$ with $Ax_1 \ne Ax$ and $Ax_2 \ne Ax$, and a scalar
$\alpha \in (0, 1)$. Then, $A\bigl(x - \alpha x_1 - (1 - \alpha)x_2\bigr) = 0$ and since the columns of $A$ are
linearly independent, it follows that $x = \alpha x_1 + (1 - \alpha)x_2$. Furthermore, because
$Ax_1 \ne Ax$ and $Ax_2 \ne Ax$, we must have $x_1 \ne x$ and $x_2 \ne x$, implying that $x$ is
not an extreme point of $C$.
As an example showing that if the columns of $A$ are linearly dependent,
then $Ax$ can be an extreme point of $AC$, for some non-extreme point $x$ of $C$,
consider the $1 \times 2$ matrix $A = [1\ \ 0]$, whose columns are linearly dependent. The
polyhedral set $C$ given by

$$C = \{(x_1, x_2) \mid x_1 \ge 0,\ 0 \le x_2 \le 1\}$$

has two extreme points, $(0,0)$ and $(0,1)$. Its image $AC \subset \Re$ is given by

$$AC = \{x_1 \mid x_1 \ge 0\},$$

whose unique extreme point is $x_1 = 0$. The point $x = (0, 1/2) \in C$ is not an
extreme point of $C$, while its image $Ax = 0$ is an extreme point of $AC$. Actually,
all the points in $C$ on the line segment connecting $(0,0)$ and $(0,1)$, except for
$(0,0)$ and $(0,1)$, are non-extreme points of $C$ that are mapped under $A$ into the
extreme point 0 of $AC$.
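
The counterexample can be checked by direct computation. This is a small illustrative sketch; the helper name `image` is hypothetical.

```python
from fractions import Fraction

def image(A, x):
    """Return the vector A x for a matrix A given as a list of rows."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

A = [[1, 0]]                 # the columns 1 and 0 are linearly dependent
v0, v1 = (0, 0), (0, 1)      # the two extreme points of C
# Midpoint of the two extreme points: (0, 1/2), a non-extreme point of C.
mid = tuple(Fraction(p + q, 2) for p, q in zip(v0, v1))

# Collect the images of the three points under A.
images = {image(A, p) for p in (v0, v1, mid)}
```

All three points have the same image, namely the extreme point 0 of AC, even though only two of them are extreme points of C.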

3.18

For the sets $C_1$ and $C_2$ as given in this exercise, the set $C_1 \cup C_2$ is compact, and
its convex hull is also compact by Prop. 1.3.2 of Chapter 1. The set of extreme
points of $\mathrm{conv}(C_1 \cup C_2)$ is not closed, since it consists of the two end points of the
line segment $C_1$, namely $(0, 0, -1)$ and $(0, 0, 1)$, and all the points $x = (x_1, x_2, x_3)$
such that
$$x \ne 0, \quad (x_1 - 1)^2 + x_2^2 = 1, \quad x_3 = 0.$$

3.19

By Prop. 3.3.2, a polyhedral set has a finite number of extreme points. Conversely,
let $P$ be a compact convex set having a finite number of extreme points
$\{v_1, \ldots, v_m\}$. By the Krein-Milman Theorem (Prop. 3.3.1), a compact convex set
is equal to the convex hull of its extreme points, so that $P = \mathrm{conv}\bigl(\{v_1, \ldots, v_m\}\bigr)$,
which is a polyhedral set by the Minkowski-Weyl Representation Theorem (Prop.
3.2.2).
As an example showing that the assertion fails if compactness of the set
is replaced by the weaker assumption that the set is closed and contains no lines,
consider the set $D \subset \Re^3$ given by

$$D = \{(x_1, x_2, x_3) \mid x_1^2 + x_2^2 \le 1,\ x_3 = 1\}.$$

Let $C = \mathrm{cone}(D)$. It can be seen that $C$ is not a polyhedral set. On the other hand,
$C$ is closed, convex, does not contain a line, and has a unique extreme point at
the origin.
[For a more formal argument, note that if $C$ were polyhedral, then the set

$$D = C \cap \{(x_1, x_2, x_3) \mid x_3 = 1\}$$

would also be polyhedral by Exercise 3.10(d), since both $C$ and $\{(x_1, x_2, x_3) \mid
x_3 = 1\}$ are polyhedral sets. Thus, by Prop. 3.2.2, it would follow that $D$ has a
finite number of extreme points. But this is a contradiction because the set of
extreme points of $D$ coincides with $\{(x_1, x_2, x_3) \mid x_1^2 + x_2^2 = 1,\ x_3 = 1\}$, which
contains an infinite number of points. Thus, $C$ is not a polyhedral cone, and
therefore not a polyhedral set, while $C$ is closed, convex, does not contain a line,
and has a unique extreme point at the origin.]

3.20 (Faces)

(a) Let $P$ be a polyhedral set in $\Re^n$, and let $F = P \cap H$ be a face of $P$, where
$H$ is a hyperplane passing through some boundary point $\bar{x}$ of $P$ and containing
$P$ in one of its halfspaces. Then $H$ is given by $H = \{x \mid a'x = a'\bar{x}\}$ for some
nonzero vector $a \in \Re^n$. By replacing $a'x = a'\bar{x}$ with two inequalities $a'x \le a'\bar{x}$
and $-a'x \le -a'\bar{x}$, we see that $H$ is a polyhedral set in $\Re^n$. Since the intersection
of two nondisjoint polyhedral sets is a polyhedral set [cf. Exercise 3.10(d)], the
set $F = P \cap H$ is polyhedral.
(b) Let $P$ be given by

$$P = \{x \mid a_j'x \le b_j,\ j = 1, \ldots, r\},$$

for some vectors $a_j \in \Re^n$ and scalars $b_j$. Let $v$ be an extreme point of $P$, and
without loss of generality assume that the first $n$ inequalities define $v$, i.e., the
first $n$ of the vectors $a_j$ are linearly independent and such that

$$a_j'v = b_j, \quad \forall\ j = 1, \ldots, n$$

[cf. Prop. 3.3.3(a)]. Define the vector $a \in \Re^n$, the scalar $b$, and the hyperplane
$H$ as follows

$$a = \frac{1}{n}\sum_{j=1}^n a_j, \qquad b = \frac{1}{n}\sum_{j=1}^n b_j, \qquad H = \{x \mid a'x = b\}.$$

Then, we have
$$a'v = b,$$
so that $H$ passes through $v$. Moreover, for every $x \in P$, we have $a_j'x \le b_j$ for
all $j$, implying that $a'x \le b$ for all $x \in P$. Thus, $H$ contains $P$ in one of its
halfspaces.
We will next prove that $P \cap H = \{v\}$. We start by showing that for every
$\bar{v} \in P \cap H$, we must have

$$a_j'\bar{v} = b_j, \quad \forall\ j = 1, \ldots, n. \tag{3.14}$$

To arrive at a contradiction, assume that $a_j'\bar{v} < b_j$ for some $\bar{v} \in P \cap H$ and $j \in
\{1, \ldots, n\}$. Without loss of generality, we can assume that the strict inequality
holds for $j = 1$, so that

$$a_1'\bar{v} < b_1, \qquad a_j'\bar{v} \le b_j, \quad \forall\ j = 2, \ldots, n.$$

By multiplying each of the above inequalities with $1/n$ and by summing the
obtained inequalities, we obtain

$$\frac{1}{n}\sum_{j=1}^n a_j'\bar{v} < \frac{1}{n}\sum_{j=1}^n b_j,$$

implying that $a'\bar{v} < b$, which contradicts the fact that $\bar{v} \in H$. Hence, Eq. (3.14)
holds, and since the vectors $a_1, \ldots, a_n$ are linearly independent, it follows that
$\bar{v} = v$, showing that $P \cap H = \{v\}$.
As discussed in Section 3.3, every extreme point of P is a relative boundary
point of P . Since every relative boundary point of P is also a boundary point of
P , it follows that every extreme point of P is a boundary point of P . Thus, v is
a boundary point of P , and as shown earlier, H passes through v and contains P
in one of its halfspaces. By definition, it follows that P ∩ H = {v} is a face of P .
(c) Since $P$ is not an affine set, it cannot consist of a single point, so we must
have $\dim(P) > 0$. Let $P$ be given by

$$P = \{x \mid a_j'x \le b_j,\ j = 1, \ldots, r\},$$

for some vectors $a_j \in \Re^n$ and scalars $b_j$. Also, let $A$ be the matrix with rows $a_j'$
and $b$ be the vector with components $b_j$, so that

$$P = \{x \mid Ax \le b\}.$$

An inequality $a_j'x \le b_j$ of the system $Ax \le b$ is redundant if it is implied by the
remaining inequalities in the system. If the system $Ax \le b$ has no redundant
inequalities, we say that the system is nonredundant. An inequality $a_j'x \le b_j$ of
the system $Ax \le b$ is an implicit equality if $a_j'x = b_j$ for all $x$ satisfying $Ax \le b$.
By removing the redundant inequalities if necessary, we may assume that
the system $Ax \le b$ defining $P$ is nonredundant. Since $P$ is not an affine set,

there exists an inequality $a_{j_0}'x \le b_{j_0}$ that is not an implicit equality of the
system $Ax \le b$. Consider the set

$$F = \{x \in P \mid a_{j_0}'x = b_{j_0}\}.$$

Note that $F \ne \emptyset$, since otherwise $a_{j_0}'x \le b_{j_0}$ would be a redundant inequality
of the system $Ax \le b$, contradicting our earlier assumption that the system is
nonredundant. Note also that every point of $F$ is a boundary point of $P$. Thus, $F$
is the intersection of $P$ and the hyperplane $\{x \mid a_{j_0}'x = b_{j_0}\}$ that passes through
a boundary point of $P$ and contains $P$ in one of its halfspaces, i.e., $F$ is a face
of $P$. Since $a_{j_0}'x \le b_{j_0}$ is not an implicit equality of the system $Ax \le b$, the
dimension of $F$ is $\dim(P) - 1$.
(d) Let $P$ be a polyhedral set given by

$$P = \{x \mid a_j'x \le b_j,\ j = 1, \ldots, r\},$$

with $a_j \in \Re^n$ and $b_j \in \Re$, or equivalently

$$P = \{x \mid Ax \le b\},$$

where $A$ is an $r \times n$ matrix and $b \in \Re^r$. We will show that $F$ is a face of $P$ if and
only if $F$ is nonempty and

$$F = \{x \in P \mid a_j'x = b_j,\ j \in J\},$$

where $J \subset \{1, \ldots, r\}$. From this it will follow that the number of distinct faces
of $P$ is finite.
By removing the redundant inequalities if necessary, we may assume that
the system $Ax \le b$ defining $P$ is nonredundant. Let $F$ be a face of $P$, so that
$F = P \cap H$, where $H$ is a hyperplane that passes through a boundary point of $P$
and contains $P$ in one of its halfspaces. Let $H = \{x \mid c'x = c'\bar{x}\}$ for a nonzero
vector $c \in \Re^n$ and a boundary point $\bar{x}$ of $P$, so that

$$F = \{x \in P \mid c'x = c'\bar{x}\}$$

and
$$c'x \le c'\bar{x}, \quad \forall\ x \in P.$$
These relations imply that the set of points $x$ such that $Ax \le b$ and $c'x \le c'\bar{x}$
coincides with $P$, and since the system $Ax \le b$ is nonredundant, it follows that
$c'x \le c'\bar{x}$ is a redundant inequality of the system $Ax \le b$ and $c'x \le c'\bar{x}$. Therefore,
the inequality $c'x \le c'\bar{x}$ is implied by the inequalities of $Ax \le b$, so that there
exists some $\mu \in \Re^r$ with $\mu \ge 0$ such that

$$\sum_{j=1}^r \mu_j a_j = c, \qquad \sum_{j=1}^r \mu_j b_j = c'\bar{x}.$$

Let $J = \{j \mid \mu_j > 0\}$. Then, for every $x \in P$, we have

$$c'x = c'\bar{x} \iff \sum_{j \in J} \mu_j a_j'x = \sum_{j \in J} \mu_j b_j \iff a_j'x = b_j,\ j \in J, \tag{3.15}$$

implying that
$$F = \{x \in P \mid a_j'x = b_j,\ j \in J\}.$$
Conversely, let $F$ be a nonempty set given by

$$F = \{x \in P \mid a_j'x = b_j,\ j \in J\},$$

for some $J \subset \{1, \ldots, r\}$. Define

$$c = \sum_{j \in J} a_j, \qquad \beta = \sum_{j \in J} b_j.$$

Then, we have

$$\{x \in P \mid a_j'x = b_j,\ j \in J\} = \{x \in P \mid c'x = \beta\},$$

[cf. Eq. (3.15) where $\mu_j = 1$ for all $j \in J$]. Let $H = \{x \mid c'x = \beta\}$, so that in
view of the preceding relation, we have that $F = P \cap H$. Since every point of $F$
is a boundary point of $P$, it follows that $H$ passes through a boundary point of
$P$. Furthermore, for every $x \in P$, we have $a_j'x \le b_j$ for all $j \in J$, implying that
$c'x \le \beta$ for every $x \in P$. Thus, $H$ contains $P$ in one of its halfspaces. Hence, $F$
is a face.
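
The characterization of part (d) suggests a brute-force way to list the faces of a small polytope: try every nonempty index set J and keep the nonempty sets of the above form. The sketch below makes two assumptions that are not part of the solution: P is bounded, and its vertex list is supplied by the caller, so that each face can be identified by the vertices it contains. The name `faces_from_active_sets` is hypothetical.

```python
from itertools import combinations

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def faces_from_active_sets(A, b, vertices):
    """Distinct nonempty sets {x in P | a_j'x = b_j, j in J}, J nonempty,
    each represented by the frozenset of vertices of P lying on it."""
    faces = set()
    r = len(A)
    for k in range(1, r + 1):
        for J in combinations(range(r), k):
            on_face = frozenset(
                v for v in vertices
                if all(dot(A[j], v) == b[j] for j in J)
            )
            if on_face:            # skip index sets J that give an empty set
                faces.add(on_face)
    return faces

# Unit square: x1 <= 1, -x1 <= 0, x2 <= 1, -x2 <= 0.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [1, 0, 1, 0]
vertices = [(0, 0), (1, 0), (0, 1), (1, 1)]
faces = faces_from_active_sets(A, b, vertices)
```

For the unit square this yields the four edges and the four vertices, and the count is bounded by the number of nonempty index sets, in line with the finiteness claim.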

3.21 (Isomorphic Polyhedral Sets)

(a) Let $P$ and $Q$ be isomorphic polyhedral sets, and let $f : P \to Q$ and $g : Q \to P$
be affine functions such that

$$x = g\bigl(f(x)\bigr), \quad \forall\ x \in P, \qquad y = f\bigl(g(y)\bigr), \quad \forall\ y \in Q.$$

Assume that $x^*$ is an extreme point of $P$ and let $y^* = f(x^*)$. We will show that
$y^*$ is an extreme point of $Q$. Since $x^*$ is an extreme point of $P$, by Exercise
3.20(b), it is also a face of $P$, and therefore, there exists a vector $c \in \Re^n$ such
that
$$c'x < c'x^*, \quad \forall\ x \in P,\ x \ne x^*.$$
For any $y \in Q$ with $y \ne y^*$, we have

$$f\bigl(g(y)\bigr) = y \ne y^* = f(x^*),$$

implying that
$$g(y) \ne g(y^*) = x^*, \quad \text{with } g(y) \in P.$$
Hence,
$$c'g(y) < c'g(y^*), \quad \forall\ y \in Q,\ y \ne y^*.$$
Let the affine function $g$ be given by $g(y) = By + d$ for some $n \times m$ matrix $B$
and vector $d \in \Re^n$. Then, we have

$$c'(By + d) < c'(By^* + d), \quad \forall\ y \in Q,\ y \ne y^*,$$

implying that
$$(B'c)'y < (B'c)'y^*, \quad \forall\ y \in Q,\ y \ne y^*.$$
If $y^*$ were not an extreme point of $Q$, then we would have $y^* = \alpha y_1 + (1 - \alpha)y_2$
for some distinct points $y_1, y_2 \in Q$, $y_1 \ne y^*$, $y_2 \ne y^*$, and $\alpha \in (0, 1)$, so that

$$(B'c)'y^* = \alpha(B'c)'y_1 + (1 - \alpha)(B'c)'y_2 < (B'c)'y^*,$$

which is a contradiction. Hence, $y^*$ is an extreme point of $Q$.
Conversely, if $y^*$ is an extreme point of $Q$, then by using a symmetrical
argument, we can show that $x^*$ is an extreme point of $P$.
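(b) For the sets
$$P = \{x \in \Re^n \mid Ax \le b,\ x \ge 0\},$$
$$Q = \{(x, z) \in \Re^{n+r} \mid Ax + z = b,\ x \ge 0,\ z \ge 0\},$$
let $f$ and $g$ be given by

$$f(x) = (x, b - Ax), \quad \forall\ x \in P,$$

$$g(x, z) = x, \quad \forall\ (x, z) \in Q.$$
Evidently, $f$ and $g$ are affine functions. Furthermore, clearly
$$f(x) \in Q, \qquad g\bigl(f(x)\bigr) = x, \quad \forall\ x \in P,$$
$$g(x, z) \in P, \qquad f\bigl(g(x, z)\bigr) = (x, z), \quad \forall\ (x, z) \in Q,$$
where the last equality uses $z = b - Ax$ for all $(x, z) \in Q$.
Hence, $P$ and $Q$ are isomorphic.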

3.22 (Unimodularity I)

Suppose that the solution of the system $Ax = b$ has integer components for every vector $b \in \Re^n$
with integer components. Since $A$ is invertible, it follows that the vector $A^{-1}b$ has
integer components for every $b \in \Re^n$ with integer components. For $i = 1, \ldots, n$,
let $e_i$ be the vector with $i$th component equal to 1 and all other components equal
to 0. Then, for $b = e_i$, the vectors $A^{-1}e_i$, $i = 1, \ldots, n$, have integer components,
implying that the columns of $A^{-1}$ are vectors with integer components, so that
$A^{-1}$ has integer entries. Therefore, $\det(A^{-1})$ is integer, and since $\det(A)$ is
also integer and $\det(A) \cdot \det(A^{-1}) = 1$, it follows that either $\det(A) = 1$ or
$\det(A) = -1$, showing that $A$ is unimodular.
Suppose now that $A$ is unimodular. Take any vector $b \in \Re^n$ with integer
components, and for each $i \in \{1, \ldots, n\}$, let $A_i$ be the matrix obtained from $A$
by replacing the $i$th column of $A$ with $b$. Then, according to Cramer's rule, the
components of the solution $\hat{x}$ of the system $Ax = b$ are given by
$$\hat{x}_i = \frac{\det(A_i)}{\det(A)}, \quad i = 1, \ldots, n.$$
Since each matrix $A_i$ has integer entries, it follows that $\det(A_i)$ is integer for all
$i = 1, \ldots, n$. Furthermore, because $A$ is invertible and unimodular, we have either
$\det(A) = 1$ or $\det(A) = -1$, implying that the vector $\hat{x}$ has integer components.
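
The Cramer's rule argument can be traced on a small example. The sketch below is illustrative only: the helper names `det` and `cramer_solve` are hypothetical, and the cofactor expansion is meant for small integer matrices rather than for efficiency.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer_solve(A, b):
    """Solve Ax = b via x_i = det(A_i) / det(A), where A_i is A with
    its i-th column replaced by b. Rows of A must be Python lists."""
    d = det(A)
    if d == 0:
        raise ValueError("A is singular")
    solution = []
    for i in range(len(A)):
        A_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution

unimodular = [[2, 1], [1, 1]]        # determinant 1
not_unimodular = [[2, 0], [0, 1]]    # determinant 2
x_int = cramer_solve(unimodular, [3, 5])
x_frac = cramer_solve(not_unimodular, [1, 0])
```

With the unimodular matrix the solution is integer for the integer right-hand side, while the matrix with determinant 2 produces a fractional component for the right-hand side e_1.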

3.23 (Unimodularity II)

(a) The proof is straightforward from the definition of the totally unimodular
matrix and the fact that $B$ is a submatrix of $A$ if and only if $B'$ is a submatrix
of $A'$.
(b) Suppose that $A$ is totally unimodular. Let $J$ be a subset of $\{1, \ldots, n\}$. Define
$z$ by $z_j = 1$ if $j \in J$, and $z_j = 0$ otherwise. Also let $w = Az$, $c_i = d_i = \frac{1}{2}w_i$ if
$w_i$ is even, and $c_i = \frac{1}{2}(w_i - 1)$ and $d_i = \frac{1}{2}(w_i + 1)$ if $w_i$ is odd. Consider the
polyhedral set
$$P = \{x \mid c \le Ax \le d,\ 0 \le x \le z\},$$
and note that $P \ne \emptyset$ because $\frac{1}{2}z \in P$. Since $A$ is totally unimodular, the
polyhedron $P$ has integer extreme points. Let $\hat{x} \in P$ be one of them. Because
$0 \le \hat{x} \le z$ and $\hat{x}$ has integer components, it follows that $\hat{x}_j = 0$ for $j \notin J$ and
$\hat{x}_j \in \{0, 1\}$ for $j \in J$. Therefore, $z_j - 2\hat{x}_j = \pm 1$ for $j \in J$. Define $J_1 = \{j \in J \mid
z_j - 2\hat{x}_j = 1\}$ and $J_2 = \{j \in J \mid z_j - 2\hat{x}_j = -1\}$. We have

$$\sum_{j \in J_1} a_{ij} - \sum_{j \in J_2} a_{ij} = \sum_{j \in J} a_{ij}(z_j - 2\hat{x}_j) = \sum_{j=1}^n a_{ij}(z_j - 2\hat{x}_j) = [Az]_i - 2[A\hat{x}]_i = w_i - 2[A\hat{x}]_i,$$

where $[Ax]_i$ denotes the $i$th component of the vector $Ax$. If $w_i$ is even, then since
$c_i \le [A\hat{x}]_i \le d_i$ and $c_i = d_i = \frac{1}{2}w_i$, it follows that $[A\hat{x}]_i = \frac{1}{2}w_i$, so that

$$w_i - 2[A\hat{x}]_i = 0, \quad \text{when } w_i \text{ is even}.$$

If $w_i$ is odd, then since $c_i \le [A\hat{x}]_i \le d_i$, $c_i = \frac{1}{2}(w_i - 1)$, and $d_i = \frac{1}{2}(w_i + 1)$, it
follows that
$$\frac{1}{2}(w_i - 1) \le [A\hat{x}]_i \le \frac{1}{2}(w_i + 1),$$
implying that
$$-1 \le w_i - 2[A\hat{x}]_i \le 1.$$
Because $w_i - 2[A\hat{x}]_i$ is integer, we conclude that

$$w_i - 2[A\hat{x}]_i \in \{-1, 0, 1\}, \quad \text{when } w_i \text{ is odd}.$$

Therefore,
$$\Bigl|\sum_{j \in J_1} a_{ij} - \sum_{j \in J_2} a_{ij}\Bigr| \le 1, \quad \forall\ i = 1, \ldots, m. \tag{3.16}$$

Suppose now that the matrix $A$ is such that any $J \subset \{1, \ldots, n\}$ can be
partitioned into two subsets so that Eq. (3.16) holds. We prove that $A$ is totally

unimodular, by showing that each of its square submatrices is unimodular, i.e.,
the determinant of every square submatrix of $A$ is $-1$, $0$, or $1$. We use induction
on the size of the square submatrices of $A$.
To start the induction, note that for $J \subset \{1, \ldots, n\}$ with $J$ consisting of a
single element, from Eq. (3.16) we obtain $a_{ij} \in \{-1, 0, 1\}$ for all $i$ and $j$. Assume
now that the determinant of every $(k - 1) \times (k - 1)$ submatrix of $A$ is $-1$, $0$, or
$1$. Let $B$ be a $k \times k$ submatrix of $A$. If $\det(B) = 0$, then we are done, so assume
that $B$ is invertible. Our objective is to prove that $|\det B| = 1$. By Cramer's
rule and the induction hypothesis, we have $B^{-1} = \frac{B^*}{\det(B)}$, where $b^*_{ij} \in \{-1, 0, 1\}$.
By the definition of $B^*$, we have $Bb^*_1 = \det(B)e_1$, where $b^*_1$ is the first column of
$B^*$ and $e_1 = (1, 0, \ldots, 0)'$.
Let $J = \{j \mid b^*_{j1} \ne 0\}$ and note that $J \ne \emptyset$ since $B$ is invertible. Let
$\bar{J}_1 = \{j \in J \mid b^*_{j1} = 1\}$ and $\bar{J}_2 = \{j \in J \mid j \notin \bar{J}_1\}$. Then, since $[Bb^*_1]_i = 0$ for
$i = 2, \ldots, k$, we have

$$[Bb^*_1]_i = \sum_{j=1}^k b_{ij}b^*_{j1} = \sum_{j \in \bar{J}_1} b_{ij} - \sum_{j \in \bar{J}_2} b_{ij} = 0, \quad \forall\ i = 2, \ldots, k.$$

Thus, for each $i = 2, \ldots, k$, the number of indices $j \in J$ with $b_{ij} \ne 0$ is even, so that
for any partition $(\tilde{J}_1, \tilde{J}_2)$ of $J$,
it follows that $\sum_{j \in \tilde{J}_1} b_{ij} - \sum_{j \in \tilde{J}_2} b_{ij}$ is even for all $i = 2, \ldots, k$. By assumption,
there is a partition $(J_1, J_2)$ of $J$ such that

$$\Bigl|\sum_{j \in J_1} b_{ij} - \sum_{j \in J_2} b_{ij}\Bigr| \le 1, \quad \forall\ i = 1, \ldots, k, \tag{3.17}$$

implying that
$$\sum_{j \in J_1} b_{ij} - \sum_{j \in J_2} b_{ij} = 0, \quad \forall\ i = 2, \ldots, k. \tag{3.18}$$

Consider now the value $\alpha = \bigl|\sum_{j \in J_1} b_{1j} - \sum_{j \in J_2} b_{1j}\bigr|$, for which in view
of Eq. (3.17), we have either $\alpha = 0$ or $\alpha = 1$. Define $y \in \Re^k$ by $y_i = 1$ for
$i \in J_1$, $y_i = -1$ for $i \in J_2$, and $y_i = 0$ otherwise. Then, we have $\bigl|[By]_1\bigr| = \alpha$
and by Eq. (3.18), $[By]_i = 0$ for all $i = 2, \ldots, k$. If $\alpha = 0$, then $By = 0$ and
since $B$ is invertible, it follows that $y = 0$, implying that $J = \emptyset$, which is a
contradiction. Hence, we must have $\alpha = 1$ so that $By = \pm e_1$. Without loss of
generality assume that $By = e_1$ (if $By = -e_1$, we can replace $y$ by $-y$). Then,
since $Bb^*_1 = \det(B)e_1$, we see that $B\bigl(b^*_1 - \det(B)y\bigr) = 0$ and since $B$ is invertible,
we must have $b^*_1 = \det(B)y$. Because $y$ and $b^*_1$ are vectors with components $-1$,
$0$, or $1$, it follows that $b^*_1 = \pm y$ and $|\det(B)| = 1$, completing the induction and
showing that $A$ is totally unimodular.
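
The equivalence proved in part (b) can be checked by brute force on small matrices. The sketch below is exponential in the matrix size and only meant as an illustration; the helper names `is_totally_unimodular` and `has_balanced_partitions` are hypothetical. A partition (J1, J2) of J is encoded as a sign pattern, +1 for J1 and -1 for J2.

```python
from itertools import combinations, product

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )

def is_totally_unimodular(A):
    """True iff every square submatrix of A has determinant -1, 0, or 1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

def has_balanced_partitions(A):
    """True iff every nonempty column subset J admits a partition with
    |sum_{J1} a_ij - sum_{J2} a_ij| <= 1 for every row i."""
    m, n = len(A), len(A[0])
    for k in range(1, n + 1):
        for J in combinations(range(n), k):
            if not any(
                all(abs(sum(s * A[i][j] for s, j in zip(signs, J))) <= 1
                    for i in range(m))
                for signs in product((1, -1), repeat=k)
            ):
                return False
    return True

interval = [[1, 1, 0], [0, 1, 1]]
triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # determinant 2
```

The two tests agree on both examples: the first matrix passes both, and the second fails both, since its three columns cannot be split so that every row is balanced.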

3.24 (Unimodularity III)

(a) We show that the determinant of any square submatrix of A is -1, 0, or 1. We


prove this by induction on the size of the square submatrices of A. In particular,

the 1 × 1 submatrices of A are the entries of A, which are -1, 0, or 1. Suppose
that the determinant of each (k − 1) × (k − 1) submatrix of A is -1, 0, or 1, and
consider a k × k submatrix B of A. If B has a zero column, then det(B) = 0
and we are done. If B has a column with a single nonzero component (1 or -1),
then by expanding its determinant along that column and by using the induction
hypothesis, we see that det(B) = 1 or det(B) = −1. Finally, if each column of
B has exactly two nonzero components (one 1 and one -1), the sum of its rows
is zero, so that B is singular and det(B) = 0, completing the proof and showing
that A is totally unimodular.
(b) The proof is based on induction as in part (a). The $1 \times 1$ submatrices of $A$
are the entries of $A$, which are $0$ or $1$. Suppose now that the determinant of each
$(k-1) \times (k-1)$ submatrix of $A$ is $-1$, $0$, or $1$, and consider a $k \times k$ submatrix $B$ of $A$.
Since in each column of $A$, the entries that are equal to 1 appear consecutively, the
same is true for the matrix $B$. Take the first column $b_1$ of $B$. If $b_1 = 0$, then $B$ is
singular and $\det(B) = 0$. If $b_1$ has a single nonzero component, then by expanding
the determinant of $B$ along $b_1$ and by using the induction hypothesis, we see
that $\det(B) = 1$ or $\det(B) = -1$. Finally, let $b_1$ have more than one nonzero
component (its nonzero entries are 1 and appear consecutively). Let $l$ and $p$ be
rows of $B$ such that $b_{i1} = 0$ for all $i < l$ and $i > p$, and $b_{i1} = 1$ for all $l \le i \le p$.
By multiplying the $l$th row of $B$ with $(-1)$ and by adding it to the $(l+1)$st, $(l+2)$nd,
$\ldots$, $p$th row of $B$, we obtain a matrix $\bar{B}$ such that $\det(\bar{B}) = \det(B)$ and the first
column $\bar{b}_1$ of $\bar{B}$ has a single nonzero component. Furthermore, the determinant
of every square submatrix of $\bar{B}$ is $-1$, $0$, or $1$ (this follows from the fact that the
determinant of a square matrix is unaffected by adding a scalar multiple of a
row of the matrix to some of its other rows, and from the induction hypothesis).
Since $\bar{b}_1$ has a single nonzero component, by expanding the determinant of $\bar{B}$
along $\bar{b}_1$, it follows that $\det(\bar{B}) = 1$ or $\det(\bar{B}) = -1$, implying that $\det(B) = 1$ or
$\det(B) = -1$, completing the induction and showing that $A$ is totally unimodular.
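
The row operation used in part (b) can be tried on a small matrix whose columns have consecutive ones. This is a sketch under the assumption that the first column contains at least one 1; the helper name `clear_first_column` is hypothetical, and the row of the first 1 is subtracted only from the other rows that have a 1 in the first column.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )

def clear_first_column(B):
    """Subtract the first row having a 1 in column 0 from every later
    row that also has a 1 in column 0; the determinant is unchanged."""
    ones = [i for i, row in enumerate(B) if row[0] == 1]
    l = ones[0]
    return [
        [v - w for v, w in zip(row, B[l])] if i in ones[1:] else list(row)
        for i, row in enumerate(B)
    ]

B = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
B_bar = clear_first_column(B)
```

After the operation the first column has a single nonzero entry, so expanding the determinant along it reduces the problem to a smaller submatrix, as in the induction step.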

3.25 (Unimodularity IV)

If $A$ is totally unimodular, then by Exercise 3.23(a), its transpose $A'$ is also totally
unimodular and by Exercise 3.23(b), the set $I = \{1, \ldots, m\}$ can be partitioned
into two subsets $I_1$ and $I_2$ such that

$$\Bigl|\sum_{i \in I_1} a_{ij} - \sum_{i \in I_2} a_{ij}\Bigr| \le 1, \quad \forall\ j = 1, \ldots, n.$$

Since $a_{ij} \in \{-1, 0, 1\}$ and exactly two of $a_{1j}, \ldots, a_{mj}$ are nonzero for each $j$, it
follows that
$$\sum_{i \in I_1} a_{ij} - \sum_{i \in I_2} a_{ij} = 0, \quad \forall\ j = 1, \ldots, n.$$

Take any $j \in \{1, \ldots, n\}$, and let $l$ and $p$ be such that $a_{ij} = 0$ for all $i \ne l$ and
$i \ne p$, so that in view of the preceding relation and the fact $a_{ij} \in \{-1, 0, 1\}$, we
see that: if $a_{lj} = -a_{pj}$, then both $l$ and $p$ are in the same subset ($I_1$ or $I_2$); if
$a_{lj} = a_{pj}$, then $l$ and $p$ are not in the same subset.

Suppose now that the rows of $A$ can be divided into two subsets such
that for each column the following property holds: if the two nonzero entries in
the column have the same sign, they are in different subsets, and if they have
the opposite sign, they are in the same subset. By multiplying all the rows in
one of the subsets by $-1$, we obtain the matrix $\bar{A}$ with entries $\bar{a}_{ij} \in \{-1, 0, 1\}$,
and exactly one $1$ and exactly one $-1$ in each of its columns. Therefore, by
Exercise 3.24(a), $\bar{A}$ is totally unimodular, so that every square submatrix of $\bar{A}$
has determinant $-1$, $0$, or $1$. Since the determinant of a square submatrix of $\bar{A}$
and the determinant of the corresponding submatrix of $A$ differ only in sign, it
follows that every square submatrix of $A$ has determinant $-1$, $0$, or $1$, showing
that $A$ is totally unimodular.
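
The row-partition condition can be tested mechanically by treating each column as a constraint between its two nonzero rows and propagating subset labels. The sketch below is one possible way to do this, not part of the solution itself; the function name `find_row_partition` is hypothetical, and every column is assumed to have exactly two nonzero entries, each equal to 1 or -1.

```python
from collections import deque

def find_row_partition(A):
    """Return a list of 0/1 labels (one per row) such that, in every
    column, equal-sign entries get different labels and opposite-sign
    entries get the same label; return None if no such labeling exists."""
    m = len(A)
    adj = [[] for _ in range(m)]
    for j in range(len(A[0])):
        rows = [i for i in range(m) if A[i][j] != 0]
        if len(rows) != 2:
            raise ValueError("each column must have exactly two nonzeros")
        l, p = rows
        differ = 1 if A[l][j] == A[p][j] else 0   # same sign -> different subsets
        adj[l].append((p, differ))
        adj[p].append((l, differ))
    label = [None] * m
    for start in range(m):
        if label[start] is not None:
            continue
        label[start] = 0
        queue = deque([start])
        while queue:                               # breadth-first propagation
            u = queue.popleft()
            for v, differ in adj[u]:
                want = label[u] ^ differ
                if label[v] is None:
                    label[v] = want
                    queue.append(v)
                elif label[v] != want:
                    return None                    # conflicting constraints
    return label

ok = [[1, 0], [1, 1], [0, -1]]
bad = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
```

For the first matrix a valid labeling exists, while the second matrix forces three rows to be pairwise in different subsets, which is impossible with two subsets.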

3.26 (Gordan’s Theorem of the Alternative [Gor73])

(a) Assume that there exist $\hat{x} \in \Re^n$ and $\mu \in \Re^r$ such that both conditions (i) and
(ii) hold, i.e.,
$$a_j'\hat{x} < 0, \quad \forall\ j = 1, \ldots, r, \tag{3.19}$$

$$\mu \ne 0, \qquad \mu \ge 0, \qquad \sum_{j=1}^r \mu_j a_j = 0. \tag{3.20}$$

By premultiplying Eq. (3.19) with $\mu_j \ge 0$ and summing the obtained inequalities
over $j$, we have (since $\mu \ne 0$)

$$\sum_{j=1}^r \mu_j a_j'\hat{x} < 0.$$

On the other hand, from Eq. (3.20), we obtain

$$\sum_{j=1}^r \mu_j a_j'\hat{x} = 0,$$

which is a contradiction. Hence, both conditions (i) and (ii) cannot hold
simultaneously.
The proof will be complete if we show that the conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by

$$C_1 = \{w \in \Re^r \mid a_j'x \le w_j,\ j = 1, \ldots, r,\ x \in \Re^n\},$$

$$C_2 = \{\xi \in \Re^r \mid \xi_j < 0,\ j = 1, \ldots, r\}.$$

It can be seen that both $C_1$ and $C_2$ are convex. Furthermore, because the
condition (i) does not hold, $C_1$ and $C_2$ are disjoint sets. Therefore, by the Separating
Hyperplane Theorem (Prop. 2.4.2), $C_1$ and $C_2$ can be separated, i.e., there exists
a nonzero vector $\mu \in \Re^r$ such that

$$\mu'w \ge \mu'\xi, \quad \forall\ w \in C_1,\ \forall\ \xi \in C_2,$$

implying that
$$\inf_{w \in C_1} \mu'w \ge \mu'\xi, \quad \forall\ \xi \in C_2.$$

Since each component $\xi_j$ of $\xi \in C_2$ can be any negative scalar, for the preceding
relation to hold, $\mu_j$ must be nonnegative for all $j$. Furthermore, by letting $\xi \to 0$
in the preceding relation, it follows that

$$\inf_{w \in C_1} \mu'w \ge 0,$$

implying that
$$\mu_1 w_1 + \cdots + \mu_r w_r \ge 0, \quad \forall\ w \in C_1.$$
By setting $w_j = a_j'x$ for all $j$, we obtain

$$(\mu_1 a_1 + \cdots + \mu_r a_r)'x \ge 0, \quad \forall\ x \in \Re^n,$$

and because this relation holds for all $x \in \Re^n$, we must have

$$\mu_1 a_1 + \cdots + \mu_r a_r = 0.$$

Hence, the condition (ii) holds, showing that the conditions (i) and (ii) cannot
fail to hold simultaneously.
Alternative proof: We will show the equivalent statement of part (b), i.e., that
a polyhedral cone contains an interior point if and only if the polar $C^*$ does not
contain a line. This is a special case of Exercise 3.2 (the dimension of $C$ plus the
dimension of the lineality space of $C^*$ is $n$), as well as Exercise 3.6(d), but we
will give an independent proof.
Let
$$C = \{x \mid a_j'x \le 0,\ j = 1, \ldots, r\},$$
where $a_j \ne 0$ for all $j$. Assume that $C$ contains an interior point, and to arrive
at a contradiction, assume that $C^*$ contains a line. Then there exists a $d \ne 0$
such that $d$ and $-d$ belong to $C^*$, i.e., $d'x \le 0$ and $-d'x \le 0$ for all $x \in C$, so
that $d'x = 0$ for all $x \in C$. Thus for the interior point $\bar{x} \in C$, we have $d'\bar{x} = 0$,
and since $d \in C^*$ and $d = \sum_{j=1}^r \mu_j a_j$ for some $\mu_j \ge 0$, we have

$$\sum_{j=1}^r \mu_j a_j'\bar{x} = 0.$$

This is a contradiction, since $\bar{x}$ is an interior point of $C$, and we have $a_j'\bar{x} < 0$ for
all $j$.
Conversely, assume that $C^*$ does not contain a line. Then by Prop. 3.3.1(b),
$C^*$ has an extreme point, and since the origin is the only possible extreme point
of a cone, it follows that the origin is an extreme point of $C^*$, which is the cone
generated by $\{a_1, \ldots, a_r\}$. Therefore $0 \notin \mathrm{conv}\bigl(\{a_1, \ldots, a_r\}\bigr)$, and there exists
a hyperplane that strictly separates the origin from $\mathrm{conv}\bigl(\{a_1, \ldots, a_r\}\bigr)$. Thus,
there exists a vector $x$ such that $y'x < 0$ for all $y \in \mathrm{conv}\bigl(\{a_1, \ldots, a_r\}\bigr)$, so in
particular,
$$a_j'x < 0, \quad \forall\ j = 1, \ldots, r,$$

and $x$ is an interior point of $C$.
(b) Let $C$ be a polyhedral cone given by

$$C = \{x \mid a_j'x \le 0,\ j = 1, \ldots, r\},$$

where $a_j \ne 0$ for all $j$. The interior of $C$ is given by

$$\mathrm{int}(C) = \{x \mid a_j'x < 0,\ j = 1, \ldots, r\},$$

so that $C$ has nonempty interior if and only if the condition (i) of part (a) holds.
By Farkas' Lemma [Prop. 3.2.1(b)], the polar cone of $C$ is given by

$$C^* = \Bigl\{x \Bigm| x = \sum_{j=1}^r \mu_j a_j,\ \mu_j \ge 0,\ j = 1, \ldots, r\Bigr\}.$$

We now show that $C^*$ contains a line if and only if there is a $\mu \in \Re^r$ such that
$\mu \ne 0$, $\mu \ge 0$, and $\sum_{j=1}^r \mu_j a_j = 0$ [condition (ii) of part (a) holds]. Suppose that
$C^*$ contains a line, i.e., a set of the form $\{x + \alpha z \mid \alpha \in \Re\}$, where $x \in C^*$ and
$z$ is a nonzero vector. Since $C^*$ is a closed convex cone, by the Recession Cone
Theorem (Prop. 1.5.1), it follows that $z$ and $-z$ belong to $R_{C^*}$. This implies
that $0 + z = z \in C^*$ and $0 - z = -z \in C^*$, and therefore $z$ and $-z$ can be
represented as

$$z = \sum_{j=1}^r \mu_j a_j, \qquad \mu_j \ge 0\ \ \forall\ j, \quad \mu_j \ne 0 \text{ for some } j,$$

$$-z = \sum_{j=1}^r \bar{\mu}_j a_j, \qquad \bar{\mu}_j \ge 0\ \ \forall\ j, \quad \bar{\mu}_j \ne 0 \text{ for some } j.$$

Thus, $\sum_{j=1}^r (\mu_j + \bar{\mu}_j)a_j = 0$, where $(\mu_j + \bar{\mu}_j) \ge 0$ for all $j$ and $(\mu_j + \bar{\mu}_j) \ne 0$ for
at least one $j$, showing that the condition (ii) of part (a) holds.
Conversely, suppose that $\sum_{j=1}^r \mu_j a_j = 0$ with $\mu_j \ge 0$ for all $j$ and $\mu_j \ne 0$
for some $j$. Assume without loss of generality that $\mu_1 > 0$, so that

$$-a_1 = \sum_{j \ne 1} \frac{\mu_j}{\mu_1} a_j,$$

with $\mu_j/\mu_1 \ge 0$ for all $j$, which implies that $-a_1 \in C^*$. Since $a_1 \in C^*$, $-a_1 \in C^*$,
and $a_1 \ne 0$, it follows that $C^*$ contains a line, completing the proof.
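
The two alternatives of Gordan's theorem are easy to verify once a candidate certificate is given, even though finding one generally requires solving a linear program. The following sketch only checks certificates; the function names `certifies_i` and `certifies_ii` and the two small examples are hypothetical illustrations.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def certifies_i(A, x):
    """Condition (i): a_j'x < 0 for every row a_j of A."""
    return all(dot(a, x) < 0 for a in A)

def certifies_ii(A, mu):
    """Condition (ii): mu != 0, mu >= 0, and sum_j mu_j a_j = 0."""
    n = len(A[0])
    combo = [sum(m * a[k] for m, a in zip(mu, A)) for k in range(n)]
    return (any(m != 0 for m in mu)
            and all(m >= 0 for m in mu)
            and all(c == 0 for c in combo))

A1 = [(1, 0), (0, 1)]     # condition (i) holds, e.g. with x = (-1, -1)
A2 = [(1, 0), (-1, 0)]    # condition (ii) holds with mu = (1, 1)
grid = [(p, q) for p in range(-3, 4) for q in range(-3, 4)]
```

In the second example no point of the sampled grid satisfies condition (i), which is consistent with the theorem, since a certificate for condition (ii) exists.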

3.27 (Linear System Alternatives)

Assume that there exist $\hat{x} \in \Re^n$ and $\mu \in \Re^r$ such that both conditions (i) and
(ii) hold, i.e.,
$$a_j'\hat{x} \le b_j, \quad \forall\ j = 1, \ldots, r, \tag{3.21}$$

$$\mu \ge 0, \qquad \sum_{j=1}^r \mu_j a_j = 0, \qquad \sum_{j=1}^r \mu_j b_j < 0. \tag{3.22}$$

By premultiplying Eq. (3.21) with $\mu_j \ge 0$ and summing the obtained inequalities
over $j$, we have

$$\sum_{j=1}^r \mu_j a_j'\hat{x} \le \sum_{j=1}^r \mu_j b_j.$$

On the other hand, by using Eq. (3.22), we obtain

$$\sum_{j=1}^r \mu_j a_j'\hat{x} = 0 > \sum_{j=1}^r \mu_j b_j,$$

which is a contradiction. Hence, both conditions (i) and (ii) cannot hold
simultaneously.
The proof will be complete if we show that conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by
$$P_1 = \{\xi \in \Re^r \mid \xi_j \le 0,\ j = 1, \ldots, r\},$$
$$P_2 = \{w \in \Re^r \mid a_j'x - b_j = w_j,\ j = 1, \ldots, r,\ x \in \Re^n\}.$$
Clearly, $P_1$ is a polyhedral set. For the set $P_2$, we have

$$P_2 = \{w \in \Re^r \mid Ax - b = w,\ x \in \Re^n\} = R(A) - b,$$

where $A$ is the matrix with rows $a_j'$ and $b$ is the vector with components $b_j$.
Thus, $P_2$ is an affine set and is therefore polyhedral. Furthermore, because the
condition (i) does not hold, $P_1$ and $P_2$ are disjoint polyhedral sets, and they
can be strictly separated [Prop. 2.4.3 under condition (5)]. Hence, there exists a
vector $\mu \in \Re^r$ such that
$$\sup_{\xi \in P_1} \mu'\xi < \inf_{w \in P_2} \mu'w.$$

Since each component $\xi_j$ of $\xi \in P_1$ can be any negative scalar, for the preceding
relation to hold, $\mu_j$ must be nonnegative for all $j$. Furthermore, since $0 \in P_1$, it
follows that
$$0 < \inf_{w \in P_2} \mu'w,$$

implying that
$$0 < \mu_1 w_1 + \cdots + \mu_r w_r, \quad \forall\ w \in P_2.$$
By setting $w_j = a_j'x - b_j$ for all $j$, we obtain

$$\mu_1 b_1 + \cdots + \mu_r b_r < (\mu_1 a_1 + \cdots + \mu_r a_r)'x, \quad \forall\ x \in \Re^n.$$

Since this relation holds for all $x \in \Re^n$, we must have

$$\mu_1 a_1 + \cdots + \mu_r a_r = 0,$$

implying that
$$\mu_1 b_1 + \cdots + \mu_r b_r < 0.$$

Hence, the condition (ii) holds, showing that the conditions (i) and (ii) cannot
fail to hold simultaneously.
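
A multiplier vector satisfying condition (ii) acts as a certificate that the inequality system has no solution. The sketch below checks such a certificate on a one-dimensional example; the function names `solves_system` and `certifies_infeasibility` are hypothetical, and no attempt is made to compute the multipliers.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def solves_system(A, b, x):
    """Condition (i): a_j'x <= b_j for every j."""
    return all(dot(a, x) <= bj for a, bj in zip(A, b))

def certifies_infeasibility(A, b, mu):
    """Condition (ii): mu >= 0, sum_j mu_j a_j = 0, sum_j mu_j b_j < 0."""
    n = len(A[0])
    combo = [sum(m * a[k] for m, a in zip(mu, A)) for k in range(n)]
    return (all(m >= 0 for m in mu)
            and all(c == 0 for c in combo)
            and dot(mu, b) < 0)

A = [(1,), (-1,)]
b_infeasible = [-1, 0]    # x <= -1 and x >= 0: no solution
b_feasible = [1, 0]       # x <= 1 and x >= 0: x = 0 is a solution
```

Adding the two inequalities of the infeasible system with unit multipliers gives 0 ≤ −1, which is exactly the contradiction encoded by condition (ii).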

3.28 (Convex System Alternatives [FGH57])

Assume that there exist $\hat{x} \in C$ and $\mu \in \Re^r$ such that both conditions (i) and (ii)
hold, i.e.,
$$f_j(\hat{x}) < 0, \quad \forall\ j = 1, \ldots, r, \tag{3.23}$$

$$\mu \ne 0, \qquad \mu \ge 0, \qquad \sum_{j=1}^r \mu_j f_j(\hat{x}) \ge 0. \tag{3.24}$$

By premultiplying Eq. (3.23) with $\mu_j \ge 0$ and summing the obtained inequalities
over $j$, we obtain, using the fact $\mu \ne 0$,

$$\sum_{j=1}^r \mu_j f_j(\hat{x}) < 0,$$

contradicting the last relation in Eq. (3.24). Hence, both conditions (i) and (ii)
cannot hold simultaneously.
The proof will be complete if we show that conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by
$$P = \{\xi \in \Re^r \mid \xi_j \le 0,\ j = 1, \ldots, r\},$$
$$C_1 = \{w \in \Re^r \mid f_j(x) < w_j,\ j = 1, \ldots, r,\ x \in C\}.$$

The set $P$ is polyhedral, while $C_1$ is convex by the convexity of $C$ and $f_j$ for all $j$.
Furthermore, since condition (i) does not hold, $P$ and $C_1$ are disjoint, implying
that $\mathrm{ri}(C_1) \cap P = \emptyset$. By the Polyhedral Proper Separation Theorem (cf. Prop.
3.5.1), the polyhedral set $P$ and convex set $C_1$ can be properly separated by a
hyperplane that does not contain $C_1$, i.e., there exists a vector $\mu \in \Re^r$ such that

$$\sup_{\xi \in P} \mu'\xi \le \inf_{w \in C_1} \mu'w, \qquad \inf_{w \in C_1} \mu'w < \sup_{w \in C_1} \mu'w.$$

Since each component $\xi_j$ of $\xi \in P$ can be any negative scalar, the first relation
implies that $\mu_j \ge 0$ for all $j$, while the second relation implies that $\mu \ne 0$.
Furthermore, since $\mu'\xi \le 0$ for all $\xi \in P$ and $0 \in P$, it follows that

$$\sup_{\xi \in P} \mu'\xi = 0 \le \inf_{w \in C_1} \mu'w,$$

implying that
$$0 \le \mu_1 w_1 + \cdots + \mu_r w_r, \quad \forall\ w \in C_1.$$
By letting $w_j \to f_j(x)$ for all $j$, we obtain

$$0 \le \mu_1 f_1(x) + \cdots + \mu_r f_r(x), \quad \forall\ x \in C \cap \mathrm{dom}(f_1) \cap \cdots \cap \mathrm{dom}(f_r).$$

Thus, the convex function

$$f = \mu_1 f_1 + \cdots + \mu_r f_r$$

is finite and nonnegative over the convex set

$$\tilde{C} = C \cap \mathrm{dom}(f_1) \cap \cdots \cap \mathrm{dom}(f_r).$$

By Exercise 1.27, the function $f$ is nonnegative over $\mathrm{cl}(\tilde{C})$. Given that $\mathrm{ri}(C) \subset
\mathrm{dom}(f_i)$ for all $i$, we have $\mathrm{ri}(C) \subset \tilde{C}$, and therefore

$$C \subset \mathrm{cl}\bigl(\mathrm{ri}(C)\bigr) \subset \mathrm{cl}(\tilde{C}).$$

Hence, $f$ is nonnegative over $C$ and condition (ii) holds, showing that the
conditions (i) and (ii) cannot fail to hold simultaneously.

3.29 (Convex-Affine System Alternatives)

Assume that there exist $\hat{x} \in C$ and $\mu \in \Re^r$ such that both conditions (i) and (ii)
hold, i.e.,

$$f_j(\hat{x}) < 0, \quad \forall\ j = 1, \ldots, \bar{r}, \qquad f_j(\hat{x}) \le 0, \quad \forall\ j = \bar{r} + 1, \ldots, r, \tag{3.25}$$

$$(\mu_1, \ldots, \mu_{\bar{r}}) \ne 0, \qquad \mu \ge 0, \qquad \sum_{j=1}^r \mu_j f_j(\hat{x}) \ge 0. \tag{3.26}$$

By premultiplying Eq. (3.25) with $\mu_j \ge 0$ and by summing the obtained
inequalities over $j$, since not all $\mu_1, \ldots, \mu_{\bar{r}}$ are zero, we obtain

$$\sum_{j=1}^r \mu_j f_j(\hat{x}) < 0,$$

contradicting the last relation in Eq. (3.26). Hence, both conditions (i) and (ii)
cannot hold simultaneously.
The proof will be complete if we show that conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by
$$P = \{\xi \in \Re^r \mid \xi_j \le 0,\ j = 1, \ldots, r\},$$
$$C_1 = \{w \in \Re^r \mid f_j(x) < w_j,\ j = 1, \ldots, \bar{r},\ f_j(x) = w_j,\ j = \bar{r} + 1, \ldots, r,\ x \in C\}.$$

The set $P$ is polyhedral, and it can be seen that $C_1$ is convex, since $C$ and
$f_1, \ldots, f_{\bar{r}}$ are convex, and $f_{\bar{r}+1}, \ldots, f_r$ are affine. Furthermore, since the
condition (i) does not hold, $P$ and $C_1$ are disjoint, implying that $\mathrm{ri}(C_1) \cap P = \emptyset$.
Therefore, by the Polyhedral Proper Separation Theorem (cf. Prop. 3.5.1), the
polyhedral set $P$ and convex set $C_1$ can be properly separated by a hyperplane
that does not contain $C_1$, i.e., there exists a vector $\mu \in \Re^r$ such that

$$\sup_{\xi \in P} \mu'\xi \le \inf_{w \in C_1} \mu'w, \qquad \inf_{w \in C_1} \mu'w < \sup_{w \in C_1} \mu'w. \tag{3.27}$$

Since each component $\xi_j$ of $\xi \in P$ can be any negative scalar, the first relation
implies that $\mu_j \ge 0$ for all $j$. Therefore, $\mu'\xi \le 0$ for all $\xi \in P$ and since $0 \in P$, it
follows that
$$\sup_{\xi \in P} \mu'\xi = 0 \le \inf_{w \in C_1} \mu'w.$$

This implies that

$$0 \le \mu_1 w_1 + \cdots + \mu_r w_r, \quad \forall\ w \in C_1,$$

and by letting $w_j \to f_j(x)$ for $j = 1, \ldots, \bar{r}$, we have

$$0 \le \mu_1 f_1(x) + \cdots + \mu_r f_r(x), \quad \forall\ x \in C \cap \mathrm{dom}(f_1) \cap \cdots \cap \mathrm{dom}(f_{\bar{r}}).$$

Thus, the convex function

$$f = \mu_1 f_1 + \cdots + \mu_r f_r$$

is finite and nonnegative over the convex set

$$\tilde{C} = C \cap \mathrm{dom}(f_1) \cap \cdots \cap \mathrm{dom}(f_{\bar{r}}).$$

By Exercise 1.27, $f$ is nonnegative over $\mathrm{cl}(\tilde{C})$. Given that $\mathrm{ri}(C) \subset \mathrm{dom}(f_i)$ for
all $i = 1, \ldots, \bar{r}$, we have $\mathrm{ri}(C) \subset \tilde{C}$, and therefore

$$C \subset \mathrm{cl}\bigl(\mathrm{ri}(C)\bigr) \subset \mathrm{cl}(\tilde{C}).$$

Hence, $f$ is nonnegative over $C$.
We now show that not all $\mu_1, \ldots, \mu_{\bar{r}}$ are zero. To arrive at a contradiction,
suppose that all $\mu_1, \ldots, \mu_{\bar{r}}$ are zero, so that

$$0 \le \mu_{\bar{r}+1} f_{\bar{r}+1}(x) + \cdots + \mu_r f_r(x), \quad \forall\ x \in C.$$

Since the system

$$f_{\bar{r}+1}(x) \le 0, \ \ldots, \ f_r(x) \le 0,$$

has a solution $\bar{x} \in \mathrm{ri}(C)$, it follows that

$$\mu_{\bar{r}+1} f_{\bar{r}+1}(\bar{x}) + \cdots + \mu_r f_r(\bar{x}) = 0,$$

37
so that
 
inf µr+1 fr+1 (x) + · · · + µr fr (x) = µr+1 fr+1 (x) + · · · + µr fr (x) = 0,
x∈C

with x ∈ ri(C). Thus, the affine function µr+1 fr+1 + · · · + µr fr attains its
minimum value over C at a point in the relative interior of C. Hence, by Prop.
1.4.2 of Chapter 1, the function µr+1 fr+1 + · · · + µr fr is constant over C, i.e.,

µr+1 fr+1 (x) + · · · + µr fr (x) = 0, ∀ x ∈ C.

Furthermore, we have µj = 0 for all j = 1, . . . , r, while by the definition of C1 , we


have fj (x) = wj for j = r + 1, . . . , r, which combined with the preceding relation
yields
µ1 w1 + · · · + µr wr = 0, ∀ w ∈ C1 ,
implying that
inf µ w = sup µ w.
w∈C1 w∈C1

This contradicts the second relation in (3.27). Hence, not all µ1 , . . . , µr are zero,
showing that the condition (ii) holds, and proving that the conditions (i) and (ii)
cannot fail to hold simultaneously.

3.30 (Elementary Vectors [Roc69])

(a) If two elementary vectors $z$ and $\bar z$ had the same support, the vector $z - \gamma \bar z$ would be nonzero and have smaller support than $z$ and $\bar z$ for a suitable scalar $\gamma$. If $z$ and $\bar z$ are not scalar multiples of each other, then $z - \gamma \bar z \ne 0$, which contradicts the definition of an elementary vector.
(b) We note that either y is elementary or else there exists a nonzero vector z
with support strictly contained in the support of y. Repeating this argument for
at most n − 1 times, we must obtain an elementary vector.
(c) We first show that every nonzero vector y ∈ S has the property that there
exists an elementary vector of S that is in harmony with y and has support that
is contained in the support of y.
We show this by induction on the number of nonzero components of $y$. Let $V_k$ be the subset of nonzero vectors in $S$ that have $k$ or fewer nonzero components, and let $\bar k$ be the smallest $k$ for which $V_k$ is nonempty. Then, by part (b), every vector $y \in V_{\bar k}$ must be elementary, so it has the desired property. Assume that all vectors in $V_k$ have the desired property for some $k \ge \bar k$. We let $y$ be a vector in $V_{k+1}$ and we show that it also has the desired property. Let $z$ be an elementary vector whose support is contained in the support of $y$. By using the negative of $z$ if necessary, we can assume that $y_j z_j > 0$ for at least one index $j$. Then there exists a largest value of $\gamma$, call it $\bar\gamma$, such that

\[
y_j - \gamma z_j \ge 0, \quad \forall\, j \text{ with } y_j > 0,
\]
\[
y_j - \gamma z_j \le 0, \quad \forall\, j \text{ with } y_j < 0.
\]

The vector $y - \bar\gamma z$ is in harmony with $y$ and has support that is strictly contained in the support of $y$. Thus either $y - \bar\gamma z = 0$, in which case the elementary vector $z$ is in harmony with $y$ and has support equal to the support of $y$, or else $y - \bar\gamma z$ is nonzero. In the latter case, we have $y - \bar\gamma z \in V_k$, and by the induction hypothesis, there exists an elementary vector $\bar z$ that is in harmony with $y - \bar\gamma z$ and has support that is contained in the support of $y - \bar\gamma z$. The vector $\bar z$ is also in harmony with $y$ and has support that is contained in the support of $y$. The induction is complete.

Consider now the given nonzero vector $x \in S$, and choose any elementary vector $\bar z^1$ of $S$ that is in harmony with $x$ and has support that is contained in the support of $x$ (such a vector exists by the property just shown). By using the negative of $\bar z^1$ if necessary, we can assume that $x_j \bar z^1_j > 0$ for at least one index $j$. Let $\bar\gamma$ be the largest value of $\gamma$ such that

\[
x_j - \gamma \bar z^1_j \ge 0, \quad \forall\, j \text{ with } x_j > 0,
\]
\[
x_j - \gamma \bar z^1_j \le 0, \quad \forall\, j \text{ with } x_j < 0.
\]

The vector $x - z^1$, where

\[
z^1 = \bar\gamma\, \bar z^1,
\]

is in harmony with $x$ and has support that is strictly contained in the support of $x$. There are two cases: (1) $x = z^1$, in which case we are done, or (2) $x \ne z^1$, in which case we replace $x$ by $x - z^1$ and we repeat the process. Eventually, after $m$ steps where $m \le n$ (since each step reduces the number of nonzero components by at least one), we will end up with the desired decomposition $x = z^1 + \cdots + z^m$.

3.31 (Combinatorial Separation Theorem [Cam68], [Roc69])

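For simplicity, assume that $B$ is the Cartesian product of bounded open intervals, so that $B$ has the form

\[
B = \{ t \mid \underline b_j < t_j < \bar b_j,\ j = 1,\ldots,n \},
\]

where $\underline b_j$ and $\bar b_j$ are some scalars. The proof is easily modified for the case where $B$ has a different form.

Since $B \cap S^\perp = \emptyset$, there exists a hyperplane that separates $B$ and $S^\perp$. The normal of this hyperplane is a nonzero vector $d \in S$ such that

\[
t'd \le 0, \qquad \forall\, t \in B.
\]

Since $B$ is open, this inequality implies that actually

\[
t'd < 0, \qquad \forall\, t \in B.
\]

Equivalently, we have

\[
\sum_{\{j \mid d_j > 0\}} (\bar b_j - \epsilon)\, d_j + \sum_{\{j \mid d_j < 0\}} (\underline b_j + \epsilon)\, d_j < 0, \tag{3.28}
\]

for all $\epsilon > 0$ such that $\underline b_j + \epsilon < \bar b_j - \epsilon$. Let

\[
d = z^1 + \cdots + z^m
\]

be a decomposition of $d$, where $z^1,\ldots,z^m$ are elementary vectors of $S$ that are in harmony with $d$, and have supports that are contained in the support of $d$ [cf. part (c) of Exercise 3.30]. Then the condition (3.28) is equivalently written as

\begin{align*}
0 &> \sum_{\{j \mid d_j > 0\}} (\bar b_j - \epsilon)\, d_j + \sum_{\{j \mid d_j < 0\}} (\underline b_j + \epsilon)\, d_j \\
&= \sum_{\{j \mid d_j > 0\}} (\bar b_j - \epsilon) \Big( \sum_{i=1}^m z^i_j \Big) + \sum_{\{j \mid d_j < 0\}} (\underline b_j + \epsilon) \Big( \sum_{i=1}^m z^i_j \Big) \\
&= \sum_{i=1}^m \Big( \sum_{\{j \mid z^i_j > 0\}} (\bar b_j - \epsilon)\, z^i_j + \sum_{\{j \mid z^i_j < 0\}} (\underline b_j + \epsilon)\, z^i_j \Big),
\end{align*}

where the last equality holds because the vectors $z^i$ are in harmony with $d$ and their supports are contained in the support of $d$. From the preceding relation, we see that for at least one elementary vector $z^i$, we must have

\[
0 > \sum_{\{j \mid z^i_j > 0\}} (\bar b_j - \epsilon)\, z^i_j + \sum_{\{j \mid z^i_j < 0\}} (\underline b_j + \epsilon)\, z^i_j,
\]

for all $\epsilon > 0$ that are sufficiently small and are such that $\underline b_j + \epsilon < \bar b_j - \epsilon$, or equivalently

\[
0 > t'z^i, \qquad \forall\, t \in B.
\]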

3.32 (Tucker’s Complementarity Theorem)

(a) Fix an index $k$ and consider the following two assertions:

(1) There exists a vector $x \in S$ with $x_i \ge 0$ for all $i$, and $x_k > 0$.
(2) There exists a vector $y \in S^\perp$ with $y_i \ge 0$ for all $i$, and $y_k > 0$.

We claim that one and only one of the two assertions holds. Clearly, assertions (1) and (2) cannot hold simultaneously, since then we would have $x'y > 0$, while $x \in S$ and $y \in S^\perp$. We will show that they cannot fail simultaneously. Indeed, if (1) does not hold, the Cartesian product $B = \Pi_{i=1}^n B_i$ of the intervals

\[
B_i = \begin{cases} (0, \infty) & \text{if } i = k, \\ [0, \infty) & \text{if } i \ne k, \end{cases}
\]

does not intersect the subspace $S$, so by the result of Exercise 3.31, there exists a vector $z$ of $S^\perp$ such that $x'z < 0$ for all $x \in B$. For this to hold, we must have $z \in B^*$ or equivalently $z \le 0$, while by choosing $x = (0,\ldots,0,1,0,\ldots,0) \in B$, with the 1 in the $k$th position, the inequality $x'z < 0$ yields $z_k < 0$. Thus assertion (2) holds with $y = -z$. Similarly, we show that if (2) does not hold, then (1) must hold.

Let now $I$ be the set of indices $k$ such that (1) holds, and for each $k \in I$, let $x(k)$ be a vector in $S$ such that $x(k) \ge 0$ and $x_k(k) > 0$ (note that we do not exclude the possibility that one of the sets $I$ and $\bar I$ is empty). Let $\bar I$ be the set of indices such that (2) holds, and for each $k \in \bar I$, let $y(k)$ be a vector in $S^\perp$ such that $y(k) \ge 0$ and $y_k(k) > 0$. From what has already been shown, $I$ and $\bar I$ are disjoint, $I \cup \bar I = \{1,\ldots,n\}$, and the vectors

\[
x = \sum_{k \in I} x(k), \qquad y = \sum_{k \in \bar I} y(k),
\]

satisfy

\[
x_i > 0, \ \ \forall\, i \in I, \qquad x_i = 0, \ \ \forall\, i \in \bar I,
\]
\[
y_i = 0, \ \ \forall\, i \in I, \qquad y_i > 0, \ \ \forall\, i \in \bar I.
\]

The uniqueness of $I$ and $\bar I$ follows from their construction and the preceding arguments. In particular, if for some $k \in \bar I$, there existed a vector $x \in S$ with $x \ge 0$ and $x_k > 0$, then since for the vector $y(k)$ of $S^\perp$ we have $y(k) \ge 0$ and $y_k(k) > 0$, assertions (1) and (2) must hold simultaneously, which is a contradiction.

The last assertion follows from the fact that for each $k$, exactly one of the assertions (1) and (2) holds.
(b) Consider the subspace

\[
S = \big\{ (x, w) \mid Ax - bw = 0,\ x \in \Re^n,\ w \in \Re \big\}.
\]

Its orthogonal complement is the range of the transpose of the matrix $[A \ \ {-b}]$, so it has the form

\[
S^\perp = \big\{ (A'z, -b'z) \mid z \in \Re^m \big\}.
\]

By applying the result of part (a) to the subspace $S$, we obtain a partition of the index set $\{1,\ldots,n+1\}$ into two subsets. There are two possible cases:

(1) The index $n+1$ belongs to the first subset.
(2) The index $n+1$ belongs to the second subset.

In case (2), the two subsets are of the form $I$ and $\bar I \cup \{n+1\}$ with $I \cup \bar I = \{1,\ldots,n\}$, and by the last assertion of part (a), we have $w = 0$ for all $(x, w)$ such that $x \ge 0$, $w \ge 0$ and $Ax - bw = 0$. This, however, contradicts the fact that the set $F = \{x \mid Ax = b,\ x \ge 0\}$ is nonempty. Therefore, case (1) holds, i.e., the index $n+1$ belongs to the first index subset. In particular, we have that there exist disjoint index sets $I$ and $\bar I$ with $I \cup \bar I = \{1,\ldots,n\}$, and vectors $(x, w)$ with $Ax - bw = 0$, and $z \in \Re^m$ such that

\[
w > 0, \qquad b'z = 0,
\]
\[
x_i > 0, \ \ \forall\, i \in I, \qquad x_i = 0, \ \ \forall\, i \in \bar I,
\]
\[
y_i = 0, \ \ \forall\, i \in I, \qquad y_i > 0, \ \ \forall\, i \in \bar I,
\]

where $y = A'z$. By dividing $(x, w)$ with $w$ if needed, we may assume that $w = 1$ so that $Ax - b = 0$, and the result follows.
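As a small illustration of the partition in part (a), the following sketch checks the complementarity pattern on a hand-picked example. The subspace S = span{(1, 1, 0)} in R^3, the vectors x and y, and the helper `dot` are assumptions chosen for illustration; they are not part of the exercise. Indices are 0-based in the code.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Assumed example: S = span{(1, 1, 0)}, so S_perp = {y | y1 + y2 = 0}.
x = [1.0, 1.0, 0.0]   # x in S with x >= 0
y = [0.0, 0.0, 1.0]   # y in S_perp with y >= 0

# Index sets where x, respectively y, is strictly positive.
I = {i for i, xi in enumerate(x) if xi > 0}
I_bar = {i for i, yi in enumerate(y) if yi > 0}

print("x'y =", dot(x, y))
print("I =", sorted(I), " I_bar =", sorted(I_bar))
print("disjoint and covering:", not (I & I_bar) and (I | I_bar) == {0, 1, 2})
```

Here no nonnegative vector of S has a positive third component, and no nonnegative vector of S_perp has a positive first or second component, which is why the two index sets split as shown.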

Convex Analysis and
Optimization
Chapter 4 Solutions

Dimitri P. Bertsekas

with

Angelia Nedić and Asuman E. Ozdaglar

Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts


http://www.athenasc.com
LAST UPDATE March 24, 2004

CHAPTER 4: SOLUTION MANUAL

4.1 (Directional Derivative of Extended Real-Valued Functions)

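(a) Since $f'(x; 0) = 0$, the relation $f'(x; \lambda y) = \lambda f'(x; y)$ clearly holds for $\lambda = 0$ and all $y \in \Re^n$. Choose $\lambda > 0$ and $y \in \Re^n$. By the definition of directional derivative, we have

\[
f'(x; \lambda y) = \inf_{\alpha > 0} \frac{f\big(x + \alpha(\lambda y)\big) - f(x)}{\alpha} = \lambda \inf_{\alpha > 0} \frac{f\big(x + (\alpha\lambda) y\big) - f(x)}{\alpha\lambda}.
\]

By setting $\beta = \lambda\alpha$ in the preceding relation, we obtain

\[
f'(x; \lambda y) = \lambda \inf_{\beta > 0} \frac{f(x + \beta y) - f(x)}{\beta} = \lambda f'(x; y).
\]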

 
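(b) Let $(y_1, w_1)$ and $(y_2, w_2)$ be two points in $\mathrm{epi}\big(f'(x; \cdot)\big)$, and let $\gamma$ be a scalar with $\gamma \in (0, 1)$. Consider a point $(y_\gamma, w_\gamma)$ given by

\[
y_\gamma = \gamma y_1 + (1 - \gamma) y_2, \qquad w_\gamma = \gamma w_1 + (1 - \gamma) w_2.
\]

Since for all $y \in \Re^n$, the ratio

\[
\frac{f(x + \alpha y) - f(x)}{\alpha}
\]

is monotonically nonincreasing as $\alpha \downarrow 0$, we have

\[
\frac{f(x + \alpha y_1) - f(x)}{\alpha} \le \frac{f(x + \alpha_1 y_1) - f(x)}{\alpha_1}, \qquad \forall\, \alpha, \alpha_1 \text{ with } 0 < \alpha \le \alpha_1,
\]
\[
\frac{f(x + \alpha y_2) - f(x)}{\alpha} \le \frac{f(x + \alpha_2 y_2) - f(x)}{\alpha_2}, \qquad \forall\, \alpha, \alpha_2 \text{ with } 0 < \alpha \le \alpha_2.
\]

Multiplying the first relation by $\gamma$ and the second relation by $1 - \gamma$, and adding, we have for all $\alpha$ with $0 < \alpha \le \alpha_1$ and $0 < \alpha \le \alpha_2$,

\[
\frac{\gamma f(x + \alpha y_1) + (1 - \gamma) f(x + \alpha y_2) - f(x)}{\alpha} \le \gamma\, \frac{f(x + \alpha_1 y_1) - f(x)}{\alpha_1} + (1 - \gamma)\, \frac{f(x + \alpha_2 y_2) - f(x)}{\alpha_2}.
\]

From the convexity of $f$ and the definition of $y_\gamma$, it follows that

\[
f(x + \alpha y_\gamma) \le \gamma f(x + \alpha y_1) + (1 - \gamma) f(x + \alpha y_2).
\]

Combining the preceding two relations, we see that for all $\alpha \le \alpha_1$ and $\alpha \le \alpha_2$,

\[
\frac{f(x + \alpha y_\gamma) - f(x)}{\alpha} \le \gamma\, \frac{f(x + \alpha_1 y_1) - f(x)}{\alpha_1} + (1 - \gamma)\, \frac{f(x + \alpha_2 y_2) - f(x)}{\alpha_2}.
\]

By taking the infimum over $\alpha$, and then over $\alpha_1$ and $\alpha_2$, we obtain

\[
f'(x; y_\gamma) \le \gamma f'(x; y_1) + (1 - \gamma) f'(x; y_2) \le \gamma w_1 + (1 - \gamma) w_2 = w_\gamma,
\]

where in the last inequality we use the fact $(y_1, w_1), (y_2, w_2) \in \mathrm{epi}\big(f'(x; \cdot)\big)$. Hence the point $(y_\gamma, w_\gamma)$ belongs to $\mathrm{epi}\big(f'(x; \cdot)\big)$, implying that $f'(x; \cdot)$ is a convex function.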
(c) Since $f'(x; 0) = 0$ and $(1/2) y + (1/2)(-y) = 0$, it follows that

\[
f'\big(x; (1/2) y + (1/2)(-y)\big) = 0, \qquad \forall\, y \in \Re^n.
\]

By part (b), the function $f'(x; \cdot)$ is convex, so that

\[
0 \le (1/2) f'(x; y) + (1/2) f'(x; -y),
\]

and

\[
-f'(x; -y) \le f'(x; y).
\]
 
(d) Let a vector $y$ be in the level set $\{ y \mid f'(x; y) \le 0 \}$, and let $\lambda > 0$. By part (a),

\[
f'(x; \lambda y) = \lambda f'(x; y) \le 0,
\]

so that $\lambda y$ also belongs to this level set, which is therefore a cone. By part (b), the function $f'(x; \cdot)$ is convex, implying that the level set $\{ y \mid f'(x; y) \le 0 \}$ is convex.

Since $\mathrm{dom}(f) = \Re^n$, $f'(x; \cdot)$ is a real-valued function, and since it is convex, by Prop. 1.4.6, it is also continuous over $\Re^n$. Therefore the level set $\{ y \mid f'(x; y) \le 0 \}$ is closed.

We now show that

\[
\big( \{ y \mid f'(x; y) \le 0 \} \big)^* = \mathrm{cl}\big( \mathrm{cone}(\partial f(x)) \big).
\]

By Prop. 4.2.2, we have

\[
f'(x; y) = \max_{d \in \partial f(x)} y'd,
\]

implying that $f'(x; y) \le 0$ if and only if $\max_{d \in \partial f(x)} y'd \le 0$. Equivalently, $f'(x; y) \le 0$ if and only if

\[
y'd \le 0, \qquad \forall\, d \in \partial f(x).
\]

Since

\[
y'd \le 0, \ \ \forall\, d \in \partial f(x) \quad \Longleftrightarrow \quad y'd \le 0, \ \ \forall\, d \in \mathrm{cone}\big(\partial f(x)\big),
\]

it follows from Prop. 3.1.1(a) that $f'(x; y) \le 0$ if and only if

\[
y'd \le 0, \qquad \forall\, d \in \mathrm{cone}\big(\partial f(x)\big).
\]

Therefore

\[
\{ y \mid f'(x; y) \le 0 \} = \big( \mathrm{cone}(\partial f(x)) \big)^*,
\]

and the desired relation follows by the Polar Cone Theorem [Prop. 3.1.1(b)].
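The properties in parts (a) and (c) can be checked numerically on a simple case. The sketch below is only an illustration under stated assumptions: the test function f(x) = |x1| + x2^2, the point x, the direction y, and the step size `alpha` are all hypothetical choices, and the directional derivative is approximated by a one-sided difference quotient rather than computed exactly.

```python
def f(x):
    # Assumed convex test function: f(x) = |x1| + x2^2
    return abs(x[0]) + x[1] ** 2

def dir_deriv(func, x, y, alpha=1e-7):
    """One-sided difference quotient (f(x + alpha*y) - f(x)) / alpha."""
    shifted = [xi + alpha * yi for xi, yi in zip(x, y)]
    return (func(shifted) - func(x)) / alpha

x = [0.0, 1.0]      # f is not differentiable here (kink in the first coordinate)
y = [1.0, -2.0]
lam = 3.0

d_y = dir_deriv(f, x, y)                              # approximates f'(x; y)
d_lam_y = dir_deriv(f, x, [lam * yi for yi in y])     # approximates f'(x; lam*y)
d_neg_y = dir_deriv(f, x, [-yi for yi in y])          # approximates f'(x; -y)

print("f'(x; y)      ~", d_y)
print("f'(x; lam*y)  ~", d_lam_y, " vs lam*f'(x; y) ~", lam * d_y)
print("-f'(x; -y)    ~", -d_neg_y, " <= f'(x; y)")
```

For this function the exact value is f'(x; y) = |y1| + 2 x2 y2, so the quotients should agree with it up to an error of order `alpha`.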

4.2 (Chain Rule for Directional Derivatives)

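For any $d \in \Re^n$, by using the directional differentiability of $f$ at $x$, we have

\begin{align*}
F(x + \alpha d) - F(x) &= g\big(f(x + \alpha d)\big) - g\big(f(x)\big) \\
&= g\big(f(x) + \alpha f'(x; d) + o(\alpha)\big) - g\big(f(x)\big).
\end{align*}

Let $z_\alpha = f'(x; d) + o(\alpha)/\alpha$ and note that $z_\alpha \to f'(x; d)$ as $\alpha \downarrow 0$. By using this and the assumed property of $g$, we obtain

\[
\lim_{\alpha \downarrow 0} \frac{F(x + \alpha d) - F(x)}{\alpha} = \lim_{\alpha \downarrow 0} \frac{g\big(f(x) + \alpha z_\alpha\big) - g\big(f(x)\big)}{\alpha} = g'\big(f(x); f'(x; d)\big),
\]

showing that $F$ is directionally differentiable at $x$ and that the given chain rule holds.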
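A quick numerical sanity check of this chain rule can be sketched as follows. Everything concrete here is an assumption made for illustration: the inner map f(x) = (|x1| + x1^2, x1 + x2), the outer function g(u) = max(u1, u2), the point, the direction, and the step `ALPHA`. Directional derivatives are approximated by difference quotients.

```python
ALPHA = 1e-7

def f(x):
    # Assumed inner map from R^2 to R^2, directionally differentiable everywhere
    return [abs(x[0]) + x[0] ** 2, x[0] + x[1]]

def g(u):
    # Assumed outer function (Lipschitz and directionally differentiable)
    return max(u)

def F(x):
    return g(f(x))

def step(x, d, alpha):
    return [xi + alpha * di for xi, di in zip(x, d)]

def dd_scalar(func, x, d, alpha=ALPHA):
    """Difference quotient of a scalar-valued function along d."""
    return (func(step(x, d, alpha)) - func(x)) / alpha

def dd_vector(func, x, d, alpha=ALPHA):
    """Componentwise difference quotient of a vector-valued function along d."""
    fx, fxd = func(x), func(step(x, d, alpha))
    return [(b - a) / alpha for a, b in zip(fx, fxd)]

x = [0.0, 0.0]
d = [1.0, -0.5]

lhs = dd_scalar(F, x, d)            # approximates F'(x; d)
inner = dd_vector(f, x, d)          # approximates f'(x; d)
rhs = dd_scalar(g, f(x), inner)     # approximates g'(f(x); f'(x; d))

print("F'(x; d) ~", lhs)
print("g'(f(x); f'(x; d)) ~", rhs)
```

Both quantities should be close to 1 for this choice of data, since f'(x; d) = (1, 0.5) and g'(0; v) = max(v1, v2).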

4.3

By definition, a vector $d \in \Re^n$ is a subgradient of $f$ at $x$ if and only if

\[
f(y) \ge f(x) + d'(y - x), \qquad \forall\, y \in \Re^n,
\]

or equivalently

\[
d'x - f(x) \ge d'y - f(y), \qquad \forall\, y \in \Re^n.
\]

Therefore, $d \in \Re^n$ is a subgradient of $f$ at $x$ if and only if

\[
d'x - f(x) = \max_{y} \big\{ d'y - f(y) \big\}.
\]
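The characterization above can be tried out on a one-dimensional example. In the sketch below, the function f(y) = y^2, the grid, and the helper names are assumptions for illustration only; the maximum over y is approximated by a maximum over a finite grid, so the check is only as good as the grid.

```python
def f(y):
    # Assumed example: f(y) = y^2, whose only subgradient at x is 2x
    return y * y

# Finite grid standing in for the maximization over all real y.
grid = [i / 1000.0 for i in range(-5000, 5001)]

def max_on_grid(d):
    """Approximates max_y { d*y - f(y) } over the grid."""
    return max(d * y - f(y) for y in grid)

def is_subgradient(d, x, tol=1e-9):
    """True when d*x - f(x) attains the (grid) maximum of d*y - f(y)."""
    return abs((d * x - f(x)) - max_on_grid(d)) <= tol

x = 1.5
print("d = 3.0:", is_subgradient(3.0, x))   # 3.0 = 2x is the gradient at x
print("d = 2.0:", is_subgradient(2.0, x))   # not a subgradient at x
```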

4.4

(a) For $x \ne 0$, the function $f(x) = \|x\|$ is differentiable with $\nabla f(x) = x/\|x\|$, so that $\partial f(x) = \{\nabla f(x)\} = \{x/\|x\|\}$. Consider now the case $x = 0$. If a vector $d$ is a subgradient of $f$ at $x = 0$, then $f(z) \ge f(0) + d'z$ for all $z$, implying that

\[
\|z\| \ge d'z, \qquad \forall\, z \in \Re^n.
\]

By letting $z = d$ in this relation, we obtain $\|d\| \le 1$, showing that $\partial f(0) \subset \{ d \mid \|d\| \le 1 \}$.

On the other hand, for any $d \in \Re^n$ with $\|d\| \le 1$, we have

\[
d'z \le \|d\| \cdot \|z\| \le \|z\|, \qquad \forall\, z \in \Re^n,
\]

which is equivalent to $f(0) + d'z \le f(z)$ for all $z$, so that $d \in \partial f(0)$, and therefore $\{ d \mid \|d\| \le 1 \} \subset \partial f(0)$.

Note that an alternative proof is obtained by writing

\[
\|x\| = \max_{\|z\| \le 1} x'z,
\]

and by using Danskin's Theorem (Prop. 4.5.1).


(b) By convention $\partial f(x) = \emptyset$ when $x \notin \mathrm{dom}(f)$, and since here $\mathrm{dom}(f) = C$, we see that $\partial f(x) = \emptyset$ when $x \notin C$. Let now $x \in C$. A vector $d$ is a subgradient of $f$ at $x$ if and only if

\[
d'(z - x) \le f(z), \qquad \forall\, z \in \Re^n.
\]

Because $f(z) = \infty$ for all $z \notin C$, the preceding relation always holds when $z \notin C$, so the points $z \notin C$ can be ignored. Thus, $d \in \partial f(x)$ if and only if

\[
d'(z - x) \le 0, \qquad \forall\, z \in C.
\]

Since $C$ is convex, by Prop. 4.6.3, the preceding relation is equivalent to $d \in N_C(x)$, implying that $\partial f(x) = N_C(x)$ for all $x \in C$.
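The conclusion of part (a), that the subdifferential of the Euclidean norm at the origin is the unit ball, can be probed numerically. The following sketch is an illustration only: the sample points, the two candidate vectors, and the tolerance are assumed, and a finite sample can refute but never prove the subgradient inequality.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def holds_at_zero(d, samples, tol=1e-12):
    """Checks ||z|| >= d'z (the subgradient inequality at x = 0) on the samples."""
    return all(norm(z) >= dot(d, z) - tol for z in samples)

random.seed(0)
samples = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1000)]

inside = [0.6, 0.0, -0.8]    # assumed candidate with norm 1
outside = [1.2, 0.0, 0.0]    # assumed candidate with norm > 1

ok_inside = holds_at_zero(inside, samples)
# As in the proof, z = d itself exposes a vector with ||d|| > 1.
ok_outside = holds_at_zero(outside, samples + [outside])

print("norm <= 1 passes:", ok_inside)
print("norm > 1 passes: ", ok_outside)
```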

4.5

When $f$ is defined on the real line, by Prop. 4.2.1, $\partial f(x)$ is a compact interval of the form

\[
\partial f(x) = [\alpha, \beta].
\]

By Prop. 4.2.2, we have

\[
f'(x; y) = \max_{d \in \partial f(x)} y\,d, \qquad \forall\, y \in \Re,
\]

from which we see that

\[
f'(x; 1) = \beta, \qquad f'(x; -1) = -\alpha.
\]

Since

\[
f'(x; 1) = f^+(x), \qquad f'(x; -1) = -f^-(x),
\]

we have

\[
\partial f(x) = [f^-(x), f^+(x)].
\]
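As an illustration of this interval formula, the sketch below estimates the left and right derivatives of a piecewise linear function and tests a few candidate subgradients against the subgradient inequality on a grid. The function f(x) = max(-x, 2x), the step `h`, the grid, and the helper names are assumptions made for this example.

```python
def f(x):
    # Assumed example: f(x) = max(-x, 2x), with a kink at x = 0
    return max(-x, 2.0 * x)

def right_derivative(func, x, h=1e-6):
    return (func(x + h) - func(x)) / h

def left_derivative(func, x, h=1e-6):
    return (func(x) - func(x - h)) / h

def is_subgradient(func, d, x, grid, tol=1e-12):
    """Checks f(y) >= f(x) + d*(y - x) at every grid point y."""
    return all(func(y) >= func(x) + d * (y - x) - tol for y in grid)

grid = [i / 100.0 for i in range(-300, 301)]
lo = left_derivative(f, 0.0)
hi = right_derivative(f, 0.0)

print("estimated interval: [%g, %g]" % (lo, hi))
for d in (-1.5, -1.0, 0.0, 2.0, 2.5):
    print("d = %4.1f subgradient at 0:" % d, is_subgradient(f, d, 0.0, grid))
```

Only the candidates lying between the two one-sided derivatives should pass the check.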

4.6

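We can view the function

\[
\varphi(t) = f\big(tx + (1 - t)y\big), \qquad t \in \Re,
\]

as the composition of the form

\[
\varphi(t) = f\big(g(t)\big), \qquad t \in \Re,
\]

where $g : \Re \to \Re^n$ is an affine function given by

\[
g(t) = y + t(x - y), \qquad t \in \Re.
\]

By using the Chain Rule [Prop. 4.2.5(a)], where $A = (x - y)$, we obtain

\[
\partial\varphi(t) = A'\, \partial f\big(g(t)\big), \qquad \forall\, t \in \Re,
\]

or equivalently

\[
\partial\varphi(t) = \big\{ (x - y)'d \mid d \in \partial f\big(tx + (1 - t)y\big) \big\}, \qquad \forall\, t \in \Re.
\]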

4.7

Let $x$ and $y$ be any two points in the set $X$. Since $\partial f(x)$ is nonempty, by using the subgradient inequality, it follows that

\[
f(x) + d'(y - x) \le f(y), \qquad \forall\, d \in \partial f(x),
\]

implying that

\[
f(x) - f(y) \le \|d\| \cdot \|x - y\|, \qquad \forall\, d \in \partial f(x).
\]

According to Prop. 4.2.3, the set $\cup_{x \in X} \partial f(x)$ is bounded, so that for some constant $L > 0$, we have

\[
\|d\| \le L, \qquad \forall\, d \in \partial f(x), \ \ \forall\, x \in X, \tag{4.1}
\]

and therefore,

\[
f(x) - f(y) \le L\, \|x - y\|.
\]

By exchanging the roles of $x$ and $y$, we similarly obtain

\[
f(y) - f(x) \le L\, \|x - y\|,
\]

and by combining the preceding two relations, we see that

\[
\big| f(x) - f(y) \big| \le L\, \|x - y\|,
\]

showing that $f$ is Lipschitz continuous over $X$.

Also, by using Prop. 4.2.2(b) and the subgradient boundedness [Eq. (4.1)], we obtain

\[
f'(x; y) = \max_{d \in \partial f(x)} d'y \le \max_{d \in \partial f(x)} \|d\| \cdot \|y\| \le L\, \|y\|, \qquad \forall\, x \in X, \ \ \forall\, y \in \Re^n.
\]
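A rough empirical check of the Lipschitz bound can be sketched as follows. The example is assumed, not taken from the exercise: f is the l1 norm on R^2, whose subgradients have components in [-1, 1] and hence Euclidean norm at most sqrt(2), so L = sqrt(2) is a valid bound in Eq. (4.1). The sample points are random.

```python
import math
import random

def f(x):
    # Assumed example: the l1 norm
    return sum(abs(xi) for xi in x)

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

n = 2
L = math.sqrt(n)   # bound on the Euclidean norm of any subgradient of the l1 norm

random.seed(1)
points = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(200)]

# Largest observed ratio |f(p) - f(q)| / ||p - q|| over all sampled pairs.
worst_ratio = 0.0
for p in points:
    for q in points:
        dist = norm2([pi - qi for pi, qi in zip(p, q)])
        if dist > 0.0:
            worst_ratio = max(worst_ratio, abs(f(p) - f(q)) / dist)

print("largest observed ratio:", worst_ratio, " bound L:", L)
```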

4.8 (Nonemptiness of Subdifferential)

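Suppose that $\partial f(x)$ is nonempty, and let $z \in \mathrm{dom}(f)$. By the definition of subgradient, for any $d \in \partial f(x)$, we have

\[
\frac{f\big(x + \alpha(z - x)\big) - f(x)}{\alpha} \ge d'(z - x), \qquad \forall\, \alpha > 0,
\]

implying that

\[
f'(x; z - x) = \inf_{\alpha > 0} \frac{f\big(x + \alpha(z - x)\big) - f(x)}{\alpha} \ge d'(z - x) > -\infty.
\]

Furthermore, for all $\alpha \in (0, 1)$, and $x, z \in \mathrm{dom}(f)$, the vector $x + \alpha(z - x)$ belongs to $\mathrm{dom}(f)$. Therefore, for all $\alpha \in (0, 1)$,

\[
\frac{f\big(x + \alpha(z - x)\big) - f(x)}{\alpha} < \infty,
\]

implying that

\[
f'(x; z - x) < \infty.
\]

Hence, $f'(x; z - x)$ is finite.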


Conversely, suppose that $f'(x; z - x)$ is finite for all $z \in \mathrm{dom}(f)$. Fix a vector $\bar x$ in the relative interior of $\mathrm{dom}(f)$. Consider the set

\[
C = \big\{ (z, \nu) \mid z \in \mathrm{dom}(f),\ f(x) + f'(x; z - x) < \nu \big\},
\]

and the halfline

\[
P = \big\{ (u, \zeta) \mid u = x + \beta(\bar x - x),\ \zeta = f(x) + \beta f'(x; \bar x - x),\ \beta \ge 0 \big\}.
\]

By Exercise 4.1(b), the directional derivative function $f'(x; \cdot)$ is convex, implying that $f'(x; z - x)$ is convex in $z$. Therefore, the set $C$ is convex. Furthermore, being a halfline, the set $P$ is polyhedral.

Suppose that $C$ and $P$ have a point $(z, \nu)$ in common, so that we have

\[
z \in \mathrm{dom}(f), \qquad f(x) + f'(x; z - x) < \nu, \tag{4.2}
\]
\[
z = x + \beta(\bar x - x), \qquad \nu = f(x) + \beta f'(x; \bar x - x),
\]

for some scalar $\beta \ge 0$. Because $\beta f'(x; y) = f'(x; \beta y)$ for all $\beta \ge 0$ and $y \in \Re^n$ [see Exercise 4.1(a)], it follows that

\[
\nu = f(x) + f'\big(x; \beta(\bar x - x)\big) = f(x) + f'(x; z - x),
\]

contradicting Eq. (4.2), and thus showing that $C$ and $P$ do not have any common point. Hence, $\mathrm{ri}(C)$ and $P$ do not have any common point, so by the Polyhedral Proper Separation Theorem (Prop. 3.5.1), the polyhedral set $P$ and the convex set $C$ can be properly separated by a hyperplane that does not contain $C$, i.e., there exists a vector $(a, \gamma) \in \Re^{n+1}$ such that

\[
a'z + \gamma\nu \ge a'\big(x + \beta(\bar x - x)\big) + \gamma\big(f(x) + \beta f'(x; \bar x - x)\big), \qquad \forall\, (z, \nu) \in C, \ \ \forall\, \beta \ge 0, \tag{4.3}
\]
\[
\inf_{(z, \nu) \in C} \big\{ a'z + \gamma\nu \big\} < \sup_{(z, \nu) \in C} \big\{ a'z + \gamma\nu \big\}. \tag{4.4}
\]

We cannot have $\gamma < 0$ since then the left-hand side of Eq. (4.3) could be made arbitrarily small by choosing $\nu$ sufficiently large. Also if $\gamma = 0$, then for $\beta = 1$, from Eq. (4.3) we obtain

\[
a'(z - \bar x) \ge 0, \qquad \forall\, z \in \mathrm{dom}(f).
\]

Since $\bar x \in \mathrm{ri}\big(\mathrm{dom}(f)\big)$, we have that the linear function $a'z$ attains its minimum over $\mathrm{dom}(f)$ at a point in the relative interior of $\mathrm{dom}(f)$. By Prop. 1.4.2, it follows that $a'z$ is constant over $\mathrm{dom}(f)$, i.e., $a'z = a'\bar x$ for all $z \in \mathrm{dom}(f)$, contradicting Eq. (4.4). Hence, we must have $\gamma > 0$ and by dividing with $\gamma$ in Eq. (4.3), we obtain

\[
\bar a'z + \nu \ge \bar a'\big(x + \beta(\bar x - x)\big) + f(x) + \beta f'(x; \bar x - x), \qquad \forall\, (z, \nu) \in C, \ \ \forall\, \beta \ge 0,
\]

where $\bar a = a/\gamma$. By letting $\beta = 0$ and $\nu \downarrow f(x) + f'(x; z - x)$ in this relation, and by rearranging terms, we have

\[
f'(x; z - x) \ge (-\bar a)'(z - x), \qquad \forall\, z \in \mathrm{dom}(f).
\]

Because

\[
f(z) - f(x) = f\big(x + (z - x)\big) - f(x) \ge \inf_{\lambda > 0} \frac{f\big(x + \lambda(z - x)\big) - f(x)}{\lambda} = f'(x; z - x),
\]

it follows that

\[
f(z) - f(x) \ge (-\bar a)'(z - x), \qquad \forall\, z \in \mathrm{dom}(f).
\]

Finally, by using the fact $f(z) = \infty$ for all $z \notin \mathrm{dom}(f)$, we see that

\[
f(z) - f(x) \ge (-\bar a)'(z - x), \qquad \forall\, z \in \Re^n,
\]

showing that $-\bar a$ is a subgradient of $f$ at $x$ and that $\partial f(x)$ is nonempty.

4.9 (Subdifferential of Sum of Extended Real-Valued Functions)

It will suffice to prove the result for the case where $f = f_1 + f_2$. If $d_1 \in \partial f_1(x)$ and $d_2 \in \partial f_2(x)$, then by the subgradient inequality, it follows that

\[
f_1(z) \ge f_1(x) + (z - x)'d_1, \qquad \forall\, z,
\]
\[
f_2(z) \ge f_2(x) + (z - x)'d_2, \qquad \forall\, z,
\]

so by adding these inequalities, we obtain

\[
f(z) \ge f(x) + (z - x)'(d_1 + d_2), \qquad \forall\, z.
\]

Hence, $d_1 + d_2 \in \partial f(x)$, implying that $\partial f_1(x) + \partial f_2(x) \subset \partial f(x)$.

Assuming that $\mathrm{ri}\big(\mathrm{dom}(f_1)\big)$ and $\mathrm{ri}\big(\mathrm{dom}(f_2)\big)$ have a point in common, we will prove the reverse inclusion. Let $d \in \partial f(x)$, and define the functions

\[
g_1(y) = f_1(x + y) - f_1(x) - d'y, \qquad \forall\, y,
\]
\[
g_2(y) = f_2(x + y) - f_2(x), \qquad \forall\, y.
\]

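Then, for the function $g = g_1 + g_2$, we have $g(0) = 0$ and by using $d \in \partial f(x)$, we obtain

\[
g(y) = f(x + y) - f(x) - d'y \ge 0, \qquad \forall\, y. \tag{4.5}
\]

Consider the convex sets

\[
C_1 = \big\{ (y, \mu) \in \Re^{n+1} \mid y \in \mathrm{dom}(g_1),\ \mu \ge g_1(y) \big\},
\]
\[
C_2 = \big\{ (u, \nu) \in \Re^{n+1} \mid u \in \mathrm{dom}(g_2),\ \nu \le -g_2(u) \big\},
\]

and note that

\[
\mathrm{ri}(C_1) = \big\{ (y, \mu) \in \Re^{n+1} \mid y \in \mathrm{ri}\big(\mathrm{dom}(g_1)\big),\ \mu > g_1(y) \big\},
\]
\[
\mathrm{ri}(C_2) = \big\{ (u, \nu) \in \Re^{n+1} \mid u \in \mathrm{ri}\big(\mathrm{dom}(g_2)\big),\ \nu < -g_2(u) \big\}.
\]

Suppose that there exists a vector $(\hat y, \hat\mu) \in \mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$. Then,

\[
g_1(\hat y) < \hat\mu < -g_2(\hat y),
\]

yielding

\[
g(\hat y) = g_1(\hat y) + g_2(\hat y) < 0,
\]

which contradicts Eq. (4.5). Therefore, the sets $\mathrm{ri}(C_1)$ and $\mathrm{ri}(C_2)$ are disjoint, and by the Proper Separation Theorem (Prop. 2.4.6), the two convex sets $C_1$ and $C_2$ can be properly separated, i.e., there exists a vector $(w, \gamma) \in \Re^{n+1}$ such that

\[
\inf_{(y, \mu) \in C_1} \{ w'y + \gamma\mu \} \ge \sup_{(u, \nu) \in C_2} \{ w'u + \gamma\nu \}, \tag{4.6}
\]
\[
\sup_{(y, \mu) \in C_1} \{ w'y + \gamma\mu \} > \inf_{(u, \nu) \in C_2} \{ w'u + \gamma\nu \}.
\]

We cannot have $\gamma < 0$, because by letting $\mu \to \infty$ in Eq. (4.6), we will obtain a contradiction. Thus, we must have $\gamma \ge 0$. If $\gamma = 0$, then the preceding relations reduce to

\[
\inf_{y \in \mathrm{dom}(g_1)} w'y \ge \sup_{u \in \mathrm{dom}(g_2)} w'u,
\]
\[
\sup_{y \in \mathrm{dom}(g_1)} w'y > \inf_{u \in \mathrm{dom}(g_2)} w'u,
\]

which in view of the fact

\[
\mathrm{dom}(g_1) = \mathrm{dom}(f_1) - x, \qquad \mathrm{dom}(g_2) = \mathrm{dom}(f_2) - x,
\]

imply that $\mathrm{dom}(f_1)$ and $\mathrm{dom}(f_2)$ are properly separated. But this is impossible since $\mathrm{ri}\big(\mathrm{dom}(f_1)\big)$ and $\mathrm{ri}\big(\mathrm{dom}(f_2)\big)$ have a point in common. Hence $\gamma > 0$, and by dividing in Eq. (4.6) with $\gamma$ and by setting $b = w/\gamma$, we obtain

\[
\inf_{(y, \mu) \in C_1} \big\{ b'y + \mu \big\} \ge \sup_{(u, \nu) \in C_2} \big\{ b'u + \nu \big\}.
\]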

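Since $g_1(0) = 0$ and $g_2(0) = 0$, we have $(0, 0) \in C_1 \cap C_2$, implying that

\[
b'y + \mu \ge 0 \ge b'u + \nu, \qquad \forall\, (y, \mu) \in C_1, \ \ \forall\, (u, \nu) \in C_2.
\]

Therefore, for $\mu = g_1(y)$ and $\nu = -g_2(u)$, we obtain

\[
g_1(y) \ge -b'y, \qquad \forall\, y \in \mathrm{dom}(g_1),
\]
\[
g_2(u) \ge b'u, \qquad \forall\, u \in \mathrm{dom}(g_2),
\]

and by using the definitions of $g_1$ and $g_2$, we see that

\[
f_1(x + y) \ge f_1(x) + (d - b)'y, \qquad \text{for all } y \text{ with } x + y \in \mathrm{dom}(f_1),
\]
\[
f_2(x + u) \ge f_2(x) + b'u, \qquad \text{for all } u \text{ with } x + u \in \mathrm{dom}(f_2).
\]

Hence,

\[
f_1(z) \ge f_1(x) + (d - b)'(z - x), \qquad \forall\, z,
\]
\[
f_2(z) \ge f_2(x) + b'(z - x), \qquad \forall\, z,
\]

so that $d - b \in \partial f_1(x)$ and $b \in \partial f_2(x)$, showing that $d \in \partial f_1(x) + \partial f_2(x)$ and $\partial f(x) \subset \partial f_1(x) + \partial f_2(x)$.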
When some of the functions are polyhedral, we use a different separation argument for $C_1$ and $C_2$. In particular, since the sum of polyhedral functions is a polyhedral function (see Exercise 3.12), it will still suffice to consider the case $m = 2$. Thus, let $f_1$ be a convex function, and let $f_2$ be a polyhedral function such that

\[
\mathrm{ri}\big(\mathrm{dom}(f_1)\big) \cap \mathrm{dom}(f_2) \ne \emptyset.
\]

Then, in the preceding proof, $g_2$ is a polyhedral function and $C_2$ is a polyhedral set. Furthermore, $\mathrm{ri}(C_1)$ and $C_2$ are disjoint, for otherwise we would have for some $(\hat y, \hat\mu) \in \mathrm{ri}(C_1) \cap C_2$,

\[
g_1(\hat y) < \hat\mu \le -g_2(\hat y),
\]

implying that $g(\hat y) = g_1(\hat y) + g_2(\hat y) < 0$ and contradicting Eq. (4.5). Therefore, by the Polyhedral Proper Separation Theorem (Prop. 3.5.1), the convex set $C_1$ and the polyhedral set $C_2$ can be properly separated by a hyperplane that does not contain $C_1$, i.e., there exists a vector $(w, \gamma) \in \Re^{n+1}$ such that

\[
\inf_{(y, \mu) \in C_1} \big\{ w'y + \gamma\mu \big\} \ge \sup_{(u, \nu) \in C_2} \big\{ w'u + \gamma\nu \big\},
\]
\[
\inf_{(y, \mu) \in C_1} \big\{ w'y + \gamma\mu \big\} < \sup_{(y, \mu) \in C_1} \big\{ w'y + \gamma\mu \big\}.
\]

We cannot have $\gamma < 0$, because by letting $\mu \to \infty$ in the first of the preceding relations, we will obtain a contradiction. Thus, we must have $\gamma \ge 0$. If $\gamma = 0$, then the preceding relations reduce to

\[
\inf_{y \in \mathrm{dom}(g_1)} w'y \ge \sup_{u \in \mathrm{dom}(g_2)} w'u,
\]
\[
\inf_{y \in \mathrm{dom}(g_1)} w'y < \sup_{y \in \mathrm{dom}(g_1)} w'y.
\]

In view of the fact

\[
\mathrm{dom}(g_1) = \mathrm{dom}(f_1) - x, \qquad \mathrm{dom}(g_2) = \mathrm{dom}(f_2) - x,
\]

it follows that $\mathrm{dom}(f_1)$ and $\mathrm{dom}(f_2)$ are properly separated by a hyperplane that does not contain $\mathrm{dom}(f_1)$, while $\mathrm{dom}(f_2)$ is polyhedral since $f_2$ is polyhedral (see Prop. 3.2.3). Therefore, by the Polyhedral Proper Separation Theorem (Prop. 3.5.1), we have that $\mathrm{ri}\big(\mathrm{dom}(f_1)\big) \cap \mathrm{dom}(f_2) = \emptyset$, which is a contradiction. Hence $\gamma > 0$, and the remainder of the proof is similar to the preceding one.

4.10 (Chain Rule for Extended Real-Valued Functions)

We note that $\mathrm{dom}(F)$ is nonempty since it contains the inverse image under $A$ of the common point of the range of $A$ and the relative interior of $\mathrm{dom}(f)$. In particular, $F$ is proper. We fix an $x$ in $\mathrm{dom}(F)$. If $d \in A'\partial f(Ax)$, there exists a $g \in \partial f(Ax)$ such that $d = A'g$. We have for all $z \in \Re^n$,

\begin{align*}
F(z) - F(x) - (z - x)'d &= f(Az) - f(Ax) - (z - x)'A'g \\
&= f(Az) - f(Ax) - (Az - Ax)'g \\
&\ge 0,
\end{align*}

where the inequality follows from the fact $g \in \partial f(Ax)$. Hence, $d \in \partial F(x)$, and we have $A'\partial f(Ax) \subset \partial F(x)$.

We next show the reverse inclusion. By using a translation argument if necessary, we may assume that $x = 0$ and $F(0) = 0$. Let $d \in \partial F(0)$. Then we have

\[
F(z) - z'd \ge 0, \qquad \forall\, z \in \Re^n,
\]

or

\[
f(Az) - z'd \ge 0, \qquad \forall\, z \in \Re^n,
\]

or

\[
f(y) - z'd \ge 0, \qquad \forall\, z \in \Re^n, \ y = Az,
\]

or

\[
H(y, z) \ge 0, \qquad \forall\, z \in \Re^n, \ y = Az,
\]

where the function $H : \Re^m \times \Re^n \to (-\infty, \infty]$ has the form

\[
H(y, z) = f(y) - z'd.
\]

Since the range of $A$ contains a point in $\mathrm{ri}\big(\mathrm{dom}(f)\big)$, and $\mathrm{dom}(H) = \mathrm{dom}(f) \times \Re^n$, we see that the set $\big\{ (y, z) \in \mathrm{dom}(H) \mid y = Az \big\}$ contains a point in the relative interior of $\mathrm{dom}(H)$. Hence, we can apply the Nonlinear Farkas' Lemma [part (b)] with the following identification:

\[
x = (y, z), \qquad C = \mathrm{dom}(H), \qquad g_1(y, z) = Az - y, \qquad g_2(y, z) = y - Az.
\]

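In this case, we have

\[
\big\{ x \in C \mid g_1(x) \le 0,\ g_2(x) \le 0 \big\} = \big\{ (y, z) \in \mathrm{dom}(H) \mid Az - y = 0 \big\}.
\]

As asserted earlier, this set contains a relative interior point of $C$, thus implying that the set

\[
Q^* = \big\{ \mu \ge 0 \mid H(y, z) + \mu_1'g_1(y, z) + \mu_2'g_2(y, z) \ge 0, \ \forall\, (y, z) \in \mathrm{dom}(H) \big\}
\]

is nonempty. Hence, there exists $(\mu_1, \mu_2)$ such that

\[
f(y) - z'd + (\mu_1 - \mu_2)'(Az - y) \ge 0, \qquad \forall\, (y, z) \in \mathrm{dom}(H).
\]

Since $H(y, z) = \infty$ for $(y, z) \notin \mathrm{dom}(H)$, the preceding relation holds for all $y \in \Re^m$ and $z \in \Re^n$, so by letting $\lambda = \mu_1 - \mu_2$, we obtain

\[
f(y) - z'd + \lambda'(Az - y) \ge 0, \qquad \forall\, y \in \Re^m, \ z \in \Re^n,
\]

or equivalently

\[
f(y) \ge \lambda'y + z'(d - A'\lambda), \qquad \forall\, y \in \Re^m, \ z \in \Re^n.
\]

Because this relation holds for all $z$, we have $d = A'\lambda$ implying that

\[
f(y) \ge \lambda'y, \qquad \forall\, y \in \Re^m,
\]

which shows that $\lambda \in \partial f(0)$. Hence $d \in A'\partial f(0)$, thus completing the proof.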

4.11

Suppose that the set $\cup_{x \in X} \partial_\epsilon f(x)$ is unbounded for some $\epsilon > 0$. Then, there exist a sequence $\{x_k\} \subset X$, and a sequence $\{d_k\}$ such that $d_k \in \partial_\epsilon f(x_k)$ for all $k$ and $\|d_k\| \to \infty$. Without loss of generality, we may assume that $d_k \ne 0$ for all $k$, and we denote $y_k = d_k/\|d_k\|$. Since both $\{x_k\}$ and $\{y_k\}$ are bounded, they must contain convergent subsequences, and without loss of generality, we may assume that $x_k$ converges to some $x$ and $y_k$ converges to some $y$ with $\|y\| = 1$. Since $d_k \in \partial_\epsilon f(x_k)$ for all $k$, it follows that

\[
f(x_k + y_k) \ge f(x_k) + d_k'y_k - \epsilon = f(x_k) + \|d_k\| - \epsilon.
\]

By letting $k \to \infty$ and by using the continuity of $f$, we obtain $f(x + y) = \infty$, a contradiction. Hence, the set $\cup_{x \in X} \partial_\epsilon f(x)$ must be bounded for all $\epsilon > 0$.

4.12

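Let $d \in \partial f(x)$. Then, by the definitions of subgradient and $\epsilon$-subgradient, it follows that for any $\epsilon > 0$,

\[
f(y) \ge f(x) + d'(y - x) \ge f(x) + d'(y - x) - \epsilon, \qquad \forall\, y \in \Re^n,
\]

implying that $d \in \partial_\epsilon f(x)$ for all $\epsilon > 0$. Therefore $d \in \cap_{\epsilon > 0} \partial_\epsilon f(x)$, showing that $\partial f(x) \subset \cap_{\epsilon > 0} \partial_\epsilon f(x)$.

Conversely, let $d \in \partial_\epsilon f(x)$ for all $\epsilon > 0$, so that

\[
f(y) \ge f(x) + d'(y - x) - \epsilon, \qquad \forall\, y \in \Re^n, \ \ \forall\, \epsilon > 0.
\]

By letting $\epsilon \downarrow 0$, we obtain

\[
f(y) \ge f(x) + d'(y - x), \qquad \forall\, y \in \Re^n,
\]

implying that $d \in \partial f(x)$, and showing that $\cap_{\epsilon > 0} \partial_\epsilon f(x) \subset \partial f(x)$.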
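To see the intersection shrink in a concrete case, consider the assumed example f(y) = y^2 on the real line. Minimizing y^2 - d*y over y shows that d is an epsilon-subgradient at x exactly when (d/2 - x)^2 <= epsilon, i.e., when d lies in [2x - 2*sqrt(eps), 2x + 2*sqrt(eps)]; this closed form is derived here for the example and is not part of the exercise. The sketch below tabulates the interval for decreasing epsilon and cross-checks two candidates on a grid.

```python
import math

def f(y):
    # Assumed example: f(y) = y^2
    return y * y

def eps_subdiff_interval(x, eps):
    """Closed-form epsilon-subdifferential of y^2 at x (derived for this example)."""
    r = 2.0 * math.sqrt(eps)
    return (2.0 * x - r, 2.0 * x + r)

def is_eps_subgradient(d, x, eps, grid, tol=1e-12):
    """Checks f(y) >= f(x) + d*(y - x) - eps at every grid point y."""
    return all(f(y) >= f(x) + d * (y - x) - eps - tol for y in grid)

grid = [i / 100.0 for i in range(-1000, 1001)]
x = 1.0

intervals = [eps_subdiff_interval(x, e) for e in (1.0, 0.01, 0.0001)]
widths = [hi - lo for lo, hi in intervals]

for e, (lo, hi) in zip((1.0, 0.01, 0.0001), intervals):
    print("eps = %-7g interval = [%.4f, %.4f]" % (e, lo, hi))

# With eps = 0.01 the interval is [1.8, 2.2]: 2.1 should pass, 2.5 should fail.
print(is_eps_subgradient(2.1, x, 0.01, grid), is_eps_subgradient(2.5, x, 0.01, grid))
```

As epsilon decreases to zero the interval collapses to the single point 2x, the gradient of f at x.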

4.13 (Continuity Properties of $\epsilon$-Subdifferential [Nur77])

(a) By the $\epsilon$-subgradient definition, we have for all $k$,

\[
f(y) \ge f(x_k) + d_k'(y - x_k) - \epsilon, \qquad \forall\, y \in \Re^n.
\]

Since the sequence $\{x_k\}$ is bounded, by Exercise 4.11, the sequence $\{d_k\}$ is also bounded and therefore, it has a limit point $d$. Taking the limit in the preceding relation along a subsequence of $\{d_k\}$ converging to $d$, we obtain

\[
f(y) \ge f(x) + d'(y - x) - \epsilon, \qquad \forall\, y \in \Re^n,
\]

showing that $d \in \partial_\epsilon f(x)$.


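(b) First we show that

\[
\mathrm{cl}\big( \cup_{0 < \delta < \epsilon} \partial_\delta f(x) \big) = \partial_\epsilon f(x). \tag{4.7}
\]

Let $d \in \partial_\delta f(x)$ for some scalar $\delta$ satisfying $0 < \delta < \epsilon$. Then, by the definition of $\epsilon$-subgradient, we have

\[
f(y) \ge f(x) + d'(y - x) - \delta \ge f(x) + d'(y - x) - \epsilon, \qquad \forall\, y \in \Re^n,
\]

showing that $d \in \partial_\epsilon f(x)$. Therefore,

\[
\partial_\delta f(x) \subset \partial_\epsilon f(x), \qquad \forall\, \delta \in (0, \epsilon), \tag{4.8}
\]

implying that

\[
\cup_{0 < \delta < \epsilon} \partial_\delta f(x) \subset \partial_\epsilon f(x).
\]

Since $\partial_\epsilon f(x)$ is closed, by taking the closures of both sides in the preceding relation, we obtain

\[
\mathrm{cl}\big( \cup_{0 < \delta < \epsilon} \partial_\delta f(x) \big) \subset \partial_\epsilon f(x).
\]

Conversely, assume to arrive at a contradiction that there is a vector $d \in \partial_\epsilon f(x)$ with $d \notin \mathrm{cl}\big( \cup_{0 < \delta < \epsilon} \partial_\delta f(x) \big)$. Note that the set $\cup_{0 < \delta < \epsilon} \partial_\delta f(x)$ is bounded since it is contained in the compact set $\partial_\epsilon f(x)$. Furthermore, we claim that $\cup_{0 < \delta < \epsilon} \partial_\delta f(x)$ is convex. Indeed if $d_1$ and $d_2$ belong to this set, then $d_1 \in \partial_{\delta_1} f(x)$ and $d_2 \in \partial_{\delta_2} f(x)$ for some positive scalars $\delta_1$ and $\delta_2$. Without loss of generality, let $\delta_1 \le \delta_2$. Then, by Eq. (4.8), it follows that $d_1, d_2 \in \partial_{\delta_2} f(x)$, which is a convex set by Prop. 4.3.1(a). Hence, $\lambda d_1 + (1 - \lambda) d_2 \in \partial_{\delta_2} f(x)$ for all $\lambda \in [0, 1]$, implying that $\lambda d_1 + (1 - \lambda) d_2 \in \cup_{0 < \delta < \epsilon} \partial_\delta f(x)$ for all $\lambda \in [0, 1]$, and showing that the set $\cup_{0 < \delta < \epsilon} \partial_\delta f(x)$ is convex.

The vector $d$ and the convex and compact set $\mathrm{cl}\big( \cup_{0 < \delta < \epsilon} \partial_\delta f(x) \big)$ can be strongly separated (see Exercise 2.17), i.e., there exists a vector $b \in \Re^n$ such that

\[
b'd > \max_{g \in \mathrm{cl}( \cup_{0 < \delta < \epsilon} \partial_\delta f(x) )} b'g.
\]

This relation implies that for some positive scalar $\beta$,

\[
b'd > \max_{g \in \partial_\delta f(x)} b'g + 2\beta, \qquad \forall\, \delta \in (0, \epsilon).
\]

By Prop. 4.3.1(a), we have

\[
\inf_{\alpha > 0} \frac{f(x + \alpha b) - f(x) + \delta}{\alpha} = \max_{g \in \partial_\delta f(x)} b'g,
\]

so that

\[
b'd > \inf_{\alpha > 0} \frac{f(x + \alpha b) - f(x) + \delta}{\alpha} + 2\beta, \qquad \forall\, \delta, \ 0 < \delta < \epsilon.
\]

Let $\{\delta_k\}$ be a positive scalar sequence converging to $\epsilon$. In view of the preceding relation, for each $\delta_k$, there exists a small enough $\alpha_k > 0$ such that

\[
\alpha_k b'd \ge f(x + \alpha_k b) - f(x) + \delta_k + \beta. \tag{4.9}
\]

Without loss of generality, we may assume that $\{\alpha_k\}$ is bounded, so that it has a limit point $\bar\alpha \ge 0$. By taking the limit in Eq. (4.9) along an appropriate subsequence, and by using $\delta_k \to \epsilon$, we obtain

\[
\bar\alpha\, b'd \ge f(x + \bar\alpha b) - f(x) + \epsilon + \beta.
\]

If $\bar\alpha = 0$, we would have $0 \ge \epsilon + \beta$, which is a contradiction. If $\bar\alpha > 0$, we would have

\[
\bar\alpha\, b'd + f(x) - \epsilon > f(x + \bar\alpha b),
\]

which cannot hold since $d \in \partial_\epsilon f(x)$. Hence, we must have

\[
\partial_\epsilon f(x) \subset \mathrm{cl}\big( \cup_{0 < \delta < \epsilon} \partial_\delta f(x) \big),
\]

thus completing the proof of Eq. (4.7).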


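We now prove the statement of the exercise. Let $\{x_k\}$ be a sequence converging to $x$. By Prop. 4.3.1(a), the $\epsilon$-subdifferential $\partial_\epsilon f(x)$ is bounded, so that there exists a constant $L > 0$ such that

\[
\|g\| \le L, \qquad \forall\, g \in \partial_\epsilon f(x).
\]

Let

\[
\gamma_k = \big| f(x_k) - f(x) \big| + L\, \|x_k - x\|, \qquad \forall\, k. \tag{4.10}
\]

Since $x_k \to x$, by continuity of $f$, it follows that $\gamma_k \to 0$ as $k \to \infty$, so that $\epsilon_k = \epsilon - \gamma_k$ converges to $\epsilon$. Let $\{k_i\} \subset \{0, 1, \ldots\}$ be an index sequence such that $\{\epsilon_{k_i}\}$ is positive and monotonically increasing to $\epsilon$, i.e.,

\[
\epsilon_{k_i} \uparrow \epsilon \quad \text{with} \quad \epsilon_{k_i} = \epsilon - \gamma_{k_i} > 0, \quad \epsilon_{k_i} < \epsilon_{k_{i+1}}, \quad \forall\, i.
\]

In view of relation (4.7), we have

\[
\mathrm{cl}\big( \cup_{i \ge 0} \partial_{\epsilon_{k_i}} f(x) \big) = \partial_\epsilon f(x), \tag{4.11}
\]

implying that for a given vector $d \in \partial_\epsilon f(x)$, there exists a sequence $\{d_{k_i}\}$ such that

\[
d_{k_i} \to d \quad \text{with} \quad d_{k_i} \in \partial_{\epsilon_{k_i}} f(x), \quad \forall\, i. \tag{4.12}
\]

There remains to show that $d_{k_i} \in \partial_\epsilon f(x_{k_i})$ for all $i$. Since $d_{k_i} \in \partial_{\epsilon_{k_i}} f(x)$, it follows that for all $i$ and $y \in \Re^n$,

\begin{align*}
f(y) &\ge f(x) + d_{k_i}'(y - x) - \epsilon_{k_i} \\
&= f(x_{k_i}) + \big( f(x) - f(x_{k_i}) \big) + d_{k_i}'(y - x_{k_i}) + d_{k_i}'(x_{k_i} - x) - \epsilon_{k_i} \\
&\ge f(x_{k_i}) + d_{k_i}'(y - x_{k_i}) - \Big( \big| f(x) - f(x_{k_i}) \big| + \big| d_{k_i}'(x_{k_i} - x) \big| + \epsilon_{k_i} \Big). \tag{4.13}
\end{align*}

Because $d_{k_i} \in \partial_\epsilon f(x)$ [cf. Eqs. (4.11) and (4.12)] and $\partial_\epsilon f(x)$ is bounded, there holds

\[
\big| d_{k_i}'(x_{k_i} - x) \big| \le L\, \|x_{k_i} - x\|.
\]

Using this relation, the definition of $\gamma_k$ [cf. Eq. (4.10)], and the fact $\epsilon_k = \epsilon - \gamma_k$ for all $k$, from Eq. (4.13) we obtain for all $i$ and $y \in \Re^n$,

\[
f(y) \ge f(x_{k_i}) + d_{k_i}'(y - x_{k_i}) - (\gamma_{k_i} + \epsilon_{k_i}) = f(x_{k_i}) + d_{k_i}'(y - x_{k_i}) - \epsilon.
\]

Hence $d_{k_i} \in \partial_\epsilon f(x_{k_i})$ for all $i$, thus completing the proof.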

4.14 (Subgradient Mean Value Theorem)

(a) Scalar Case: Define the scalar function $g : \Re \to \Re$ by

\[
g(t) = \varphi(t) - \varphi(a) - \frac{\varphi(b) - \varphi(a)}{b - a}\,(t - a),
\]

and note that $g$ is convex and $g(a) = g(b) = 0$. We first show that $g$ attains its minimum over $\Re$ at some point $t^* \in [a, b]$. For $t < a$, we have

\[
a = \frac{b - a}{b - t}\, t + \frac{a - t}{b - t}\, b,
\]

and by using convexity of $g$ and $g(a) = g(b) = 0$, we obtain

\[
0 = g(a) \le \frac{b - a}{b - t}\, g(t) + \frac{a - t}{b - t}\, g(b) = \frac{b - a}{b - t}\, g(t),
\]

implying that $g(t) \ge 0$ for $t < a$. Similarly, for $t > b$, we have

\[
b = \frac{b - a}{t - a}\, t + \frac{t - b}{t - a}\, a,
\]

and by using convexity of $g$ and $g(a) = g(b) = 0$, we obtain

\[
0 = g(b) \le \frac{b - a}{t - a}\, g(t) + \frac{t - b}{t - a}\, g(a) = \frac{b - a}{t - a}\, g(t),
\]

implying that $g(t) \ge 0$ for $t > b$. Therefore $g(t) \ge 0$ for $t \notin (a, b)$, while $g(a) = g(b) = 0$. Hence

\[
\min_{t \in \Re} g(t) = \min_{t \in [a, b]} g(t). \tag{4.14}
\]

Because $g$ is convex over $\Re$, it is also continuous over $\Re$, and since $[a, b]$ is compact, the set of minimizers of $g$ over $[a, b]$ is nonempty. Thus, in view of Eq. (4.14), there exists a scalar $t^* \in [a, b]$ such that $g(t^*) = \min_{t \in \Re} g(t)$. If $t^* \in (a, b)$, then we are done. If $t^* = a$ or $t^* = b$, then since $g(a) = g(b) = 0$, it follows that every $t \in [a, b]$ attains the minimum of $g$ over $\Re$, so that we can replace $t^*$ by a point in the interval $(a, b)$. Thus, in any case, there exists $t^* \in (a, b)$ such that $g(t^*) = \min_{t \in \Re} g(t)$.

We next show that

\[
\frac{\varphi(b) - \varphi(a)}{b - a} \in \partial\varphi(t^*).
\]

The function $g$ is the sum of the convex function $\varphi$ and the linear (and therefore smooth) function $-\frac{\varphi(b) - \varphi(a)}{b - a}(t - a)$. Thus the subdifferential $\partial g(t^*)$ is the sum of the subdifferential $\partial\varphi(t^*)$ and the gradient $-\frac{\varphi(b) - \varphi(a)}{b - a}$ (see Prop. 4.2.4),

\[
\partial g(t^*) = \partial\varphi(t^*) - \frac{\varphi(b) - \varphi(a)}{b - a}.
\]

Since $t^*$ minimizes $g$ over $\Re$, by the optimality condition, we have $0 \in \partial g(t^*)$. This and the preceding relation imply that

\[
\frac{\varphi(b) - \varphi(a)}{b - a} \in \partial\varphi(t^*).
\]

(b) Vector Case: Let $x$ and $y$ be any two vectors in $\Re^n$. If $x = y$, then
$f(y) = f(x) + d'(y - x)$ trivially holds for any $d \in \partial f(x)$, and we are done. So
assume that $x \ne y$, and consider the scalar function $\varphi$ given by
$$\varphi(t) = f(x_t), \qquad x_t = tx + (1 - t)y, \qquad t \in \Re.$$
By part (a), where $a = 0$ and $b = 1$, there exists $\alpha \in (0, 1)$ such that
$$\varphi(1) - \varphi(0) \in \partial\varphi(\alpha),$$
while by Exercise 4.6, we have
$$\partial\varphi(\alpha) = \big\{ d'(x - y) \mid d \in \partial f(x_\alpha) \big\}.$$
Since $\varphi(1) = f(x)$ and $\varphi(0) = f(y)$, we see that
$$f(x) - f(y) \in \big\{ d'(x - y) \mid d \in \partial f(x_\alpha) \big\}.$$
Therefore, there exists $d \in \partial f(x_\alpha)$ such that $f(y) - f(x) = d'(y - x)$.
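
The scalar case can be checked numerically on a small example. The sketch below is only an illustration under assumptions that are not part of the exercise: the function `phi(t) = |t| + t^2`, the interval `[-1, 2]`, and the helper names are all hypothetical. It uses the fact that the subdifferential of a convex scalar function is monotone in `t`, so a bisection can locate a point whose subdifferential contains the chord slope.

```python
def phi(t):
    """Illustrative convex function (an assumption, not from the text): |t| + t^2."""
    return abs(t) + t * t


def subdifferential(t):
    """Return (lo, hi), the interval equal to the subdifferential of phi at t."""
    if t > 0:
        return (2 * t + 1, 2 * t + 1)
    if t < 0:
        return (2 * t - 1, 2 * t - 1)
    return (-1.0, 1.0)  # kink of |t| at the origin


def mean_value_point(a, b, tol=1e-12):
    """Bisection for a point in (a, b) whose subdifferential contains the chord slope."""
    slope = (phi(b) - phi(a)) / (b - a)
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        d_lo, d_hi = subdifferential(mid)
        if d_hi < slope:      # all subgradients too small: move right
            lo = mid
        elif d_lo > slope:    # all subgradients too large: move left
            hi = mid
        else:                 # slope lies in the subdifferential at mid
            return mid, slope
    return 0.5 * (lo + hi), slope


t_star, slope = mean_value_point(-1.0, 2.0)
print(t_star, slope)
```

For this particular choice the chord slope is 4/3, and solving `2t + 1 = 4/3` by hand gives `t = 1/6`, which lies strictly inside the interval, as part (a) asserts.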

4.15 (Steepest Descent Direction of a Convex Function)

Note that the problem statement in the book contains a typo: $\bar d/\|\bar d\|$ should be
replaced by $-\bar d/\|\bar d\|$.
The sets $\{d \mid \|d\| \le 1\}$ and $\partial f(x)$ are compact, and the function $\phi(d, g) = d'g$
is linear in each variable when the other variable is fixed, so that $\phi(\cdot, g)$ is
convex and closed for all $g$, while the function $-\phi(d, \cdot)$ is convex and closed for
all $d$. Thus, by Prop. 2.6.9, the order of min and max can be interchanged,
$$\min_{\|d\| \le 1}\ \max_{g \in \partial f(x)} d'g = \max_{g \in \partial f(x)}\ \min_{\|d\| \le 1} d'g,$$
and there exist associated saddle points.
By Prop. 4.2.2, we have $f'(x; d) = \max_{g \in \partial f(x)} d'g$, so
$$\min_{\|d\| \le 1}\ \max_{g \in \partial f(x)} d'g = \min_{\|d\| \le 1} f'(x; d). \tag{4.15}$$
We also have for all $g$,
$$\min_{\|d\| \le 1} d'g = -\|g\|,$$
and the minimum is attained for $d = -g/\|g\|$. Thus
$$\max_{g \in \partial f(x)}\ \min_{\|d\| \le 1} d'g = \max_{g \in \partial f(x)} \big(-\|g\|\big) = -\min_{g \in \partial f(x)} \|g\|. \tag{4.16}$$
From the generic characterization of a saddle point (cf. Prop. 2.6.1), it follows
that the set of saddle points of $d'g$ is $D^* \times G^*$, where $D^*$ is the set of minima
of $f'(x; d)$ subject to $\|d\| \le 1$ [cf. Eq. (4.15)], and $G^*$ is the set of minima of
$\|g\|$ subject to $g \in \partial f(x)$ [cf. Eq. (4.16)], i.e., $G^*$ consists of the unique vector $g^*$
of minimum norm on $\partial f(x)$. Furthermore, again by Prop. 2.6.1, every $d^* \in D^*$
must minimize $d'g^*$ subject to $\|d\| \le 1$, so it must satisfy $d^* = -g^*/\|g^*\|$.
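
As a numerical illustration of this characterization, consider a hypothetical polyhedral function that is not part of the exercise: $f(x) = \max\{a_1'x, a_2'x\}$ at $x = 0$, where both pieces are active and $\partial f(0)$ is the segment joining $a_1$ and $a_2$. The sketch below (the vectors and helper names are assumptions) computes the minimum norm subgradient in closed form and compares $f'(0; d^*)$ with the directional derivative along unit directions sampled on a grid of angles.

```python
import math

# Hypothetical data: f(x) = max(a1'x, a2'x) at x = 0, so the subdifferential
# at 0 is the line segment joining a1 and a2.
a1 = (2.0, 1.0)
a2 = (1.0, -1.0)


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def norm(u):
    return math.sqrt(dot(u, u))


def min_norm_on_segment(p, q):
    """Point of minimum Euclidean norm on the segment joining p and q."""
    diff = (q[0] - p[0], q[1] - p[1])
    t = -dot(p, diff) / dot(diff, diff)
    t = min(1.0, max(0.0, t))  # clamp so the point stays on the segment
    return (p[0] + t * diff[0], p[1] + t * diff[1])


def directional_derivative(d):
    """f'(0; d): maximum of d'g over the segment, attained at an endpoint."""
    return max(dot(a1, d), dot(a2, d))


g_star = min_norm_on_segment(a1, a2)
d_star = (-g_star[0] / norm(g_star), -g_star[1] / norm(g_star))

# Directional derivatives along unit vectors on a grid of 3600 angles.
sampled = [
    directional_derivative((math.cos(2 * math.pi * k / 3600),
                            math.sin(2 * math.pi * k / 3600)))
    for k in range(3600)
]

print(g_star, directional_derivative(d_star), min(sampled))
```

In this instance the directional derivative along $d^*$ equals $-\|g^*\|$, and no sampled unit direction does better, in agreement with Eqs. (4.15) and (4.16).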

4.16 (Generating Descent Directions of Convex Functions)

Suppose that the process does not terminate in a finite number of steps, and let
$\{(w_k, g_k)\}$ be the sequence generated by the algorithm. Since $w_k$ is the projection
of the origin on the set $\mathrm{conv}\{g_1, \ldots, g_{k-1}\}$, by the Projection Theorem (Prop.
2.2.1), we have
$$(g - w_k)'w_k \ge 0, \qquad \forall\ g \in \mathrm{conv}\{g_1, \ldots, g_{k-1}\},$$
implying that
$$g_i'w_k \ge \|w_k\|^2 \ge \|g^*\|^2 > 0, \qquad \forall\ i = 1, \ldots, k-1,\ \ \forall\ k \ge 1, \tag{4.17}$$
where $g^* \in \partial f(x)$ is the vector with minimum norm in $\partial f(x)$. Note that $\|g^*\| > 0$
because $x$ does not minimize $f$. The sequences $\{w_k\}$ and $\{g_k\}$ are contained in
$\partial f(x)$, and since $\partial f(x)$ is compact, $\{w_k\}$ and $\{g_k\}$ have limit points in $\partial f(x)$.

Without loss of generality, we may assume that these sequences converge, so that
for some $\hat w, \hat g \in \partial f(x)$, we have
$$\lim_{k \to \infty} w_k = \hat w, \qquad \lim_{k \to \infty} g_k = \hat g,$$
which in view of Eq. (4.17) implies that $\hat g'\hat w > 0$. On the other hand, because
none of the vectors $(-w_k)$ is a descent direction of $f$ at $x$, we have $f'(x; -w_k) \ge 0$,
so that
$$g_k'(-w_k) = \max_{g \in \partial f(x)} g'(-w_k) = f'(x; -w_k) \ge 0.$$
By letting $k \to \infty$, we obtain $\hat g'\hat w \le 0$, thus contradicting $\hat g'\hat w > 0$. Therefore,
the process must terminate in a finite number of steps with a descent direction.

4.17 (Generating $\epsilon$-Descent Directions of Convex Functions [Lem74])

Suppose that the process does not terminate in a finite number of steps, and let
$\{(w_k, g_k)\}$ be the sequence generated by the algorithm. Since $w_k$ is the projection
of the origin on the set $\mathrm{conv}\{g_1, \ldots, g_{k-1}\}$, by the Projection Theorem (Prop.
2.2.1), we have
$$(g - w_k)'w_k \ge 0, \qquad \forall\ g \in \mathrm{conv}\{g_1, \ldots, g_{k-1}\},$$
implying that
$$g_i'w_k \ge \|w_k\|^2 \ge \|g^*\|^2 > 0, \qquad \forall\ i = 1, \ldots, k-1,\ \ \forall\ k \ge 1, \tag{4.18}$$
where $g^* \in \partial_\epsilon f(x)$ is the vector with minimum norm in $\partial_\epsilon f(x)$. Note that
$\|g^*\| > 0$ because $x$ is not an $\epsilon$-optimal solution, i.e., $f(x) > \inf_{z \in \Re^n} f(z) + \epsilon$ [see
Prop. 4.3.1(b)]. The sequences $\{w_k\}$ and $\{g_k\}$ are contained in $\partial_\epsilon f(x)$, and since
$\partial_\epsilon f(x)$ is compact [Prop. 4.3.1(a)], $\{w_k\}$ and $\{g_k\}$ have limit points in $\partial_\epsilon f(x)$.
Without loss of generality, we may assume that these sequences converge, so that
for some $\hat w, \hat g \in \partial_\epsilon f(x)$, we have
$$\lim_{k \to \infty} w_k = \hat w, \qquad \lim_{k \to \infty} g_k = \hat g,$$
which in view of Eq. (4.18) implies that $\hat g'\hat w > 0$. On the other hand, because
none of the vectors $(-w_k)$ is an $\epsilon$-descent direction of $f$ at $x$, by Prop. 4.3.1(a),
we have
$$g_k'(-w_k) = \max_{g \in \partial_\epsilon f(x)} g'(-w_k) = \inf_{\alpha > 0} \frac{f(x - \alpha w_k) - f(x) + \epsilon}{\alpha} \ge 0.$$
By letting $k \to \infty$, we obtain $\hat g'\hat w \le 0$, thus contradicting $\hat g'\hat w > 0$. Hence, the
process must terminate in a finite number of steps with an $\epsilon$-descent direction.

4.18

(a) For $x \in \mathrm{int}(C)$, we clearly have $F_C(x) = \Re^n$, implying that
$$T_C(x) = \Re^n.$$
Since $C$ is convex, by Prop. 4.6.3, we have
$$N_C(x) = T_C(x)^* = \{0\}.$$
For $x \in C$ with $x \notin \mathrm{int}(C)$, we have $\|x\| = 1$. By the definition of the set
$F_C(x)$ of feasible directions at $x$, we have $y \in F_C(x)$ if and only if $x + \alpha y \in C$
for all sufficiently small positive scalars $\alpha$. Thus, $y \in F_C(x)$ if and only if there
exists $\bar\alpha > 0$ such that $\|x + \alpha y\|^2 \le 1$ for all $\alpha$ with $0 < \alpha \le \bar\alpha$, or equivalently
$$\|x\|^2 + 2\alpha x'y + \alpha^2\|y\|^2 \le 1, \qquad \forall\ \alpha,\ 0 < \alpha \le \bar\alpha.$$
Since $\|x\| = 1$, the preceding relation reduces to
$$2x'y + \alpha\|y\|^2 \le 0, \qquad \forall\ \alpha,\ 0 < \alpha \le \bar\alpha.$$
This relation holds if and only if $y = 0$, or $x'y < 0$ and $\bar\alpha \le -2x'y/\|y\|^2$
(i.e., $\bar\alpha = -2x'y/\|y\|^2$). Therefore,
$$F_C(x) = \{y \mid x'y < 0\} \cup \{0\}.$$
Because $C$ is convex, by Prop. 4.6.2(c), we have $T_C(x) = \mathrm{cl}\big(F_C(x)\big)$, implying
that
$$T_C(x) = \{y \mid x'y \le 0\}.$$
Furthermore, by Prop. 4.6.3, we have $N_C(x) = T_C(x)^*$, while by the Farkas'
Lemma [Prop. 3.2.1(b)], $T_C(x)^* = \mathrm{cone}\big(\{x\}\big)$, implying that
$$N_C(x) = \mathrm{cone}\big(\{x\}\big).$$

(b) If $C$ is a subspace, then clearly $F_C(x) = C$ for all $x \in C$. Because $C$ is convex,
by Props. 4.6.2(a) and 4.6.3, we have
$$T_C(x) = \mathrm{cl}\big(F_C(x)\big) = C, \qquad N_C(x) = T_C(x)^* = C^\perp, \qquad \forall\ x \in C.$$

 
(c) Let $C$ be a closed halfspace given by $C = \{x \mid a'x \le b\}$ with a nonzero vector
$a \in \Re^n$ and a scalar $b$. For $x \in \mathrm{int}(C)$, i.e., $a'x < b$, we have $F_C(x) = \Re^n$ and
since $C$ is convex, by Props. 4.6.2(a) and 4.6.3, we have
$$T_C(x) = \mathrm{cl}\big(F_C(x)\big) = \Re^n, \qquad N_C(x) = T_C(x)^* = \{0\}.$$
For $x \in C$ with $x \notin \mathrm{int}(C)$, we have $a'x = b$, so that $x + \alpha y \in C$ for some
$y \in \Re^n$ and $\alpha > 0$ if and only if $a'y \le 0$, implying that
$$F_C(x) = \{y \mid a'y \le 0\}.$$
By Prop. 4.6.2(a), it follows that
$$T_C(x) = \mathrm{cl}\big(F_C(x)\big) = \{y \mid a'y \le 0\},$$
while by Prop. 4.6.3 and the Farkas' Lemma [Prop. 3.2.1(b)], it follows that
$$N_C(x) = T_C(x)^* = \mathrm{cone}\big(\{a\}\big).$$

(d) For $x \in C$ with $x \in \mathrm{int}(C)$, i.e., $x_i > 0$ for all $i \in I$, we have $F_C(x) = \Re^n$.
Then, by using Props. 4.6.2(a) and 4.6.3, we obtain
$$T_C(x) = \mathrm{cl}\big(F_C(x)\big) = \Re^n, \qquad N_C(x) = T_C(x)^* = \{0\}.$$
For $x \in C$ with $x \notin \mathrm{int}(C)$, the set $A_x = \{i \in I \mid x_i = 0\}$ is nonempty.
Then, $x + \alpha y \in C$ for some $y \in \Re^n$ and $\alpha > 0$ if and only if $y_i \ge 0$ for all $i \in A_x$,
implying that
$$F_C(x) = \{y \mid y_i \ge 0,\ \forall\ i \in A_x\}.$$
Because $C$ is convex, by Prop. 4.6.2(a), we have that
$$T_C(x) = \mathrm{cl}\big(F_C(x)\big) = \{y \mid y_i \ge 0,\ \forall\ i \in A_x\},$$
or equivalently
$$T_C(x) = \{y \mid (-e_i)'y \le 0,\ \forall\ i \in A_x\},$$
where $e_i \in \Re^n$ is the vector whose $i$th component is 1 and all other components
are 0. By Prop. 4.6.3, we further have $N_C(x) = T_C(x)^*$, while by the Farkas'
Lemma [Prop. 3.2.1(b)], we see that $T_C(x)^* = \mathrm{cone}\big(\{-e_i \mid i \in A_x\}\big)$, implying
that
$$N_C(x) = \mathrm{cone}\big(\{-e_i \mid i \in A_x\}\big).$$

4.19

(a) ⇒ (b) Let $x \in \mathrm{ri}(C)$ and let $S$ be the subspace that is parallel to $\mathrm{aff}(C)$.
Then, for every $y \in S$, $x + \alpha y \in \mathrm{ri}(C)$ for all sufficiently small positive scalars
$\alpha$, implying that $y \in F_C(x)$ and showing that $S \subset F_C(x)$. Furthermore, by the
definition of the set of feasible directions, it follows that if $y \in F_C(x)$, then there
exists $\bar\alpha > 0$ such that $x + \alpha y \in C$ for all $\alpha \in (0, \bar\alpha]$. Hence $y \in S$, implying that
$F_C(x) \subset S$. This and the relation $S \subset F_C(x)$ show that $F_C(x) = S$. Since $C$ is
convex, by Prop. 4.6.2(a), it follows that
$$T_C(x) = \mathrm{cl}\big(F_C(x)\big) = S,$$
thus proving that $T_C(x)$ is a subspace.

(b) ⇒ (c) Let $T_C(x)$ be a subspace. Then, because $C$ is convex, from Prop. 4.6.3
it follows that
$$N_C(x) = T_C(x)^* = T_C(x)^\perp,$$

showing that $N_C(x)$ is a subspace.
(c) ⇒ (a) Let $N_C(x)$ be a subspace, and to arrive at a contradiction suppose that
$x$ is not a point in the relative interior of $C$. Then, by the Proper Separation
Theorem (Prop. 2.4.5), the point $x$ and the relative interior of $C$ can be properly
separated, i.e., there exists a vector $a \in \Re^n$ such that
$$\sup_{y \in C} a'y \le a'x, \tag{4.19}$$
$$\inf_{y \in C} a'y < \sup_{y \in C} a'y. \tag{4.20}$$
The relation (4.19) implies that
$$a'(y - x) \le 0, \qquad \forall\ y \in C. \tag{4.21}$$
Since $C$ is convex, by Prop. 4.6.3, the preceding relation is equivalent to $a \in
T_C(x)^*$. By the same proposition, there holds $N_C(x) = T_C(x)^*$, implying that
$a \in N_C(x)$. Because $N_C(x)$ is a subspace, we must also have $-a \in N_C(x)$, and
by using
$$N_C(x) = T_C(x)^* = \big\{z \mid z'(y - x) \le 0,\ \forall\ y \in C\big\}$$
(cf. Prop. 4.6.3), we see that
$$(-a)'(y - x) \le 0, \qquad \forall\ y \in C.$$
This relation and Eq. (4.21) yield
$$a'(y - x) = 0, \qquad \forall\ y \in C,$$
so that $a'y$ is constant over $C$, contradicting Eq. (4.20). Hence, $x$ must be in the relative interior of $C$.

4.20 (Tangent and Normal Cones of Affine Sets)

Let C = {x | Ax = b} and let x ∈ C be arbitrary. We then have

FC (x) = {y | Ay = 0} = N (A),

and by using Prop. 4.6.2(a), we obtain
$$T_C(x) = \mathrm{cl}\big(F_C(x)\big) = N(A).$$
Since $C$ is convex, by Prop. 4.6.3, it follows that
$$N_C(x) = T_C(x)^* = N(A)^\perp = R(A').$$
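
A small numerical check can make the identity $N_C(x) = R(A')$ concrete. The sketch below uses hypothetical data that are not from the exercise (a single-row matrix `A`, the point `x`, and a hand-computed basis of $N(A)$): a vector of the form $A'\lambda$ is orthogonal to every direction in $N(A)$, whereas a vector outside $R(A')$ makes a positive inner product with some feasible direction and therefore cannot be normal.

```python
# Hypothetical data: C = {x in R^3 | A x = b} with a single row.
A = [[1.0, 2.0, -1.0]]
b = [3.0]


def matvec(M, v):
    """Compute M v for a matrix stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def transpose_times(M, lam):
    """Compute M' lam."""
    return [sum(M[i][j] * lam[i] for i in range(len(M))) for j in range(len(M[0]))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


x = [1.0, 1.0, 0.0]                                # A x = 3, so x is in C
null_basis = [[2.0, -1.0, 0.0], [1.0, 0.0, 1.0]]   # hand-computed basis of N(A)

z = transpose_times(A, [-2.5])                     # an element of R(A')
z_bad = [1.0, 0.0, 0.0]                            # not a multiple of the row of A

print(matvec(A, x), [dot(z, y) for y in null_basis], [dot(z_bad, y) for y in null_basis])
```

Since $F_C(x) = N(A)$ is a subspace, a normal vector must have zero inner product with every element of it, which is what the first list of inner products shows and the second violates.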

4.21 (Tangent and Normal Cones of Level Sets)

Let
$$C = \big\{z \mid f(z) \le f(x)\big\}.$$
We first show that
$$\mathrm{cl}\big(F_C(x)\big) = \big\{y \mid f'(x; y) \le 0\big\}.$$
Let $y \in F_C(x)$ be arbitrary. Then, by the definition of $F_C(x)$, there exists a
scalar $\bar\alpha > 0$ such that $x + \alpha y \in C$ for all $\alpha \in (0, \bar\alpha]$. By the definition of $C$, it follows
that $f(x + \alpha y) \le f(x)$ for all $\alpha \in (0, \bar\alpha]$, implying that
$$f'(x; y) = \inf_{\alpha > 0} \frac{f(x + \alpha y) - f(x)}{\alpha} \le 0.$$
Therefore $y \in \{y \mid f'(x; y) \le 0\}$, thus showing that
$$F_C(x) \subset \big\{y \mid f'(x; y) \le 0\big\}.$$
By Exercise 4.1(d), the set $\{y \mid f'(x; y) \le 0\}$ is closed, so that by taking closures
in the preceding relation, we obtain
$$\mathrm{cl}\big(F_C(x)\big) \subset \big\{y \mid f'(x; y) \le 0\big\}.$$
To show the converse inclusion, let $y$ be such that $f'(x; y) < 0$, so that for all
small enough $\alpha > 0$, we have
$$f(x + \alpha y) - f(x) < 0.$$
Therefore $x + \alpha y \in C$ for all small enough $\alpha \ge 0$, implying that $y \in F_C(x)$ and
showing that
$$\big\{y \mid f'(x; y) < 0\big\} \subset F_C(x).$$
By taking the closures of the sets in the preceding relation, we obtain
$$\big\{y \mid f'(x; y) \le 0\big\} = \mathrm{cl}\Big(\big\{y \mid f'(x; y) < 0\big\}\Big) \subset \mathrm{cl}\big(F_C(x)\big).$$
Hence
$$\mathrm{cl}\big(F_C(x)\big) = \big\{y \mid f'(x; y) \le 0\big\}.$$
Since $C$ is convex, by Prop. 4.6.2(c), we have $\mathrm{cl}\big(F_C(x)\big) = T_C(x)$. This and
the preceding relation imply that
$$T_C(x) = \big\{y \mid f'(x; y) \le 0\big\}.$$
Since by Prop. 4.6.3, $N_C(x) = T_C(x)^*$, it follows that
$$N_C(x) = \Big(\big\{y \mid f'(x; y) \le 0\big\}\Big)^*.$$

Furthermore, by Exercise 4.1(d), we have that
$$\Big(\big\{y \mid f'(x; y) \le 0\big\}\Big)^* = \mathrm{cl}\Big(\mathrm{cone}\big(\partial f(x)\big)\Big),$$
implying that
$$N_C(x) = \mathrm{cl}\Big(\mathrm{cone}\big(\partial f(x)\big)\Big).$$
If $x$ does not minimize $f$ over $\Re^n$, then the subdifferential $\partial f(x)$ does not
contain the origin. Furthermore, by Prop. 4.2.1, $\partial f(x)$ is nonempty and compact,
implying by Exercise 1.32(a) that the cone generated by $\partial f(x)$ is closed. Therefore,
in this case, the closure operation in the preceding relation is unnecessary,
i.e.,
$$N_C(x) = \mathrm{cone}\big(\partial f(x)\big).$$
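
As a consistency check, the formulas above can be compared with the unit ball case of Exercise 4.18(a). The choice $f(z) = \|z\|^2$ and a point $x$ with $\|x\| = 1$ is an illustration only and is not part of the exercise statement.

```latex
% Illustrative instance (assumed): f(z) = \|z\|^2 and \|x\| = 1, so that
% C = \{ z \mid f(z) \le f(x) \} is the unit ball and x is not a minimizer of f.
\begin{align*}
  \partial f(x) &= \{ 2x \}, \qquad f'(x; y) = 2x'y, \\
  T_C(x) &= \{ y \mid f'(x; y) \le 0 \} = \{ y \mid x'y \le 0 \}, \\
  N_C(x) &= \mathrm{cone}\big( \partial f(x) \big)
          = \mathrm{cone}\big( \{ 2x \} \big)
          = \mathrm{cone}\big( \{ x \} \big).
\end{align*}
```

Both cones agree with those obtained directly from the feasible directions of the unit ball.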

4.22

It suffices to consider the case $m = 2$. From the definition of the cone of feasible
directions, it can be seen that
$$F_{C_1 \times C_2}(x_1, x_2) = F_{C_1}(x_1) \times F_{C_2}(x_2).$$
By taking the closure of both sides in the preceding relation, and by using the fact
that the closure of the Cartesian product of two sets coincides with the Cartesian
product of their closures (see Exercise 1.37), we obtain
$$\mathrm{cl}\big(F_{C_1 \times C_2}(x_1, x_2)\big) = \mathrm{cl}\big(F_{C_1}(x_1)\big) \times \mathrm{cl}\big(F_{C_2}(x_2)\big).$$
Since $C_1$ and $C_2$ are convex, by Prop. 4.6.2(c), we have
$$T_{C_1}(x_1) = \mathrm{cl}\big(F_{C_1}(x_1)\big), \qquad T_{C_2}(x_2) = \mathrm{cl}\big(F_{C_2}(x_2)\big).$$
Furthermore, the Cartesian product $C_1 \times C_2$ is also convex, and by Prop. 4.6.2(c),
we also have
$$T_{C_1 \times C_2}(x_1, x_2) = \mathrm{cl}\big(F_{C_1 \times C_2}(x_1, x_2)\big).$$
By combining the preceding three relations, we obtain
$$T_{C_1 \times C_2}(x_1, x_2) = T_{C_1}(x_1) \times T_{C_2}(x_2).$$
By taking polars in the preceding relation, we obtain
$$T_{C_1 \times C_2}(x_1, x_2)^* = \big(T_{C_1}(x_1) \times T_{C_2}(x_2)\big)^*,$$
and because the polar of the Cartesian product of two cones coincides with the
Cartesian product of their polar cones (see Exercise 3.4), it follows that
$$T_{C_1 \times C_2}(x_1, x_2)^* = T_{C_1}(x_1)^* \times T_{C_2}(x_2)^*.$$
Since the sets $C_1$, $C_2$, and $C_1 \times C_2$ are convex, by Prop. 4.6.3, we have
$$T_{C_1 \times C_2}(x_1, x_2)^* = N_{C_1 \times C_2}(x_1, x_2),$$
$$T_{C_1}(x_1)^* = N_{C_1}(x_1), \qquad T_{C_2}(x_2)^* = N_{C_2}(x_2),$$
so that
$$N_{C_1 \times C_2}(x_1, x_2) = N_{C_1}(x_1) \times N_{C_2}(x_2).$$

4.23 (Tangent and Normal Cone Relations)

(a) We first show that
$$N_{C_1}(x) + N_{C_2}(x) \subset N_{C_1 \cap C_2}(x), \qquad \forall\ x \in C_1 \cap C_2.$$
For $i = 1, 2$, let $f_i(x) = 0$ when $x \in C_i$ and $f_i(x) = \infty$ otherwise, so that for
$f = f_1 + f_2$, we have
$$f(x) = \begin{cases} 0 & \text{if } x \in C_1 \cap C_2, \\ \infty & \text{otherwise.} \end{cases}$$
By Exercise 4.4(d), we have
$$\partial f_1(x) = N_{C_1}(x), \qquad \forall\ x \in C_1,$$
$$\partial f_2(x) = N_{C_2}(x), \qquad \forall\ x \in C_2,$$
$$\partial f(x) = N_{C_1 \cap C_2}(x), \qquad \forall\ x \in C_1 \cap C_2,$$
while by Exercise 4.9, we have
$$\partial f_1(x) + \partial f_2(x) \subset \partial f(x), \qquad \forall\ x.$$
In particular, this relation holds for every $x \in \mathrm{dom}(f)$ and since $\mathrm{dom}(f) =
C_1 \cap C_2$, we obtain
$$N_{C_1}(x) + N_{C_2}(x) \subset N_{C_1 \cap C_2}(x), \qquad \forall\ x \in C_1 \cap C_2. \tag{4.22}$$
If $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ is nonempty, then by Exercise 4.9, we have
$$\partial f(x) = \partial f_1(x) + \partial f_2(x), \qquad \forall\ x,$$
implying that
$$N_{C_1 \cap C_2}(x) = N_{C_1}(x) + N_{C_2}(x), \qquad \forall\ x \in C_1 \cap C_2. \tag{4.23}$$
Furthermore, by Exercise 4.9, this relation also holds if $C_2$ is polyhedral and
$\mathrm{ri}(C_1) \cap C_2$ is nonempty.
By taking polars in Eq. (4.22), it follows that
$$N_{C_1 \cap C_2}(x)^* \subset \big(N_{C_1}(x) + N_{C_2}(x)\big)^*,$$
and since
$$\big(N_{C_1}(x) + N_{C_2}(x)\big)^* = N_{C_1}(x)^* \cap N_{C_2}(x)^*$$
(see Exercise 3.4), we obtain
$$N_{C_1 \cap C_2}(x)^* \subset N_{C_1}(x)^* \cap N_{C_2}(x)^*. \tag{4.24}$$
Because $C_1$ and $C_2$ are convex, their intersection $C_1 \cap C_2$ is also convex, and by
Prop. 4.6.3, we have
$$N_{C_1 \cap C_2}(x)^* = T_{C_1 \cap C_2}(x),$$
$$N_{C_1}(x)^* = T_{C_1}(x), \qquad N_{C_2}(x)^* = T_{C_2}(x).$$
In view of Eq. (4.24), it follows that
$$T_{C_1 \cap C_2}(x) \subset T_{C_1}(x) \cap T_{C_2}(x).$$
When $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ is nonempty, or when $\mathrm{ri}(C_1) \cap C_2$ is nonempty and $C_2$ is
polyhedral, by taking the polars in both sides of Eq. (4.23), it can be similarly
seen that
$$T_{C_1 \cap C_2}(x) = T_{C_1}(x) \cap T_{C_2}(x).$$

(b) Let $x_1 \in C_1$ and $x_2 \in C_2$ be arbitrary. Since $C_1$ and $C_2$ are convex, the sum
$C_1 + C_2$ is also convex, so that by Prop. 4.6.3, we have
$$z \in N_{C_1 + C_2}(x_1 + x_2) \iff z'\big((y_1 + y_2) - (x_1 + x_2)\big) \le 0, \quad \forall\ y_1 \in C_1,\ \forall\ y_2 \in C_2, \tag{4.25}$$
$$z_1 \in N_{C_1}(x_1) \iff z_1'(y_1 - x_1) \le 0, \quad \forall\ y_1 \in C_1, \tag{4.26}$$
$$z_2 \in N_{C_2}(x_2) \iff z_2'(y_2 - x_2) \le 0, \quad \forall\ y_2 \in C_2. \tag{4.27}$$
If $z \in N_{C_1 + C_2}(x_1 + x_2)$, then by using $y_2 = x_2$ in Eq. (4.25), we obtain
$$z'(y_1 - x_1) \le 0, \qquad \forall\ y_1 \in C_1,$$
implying that $z \in N_{C_1}(x_1)$. Similarly, by using $y_1 = x_1$ in Eq. (4.25), we see that
$z \in N_{C_2}(x_2)$. Hence $z \in N_{C_1}(x_1) \cap N_{C_2}(x_2)$, implying that
$$N_{C_1 + C_2}(x_1 + x_2) \subset N_{C_1}(x_1) \cap N_{C_2}(x_2).$$
Conversely, let $z \in N_{C_1}(x_1) \cap N_{C_2}(x_2)$, so that both Eqs. (4.26) and (4.27)
hold with $z_1 = z_2 = z$, and by adding them, we obtain
$$z'\big((y_1 + y_2) - (x_1 + x_2)\big) \le 0, \qquad \forall\ y_1 \in C_1,\ \forall\ y_2 \in C_2.$$
Therefore, in view of Eq. (4.25), we have $z \in N_{C_1 + C_2}(x_1 + x_2)$, showing that
$$N_{C_1}(x_1) \cap N_{C_2}(x_2) \subset N_{C_1 + C_2}(x_1 + x_2).$$
Hence
$$N_{C_1 + C_2}(x_1 + x_2) = N_{C_1}(x_1) \cap N_{C_2}(x_2).$$
By taking polars in this relation, we obtain
$$N_{C_1 + C_2}(x_1 + x_2)^* = \big(N_{C_1}(x_1) \cap N_{C_2}(x_2)\big)^*.$$
Since $N_{C_1}(x_1)$ and $N_{C_2}(x_2)$ are closed convex cones, by Exercise 3.4, it follows
that
$$N_{C_1 + C_2}(x_1 + x_2)^* = \mathrm{cl}\big(N_{C_1}(x_1)^* + N_{C_2}(x_2)^*\big).$$
The sets $C_1$, $C_2$, and $C_1 + C_2$ are convex, so that by Prop. 4.6.3, we have
$$N_{C_1}(x_1)^* = T_{C_1}(x_1), \qquad N_{C_2}(x_2)^* = T_{C_2}(x_2),$$
$$N_{C_1 + C_2}(x_1 + x_2)^* = T_{C_1 + C_2}(x_1 + x_2),$$
implying that
$$T_{C_1 + C_2}(x_1 + x_2) = \mathrm{cl}\big(T_{C_1}(x_1) + T_{C_2}(x_2)\big).$$

(c) Let $x \in C$ be arbitrary. Since $C$ is convex, its image $AC$ under the linear
transformation $A$ is also convex, so by Prop. 4.6.3, we have
$$z \in N_{AC}(Ax) \iff z'(y - Ax) \le 0, \quad \forall\ y \in AC,$$
which is equivalent to
$$z \in N_{AC}(Ax) \iff z'(Av - Ax) \le 0, \quad \forall\ v \in C.$$
Furthermore, the condition
$$z'(Av - Ax) \le 0, \qquad \forall\ v \in C$$
is the same as
$$(A'z)'(v - x) \le 0, \qquad \forall\ v \in C,$$
and since $C$ is convex, by Prop. 4.6.3, this is equivalent to $A'z \in N_C(x)$. Thus,
$$z \in N_{AC}(Ax) \iff A'z \in N_C(x),$$
which together with the fact $A'z \in N_C(x)$ if and only if $z \in (A')^{-1} \cdot N_C(x)$ yields
$$N_{AC}(Ax) = (A')^{-1} \cdot N_C(x).$$
By taking polars in the preceding relation, we obtain
$$N_{AC}(Ax)^* = \big((A')^{-1} \cdot N_C(x)\big)^*. \tag{4.28}$$
Because $AC$ is convex, by Prop. 4.6.3, we have
$$N_{AC}(Ax)^* = T_{AC}(Ax). \tag{4.29}$$
Since $C$ is convex, by the same proposition, we have $N_C(x) = T_C(x)^*$, so that
$N_C(x)$ is a closed convex cone and by using Exercise 3.5, we obtain
$$\big((A')^{-1} \cdot N_C(x)\big)^* = \mathrm{cl}\big(A \cdot N_C(x)^*\big).$$
Furthermore, by convexity of $C$, we also have $N_C(x)^* = T_C(x)$, implying that
$$\big((A')^{-1} \cdot N_C(x)\big)^* = \mathrm{cl}\big(A \cdot T_C(x)\big). \tag{4.30}$$
Combining Eqs. (4.28), (4.29), and (4.30), we obtain
$$T_{AC}(Ax) = \mathrm{cl}\big(A \cdot T_C(x)\big).$$

4.24 [GoT71], [RoW98]

We assume for simplicity that all the constraints are inequalities. Consider the
scalar function $\theta_0 : [0, \infty) \to \Re$ defined by
$$\theta_0(r) = \sup_{x \in C,\ \|x - x^*\| \le r} y'(x - x^*), \qquad r \ge 0.$$
Clearly $\theta_0(r)$ is nondecreasing and satisfies
$$0 = \theta_0(0) \le \theta_0(r), \qquad \forall\ r \ge 0.$$
Furthermore, since $y \in T_C(x^*)^*$, we have $y'(x - x^*) \le o(\|x - x^*\|)$ for $x \in C$, so
that $\theta_0(r) = o(r)$, which implies that $\theta_0$ is differentiable at $r = 0$ with $\nabla\theta_0(0) = 0$.
Thus, the function $F_0$ defined by
$$F_0(x) = \theta_0(\|x - x^*\|) - y'(x - x^*)$$
is differentiable at $x^*$, attains a global minimum over $C$ at $x^*$, and satisfies
$$-\nabla F_0(x^*) = y.$$
If $F_0$ were smooth we would be done, but since it need not even be continuous,
we will successively perturb it into a smooth function. We first define the
function $\theta_1 : [0, \infty) \to \Re$ by
$$\theta_1(r) = \begin{cases} \dfrac{1}{r} \displaystyle\int_r^{2r} \theta_0(s)\,ds & \text{if } r > 0, \\[1ex] 0 & \text{if } r = 0, \end{cases}$$
(the integral above is well-defined since the function $\theta_0$ is nondecreasing). The
function $\theta_1$ is seen to be nondecreasing and continuous, and satisfies
$$0 \le \theta_0(r) \le \theta_1(r), \qquad \forall\ r \ge 0,$$
$\theta_1(0) = 0$, and $\nabla\theta_1(0) = 0$. Thus the function
$$F_1(x) = \theta_1(\|x - x^*\|) - y'(x - x^*)$$
has the same significant properties for our purposes as $F_0$ [attains a global minimum
over $C$ at $x^*$, and has $-\nabla F_1(x^*) = y$], and is in addition continuous.
We next define the function $\theta_2 : [0, \infty) \to \Re$ by
$$\theta_2(r) = \begin{cases} \dfrac{1}{r} \displaystyle\int_r^{2r} \theta_1(s)\,ds & \text{if } r > 0, \\[1ex] 0 & \text{if } r = 0. \end{cases}$$
Again $\theta_2$ is seen to be nondecreasing, and satisfies
$$0 \le \theta_1(r) \le \theta_2(r), \qquad \forall\ r \ge 0,$$
$\theta_2(0) = 0$, and $\nabla\theta_2(0) = 0$. Also, because $\theta_1$ is continuous, $\theta_2$ is smooth, and so
is the function $F_2$ given by
$$F_2(x) = \theta_2(\|x - x^*\|) - y'(x - x^*).$$
The function $F_2$ fulfills all the requirements of the proposition, except that it may
have global minima other than $x^*$. To ensure the uniqueness of $x^*$ we modify $F_2$
as follows:
$$F(x) = F_2(x) + \|x - x^*\|^2.$$
The function $F$ is smooth, attains a strict global minimum over $C$ at $x^*$, and
satisfies $-\nabla F(x^*) = y$.

4.25

We consider the problem

minimize $\|x_1 - x_2\| + \|x_2 - x_3\| + \|x_3 - x_1\|$
subject to $x_i \in C_i$, $i = 1, 2, 3$,

with the additional condition that $x_1$, $x_2$ and $x_3$ do not lie on the same line.
Suppose that $(x_1^*, x_2^*, x_3^*)$ defines an optimal triangle. Then, $x_1^*$ solves the problem

minimize $\|x_1 - x_2^*\| + \|x_2^* - x_3^*\| + \|x_3^* - x_1\|$
subject to $x_1 \in C_1$,

for which we have the following necessary optimality condition
$$d_1 = \frac{x_2^* - x_1^*}{\|x_2^* - x_1^*\|} + \frac{x_3^* - x_1^*}{\|x_3^* - x_1^*\|} \in T_{C_1}(x_1^*)^*.$$
The half-line $\{x \mid x = x_1^* + \alpha d_1,\ \alpha \ge 0\}$ is one of the bisectors of the optimal
triangle. Similarly, there exist $d_2 \in T_{C_2}(x_2^*)^*$ and $d_3 \in T_{C_3}(x_3^*)^*$ which define the
remaining bisectors of the optimal triangle. By elementary geometry, there exists
a unique point $z^*$ at which all three bisectors intersect ($z^*$ is the center of the
circle that is inscribed in the optimal triangle). From the necessary optimality
conditions we have
$$z^* - x_i^* = \alpha_i d_i \in T_{C_i}(x_i^*)^*, \qquad i = 1, 2, 3.$$

4.26

Let us characterize the cone $T_X(x^*)^*$. Define
$$A(x^*) = \big\{j \mid a_j'x^* = b_j\big\}.$$
Since $X$ is convex, by Prop. 4.6.2, we have
$$T_X(x^*) = \mathrm{cl}\big(F_X(x^*)\big),$$
while from the definition of $X$, we have
$$F_X(x^*) = \big\{y \mid a_j'y \le 0,\ \forall\ j \in A(x^*)\big\},$$
and since this set is closed, it follows that
$$T_X(x^*) = \big\{y \mid a_j'y \le 0,\ \forall\ j \in A(x^*)\big\}.$$
By taking polars in this relation and by using the Farkas' Lemma [Prop. 3.2.1(b)],
we obtain
$$T_X(x^*)^* = \Big\{ \sum_{j \in A(x^*)} \mu_j a_j \ \Big|\ \mu_j \ge 0,\ \forall\ j \in A(x^*) \Big\},$$
and by letting $\mu_j = 0$ for all $j \notin A(x^*)$, we can write
$$T_X(x^*)^* = \Big\{ \sum_{j=1}^r \mu_j a_j \ \Big|\ \mu_j \ge 0,\ \forall\ j,\ \ \mu_j = 0,\ \forall\ j \notin A(x^*) \Big\}. \tag{4.31}$$
By Prop. 4.7.2, the vector $x^*$ minimizes $f$ over $X$ if and only if
$$0 \in \partial f(x^*) + T_X(x^*)^*.$$
In view of Eq. (4.31) and the definition of $A(x^*)$, it follows that $x^*$ minimizes $f$
over $X$ if and only if there exist $\mu_1^*, \ldots, \mu_r^*$ such that
(i) $\mu_j^* \ge 0$ for all $j = 1, \ldots, r$, and $\mu_j^* = 0$ for all $j$ such that $a_j'x^* < b_j$.
(ii) $0 \in \partial f(x^*) + \sum_{j=1}^r \mu_j^* a_j$.
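
Conditions (i) and (ii) can be verified mechanically on a small instance. The sketch below is an illustration under assumptions that are not part of the exercise: the cost is the differentiable function $f(x) = \frac{1}{2}\|x - c\|^2$, so that $\partial f(x)$ reduces to the single gradient $x - c$, and the data `c`, `a`, `b`, the candidate point, and the multipliers are all hypothetical.

```python
# Hypothetical instance: minimize 0.5*||x - c||^2 over X = {x | a_j'x <= b_j}.
c = (2.0, 2.0)
a = [(1.0, 0.0), (0.0, 1.0)]   # constraints x1 <= 1 and x2 <= 3
b = [1.0, 3.0]

x_star = (1.0, 2.0)            # candidate minimizer (projection of c on X)
mu_star = [1.0, 0.0]           # candidate multipliers


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def f(x):
    return 0.5 * ((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2)


def grad_f(x):
    return (x[0] - c[0], x[1] - c[1])


def satisfies_conditions(x, mu, tol=1e-12):
    """Check feasibility together with conditions (i) and (ii) for differentiable f."""
    feasible = all(dot(aj, x) <= bj + tol for aj, bj in zip(a, b))
    nonnegative = all(m >= 0 for m in mu)
    # (i): a multiplier may be positive only on an active constraint.
    complementary = all(m == 0 or abs(dot(aj, x) - bj) <= tol
                        for m, aj, bj in zip(mu, a, b))
    # (ii): grad f(x) + sum_j mu_j a_j = 0.
    g = grad_f(x)
    residual = [g[i] + sum(m * aj[i] for m, aj in zip(mu, a)) for i in range(2)]
    stationary = all(abs(r) <= tol for r in residual)
    return feasible and nonnegative and complementary and stationary


print(satisfies_conditions(x_star, mu_star))
print(satisfies_conditions((0.5, 2.0), [0.0, 0.0]))
```

Here only the first constraint is active at the candidate point, so the second multiplier must vanish, while the first one cancels the gradient $(-1, 0)$.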

4.27 (Quasiregularity)

(a) Let $y$ be a nonzero tangent of $X$ at $x^*$. Then there exists a sequence $\{\xi^k\}$
and a sequence $\{x^k\} \subset X$ such that $x^k \ne x^*$ for all $k$,
$$\xi^k \to 0, \qquad x^k \to x^*,$$
and
$$\frac{x^k - x^*}{\|x^k - x^*\|} = \frac{y}{\|y\|} + \xi^k. \tag{4.32}$$
By the mean value theorem, we have for all $k$
$$f(x^k) = f(x^*) + \nabla f(\tilde x^k)'(x^k - x^*),$$
where $\tilde x^k$ is a vector that lies on the line segment joining $x^k$ and $x^*$. Using Eq.
(4.32), the last relation can be written as
$$f(x^k) = f(x^*) + \frac{\|x^k - x^*\|}{\|y\|}\,\nabla f(\tilde x^k)'y^k, \tag{4.33}$$
where
$$y^k = y + \|y\|\,\xi^k.$$
If the tangent $y$ satisfies $\nabla f(x^*)'y < 0$, then, since $\tilde x^k \to x^*$ and $y^k \to y$, we
obtain for all sufficiently large $k$, $\nabla f(\tilde x^k)'y^k < 0$ and [from Eq. (4.33)] $f(x^k) <
f(x^*)$. This contradicts the local optimality of $x^*$.

(b) Assume first that there are no equality constraints. Let $x \in X$ and let $y$ be
a nonzero tangent of $X$ at $x$. Then there exists a sequence $\{\xi^k\}$ and a sequence
$\{x^k\} \subset X$ such that $x^k \ne x$ for all $k$,
$$\xi^k \to 0, \qquad x^k \to x,$$
and
$$\frac{x^k - x}{\|x^k - x\|} = \frac{y}{\|y\|} + \xi^k.$$
By the mean value theorem, we have for all $k$ and all $j$ that are active at $x$, i.e., with $g_j(x) = 0$,
$$0 \ge g_j(x^k) = g_j(x) + \nabla g_j(\tilde x^k)'(x^k - x) = \nabla g_j(\tilde x^k)'(x^k - x),$$
where $\tilde x^k$ is a vector that lies on the line segment joining $x^k$ and $x$. This relation
can be written as
$$\frac{\|x^k - x\|}{\|y\|}\,\nabla g_j(\tilde x^k)'y^k \le 0,$$
where $y^k = y + \xi^k\|y\|$, or equivalently
$$\nabla g_j(\tilde x^k)'y^k \le 0, \qquad y^k = y + \xi^k\|y\|.$$
Taking the limit as $k \to \infty$, we obtain $\nabla g_j(x)'y \le 0$ for all active $j$, thus proving that
$y \in V(x)$, and $T_X(x) \subset V(x)$. If there are some equality constraints $h_i(x) = 0$,
they can be converted to the two inequality constraints $h_i(x) \le 0$ and $-h_i(x) \le 0$,
and the result follows similarly.
(c) Assume first that there are no equality constraints. From part (a), we have
$D(x^*) \cap V(x^*) = \emptyset$, which is equivalent to having $\nabla f(x^*)'y \ge 0$ for all $y$ with
$\nabla g_j(x^*)'y \le 0$ for all $j \in A(x^*)$. By Farkas' Lemma, this is equivalent to the
existence of Lagrange multipliers $\mu_j^*$ with the properties stated in the exercise. If
there are some equality constraints $h_i(x) = 0$, they can be converted to the two
inequality constraints $h_i(x) \le 0$ and $-h_i(x) \le 0$, and the result follows similarly.

Convex Analysis and
Optimization
Chapter 5 Solutions

Dimitri P. Bertsekas

with

Angelia Nedić and Asuman E. Ozdaglar

Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts


http://www.athenasc.com
LAST UPDATE April 15, 2003

CHAPTER 5: SOLUTION MANUAL

5.1 (Second Order Sufficiency Conditions for Equality-Constrained Problems)

We first prove the following lemma.


Lemma 5.1: Let $P$ and $Q$ be two symmetric matrices. Assume that $Q$ is positive
semidefinite and $P$ is positive definite on the nullspace of $Q$, that is, $x'Px > 0$
for all $x \ne 0$ with $x'Qx = 0$. Then there exists a scalar $\bar c$ such that

$P + cQ$ : positive definite, $\forall\ c > \bar c$.

Proof: Assume the contrary. Then for every integer $k$, there exists a vector $x^k$
with $\|x^k\| = 1$ such that
$$x^{k\prime}Px^k + k\,x^{k\prime}Qx^k \le 0.$$
Since $\{x^k\}$ is bounded, there is a subsequence $\{x^k\}_{k \in K}$ converging to some $\bar x$,
and since $\|x^k\| = 1$ for all $k$, we have $\|\bar x\| = 1$. Taking the limit superior in the
above inequality, we obtain
$$\bar x'P\bar x + \limsup_{k \to \infty,\ k \in K} \big(k\,x^{k\prime}Qx^k\big) \le 0. \tag{5.1}$$
Since, by the positive semidefiniteness of $Q$, $x^{k\prime}Qx^k \ge 0$, we see that $\{x^{k\prime}Qx^k\}_K$
must converge to zero, for otherwise the left-hand side of the above inequality
would be $\infty$. Therefore, $\bar x'Q\bar x = 0$ and since $\bar x \ne 0$ and $P$ is positive definite on the
nullspace of $Q$, we obtain $\bar x'P\bar x > 0$. This contradicts Eq. (5.1). Q.E.D.
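
A two-dimensional instance shows how the threshold $\bar c$ of the lemma can arise. The matrices below are hypothetical and chosen only for illustration: $Q$ is positive semidefinite with nullspace spanned by $(1, 0)$, and $P$ is indefinite but satisfies $x'Px = x_1^2 > 0$ on that nullspace. Positive definiteness of the symmetric $2 \times 2$ matrix $P + cQ$ is tested with leading principal minors (Sylvester's criterion).

```python
# Hypothetical matrices: P is indefinite, Q is positive semidefinite, and
# x'Px > 0 for nonzero x in the nullspace of Q, i.e., x = (x1, 0).
P = [[1.0, 2.0], [2.0, -1.0]]
Q = [[0.0, 0.0], [0.0, 1.0]]


def is_positive_definite_2x2(M):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] > 0 and det > 0


def combine(c):
    """Form P + c*Q."""
    return [[P[i][j] + c * Q[i][j] for j in range(2)] for i in range(2)]


for c in (0.0, 4.0, 5.0, 5.01, 6.0, 100.0):
    print(c, is_positive_definite_2x2(combine(c)))
```

For these matrices the determinant of $P + cQ$ is $c - 5$, so $P + cQ$ is positive definite exactly when $c > 5$, and $\bar c = 5$ serves as the scalar in the lemma.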

Let us introduce now the augmented Lagrangian function
$$L_c(x, \lambda) = f(x) + \lambda'h(x) + \frac{c}{2}\|h(x)\|^2,$$
where $c$ is a scalar. This is the Lagrangian function for the problem

minimize $f(x) + \frac{c}{2}\|h(x)\|^2$
subject to $h(x) = 0$,

which has the same local minima as our original problem of minimizing $f(x)$
subject to $h(x) = 0$. The gradient and Hessian of $L_c$ with respect to $x$ are
$$\nabla_x L_c(x, \lambda) = \nabla f(x) + \nabla h(x)\big(\lambda + c\,h(x)\big),$$
$$\nabla^2_{xx} L_c(x, \lambda) = \nabla^2 f(x) + \sum_{i=1}^m \big(\lambda_i + c\,h_i(x)\big)\nabla^2 h_i(x) + c\,\nabla h(x)\nabla h(x)'.$$
In particular, if $x^*$ and $\lambda^*$ satisfy the given conditions, we have
$$\nabla_x L_c(x^*, \lambda^*) = \nabla f(x^*) + \nabla h(x^*)\big(\lambda^* + c\,h(x^*)\big) = \nabla_x L(x^*, \lambda^*) = 0, \tag{5.2}$$
$$\begin{aligned}
\nabla^2_{xx} L_c(x^*, \lambda^*) &= \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) + c\,\nabla h(x^*)\nabla h(x^*)' \\
&= \nabla^2_{xx} L(x^*, \lambda^*) + c\,\nabla h(x^*)\nabla h(x^*)'.
\end{aligned}$$

By assumption, we have that $y'\nabla^2_{xx} L(x^*, \lambda^*)y > 0$ for all $y \ne 0$ such that
$y'\nabla h(x^*)\nabla h(x^*)'y = 0$, so by applying Lemma 5.1 with $P = \nabla^2_{xx} L(x^*, \lambda^*)$ and
$Q = \nabla h(x^*)\nabla h(x^*)'$, it follows that there exists a $\bar c$ such that

$\nabla^2_{xx} L_c(x^*, \lambda^*)$ : positive definite, $\forall\ c > \bar c$. (5.3)

Using now the standard sufficient optimality condition for unconstrained
optimization (see e.g., [Ber99a], Section 1.1), we conclude from Eqs. (5.2) and
(5.3), that for $c > \bar c$, $x^*$ is an unconstrained local minimum of $L_c(\cdot, \lambda^*)$. In
particular, there exist $\gamma > 0$ and $\epsilon > 0$ such that
$$L_c(x, \lambda^*) \ge L_c(x^*, \lambda^*) + \frac{\gamma}{2}\|x - x^*\|^2, \qquad \forall\ x \text{ with } \|x - x^*\| < \epsilon.$$
Since for all $x$ with $h(x) = 0$ we have $L_c(x, \lambda^*) = f(x)$, and since $\nabla_\lambda L(x^*, \lambda^*) = h(x^*) = 0$,
it follows that
$$f(x) \ge f(x^*) + \frac{\gamma}{2}\|x - x^*\|^2, \qquad \forall\ x \text{ with } h(x) = 0 \text{ and } \|x - x^*\| < \epsilon.$$
Thus $x^*$ is a strict local minimum of $f$ over $h(x) = 0$.

5.2 (Second Order Sufficiency Conditions for Inequality-Constrained Problems)

We prove this result by using a transformation to an equality-constrained problem
together with Exercise 5.1. Consider the equivalent equality-constrained problem

minimize $f(x)$
subject to $h_1(x) = 0, \ldots, h_m(x) = 0$, (5.4)
$g_1(x) + z_1^2 = 0, \ldots, g_r(x) + z_r^2 = 0$,

which is an optimization problem in variables $x$ and $z = (z_1, \ldots, z_r)$. Consider
the vector $(x^*, z^*)$, where $z^* = (z_1^*, \ldots, z_r^*)$,
$$z_j^* = \big(-g_j(x^*)\big)^{1/2}, \qquad j = 1, \ldots, r.$$

We will show that $(x^*, z^*)$ and $(\lambda^*, \mu^*)$ satisfy the sufficiency conditions of Exercise
5.1, thus showing that $(x^*, z^*)$ is a strict local minimum of problem (5.4),
proving that $x^*$ is a strict local minimum of the original inequality-constrained
problem.
Let $L(x, z, \lambda, \mu)$ be the Lagrangian function for this problem, i.e.,
$$L(x, z, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{j=1}^r \mu_j\big(g_j(x) + z_j^2\big).$$
We have
$$\begin{aligned}
\nabla_{(x,z)} L(x^*, z^*, \lambda^*, \mu^*)' &= \big[\nabla_x L(x^*, z^*, \lambda^*, \mu^*)',\ \nabla_z L(x^*, z^*, \lambda^*, \mu^*)'\big] \\
&= \big[\nabla_x L(x^*, \lambda^*, \mu^*)',\ 2\mu_1^* z_1^*, \ldots, 2\mu_r^* z_r^*\big] \\
&= [0, 0],
\end{aligned}$$
where the last equality follows since, by assumption, we have $\nabla_x L(x^*, \lambda^*, \mu^*) = 0$,
and $\mu_j^* = 0$ for all $j \notin A(x^*)$, whereas $z_j^* = \big(-g_j(x^*)\big)^{1/2} = 0$ for all $j \in A(x^*)$.
We also have
$$\begin{aligned}
\nabla_{(\lambda,\mu)} L(x^*, z^*, \lambda^*, \mu^*)' &= \big[h_1(x^*), \ldots, h_m(x^*),\ \big(g_1(x^*) + (z_1^*)^2\big), \ldots, \big(g_r(x^*) + (z_r^*)^2\big)\big] \\
&= [0, 0].
\end{aligned}$$
Hence the first order conditions of the sufficiency conditions for equality-constrained
problems, given in Exercise 5.1, are satisfied.
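We next show that for all $(y, w) \ne (0, 0)$ satisfying
$$\nabla h(x^*)'y = 0, \qquad \nabla g_j(x^*)'y + 2z_j^* w_j = 0, \quad j = 1, \ldots, r, \tag{5.5}$$
we have
$$\begin{pmatrix} y' & w' \end{pmatrix}
\begin{pmatrix}
\nabla^2_{xx} L(x^*, \lambda^*, \mu^*) & 0 \\
0 & \begin{pmatrix} 2\mu_1^* & 0 & \cdots & 0 \\ 0 & 2\mu_2^* & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 2\mu_r^* \end{pmatrix}
\end{pmatrix}
\begin{pmatrix} y \\ w \end{pmatrix} > 0. \tag{5.6}$$
The left-hand side of the preceding expression can also be written as
$$y'\nabla^2_{xx} L(x^*, \lambda^*, \mu^*)y + 2\sum_{j=1}^r \mu_j^* w_j^2. \tag{5.7}$$
Let $(y, w) \ne (0, 0)$ be a vector satisfying Eq. (5.5). We have that $z_j^* = 0$
for all $j \in A(x^*)$, so it follows from Eq. (5.5) that
$$\nabla h_i(x^*)'y = 0, \quad \forall\ i = 1, \ldots, m, \qquad \nabla g_j(x^*)'y = 0, \quad \forall\ j \in A(x^*).$$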

Hence, if $y \ne 0$, it follows by assumption that
$$y'\nabla^2_{xx} L(x^*, \lambda^*, \mu^*)y > 0,$$
which implies, by Eq. (5.7) and the assumption $\mu_j^* \ge 0$ for all $j$, that $(y, w)$
satisfies Eq. (5.6), proving our claim.
If $y = 0$, it follows that $w_k \ne 0$ for some $k = 1, \ldots, r$. In this case, by using
Eq. (5.5), we have
$$2z_j^* w_j = 0, \qquad j = 1, \ldots, r,$$
from which we obtain that $z_k^*$ must be equal to 0, and hence $k \in A(x^*)$. By
assumption, we have that
$$\mu_j^* > 0, \qquad \forall\ j \in A(x^*).$$
This implies that $\mu_k^* w_k^2 > 0$, and therefore
$$2\sum_{j=1}^r \mu_j^* w_j^2 > 0,$$
showing that $(y, w)$ satisfies Eq. (5.6), completing the proof.

5.3 (Sensitivity Under Second Order Conditions)

We first prove the result for the special case of equality-constrained problems.
Proposition 5.3: Let $x^*$ and $\lambda^*$ be a local minimum and Lagrange multiplier,
respectively, satisfying the second order sufficiency conditions of Exercise 5.1,
and assume that the gradients $\nabla h_i(x^*)$, $i = 1, \ldots, m$, are linearly independent.
Consider the family of problems

minimize $f(x)$
subject to $h(x) = u$, (5.8)

parameterized by the vector $u \in \Re^m$. Then there exists an open sphere $S$ centered
at $u = 0$ such that for every $u \in S$, there is an $x(u) \in \Re^n$ and a $\lambda(u) \in \Re^m$, which
are a local minimum-Lagrange multiplier pair of problem (5.8). Furthermore, $x(\cdot)$
and $\lambda(\cdot)$ are continuously differentiable functions within $S$ and we have $x(0) = x^*$,
$\lambda(0) = \lambda^*$. In addition, for all $u \in S$ we have
$$\nabla p(u) = -\lambda(u),$$
where $p(u)$ is the optimal cost parameterized by $u$, that is,
$$p(u) = f\big(x(u)\big).$$
Proof: Consider the system of equations
$$\nabla f(x) + \nabla h(x)\lambda = 0, \qquad h(x) = u. \tag{5.9}$$

For each fixed $u$, this system represents $n + m$ equations with $n + m$ unknowns,
the vectors $x$ and $\lambda$. For $u = 0$ the system has the solution $(x^*, \lambda^*)$. The
corresponding $(n + m) \times (n + m)$ Jacobian matrix with respect to $(x, \lambda)$ is given
by
$$J = \begin{pmatrix} \nabla^2_{xx} L(x^*, \lambda^*) & \nabla h(x^*) \\ \nabla h(x^*)' & 0 \end{pmatrix}.$$
Let us show that $J$ is nonsingular. If it were not, some nonzero vector
$(y', z')'$ would belong to the nullspace of $J$, that is,
$$\nabla^2_{xx} L(x^*, \lambda^*)y + \nabla h(x^*)z = 0, \tag{5.10}$$
$$\nabla h(x^*)'y = 0. \tag{5.11}$$
Premultiplying Eq. (5.10) by $y'$ and using Eq. (5.11), we obtain
$$y'\nabla^2_{xx} L(x^*, \lambda^*)y = 0.$$
In view of Eq. (5.11), it follows that $y = 0$, for otherwise our second order sufficiency
assumption would be violated. Since $y = 0$, Eq. (5.10) yields $\nabla h(x^*)z = 0$,
which in view of the linear independence of the columns $\nabla h_i(x^*)$, $i = 1, \ldots, m$,
of $\nabla h(x^*)$, yields $z = 0$. Thus, we obtain $y = 0$, $z = 0$, which is a contradiction.
Hence, $J$ is nonsingular.
Returning now to the system (5.9), it follows from the nonsingularity of $J$
and the Implicit Function Theorem that for all $u$ in some open sphere $S$ centered
at $u = 0$, there exist $x(u)$ and $\lambda(u)$ such that $x(0) = x^*$, $\lambda(0) = \lambda^*$, the functions
$x(\cdot)$ and $\lambda(\cdot)$ are continuously differentiable, and
$$\nabla f\big(x(u)\big) + \nabla h\big(x(u)\big)\lambda(u) = 0, \tag{5.12}$$
$$h\big(x(u)\big) = u.$$
For $u$ sufficiently close to 0, the vectors $x(u)$ and $\lambda(u)$ satisfy the second order
sufficiency conditions for problem (5.8), since they satisfy them by assumption for
$u = 0$. This is straightforward to verify by using our continuity assumptions. [If
it were not true, there would exist a sequence $\{u^k\}$ with $u^k \to 0$, and a sequence
$\{y^k\}$ with $\|y^k\| = 1$ and $\nabla h\big(x(u^k)\big)'y^k = 0$ for all $k$, such that
$$y^{k\prime}\,\nabla^2_{xx} L\big(x(u^k), \lambda(u^k)\big)\,y^k \le 0, \qquad \forall\ k.$$
By taking the limit along a convergent subsequence of $\{y^k\}$, we would obtain a
contradiction of the second order sufficiency condition at $(x^*, \lambda^*)$.] Hence, $x(u)$
and $\lambda(u)$ are a local minimum-Lagrange multiplier pair for problem (5.8).
There remains to show that $\nabla p(u) = \nabla_u\big\{f\big(x(u)\big)\big\} = -\lambda(u)$. By multiplying
Eq. (5.12) by $\nabla x(u)$, we obtain
$$\nabla x(u)\nabla f\big(x(u)\big) + \nabla x(u)\nabla h\big(x(u)\big)\lambda(u) = 0.$$

By differentiating the relation $h\big(x(u)\big) = u$, it follows that
$$I = \nabla_u\big\{h\big(x(u)\big)\big\} = \nabla x(u)\nabla h\big(x(u)\big), \tag{5.13}$$
where $I$ is the $m \times m$ identity matrix. Finally, by using the chain rule, we have
$$\nabla p(u) = \nabla_u\big\{f\big(x(u)\big)\big\} = \nabla x(u)\nabla f\big(x(u)\big).$$
Combining the above three relations, we obtain
$$\nabla p(u) + \lambda(u) = 0, \tag{5.14}$$
and the proof is complete. Q.E.D.
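
The sensitivity relation $\nabla p(u) = -\lambda(u)$ can be checked numerically on a problem that is solvable in closed form. The instance below is hypothetical and not taken from the exercise: minimize $\frac{1}{2}(x_1^2 + x_2^2)$ subject to $x_1 + x_2 - 2 = u$. The sketch compares a central finite difference of the optimal cost with the multiplier obtained from the first order condition (5.9).

```python
def solve(u):
    """Closed-form solution of: minimize 0.5*(x1^2 + x2^2) s.t. x1 + x2 - 2 = u.

    By symmetry x1 = x2 = (2 + u)/2, and the condition grad f + lam * grad h = 0
    with grad h = (1, 1) gives lam = -(2 + u)/2.
    """
    x = ((2.0 + u) / 2.0, (2.0 + u) / 2.0)
    lam = -(2.0 + u) / 2.0
    return x, lam


def p(u):
    """Optimal cost as a function of the constraint level u."""
    x, _ = solve(u)
    return 0.5 * (x[0] ** 2 + x[1] ** 2)


def central_difference(fun, u, step=1e-6):
    return (fun(u + step) - fun(u - step)) / (2.0 * step)


for u in (-0.5, 0.0, 0.3):
    _, lam = solve(u)
    print(u, central_difference(p, u), -lam)
```

Here $p(u) = (2 + u)^2/4$, so $\nabla p(u) = (2 + u)/2$, which matches $-\lambda(u)$ for every $u$ rather than only near $u = 0$ because the problem is quadratic with a linear constraint.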

We next use the preceding result to show the corresponding result for
inequality-constrained problems. We assume that x∗ and (λ∗ , µ∗ ) are a local
minimum and Lagrange multiplier, respectively, of the problem

minimize f (x)
subject to h1 (x) = 0, . . . , hm (x) = 0, (5.15)
g1 (x) ≤ 0, . . . , gr (x) ≤ 0,

and they satisfy the second order sufficiency conditions of Exercise 5.2. We also
assume that the gradients ∇hi (x∗ ), i = 1, . . . , m, ∇gj (x∗ ), j ∈ A(x∗ ) are linearly
independent, i.e., x∗ is regular. We consider the equality-constrained problem

minimize $f(x)$
subject to $h_1(x) = 0, \ldots, h_m(x) = 0,$ (5.16)
$g_1(x) + z_1^2 = 0, \ldots, g_r(x) + z_r^2 = 0,$

which is an optimization problem in variables $x$ and $z = (z_1, \ldots, z_r)$. Let $z^*$ be
a vector with

$$z_j^* = \bigl(-g_j(x^*)\bigr)^{1/2}, \qquad j = 1, \ldots, r.$$
It can be seen that, since x∗ and (λ∗ , µ∗ ) satisfy the second order assumptions
of Exercise 5.2, (x∗ , z ∗ ) and (λ∗ , µ∗ ) satisfy the second order assumptions of
Exercise 5.1, thus showing that (x∗ , z ∗ ) is a strict local minimum of problem
(5.16)(cf. proof of Exercise 5.2). It is also straightforward to see that since x∗
is regular for problem (5.15), (x∗ , z ∗ ) is regular for problem (5.16). We consider
the family of problems

minimize $f(x)$
subject to $h_i(x) = u_i, \quad i = 1, \ldots, m,$ (5.17)
$g_j(x) + z_j^2 = v_j, \quad j = 1, \ldots, r,$

parametrized by u and v.

Using Prop. 5.3, given in the beginning of this exercise, we have that there
exists an open sphere $S$ centered at $(u, v) = (0, 0)$ such that for every $(u, v) \in S$
there is an $x(u, v) \in \Re^n$, $z(u, v) \in \Re^r$ and $\lambda(u, v) \in \Re^m$, $\mu(u, v) \in \Re^r$, which are
a local minimum and associated Lagrange multiplier vectors of problem (5.17).
We claim that the vectors $x(u, v)$ and $\lambda(u, v) \in \Re^m$, $\mu(u, v) \in \Re^r$ are a
local minimum and Lagrange multiplier vector for the problem

minimize f (x)
subject to hi (x) = ui , ∀ i = 1, . . . , m, (5.18)
gj (x) ≤ vj , ∀ j = 1, . . . , r.

It is straightforward to see that $x(u, v)$ is a local minimum of the preceding
problem. To see that $\lambda(u, v)$ and $\mu(u, v)$ are the corresponding Lagrange multipliers,
we use the first order necessary optimality conditions for problem (5.17) to write

$$\nabla f\bigl(x(u, v)\bigr) + \sum_{i=1}^m \lambda_i(u, v)\nabla h_i\bigl(x(u, v)\bigr) + \sum_{j=1}^r \mu_j(u, v)\nabla g_j\bigl(x(u, v)\bigr) = 0,$$

$$2\mu_j(u, v)\, z_j(u, v) = 0, \qquad j = 1, \ldots, r.$$

Since $z_j(u, v) = \bigl(v_j - g_j\bigl(x(u, v)\bigr)\bigr)^{1/2} > 0$ for $j \notin A\bigl(x(u, v)\bigr)$, where

$$A\bigl(x(u, v)\bigr) = \bigl\{ j \mid g_j\bigl(x(u, v)\bigr) = v_j \bigr\},$$

the last equation can also be written as

$$\mu_j(u, v) = 0, \qquad \forall\ j \notin A\bigl(x(u, v)\bigr). \tag{5.19}$$

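Thus, to show that $\lambda(u, v)$ and $\mu(u, v)$ are Lagrange multipliers for problem (5.18),
there remains to show the nonnegativity of $\mu(u, v)$. For this purpose we use the
second order necessary condition for the equivalent equality constrained problem
(5.17). It yields

$$\begin{pmatrix} y' & w' \end{pmatrix}
\begin{pmatrix}
\nabla^2_{xx} L\bigl(x(u, v), \lambda(u, v), \mu(u, v)\bigr) & 0 \\
0 & \begin{matrix} 2\mu_1(u, v) & 0 & \cdots & 0 \\ 0 & 2\mu_2(u, v) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 2\mu_r(u, v) \end{matrix}
\end{pmatrix}
\begin{pmatrix} y \\ w \end{pmatrix} \ge 0, \tag{5.20}$$

for all $y \in \Re^n$ and $w \in \Re^r$ satisfying

$$\nabla h\bigl(x(u, v)\bigr)' y = 0, \qquad \nabla g_j\bigl(x(u, v)\bigr)' y + 2 z_j(u, v) w_j = 0, \quad j \in A\bigl(x(u, v)\bigr). \tag{5.21}$$

Next let us select, for every $j \in A\bigl(x(u, v)\bigr)$, a vector $(y, w)$ with $y = 0$, $w_j \ne 0$,
$w_k = 0$ for all $k \ne j$. Such a vector satisfies the condition of Eq. (5.21), since $z_j(u, v) = 0$
for $j \in A\bigl(x(u, v)\bigr)$. By using such a vector in Eq. (5.20), we obtain $2\mu_j(u, v) w_j^2 \ge 0$, and

$$\mu_j(u, v) \ge 0, \qquad \forall\ j \in A\bigl(x(u, v)\bigr).$$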

Furthermore, by Prop. 5.3 given in the beginning of this exercise, it follows
that x(·, ·), λ(·, ·), and µ(·, ·) are continuously differentiable in S and we have
x(0, 0) = x∗ , λ(0, 0) = λ∗ , µ(0, 0) = µ∗ . In addition, for all (u, v) ∈ S, there
holds
∇u p(u, v) = −λ(u, v),
∇v p(u, v) = −µ(u, v),
where p(u, v) is the optimal cost of problem (5.17), parameterized by (u, v), which
is the same as the optimal cost of problem (5.18), completing our proof.

5.4 (General Sufficiency Condition)

We have

$$\begin{aligned}
f(x^*) &= f(x^*) + {\mu^*}' g(x^*) \\
&= \min_{x \in X} \bigl\{ f(x) + {\mu^*}' g(x) \bigr\} \\
&\le \min_{x \in X,\ g(x) \le 0} \bigl\{ f(x) + {\mu^*}' g(x) \bigr\} \\
&\le \min_{x \in X,\ g(x) \le 0} f(x) \\
&\le f(x^*),
\end{aligned}$$

where the first equality follows from the hypothesis, which implies that ${\mu^*}' g(x^*) = 0$,
the next-to-last inequality follows from the nonnegativity of $\mu^*$, and the last
inequality follows from the feasibility of $x^*$. It follows that equality holds
throughout, and $x^*$ is an optimal solution.
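The chain of inequalities above can be illustrated on a one-dimensional instance. This is a minimal sketch under stated assumptions: the toy problem (f(x) = x², g(x) = 1 − x), the grid standing in for X, and the pair (x*, µ*) = (1, 2) are all chosen here for illustration.

```python
# Toy problem (assumed): minimize f(x) = x^2 subject to g(x) = 1 - x <= 0,
# with the set X approximated by a grid on [-3, 3].

def f(x):
    return x * x

def g(x):
    return 1.0 - x

def lagrangian(x, mu):
    return f(x) + mu * g(x)

X = [i / 100.0 for i in range(-300, 301)]
x_star, mu_star = 1.0, 2.0

# Hypotheses of the sufficiency condition: x* feasible, mu* >= 0,
# mu* g(x*) = 0, and x* minimizes the Lagrangian L(., mu*) over X.
hypotheses_hold = (
    g(x_star) <= 0
    and mu_star >= 0
    and mu_star * g(x_star) == 0
    and all(lagrangian(x_star, mu_star) <= lagrangian(x, mu_star) + 1e-12 for x in X)
)

# Conclusion: f(x*) equals the constrained minimum of f over the grid.
best_feasible = min(f(x) for x in X if g(x) <= 0)
print(hypotheses_hold, best_feasible, f(x_star))
```

With µ* = 2 the Lagrangian is (x − 1)² + 1, so its unconstrained minimizer over X coincides with the constrained minimizer of f.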

5.5

(a) We note that

$$\inf_{d \in \Re^n} L_0(d, \mu) = \begin{cases} -\tfrac{1}{2}\|\mu\|^2 & \text{if } \mu \in M, \\ -\infty & \text{otherwise,} \end{cases}$$

so since $\mu^*$ is the vector of minimum norm in $M$, we obtain for all $\gamma > 0$,

$$-\tfrac{1}{2}\|\mu^*\|^2 = \sup_{\mu \ge 0} \inf_{d \in \Re^n} L_0(d, \mu) \le \inf_{d \in \Re^n} \sup_{\mu \ge 0} L_0(d, \mu),$$

where the inequality follows from the minimax inequality (cf. Chapter 2). For
any $d \in \Re^n$, the supremum of $L_0(d, \mu)$ over $\mu \ge 0$ is attained at

$$\mu_j = (a_j' d)^+, \qquad j = 1, \ldots, r,$$

[to maximize $\mu_j a_j' d - (1/2)\mu_j^2$ subject to the constraint $\mu_j \ge 0$, we calculate the
unconstrained maximum, which is $a_j' d$, and if it is negative we set it to 0, so that
the maximum subject to $\mu_j \ge 0$ is attained for $\mu_j = (a_j' d)^+$]. Hence, it follows
that, for any $d \in \Re^n$,

$$\sup_{\mu \ge 0} L_0(d, \mu) = a_0' d + \frac{1}{2} \sum_{j=1}^r \bigl( (a_j' d)^+ \bigr)^2,$$

which yields the desired relations.

(b) Since the infimum of the quadratic cost function $a_0' d + \frac{1}{2}\sum_{j=1}^r \bigl((a_j' d)^+\bigr)^2$ is
bounded below, as given in part (a), it follows from the results of Section 2.3 that
the infimum of this function is attained at some $d^* \in \Re^n$.
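The closed form for the supremum in part (a) can be sanity-checked numerically. The sketch below assumes that L0(d, µ) = a0'd + Σ µ_j a_j'd − ½‖µ‖², which is the form implied by the formulas of this solution; the vectors a0, a_j, and d are arbitrary illustrative data.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def L0(d, mu, a0, a):
    """Assumed form: a0'd + sum_j mu_j a_j'd - (1/2)||mu||^2."""
    return (dot(a0, d)
            + sum(m * dot(aj, d) for m, aj in zip(mu, a))
            - 0.5 * sum(m * m for m in mu))

def sup_L0(d, a0, a):
    """Closed form from part (a): a0'd + (1/2) sum_j ((a_j'd)^+)^2."""
    return dot(a0, d) + 0.5 * sum(max(0.0, dot(aj, d)) ** 2 for aj in a)

# Illustrative data (assumptions, not from the exercise).
a0 = [1.0, -2.0]
a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
d = [0.5, -1.5]

# Maximizer over mu >= 0: project the unconstrained maximizer a_j'd onto [0, inf).
mu_best = [max(0.0, dot(aj, d)) for aj in a]

random.seed(0)
samples = [[random.uniform(0.0, 3.0) for _ in a] for _ in range(1000)]
never_exceeds = all(L0(d, mu, a0, a) <= sup_L0(d, a0, a) + 1e-9 for mu in samples)
print(L0(d, mu_best, a0, a), sup_L0(d, a0, a), never_exceeds)
```

The value at µ_j = (a_j'd)^+ matches the closed form, and no sampled nonnegative µ exceeds it.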
(c) From the Saddle Point Theorem, for all $\gamma > 0$, the coercive convex/concave
quadratic function $L_\gamma$ has a saddle point, denoted $(d^\gamma, \mu^\gamma)$, over $d \in \Re^n$ and
$\mu \ge 0$. This saddle point is unique and can be easily characterized, taking
advantage of the quadratic nature of $L_\gamma$. In particular, similar to part (a), the
maximization over $\mu \ge 0$ when $d = d^\gamma$ yields

$$\mu_j^\gamma = (a_j' d^\gamma)^+, \qquad j = 1, \ldots, r. \tag{5.22}$$

Moreover, we can find $L_\gamma(d^\gamma, \mu^\gamma)$ by minimizing $L_\gamma(d, \mu^\gamma)$ over $d \in \Re^n$. To find
the unconstrained minimum $d^\gamma$, we take the gradient of $L_\gamma(d, \mu^\gamma)$ and set it equal
to 0. This yields

$$d^\gamma = -\frac{a_0 + \sum_{j=1}^r \mu_j^\gamma a_j}{\gamma}.$$

Hence,

$$L_\gamma(d^\gamma, \mu^\gamma) = -\frac{\bigl\| a_0 + \sum_{j=1}^r \mu_j^\gamma a_j \bigr\|^2}{2\gamma} - \frac{1}{2}\|\mu^\gamma\|^2.$$

We also have

$$\begin{aligned}
-\tfrac{1}{2}\|\mu^*\|^2 &= \sup_{\mu \ge 0} \inf_{d \in \Re^n} L_0(d, \mu) \\
&\le \inf_{d \in \Re^n} \sup_{\mu \ge 0} L_0(d, \mu) \\
&\le \inf_{d \in \Re^n} \sup_{\mu \ge 0} L_\gamma(d, \mu) \\
&= L_\gamma(d^\gamma, \mu^\gamma),
\end{aligned} \tag{5.23}$$

where the first two relations follow from part (a), thus yielding the desired relation.
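(d) From part (c), we have

$$-\tfrac{1}{2}\|\mu^\gamma\|^2 \ge L_\gamma(d^\gamma, \mu^\gamma) = -\frac{\bigl\| a_0 + \sum_{j=1}^r \mu_j^\gamma a_j \bigr\|^2}{2\gamma} - \frac{1}{2}\|\mu^\gamma\|^2 \ge -\tfrac{1}{2}\|\mu^*\|^2. \tag{5.24}$$

From this, we see that $\|\mu^\gamma\| \le \|\mu^*\|$, so that $\mu^\gamma$ remains bounded as $\gamma \to 0$. By
taking the limit above as $\gamma \to 0$, we see that

$$\lim_{\gamma \to 0} \Bigl( a_0 + \sum_{j=1}^r \mu_j^\gamma a_j \Bigr) = 0,$$

so any limit point of $\mu^\gamma$, call it $\bar\mu$, satisfies $-\bigl(a_0 + \sum_{j=1}^r \bar\mu_j a_j\bigr) = 0$. Since
$\mu^\gamma \ge 0$, it follows that $\bar\mu \ge 0$, so $\bar\mu \in M$. We also have $\|\bar\mu\| \le \|\mu^*\|$ (since
$\|\mu^\gamma\| \le \|\mu^*\|$), so by using the minimum norm property of $\mu^*$, we conclude that
any limit point $\bar\mu$ of $\mu^\gamma$ must be equal to $\mu^*$. Thus, $\mu^\gamma \to \mu^*$. From Eq. (5.24),
we then obtain

$$L_\gamma(d^\gamma, \mu^\gamma) \to -\tfrac{1}{2}\|\mu^*\|^2. \tag{5.25}$$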

(e) Equations (5.23) and (5.25), together with part (b), show that

$$L_0(d^*, \mu^*) = \inf_{d \in \Re^n} \sup_{\mu \ge 0} L_0(d, \mu) = \sup_{\mu \ge 0} \inf_{d \in \Re^n} L_0(d, \mu),$$

[thus proving that $(d^*, \mu^*)$ is a saddle point of $L_0(d, \mu)$], and that

$$a_0' d^* = -\|\mu^*\|^2, \qquad (a_j' d^*)^+ = \mu_j^*, \quad j = 1, \ldots, r.$$

5.6 (Strict Complementarity)

Consider the following example

minimize x1 + x2
subject to x1 ≤ 0, x2 ≤ 0, −x1 − x2 ≤ 0.

The only feasible vector is x∗ = (0, 0), which is therefore also the optimal solution
of this problem. The vector (1, 1, 2) is a Lagrange multiplier vector which satisfies
strict complementarity. However, it is not possible to find a vector that violates
simultaneously all the constraints, showing that this Lagrange multiplier vector
is not informative.
For the converse statement, consider the example of Fig. 5.1.3. The Lagrange
multiplier vectors that involve three nonzero components out of four are
informative, but they do not satisfy strict complementarity.
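The first example can be verified mechanically. The following sketch checks that (1, 1, 2) satisfies the Lagrangian stationarity condition and that no sampled point violates all three constraints at once; the sampling box and sample count are arbitrary choices made for illustration.

```python
import random

# Problem data of the first example: minimize x1 + x2
# subject to g1 = x1 <= 0, g2 = x2 <= 0, g3 = -x1 - x2 <= 0.
grad_f = (1.0, 1.0)
grad_g = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
mu = (1.0, 1.0, 2.0)

# Stationarity: grad f + sum_j mu_j grad g_j should be the zero vector.
stationarity = tuple(
    grad_f[k] + sum(m * gj[k] for m, gj in zip(mu, grad_g)) for k in range(2)
)

def g(x):
    return (x[0], x[1], -x[0] - x[1])

# An informative multiplier with all components positive would require points
# that violate all three constraints simultaneously; none can exist here,
# since x1 > 0 and x2 > 0 force -x1 - x2 < 0.
random.seed(1)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(10000)]
violates_all = any(all(v > 0 for v in g(x)) for x in samples)
print(stationarity, violates_all)
```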

5.7

Let $x^*$ be a local minimum of the problem

minimize $\sum_{i=1}^n f_i(x_i)$
subject to $x \in S, \quad x_i \in X_i, \quad i = 1, \ldots, n,$

where $f_i : \Re \to \Re$ are smooth functions, $X_i$ are closed intervals of real numbers,
and $S$ is a subspace of $\Re^n$. We introduce artificial optimization variables
$z_1, \ldots, z_n$ and the linear constraints $x_i = z_i$, $i = 1, \ldots, n$, while replacing the
constraint $x \in S$ with $z \in S$, so that the problem becomes

minimize $\sum_{i=1}^n f_i(x_i)$
subject to $z \in S, \quad x_i \in X_i, \quad x_i = z_i, \quad i = 1, \ldots, n.$ (5.26)

Let $a_1, \ldots, a_m$ be a basis for $S^\perp$, the orthogonal complement of $S$. Then,
we can represent $S$ as

$$S = \{ y \mid a_j' y = 0,\ \forall\ j = 1, \ldots, m \}.$$

We also represent the closed intervals $X_i$ as

$$X_i = \{ y \mid c_i \le y \le d_i \}.$$

With the previous identifications, the constraint set of problem (5.26) can be
described alternatively as

minimize $\sum_{i=1}^n f_i(x_i)$
subject to $a_j' z = 0, \quad j = 1, \ldots, m,$
$c_i \le x_i \le d_i, \quad i = 1, \ldots, n,$
$x_i = z_i, \quad i = 1, \ldots, n$

[cf. extended representation of the constraint set of problem (5.26)]. This is a
problem with linear constraints, so by Prop. 5.4.1, it admits Lagrange multipliers.
But, by Prop. 5.6.1, this implies that the problem admits Lagrange multipliers in
the original representation as well. We associate a Lagrange multiplier $\lambda_i^*$ with
each equality constraint $x_i = z_i$ in problem (5.26). By taking the gradient with
respect to the variable $x$, and using the definition of Lagrange multipliers, we get

$$\bigl( \nabla f_i(x_i^*) + \lambda_i^* \bigr)(x_i - x_i^*) \ge 0, \qquad \forall\ x_i \in X_i, \quad i = 1, \ldots, n,$$

whereas, by taking the gradient with respect to the variable $z$, we obtain $\lambda^* \in S^\perp$,
thus completing the proof.

5.8

We first show that CQ5a implies CQ6. Assume CQ5a holds:

(a) There does not exist a nonzero vector $\lambda = (\lambda_1, \ldots, \lambda_m)$ such that

$$\sum_{i=1}^m \lambda_i \nabla h_i(x^*) \in N_X(x^*).$$

(b) There exists a $d \in N_X(x^*)^* = T_X(x^*)$ (since $X$ is regular at $x^*$) such that

$$\nabla h_i(x^*)' d = 0, \quad i = 1, \ldots, m, \qquad \nabla g_j(x^*)' d < 0, \quad \forall\ j \in A(x^*).$$

To arrive at a contradiction, assume that CQ6 does not hold, i.e., there are
scalars $\lambda_1, \ldots, \lambda_m, \mu_1, \ldots, \mu_r$, not all of them equal to zero, such that

(i) $$-\Bigl( \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j=1}^r \mu_j \nabla g_j(x^*) \Bigr) \in N_X(x^*).$$

(ii) $\mu_j \ge 0$ for all $j = 1, \ldots, r$, and $\mu_j = 0$ for all $j \notin A(x^*)$.

In view of our assumption that $X$ is regular at $x^*$, condition (i) can be
written as

$$-\Bigl( \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j=1}^r \mu_j \nabla g_j(x^*) \Bigr) \in T_X(x^*)^*,$$

or equivalently,

$$\Bigl( \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j=1}^r \mu_j \nabla g_j(x^*) \Bigr)' y \ge 0, \qquad \forall\ y \in T_X(x^*). \tag{5.27}$$

Since not all the $\lambda_i$ and $\mu_j$ are equal to 0, we conclude that $\mu_j > 0$ for at least
one $j \in A(x^*)$; otherwise condition (a) of CQ5a would be violated. Since $\mu_j \ge 0$
for all $j$, with $\mu_j = 0$ for $j \notin A(x^*)$ and $\mu_j > 0$ for at least one $j$, we obtain

$$\sum_{i=1}^m \lambda_i \nabla h_i(x^*)' d + \sum_{j=1}^r \mu_j \nabla g_j(x^*)' d < 0,$$

where $d \in T_X(x^*)$ is the vector in condition (b) of CQ5a. But this contradicts
Eq. (5.27), showing that CQ6 holds.
Conversely, assume that CQ6 holds. It can be seen that this implies
condition (a) of CQ5a. Let $H$ denote the subspace spanned by the vectors
$\nabla h_1(x^*), \ldots, \nabla h_m(x^*)$, and let $G$ denote the cone generated by the vectors
$\nabla g_j(x^*)$, $j \in A(x^*)$. Then, the orthogonal complement of $H$ is given by

$$H^\perp = \bigl\{ y \mid \nabla h_i(x^*)' y = 0,\ \forall\ i = 1, \ldots, m \bigr\},$$

whereas the polar of $G$ is given by

$$G^* = \bigl\{ y \mid \nabla g_j(x^*)' y \le 0,\ \forall\ j \in A(x^*) \bigr\}$$

(cf. the results of Section 3.1). The interior of $G^*$ is the set

$$\operatorname{int}(G^*) = \bigl\{ y \mid \nabla g_j(x^*)' y < 0,\ \forall\ j \in A(x^*) \bigr\}.$$

Under CQ6, we have $\operatorname{int}(G^*) \ne \emptyset$, since otherwise the vectors $\nabla g_j(x^*)$, $j \in A(x^*)$
would be linearly dependent, contradicting CQ6. Similarly, under CQ6, we have

$$H^\perp \cap \operatorname{int}(G^*) \ne \emptyset. \tag{5.28}$$

To see this, assume the contrary, i.e., $H^\perp$ and $\operatorname{int}(G^*)$ are disjoint. The sets $H^\perp$
and $\operatorname{int}(G^*)$ are convex, therefore by the Separating Hyperplane Theorem, there
exists some nonzero vector $\nu$ such that

$$\nu' x \le \nu' y, \qquad \forall\ x \in H^\perp,\ \forall\ y \in \operatorname{int}(G^*),$$

or equivalently,

$$\nu'(x - y) \le 0, \qquad \forall\ x \in H^\perp,\ \forall\ y \in G^*,$$

which implies, using also Exercise 3.4, that

$$\nu \in (H^\perp - G^*)^* = H \cap (-G).$$

But this contradicts CQ6, and proves Eq. (5.28).
Finally, we show that CQ6 implies condition (b) of CQ5a. Assume, to
arrive at a contradiction, that condition (b) of CQ5a does not hold. This implies
that

$$N_X(x^*)^* \cap H^\perp \cap \operatorname{int}(G^*) = \emptyset.$$

Since $X$ is regular at $x^*$, the preceding is equivalent to

$$T_X(x^*) \cap H^\perp \cap \operatorname{int}(G^*) = \emptyset.$$

The regularity of $X$ at $x^*$ implies that $T_X(x^*)$ is convex. Similarly, since the
interior of a convex set is convex and the intersection of two convex sets is convex,
it follows that the set $H^\perp \cap \operatorname{int}(G^*)$ is convex. It is also nonempty by Eq. (5.28).
Thus, by the Separating Hyperplane Theorem, there exists some vector $a \ne 0$
such that

$$a' x \le a' y, \qquad \forall\ x \in T_X(x^*),\ \forall\ y \in H^\perp \cap \operatorname{int}(G^*),$$

or equivalently,

$$a'(x - y) \le 0, \qquad \forall\ x \in T_X(x^*),\ \forall\ y \in H^\perp \cap G^*,$$

which implies that

$$a \in \bigl( T_X(x^*) - (H^\perp \cap G^*) \bigr)^*.$$

We have

$$\begin{aligned}
\bigl( T_X(x^*) - (H^\perp \cap G^*) \bigr)^* &= T_X(x^*)^* \cap \bigl( -(H^\perp \cap G^*)^* \bigr) \\
&= T_X(x^*)^* \cap \bigl( -\operatorname{cl}(H + G) \bigr) \\
&= T_X(x^*)^* \cap \bigl( -(H + G) \bigr) \\
&= N_X(x^*) \cap \bigl( -(H + G) \bigr),
\end{aligned}$$

where the second equality follows since $H^\perp$ and $G^*$ are closed and convex, and
the third equality follows since $H$ and $G$ are both polyhedral cones (cf. Chapter
3). Combining the preceding relations, it follows that there exists a nonzero
vector $a$ that belongs to the set

$$N_X(x^*) \cap \bigl( -(H + G) \bigr).$$

But this contradicts CQ6, thus completing our proof.

5.9 (Minimax Problems)

Let $x^*$ be a local minimum of the minimax problem,

minimize $\max\bigl\{ f_1(x), \ldots, f_p(x) \bigr\}$
subject to $x \in X.$

We introduce an additional scalar variable $z$ and convert the preceding problem
to the smooth problem

minimize $z$
subject to $x \in X, \quad f_i(x) \le z, \quad i = 1, \ldots, p,$

which is an optimization problem in the variables $x$ and $z$ and with an abstract
set constraint $(x, z) \in X \times \Re$. Let

$$z^* = \max\bigl\{ f_1(x^*), \ldots, f_p(x^*) \bigr\}.$$

It can be seen that $(x^*, z^*)$ is a local minimum of the above problem.
It is straightforward to show that

$$N_{X \times \Re}(x^*, z^*) = N_X(x^*) \times \{0\}, \tag{5.29}$$

and

$$N_{X \times \Re}(x^*, z^*)^* = N_X(x^*)^* \times \Re. \tag{5.30}$$

Let $d = (0, 1)$. By Eq. (5.30), this vector belongs to the set $N_{X \times \Re}(x^*, z^*)^*$, and
also

$$\bigl[ \nabla f_i(x^*)',\ -1 \bigr] \begin{pmatrix} 0 \\ 1 \end{pmatrix} = -1 < 0, \qquad \forall\ i = 1, \ldots, p.$$

Hence, CQ5a is satisfied, which together with Eq. (5.29) implies that there exists
a nonnegative vector $\mu^* = (\mu_1^*, \ldots, \mu_p^*)$ such that

(i) $-\sum_{j=1}^p \mu_j^* \nabla f_j(x^*) \in N_X(x^*)$.
(ii) $\sum_{j=1}^p \mu_j^* = 1$.
(iii) For all $j = 1, \ldots, p$, if $\mu_j^* > 0$, then

$$f_j(x^*) = \max\bigl\{ f_1(x^*), \ldots, f_p(x^*) \bigr\}.$$
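Conditions (i)-(iii) can be checked on a small unconstrained instance, where N_X(x*) = {0}. The sketch below is an illustration only: the functions f1(x) = x², f2(x) = (x − 2)², the search grid, and the multiplier vector (1/2, 1/2) are assumptions chosen for this example.

```python
# Toy minimax instance (assumed): minimize max{f1(x), f2(x)} over X = R.

def f1(x):
    return x * x

def f2(x):
    return (x - 2.0) ** 2

def df1(x):
    return 2.0 * x

def df2(x):
    return 2.0 * (x - 2.0)

# Locate the minimax point on a grid standing in for X.
grid = [i / 1000.0 for i in range(-1000, 3001)]
x_star = min(grid, key=lambda x: max(f1(x), f2(x)))
z_star = max(f1(x_star), f2(x_star))

# Candidate multipliers: nonnegative and summing to one, as in (ii).
mu = (0.5, 0.5)

# (i) with N_X(x*) = {0}: the weighted gradient sum must vanish.
stationarity = mu[0] * df1(x_star) + mu[1] * df2(x_star)

# (iii): every function with a positive multiplier attains the max at x*.
active = all(abs(fj(x_star) - z_star) < 1e-12 for fj in (f1, f2))
print(x_star, z_star, stationarity, active)
```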

5.10 (Exact Penalty Functions)

We consider the problem

minimize $f(x)$
subject to $x \in C,$ (5.31)

where

$$C = X \cap \bigl\{ x \mid h_1(x) = 0, \ldots, h_m(x) = 0 \bigr\} \cap \bigl\{ x \mid g_1(x) \le 0, \ldots, g_r(x) \le 0 \bigr\},$$

and the exact penalty function

$$F_c(x) = f(x) + c \Bigl( \sum_{i=1}^m |h_i(x)| + \sum_{j=1}^r g_j^+(x) \Bigr),$$

where $c$ is a positive scalar.


(a) In view of our assumption that, for some given $c > 0$, $x^*$ is also a local
minimum of $F_c$ over $X$, we have, by Prop. 5.5.1, that there exist $\lambda_1, \ldots, \lambda_m$ and
$\mu_1, \ldots, \mu_r$ such that

$$-\Bigl( \nabla f(x^*) + c \Bigl( \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j=1}^r \mu_j \nabla g_j(x^*) \Bigr) \Bigr) \in N_X(x^*),$$

$$\lambda_i = 1 \ \text{if } h_i(x^*) > 0, \qquad \lambda_i = -1 \ \text{if } h_i(x^*) < 0, \qquad \lambda_i \in [-1, 1] \ \text{if } h_i(x^*) = 0,$$

$$\mu_j = 1 \ \text{if } g_j(x^*) > 0, \qquad \mu_j = 0 \ \text{if } g_j(x^*) < 0, \qquad \mu_j \in [0, 1] \ \text{if } g_j(x^*) = 0.$$

By the definition of R-multipliers, the preceding relations imply that the vector
$(\lambda^*, \mu^*) = c(\lambda, \mu)$ is an R-multiplier for problem (5.31) such that

$$|\lambda_i^*| \le c, \quad i = 1, \ldots, m, \qquad \mu_j^* \in [0, c], \quad j = 1, \ldots, r. \tag{5.32}$$

(b) Assume that the functions $f$ and the $g_j$ are convex, the functions $h_i$ are
linear, and the set $X$ is convex. Since $x^*$ is a local minimum of problem (5.31),
and $(\lambda^*, \mu^*)$ is a corresponding Lagrange multiplier vector, we have by definition
that

$$\Bigl( \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^r \mu_j^* \nabla g_j(x^*) \Bigr)' (x - x^*) \ge 0, \qquad \forall\ x \in X.$$

In view of the convexity assumptions, this is a sufficient condition for $x^*$ to be a
local minimum of the function $f(x) + \sum_{i=1}^m \lambda_i^* h_i(x) + \sum_{j=1}^r \mu_j^* g_j(x)$ over $x \in X$.
Since $x^*$ is feasible for the original problem, and $(\lambda^*, \mu^*)$ satisfy Eq. (5.32), we
have for all $x \in X$,

$$\begin{aligned}
F_c(x^*) &= f(x^*) \\
&\le f(x) + \sum_{i=1}^m \lambda_i^* h_i(x) + \sum_{j=1}^r \mu_j^* g_j(x) \\
&\le f(x) + c \Bigl( \sum_{i=1}^m |h_i(x)| + \sum_{j=1}^r g_j^+(x) \Bigr) \\
&= F_c(x),
\end{aligned}$$

where the second inequality uses $\lambda_i^* h_i(x) \le c\,|h_i(x)|$ and $\mu_j^* g_j(x) \le c\, g_j^+(x)$, which hold
by Eq. (5.32). This implies that $x^*$ is a local minimum of $F_c$ over $X$.
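The role of the bound (5.32) can be seen on a one-dimensional convex instance. This is a sketch under stated assumptions: the toy problem (f(x) = x², g(x) = 1 − x, whose multiplier is µ* = 2) and the search grid are chosen here for illustration.

```python
# Toy convex problem (assumed): minimize x^2 subject to g(x) = 1 - x <= 0.
# Its solution is x* = 1 with Lagrange multiplier mu* = 2.

def f(x):
    return x * x

def g(x):
    return 1.0 - x

def F(x, c):
    """Exact penalty function F_c(x) = f(x) + c * g^+(x)."""
    return f(x) + c * max(0.0, g(x))

grid = [i / 1000.0 for i in range(-2000, 3001)]

def argmin_F(c):
    """Grid minimizer of F_c; a stand-in for minimization over X = R."""
    return min(grid, key=lambda x: F(x, c))

for c in (1.0, 2.0, 3.0):
    print(c, argmin_F(c))
```

For c ≥ µ* = 2 the penalized minimizer coincides with x* = 1, while for c = 1 (below the multiplier) it moves to the infeasible point 0.5.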

5.11 (Extended Representations)

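(a) The hypothesis implies that for every smooth cost function $f$ for which $x^*$ is
a local minimum there exist scalars $\lambda_1^*, \ldots, \lambda_{\bar m}^*$ and $\mu_1^*, \ldots, \mu_{\bar r}^*$ satisfying

$$\Bigl( \nabla f(x^*) + \sum_{i=1}^{\bar m} \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^{\bar r} \mu_j^* \nabla g_j(x^*) \Bigr)' y \ge 0, \qquad \forall\ y \in T_{\bar X}(x^*), \tag{5.33}$$

$$\mu_j^* \ge 0, \quad \forall\ j = 1, \ldots, \bar r, \qquad \mu_j^* = 0, \quad \forall\ j \notin \bar A(x^*),$$

where

$$\bar A(x^*) = \bigl\{ j \mid g_j(x^*) = 0,\ j = 1, \ldots, \bar r \bigr\}.$$

Since $X \subset \bar X$, we have $T_X(x^*) \subset T_{\bar X}(x^*)$, so Eq. (5.33) implies that

$$\Bigl( \nabla f(x^*) + \sum_{i=1}^{\bar m} \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^{\bar r} \mu_j^* \nabla g_j(x^*) \Bigr)' y \ge 0, \qquad \forall\ y \in T_X(x^*). \tag{5.34}$$

Let $V(x^*)$ denote the set

$$V(x^*) = \bigl\{ y \mid \nabla h_i(x^*)' y = 0,\ i = m + 1, \ldots, \bar m,\ \ \nabla g_j(x^*)' y \le 0,\ j = r + 1, \ldots, \bar r \text{ with } j \in \bar A(x^*) \bigr\}.$$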

We claim that $T_X(x^*) \subset V(x^*)$. To see this, let $y$ be a nonzero vector that
belongs to $T_X(x^*)$. Then, there exists a sequence $\{x_k\} \subset X$ such that $x_k \to x^*$, $x_k \ne x^*$
for all $k$, and

$$\frac{x_k - x^*}{\|x_k - x^*\|} \to \frac{y}{\|y\|}.$$

Since $x_k \in X$, for all $i = m + 1, \ldots, \bar m$ and $k$, we have

$$0 = h_i(x_k) = h_i(x^*) + \nabla h_i(x^*)'(x_k - x^*) + o(\|x_k - x^*\|),$$

which, since $h_i(x^*) = 0$, can be written as

$$\nabla h_i(x^*)' \frac{(x_k - x^*)}{\|x_k - x^*\|} + \frac{o(\|x_k - x^*\|)}{\|x_k - x^*\|} = 0.$$

Taking the limit as $k \to \infty$, we obtain

$$\nabla h_i(x^*)' y = 0, \qquad \forall\ i = m + 1, \ldots, \bar m. \tag{5.35}$$

Similarly, we have for all $j = r + 1, \ldots, \bar r$ with $j \in \bar A(x^*)$ and for all $k$

$$0 \ge g_j(x_k) = g_j(x^*) + \nabla g_j(x^*)'(x_k - x^*) + o(\|x_k - x^*\|),$$

which, since $g_j(x^*) = 0$, can be written as

$$\nabla g_j(x^*)' \frac{(x_k - x^*)}{\|x_k - x^*\|} + \frac{o(\|x_k - x^*\|)}{\|x_k - x^*\|} \le 0.$$

By taking the limit as $k \to \infty$, we obtain

$$\nabla g_j(x^*)' y \le 0, \qquad \forall\ j = r + 1, \ldots, \bar r \text{ with } j \in \bar A(x^*).$$

Equation (5.35) and the preceding relation imply that $y \in V(x^*)$, showing that
$T_X(x^*) \subset V(x^*)$.
Hence Eq. (5.34) implies that

$$\Bigl( \nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^{r} \mu_j^* \nabla g_j(x^*) \Bigr)' y \ge 0, \qquad \forall\ y \in T_X(x^*),$$

and it follows that $\lambda_i^*$, $i = 1, \ldots, m$, and $\mu_j^*$, $j = 1, \ldots, r$, are Lagrange multipliers
for the original representation.
(b) Consider the exact penalty function for the extended representation:

$$\bar F_c(x) = f(x) + c \Bigl( \sum_{i=1}^{\bar m} |h_i(x)| + \sum_{j=1}^{\bar r} g_j^+(x) \Bigr).$$

We have $F_c(x) = \bar F_c(x)$ for all $x \in X$. Hence if $x^* \in C$ is a local minimum of
$\bar F_c(x)$ over $x \in \bar X$, it is also a local minimum of $F_c(x)$ over $x \in X$. Thus, for a
given $c > 0$, if $x^*$ is both a strict local minimum of $f$ over $C$ and a local minimum
of $\bar F_c(x)$ over $x \in \bar X$, it is also a local minimum of $F_c(x)$ over $x \in X$.

Convex Analysis and
Optimization
Chapter 6 Solutions

Dimitri P. Bertsekas

with

Angelia Nedić and Asuman E. Ozdaglar

Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts


http://www.athenasc.com
LAST UPDATE April 15, 2003

CHAPTER 6: SOLUTION MANUAL

6.1

We consider the dual function

$$\begin{aligned}
q(\mu_1, \mu_2) &= \inf_{x_1 \ge 0,\ x_2 \ge 0} \bigl\{ x_1 - x_2 + \mu_1(x_1 + 1) + \mu_2(1 - x_1 - x_2) \bigr\} \\
&= \inf_{x_1 \ge 0,\ x_2 \ge 0} \bigl\{ x_1(1 + \mu_1 - \mu_2) + x_2(-1 - \mu_2) + \mu_1 + \mu_2 \bigr\}.
\end{aligned}$$

It can be seen that if $-\mu_1 + \mu_2 - 1 \le 0$ and $\mu_2 + 1 \le 0$, then the infimum above
is attained at $x_1 = 0$ and $x_2 = 0$. In this case, the dual function is given by
$q(\mu_1, \mu_2) = \mu_1 + \mu_2$. On the other hand, if $1 + \mu_1 - \mu_2 < 0$ or $-1 - \mu_2 < 0$, then
we have $q(\mu_1, \mu_2) = -\infty$. Thus, the dual problem is

maximize $\mu_1 + \mu_2$
subject to $\mu_1 \ge 0, \quad \mu_2 \ge 0, \quad -\mu_1 + \mu_2 - 1 \le 0, \quad \mu_2 + 1 \le 0.$
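The case analysis for q can be written down directly as a function. The sketch below is an illustration, with the function name `q` and the test grid being choices made here; it also makes visible that the constraint µ2 + 1 ≤ 0 is incompatible with µ2 ≥ 0, so q equals −∞ at every nonnegative (µ1, µ2).

```python
import math

def q(mu1, mu2):
    """Dual function from the case analysis above.

    The infimum over x1, x2 >= 0 of a linear function is finite (and attained
    at x = 0) only when both coefficients are nonnegative.
    """
    coef_x1 = 1.0 + mu1 - mu2
    coef_x2 = -1.0 - mu2
    if coef_x1 >= 0 and coef_x2 >= 0:
        return mu1 + mu2
    return -math.inf

# q is finite only where mu2 <= -1, for example:
print(q(0.0, -1.0))

# On a grid of nonnegative multipliers the dual function is always -inf.
all_minus_inf = all(
    q(i / 10.0, j / 10.0) == -math.inf for i in range(50) for j in range(50)
)
print(all_minus_inf)
```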

6.2 (Extended Representation)

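Assume that there exists a geometric multiplier in the extended representation.
This implies that there exist scalars $\lambda_1^*, \ldots, \lambda_m^*, \lambda_{m+1}^*, \ldots, \lambda_{\bar m}^*$ and nonnegative scalars
$\mu_1^*, \ldots, \mu_r^*, \mu_{r+1}^*, \ldots, \mu_{\bar r}^*$ such that

$$f^* = \inf_{x \in \Re^n} \Bigl\{ f(x) + \sum_{i=1}^{\bar m} \lambda_i^* h_i(x) + \sum_{j=1}^{\bar r} \mu_j^* g_j(x) \Bigr\},$$

implying that

$$f^* \le f(x) + \sum_{i=1}^{\bar m} \lambda_i^* h_i(x) + \sum_{j=1}^{\bar r} \mu_j^* g_j(x), \qquad \forall\ x \in \Re^n.$$

For any $x \in X$, we have $h_i(x) = 0$ for all $i = m + 1, \ldots, \bar m$, and $g_j(x) \le 0$ for all
$j = r + 1, \ldots, \bar r$, so that $\mu_j^* g_j(x) \le 0$ for all $j = r + 1, \ldots, \bar r$. Therefore, it follows
from the preceding relation that

$$f^* \le f(x) + \sum_{i=1}^{m} \lambda_i^* h_i(x) + \sum_{j=1}^{r} \mu_j^* g_j(x), \qquad \forall\ x \in X.$$

Taking the infimum over all $x \in X$, it follows that

$$\begin{aligned}
f^* &\le \inf_{x \in X} \Bigl\{ f(x) + \sum_{i=1}^{m} \lambda_i^* h_i(x) + \sum_{j=1}^{r} \mu_j^* g_j(x) \Bigr\} \\
&\le \inf_{\substack{x \in X,\ h_i(x) = 0,\ i = 1, \ldots, m \\ g_j(x) \le 0,\ j = 1, \ldots, r}} \Bigl\{ f(x) + \sum_{i=1}^{m} \lambda_i^* h_i(x) + \sum_{j=1}^{r} \mu_j^* g_j(x) \Bigr\} \\
&\le \inf_{\substack{x \in X,\ h_i(x) = 0,\ i = 1, \ldots, m \\ g_j(x) \le 0,\ j = 1, \ldots, r}} f(x) \\
&= f^*.
\end{aligned}$$

Hence, equality holds throughout above, showing that the scalars $\lambda_1^*, \ldots, \lambda_m^*$,
$\mu_1^*, \ldots, \mu_r^*$ constitute a geometric multiplier for the original representation.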

6.3 (Quadratic Programming Duality)

Consider the extended representation of the problem in which the linear inequal-
ities that represent the polyhedral part are lumped with the remaining linear
inequality constraints. From Prop. 6.3.1, finiteness of the optimal value implies
that there exists an optimal solution and a geometric multiplier. From Exercise
6.2, it follows that there exists a geometric multiplier for the original representa-
tion of the problem.

6.4 (Sensitivity)

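We have

$$f^* = \inf_{x \in X} \bigl\{ f(x) + \mu'\bigl(g(x) - u\bigr) \bigr\},$$

$$\tilde f = \inf_{x \in X} \bigl\{ f(x) + \tilde\mu'\bigl(g(x) - \tilde u\bigr) \bigr\}.$$

Let $q(\mu)$ denote the dual function of the problem corresponding to $u$:

$$q(\mu) = \inf_{x \in X} \bigl\{ f(x) + \mu'\bigl(g(x) - u\bigr) \bigr\}.$$

We have

$$\begin{aligned}
f^* - \tilde f &= \inf_{x \in X} \bigl\{ f(x) + \mu'\bigl(g(x) - u\bigr) \bigr\} - \inf_{x \in X} \bigl\{ f(x) + \tilde\mu'\bigl(g(x) - \tilde u\bigr) \bigr\} \\
&= \inf_{x \in X} \bigl\{ f(x) + \mu'\bigl(g(x) - u\bigr) \bigr\} - \inf_{x \in X} \bigl\{ f(x) + \tilde\mu'\bigl(g(x) - u\bigr) \bigr\} + \tilde\mu'(\tilde u - u) \\
&= q(\mu) - q(\tilde\mu) + \tilde\mu'(\tilde u - u) \\
&\ge \tilde\mu'(\tilde u - u),
\end{aligned}$$

where the last inequality holds because $\mu$ maximizes $q$.
This proves the left-hand side of the desired inequality. Interchanging the
roles of $f^*$, $u$, $\mu$, and $\tilde f$, $\tilde u$, $\tilde\mu$, shows the desired right-hand side.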
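The resulting two-sided bound, µ̃'(ũ − u) ≤ f* − f̃ ≤ µ'(ũ − u), can be checked on a scalar instance. This is a minimal sketch under assumptions made here: the toy problem is "minimize x² subject to 1 − x ≤ u", for which the optimal value and the geometric multiplier have the closed forms coded below.

```python
# Toy problem (assumed): minimize x^2 subject to g(x) - u <= 0, with g(x) = 1 - x.
# For u < 1 the solution is x = 1 - u; for u >= 1 the constraint is inactive.

def opt_value(u):
    """Optimal value as a function of the constraint level u."""
    return (1.0 - u) ** 2 if u < 1.0 else 0.0

def multiplier(u):
    """Geometric multiplier for level u (from 2x - mu = 0 at x = 1 - u)."""
    return 2.0 * (1.0 - u) if u < 1.0 else 0.0

def bounds_hold(u, u_tilde, tol=1e-12):
    """Check mu_tilde*(u_tilde - u) <= f - f_tilde <= mu*(u_tilde - u)."""
    diff = opt_value(u) - opt_value(u_tilde)
    lower = multiplier(u_tilde) * (u_tilde - u)
    upper = multiplier(u) * (u_tilde - u)
    return lower - tol <= diff <= upper + tol

levels = [k / 10.0 for k in range(-20, 21)]
all_ok = all(bounds_hold(u, ut) for u in levels for ut in levels)
print(all_ok)
```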

6.5

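We first consider the relation

$$(P) \quad \min_{A'x \ge b} c'x \iff \max_{A\mu = c,\ \mu \ge 0} b'\mu. \quad (D)$$

The dual problem to $(P)$ is

$$\max_{\mu \ge 0} q(\mu) = \max_{\mu \ge 0} \inf_{x \in \Re^n} \Bigl\{ \sum_{j=1}^n \Bigl( c_j - \sum_{i=1}^m \mu_i a_{ij} \Bigr) x_j + \sum_{i=1}^m \mu_i b_i \Bigr\}.$$

If $c_j - \sum_{i=1}^m \mu_i a_{ij} \ne 0$ for some $j$, then $q(\mu) = -\infty$. Thus the dual problem is

maximize $\sum_{i=1}^m \mu_i b_i$
subject to $\sum_{i=1}^m \mu_i a_{ij} = c_j, \quad j = 1, \ldots, n, \quad \mu \ge 0.$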

To determine the dual of $(D)$, note that $(D)$ is equivalent to

$$\min_{A\mu = c,\ \mu \ge 0} -b'\mu,$$

and so its dual problem is

$$\max_{x \in \Re^n} p(x) = \max_{x} \inf_{\mu \ge 0} \bigl\{ (A'x - b)'\mu - c'x \bigr\}.$$

If $a_i'x - b_i < 0$ for any $i$, then $p(x) = -\infty$. Thus the dual of $(D)$ is

maximize $-c'x$
subject to $A'x \ge b,$

or

minimize $c'x$
subject to $A'x \ge b.$

The Lagrangian optimality condition for $(P)$ is

$$x^* = \arg\min_{x} \Bigl\{ \Bigl( c - \sum_{i=1}^m \mu_i^* a_i \Bigr)' x + \sum_{i=1}^m \mu_i^* b_i \Bigr\},$$

from which we obtain the complementary slackness conditions for $(P)$:

$$A\mu^* = c.$$

The Lagrangian optimality condition for $(D)$ is

$$\mu^* = \arg\min_{\mu \ge 0} \bigl\{ (A'x^* - b)'\mu - c'x^* \bigr\},$$

from which we obtain the complementary slackness conditions for $(D)$:

$$A'x^* - b \ge 0, \qquad (A'x^* - b)_i\, \mu_i^* = 0, \quad \forall\ i.$$

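Next, consider

$$(P) \quad \min_{A'x \ge b,\ x \ge 0} c'x \iff \max_{A\mu \le c,\ \mu \ge 0} b'\mu. \quad (D)$$

The dual problem to $(P)$ is

$$\max_{\mu \ge 0} q(\mu) = \max_{\mu \ge 0} \inf_{x \ge 0} \Bigl\{ \sum_{j=1}^n \Bigl( c_j - \sum_{i=1}^m \mu_i a_{ij} \Bigr) x_j + \sum_{i=1}^m \mu_i b_i \Bigr\}.$$

If $c_j - \sum_{i=1}^m \mu_i a_{ij} < 0$ for some $j$, then $q(\mu) = -\infty$. Thus the dual problem is

maximize $\sum_{i=1}^m \mu_i b_i$
subject to $\sum_{i=1}^m \mu_i a_{ij} \le c_j, \quad j = 1, \ldots, n, \quad \mu \ge 0.$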

To determine the dual of $(D)$, note that $(D)$ is equivalent to

$$\min_{A\mu \le c,\ \mu \ge 0} -b'\mu,$$

and so its dual problem is

$$\max_{x \ge 0} p(x) = \max_{x \ge 0} \inf_{\mu \ge 0} \bigl\{ (A'x - b)'\mu - c'x \bigr\}.$$

If $a_i'x - b_i < 0$ for any $i$, then $p(x) = -\infty$. Thus the dual of $(D)$ is

maximize $-c'x$
subject to $A'x \ge b, \quad x \ge 0,$

or

minimize $c'x$
subject to $A'x \ge b, \quad x \ge 0.$
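The Lagrangian optimality condition for $(P)$ is

$$x^* = \arg\min_{x \ge 0} \Bigl\{ \Bigl( c - \sum_{i=1}^m \mu_i^* a_i \Bigr)' x + \sum_{i=1}^m \mu_i^* b_i \Bigr\},$$

from which we obtain the complementary slackness conditions for $(P)$:

$$\Bigl( c_j - \sum_{i=1}^m \mu_i^* a_{ij} \Bigr) x_j^* = 0, \qquad x_j^* \ge 0, \qquad \forall\ j = 1, \ldots, n,$$

$$c - \sum_{i=1}^m \mu_i^* a_i \ge 0.$$

The Lagrangian optimality condition for $(D)$ is

$$\mu^* = \arg\min_{\mu \ge 0} \bigl\{ (A'x^* - b)'\mu - c'x^* \bigr\},$$

from which we obtain the complementary slackness conditions for $(D)$:

$$A'x^* - b \ge 0, \qquad (A'x^* - b)_i\, \mu_i^* = 0, \quad \forall\ i.$$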
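The second primal-dual pair and its complementary slackness conditions can be illustrated by brute force on a tiny instance. The sketch below is not an LP solver: the data (A equal to the identity, b = (1, 2), c = (3, 4)) and the half-integer search grid are assumptions chosen so that both optima lie on the grid.

```python
from itertools import product

# Illustrative data (assumed): A is n x m with columns a_i; here n = m = 2.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 2.0]
c = [3.0, 4.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def At_x(x):
    """A'x: component i is a_i'x (column i of A against x)."""
    return [sum(A[j][i] * x[j] for j in range(len(x))) for i in range(len(b))]

def A_mu(mu):
    """A mu: component j is sum_i mu_i a_ij."""
    return [sum(A[j][i] * mu[i] for i in range(len(mu))) for j in range(len(c))]

grid = [k / 2.0 for k in range(0, 11)]  # nonnegative candidates 0, 0.5, ..., 5

primal_feasible = [x for x in product(grid, repeat=2)
                   if all(r >= bi for r, bi in zip(At_x(x), b))]
dual_feasible = [mu for mu in product(grid, repeat=2)
                 if all(r <= cj for r, cj in zip(A_mu(mu), c))]

x_opt = min(primal_feasible, key=lambda x: dot(c, x))
mu_opt = max(dual_feasible, key=lambda mu: dot(b, mu))

# Complementary slackness for (P) and (D).
cs_primal = [(cj - r) * xj for cj, r, xj in zip(c, A_mu(mu_opt), x_opt)]
cs_dual = [(r - bi) * mi for r, bi, mi in zip(At_x(x_opt), b, mu_opt)]
print(x_opt, mu_opt, dot(c, x_opt), dot(b, mu_opt), cs_primal, cs_dual)
```

On this instance the primal and dual optimal values coincide, and every complementary slackness product is zero.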

6.6 (Duality and Zero Sum Games)

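Consider the linear program

$$\min_{\zeta e \ge A'x,\ \sum_{i=1}^n x_i = 1,\ x_i \ge 0} \zeta,$$

whose optimal value is equal to $\min_{x \in X} \max_{z \in Z} x'Az$. Introduce dual variables
$z \in \Re^m$ and $\xi \in \Re$, corresponding to the constraints $A'x - \zeta e \le 0$ and $\sum_{i=1}^n x_i = 1$,
respectively. The dual function is

$$\begin{aligned}
q(z, \xi) &= \inf_{\zeta \in \Re,\ x_i \ge 0,\ i = 1, \ldots, n} \Bigl\{ \zeta + z'(A'x - \zeta e) + \xi \Bigl( 1 - \sum_{i=1}^n x_i \Bigr) \Bigr\} \\
&= \inf_{\zeta \in \Re,\ x_i \ge 0,\ i = 1, \ldots, n} \Bigl\{ \zeta \Bigl( 1 - \sum_{j=1}^m z_j \Bigr) + x'(Az - \xi e) + \xi \Bigr\} \\
&= \begin{cases} \xi & \text{if } \sum_{j=1}^m z_j = 1,\ \xi e - Az \le 0, \\ -\infty & \text{otherwise.} \end{cases}
\end{aligned}$$

Thus the dual problem, which is to maximize $q(z, \xi)$ subject to $z \ge 0$ and $\xi \in \Re$,
is equivalent to the linear program

$$\max_{\xi e \le Az,\ z \in Z} \xi,$$

whose optimal value is equal to $\max_{z \in Z} \min_{x \in X} x'Az$.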
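The equality of the two optimal values can be observed on a small game. The sketch below uses the matching-pennies payoff matrix and a grid over mixed strategies; both are assumptions made for illustration, and the inner maximum or minimum of the bilinear payoff is taken over pure strategies, where it is always attained.

```python
# Matching pennies (assumed example): x minimizes and z maximizes x'Az.
A = [[1.0, -1.0], [-1.0, 1.0]]

def payoff(x, z):
    """Bilinear payoff x'Az for 2-dimensional mixed strategies."""
    return sum(x[i] * A[i][j] * z[j] for i in range(2) for j in range(2))

pure = [(1.0, 0.0), (0.0, 1.0)]
grid = [k / 100.0 for k in range(101)]  # probability of the first pure strategy

# min over x of max over z: the inner max of a linear function is at a vertex.
minmax = min(max(payoff((t, 1.0 - t), e) for e in pure) for t in grid)

# max over z of min over x, by the same vertex argument.
maxmin = max(min(payoff(e, (s, 1.0 - s)) for e in pure) for s in grid)

print(minmax, maxmin)
```

Both quantities equal the game value 0, attained at the uniform strategies x = z = (1/2, 1/2).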

6.7 (Goldman-Tucker Complementarity Theorem [GoT56])

Consider the subspace

$$S = \bigl\{ (x, w) \mid bw - Ax = 0,\ c'x = wv,\ x \in \Re^n,\ w \in \Re \bigr\},$$

where $v$ is the optimal value of (LP). Its orthogonal complement is the range of
the matrix

$$\begin{pmatrix} -A' & c \\ b' & -v \end{pmatrix},$$

so it has the form

$$S^\perp = \bigl\{ (c\zeta - A'\lambda,\ b'\lambda - v\zeta) \mid \lambda \in \Re^m,\ \zeta \in \Re \bigr\}.$$

Applying the Tucker Complementarity Theorem (Exercise 3.32) for this choice of
$S$, we obtain a partition of the index set $\{1, \ldots, n + 1\}$ in two subsets. There are
two possible cases: (1) the index $n + 1$ belongs to the first subset, or (2) the index
$n + 1$ belongs to the second subset. Since the vectors $(x, 1)$ such that $x \in X^*$
satisfy $Ax - bw = 0$ and $c'x = wv$, we see that case (1) holds, i.e., the index $n + 1$
belongs to the first index subset. In particular, we have that there exist disjoint
index sets $I$ and $\bar I$ such that $I \cup \bar I = \{1, \ldots, n\}$ and the following properties hold:

(a) There exist vectors $(x, w) \in S$ and $(\lambda, \zeta) \in \Re^{m+1}$ with the property

$$x_i > 0,\ \forall\ i \in I, \qquad x_i = 0,\ \forall\ i \in \bar I, \qquad w > 0, \tag{6.1}$$

$$c_i\zeta - (A'\lambda)_i = 0,\ \forall\ i \in I, \qquad c_i\zeta - (A'\lambda)_i > 0,\ \forall\ i \in \bar I, \qquad b'\lambda = v\zeta. \tag{6.2}$$

(b) For all $(x, w) \in S$ with $x \ge 0$, and $(\lambda, \zeta) \in \Re^{m+1}$ with $c\zeta - A'\lambda \ge 0$,
$v\zeta - b'\lambda \ge 0$, we have

$$x_i = 0,\ \forall\ i \in \bar I, \qquad c_i\zeta - (A'\lambda)_i = 0,\ \forall\ i \in I, \qquad b'\lambda = v\zeta.$$

By dividing $(x, w)$ by $w$, we obtain [cf. Eq. (6.1)] an optimal primal solution
$x^* = x/w$ such that

$$x_i^* > 0,\ \forall\ i \in I, \qquad x_i^* = 0,\ \forall\ i \in \bar I.$$

Similarly, if the scalar $\zeta$ in Eq. (6.2) is positive, by dividing with $\zeta$ in Eq. (6.2),
we obtain an optimal dual solution $\lambda^* = \lambda/\zeta$, which satisfies the desired property

$$c_i - (A'\lambda^*)_i = 0,\ \forall\ i \in I, \qquad c_i - (A'\lambda^*)_i > 0,\ \forall\ i \in \bar I.$$

If the scalar $\zeta$ in Eq. (6.2) is nonpositive, we choose any optimal dual solution
$\lambda^*$, and we note, using also property (b), that we have

$$c_i - (A'\lambda^*)_i = 0,\ \forall\ i \in I, \qquad c_i - (A'\lambda^*)_i \ge 0,\ \forall\ i \in \bar I, \qquad b'\lambda^* = v. \tag{6.3}$$

Consider the vector

$$\tilde\lambda = (1 - \zeta)\lambda^* + \lambda.$$

By multiplying Eq. (6.3) with the positive number $1 - \zeta$, and by combining it
with Eq. (6.2), we see that

$$c_i - (A'\tilde\lambda)_i = 0,\ \forall\ i \in I, \qquad c_i - (A'\tilde\lambda)_i > 0,\ \forall\ i \in \bar I, \qquad b'\tilde\lambda = v.$$

Thus, $\tilde\lambda$ is an optimal dual solution that satisfies the desired property.

6.8

The problem of finding the minimum distance from the origin to a line is written
as

minimize $\tfrac{1}{2}\|x\|^2$
subject to $Ax = b,$

where $A$ is a $2 \times 3$ matrix with full rank, and $b \in \Re^2$. Let $f^*$ be the optimal value
and consider the dual function

$$q(\lambda) = \min_{x} \bigl\{ \tfrac{1}{2}\|x\|^2 + \lambda'(Ax - b) \bigr\}.$$

By Prop. 6.3.1, since the optimal value is finite, it follows that this problem
has no duality gap.
Let $V^*$ be the supremum over all distances of the origin from planes that
contain the line $\{x \mid Ax = b\}$. Clearly, we have $V^* \le f^*$, since the distance to the
line $\{x \mid Ax = b\}$ cannot be smaller than the distance to the plane that contains
the line.
We now note that any plane of the form $\{x \mid p'Ax = p'b\}$, where $p \in \Re^2$,
contains the line $\{x \mid Ax = b\}$, so we have for all $p \in \Re^2$,

$$V(p) \equiv \min_{p'Ax = p'b} \tfrac{1}{2}\|x\|^2 \le V^*.$$

On the other hand, by duality in the minimization of the preceding equation, we
have

$$U(p, \gamma) \equiv \min_{x} \bigl\{ \tfrac{1}{2}\|x\|^2 + \gamma(p'Ax - p'b) \bigr\} \le V(p), \qquad \forall\ p \in \Re^2,\ \gamma \in \Re.$$

Combining the preceding relations, it follows that

$$\sup_{\lambda} q(\lambda) = \sup_{p, \gamma} U(p, \gamma) \le \sup_{p} U(p, 1) \le \sup_{p} V(p) \le V^* \le f^*.$$

Since there is no duality gap for the original problem, we have $\sup_\lambda q(\lambda) = f^*$, and it
follows that equality holds throughout above. Hence $V^* = f^*$, which was to be
proved.
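The geometric statement (the distance from the origin to a line equals the largest distance to a plane containing it) can be checked numerically. The sketch below assumes a specific line, {x | x1 = 1, x2 = 1} in three dimensions, i.e. A = [[1,0,0],[0,1,0]] and b = (1,1), and works with plain Euclidean distances rather than the ½‖x‖² cost.

```python
import math

# Assumed line: {x in R^3 | x1 = 1, x2 = 1}; closest point to the origin is (1, 1, 0).
line_distance = math.sqrt(2.0)

def plane_distance(p):
    """Distance from the origin to the plane {x | p1*x1 + p2*x2 = p1 + p2}.

    This is the plane p'Ax = p'b for the assumed A and b; its normal in R^3
    is (p1, p2, 0), so the distance is |p'b| / ||A'p||.
    """
    return abs(p[0] + p[1]) / math.hypot(p[0], p[1])

# Sweep unit vectors p; scaling p does not change the plane.
angles = [2.0 * math.pi * k / 3600.0 for k in range(3600)]
distances = [plane_distance((math.cos(t), math.sin(t))) for t in angles]

best_plane_distance = max(distances)
print(best_plane_distance, line_distance)
```

No plane through the line is farther from the origin than the line itself, and the plane with p proportional to (1, 1) attains the line distance.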

6.9

We introduce artificial variables $x_0, x_1, \ldots, x_m$, and we write the problem in the
equivalent form

minimize $\sum_{i=0}^m f_i(x_i)$
subject to $x_i \in X_i, \quad i = 0, \ldots, m, \qquad x_i = x_0, \quad i = 1, \ldots, m.$ (6.4)

By relaxing the equality constraints, we obtain the dual function

$$\begin{aligned}
q(\lambda_1, \ldots, \lambda_m) &= \inf_{x_i \in X_i,\ i = 0, \ldots, m} \Bigl\{ \sum_{i=0}^m f_i(x_i) + \sum_{i=1}^m \lambda_i'(x_i - x_0) \Bigr\} \\
&= \inf_{x \in X_0} \bigl\{ f_0(x) - (\lambda_1 + \cdots + \lambda_m)'x \bigr\} + \sum_{i=1}^m \inf_{x \in X_i} \bigl\{ f_i(x) + \lambda_i'x \bigr\},
\end{aligned}$$

which is of the form given in the exercise. Note that the infima above are attained
since $f_i$ are continuous (being convex functions over $\Re^n$) and $X_i$ are compact
polyhedra.
Because the primal problem involves minimization of the continuous function
$\sum_{i=0}^m f_i(x)$ over the compact set $\cap_{i=0}^m X_i$, a primal optimal solution exists.
Applying Prop. 6.4.2 to problem (6.4), we see that there is no duality gap and
there exists at least one geometric multiplier, which is a dual optimal solution.

6.10

Let $M$ denote the set of geometric multipliers, i.e.,

$$M = \Bigl\{ \mu \ge 0 \ \Big|\ f^* = \inf_{x \in X} \bigl\{ f(x) + \mu'g(x) \bigr\} \Bigr\}.$$

We will show that if the set $M$ is nonempty and compact, then the Slater
condition holds. Indeed, if this were not so, then 0 would not be an interior point of
the set

$$D = \bigl\{ u \mid \text{there exists some } x \in X \text{ such that } g(x) \le u \bigr\}.$$

By a similar argument as in the proof of Prop. 6.6.1, it can be seen that $D$ is
convex. Therefore, we can use the Supporting Hyperplane Theorem to assert the
existence of a hyperplane that passes through 0 and contains $D$ in its positive
halfspace, i.e., there is a nonzero vector $\bar\mu$ such that $\bar\mu'u \ge 0$ for all $u \in D$. This
implies that $\bar\mu \ge 0$, since for each $u \in D$, we have that $(u_1, \ldots, u_j + \gamma, \ldots, u_r) \in D$
for all $\gamma > 0$ and $j$. Since $g(x) \in D$ for all $x \in X$, it follows that

$$\bar\mu'g(x) \ge 0, \qquad \forall\ x \in X.$$

Thus, for any $\mu \in M$, we have

$$f(x) + (\mu + \gamma\bar\mu)'g(x) \ge f^*, \qquad \forall\ x \in X,\ \forall\ \gamma \ge 0.$$

Hence, it follows that $(\mu + \gamma\bar\mu) \in M$ for all $\gamma \ge 0$, which contradicts the
boundedness of $M$.

6.11 (Inconsistent Convex Systems of Inequalities)

The dual function for the problem in the hint is

$$\begin{aligned}
q(\mu) &= \inf_{y \in \Re,\ x \in X} \Bigl\{ y + \sum_{j=1}^r \mu_j \bigl( g_j(x) - y \bigr) \Bigr\} \\
&= \begin{cases} \inf_{x \in X} \sum_{j=1}^r \mu_j g_j(x) & \text{if } \sum_{j=1}^r \mu_j = 1, \\ -\infty & \text{if } \sum_{j=1}^r \mu_j \ne 1. \end{cases}
\end{aligned}$$

The problem in the hint satisfies Assumption 6.4.2, so by Prop. 6.4.3, the dual
problem has an optimal solution $\mu^*$ and there is no duality gap.
Clearly the problem in the hint has an optimal value that is greater than or
equal to 0 if and only if the system of inequalities

$$g_j(x) < 0, \qquad j = 1, \ldots, r,$$

has no solution within $X$. Since there is no duality gap, we have

$$\max_{\mu \ge 0,\ \sum_{j=1}^r \mu_j = 1} q(\mu) \ge 0$$

if and only if the system of inequalities $g_j(x) < 0$, $j = 1, \ldots, r$, has no solution
within $X$. This is equivalent to the statement we want to prove.

6.12

Since $c \in \operatorname{cone}\{a_1, \ldots, a_r\} + \operatorname{ri}(N)$, there exists a vector $\mu \ge 0$ such that

$$-\Bigl( -c + \sum_{j=1}^r \mu_j a_j \Bigr) \in \operatorname{ri}(N).$$

By Example 6.4.2, this implies that the problem

minimize $-c'd + \tfrac{1}{2}\sum_{j=1}^r \bigl( (a_j'd)^+ \bigr)^2$
subject to $d \in N^*,$

has an optimal solution, which we denote by $d^*$. Consider the set

$$M = \Bigl\{ \mu \ge 0 \ \Big|\ -\Bigl( -c + \sum_{j=1}^r \mu_j a_j \Bigr) \in N \Bigr\},$$

which is nonempty by assumption. Let $\mu^*$ be the vector of minimum norm in $M$
and let the index set $J$ be defined by

$$J = \{ j \mid \mu_j^* > 0 \}.$$

Then, it follows from Lemma 5.3.1 that

$$a_j'd^* > 0, \qquad \forall\ j \in J,$$

and

$$a_j'd^* \le 0, \qquad \forall\ j \notin J,$$

thus proving that the properties (1) and (2) of the exercise hold.

6.13 (Pareto Optimality)

(a) Assume that $x^*$ is not a Pareto optimal solution. Then there is a vector
$\bar x \in X$ such that either

$$f_1(\bar x) \le f_1(x^*), \qquad f_2(\bar x) < f_2(x^*),$$

or

$$f_1(\bar x) < f_1(x^*), \qquad f_2(\bar x) \le f_2(x^*).$$

Multiplying the left inequality by $\lambda_1^*$, the right inequality by $\lambda_2^*$, and adding the
two in either case yields

$$\lambda_1^* f_1(\bar x) + \lambda_2^* f_2(\bar x) < \lambda_1^* f_1(x^*) + \lambda_2^* f_2(x^*),$$

which contradicts the assumption that $x^*$ minimizes $\lambda_1^* f_1 + \lambda_2^* f_2$ over $X$.
Therefore $x^*$ is a Pareto optimal solution.
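Part (a) can be illustrated with a weighted-sum scalarization on a grid. This is a minimal sketch: the objectives f1(x) = x², f2(x) = (x − 1)², the weights (1/2, 1/2), and the grid standing in for X are assumptions chosen here.

```python
# Toy bi-objective problem (assumed): f1(x) = x^2, f2(x) = (x - 1)^2 on a grid X.

def f1(x):
    return x * x

def f2(x):
    return (x - 1.0) ** 2

X = [i / 100.0 for i in range(-100, 201)]
lam = (0.5, 0.5)  # positive weights, as required in part (a)

# Minimizer of the weighted sum over X.
x_star = min(X, key=lambda x: lam[0] * f1(x) + lam[1] * f2(x))

def dominates(x, y):
    """True if x is at least as good as y in both objectives and strictly better in one."""
    return (f1(x) <= f1(y) and f2(x) <= f2(y)
            and (f1(x) < f1(y) or f2(x) < f2(y)))

# Pareto optimality of x_star within the grid: no grid point dominates it.
is_pareto = not any(dominates(x, x_star) for x in X)
print(x_star, is_pareto)
```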


(b) Let

$$A = \bigl\{ (z_1, z_2) \mid \text{there exists } x \in X \text{ such that } f_1(x) \le z_1,\ f_2(x) \le z_2 \bigr\}.$$

We first show that $A$ is convex. Indeed, let $(a_1, a_2)$ and $(b_1, b_2)$ be elements of
$A$, and let $(c_1, c_2) = \alpha(a_1, a_2) + (1 - \alpha)(b_1, b_2)$ for any $\alpha \in [0, 1]$. Then for some
$x_a \in X$, $x_b \in X$, we have $f_1(x_a) \le a_1$, $f_2(x_a) \le a_2$, $f_1(x_b) \le b_1$, and $f_2(x_b) \le b_2$.
Let $x_c = \alpha x_a + (1 - \alpha)x_b$. Since $X$ is convex, $x_c \in X$. Since $f_1$ and $f_2$ are convex, we also
have

$$f_1(x_c) \le c_1, \quad \text{and} \quad f_2(x_c) \le c_2.$$

Hence, $(c_1, c_2) \in A$ and it follows that $A$ is a convex set.
For any $x \in X$, we have $\bigl(f_1(x), f_2(x)\bigr) \in A$. In addition, $\bigl(f_1(x^*), f_2(x^*)\bigr)$
is in the boundary of $A$. [If this were not the case, then either (1) or (2) would
hold and $x^*$ would not be Pareto optimal.] Then by the Supporting Hyperplane
Theorem, there exist $\lambda_1^*$ and $\lambda_2^*$, not both equal to 0, such that

$$\lambda_1^* z_1 + \lambda_2^* z_2 \ge \lambda_1^* f_1(x^*) + \lambda_2^* f_2(x^*), \qquad \forall\ (z_1, z_2) \in A.$$

Since $z_1$ and $z_2$ can be made arbitrarily large, we must have $\lambda_1^*, \lambda_2^* \ge 0$. Since
$\bigl(f_1(x), f_2(x)\bigr) \in A$, the above equation yields

$$\lambda_1^* f_1(x) + \lambda_2^* f_2(x) \ge \lambda_1^* f_1(x^*) + \lambda_2^* f_2(x^*), \qquad \forall\ x \in X,$$

or, equivalently,

$$\min_{x \in X} \bigl\{ \lambda_1^* f_1(x) + \lambda_2^* f_2(x) \bigr\} \ge \lambda_1^* f_1(x^*) + \lambda_2^* f_2(x^*).$$

Combining this with the fact that

$$\min_{x \in X} \bigl\{ \lambda_1^* f_1(x) + \lambda_2^* f_2(x) \bigr\} \le \lambda_1^* f_1(x^*) + \lambda_2^* f_2(x^*)$$

yields the desired result.

(c) Generalization of (a): If $x^*$ is a vector in $X$, and $\lambda_1^*, \ldots, \lambda_m^*$ are positive scalars such that

$$\sum_{i=1}^m \lambda_i^* f_i(x^*) = \min_{x \in X} \Big\{ \sum_{i=1}^m \lambda_i^* f_i(x) \Big\},$$

then $x^*$ is a Pareto optimal solution.


Generalization of (b): Assume that $X$ is convex and $f_1, \ldots, f_m$ are convex over $X$. If $x^*$ is a Pareto optimal solution, then there exist nonnegative scalars $\lambda_1^*, \ldots, \lambda_m^*$, not all zero, such that

$$\sum_{i=1}^m \lambda_i^* f_i(x^*) = \min_{x \in X} \Big\{ \sum_{i=1}^m \lambda_i^* f_i(x) \Big\}.$$

6.14 (Polyhedral Programming)

Using Prop. 3.2.3, it follows that

$$g_j(x) = \max_{i=1,\ldots,m} \{ a_{ij}'x + b_{ij} \},$$

where $a_{ij}$ are vectors in $\Re^n$ and $b_{ij}$ are scalars. Hence the constraints $g_j(x) \le 0$ can equivalently be represented as

$$a_{ij}'x + b_{ij} \le 0, \quad \forall\ i = 1, \ldots, m, \ \forall\ j = 1, \ldots, r.$$

By assumption, the set $X$ is a polyhedral set, and the cost function $f$ is a polyhedral function, hence convex over $\Re^n$. Therefore, we can use the Strong Duality Theorem for linear constraints (cf. Prop. 6.4.2) to conclude that there is no duality gap and there exists at least one geometric multiplier, i.e., there exists a nonnegative vector $\mu$ such that

$$f^* = \inf_{x \in X} \big\{ f(x) + \mu'g(x) \big\}.$$

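Let $p(u)$ denote the primal function for this problem. Since $p(0) = f^*$, the preceding relation implies that

$$\begin{aligned} p(0) - \mu'u &= \inf_{x \in X} \big\{ f(x) + \mu'\big(g(x) - u\big) \big\} \\ &\le \inf_{x \in X,\ g(x) \le u} \big\{ f(x) + \mu'\big(g(x) - u\big) \big\} \\ &\le \inf_{x \in X,\ g(x) \le u} f(x) \\ &= p(u), \end{aligned}$$

which, in view of the assumption that $p(0)$ is finite, shows that $p(u) > -\infty$ for all $u \in \Re^r$.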

The primal function can be obtained by partial minimization as

$$p(u) = \inf_{x \in \Re^n} F(x, u),$$

where

$$F(x, u) = \begin{cases} f(x) & \text{if } g_j(x) \le u_j \ \forall\ j,\ x \in X, \\ \infty & \text{otherwise.} \end{cases}$$

Since, by assumption, $f$ is polyhedral, the $g_j$ are polyhedral (which implies that the level sets of the $g_j$ are polyhedral), and $X$ is polyhedral, it follows that $F(x, u)$ is a polyhedral function. Since we have also shown that $p(u) > -\infty$ for all $u \in \Re^r$, we can use Exercise 3.13 to conclude that the primal function $p$ is polyhedral, and therefore also closed. Since $p(0)$, the optimal value, is assumed finite, it follows that $p$ is proper.

Convex Analysis and
Optimization
Chapter 7 Solutions

Dimitri P. Bertsekas

with

Angelia Nedić and Asuman E. Ozdaglar

Massachusetts Institute of Technology

Athena Scientific, Belmont, Massachusetts


http://www.athenasc.com
LAST UPDATE April 15, 2003

CHAPTER 7: SOLUTION MANUAL

7.1 (Fenchel’s Inequality)

(a) From the definition of $g$,

$$g(\lambda) = \sup_{x \in \Re^n} \big\{ x'\lambda - f(x) \big\},$$

we have the inequality $x'\lambda \le f(x) + g(\lambda)$. In view of this inequality, the equality $x'\lambda = f(x) + g(\lambda)$ of (i) is equivalent to the inequality

$$x'\lambda - f(x) \ge g(\lambda) = \sup_{z \in \Re^n} \big\{ z'\lambda - f(z) \big\},$$

or

$$x'\lambda - f(x) \ge z'\lambda - f(z), \quad \forall\ z \in \Re^n,$$

or

$$f(z) \ge f(x) + \lambda'(z - x), \quad \forall\ z \in \Re^n,$$

which is equivalent to (ii). Since $f$ is closed, $f$ is equal to the conjugate of $g$, so by using the equivalence of (i) and (ii) with the roles of $f$ and $g$ reversed, we obtain the equivalence of (i) and (iii).
(b) A vector x∗ minimizes f if and only if 0 ∈ ∂f (x∗ ), which by part (a), is true
if and only if x∗ ∈ ∂g(0).
(c) The result follows by combining part (b) and Prop. 4.4.2.
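
Fenchel's inequality and the equality condition of part (a) can be checked numerically on a one-dimensional example. The sketch below assumes the hypothetical choice $f(x) = x^2/2$, whose conjugate is $g(\lambda) = \lambda^2/2$ and whose subdifferential is $\partial f(x) = \{x\}$; it is an illustration only and not part of the solution.

```python
def f(x):
    # Hypothetical example function: f(x) = x^2 / 2.
    return 0.5 * x * x

def g(lam):
    # Conjugate of f in closed form: sup_x {x*lam - x^2/2} = lam^2 / 2.
    return 0.5 * lam * lam

# Quarter-spaced grid, so all floating-point arithmetic below is exact.
grid = [i / 4 for i in range(-12, 13)]

# Fenchel gap f(x) + g(lam) - x*lam, which equals (x - lam)^2 / 2 here.
gaps = {(x, lam): f(x) + g(lam) - x * lam for x in grid for lam in grid}

fenchel_holds = all(gap >= 0 for gap in gaps.values())
equality_points = [(x, lam) for (x, lam), gap in gaps.items() if gap == 0]

print(fenchel_holds, all(x == lam for x, lam in equality_points))
```

Equality is attained exactly on the diagonal $\lambda = x$, i.e., when $\lambda \in \partial f(x)$, which is the equivalence of (i) and (ii) for this example.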

7.2

Let $f : \Re^n \to (-\infty, \infty]$ be a proper convex function, and let $g$ be its conjugate. Show that the lineality space of $g$ is equal to the orthogonal complement of the subspace parallel to $\mathrm{aff}\big(\mathrm{dom}(f)\big)$.

7.3

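Let $f_i : \Re^n \to (-\infty, \infty]$, $i = 1, \ldots, m$, be proper convex functions, and let $f = f_1 + \cdots + f_m$. Show that if $\cap_{i=1}^m \mathrm{ri}\big(\mathrm{dom}(f_i)\big)$ is nonempty, then we have

$$g(\lambda) = \inf_{\substack{\lambda_1 + \cdots + \lambda_m = \lambda \\ \lambda_i \in \Re^n,\ i = 1, \ldots, m}} \big\{ g_1(\lambda_1) + \cdots + g_m(\lambda_m) \big\}, \quad \forall\ \lambda \in \Re^n,$$

where $g, g_1, \ldots, g_m$ are the conjugates of $f, f_1, \ldots, f_m$, respectively.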

7.4 (Finiteness of the Optimal Dual Value)

Consider the function $\tilde{q}$ given by

$$\tilde{q}(\mu) = \begin{cases} q(\mu) & \text{if } \mu \ge 0, \\ -\infty & \text{otherwise,} \end{cases}$$

and note that $-\tilde{q}$ is closed and convex, and that by the calculation of Example 7.1.6, we have

$$\tilde{q}(\mu) = \inf_{u \in \Re^r} \big\{ p(u) + \mu'u \big\}, \quad \forall\ \mu \in \Re^r. \tag{1}$$

Since $\tilde{q}(\mu) \le p(0)$ for all $\mu \in \Re^r$, given the feasibility of the problem [i.e., $p(0) < \infty$], we see that $q^*$ is finite if and only if $-\tilde{q}$ is proper. From Eq. (1), $-\tilde{q}$ is the conjugate of $p(-u)$, and by the Conjugacy Theorem [Prop. 7.1.1(b)], $-\tilde{q}$ is proper if and only if $p$ is proper. Hence, (i) is equivalent to (ii).
We note that the epigraph of p is the closure of M . Hence, given the
feasibility of the problem, (ii) is equivalent to the closure of M not containing a
vertical line. Since M is convex, its closure does not contain a line if and only if
M does not contain a line (since the closure and the relative interior of M have
the same recession cone). Hence (ii) is equivalent to (iii).

7.5 (General Perturbations and Min Common/Max Crossing Duality)

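(a) We have

$$\begin{aligned} h(\lambda) &= \sup_u \big\{ \lambda'u - p(u) \big\} \\ &= \sup_u \big\{ \lambda'u - \inf_x F(x, u) \big\} \\ &= \sup_{x, u} \big\{ \lambda'u - F(x, u) \big\} \\ &= G(0, \lambda). \end{aligned}$$

Also

$$\begin{aligned} q(\lambda) &= \inf_{(u, w) \in M} \{ w + \lambda'u \} \\ &= \inf_{x, u} \big\{ F(x, u) + \lambda'u \big\} \\ &= -\sup_{x, u} \big\{ -\lambda'u - F(x, u) \big\} \\ &= -G(0, -\lambda). \end{aligned}$$

Consider the constrained minimization problem of Example 7.1.6:

$$\begin{aligned} &\text{minimize} && f(x) \\ &\text{subject to} && x \in X, \quad g(x) \le 0, \end{aligned}$$

and define

$$F(x, u) = \begin{cases} f(x) & \text{if } x \in X \text{ and } g(x) \le u, \\ \infty & \text{otherwise.} \end{cases}$$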

Then p is the primal function of the constrained minimization problem. Consider
now q(λ), the cost function of the max crossing problem corresponding to M . For
λ ≥ 0, q(λ) is equal to the dual function value of the constrained optimization
problem, and otherwise q(λ) is equal to −∞. Thus, the relations h(λ) = G(0, λ)
and q(λ) = −G(0, −λ) proved earlier, show the relation proved in Example 7.1.6,
i.e., that q(λ) = −h(−λ).
(b) Let

$$M = \big\{ (u, w) \mid \text{there is an } x \text{ such that } F(x, u) \le w \big\}.$$

Then the corresponding min common value is

$$\inf_{\{(x, w) \mid F(x, 0) \le w\}} w = \inf_x F(x, 0) = p(0).$$

Since $p(0)$ is the min common value corresponding to $\mathrm{epi}(p)$, the min common values corresponding to the two choices for $M$ are equal. Similarly, we show that the cost functions of the max crossing problem corresponding to the two choices for $M$ are equal.
(c) If $F(x, u) = f_1(x) - f_2(Qx + u)$, we have

$$p(u) = \inf_x \big\{ f_1(x) - f_2(Qx + u) \big\},$$

so $p(0)$, the min common value, is equal to the primal optimal value in the Fenchel duality framework. By part (a), the max crossing value is

$$q^* = \sup_\lambda \big\{ -h(-\lambda) \big\},$$

where $h$ is the conjugate of $p$. By using the change of variables $z = Qx + u$ in the following calculation, we have

$$\begin{aligned} -h(-\lambda) &= -\sup_u \Big\{ -\lambda'u - \inf_x \big\{ f_1(x) - f_2(Qx + u) \big\} \Big\} \\ &= -\sup_{z, x} \big\{ -\lambda'(z - Qx) - f_1(x) + f_2(z) \big\} \\ &= g_2(\lambda) - g_1(Q'\lambda), \end{aligned}$$

where $g_1$ and $g_2$ are the conjugate convex and conjugate concave functions of $f_1$ and $f_2$, respectively:

$$g_1(\lambda) = \sup_x \big\{ x'\lambda - f_1(x) \big\}, \qquad g_2(\lambda) = \inf_z \big\{ z'\lambda - f_2(z) \big\}.$$

Thus, no duality gap in the min common/max crossing framework [i.e., $p(0) = q^* = \sup_\lambda \{ -h(-\lambda) \}$] is equivalent to no duality gap in the Fenchel duality framework.
The minimax framework of Section 2.6.1 (using the notation of that section) is obtained for

$$F(x, u) = \sup_{z \in Z} \big\{ \phi(x, z) - u'z \big\}.$$

The constrained optimization framework of Section 6.1 (using the notation of that section) is obtained for the function

$$F(x, u) = \begin{cases} f(x) & \text{if } x \in X,\ h(x) = u_1,\ g(x) \le u_2, \\ \infty & \text{otherwise,} \end{cases}$$

where $u = (u_1, u_2)$.

7.6

By Exercise 1.35,
cl f1 + cl (−f2 ) = cl (f1 − f2 ).

Furthermore,

$$\inf_{x \in \Re^n} \mathrm{cl}\,(f_1 - f_2)(x) = \inf_{x \in \Re^n} \big\{ f_1(x) - f_2(x) \big\}.$$

Thus, we may replace f1 and −f2 with their closures, and the result follows by
applying Minimax Theorem III.

7.7 (Monotropic Programming Duality)

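We apply Fenchel duality with

$$f_1(x) = \begin{cases} \sum_{i=1}^n f_i(x_i) & \text{if } x \in X_1 \times \cdots \times X_n, \\ \infty & \text{otherwise,} \end{cases}$$

and

$$f_2(x) = \begin{cases} 0 & \text{if } x \in S, \\ -\infty & \text{otherwise.} \end{cases}$$

The corresponding conjugate concave and convex functions $g_2$ and $g_1$ are

$$g_2(\lambda) = \inf_{x \in S} \lambda'x = \begin{cases} 0 & \text{if } \lambda \in S^\perp, \\ -\infty & \text{if } \lambda \notin S^\perp, \end{cases}$$

where $S^\perp$ is the orthogonal complement of $S$, and

$$g_1(\lambda) = \sum_{i=1}^n \sup_{x_i \in X_i} \big\{ x_i \lambda_i - f_i(x_i) \big\} = \sum_{i=1}^n g_i(\lambda_i),$$

where for each $i$,

$$g_i(\lambda_i) = \sup_{x_i \in X_i} \big\{ x_i \lambda_i - f_i(x_i) \big\}.$$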

By the Primal Fenchel Duality Theorem (Prop. 7.2.1), the dual problem has an
optimal solution and there is no duality gap if the functions fi are convex over
Xi and one of the following two conditions holds:
(1) The subspace S contains a point in the relative interior of X1 × · · · × Xn .
(2) The intervals Xi are closed (so that the Cartesian product X1 × · · · × Xn is
a polyhedral set) and the functions fi are convex over the entire real line.
These conditions correspond to the two conditions for no duality gap given fol-
lowing Prop. 7.2.1.

7.8 (Network Optimization and Kirchhoff ’s Laws)

This problem is a monotropic programming problem, as considered in Exercise 7.7. For each $(i, j) \in A$, the function $f_{ij}(x_{ij}) = \frac{1}{2} R_{ij} x_{ij}^2 - t_{ij} x_{ij}$ is continuously differentiable and convex over $\Re$. The dual problem is

$$\begin{aligned} &\text{maximize} && q(v) \\ &\text{subject to} && \text{no constraints on } v, \end{aligned}$$

with the dual function $q$ given by

$$q(v) = \sum_{(i, j) \in A} q_{ij}(v_i - v_j),$$

where

$$q_{ij}(v_i - v_j) = \min_{x_{ij} \in \Re} \Big\{ \frac{1}{2} R_{ij} x_{ij}^2 - (v_i - v_j + t_{ij}) x_{ij} \Big\}.$$

Since the primal cost functions $f_{ij}$ are real-valued and convex over the entire real line, there is no duality gap. The necessary and sufficient conditions for a set of variables $\{x_{ij} \mid (i, j) \in A\}$ and $\{v_i \mid i \in N\}$ to be an optimal solution-Lagrange multiplier pair are:
(1) The set of variables $\{x_{ij} \mid (i, j) \in A\}$ must be primal feasible, i.e., Kirchhoff's current law must be satisfied.
(2)

$$x_{ij} \in \arg\min_{y_{ij} \in \Re} \Big\{ \frac{1}{2} R_{ij} y_{ij}^2 - (v_i - v_j + t_{ij}) y_{ij} \Big\}, \quad \forall\ (i, j) \in A,$$

which is equivalent to Ohm's law:

$$R_{ij} x_{ij} - (v_i - v_j + t_{ij}) = 0, \quad \forall\ (i, j) \in A.$$

Hence a set of variables $\{x_{ij} \mid (i, j) \in A\}$ and $\{v_i \mid i \in N\}$ are an optimal solution-Lagrange multiplier pair if and only if they satisfy Kirchhoff's current law and Ohm's law.
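
The optimality conditions above can be checked on the smallest possible circuit. The sketch below is an illustration under stated assumptions: a hypothetical two-node loop with arcs (1,2) and (2,1), made-up resistances and voltage sources, and the current law taken to mean that the flow out of each node equals the flow into it. It verifies Ohm's law on both arcs and the absence of a duality gap.

```python
from fractions import Fraction as Fr

# Hypothetical data for a loop 1 -> 2 -> 1 (not from the exercise).
R12, R21 = Fr(2), Fr(3)     # resistances
t12, t21 = Fr(10), Fr(0)    # voltage sources

def arc_cost(R, t, x):
    # f_ij(x_ij) = (1/2) R_ij x_ij^2 - t_ij x_ij
    return Fr(1, 2) * R * x * x - t * x

def arc_dual(R, t, dv):
    # q_ij(dv) = min_x {(1/2) R x^2 - (dv + t) x} = -(dv + t)^2 / (2R)
    return -(dv + t) ** 2 / (2 * R)

# The current law forces x12 = x21 = x; minimizing the resulting
# one-dimensional quadratic gives x = (t12 + t21) / (R12 + R21).
x = (t12 + t21) / (R12 + R21)

# Fix v2 = 0 and choose v1 so that Ohm's law holds on arc (1,2).
v2 = Fr(0)
v1 = R12 * x - t12 + v2

ohm_12 = R12 * x - (v1 - v2 + t12)
ohm_21 = R21 * x - (v2 - v1 + t21)

primal_value = arc_cost(R12, t12, x) + arc_cost(R21, t21, x)
dual_value = arc_dual(R12, t12, v1 - v2) + arc_dual(R21, t21, v2 - v1)

print(x, v1, ohm_12, ohm_21, primal_value, dual_value)
```

Ohm's law on the second arc is not imposed in the code; it holds automatically because the current was chosen optimally, which is the content of the equivalence stated above.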

7.9 (Symmetry of Duality)

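(a) We have $f^* = p(0)$. Since $p(u)$ is monotonically nonincreasing, its minimal value over $u \in P$ and $u \le 0$ is attained for $u = 0$. Hence, $f^* = p^*$, where $p^* = \inf_{u \in P,\ u \le 0} p(u)$. For $\mu \ge 0$, we have

$$\begin{aligned} \inf_{x \in X} \big\{ f(x) + \mu'g(x) \big\} &= \inf_{u \in P}\ \inf_{x \in X,\ g(x) \le u} \big\{ f(x) + \mu'g(x) \big\} \\ &= \inf_{u \in P} \big\{ p(u) + \mu'u \big\}. \end{aligned}$$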

Since $f^* = p^*$, we see that $f^* = \inf_{x \in X} \{ f(x) + \mu'g(x) \}$ if and only if $p^* = \inf_{u \in P} \{ p(u) + \mu'u \}$. In other words, the two problems have the same geometric multipliers.
(b) This part was proved by the preceding argument.
(c) From Example 7.1.6, we have that $-q(-\mu)$ is the conjugate convex function of $p$. Let us view the dual problem as the minimization problem

$$\begin{aligned} &\text{minimize} && -q(-\mu) \\ &\text{subject to} && \mu \le 0. \end{aligned} \tag{1}$$

Its dual problem is obtained by forming the conjugate convex function of its primal function, which is $p$, based on the analysis of Example 7.1.6, and the closedness and convexity of $p$. Hence the dual of the dual problem (1) is

$$\begin{aligned} &\text{maximize} && -p(u) \\ &\text{subject to} && u \le 0, \end{aligned}$$

and the optimal solutions to this problem are the geometric multipliers to problem (1).

7.10 (Second-Order Cone Programming)

(a) Define

$$X = \big\{ (x, u, t) \mid x \in \Re^n,\ u_j = A_j x + b_j,\ t_j = e_j'x + d_j,\ j = 1, \ldots, r \big\},$$

$$C = \big\{ (x, u, t) \mid x \in \Re^n,\ \|u_j\| \le t_j,\ j = 1, \ldots, r \big\}.$$

It can be seen that $X$ is convex and $C$ is a cone. Therefore the modified problem can be written as

$$\begin{aligned} &\text{minimize} && f(x) \\ &\text{subject to} && x \in X \cap C, \end{aligned}$$

and is a cone programming problem of the type described in Section 7.2.2.


(b) Let $(\lambda, z, w) \in \hat{C}$, where $\hat{C}$ is the dual cone ($\hat{C} = -C^*$, where $C^*$ is the polar cone). Then we have

$$\lambda'x + \sum_{j=1}^r z_j'u_j + \sum_{j=1}^r w_j t_j \ge 0, \quad \forall\ (x, u, t) \in C.$$

Since $x$ is unconstrained, we must have $\lambda = 0$, for otherwise the above inequality will be violated. Furthermore, it can be seen that

$$\hat{C} = \big\{ (0, z, w) \mid \|z_j\| \le w_j,\ j = 1, \ldots, r \big\}.$$

By the conic duality theory of Section 7.2.2, the dual problem is given by

$$\begin{aligned} &\text{minimize} && \sum_{j=1}^r (z_j'b_j + w_j d_j) \\ &\text{subject to} && \sum_{j=1}^r (A_j'z_j + w_j e_j) = c, \quad \|z_j\| \le w_j, \quad j = 1, \ldots, r. \end{aligned}$$

If there exists a feasible solution of the modified primal problem satisfying strictly all the inequality constraints, then the relative interior condition $\mathrm{ri}(X) \cap \mathrm{ri}(C) \ne \emptyset$ is satisfied, and there is no duality gap. Similarly, if there exists a feasible solution of the dual problem satisfying strictly all the inequality constraints, there is no duality gap.

7.11 (Quadratically Constrained Quadratic Problems [LVB98])

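Since each $P_i$ is symmetric and positive definite, we have

$$\begin{aligned} x'P_i x + 2q_i'x + r_i &= \big(P_i^{1/2} x\big)' P_i^{1/2} x + 2\big(P_i^{-1/2} q_i\big)' P_i^{1/2} x + r_i \\ &= \big\| P_i^{1/2} x + P_i^{-1/2} q_i \big\|^2 + r_i - q_i'P_i^{-1} q_i, \end{aligned}$$

for $i = 0, 1, \ldots, p$. This allows us to write the original problem as

$$\begin{aligned} &\text{minimize} && \big\| P_0^{1/2} x + P_0^{-1/2} q_0 \big\|^2 + r_0 - q_0'P_0^{-1} q_0 \\ &\text{subject to} && \big\| P_i^{1/2} x + P_i^{-1/2} q_i \big\|^2 + r_i - q_i'P_i^{-1} q_i \le 0, \quad i = 1, \ldots, p. \end{aligned}$$

By introducing a new variable $x_{n+1}$, this problem can be formulated in $\Re^{n+1}$ as

$$\begin{aligned} &\text{minimize} && x_{n+1} \\ &\text{subject to} && \big\| P_0^{1/2} x + P_0^{-1/2} q_0 \big\| \le x_{n+1}, \\ & && \big\| P_i^{1/2} x + P_i^{-1/2} q_i \big\| \le \big( q_i'P_i^{-1} q_i - r_i \big)^{1/2}, \quad i = 1, \ldots, p. \end{aligned}$$

The optimal values of this problem and the original problem are equal up to a constant and a square root. The above problem is of the type described in Exercise 7.10. To see this, define $A_i = \big( P_i^{1/2} \mid 0 \big)$, $b_i = P_i^{-1/2} q_i$, $e_i = 0$, $d_i = \big( q_i'P_i^{-1} q_i - r_i \big)^{1/2}$ for $i = 1, \ldots, p$, $A_0 = \big( P_0^{1/2} \mid 0 \big)$, $b_0 = P_0^{-1/2} q_0$, $e_0 = (0, \ldots, 0, 1)$, $d_0 = 0$, and $c = (0, \ldots, 0, 1)$. Its dual is given by

$$\begin{aligned} &\text{maximize} && -\sum_{i=1}^p \Big( q_i'P_i^{-1/2} z_i + \big( q_i'P_i^{-1} q_i - r_i \big)^{1/2} w_i \Big) - q_0'P_0^{-1/2} z_0 \\ &\text{subject to} && \sum_{i=0}^p P_i^{1/2} z_i = 0, \quad \|z_0\| \le 1, \quad \|z_i\| \le w_i, \quad i = 1, \ldots, p. \end{aligned}$$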
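
The completing-the-square identity that starts this solution can be checked numerically. The sketch below is a sanity check only and makes a simplifying assumption: each $P_i$ is taken to be diagonal with positive entries, so that $P_i^{1/2}$ and $P_i^{-1/2}$ are entrywise square roots. All data are random and hypothetical.

```python
import math
import random

def quadratic(P, q, r, x):
    """x'Px + 2q'x + r for a diagonal P given as a list of positive entries."""
    return (sum(p * xi * xi for p, xi in zip(P, x))
            + 2 * sum(qi * xi for qi, xi in zip(q, x)) + r)

def completed_square(P, q, r, x):
    """||P^{1/2} x + P^{-1/2} q||^2 + r - q'P^{-1}q for the same diagonal P."""
    norm_sq = sum((math.sqrt(p) * xi + qi / math.sqrt(p)) ** 2
                  for p, qi, xi in zip(P, q, x))
    return norm_sq + r - sum(qi * qi / p for p, qi in zip(P, q))

rng = random.Random(0)
n = 4
max_error = 0.0
for _ in range(1000):
    P = [rng.uniform(0.5, 3.0) for _ in range(n)]   # positive diagonal entries
    q = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    r = rng.uniform(-1.0, 1.0)
    error = abs(quadratic(P, q, r, x) - completed_square(P, q, r, x))
    max_error = max(max_error, error)

print(max_error)
```

For a general symmetric positive definite $P_i$ the same check would need a matrix square root, which is omitted here to keep the example dependency-free.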

7.12 (Minimizing the Sum or the Maximum of Norms [LVB98])

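Consider the problem

$$\begin{aligned} &\text{minimize} && \sum_{i=1}^p \|F_i x + g_i\| \\ &\text{subject to} && x \in \Re^n. \end{aligned}$$

By introducing variables $t_1, \ldots, t_p$, this problem can be expressed as a second-order cone programming problem (see Exercise 7.10):

$$\begin{aligned} &\text{minimize} && \sum_{i=1}^p t_i \\ &\text{subject to} && \|F_i x + g_i\| \le t_i, \quad i = 1, \ldots, p. \end{aligned}$$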

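Define

$$X = \{ (x, u, t) \mid x \in \Re^n,\ u_i = F_i x + g_i,\ t_i \in \Re,\ i = 1, \ldots, p \},$$

$$C = \{ (x, u, t) \mid x \in \Re^n,\ \|u_i\| \le t_i,\ i = 1, \ldots, p \}.$$

Then, similar to Exercise 7.10, we have

$$-C^* = \{ (0, z, w) \mid \|z_i\| \le w_i,\ i = 1, \ldots, p \},$$

and

$$\begin{aligned} g(0, z, w) &= \sup_{(x, u, t) \in X} \Big\{ \sum_{i=1}^p z_i'u_i + \sum_{i=1}^p w_i t_i - \sum_{i=1}^p t_i \Big\} \\ &= \sup_{x \in \Re^n,\ t \in \Re^p} \Big\{ \sum_{i=1}^p z_i'(F_i x + g_i) + \sum_{i=1}^p (w_i - 1) t_i \Big\} \\ &= \sup_{x \in \Re^n} \Big\{ \Big( \sum_{i=1}^p F_i'z_i \Big)' x \Big\} + \sup_{t \in \Re^p} \Big\{ \sum_{i=1}^p (w_i - 1) t_i \Big\} + \sum_{i=1}^p g_i'z_i \\ &= \begin{cases} \sum_{i=1}^p g_i'z_i & \text{if } \sum_{i=1}^p F_i'z_i = 0,\ w_i = 1,\ i = 1, \ldots, p, \\ +\infty & \text{otherwise.} \end{cases} \end{aligned}$$

Hence the dual problem is given by

$$\begin{aligned} &\text{maximize} && -\sum_{i=1}^p g_i'z_i \\ &\text{subject to} && \sum_{i=1}^p F_i'z_i = 0, \quad \|z_i\| \le 1, \quad i = 1, \ldots, p. \end{aligned}$$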

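Now, consider the problem

$$\begin{aligned} &\text{minimize} && \max_{1 \le i \le p} \|F_i x + g_i\| \\ &\text{subject to} && x \in \Re^n. \end{aligned}$$

By introducing a new variable $x_{n+1}$, we obtain

$$\begin{aligned} &\text{minimize} && x_{n+1} \\ &\text{subject to} && \|F_i x + g_i\| \le x_{n+1}, \quad i = 1, \ldots, p, \end{aligned}$$

or equivalently

$$\begin{aligned} &\text{minimize} && e_{n+1}'x \\ &\text{subject to} && \|A_i x + g_i\| \le e_{n+1}'x, \quad i = 1, \ldots, p, \end{aligned}$$

where $x \in \Re^{n+1}$, $A_i = (F_i, 0)$, and $e_{n+1} = (0, \ldots, 0, 1)' \in \Re^{n+1}$. Evidently, this is a second-order cone programming problem. From Exercise 7.10 we have that its dual problem is given by

$$\begin{aligned} &\text{maximize} && -\sum_{i=1}^p g_i'z_i \\ &\text{subject to} && \sum_{i=1}^p \left( \begin{pmatrix} F_i' \\ 0 \end{pmatrix} z_i + e_{n+1} w_i \right) = e_{n+1}, \quad \|z_i\| \le w_i, \quad i = 1, \ldots, p, \end{aligned}$$

or equivalently

$$\begin{aligned} &\text{maximize} && -\sum_{i=1}^p g_i'z_i \\ &\text{subject to} && \sum_{i=1}^p F_i'z_i = 0, \quad \sum_{i=1}^p w_i = 1, \quad \|z_i\| \le w_i, \quad i = 1, \ldots, p. \end{aligned}$$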
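
The primal-dual pair for the sum-of-norms problem derived at the start of this solution can be sanity-checked on a scalar instance. The sketch below assumes the hypothetical data $F_1 = F_2 = 1$, $g_1 = 0$, $g_2 = -2$, i.e., the problem of minimizing $|x| + |x - 2|$, and compares grid searches of the primal and of the dual. It is an illustration, not a solver.

```python
from fractions import Fraction as Fr

# Hypothetical scalar instance: minimize |x| + |x - 2|.
F = [Fr(1), Fr(1)]
g = [Fr(0), Fr(-2)]

def primal(x):
    # Sum over i of |F_i x + g_i|
    return sum(abs(Fi * x + gi) for Fi, gi in zip(F, g))

def dual(z):
    # Dual cost: minus the sum over i of g_i z_i
    return -sum(gi * zi for gi, zi in zip(g, z))

x_grid = [Fr(k, 10) for k in range(-30, 31)]   # x in [-3, 3]
z_grid = [Fr(k, 10) for k in range(-10, 11)]   # each z_i in [-1, 1]

# Dual feasibility: F_1 z_1 + F_2 z_2 = 0 and |z_i| <= 1 (built into z_grid).
feasible = [(z1, z2) for z1 in z_grid for z2 in z_grid
            if F[0] * z1 + F[1] * z2 == 0]

primal_best = min(primal(x) for x in x_grid)
dual_best = max(dual(z) for z in feasible)
weak_duality = all(dual(z) <= primal(x) for z in feasible for x in x_grid)

print(primal_best, dual_best, weak_duality)
```

Both grid searches return the same value, consistent with the absence of a duality gap for this instance; exact rational arithmetic is used so that the equality test is meaningful.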

7.13 (Complex l1 and l∞ Approximation [LVB98])

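For $v \in C^p$ we have

$$\|v\|_1 = \sum_{i=1}^p |v_i| = \sum_{i=1}^p \left\| \begin{pmatrix} \mathrm{Re}(v_i) \\ \mathrm{Im}(v_i) \end{pmatrix} \right\|,$$

where $\mathrm{Re}(v_i)$ and $\mathrm{Im}(v_i)$ denote the real and the imaginary parts of $v_i$, respectively. Then the complex $l_1$ approximation problem is equivalent to

$$\begin{aligned} &\text{minimize} && \sum_{i=1}^p \left\| \begin{pmatrix} \mathrm{Re}(a_i'x - b_i) \\ \mathrm{Im}(a_i'x - b_i) \end{pmatrix} \right\| \\ &\text{subject to} && x \in C^n, \end{aligned} \tag{1}$$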
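where $a_i'$ is the $i$-th row of $A$ ($A$ is a $p \times n$ matrix). Note that

$$\begin{pmatrix} \mathrm{Re}(a_i'x - b_i) \\ \mathrm{Im}(a_i'x - b_i) \end{pmatrix} = \begin{pmatrix} \mathrm{Re}(a_i') & -\mathrm{Im}(a_i') \\ \mathrm{Im}(a_i') & \mathrm{Re}(a_i') \end{pmatrix} \begin{pmatrix} \mathrm{Re}(x) \\ \mathrm{Im}(x) \end{pmatrix} - \begin{pmatrix} \mathrm{Re}(b_i) \\ \mathrm{Im}(b_i) \end{pmatrix}.$$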
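
The real block form of the complex residual can be verified directly with complex arithmetic. In the sketch below the row `a`, the vector `x`, and the scalar `b` are hypothetical values chosen only for illustration; the code compares the real and imaginary parts of $a_i'x - b_i$ with the two rows of the block expression, and the modulus with the Euclidean norm of the stacked pair.

```python
import math

# Hypothetical data: one row a_i' of A, a complex vector x, and a scalar b_i.
a = [complex(1, 2), complex(-3, 0.5), complex(0, -1)]
x = [complex(0.5, -1), complex(2, 2), complex(-1, 0.25)]
b = complex(1.5, -0.5)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Complex residual a_i'x - b_i (plain transpose, no conjugation).
residual = dot(a, x) - b

re_a, im_a = [ak.real for ak in a], [ak.imag for ak in a]
re_x, im_x = [xk.real for xk in x], [xk.imag for xk in x]

# First row of the block form:  Re(a')Re(x) - Im(a')Im(x) - Re(b)
top = dot(re_a, re_x) - dot(im_a, im_x) - b.real
# Second row of the block form: Im(a')Re(x) + Re(a')Im(x) - Im(b)
bottom = dot(im_a, re_x) + dot(re_a, im_x) - b.imag

modulus_gap = abs(math.hypot(top, bottom) - abs(residual))
print(top, bottom, residual)
```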
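By introducing new variables $y = \big( \mathrm{Re}(x'), \mathrm{Im}(x') \big)'$, problem (1) can be rewritten as

$$\begin{aligned} &\text{minimize} && \sum_{i=1}^p \|F_i y + g_i\| \\ &\text{subject to} && y \in \Re^{2n}, \end{aligned}$$

where

$$F_i = \begin{pmatrix} \mathrm{Re}(a_i') & -\mathrm{Im}(a_i') \\ \mathrm{Im}(a_i') & \mathrm{Re}(a_i') \end{pmatrix}, \qquad g_i = -\begin{pmatrix} \mathrm{Re}(b_i) \\ \mathrm{Im}(b_i) \end{pmatrix}. \tag{2}$$

According to Exercise 7.12, the dual problem is given by

$$\begin{aligned} &\text{maximize} && \sum_{i=1}^p \big( \mathrm{Re}(b_i), \mathrm{Im}(b_i) \big) z_i \\ &\text{subject to} && \sum_{i=1}^p \begin{pmatrix} \mathrm{Re}(a_i) & \mathrm{Im}(a_i) \\ -\mathrm{Im}(a_i) & \mathrm{Re}(a_i) \end{pmatrix} z_i = 0, \quad \|z_i\| \le 1, \quad i = 1, \ldots, p, \end{aligned}$$

where $z_i \in \Re^2$ for all $i$ (each $F_i$ has two rows, so $F_i'z_i \in \Re^{2n}$).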


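For $v \in C^p$ we have

$$\|v\|_\infty = \max_{1 \le i \le p} |v_i| = \max_{1 \le i \le p} \left\| \begin{pmatrix} \mathrm{Re}(v_i) \\ \mathrm{Im}(v_i) \end{pmatrix} \right\|.$$

Therefore the complex $l_\infty$ approximation problem is equivalent to

$$\begin{aligned} &\text{minimize} && \max_{1 \le i \le p} \left\| \begin{pmatrix} \mathrm{Re}(a_i'x - b_i) \\ \mathrm{Im}(a_i'x - b_i) \end{pmatrix} \right\| \\ &\text{subject to} && x \in C^n. \end{aligned}$$

By introducing new variables $y = \big( \mathrm{Re}(x'), \mathrm{Im}(x') \big)'$, this problem can be rewritten as

$$\begin{aligned} &\text{minimize} && \max_{1 \le i \le p} \|F_i y + g_i\| \\ &\text{subject to} && y \in \Re^{2n}, \end{aligned}$$

where $F_i$ and $g_i$ are given by Eq. (2). From Exercise 7.12, it follows that the dual problem is

$$\begin{aligned} &\text{maximize} && \sum_{i=1}^p \big( \mathrm{Re}(b_i), \mathrm{Im}(b_i) \big) z_i \\ &\text{subject to} && \sum_{i=1}^p \begin{pmatrix} \mathrm{Re}(a_i) & \mathrm{Im}(a_i) \\ -\mathrm{Im}(a_i) & \mathrm{Re}(a_i) \end{pmatrix} z_i = 0, \quad \sum_{i=1}^p w_i = 1, \quad \|z_i\| \le w_i, \quad i = 1, \ldots, p, \end{aligned}$$

where $z_i \in \Re^2$ for all $i$.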

7.14

The condition $u'\mu^* \le P(u)$ for all $u \in \Re^r$ can be written as

$$\sum_{j=1}^r u_j \mu_j^* \le \sum_{j=1}^r P_j(u_j), \quad \forall\ u = (u_1, \ldots, u_r),$$

and is equivalent to

$$u_j \mu_j^* \le P_j(u_j), \quad \forall\ u_j \in \Re, \ \forall\ j = 1, \ldots, r.$$

In view of the requirement that $P_j$ is convex with $P_j(u_j) = 0$ for $u_j \le 0$, and $P_j(u_j) > 0$ for all $u_j > 0$, it follows that the condition $u_j \mu_j^* \le P_j(u_j)$ for all $u_j \in \Re$ is equivalent to $\mu_j^* \le \lim_{z_j \downarrow 0} \big( P_j(z_j)/z_j \big)$. Similarly, the condition $u_j \mu_j^* < P_j(u_j)$ for all $u_j \in \Re$ is equivalent to $\mu_j^* < \lim_{z_j \downarrow 0} \big( P_j(z_j)/z_j \big)$.

7.15 [Ber99b]

Following [Ber99b], we address the problem by embedding it in a broader class of problems. Let $Y$ be a subset of $\Re^n$, let $y$ be a parameter vector taking values in $Y$, and consider the parametric program

$$\begin{aligned} &\text{minimize} && f(x, y) \\ &\text{subject to} && x \in X, \quad g_j(x, y) \le 0, \quad j = 1, \ldots, r, \end{aligned} \tag{1}$$

where $X$ is a convex subset of $\Re^n$, and for each $y \in Y$, $f(\cdot, y)$ and $g_j(\cdot, y)$ are real-valued functions that are convex over $X$. We assume that for each $y \in Y$, this program has a finite optimal value, denoted by $f^*(y)$. Let $c > 0$ denote a penalty parameter and assume that the penalized problem

$$\begin{aligned} &\text{minimize} && f(x, y) + c \big\| g^+(x, y) \big\| \\ &\text{subject to} && x \in X \end{aligned} \tag{2}$$

has a finite optimal value, thereby coming under the framework of Section 7.3. By Prop. 7.3.1, we have

$$f^*(y) = \inf_{x \in X} \Big\{ f(x, y) + c \big\| g^+(x, y) \big\| \Big\}, \quad \forall\ y \in Y, \tag{3}$$

if and only if

$$u'\mu^*(y) \le c \|u^+\|, \quad \forall\ u \in \Re^r, \ \forall\ y \in Y,$$

for some geometric multiplier $\mu^*(y)$.


It is seen that Eq. (3) is equivalent to the bound

$$f^*(y) \le f(x, y) + c \big\| g^+(x, y) \big\|, \quad \forall\ x \in X, \ \forall\ y \in Y, \tag{4}$$

so this bound holds if and only if there exists a uniform bounding constant $c > 0$ such that

$$u'\mu^*(y) \le c \|u^+\|, \quad \forall\ u \in \Re^r, \ \forall\ y \in Y. \tag{5}$$

Thus the bound (4) holds if and only if for every $y \in Y$, it is possible to select a geometric multiplier $\mu^*(y)$ of the parametric problem (1) such that the set $\{ \mu^*(y) \mid y \in Y \}$ is bounded.
Let us now specialize the preceding discussion to the parametric program

$$\begin{aligned} &\text{minimize} && f(x, y) = \|y - x\| \\ &\text{subject to} && x \in X, \quad g_j(x) \le 0, \quad j = 1, \ldots, r, \end{aligned} \tag{6}$$

where $\| \cdot \|$ is the Euclidean norm, $X$ is a convex subset of $\Re^n$, and $g_j$ are convex over $X$. This is the projection problem of the exercise. Let us take $Y = X$. If $c$ satisfies Eq. (5), the bound (4) becomes

$$d(y) \le \|y - x\| + c \big\| \big( g(x) \big)^+ \big\|, \quad \forall\ x \in X, \ \forall\ y \in X,$$

and (by taking $x = y$) implies the bound

$$d(y) \le c \big\| \big( g(y) \big)^+ \big\|, \quad \forall\ y \in X. \tag{7}$$

This bound holds if a geometric multiplier $\mu^*(y)$ of the projection problem (6) can be found such that Eq. (5) holds. We will now show the reverse assertion.
Indeed, assume that for some $c$, Eq. (7) holds, and to arrive at a contradiction, assume that there exist $x \in X$ and $y \in Y$ such that

$$d(y) > \|y - x\| + c \big\| \big( g(x) \big)^+ \big\|.$$

Then, using Eq. (7), we obtain

$$d(y) > \|y - x\| + d(x).$$

From this relation and the triangle inequality, it follows that

$$\begin{aligned} \inf_{z \in X,\ g(z) \le 0} \|y - z\| &> \|y - x\| + \inf_{z \in X,\ g(z) \le 0} \|x - z\| \\ &= \inf_{z \in X,\ g(z) \le 0} \big\{ \|y - x\| + \|x - z\| \big\} \\ &\ge \inf_{z \in X,\ g(z) \le 0} \|y - z\|, \end{aligned}$$

which is a contradiction. Thus Eq. (7) implies that we have

$$d(y) \le \|y - x\| + c \big\| \big( g(x) \big)^+ \big\|, \quad \forall\ x \in X, \ \forall\ y \in X.$$

Using Prop. 7.3.1, this implies that there exists a geometric multiplier $\mu^*(y)$ such that

$$u'\mu^*(y) \le c \|u^+\|, \quad \forall\ u \in \Re^r, \ \forall\ y \in X.$$

This in turn implies the boundedness of the set $\{ \mu^*(y) \mid y \in X \}$.

Convex Analysis and
Optimization
Chapter 8 Solutions

LAST UPDATE April 15, 2003

CHAPTER 8: SOLUTION MANUAL

8.1

To obtain a contradiction, assume that $q$ is differentiable at some dual optimal solution $\mu^* \in M$, where $M = \{ \mu \in \Re^r \mid \mu \ge 0 \}$. Then by the optimality theory of Section 4.7 (cf. Prop. 4.7.2, concave function $q$), we have

$$\nabla q(\mu^*)'(\mu^* - \mu) \ge 0, \quad \forall\ \mu \ge 0.$$

If $\mu_j^* = 0$, then by letting $\mu = \mu^* + \gamma e_j$ for a scalar $\gamma \ge 0$, and the vector $e_j$ whose $j$th component is 1 and the other components are 0, from the preceding relation we obtain $\partial q(\mu^*)/\partial \mu_j \le 0$. Similarly, if $\mu_j^* > 0$, then by letting $\mu = \mu^* + \gamma e_j$ for a sufficiently small scalar $\gamma$ of either sign (small enough so that $\mu^* + \gamma e_j \in M$), from the preceding relation we obtain $\partial q(\mu^*)/\partial \mu_j = 0$. Hence

$$\partial q(\mu^*)/\partial \mu_j \le 0, \quad \forall\ j = 1, \ldots, r,$$

$$\mu_j^* \, \partial q(\mu^*)/\partial \mu_j = 0, \quad \forall\ j = 1, \ldots, r.$$

Since $q$ is differentiable at $\mu^*$, we have that

$$\nabla q(\mu^*) = g(x^*),$$

for some vector x∗ ∈ X such that q(µ∗ ) = L(x∗ , µ∗ ). This and the preceding
two relations imply that x∗ and µ∗ satisfy the necessary and sufficient optimality
conditions for an optimal solution-geometric multiplier pair (cf. Prop. 6.2.5). It
follows that there is no duality gap, a contradiction.

8.2 (Sharpness of the Error Tolerance Estimate)

Consider the incremental subgradient method with the stepsize α and the starting
point x = (αM C0 , αM C0 ), and the following component processing order:
M components of the form |x1 | [endpoint is (0, αM C0 )],
M components of the form |x1 + 1| [endpoint is (−αM C0 , αM C0 )],
M components of the form |x2 | [endpoint is (−αM C0 , 0)],
M components of the form |x2 + 1| [endpoint is (−αM C0 , −αM C0 )],
M components of the form |x1 | [endpoint is (0, −αM C0 )],
M components of the form |x1 − 1| [endpoint is (αM C0 , −αM C0 )],

M components of the form |x2 | [endpoint is (αM C0 , 0)], and
M components of the form |x2 − 1| [endpoint is (αM C0 , αM C0 )].
With this processing order, the method returns to x at the end of a cycle.
Furthermore, the smallest function value within the cycle is attained at points $(\pm \alpha M C_0, 0)$ and $(0, \pm \alpha M C_0)$, and is equal to $4 M C_0 + 2 \alpha M^2 C_0^2$. The optimal function value is $f^* = 4 M C_0$, so that

$$\liminf_{k \to \infty} f(\psi_{i,k}) \ge f^* + 2 \alpha M^2 C_0^2.$$

Since $m = 8M$ and $m C_0 = C$, we have $M^2 C_0^2 = C^2/64$, implying that

$$2 \alpha M^2 C_0^2 = \frac{1}{16} \, \frac{\alpha C^2}{2},$$

and therefore

$$\liminf_{k \to \infty} f(\psi_{i,k}) \ge f^* + \frac{\beta \alpha C^2}{2},$$

with $\beta = 1/16$.
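
The cycle described above can be simulated exactly. The sketch below is an illustration under stated assumptions: each component is taken to be scaled by $C_0$ (for example $C_0 |x_1 + 1|$), which is consistent with the stated values $f^* = 4MC_0$ and the endpoints $\pm \alpha M C_0$ but is not spelled out in this solution, and the numerical values of $\alpha$, $M$, $C_0$ are hypothetical with $\alpha M C_0 < 1$. Rational arithmetic is used so that the return to the starting point can be tested for exact equality.

```python
from fractions import Fraction as Fr

# Hypothetical parameters with alpha * M * C0 < 1.
alpha, M, C0 = Fr(1, 100), 5, Fr(2)
radius = alpha * M * C0

# Processing order: (coordinate index, shift s) for the component C0*|x[coord] + s|,
# each repeated M times, in the order listed in the solution.
order = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, -1), (1, 0), (1, -1)]

def sign(v):
    return (v > 0) - (v < 0)

def f(x):
    # Sum of all 8M components.
    return sum(M * C0 * abs(x[coord] + shift) for coord, shift in order)

x = [radius, radius]
start = list(x)
visited = []
for coord, shift in order:
    for _ in range(M):
        # Incremental subgradient step on one component C0*|x[coord] + shift|.
        x[coord] -= alpha * C0 * sign(x[coord] + shift)
        visited.append(tuple(x))

f_star = f([Fr(0), Fr(0)])
min_value = min(f(point) for point in visited)

print(x == start, f_star, min_value)
```

With these values the cycle closes, and the smallest cost over the iterates exceeds the optimal cost by exactly $2 \alpha M^2 C_0^2$, matching the estimate above.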

8.3 (A Variation of the Subgradient Method [CFM75])

At first, by induction, we show that

$$(\mu^* - \mu^k)'d^k \ge (\mu^* - \mu^k)'g^k. \tag{8.0}$$

Since $d^0 = g^0$, the preceding relation obviously holds for $k = 0$. Assume now that this relation holds for $k - 1$. By using the definition of $d^k$,

$$d^k = g^k + \beta^k d^{k-1},$$

we obtain

$$(\mu^* - \mu^k)'d^k = (\mu^* - \mu^k)'g^k + \beta^k (\mu^* - \mu^k)'d^{k-1}. \tag{8.1}$$

We further have

$$\begin{aligned} (\mu^* - \mu^k)'d^{k-1} &= (\mu^* - \mu^{k-1})'d^{k-1} + (\mu^{k-1} - \mu^k)'d^{k-1} \\ &\ge (\mu^* - \mu^{k-1})'d^{k-1} - \|\mu^{k-1} - \mu^k\| \, \|d^{k-1}\|. \end{aligned}$$

By the induction hypothesis, we have that

$$(\mu^* - \mu^{k-1})'d^{k-1} \ge (\mu^* - \mu^{k-1})'g^{k-1},$$

while by the subgradient inequality, we have that

$$(\mu^* - \mu^{k-1})'g^{k-1} \ge q(\mu^*) - q(\mu^{k-1}).$$

Combining the preceding three relations, we obtain

$$(\mu^* - \mu^k)'d^{k-1} \ge q(\mu^*) - q(\mu^{k-1}) - \|\mu^{k-1} - \mu^k\| \, \|d^{k-1}\|.$$
Since

$$\|\mu^{k-1} - \mu^k\| \le s^{k-1} \|d^{k-1}\|,$$

it follows that

$$(\mu^* - \mu^k)'d^{k-1} \ge q(\mu^*) - q(\mu^{k-1}) - s^{k-1} \|d^{k-1}\|^2.$$

Finally, because $0 < s^{k-1} \le \big( q(\mu^*) - q(\mu^{k-1}) \big) / \|d^{k-1}\|^2$, we see that

$$(\mu^* - \mu^k)'d^{k-1} \ge 0. \tag{8.2}$$

Since $\beta^k \ge 0$, the preceding relation and equation (8.1) imply that

$$(\mu^* - \mu^k)'d^k \ge (\mu^* - \mu^k)'g^k.$$

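Assuming $\mu^k \ne \mu^*$, we next show that

$$\|\mu^* - \mu^{k+1}\| < \|\mu^* - \mu^k\|, \quad \forall\ k.$$

Similar to the proof of Prop. 8.2.1, it can be seen that this relation holds for $k = 0$. For $k > 0$, by using the nonexpansive property of the projection operation, we obtain

$$\begin{aligned} \|\mu^* - \mu^{k+1}\|^2 &\le \|\mu^* - \mu^k - s^k d^k\|^2 \\ &= \|\mu^* - \mu^k\|^2 - 2 s^k (\mu^* - \mu^k)'d^k + (s^k)^2 \|d^k\|^2. \end{aligned}$$

By using Eq. (8.0) and the subgradient inequality,

$$(\mu^* - \mu^k)'g^k \ge q(\mu^*) - q(\mu^k),$$

we further obtain

$$\begin{aligned} \|\mu^* - \mu^{k+1}\|^2 &\le \|\mu^k - \mu^*\|^2 - 2 s^k (\mu^* - \mu^k)'g^k + (s^k)^2 \|d^k\|^2 \\ &\le \|\mu^k - \mu^*\|^2 - 2 s^k \big( q(\mu^*) - q(\mu^k) \big) + (s^k)^2 \|d^k\|^2. \end{aligned}$$

Since $0 < s^k \le \big( q(\mu^*) - q(\mu^k) \big) / \|d^k\|^2$, it follows that

$$-2 s^k \big( q(\mu^*) - q(\mu^k) \big) + (s^k)^2 \|d^k\|^2 \le -s^k \big( q(\mu^*) - q(\mu^k) \big) < 0,$$

implying that

$$\|\mu^* - \mu^{k+1}\|^2 < \|\mu^k - \mu^*\|^2.$$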
We next prove that

$$\frac{(\mu^* - \mu^k)'d^k}{\|d^k\|} \ge \frac{(\mu^* - \mu^k)'g^k}{\|g^k\|}.$$

It suffices to show that

$$\|d^k\| \le \|g^k\|,$$

since this inequality and Eq. (8.0) imply the desired relation. If ${g^k}'d^{k-1} \ge 0$, then by the definition of $d^k$ and $\beta^k$, we have that $d^k = g^k$, and we are done, so assume that ${g^k}'d^{k-1} < 0$. We then have

$$\|d^k\|^2 = \|g^k\|^2 + 2 \beta^k {g^k}'d^{k-1} + (\beta^k)^2 \|d^{k-1}\|^2.$$

Since $\beta^k = -\gamma \, {g^k}'d^{k-1} / \|d^{k-1}\|^2$, it follows that

$$2 \beta^k {g^k}'d^{k-1} + (\beta^k)^2 \|d^{k-1}\|^2 = 2 \beta^k {g^k}'d^{k-1} - \gamma \beta^k {g^k}'d^{k-1} = (2 - \gamma) \beta^k {g^k}'d^{k-1}.$$

Furthermore, since ${g^k}'d^{k-1} < 0$, $\beta^k \ge 0$, and $\gamma \in [0, 2]$, we see that

$$2 \beta^k {g^k}'d^{k-1} + (\beta^k)^2 \|d^{k-1}\|^2 \le 0,$$

implying that

$$\|d^k\|^2 \le \|g^k\|^2.$$
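
The norm bound just derived can be spot-checked on random data. The sketch below assumes the direction rule as it is used in this solution, namely $d^k = g^k + \beta^k d^{k-1}$ with $\beta^k = -\gamma \, {g^k}'d^{k-1} / \|d^{k-1}\|^2$ when ${g^k}'d^{k-1} < 0$ and $\beta^k = 0$ otherwise, with $\gamma \in [0, 2]$. The function name `modified_direction` and the random vectors are hypothetical, and the check is not a proof.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def modified_direction(g, d_prev, gamma):
    """Return (d, beta) with d = g + beta * d_prev, beta as assumed in the lead-in."""
    inner = dot(g, d_prev)
    beta = -gamma * inner / dot(d_prev, d_prev) if inner < 0 else 0.0
    return [gi + beta * di for gi, di in zip(g, d_prev)], beta

rng = random.Random(1)
worst_excess = -math.inf   # largest observed value of ||d|| - ||g||
min_beta = math.inf
for _ in range(2000):
    gamma = rng.uniform(0.0, 2.0)
    g = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    d_prev = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    d, beta = modified_direction(g, d_prev, gamma)
    worst_excess = max(worst_excess, norm(d) - norm(g))
    min_beta = min(min_beta, beta)

print(worst_excess, min_beta)
```

Up to rounding, the modified direction is never longer than the subgradient, and $\beta^k$ is never negative, which are the two facts the argument relies on.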

8.4 (Subgradient Randomization for Stochastic Programming)

The stochastic programming problem of Example 8.2.2 can be written in the following form

$$\begin{aligned} &\text{minimize} && \sum_{i=1}^m \pi_i \big( f_0(x) + f_i(x) \big) \\ &\text{subject to} && x \in X, \end{aligned}$$

where

$$f_i(x) = \max_{B_i'\lambda_i \le d_i} (b_i - Ax)'\lambda_i, \quad i = 1, \ldots, m,$$

and the outcome $i$ occurs with probability $\pi_i$. Assume that for each outcome $i \in \{1, \ldots, m\}$ and each vector $x \in \Re^n$, the maximum in the expression for $f_i(x)$ is attained at some $\lambda_i(x)$. Then, the vector $A'\lambda_i(x)$ is a subgradient of $f_i$ at $x$. One possible form of the randomized incremental subgradient method is

$$x_{k+1} = P_X \big( x_k - \alpha_k (g_k + A'\lambda_{\omega_k}^k) \big),$$

where $g_k$ is a subgradient of $f_0$ at $x_k$, $\lambda_{\omega_k}^k = \lambda_{\omega_k}(x_k)$, and the random variable $\omega_k$ takes value $i$ from the set $\{1, \ldots, m\}$ with probability $\pi_i$. The convergence analysis of Section 8.2.2 goes through in its entirety for this method, with only some adjustments in various bounding constants.
In an alternative method, we could use as components the $m + 1$ functions $f_0, f_1, \ldots, f_m$, with $f_0$ chosen with probability $1/2$ and each component $f_i$, $i = 1, \ldots, m$, chosen with probability $\pi_i/2$.

8.5

Consider the cutting plane method applied to the following one-dimensional problem

$$\begin{aligned} &\text{maximize} && q(\mu) = -\mu^2 \\ &\text{subject to} && \mu \in [0, 1]. \end{aligned}$$

Suppose that the method is started at $\mu^0 = 0$, so that the initial polyhedral approximation is $Q^1(\mu) = 0$ for all $\mu$. Suppose also that in all subsequent iterations, when maximizing $Q^k(\mu)$, $k = 1, 2, \ldots$, over $[0, 1]$, we choose $\mu^k$ to be the largest of all maximizers of $Q^k(\mu)$ over $[0, 1]$. We will show by induction that in this case, we have $\mu^k = 1/2^{k-1}$ for $k = 1, 2, \ldots$.
Since $Q^1(\mu) = 0$ for all $\mu$, the set of maximizers of $Q^1(\mu)$ over $[0, 1]$ is the entire interval $[0, 1]$, so that the largest maximizer is $\mu^1 = 1$. Suppose now that $\mu^i = 1/2^{i-1}$ for $i = 1, \ldots, k$. Then

$$Q^{k+1}(\mu) = \min \{ l_0(\mu), l_1(\mu), \ldots, l_k(\mu) \},$$

where $l_0(\mu) = 0$ and

$$l_i(\mu) = q(\mu^i) + \nabla q(\mu^i)(\mu - \mu^i) = -2 \mu^i \mu + (\mu^i)^2, \quad i = 1, \ldots, k.$$

The maximum value of $Q^{k+1}(\mu)$ over $[0, 1]$ is 0 and it is attained at any point in the interval $[0, \mu^k/2]$. By the induction hypothesis, we have $\mu^k = 1/2^{k-1}$, implying that the largest maximizer of $Q^{k+1}(\mu)$ over $[0, 1]$ is $\mu^{k+1} = 1/2^k$.
Hence, in this case, the cutting plane method generates an infinite sequence
{µk } converging to the optimal solution µ∗ = 0, thus showing that the method
need not terminate finitely even if it starts at an optimal solution.
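
The induction above can be reproduced by running the method. The sketch below is a minimal one-dimensional cutting plane loop under the tie-breaking rule of this solution (always take the largest maximizer). The helper `largest_maximizer` is a hypothetical brute-force routine: it uses the fact that the maximizers of a concave piecewise linear function over an interval form an interval whose endpoints are breakpoints or interval endpoints, so it only evaluates those candidates.

```python
from fractions import Fraction as Fr

def q(mu):
    return -mu * mu

def grad_q(mu):
    return -2 * mu

def Q(cuts, mu):
    # Polyhedral approximation: minimum of the linear cuts slope*mu + intercept.
    return min(slope * mu + intercept for slope, intercept in cuts)

def largest_maximizer(cuts, lo, hi):
    """Largest maximizer of Q over [lo, hi], searched among endpoints and breakpoints."""
    candidates = {lo, hi}
    for i, (s1, c1) in enumerate(cuts):
        for s2, c2 in cuts[i + 1:]:
            if s1 != s2:
                mu = (c2 - c1) / (s1 - s2)   # intersection of the two cuts
                if lo <= mu <= hi:
                    candidates.add(mu)
    best = max(Q(cuts, mu) for mu in candidates)
    return max(mu for mu in candidates if Q(cuts, mu) == best)

def cut_at(mu):
    # Linearization of q at mu, stored as (slope, intercept).
    return (grad_q(mu), q(mu) - grad_q(mu) * mu)

cuts = [cut_at(Fr(0))]        # the method starts at mu^0 = 0
iterates = []                 # will hold mu^1, mu^2, ...
for _ in range(8):
    mu = largest_maximizer(cuts, Fr(0), Fr(1))
    iterates.append(mu)
    cuts.append(cut_at(mu))

print(iterates)
```

The iterates halve at every step and never reach the optimal solution, even though the starting point is already optimal; with a different tie-breaking rule (for example the smallest maximizer) the loop would stay at 0.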

8.6 (Approximate Subgradient Method)

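(a) We have for all $\mu \in \Re^r$

$$\begin{aligned} q(\mu) &= \inf_{x \in X} \big\{ f(x) + \mu'g(x) \big\} \\ &\le f(x^k) + \mu'g(x^k) \\ &= f(x^k) + {\mu^k}'g(x^k) + g(x^k)'(\mu - \mu^k) \\ &\le q(\mu^k) + \epsilon + g(x^k)'(\mu - \mu^k), \end{aligned}$$

where the last inequality follows from the relation

$$L(x^k, \mu^k) \le \inf_{x \in X} L(x, \mu^k) + \epsilon.$$

Thus $g(x^k)$ is an $\epsilon$-subgradient of $q$ at $\mu^k$.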


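(b) For all $\mu \in M$, by using the nonexpansive property of the projection, we have

$$\begin{aligned} \|\mu^{k+1} - \mu\|^2 &\le \|\mu^k + s^k g^k - \mu\|^2 \\ &\le \|\mu^k - \mu\|^2 - 2 s^k {g^k}'(\mu - \mu^k) + (s^k)^2 \|g^k\|^2, \end{aligned}$$

where

$$s^k = \frac{q(\mu^*) - q(\mu^k)}{\|g^k\|^2},$$

and $g^k \in \partial_\epsilon q(\mu^k)$. From this relation and the definition of an $\epsilon$-subgradient we obtain

$$\|\mu^{k+1} - \mu\|^2 \le \|\mu^k - \mu\|^2 - 2 s^k \big( q(\mu) - q(\mu^k) - \epsilon \big) + (s^k)^2 \|g^k\|^2, \quad \forall\ \mu \in M.$$

Let $\mu^*$ be an optimal solution. Substituting the expression for $s^k$ and taking $\mu = \mu^*$ in the above inequality, we have

$$\|\mu^{k+1} - \mu^*\|^2 \le \|\mu^k - \mu^*\|^2 - \frac{q(\mu^*) - q(\mu^k)}{\|g^k\|^2} \big( q(\mu^*) - q(\mu^k) - 2\epsilon \big).$$

Thus, if $q(\mu^k) < q(\mu^*) - 2\epsilon$, we obtain

$$\|\mu^{k+1} - \mu^*\| \le \|\mu^k - \mu^*\|.$$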
