Convex Analysis and Optimization: Solutions (Dimitri Bertsekas)

Errata
p. 3 (+22) Change “as the union of the closures of all line segments” to
“as the closure of the union of all line segments”
p. 37 (-2) Change “Every x” to “Every x ≠ 0”
p. 38 (+1) Change “Every x in” to “Every x ∉ X that belongs to”
p. 38 (+19) Change “i.e.,” to “with x1, . . . , xm ∈ ℝⁿ and m ≥ 2, i.e.,”
p. 63 (+4, +6, +7, +19) Change four times “c y” to “a y”
p. 67 (+3) Change “y ∈ AC” to “y ∈ AC”
p. 70 (+9) Change “[BeN02]” to “[NeB02]”
p. 110 (+3 after the figure caption) Change “... does not belong to
the interior of C” to “... does not belong to the interior of C and hence
does not belong to the interior of cl(C) [cf. Prop. 1.4.3(b)]”
p. 148 (-8) Change “{x | r(x) ≤ γ}” to “{z | r(z) ≤ γ}”
p. 213 (-6) Change “remaining vectors vj , j ≠ i.” to “vectors vj with vj ≠ vi.”
p. 219 (+3) Change “fi : C → ℝ” to “fi : ℝⁿ → ℝ”
p. 265 (+10) Change “d/d” to “−d/d”
p. 268 (-3) Change “j ∈ A(x∗ )” to “j ∉ A(x∗ )”
p. 338 (+17) Change “Section 5.2” to “Section 5.3”
p. 384 (+6) Change “convex, possibly nonsmooth functions” to “smooth
functions, and convex (possibly nonsmooth) functions”
p. 446 (+6 and +8) Interchange “... constrained problem (7.16)” and
“... penalized problem (7.19)”
p. 458 (+13) Change “... as well real-valued” to “... as well as real-
valued”
p. 458 (-10) Change “We will focus on this ... dual functions.” to “In
this case, the dual problem can be solved using gradient-like algorithms for
differentiable optimization (see e.g., Bertsekas [Ber99a]).”
Convex Analysis and Optimization
Chapter 1 Solutions
Dimitri P. Bertsekas
with
1.1
(a) Let x ∈ ∩i∈I Ci and let α be a positive scalar. Since x ∈ Ci for all i ∈ I and
each Ci is a cone, the vector αx belongs to Ci for all i ∈ I. Hence, αx ∈ ∩i∈I Ci ,
showing that ∩i∈I Ci is a cone.
(b) Let x ∈ C1 × C2 and let α be a positive scalar. Then x = (x1 , x2 ) for some
x1 ∈ C1 and x2 ∈ C2 , and since C1 and C2 are cones, it follows that αx1 ∈ C1
and αx2 ∈ C2 . Hence, αx = (αx1 , αx2 ) ∈ C1 × C2 , showing that C1 × C2 is a
cone.
(c) Let x ∈ C1 + C2 and let α be a positive scalar. Then, x = x1 + x2 for some
x1 ∈ C1 and x2 ∈ C2 , and since C1 and C2 are cones, αx1 ∈ C1 and αx2 ∈ C2 .
Hence, αx = αx1 + αx2 ∈ C1 + C2 , showing that C1 + C2 is a cone.
(d) Let x ∈ cl(C) and let α be a positive scalar. Then, there exists a sequence
{xk } ⊂ C such that xk → x, and since C is a cone, αxk ∈ C for all k. Further-
more, αxk → αx, implying that αx ∈ cl(C). Hence, cl(C) is a cone.
(e) First we prove that A·C is a cone, where A is a linear transformation and A·C
is the image of C under A. Let z ∈ A · C and let α be a positive scalar. Then,
Ax = z for some x ∈ C, and since C is a cone, αx ∈ C. Because A(αx) = αz,
the vector αz is in A · C, showing that A · C is a cone.
Next we prove that the inverse image A⁻¹·C of C under A is a cone. Let x ∈ A⁻¹·C and let α be a positive scalar. Then Ax ∈ C, and since C is a cone, αAx ∈ C. Thus, the vector A(αx) is in C, implying that αx ∈ A⁻¹·C, and showing that A⁻¹·C is a cone.
1.3 (Lower Semicontinuity under Composition)
which together with the fact {xk}K → x and the lower semicontinuity of f yields
showing that {f(xk)}K → f(x). By our choice of the sequence {xk}K and by lower semicontinuity of g, it follows that

lim_{k→∞, k∈K} g(f(xk)) = lim inf_{k→∞, k∈K} g(f(xk)) ≥ g(f(x)),
1.4 (Convexity under Composition)
(a) It can be seen that f1 is twice continuously differentiable over X and its
Hessian matrix is given by
∇²f1(x) = (f1(x)/n²) M(x),

where M(x) is the symmetric n × n matrix with entries

M(x)ii = (1 − n)/xi²,   M(x)ij = 1/(xi xj) for i ≠ j,
for all x = (x1, . . . , xn) ∈ X. From this, direct computation shows that for all z = (z1, . . . , zn) ∈ ℝⁿ and x = (x1, . . . , xn) ∈ X, we have

z′∇²f1(x)z = (f1(x)/n²) [ (Σ_{i=1}^n zi/xi)² − n Σ_{i=1}^n (zi/xi)² ].
Note that this quadratic form is nonnegative for all z ∈ ℝⁿ and x ∈ X, since f1(x) < 0, and for any real numbers α1, . . . , αn, we have

(α1 + · · · + αn)² ≤ n(α1² + · · · + αn²),

in view of the fact that 2αjαk ≤ αj² + αk². Hence, ∇²f1(x) is positive semidefinite
for all x ∈ X, and it follows from Prop. 1.2.6(a) that f1 is convex.
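As a quick numerical sanity check of this positive semidefiniteness claim (the check is not part of the original solution, and it assumes that f1 is the negative geometric mean f1(x) = −(x1 · · · xn)^{1/n} over X = {x | xi > 0}, which is the function the Hessian formula above corresponds to):

import numpy as np

def f1(x):
    # assumed form of f1: negative geometric mean
    return -np.prod(x) ** (1.0 / len(x))

def hessian_f1(x):
    # the matrix (f1(x)/n^2) M(x) written above
    n = len(x)
    M = np.array([[(1 - n) / x[i] ** 2 if i == j else 1.0 / (x[i] * x[j])
                   for j in range(n)] for i in range(n)])
    return f1(x) / n ** 2 * M

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.uniform(0.1, 5.0, size=4)                        # a point of X
    assert np.linalg.eigvalsh(hessian_f1(x)).min() > -1e-10  # PSD up to roundoff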
(b) We show that the Hessian of f2 is positive semidefinite at all x ∈ ℝⁿ. Let β(x) = e^{x1} + · · · + e^{xn}. Then a straightforward calculation yields

z′∇²f2(x)z = (1/(2β(x)²)) Σ_{i=1}^n Σ_{j=1}^n e^{xi+xj} (zi − zj)² ≥ 0,   ∀ z ∈ ℝⁿ.
is convex over ℝⁿ.
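The quadratic-form identity used above (with the factor 1/(2β(x)²) as written here) can be spot-checked numerically; this sketch is not part of the original solution:

import numpy as np

rng = np.random.default_rng(1)
x, z = rng.normal(size=5), rng.normal(size=5)
e = np.exp(x)
beta = e.sum()
H = np.diag(e) / beta - np.outer(e, e) / beta ** 2   # Hessian of log-sum-exp
lhs = z @ H @ z
rhs = sum(np.exp(x[i] + x[j]) * (z[i] - z[j]) ** 2
          for i in range(5) for j in range(5)) / (2 * beta ** 2)
assert abs(lhs - rhs) < 1e-12 and lhs >= -1e-12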
(d) The function f5(x) = αf(x) + β can be viewed as a composition g(f(x)) of the function g(t) = αt + β, where t ∈ ℝ, and the function f(x) for x ∈ ℝⁿ. In this case, g is convex and monotonically increasing over ℝ (since α ≥ 0), while f is convex over ℝⁿ. Using Exercise 1.4, it follows that the function f5(x) = αf(x) + β is convex over ℝⁿ.
(e) The function f6(x) = e^{βx′Ax} can be viewed as a composition g(f(x)) of the function g(t) = e^{βt} for t ∈ ℝ and the function f(x) = x′Ax for x ∈ ℝⁿ. In this case, g is convex and monotonically increasing over ℝ, while f is convex over ℝⁿ (since A is positive semidefinite). Using Exercise 1.4, it follows that the function f6(x) = e^{βx′Ax} is convex over ℝⁿ.
(f) This part is straightforward using the definition of a convex function.
(a) Let x1, x2, x3 be three scalars such that x1 < x2 < x3. Then we can write x2 as a convex combination of x1 and x3 as follows:

x2 = ((x3 − x2)/(x3 − x1)) x1 + ((x2 − x1)/(x3 − x1)) x3,
so that by convexity of f, we obtain

f(x2) ≤ ((x3 − x2)/(x3 − x1)) f(x1) + ((x2 − x1)/(x3 − x1)) f(x3).

imply that

((x3 − x2)/(x3 − x1)) (f(x2) − f(x1)) ≤ ((x2 − x1)/(x3 − x1)) (f(x3) − f(x2)).
(b) Let {xk} be an increasing scalar sequence, i.e., x1 < x2 < x3 < · · · . Then according to part (a), we have for all k

(f(xk+1) − f(xk))/(xk+1 − xk) ≥ (f(xk) − f(xk−1))/(xk − xk−1),   (1.2)

so that the slopes (f(xk) − f(xk−1))/(xk − xk−1) form a nondecreasing sequence; denoting its limit by γ, we have

(f(xk) − f(xk−1))/(xk − xk−1) → γ,   (1.3)

(f(xk+1) − f(xk))/(xk+1 − xk) ≤ γ,   ∀ k.   (1.4)
lim_{j→∞} (f(yj+1) − f(yj))/(yj+1 − yj) ≤ γ.

Similarly, by exchanging the roles of {xk} and {yj}, we can show that

lim_{j→∞} (f(yj+1) − f(yj))/(yj+1 − yj) ≥ γ.
Thus the limit in Eq. (1.3) is independent of the choice for {xk }, and Eqs. (1.2)
and (1.4) hold for any increasing scalar sequence {xk }.
We consider separately each of the three possibilities γ < 0, γ = 0, and
γ > 0. First, suppose that γ < 0, and let {xk } be any increasing sequence. By
using Eq. (1.4), we obtain

f(xk) = Σ_{j=1}^{k−1} [(f(xj+1) − f(xj))/(xj+1 − xj)] (xj+1 − xj) + f(x1)
      ≤ Σ_{j=1}^{k−1} γ (xj+1 − xj) + f(x1)
      = γ (xk − x1) + f(x1),
and since γ < 0 and xk → ∞, it follows that f (xk ) → −∞. To show that f
decreases monotonically, pick any x and y with x < y, and consider the sequence
x1 = x, x2 = y, and xk = y + k for all k ≥ 3. By using Eq. (1.4) with k = 1, we
have
(f(y) − f(x))/(y − x) ≤ γ < 0,
so that f (y) − f (x) < 0. Hence f decreases monotonically to −∞, corresponding
to case (1).
Suppose now that γ = 0, and let {xk } be any increasing sequence. Then,
by Eq. (1.4), we have f (xk+1 ) − f (xk ) ≤ 0 for all k. If f (xk+1 ) − f (xk ) < 0 for all
k, then f decreases monotonically. To show this, pick any x and y with x < y,
and consider a new sequence given by y1 = x, y2 = y, and yk = xK+k−3 for all
k ≥ 3, where K is large enough so that y < xK . By using Eqs. (1.2) and (1.4)
with {yk }, we have
(f(y) − f(x))/(y − x) ≤ (f(xK+1) − f(xK))/(xK+1 − xK) < 0,
implying that f (y) − f (x) < 0. Hence f decreases monotonically, and it may
decrease to −∞ or to a finite value, corresponding to cases (1) or (2), respectively.
If for some K we have f (xK+1 ) − f (xK ) = 0, then by Eqs. (1.2) and (1.4)
where γ = 0, we obtain f (xk ) = f (xK ) for all k ≥ K. To show that f stays at
the value f (xK ) for all x ≥ xK , choose any x such that x > xK , and define {yk }
as y1 = xK , y2 = x, and yk = xN +k−3 for all k ≥ 3, where N is large enough so
that x < xN . By using Eqs. (1.2) and (1.4) with {yk }, we have
(f(x) − f(xK))/(x − xK) ≤ (f(xN) − f(x))/(xN − x) ≤ 0,
so that f (x) ≤ f (xK ) and f (xN ) ≤ f (x). Since f (xK ) = f (xN ), we have
f (x) = f (xK ). Hence f (x) = f (xK ) for all x ≥ xK , corresponding to case (3).
Finally, suppose that γ > 0, and let {xk} be any increasing sequence. Since (f(xk) − f(xk−1))/(xk − xk−1) is nondecreasing and tends to γ [cf. Eqs. (1.3) and (1.4)], there is a positive integer K and a positive scalar ε with ε < γ such that

(f(xk) − f(xk−1))/(xk − xk−1) ≥ ε,   ∀ k ≥ K.   (1.5)
Therefore, for all k > K,

f(xk) = Σ_{j=K}^{k−1} [(f(xj+1) − f(xj))/(xj+1 − xj)] (xj+1 − xj) + f(xK) ≥ ε (xk − xK) + f(xK),
≥ 0.
Thus, dh/dt is nondecreasing on [0, 1] and for any t ∈ (0, 1), we have
(h(t) − h(0))/t = (1/t) ∫_0^t (dh(τ)/dτ) dτ ≤ dh(t)/dt ≤ (1/(1 − t)) ∫_t^1 (dh(τ)/dτ) dτ = (h(1) − h(t))/(1 − t).
Equivalently,
th(1) + (1 − t)h(0) ≥ h(t),
and from the definition of h, we obtain
tf(y) + (1 − t)f(x) ≥ f(ty + (1 − t)x).
Since this inequality has been proved for arbitrary t ∈ [0, 1] and x, y ∈ C, we
conclude that f is convex.
1.8 (Characterization of Twice Continuously Differentiable
Convex Functions)
Suppose that f : ℝⁿ → ℝ is convex over C. We first show that for all x ∈ ri(C) and y ∈ S, we have y′∇²f(x)y ≥ 0. Assume, to arrive at a contradiction, that there exists some x̄ ∈ ri(C) such that for some y ∈ S, we have

y′∇²f(x̄)y < 0.

f(x̄ + αy) = f(x̄) + α∇f(x̄)′y + (α²/2) y′∇²f(x̄ + ᾱy)y,

for some ᾱ ∈ [0, α]. Furthermore, ‖(x̄ + αy) − x̄‖ ≤ ε [since ‖y‖ = 1 and α < ε].
Hence, from Eq. (1.7), it follows that
On the other hand, by the choice of ε and the assumption that y ∈ S, the vectors x̄ + αy are in C for all α with α ∈ [0, ε), which is a contradiction in view of the convexity of f over C. Hence, we have y′∇²f(x)y ≥ 0 for all y ∈ S and all x ∈ ri(C).
Next, let x be a point in C that is not in the relative interior of C. Then, by
the Line Segment Principle, there is a sequence {xk } ⊂ ri(C) such that xk → x.
As seen above, y ∇2 f (xk )y ≥ 0 for all y ∈ S and all k, which together with the
continuity of ∇2 f implies that
for some α ∈ [0, 1]. Since x, z ∈ C, we have that (z − x) ∈ S, and using the
convexity of C and our assumption, it follows that
1.9 (Strong Convexity)
h(t) = f(x + t(y − x)). Consider scalars t and s such that t < s. Using the chain rule and the equation

(∇f(x) − ∇f(y))′(x − y) ≥ α ‖x − y‖²,   ∀ x, y ∈ ℝⁿ,   (1.8)

≥ α(s − t)² ‖x − y‖²
> 0.
Thus, dh/dt is strictly increasing and for any t ∈ (0, 1), we have
(h(t) − h(0))/t = (1/t) ∫_0^t (dh(τ)/dτ) dτ < (1/(1 − t)) ∫_t^1 (dh(τ)/dτ) dτ = (h(1) − h(t))/(1 − t).
f(x + cy) = f(x) + cy′∇f(x) + (c²/2) y′∇²f(x + tcy)y,

and

f(x) = f(x + cy) − cy′∇f(x + cy) + (c²/2) y′∇²f(x + scy)y,

for some t and s belonging to [0, 1]. Adding these two equations and using Eq. (1.8), we obtain

(c²/2) y′[∇²f(x + scy) + ∇²f(x + tcy)]y = (∇f(x + cy) − ∇f(x))′(cy) ≥ αc² ‖y‖².
We divide both sides by c² and then take the limit as c → 0 to conclude that y′∇²f(x)y ≥ α‖y‖². Since this inequality is valid for every y ∈ ℝⁿ, it follows that ∇²f(x) − αI is positive semidefinite.
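For a concrete illustration of Eq. (1.8) and of this conclusion, one can test a quadratic f(x) = (1/2)x′Qx with Q positive definite, for which ∇²f = Q and the largest admissible α is the smallest eigenvalue of Q. This numerical sketch with made-up data is not part of the original solution:

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Q = A @ A.T + np.eye(4)              # symmetric positive definite Hessian
alpha = np.linalg.eigvalsh(Q).min()  # largest alpha with Q - alpha*I PSD
grad = lambda x: Q @ x               # gradient of f(x) = 0.5 * x' Q x

for _ in range(200):
    x, y = rng.normal(size=4), rng.normal(size=4)
    lhs = (grad(x) - grad(y)) @ (x - y)
    assert lhs >= alpha * np.linalg.norm(x - y) ** 2 - 1e-9   # Eq. (1.8)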
For the converse, assume that ∇²f(x) − αI is positive semidefinite for all x ∈ ℝⁿ. Consider the function g : ℝ → ℝ defined by

g(t) = ∇f(tx + (1 − t)y)′(x − y).
for some t ∈ [0, 1]. On the other hand,
dg(t)/dt = (x − y)′∇²f(tx + (1 − t)y)(x − y) ≥ α ‖x − y‖²,

where the last inequality holds because ∇²f(tx + (1 − t)y) − αI is positive semidefinite. Combining the last two relations, it follows that f is strongly convex with coefficient α.
1.10 (Posynomials)
where gk is a posynomial and γk > 0 for all k. Using a change of variables similar
to part (b), we see that we can represent the function f (x) = ln g(y) as
f(x) = Σ_{k=1}^r γk ln(exp(Ak x + bk)),
with the matrix Ak and the vector bk being associated with the posynomial gk for
each k. Since f (x) is a linear combination of convex functions with nonnegative
coefficients [part (b)], it follows from Prop. 1.2.4(a) that f (x) is convex.
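Reading ln(exp(Ak x + bk)) as the logarithm of the sum of the components of exp(Ak x + bk), as in part (b), the convexity of f can be spot-checked numerically; the matrices Ak, vectors bk, and exponents γk below are made-up illustration data, not taken from the exercise:

import numpy as np

rng = np.random.default_rng(3)
A = [rng.normal(size=(3, 2)) for _ in range(2)]   # hypothetical A_k
b = [rng.normal(size=3) for _ in range(2)]        # hypothetical b_k
gamma = [0.7, 1.3]                                # gamma_k > 0

def f(x):
    # f(x) = sum_k gamma_k * log(sum_i exp((A_k x + b_k)_i))
    return sum(g * np.log(np.exp(Ak @ x + bk).sum())
               for g, Ak, bk in zip(gamma, A, b))

for _ in range(200):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert f(0.5 * (x + y)) <= 0.5 * f(x) + 0.5 * f(y) + 1e-10   # midpoint convexity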
1.11 (Arithmetic-Geometric Mean Inequality)
Consider the function f(x) = − ln(x). Since ∇²f(x) = 1/x² > 0 for all x > 0, the function − ln(x) is strictly convex over (0, ∞). Therefore, for all positive scalars x1, . . . , xn ∈ (0, ∞) and α1, . . . , αn with Σ_{i=1}^n αi = 1, we have

− ln(α1x1 + · · · + αnxn) ≤ −α1 ln(x1) − · · · − αn ln(xn),

which is equivalent to
eln(α1 x1 +···+αn xn ) ≥ eα1 ln(x1 )+···+αn ln(xn ) = eα1 ln(x1 ) · · · eαn ln(xn ) ,
or

α1x1 + · · · + αnxn ≥ x1^{α1} · · · xn^{αn},
as desired. Since − ln(x) is strictly convex, the above inequality is satisfied with
equality if and only if x1 , . . . , xn are all equal.
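A brief numerical check of the weighted arithmetic-geometric mean inequality and of its equality case (not part of the original solution):

import numpy as np

rng = np.random.default_rng(4)
for _ in range(1000):
    x = rng.uniform(0.1, 10.0, size=5)        # positive scalars x_1, ..., x_n
    a = rng.dirichlet(np.ones(5))             # alpha_i > 0 with sum 1
    assert a @ x >= np.prod(x ** a) - 1e-12   # alpha'x >= prod x_i^alpha_i

x = np.full(5, 3.0)                           # equality when all x_i are equal
a = np.full(5, 0.2)
assert abs(a @ x - np.prod(x ** a)) < 1e-12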
where 1/p + 1/q = 1, p > 0, and q > 0. The above relation also holds if u = 0 or
v = 0. By setting u = xp and v = y q , we obtain Young’s inequality
xy ≤ x^p/p + y^q/q,   ∀ x ≥ 0, ∀ y ≥ 0.
To show Hölder's inequality, note that it holds if x1 = · · · = xn = 0 or y1 = · · · = yn = 0. If x1, . . . , xn and y1, . . . , yn are such that (x1, . . . , xn) ≠ 0 and (y1, . . . , yn) ≠ 0, then by using
x = |xi| / (Σ_{j=1}^n |xj|^p)^{1/p}   and   y = |yi| / (Σ_{j=1}^n |yj|^q)^{1/q}
in Young's inequality, we obtain

|xi| |yi| / [ (Σ_{j=1}^n |xj|^p)^{1/p} (Σ_{j=1}^n |yj|^q)^{1/q} ] ≤ |xi|^p / (p Σ_{j=1}^n |xj|^p) + |yi|^q / (q Σ_{j=1}^n |yj|^q).
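Summing this relation over i = 1, . . . , n makes the right-hand side equal to 1/p + 1/q = 1, which gives Hölder's inequality Σ_i |xi yi| ≤ (Σ_j |xj|^p)^{1/p} (Σ_j |yj|^q)^{1/q}. A quick numerical check of this conclusion (not part of the original solution):

import numpy as np

rng = np.random.default_rng(5)
p = 3.0
q = p / (p - 1.0)                      # conjugate exponent: 1/p + 1/q = 1
for _ in range(1000):
    x, y = rng.normal(size=6), rng.normal(size=6)
    lhs = np.abs(x * y).sum()
    rhs = (np.abs(x) ** p).sum() ** (1 / p) * (np.abs(y) ** q).sum() ** (1 / q)
    assert lhs <= rhs + 1e-10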
1.13
Let (x, w) and (y, v) be two vectors in epi(f). Then f(x) ≤ w and f(y) ≤ v, implying that there exist sequences {(x, wk)} ⊂ C and {(y, vk)} ⊂ C such that for all k,

wk ≤ w + 1/k,   vk ≤ v + 1/k.

By the convexity of C, we have for all α ∈ [0, 1] and all k,

(αx + (1 − α)y, αwk + (1 − α)vk) ∈ C,
1.14
1.15
y = Σ_{i=1}^m λi xi,

for some positive integer m, nonnegative scalars λi, and vectors xi ∈ C. Since y ≠ 0, we cannot have all λi equal to zero, implying that Σ_{i=1}^m λi > 0. Because xi ∈ C for all i and C is convex, the vector
x = Σ_{i=1}^m [ λi / (Σ_{j=1}^m λj) ] xi
belongs to C. For this vector, we have
y = (Σ_{i=1}^m λi) x,

with Σ_{i=1}^m λi > 0, implying that y ∈ ∪_{x∈C} {γx | γ ≥ 0} and showing that
showing that λx ∈ C and that C is a cone. Let x, y ∈ C and let λ ∈ [0, 1]. Then
ai′(λx + (1 − λ)y) = λai′x + (1 − λ)ai′y ≤ 0,   ∀ i ∈ I,
showing that λx + (1 − λ)y ∈ C and that C is convex. Let a sequence {xk } ⊂ C
converge to some x̄ ∈ n . Then
x = (1/2)x1 + (1/2)x2,
By taking the convex hull of both sides in the above inclusion and by using the
convexity of C1 + C2 , we obtain
conv(C1 ∪ C2 ) ⊂ conv(C1 + C2 ) = C1 + C2 .
αC1 ∩ (1 − α)C2 = C1 ∩ C2 .
1.17
α1 Ci1 + · · · + αm Cim ,
with

αi ≥ 0, ∀ i = 1, . . . , m,   Σ_{i=1}^m αi = 1,
1.18 (Convex Hulls, Affine Hulls, and Generated Cones)
(a) We first show that X and cl(X) have the same affine hull. Since X ⊂ cl(X),
there holds
aff(X) ⊂ aff cl(X) .
Conversely, because X ⊂ aff(X) and aff(X) is closed, we have cl(X) ⊂ aff(X),
implying that
aff cl(X) ⊂ aff(X).
We now show that X and conv(X) have the same affine hull. By using a
translation argument if necessary, we assume without loss
of generality that X
contains the origin, so that both aff(X) and aff conv(X) are subspaces. Since
X ⊂ conv(X), evidently aff(X) ⊂ aff conv(X) . To show the reverse inclusion,
let the dimension of aff conv(X) be m, and let x1 , . . . , xm be linearly indepen-
dent vectors in conv(X) that span aff conv(X) . Then every x ∈ aff conv(X) is
a linear combination of the vectors x1 , . . . , xm , i.e., there exist scalars β1 , . . . , βm
such that
x = Σ_{i=1}^m βi xi.

x = Σ_i αi xi.
As an example showing that the above inclusion can be strict, consider the set X = {(1, 1)} in ℝ². Then conv(X) = X, so that

aff(conv(X)) = X = {(1, 1)},

and the dimension of conv(X) is zero. On the other hand, cone(X) = {(α, α) | α ≥ 0}, so that

aff(cone(X)) = {(x1, x2) | x1 = x2},
and the dimension of cone(X) is one.
(d) In view of parts (a) and (c), it suffices to show that
aff cone(X) ⊂ aff conv(X) = aff(X).
It is always true that 0 ∈ cone(X), so aff cone(X) is a subspace. Let the
dimension of aff cone(X) be m, and let x1 , . . . , xm be linearly independent
vectors in cone(X) that span aff cone(X) . Since every vector in aff cone(X) is
a linear combination of x1 , . . . , xm , and since each xi is a nonnegative
combination
of some vectors in X, it follows that every vector in aff cone(X) is a linear
combination of some vectors in X. In view of the assumption that 0 ∈ conv(X),
the affine hull of conv(X) is a subspace, which implies by part (a) that the affine
hull of X is a subspace. Hence, every vector in aff cone(X) belongs to aff(X),
showing that aff cone(X) ⊂ aff(X).
1.19
By definition, f (x) is the infimum of the values of w such that (x, w) ∈ C, where
C is the convex hull of the union of nonempty convex sets epi(fi ). By Exercise
1.17, (x, w) ∈ C if and only if (x, w) can be expressed as a convex combination
of the form
(x, w) = Σ_{i∈Ī} αi (xi, wi) = ( Σ_{i∈Ī} αi xi, Σ_{i∈Ī} αi wi ),
where Ī ⊂ I is a finite set and (xi, wi) ∈ epi(fi) for all i ∈ Ī. Thus, f(x) can be expressed as

f(x) = inf{ Σ_{i∈Ī} αi wi | (x, w) = Σ_{i∈Ī} αi (xi, wi), (xi, wi) ∈ epi(fi), αi ≥ 0, ∀ i ∈ Ī, Σ_{i∈Ī} αi = 1 }.
Since the set { (xi, fi(xi)) | xi ∈ ℝⁿ } is contained in epi(fi), we obtain

f(x) ≤ inf{ Σ_{i∈Ī} αi fi(xi) | x = Σ_{i∈Ī} αi xi, xi ∈ ℝⁿ, αi ≥ 0, ∀ i ∈ Ī, Σ_{i∈Ī} αi = 1 }.
On the other hand, by the definition of epi(fi ), for each (xi , wi ) ∈ epi(fi ) we
have wi ≥ fi (xi ), implying that
f(x) ≥ inf{ Σ_{i∈Ī} αi fi(xi) | x = Σ_{i∈Ī} αi xi, xi ∈ ℝⁿ, αi ≥ 0, ∀ i ∈ Ī, Σ_{i∈Ī} αi = 1 }.
By combining the last two relations, we obtain
f(x) = inf{ Σ_{i∈Ī} αi fi(xi) | x = Σ_{i∈Ī} αi xi, xi ∈ ℝⁿ, αi ≥ 0, ∀ i ∈ Ī, Σ_{i∈Ī} αi = 1 },
On the other hand, by the definition of epi(f ), for each (xi , wi ) ∈ epi(f ) we have
wi ≥ f (xi ), implying that
F(x) ≥ inf{ Σ_i αi f(xi) | (x, w) = Σ_i αi (xi, wi), (xi, wi) ∈ epi(f), αi ≥ 0, Σ_i αi = 1 }
     = inf{ Σ_i αi f(xi) | x = Σ_i αi xi, xi ∈ X, αi ≥ 0, Σ_i αi = 1 },
which combined with the preceding inequality implies the desired relation.
(b) By using part (a), we have for every x ∈ X
F (x) ≤ f (x),
since f(x) corresponds to the value of the function Σ_i αi f(xi) for a particular representation of x as a finite convex combination of elements of X, namely x = 1 · x. Therefore, we have
Let f* = inf_{x∈X} f(x). If inf_{x∈conv(X)} F(x) < f*, then there exists z ∈ conv(X) with F(z) < f*. According to part (a), there exist points xi ∈ X and nonnegative scalars αi with Σ_i αi = 1 such that z = Σ_i αi xi and

F(z) ≤ Σ_i αi f(xi) < f*,

implying that

Σ_i αi ( f(xi) − f* ) < 0.
Since each αi is nonnegative, for this inequality to hold, we must have f (xi )−f ∗ <
0 for some i, but this cannot be true because xi ∈ X and f ∗ is the optimal value
of f over X. Therefore
and

F(x) = inf{ Σ_i αi c′xi | Σ_i αi xi = x, xi ∈ X, Σ_i αi = 1, αi ≥ 0 }
     = inf{ c′ Σ_i αi xi | Σ_i αi xi = x, xi ∈ X, Σ_i αi = 1, αi ≥ 0 }
     = c′x,

showing that

inf_{x∈conv(X)} c′x = inf_{x∈X} c′x.
c′x* = Σ_{i=1}^m αi c′xi.

c′x* = Σ_{i=1}^m αi c′xi ≥ Σ_{i=1}^m αi c′x* = c′x*,
x = Σ_{i=1}^k γi xi + Σ_{i=k+1}^m γi yi,
for some positive scalars α1 , . . . , αm and vectors
x = Σ_{i=1}^k αi xi + Σ_{i=k+1}^m αi yi,   1 = Σ_{i=1}^k αi.
Σ_{i=2}^k λi (xi − x1) + Σ_{i=k+1}^m λi yi = 0.

Σ_{i=1}^k λi (xi, 1) + Σ_{i=k+1}^m λi (yi, 0) = 0,
1.23
cl(conv(X)) ⊂ cl(conv(cl(X))) = conv(cl(X)).

conv(cl(X)) ⊂ conv(cl(conv(X))) = cl(conv(X)),
since by Prop. 1.2.1(d), the closure of a convex set is convex. Hence, the result
follows.
1.24 (Radon’s Theorem)
Σ_{i=1}^m λi xi = 0,   Σ_{i=1}^m λi = 0.

where

αi = λi* / (Σ_{k∈I} λk*),   i ∈ I.
where

αj = −λj* / (Σ_{k∈J} (−λk*)),   j ∈ J.
It is seen that the αi and αj are nonnegative, and that
Σ_{i∈I} αi = Σ_{j∈J} αj = 1,
Let Bj be defined as in the hint, and for each j, let xj be a vector in Bj . Since
M + 1 ≥ n + 2, we can apply Radon's Theorem to the vectors x1, . . . , xM+1. Thus, there exist nonempty and disjoint index subsets I and J such that I ∪ J = {1, . . . , M + 1}, nonnegative scalars α1, . . . , αM+1, and a vector x* such that

x* = Σ_{i∈I} αi xi = Σ_{j∈J} αj xj,   Σ_{i∈I} αi = Σ_{j∈J} αj = 1.
1.26
Assume the contrary, i.e., that for every index set I ⊂ {1, . . . , M }, which contains
no more than n + 1 indices, we have

inf_{x∈ℝⁿ} max_{i∈I} fi(x) < f*.

This means that for every such I, the intersection ∩_{i∈I} Xi is nonempty, where

Xi = { x | fi(x) < f* }.
This contradicts the definition of f ∗ . Note: The result of this exercise relates to
the following question: what is the minimal number of functions fi that we need
to include in the cost function maxi fi (x) in order to attain the optimal value f ∗ ?
According to the result, the number is no more than n + 1. For applications of
this result in structural design and Chebyshev approximation, see Ben Tal and
Nemirovski [BeN01].
1.27
1.28
From Prop. 1.4.5(b), we have that for any vector a ∈ n , ri(C + a) = ri(C) + a.
Therefore, we can assume without loss of generality that 0 ∈ C, and aff(C)
coincides with S. We need to show that
ri(C) = int(C + S ⊥ ) ∩ C.
Let x ∈ ri(C). By definition, this implies that x ∈ C and there exists some open ball B(x, ε) centered at x with radius ε > 0 such that

B(x, ε) ∩ S ⊂ C.   (1.9)
We now show that B(x, ε) ⊂ C + S⊥. Let z be a vector in B(x, ε). Then, we can express z as z = x + αy for some vector y ∈ ℝⁿ with ‖y‖ = 1, and some α ∈ [0, ε). Since S and S⊥ are orthogonal subspaces, y can be uniquely decomposed as y = yS + yS⊥, where yS ∈ S and yS⊥ ∈ S⊥. Since ‖y‖ = 1, this implies that ‖yS‖ ≤ 1 (Pythagorean Theorem), and using Eq. (1.9), we obtain

x + αyS ∈ B(x, ε) ∩ S ⊂ C,

B(x, ε) ∩ S ⊂ C,
1.29
(a) Let C be the given convex set. The convex hull of any subset of C is contained
in C. Therefore, the maximum dimension of the various simplices contained in
C is the largest m for which C contains m + 1 vectors x0 , . . . , xm such that
x1 − x0 , . . . , xm − x0 are linearly independent.
Let K = {x0 , . . . , xm } be such a set with
m maximal,
and let aff(K) denote
the affine hull of set K. Then, we have dim aff(K) = m, and since K ⊂ C, it
follows that aff(K) ⊂ aff(C).
We claim that C ⊂ aff(K). To see this, assume that there exists some
x ∈ C, which does not belong to aff(K). This implies that the set {x, x0 , . . . , xm }
is a set of m + 2 vectors in C such that x − x0 , x1 − x0 , . . . , xm − x0 are linearly
independent, contradicting the maximality of m. Hence, we have C ⊂ aff(K),
and it follows that
aff(K) = aff(C),
thereby implying that dim(C) = m.
(b) We first consider the case where C is n-dimensional with n > 0 and show that
the interior of C is not empty. By part (a), an n-dimensional convex set contains
an n-dimensional simplex. We claim that such a simplex S has a nonempty
interior. Indeed, applying an affine transformation if necessary, we can assume
that the vertices of S are the vectors (0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (0, 0, . . . , 1),
i.e.,
S = { (x1, . . . , xn) | xi ≥ 0, ∀ i = 1, . . . , n,  Σ_{i=1}^n xi ≤ 1 }.
is nonempty, which in turn implies that int(C) is nonempty.
For the case where dim(C) < n, consider the n-dimensional set C + S ⊥ ,
where S ⊥ is the orthogonal complement of the subspace parallel to aff(C). Since
C + S ⊥ is a convex set, it follows from the above argument that int(C + S ⊥ ) is
nonempty. Let x ∈ int(C + S ⊥ ). We can represent x as x = xC + xS ⊥ , where
xC ∈ C and xS ⊥ ∈ S ⊥ . It can be seen that xC ∈ int(C + S ⊥ ). Since
ri(C) = int(C + S ⊥ ) ∩ C,
1.30
(a) Let C1 be the segment { (x1, x2) | 0 ≤ x1 ≤ 1, x2 = 0 } and let C2 be the box { (x1, x2) | 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1 }. We have

ri(C1) = { (x1, x2) | 0 < x1 < 1, x2 = 0 },

ri(C2) = { (x1, x2) | 0 < x1 < 1, 0 < x2 < 1 }.
ri(C1 ) = ri(C1 ∩ C2 ).
1.31
(a) Let x ∈ ri(C). We will show that for every x̄ ∈ aff(C), there exists a γ > 1 such that x + (γ − 1)(x − x̄) ∈ C. This is true if x̄ = x, so assume that x̄ ≠ x. Since x ∈ ri(C), there exists ε > 0 such that

{ z | ‖z − x‖ < ε } ∩ aff(C) ⊂ C.

Choose a point x̂ ∈ C in the intersection of the ray { x + α(x̄ − x) | α ≥ 0 } and the set { z | ‖z − x‖ < ε } ∩ aff(C). Then, for some positive scalar α̂,

x̂ − x = α̂(x̄ − x).

Since x ∈ ri(C) and x̂ ∈ C, by Prop. 1.4.1(c) there exists γ̂ > 1 such that

x + (γ̂ − 1)(x − x̂) ∈ C,

that is,

x + (γ̂ − 1)α̂(x − x̄) ∈ C.

The result follows by letting γ = 1 + (γ̂ − 1)α̂ and noting that γ > 1, since (γ̂ − 1)α̂ > 0. The converse assertion follows from the fact C ⊂ aff(C) and Prop. 1.4.1(c).
(b) The inclusion cone(C) ⊂ aff(C) always holds if 0 ∈ C. To show the reverse
inclusion, we note that by part (a) with x = 0, for every x ∈ aff(C), there exists
γ > 1 such that x̃ = (γ − 1)(−x) ∈ C. By using part (a) again with x = 0, for
x̃ ∈ C ⊂ aff(C), we see that there is γ̃ > 1 such that z = (γ̃ − 1)(−x̃) ∈ C, which
combined with x̃ = (γ − 1)(−x) yields z = (γ̃ − 1)(γ − 1)x ∈ C. Hence
x = (1 / ((γ̃ − 1)(γ − 1))) z
with z ∈ C and (γ̃ − 1)(γ − 1) > 0, implying that x ∈ cone(C) and, showing that
aff(C) ⊂ cone(C).
(c) This follows by part (b), where C = conv(X), and the fact
cone conv(X) = cone(X)
[Exercise 1.18(b)].
1.32
‖αk xk‖ ≤ sup_{m≥0} ‖ym‖ < ∞,   ∀ k.

We have inf_{m≥0} ‖xm‖ > 0, since {xk} ⊂ C and C is a compact set not containing the origin, so that

0 ≤ αk ≤ (sup_{m≥0} ‖ym‖) / (inf_{m≥0} ‖xm‖) < ∞,   ∀ k.
Thus, the sequence {(αk , xk )} is bounded and has a limit point (α, x) such that
α ≥ 0 and x ∈ C. By taking a subsequence of {(αk , xk )} that converges to (α, x),
and by using the facts yk = αk xk for all k and {yk } → y, we see that y = αx
with α ≥ 0 and x ∈ C. Hence, y ∈ cone(C), showing that cone(C) is closed.
(b) To see that the assertion in part (a) fails when C is unbounded, let C be the line { (x1, x2) | x1 = 1, x2 ∈ ℝ } in ℝ² not passing through the origin. Then, cone(C) is the nonclosed set { (x1, x2) | x1 > 0, x2 ∈ ℝ } ∪ { (0, 0) }.
To see that the assertion in part (a) fails when C contains the origin on its relative boundary, let C be the closed ball { (x1, x2) | (x1 − 1)² + x2² ≤ 1 } in ℝ². Then, cone(C) is the nonclosed set { (x1, x2) | x1 > 0, x2 ∈ ℝ } ∪ { (0, 0) } (see Fig. 1.3.2).
(c) Since C is compact, the convex hull of C is compact (cf. Prop. 1.3.2). Because
conv(C) does not contain the origin on its relative boundary, by part (a),
the cone
generated by conv(C) is closed. By Exercise 1.18(b), cone conv(C) coincides
with cone(C) implying that cone(C) is closed.
1.33
(a) By Prop. 1.4.1(b), the relative interior of a convex set is a convex set. We
only need to show that ri(C) is a cone. Let y ∈ ri(C). Then, y ∈ C and since C
is a cone, αy ∈ C for all α > 0. By the Line Segment Principle, all the points on
the line segment connecting y and αy, except possibly αy, belong to ri(C). Since
this is true for every α > 0, it follows that αy ∈ ri(C) for all α > 0, showing that
ri(C) is a cone.
(b) Consider the linear transformation A that maps (α1, . . . , αm) ∈ ℝᵐ into Σ_{i=1}^m αi xi ∈ ℝⁿ. Note that C is the image of the nonempty convex set

{ (α1, . . . , αm) | α1 ≥ 0, . . . , αm ≥ 0 }
under the linear transformation A. Therefore, by using Prop. 1.4.3(d), we have
ri(C) = ri( A · { (α1, . . . , αm) | α1 ≥ 0, . . . , αm ≥ 0 } )
      = A · ri( { (α1, . . . , αm) | α1 ≥ 0, . . . , αm ≥ 0 } )
      = A · { (α1, . . . , αm) | α1 > 0, . . . , αm > 0 }
      = { Σ_{i=1}^m αi xi | α1 > 0, . . . , αm > 0 }.
1.34
Let T be the linear transformation that maps (x, y) ∈ n+m into x ∈ n . Then
it can be seen that
A−1 · C = T · (D ∩ S). (1.10)
The relative interior of D is given by ri(D) = n × ri(C), and the relative interior
of S is equal to S (since S is a subspace). Hence,
A−1 · ri(C) = T · ri(D) ∩ S . (1.11)
In view of the assumption that A−1 · ri(C) is nonempty, we have that the in-
tersection ri(D) ∩ S is nonempty. Therefore, it follows from Props. 1.4.3(d) and
1.4.5(a) that
ri T · (D ∩ S) = T · ri(D) ∩ S . (1.12)
Combining Eqs. (1.10)-(1.12), we obtain
ri(A−1 · C) = A−1 · ri(C).
1.35 (Closure of a Convex Function)
(a) Let g : n → [−∞, ∞] be such that g(x) ≤ f (x) for all x ∈ n . Choose
any x ∈ dom(cl f ). Since epi(cl f ) = cl epi(f ) , we can choose a sequence
(xk , wk ) ∈ epi(f ) such that xk → x, wk → (cl f )(x). Since g is lower semicon-
tinuous at x, we have
g(x) ≤ lim inf_{k→∞} g(xk) ≤ lim inf_{k→∞} f(xk) ≤ lim inf_{k→∞} wk = (cl f)(x).
Note also that since epi(f ) ⊂ epi(cl f ), we have (cl f )(x) ≤ f (x) for all x ∈ n .
(b) For the proof of this part and the next, we will use the easily shown fact that
for any convex function f , we have
ri(epi(f)) = { (x, w) | x ∈ ri(dom(f)), f(x) < w }.

Let x ∈ ri(dom(f)), and consider the vertical line L = { (x, w) | w ∈ ℝ }.
Then there exists ŵ such that (x, ŵ) ∈ L∩ri epi(f ) . Let w be such that (x, w) ∈
L ∩ cl epi(f ) . Then, by Prop. 1.4.5(a), we have L ∩ cl epi(f ) = cl L ∩ epi(f ) ,
so that (x, w) ∈ cl L ∩ epi(f ) . It follows from the Line Segment Principle that
the vector x, ŵ + α(w − ŵ) belongs to epi(f ) for all α ∈ [0, 1). Taking the
limit as α → 1, we see that f (x) ≤ w for all w such that (x, w) ∈ L ∩ cl epi(f ) ,
implying that f (x) ≤ (cl f )(x). On the other hand, since epi(f ) ⊂ epi(cl f ), we
have (cl f )(x) ≤ f (x) for all x ∈ n , so f (x) = (cl f )(x).
We know that a closed convex function that is improper cannot take a finite
value at any point. Since cl f is closedand convex, and takes a finite value at all
points of the nonempty set ri dom(f ) , it follows that cl f must be proper.
(c) Since the function cl f is closed and is majorized by f , we have
(cl f )(y) ≤ lim inf (cl f ) y + α(x − y) ≤ lim inf f y + α(x − y) .
α↓0 α↓0
1.36
RC∩M = RC ∩ RM = RC ∩ S = {0}.
(a) We first show that the convex hull of X is equal to the Cartesian product of
the convex hulls of the sets Xi , i = 1, . . . , m. Let y be a vector that belongs to
conv(X). Then, by definition, for some k, we have
y = Σ_{i=1}^k αi yi,   with αi ≥ 0, i = 1, . . . , k,   Σ_{i=1}^k αi = 1,
where yi ∈ X for all i. Since yi ∈ X, we have that yi = (xi1 , . . . , xim ) for all i,
with xi1 ∈ X1 , . . . , xim ∈ Xm . It follows that
y = ( Σ_{i=1}^k αi x_{i1}, . . . , Σ_{i=1}^k αi x_{im} ) ∈ conv(X1) × · · · × conv(Xm).

Conversely, let yj ∈ conv(Xj) for each j, say yj = Σ_{i=1}^{kj} α_i^j x_i^j with x_i^j ∈ Xj, α_i^j ≥ 0, and Σ_{i=1}^{kj} α_i^j = 1, and consider the vectors

(x_1^1, x_{r1}^2, . . . , x_{r_{m−1}}^m), (x_2^1, x_{r1}^2, . . . , x_{r_{m−1}}^m), . . . , (x_{k1}^1, x_{r1}^2, . . . , x_{r_{m−1}}^m),
for all possible values of r1 , . . . , rm−1 , i.e., we fix all components except the
first one, and vary the first component over all possible x1j ’s used in the convex
combination that yields y1 . Since all these vectors belong to X, their convex
combination given by
( Σ_{j=1}^{k1} α_j^1 x_j^1, x_{r1}^2, . . . , x_{r_{m−1}}^m )
belongs to the convex hull of X for all possible values of r1 , . . . , rm−1 . Now,
consider the vectors
( Σ_{j=1}^{k1} α_j^1 x_j^1, x_1^2, . . . , x_{r_{m−1}}^m ), . . . , ( Σ_{j=1}^{k1} α_j^1 x_j^1, x_{k2}^2, . . . , x_{r_{m−1}}^m ),
i.e., fix all components except the second one, and vary the second component
over all possible x2j ’s used in the convex combination that yields y2 . Since all
these vectors belong to conv(X), their convex combination given by
( Σ_{j=1}^{k1} α_j^1 x_j^1, Σ_{j=1}^{k2} α_j^2 x_j^2, . . . , x_{r_{m−1}}^m )
belongs to the convex hull of X for all possible values of r2 , . . . , rm−1 . Proceeding
in this way, we see that the vector given by
( Σ_{j=1}^{k1} α_j^1 x_j^1, Σ_{j=1}^{k2} α_j^2 x_j^2, . . . , Σ_{j=1}^{km} α_j^m x_j^m )
y = Σ_{i=1}^r β^i y^i,
where β 1 , . . . , β r are scalars. Since y i ∈ X, we have that y i = (xi1 , . . . , xim ) with
xij ∈ Xj . Thus,
y = ( Σ_{j=1}^r β^j x_1^j, . . . , Σ_{j=1}^r β^j x_m^j ).
belong to aff(X), and so does their sum, which is the vector y. Thus, y ∈ aff(X),
concluding the proof.
y = Σ_{i=1}^r αi y^i,
for some r, where α1 , . . . , αr are nonnegative scalars and yi ∈ X for all i. Since
y i ∈ X, we have that y i = (xi1 , . . . , xim ) with xij ∈ Xj . Thus,
y = ( Σ_{j=1}^r αj x_1^j, . . . , Σ_{j=1}^r αj x_m^j ),
where xji ∈ Xi and αij ≥ 0 for each i and j. Since each Xi contains the origin,
we have that the vectors
( Σ_{j=1}^{r1} α_1^j x_1^j, 0, . . . , 0 ), ( 0, Σ_{j=1}^{r2} α_2^j x_2^j, 0, . . . , 0 ), . . . , ( 0, . . . , 0, Σ_{j=1}^{rm} α_m^j x_m^j )
belong to the cone(X), and so does their sum, which is the vector y. Thus,
y ∈ cone(X), concluding the proof.
Finally, consider the example where

X1 = {0, 1} ⊂ ℝ,   X2 = {1} ⊂ ℝ.

Let x = (x1, . . . , xm) ∈ ri(X). Then, by Prop. 1.4.1(c), we have that for all x̄ = (x̄1, . . . , x̄m) ∈ X, there exists some γ > 1 such that

x + (γ − 1)(x − x̄) ∈ X.

xi + (γ − 1)(xi − x̄i) ∈ Xi,
RX = RX1 × · · · × RXm .
RC ⊂ Rcl(C) .
By taking closures in this relation and by using the fact that Rcl(C) is closed [part
(a) of the Recession Cone Theorem], we obtain cl(RC ) ⊂ Rcl(C) .
To see that the inclusion cl(RC) ⊂ Rcl(C) can be strict, consider the set

C = { (x1, x2) | 0 ≤ x1, 0 ≤ x2 < 1 } ∪ { (0, 1) },

whose closure is
where the equalities follow from part (a) and the assumption that C = ri(C).
To see that the inclusion RC ⊂ RC can fail when C = ri(C), consider the
sets
C = (x1 , x2 ) | x1 ≥ 0, 0 < x2 < 1 , C = (x1 , x2 ) | x1 ≥ 0, 0 ≤ x2 < 1 ,
1.40
C k+1 ⊂ C k , ∀ k,
showing that assumption (1) of Prop. 1.5.6 is satisfied. Similarly, since by as-
sumption Xk ∩ Ck is nonempty for all k, we have that, for all k, the set
X ∩ C k = X ∩ Xk ∩ Ck = Xk ∩ Ck ,
is nonempty, showing that assumption (2) is satisfied. Finally, let R denote the set R = ∩_{k=0}^∞ R_{C̄k}. Since by assumption C̄k is nonempty for all k, we have, by part (e) of the Recession Cone Theorem, that R_{C̄k} = R_{Xk} ∩ R_{Ck}, implying that

R = ∩_{k=0}^∞ R_{C̄k}
  = ∩_{k=0}^∞ ( R_{Xk} ∩ R_{Ck} )
  = ( ∩_{k=0}^∞ R_{Xk} ) ∩ ( ∩_{k=0}^∞ R_{Ck} )
  = RX ∩ RC.
RX ∩ R = RX ∩ RC ⊂ LC ,
RX ∩ R ⊂ LC ∩ LX = L,
showing that assumption (3) of Prop. 1.5.6 is satisfied, and thus proving that the
intersection X ∩ ( ∩_{k=0}^∞ C̄k ) is nonempty.
1.41
is closed. Since A·C ⊂ A·cl(C) and y ∈ cl(A·C), it follows that y is in the closure of A·cl(C), so that Cε is nonempty for every ε > 0. Furthermore, the recession cone of the set { x | ‖Ax − y‖ ≤ ε } coincides with the null space N(A), so that RCε = Rcl(C) ∩ N(A). By assumption we have Rcl(C) ∩ N(A) = {0}, and by part (c) of the Recession Cone Theorem (cf. Prop. 1.5.1), it follows that Cε is bounded for every ε > 0. Now, since the sets Cε are nested nonempty compact sets, their intersection ∩_{ε>0} Cε is nonempty. For any x in this intersection, we have x ∈ cl(C) and Ax − y = 0, showing that y ∈ A·cl(C). Hence, cl(A·C) ⊂ A·cl(C). The
converse A · cl(C) ⊂ cl(A · C) is clear, since for any x ∈ cl(C) and sequence
{xk } ⊂ C converging to x, we have Axk → Ax, showing that Ax ∈ cl(A · C).
Therefore,
cl(A · C) = A · cl(C). (1.13)
uk = (xk − x) / ‖xk − x‖,   ∀ k.
Let u be a limit point of {uk }, and note that u = 0. It can be seen that
u is a direction of recession of cl(C) [this can be done similar to the proof of
part (c) of the Recession Cone Theorem (cf. Prop. 1.5.1)]. By taking an appro-
priate subsequence if necessary, we may assume without loss of generality that
limk→∞ uk = u. Then, by the choices of uk and xk , we have
Au = lim_{k→∞} Auk = lim_{k→∞} (Axk − Ax)/‖xk − x‖ = lim_{k→∞} (k/‖xk − x‖) y,
positive and Au = λy, so that A(u/λ) = y. Since Rcl(C) is a cone [part (a) of the
Recession Cone Theorem] and u ∈ Rcl(C) , the vector u/λ is in Rcl(C) , so that y
belongs to A · Rcl(C) . Hence, RA·cl(C) ⊂ A · Rcl(C) , completing the proof.
As an example showing that A·Rcl(C) and RA·cl(C) may differ when Rcl(C) ∩ N(A) ≠ {0}, consider the set

C = { (x1, x2) | x1 ∈ ℝ, x2 ≥ x1² },
1.42
Let S be defined by
S = Rcl(C) ∩ N (A),
and note that S is a subspace of Lcl(C) by the given assumption. Then, by Lemma
1.5.4, we have
cl(C) = cl(C) ∩ S ⊥ + S,
so that the images of cl(C) and cl(C) ∩ S ⊥ under A coincide [since S ⊂ N (A)],
i.e.,
A · cl(C) = A · cl(C) ∩ S ⊥ . (1.14)
Define

C̄ = cl(C) ∩ S⊥

RC̄ = Rcl(C) ∩ S⊥,   (1.16)
[cf. part (e) of the Recession Cone Theorem, Prop. 1.5.1], for which, since S =
Rcl(C) ∩ N (A), we have
RC̄ ∩ N(A) = S ∩ S⊥ = {0}.

RA·cl(C) = RA·C̄.

RA·C̄ = A·RC̄,

A·RC̄ ⊂ A·Rcl(C).
The preceding three relations yield RA·cl(C) ⊂ A · Rcl(C) , completing the proof.
RC ∩ N (A) is a subspace of LC . By Exercise 1.42, the set A · C is closed and
RA·C = A · RC . Since A · C = C1 + · · · + Cm , the assertions of part (a) follow.
(b) The proof is similar to that of part (a). Let C be the Cartesian product
C1 × · · · × Cm . Then, by Exercise 1.37(a),
cl(C) = cl(C1 ) × · · · × cl(Cm ), (1.18)
and its recession cone and lineality space are given by
Rcl(C) = Rcl(C1 ) × · · · × Rcl(Cm ) , (1.19)
Lcl(C) = Lcl(C1 ) × · · · × Lcl(Cm ) .
Let A be a linear transformation that maps (x1 , . . . , xm ) ∈ mn into x1 + · · · +
xm ∈ n . Then, the intersection Rcl (C) ∩ N (A) consists of points (y1 , . . . , ym )
such that y1 + · · · + ym = 0 with yi ∈ Rcl(Ci ) for all i. By the given condition,
every vector (y1 , . . . , ym ) in the intersection Rcl(C) ∩N (A) is such that yi ∈ Lcl(Ci )
for all i, implying that (y1 , . . . , ym ) belongs to the lineality space Lcl(C) . Thus,
Rcl(C) ∩ N (A) ⊂ Lcl(C) ∩ N (A). On the other hand by definition of the lineality
space, we have Lcl(C) ⊂ Rcl(C) , so that Lcl(C) ∩ N (A) ⊂ Rcl(C) ∩ N (A). Hence,
Rcl(C) ∩ N (A) = Lcl(C) ∩ N (A), implying that Rcl(C) ∩ N (A) is a subspace of
Lcl(C) . By Exercise 1.42, we have cl(A · C) = A · cl(C) and RA·cl(C) = A · Rcl(C) ,
from which by using the relation A · C = C1 + · · · + Cm , and Eqs. (1.18) and
(1.19), we obtain
cl(C1 + · · · + Cm ) = cl(C1 ) + · · · + cl(Cm ),
Rcl(C1 +···+Cm ) = Rcl(C1 ) + · · · + Rcl(Cm ) .
1.44
where the Qij are appropriately defined symmetric positive semidefinite mn×mn
matrices and the aij are appropriately defined vectors in mn . Hence, the set C
is specified by convex quadratic inequalities. Thus, we can use Prop. 1.5.8(c) to
assert that the set AC = C1 + · · · + Cm is closed.
Helly's Theorem implies that the sets C̄i defined in the hint are nonempty. These sets are also nested and satisfy the assumptions of Props. 1.5.5 and 1.5.6. Therefore, the intersection ∩_{i=1}^∞ C̄i is nonempty. Since

∩_{i=1}^∞ C̄i ⊂ ∩_{i=1}^∞ Ci,
Convex Analysis and Optimization
Chapter 2 Solutions
Dimitri P. Bertsekas
with
2.1
which is a contradiction. [Note that in the above string of equations, the second
inequality follows by the convexity of f , and the strict inequality follows from the
assumption that f (x) < f (x∗ ).] Hence, it follows that x∗ is a global minimum of
f.
(b) Consider the function f (x1 , x2 ) = (x2 − px21 )(x2 − qx21 ), where 0 < p < q and
let x∗ = (0, 0).
We first show that g(α) = f (x∗ + αd) is minimized at α = 0 for all d ∈ 2 .
We have
g(α) = f (x∗ + αd) = (αd2 − pα2 d21 )(αd2 − qα2 d21 ) = α2 (d2 − pαd21 )(d2 − qαd21 ).
Also,
g′(α) = 2α(d2 − pαd1²)(d2 − qαd1²) + α²(−pd1²)(d2 − qαd1²) + α²(d2 − pαd1²)(−qd1²).

Thus g′′(0) = 2d2², which is greater than 0 if d2 ≠ 0. If d2 = 0, then g(α) = pqα⁴d1⁴, which is clearly minimized at α = 0. Therefore, (0, 0) is a local minimum of f along every line that passes through (0, 0).
Let us now show that if p < m < q, then f(y, my²) < 0 if y ≠ 0 and f(y, my²) = 0 otherwise. Consider a point of the form (y, my²). We have f(y, my²) = y⁴(m − p)(m − q). Clearly, f(y, my²) < 0 if and only if p < m < q and y ≠ 0. In any ε-neighborhood of (0, 0), there exists a y ≠ 0 such that for some m ∈ (p, q), (y, my²) also belongs to the neighborhood. Since f(0, 0) = 0, we see that (0, 0) is not a local minimum.
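Both claims (a local minimum along every line through the origin, yet not a local minimum) are easy to confirm numerically; the sketch below, with p = 1 and q = 2, is not part of the original solution:

import numpy as np

p, q = 1.0, 2.0                                   # any 0 < p < q
f = lambda x1, x2: (x2 - p * x1 ** 2) * (x2 - q * x1 ** 2)

# (0, 0) minimizes f along a few sample lines through the origin
for d1, d2 in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (2.0, 0.5)]:
    alphas = np.linspace(-1e-3, 1e-3, 201)
    assert (f(alphas * d1, alphas * d2) >= f(0.0, 0.0)).all()

# ... yet f is negative on the curve x2 = m*x1^2 arbitrarily close to the origin
m = 0.5 * (p + q)
for y in (1e-1, 1e-3, 1e-6):
    assert f(y, m * y ** 2) < 0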
We claim that the set Cε is compact. Indeed, since X is bounded, so is its closure, which implies that ‖z‖ ≤ max_{x∈cl(X)} ‖x‖ + ε for all z ∈ Cε, showing that Cε is bounded. To show the closedness of Cε, let {zk} be a sequence in Cε converging to some z. By the definition of Cε, there is a corresponding sequence {xk} in cl(X) such that

‖zk − xk‖ ≤ ε,   ∀ k.   (2.1)

Because cl(X) is compact, {xk} has a subsequence converging to some x ∈ cl(X). Without loss of generality, we may assume that {xk} converges to x ∈ cl(X). By taking the limit in Eq. (2.1) as k → ∞, we obtain ‖z − x‖ ≤ ε with x ∈ cl(X), showing that z ∈ Cε. Hence, Cε is closed.
We now show that f has the Lipschitz property over X. Let x and y be two distinct points in X. Then, by the definition of Cε, the point

z = y + (ε/‖y − x‖)(y − x)

is in Cε. Thus

y = (‖y − x‖/(‖y − x‖ + ε)) z + (ε/(‖y − x‖ + ε)) x,
showing that y is a convex combination of z ∈ Cε and x ∈ Cε. By convexity of f, we have

f(y) ≤ (‖y − x‖/(‖y − x‖ + ε)) f(z) + (ε/(‖y − x‖ + ε)) f(x),

implying that

f(y) − f(x) ≤ (‖y − x‖/(‖y − x‖ + ε)) (f(z) − f(x)) ≤ (‖y − x‖/ε) ( max_{u∈Cε} f(u) − min_{v∈Cε} f(v) ),

which combined with the preceding relation yields f(x) − f(y) ≤ L‖x − y‖, where L = ( max_{u∈Cε} f(u) − min_{v∈Cε} f(v) ) / ε.
where we use the Lipschitz continuity of f to get the second inequality. Taking
the infimum over all y ∈ X, we obtain
Hence, x∗ minimizes Fc (x) over Y for all c > L. (Note that the infimum in
the preceding relation is attained by Weierstrass’ Theorem, since X is closed by
assumption, and · is a continuous function that has compact level sets.)
(b) Suppose, to arrive at a contradiction, that x* minimizes Fc(x) over Y, but x* ∉ X.
We have that Fc(x*) = f(x*) + c min_{y∈X} ‖y − x*‖. Using the argument given earlier, the minimum of ‖y − x*‖ over y ∈ X is attained at some x̃ ∈ X, which is not equal to x*, and therefore,
which contradicts the fact that x∗ minimizes Fc (x) over Y . (Note that the first
inequality follows from c > L and x̃ = x∗ . The second inequality follows from
the Lipschitz continuity of function f .) Hence, if x∗ minimizes Fc (x) over Y , it
follows that x∗ ∈ X.
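A one-dimensional illustration of this exact penalty property, with made-up data (f(x) = x, Lipschitz constant L = 1, X = [1, 2], Y = ℝ, and c = 2 > L), not taken from the exercise:

import numpy as np

f = lambda x: x                                                    # Lipschitz, L = 1
dist_X = lambda x: np.maximum(0.0, np.maximum(1.0 - x, x - 2.0))   # distance to X = [1, 2]
c = 2.0                                                            # any c > L
Fc = lambda x: f(x) + c * dist_X(x)                                # penalized objective

grid = np.linspace(-5.0, 5.0, 100001)
x_unconstrained = grid[np.argmin(Fc(grid))]   # minimizer of Fc over the whole line
x_constrained = 1.0                           # minimizer of f over X = [1, 2]
assert abs(x_unconstrained - x_constrained) < 1e-3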
where B(x̄, (γ − f*)/δ) denotes the closed ball centered at x̄ with radius (γ − f*)/δ. Hence, it follows by Weierstrass' Theorem that F attains a minimum over ℝⁿ, i.e., the set arg min_{x∈ℝⁿ} F(x) is nonempty and compact.
Consider now minimizing f over the set arg minx∈n F (x). Since f is closed
by assumption, we conclude by using Weierstrass’ Theorem that f attains a
minimum at some x̃ over the set arg minx∈n F (x). Hence, we have
Since x̃ ∈ arg minx∈n F (x), it follows that F (x̃) ≤ F (x), for all x ∈ n , and
thereby implying that x̃ is the unique optimal solution of the problem of mini-
mizing f (x) + δx − x̃ over n .
Moreover, since F(x̃) ≤ F(x) for all x ∈ ℝⁿ, we have F(x̃) ≤ F(x̄), which implies that

f(x̃) ≤ f(x̄) − δ‖x̃ − x̄‖ ≤ f(x̄),

and also

F(x̃) ≤ F(x̄) = f(x̄) ≤ f* + ε.

Using Eq. (2.2), it follows that x̃ ∈ B(x̄, ε/δ), proving the desired result.
(a) Let ε > 0 be given. Assume, to arrive at a contradiction, that for any sequence {δk} with δk ↓ 0, there exists a sequence {xk} ⊂ X such that for all k

It follows that, for some k̄, xk belongs to the set { x ∈ X | f(x) ≤ f* + δ_k̄ } for all k ≥ k̄. Since by assumption f and X have no common nonzero direction of recession, by the Recession Cone Theorem, we have that the closed convex set { x ∈ X | f(x) ≤ f* + δ_k̄ } is bounded. Therefore, the sequence {xk} is bounded and has a limit point x̄ ∈ X, which, in view of the preceding relations, satisfies

f(x̄) ≤ f*,   ‖x̄ − x*‖ ≥ ε,   ∀ x* ∈ X*,
which is a contradiction. This proves that, for every ε > 0, there exists a δ > 0 such that every vector x ∈ X with f(x) ≤ f* + δ satisfies min_{x*∈X*} ‖x − x*‖ < ε.
(b) Fix ε > 0. By part (a), there exists some δε > 0 such that every vector x ∈ X with f(x) ≤ f* + δε satisfies min_{x*∈X*} ‖x − x*‖ ≤ ε. Since f(xk) → f*, there exists some Kε such that

f(xk) ≤ f* + δε,   ∀ k ≥ Kε.

By part (a), this implies that {xk}_{k≥Kε} ⊂ X* + εB. Since X* is nonempty and
compact (cf. Prop. 2.3.2), it follows that every such sequence {xk } is bounded.
Let x be a limit point of the sequence {xk } ⊂ X satisfying f (xk ) → f ∗ .
By lower semicontinuity of the function f , we get that
(a) Let d ∈ RX ∩ Rf. If d ∉ Ff, then we must have lim_{α→∞} f(x + αd) = −∞, for
some x ∈ dom(f ) ∩ X, implying that for each k, x + αd ∈ Ck for all sufficiently
large α. Thus d is a direction of recession of Ck and hence also of f , i.e., d ∈ Rf .
Therefore, we must have RX ∩ Ff = RX ∩ Rf , so using the hypothesis, we have
RX ∩ Rf ⊂ Lf . From Prop. 1.5.6 [condition (3)], it follows that X ∩ ∩∞ k=0 Ck
is nonempty.
(b) Let d ∈ RX ∩ Rf. If d ∉ Ff, then we must have lim_{α→∞} f(x + αd) = −∞, for some x ∈ dom(f) ∩ X. Since d ∈ RX, we have x + αd ∈ X for all x ∈ X and α ≥ 0. It follows that inf_{x∈X} f(x) = −∞, a contradiction. Therefore, we must have RX ∩ Ff = RX ∩ Rf, so using the hypothesis, we have RX ∩ Rf ⊂ Lf.
From Prop. 2.3.3 [condition (2)], it follows that there exists at least one optimal
solution.
(c) Let X = ℝ and f(x) = x. Then

Ff = Lf = { y | y = 0 },
Let x be a vector in C0 , and for each k ≥ 1, let xk be the projection of x on
Ck . If {xk } is bounded, then since the gj are closed, any limit point x̃ of {xk }
satisfies
gj(x̃) ≤ lim inf_{k→∞} gj(xk) ≤ 0,

so x̃ ∈ ∩_{k=1}^∞ Ck, and ∩_{k=1}^∞ Ck ≠ Ø. If {xk} is unbounded, let y be a limit point of the sequence { (xk − x)/‖xk − x‖ | xk ≠ x }, and without loss of generality, assume that

(xk − x)/‖xk − x‖ → y.
We claim that

y ∈ ∩_{j=0}^r Rgj.

Indeed, if for some j, we have y ∉ Rgj, then there exists α > 0 such that gj(x + αy) > w0. Let

zk = x + α (xk − x)/‖xk − x‖,

and note that for sufficiently large k, zk lies in the line segment connecting x and xk, so that gj(zk) ≤ w0. On the other hand, we have zk → x + αy, so using the closedness of gj, we must have
∩_{k=1}^∞ Ck = Ø   ⇒   there exist j̄ ∈ {1, . . . , r} and y ∈ ∩_{j=0}^r Rgj with y ∉ F_{g_j̄}.   (1)

We now use induction on r. For r = 0, the preceding proof shows that ∩_{k=1}^∞ Ck ≠ Ø. Assume that ∩_{k=1}^∞ Ck ≠ Ø for all cases where r < r̄. We will show that ∩_{k=1}^∞ Ck ≠ Ø for r = r̄. Assume the contrary. Then, by Eq. (1), there exist j̄ ∈ {1, . . . , r̄} and y ∈ ∩_{j=0}^{r̄} Rgj with y ∉ F_{g_j̄}. Let us consider the sets

C̄k = { x | g0(x) ≤ wk, gj(x) ≤ 0, j = 1, . . . , r̄, j ≠ j̄ }.

Since these sets are nonempty, by the induction hypothesis, ∩_{k=1}^∞ C̄k ≠ Ø. For any x̃ ∈ ∩_{k=1}^∞ C̄k, the vector x̃ + αy belongs to ∩_{k=1}^∞ C̄k for all α > 0, since y ∈ ∩_{j=0}^{r̄} Rgj. Since g0(x̃) ≤ 0, we have x̃ ∈ dom(g_j̄), by the hypothesis regarding the domains of the gj. Since y ∈ ∩_{j=0}^{r̄} Rgj with y ∉ F_{g_j̄}, it follows that g_j̄(x̃ + αy) → −∞ as α → ∞. Hence, for sufficiently large α, we have g_j̄(x̃ + αy) ≤ 0, so x̃ + αy belongs to ∩_{k=1}^∞ Ck.
{ x | x1 ≤ 0, x2 ∈ ℝ },   { x | 0 < x1 < 1, x2 ∈ ℝ }

and

Ck = { x | g0(x) ≤ wk, gj(x) ≤ 0, j = 1, . . . , r }.
The functions involved in the definition of Ck are bidirectionally flat, and each Ck
is nonempty by construction. By applying part (a), we see that the intersection
∩∞k=0 Ck is nonempty. For any x in this intersection, we have Ax = y (since
yk → y), showing that y ∈ A C.
(c) We will use part (a) and the line of proof of Prop. 2.3.3 [condition (3)]. Denote
f* = inf_{x∈C} f(x),
2.8 (Minimization of Quasiconvex Functions)
Under each of the conditions (1)-(4), we show that the set of minima of f
over X, which is given by
X* = ∩_{k=1}^∞ (X ∩ V_{γk})
is nonempty.
Let condition (1) hold. The sets X ∩ Vγ k are nonempty, closed, convex,
and nested. Furthermore, for each k, their recession cone is given by RX ∩ Rγ k
and their lineality space is given by LX ∩ Lγ k . We have that
∩_{k=1}^∞ (RX ∩ R_{γk}) = RX ∩ Rf,

and

∩_{k=1}^∞ (LX ∩ L_{γk}) = LX ∩ Lf,
2.9 (Partial Minimization)
Combined with Eq. (2.6), the preceding relation yields a contradiction, thus
showing that Ef is convex.
Next assume that Ef is convex. We show that epi(f ) is convex. Let
(x1 , w1 ) and (x2 , w2 ) be arbitrary vectors in epi(f ). Consider sequences of vectors
(x1 , w1k ) and (x2 , w2k ) such that w1k > w1 , w2k > w2 , and w1k → w1 , w2k →
w2 . It follows that for each k, (x1 , w1k ) and (x2 , w2k ) belong to Ef . Since Ef is
convex by assumption, this implies that for each α ∈ [0, 1] and all k, the vector
(αx1 + (1 − α)x2, αw1k + (1 − α)w2k) ∈ Ef, i.e., we have for each k

f(αx1 + (1 − α)x2) < αw1k + (1 − α)w2k.
which implies that there exists some z ∈ m such that
F (x, z) < w,
showing that (x, z, w) belongs to the set { (x, z, w) | F(x, z) < w }, and (x, w) ∈ T.
Conversely, let (x, w) ∈ T . This implies that there exists some z such that
F (x, z) < w, from which we get
(a) For each u ∈ m , let fu (x) = f (x, u). There are two cases; either fu ≡ ∞,
or fu is lower semicontinuous with bounded level sets. The first case, which
corresponds to p(u) = ∞, cannot hold for every u, since f is not identically equal to ∞. Therefore, dom(p) ≠ Ø, and for each u ∈ dom(p), we have by Weierstrass'
Theorem that p(u) = inf x fu (x) is finite [i.e., p(u) > −∞ for all u ∈ dom(p)] and
the set P (u) = arg minx fu (x) is nonempty and compact.
We now show that p is lower semicontinuous. By assumption, for all u ∈ ℝᵐ and for all α ∈ ℝ, there exists a neighborhood N of u such that the set { (x, u) | f(x, u) ≤ α } ∩ (ℝⁿ × N) is bounded in ℝⁿ × ℝᵐ. We can choose a smaller closed set N̄ containing u such that the set { (x, u) | f(x, u) ≤ α } ∩ (ℝⁿ × N̄) is closed (since f is lower semicontinuous) and bounded. In view of the
where x and u are scalars. This function is continuous in (x, u) and the level sets
are bounded in x for each u, but not locally uniformly in u, i.e., there does not exist a neighborhood N of u = 0 such that the set { (x, u) | u ∈ N, f(x, u) ≤ α } is bounded for some α > 0.
For this function, we have

p(u) = 0 if u ≠ 0,   and   p(u) = 1 if u = 0.
f(w, x) = ‖w − x‖ if w ∈ C,   and   f(w, x) = ∞ if w ∉ C.
We now show that f (w, x) satisfies the assumptions of Exercise 2.10, so that we
can apply the results of this exercise to this problem.
Since the set C is closed by assumption, it follows that f (w, x) is lower
semicontinuous. Moreover, by Weierstrass’ Theorem, we see that f (w, x) > −∞
for all x and w. Since the set C is nonempty by assumption, we also have that
dom(f ) is nonempty. It is also straightforward to see that the function · ,
and therefore the function f , satisfies the locally uniformly level-boundedness
assumption of Exercise 2.10.
(a) Since the function · is lower semicontinuous and the set C is closed, it follows
from Weierstrass’ Theorem that for all x∗ ∈ n , the infimum in inf w f (w, x) is
attained at some w∗ , i.e., P (x∗ ) is nonempty. Hence, we see that for all x∗ ∈ n ,
there exists some w∗ ∈ P (x∗ ) such that f (w∗ , ·) is continuous at x∗ , which
follows by continuity of the function · . Hence, the function f (w, x) satisfies
the sufficiency condition given in Exercise 2.10(d), and it follows that dC (x)
depends continuously on x.
(b) This part follows from part (a) of Exercise 2.10.
(c) This part follows from part (c) of Exercise 2.10.
(a) We set s = 1/c and consider the function g(x, s) : ℝⁿ × ℝ → (−∞, ∞] defined by

g(x, s) = f(x) + θ̃(F(x), s),
where

δD(u) = 0 if u ∈ D,   and   δD(u) = ∞ if u ∉ D.
We identify the original problem with that of minimizing g(x, 0) in x ∈ n , and
the approximate problem for parameter s ∈ (0, s] with that of minimizing g(x, s)
in x ∈ n where s = 1/c. With the notation introduced in Exercise 2.10, the
optimal value of the original problem is given by p(0) and the optimal value of
the approximate problem is given by p(s). Hence, we have
We now show that, for the function g(x, s), the assumptions of Exercise 2.10 are
satisfied.
We have that g(x, s) > −∞ for all (x, s), since by assumption f (x) > −∞
for all x and θ(u, s) > −∞ for all (u, s). The function θ̃ is such that θ̃(u, s) < ∞
at least for one vector (u, s), since the set D is nonempty. Therefore, it follows
that g(x, s) < ∞ for at least one vector (x, s), unless g ≡ ∞, in which case all
the results of this exercise follow trivially.
We now show that the function θ̃ is lower semicontinuous. This is easily
seen at all points where s = 0 in view of the assumption that the function θ is
lower semicontinuous on ℝᵐ × (0, ∞). We next consider points where s = 0. We claim that for any α ∈ ℝ,

{ u | θ̃(u, 0) ≤ α } = ∩_{s∈(0,s̄]} { u | θ̃(u, s) ≤ α }.   (2.8)
thus, proving the relation in (2.8). Note that for all α ∈ ℝ and all s ∈ (0, s̄], the set

{ u | θ̃(u, s) ≤ α } = { u | θ(u, 1/s) ≤ α }

is closed by the lower semicontinuity of the function θ. Hence, the relation in Eq. (2.8) implies that the set { u | θ̃(u, 0) ≤ α } is closed for all α ∈ ℝ,
thus showing that the function θ̃ is lower semicontinuous everywhere (cf. Prop.
1.2.2). Together with the assumptions that f is lower semicontinuous and F is
continuous, it follows that g is lower semicontinuous.
Finally, we show that g satisfies the locally uniform level boundedness
property given in Exercise 2.10, i.e., for all s∗ ∈ and for all α ∈ , there
exists a neighborhood N of s∗ such that the set (x, s) | s ∈ N, g(x, s) ≤ α is
bounded. By assumption,
we have that the level sets of the function g(x, s) =
f (x) + θ̃ F (x), 1/s are bounded. The definition of θ̃, together with the fact that
θ̃(u, s) is monotonically increasing as s ↓ 0, implies that g is indeed level-bounded
in x locally uniformly in s.
Therefore, all the assumptions of Exercise 2.10 are satisfied and we get
that the function p is lower semicontinuous in s. Since θ̃(u, s) is monotonically
increasing as s ↓ 0, it follows that p is monotonically nondecreasing as s ↓ 0. This
implies that
p(s) → p(0), as s ↓ 0.
Defining sk = 1/ck for all k, where {ck } is the given sequence of parameter values,
we get
p(sk ) → p(0),
thus proving that the optimal value of the approximate problem converges to the
optimal value of the original problem.
(b) We have by assumption that sk → 0 with xk ∈ P1/sk . It follows from part (a)
that p(sk ) → p(0), so Exercise 2.10(b) implies that the sequence {xk } is bounded
and all its limit points are optimal solutions of the original problem.
2.13 (Approximation by Envelope Functions [RoW98])
h(w, x, c) = f(w) + (1/(2c))‖w − x‖²   if c ∈ (0, c0],
h(w, x, c) = f(x)                       if c = 0 and w = x,
h(w, x, c) = ∞                          otherwise,
and
Pc f (x) = P (x, c) = arg min h(w, x, c).
w
We now show that, for the function h(w, x, c), the assumptions given in Exercise
2.10 are satisfied.
We have that h(w, x, c) > −∞ for all (w, x, c), since by assumption f (x) >
−∞ for all x ∈ n . Furthermore, h(w, x, c) < ∞ for at least one vector (w, x, c),
since by assumption f (x) < ∞ for at least one vector x ∈ X.
We next show that the function h is lower semicontinuous in (w, x, c). This
is easily seen at all points where c ∈ (0, c0 ] in view of the assumption that f is
lower semicontinuous and the function · 2 is lower semicontinuous.
We now consider points where c = 0 and w ≠ x. Let (wk, xk, ck) be a sequence that converges to some (w, x, 0) with w ≠ x. We can assume without loss of generality that wk ≠ xk for all k. Note that for all k, we have

h(wk, xk, ck) = ∞ if ck = 0,   and   h(wk, xk, ck) = f(wk) + (1/(2ck))‖wk − xk‖² if ck > 0.
for some scalar α, with (xk , ck ) → (x∗ , c∗ ), and wk → ∞. Then, for sufficiently
large k, we have wk = xk , which in view of Eq. (2.9) and the definition of the
function h, implies that ck ∈ (0, c0 ] and
f(wk) + (1/(2ck))‖wk − xk‖² ≤ α,
for all sufficiently large k. In particular, since ck ≤ c0 , it follows from the pre-
ceding relation that
f(wk) + (1/(2c0))‖wk − xk‖² ≤ α.   (2.10)
The choice of c0 ensures, through the definition of cf, the existence of some c1 > c0, some x̄ ∈ ℝⁿ, and some scalar β such that

f(w) ≥ −(1/(2c1))‖w − x̄‖² + β,   ∀ w.
−(1/(2c1))‖wk − x̄‖² + (1/(2c0))‖wk − xk‖² ≤ α − β,
for all sufficiently large k. Dividing this relation by ‖wk‖² and taking the limit as k → ∞, we get

−1/(2c1) + 1/(2c0) ≤ 0,
from which it follows that c1 ≤ c0 . This is a contradiction by our choice of c1 .
Hence, the function h(w, x, c) satisfies all the assumptions of Exercise 2.10.
By assumption, we have that f(x̄) < ∞ for some x̄ ∈ ℝⁿ. Using the definition of ec f(x), this implies that

ec f(x) = inf_w { f(w) + (1/(2c))‖w − x‖² } ≤ f(x̄) + (1/(2c))‖x̄ − x‖² < ∞,   ∀ x ∈ ℝⁿ,
2.14 (Envelopes and Proximal Mappings under Convexity [RoW98])
gc(x, w) = f(w) + (1/(2c))‖w − x‖².
(a) In order to show that cf is ∞, it suffices to show that ec f(0) > −∞ for all c > 0. This will follow from Weierstrass' Theorem, once we show the boundedness of the level sets of gc(0, ·). Assume the contrary, i.e., there exists some α ∈ ℝ and a sequence {xk} such that ‖xk‖ → ∞ and

gc(0, xk) = f(xk) + (1/(2c))‖xk‖² ≤ α,   ∀ k.   (2.11)
Assume without loss of generality that ‖xk‖ > 1 for all k. We fix an x0 with f(x0) < ∞. We define

τk = 1/‖xk‖ ∈ (0, 1),

and

x̄k = (1 − τk)x0 + τk xk.
Since ‖xk‖ → ∞, it follows that τk → 0. Using Eq. (2.11) and the convexity of f, we obtain

f(x̄k) ≤ (1 − τk)f(x0) + τk f(xk)
      ≤ (1 − τk)f(x0) + τk α − (τk/(2c))‖xk‖².

Taking the limit as k → ∞ in the above equation, we see that f(x̄k) → −∞. It follows from the definitions of τk and x̄k that
that the level sets of the function gc (0, ·) are bounded. Therefore, using Weier-
strass’ Theorem, we have that the infimum in ec f (0) = inf w gc (0, w) is attained,
and ec f (0) > −∞ for every c > 0. This shows that the supremum cf of all c > 0,
such that ec f (x) > −∞ for some x ∈ n , is ∞.
(b) Since the value cf is equal to ∞ by part (a), it follows that ec f and Pc f have
all the properties given in Exercise 2.13 for all c > 0: The set Pc f (x) is nonempty
and compact, and the function ec f (x) is finite for all x, and is continuous in (x, c).
Consider a sequence {wk } with wk ∈ Pck f (xk ) for some sequences xk → x∗ and
ck → c∗ > 0. Then, it follows from Exercise 2.13(b) that the sequence {wk }
is bounded and all its limit points belong to the set Pc∗ f (x∗ ). Since gc (x, w) is
strictly convex in w, it follows from Prop. 2.1.2 that the proximal mapping Pc f is
single-valued. Hence, we have that Pc f (x) → Pc∗ f (x∗ ) whenever (x, c) → (x∗ , c∗ )
with c∗ > 0.
(c) The envelope function ec f is convex by Exercise 2.15 [since gc (x, w) is convex
in (x, w)], and continuous by Exercise 2.13. We now prove that it is differentiable.
Consider any point x, and let w = Pc f (x). We will show that ec f is differentiable
at x with
∇ec f(x) = (x − w)/c.

Equivalently, we will show that the function h given by

h(u) = ec f(x + u) − ec f(x) − ((x − w)/c)′u   (2.12)
is differentiable at 0 with ∇h(0) = 0. Since w = Pc f(x), we have

ec f(x) = f(w) + (1/(2c))‖w − x‖²,

whereas

ec f(x + u) ≤ f(w) + (1/(2c))‖w − (x + u)‖²,   ∀ u,

so that

h(u) ≤ (1/(2c))‖w − (x + u)‖² − (1/(2c))‖w − x‖² − (1/c)(x − w)′u = (1/(2c))‖u‖²,   ∀ u.   (2.13)
Since ec f is convex, it follows from Eq. (2.12) that h is convex, and therefore,
1 1 1 1
0 = h(0) = h u + (−u) ≤ h(u) + h(−u),
2 2 2 2
which implies that h(u) ≥ −h(−u). From Eq. (2.13), we obtain
−h(−u) ≥ −(1/(2c)) ||−u||^2 = −(1/(2c)) ||u||^2, ∀ u,
which together with the preceding relation yields
h(u) ≥ −(1/(2c)) ||u||^2, ∀ u.
Thus, we have
|h(u)| ≤ (1/(2c)) ||u||^2, ∀ u,
which implies that h is differentiable at 0 with ∇h(0) = 0. From the formula
for ∇ec f (·) and the continuity of Pc f (·), it also follows that ec is continuously
differentiable.
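As a quick numerical sanity check of the gradient formula just derived (an illustration added here, not part of the original solution), one may take f(w) = |w| on the real line, for which the proximal mapping is the soft-thresholding operation and ec f is the Huber function. The following sketch assumes NumPy is available; the function names are illustrative.

import numpy as np

def prox_abs(x, c):
    # P_c f(x) = argmin_w { |w| + (1/(2c))(w - x)^2 }  (soft-thresholding)
    return np.sign(x) * max(abs(x) - c, 0.0)

def env_abs(x, c):
    # e_c f(x) = min_w { |w| + (1/(2c))(w - x)^2 }  (Huber function)
    w = prox_abs(x, c)
    return abs(w) + (w - x) ** 2 / (2 * c)

c, x, h = 0.5, 0.3, 1e-6
print((env_abs(x + h, c) - env_abs(x - h, c)) / (2 * h))  # central difference
print((x - prox_abs(x, c)) / c)                           # formula derived above

Both printed values agree with ∇ec f(x) = x/c = 0.6, since here |x| ≤ c and Pc f(x) = 0.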
2.15
(a) In view of the assumption that int(C1) and C2 are disjoint and convex [cf. Prop. 1.2.1(d)], it follows from the Separating Hyperplane Theorem that there exists a vector a ≠ 0 such that
a x1 ≤ a x2, ∀ x1 ∈ int(C1), ∀ x2 ∈ C2.
a x ≤ b, ∀ x ∈ int(C1). (2.14)
We claim that the closed halfspace {x | a x ≥ b}, which contains C2, does not intersect int(C1).
Assume to arrive at a contradiction that there exists some x1 ∈ int(C1) such that a x1 ≥ b. Since x1 ∈ int(C1), there exists some ε > 0 such that x1 + εa ∈ int(C1), and
C1 = (x1 , x2 ) | x1 = 0 ,
C2 = (x1 , x2 ) | x1 > 0, x2 x1 ≥ 1 .
These two sets are convex and C2 is disjoint from ri(C1), which is equal to C1. The only separating hyperplane is the x2 axis, which corresponds to having a = (1, 0), as defined in part (a). For this example, there does not exist a closed halfspace
that contains C2 but is disjoint from ri(C1 ).
2.16
2.17 (Strong Separation)
(a) We first show that (i) implies (ii). Suppose that C1 and C2 can be separated strongly. By definition, this implies that for some nonzero vector a ∈ ℝn, b ∈ ℝ, and ε > 0, we have
C1 + εB ⊂ {x | a x > b},
C2 + εB ⊂ {x | a x < b},
where B denotes the closed unit ball. Since a ≠ 0, we also have
proving (ii).
Next, we show that (ii) implies (iii). Suppose that (ii) holds, i.e., there exists some vector a ∈ ℝn such that
0 < inf_{x1∈C1, x2∈C2} a (x1 − x2) ≤ inf_{x1∈C1, x2∈C2} ||a|| ||x1 − x2||.
It follows that
inf_{x1∈C1, x2∈C2} ||x1 − x2|| > 0.
From this we obtain for all x1 ∈ C1, all x2 ∈ C2, and for all y1, y2 with ||y1|| ≤ ε, ||y2|| ≤ ε,
which implies that 0 ∉ (C1 + εB) − (C2 + εB). Therefore, the convex sets C1 + εB and C2 + εB are disjoint. By the Separating Hyperplane Theorem, we see that C1 + εB and C2 + εB can be separated, i.e., C1 + εB and C2 + εB lie in opposite closed halfspaces associated with the hyperplane that separates them. Then, the sets C1 + (ε/2)B and C2 + (ε/2)B lie in opposite open halfspaces, which by definition implies that C1 and C2 can be separated strongly.
(b) Since C1 and C2 are disjoint, we have 0 ∉ (C1 − C2). Any one of conditions (2)-(5) of Prop. 2.4.3 implies condition (1) of that proposition (see the discussion in the proof of Prop. 2.4.3), which states that the set C1 − C2 is closed, i.e.,
cl(C1 − C2) = C1 − C2.
From part (a), it follows that there exists a hyperplane separating C1 and C2 strongly.
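As an illustration of part (b) (an example added here, not part of the exercise), disjointness alone does not suffice: the closed convex sets C1 = {(ξ1, ξ2) | ξ2 ≤ 0} and C2 = {(ξ1, ξ2) | ξ2 ≥ e^{−ξ1}} are disjoint, yet the distance between them is 0, so by part (a) they cannot be separated strongly; here C1 − C2 is the open halfplane {(ξ1, ξ2) | ξ2 < 0}, which is not closed, so none of the conditions (2)-(5) of Prop. 2.4.3 can hold.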
2.18
(a) If C1 and C2 can be separated properly, we have from the Proper Separation Theorem that there exists a vector a ≠ 0 such that
Let
b = sup a x. (2.18)
x∈C2
H = {x | a x = b}.
We choose b to be
b = sup a x, (2.20)
x∈C2
K = {x | a x ≤ b},
b ≤ β < a x, ∀ x ∈ C1 ,
where RM denotes the recession cone of set M . Similarly, we have
LM ∩ N(A) = {(y, 0) | y ∈ LX ∩ Lg1 ∩ · · · ∩ Lgr},
where LM denotes the lineality space of set M . Under conditions (1), (2), and
(3), it follows from Prop. 1.5.8 that the set AM = C is closed. Similarly, under
condition (4), it follows from Exercise 2.7(b) that the set AM = C is closed.
By assumption, there is no vector x ∈ X such that
g1 (x) ≤ 0, . . . , gr (x) ≤ 0.
This implies that the origin does not belong to C. Therefore, by the Strict Sep-
aration Theorem, it follows that there exists a hyperplane that strictly separates
the origin and the set C, i.e., there exists a vector µ such that
0 < ε ≤ µ u, ∀ u ∈ C. (2.21)
g1 (x) ≤ 0, . . . , gr (x) ≤ 0.
This implies by part (a) that there exists a positive scalar ε, and a vector µ ∈ ℝr with µ ≥ 0, such that
µ1 g1(x) + · · · + µr gr(x) ≥ ε, ∀ x ∈ X. (2.22)
Let x be an arbitrary vector in X and let j(x) be the smallest index that attains the maximum in maxj=1,...,r gj(x). Then Eq. (2.22) implies that for all x ∈ X,
g_{j(x)}(x) ≥ ε/(µ1 + · · · + µr) > 0.
This contradicts the statement that for every ε > 0, there exists a vector x ∈ X such that
g1(x) < ε, . . . , gr(x) < ε,
and concludes the proof.
2.21
(a) Let us denote the optimal values of the min common point problem and the max crossing point problem corresponding to conv(M) by w*_conv(M) and q*_conv(M), respectively. In view of the assumption that M is compact, it follows from Prop. 1.3.2 that the set conv(M) is compact. Therefore, by Weierstrass' Theorem, w*_conv(M), defined by
w*_conv(M) = inf_{(0,w)∈conv(M)} w,
is convex. Indeed, we consider vectors (u, w) ∈ conv(M ) and (ũ, w̃) ∈ conv(M ),
and we show that their convex combinations lie in conv(M ). The definition of
conv(M ) implies that there exists some wM and w̃M such that
wM ≤ w, (u, wM ) ∈ conv(M ),
Therefore, by Min Common/Max Crossing Theorem I, we have
w*_conv(M) = q*_conv(M). (2.23)
Let q* be the optimal value of the max crossing point problem corresponding to M, i.e.,
q* = sup_{µ∈ℝn} q(µ),
q* = q*_conv(M) = w*_conv(M),
We claim that limk→∞ f(xk) is finite, i.e., that limk→∞ f(xk) > −∞. Indeed, by Prop. 2.5.1, the epigraph of f is contained in the upper halfspace of a nonvertical hyperplane of ℝn+1. Since {xk} converges to x, the limit of f(xk) cannot be equal to −∞. Thus the sequence (xk, f(xk)), which belongs to M, converges to (x, limk→∞ f(xk)). Therefore, since M is closed, (x, limk→∞ f(xk)) ∈ M. By the definition of f, this implies that f(x) ≤ limk→∞ f(xk), contradicting our earlier hypothesis.
(c) We prove this result by showing that all the assumptions of Min Com-
mon/Max Crossing Theorem I are satisfied. By assumption, w∗ < ∞ and
the set M is convex. Therefore, we only need to show that for every sequence
{uk , wk } ⊂ M with uk → 0, there holds w∗ ≤ lim inf k→∞ wk .
Consider a sequence {(uk, wk)} ⊂ M with uk → 0. If lim infk→∞ wk = ∞, then we are done, so assume that lim infk→∞ wk = w̃ for some scalar w̃. Since M ⊂ M̄ and M̄ is closed by assumption, it follows that (0, w̃) ∈ M̄. By the definition of the set M̄, this implies that there exists some w with w ≤ w̃ and (0, w) ∈ M. Hence we have
q* = sup_{µ∈ℝm} q(µ),
f(x) ≤ w, ei x − di = ui, i = 1, . . . , m,
f(x̃) ≤ w̃, ei x̃ − di = ũi, i = 1, . . . , m.
For any α ∈ [0, 1], we multiply these relations with α and 1 − α, respectively, and add. By using the convexity of f, we obtain
f(αx + (1 − α)x̃) ≤ α f(x) + (1 − α) f(x̃) ≤ αw + (1 − α)w̃,
ei (αx + (1 − α)x̃) − di = αui + (1 − α)ũi, i = 1, . . . , m.
In view of the convexity of X, we have αx+(1−α)x̃ ∈ X, so these equations imply
that the convex combination of (u, w) and (ũ, w̃) belongs to M , thus proving that
M is convex.
(c) We prove this result by showing that all the assumptions of Min Com-
mon/Max Crossing Theorem I are satisfied. By assumption, w∗ is finite. It
follows from part (b) that the set M is convex. Therefore, we only need to
show that for every sequence {(uk, wk)} ⊂ M with uk → 0, there holds w* ≤ lim infk→∞ wk.
Consider a sequence {(uk, wk)} ⊂ M with uk → 0. Since X is compact and f is convex by assumption (which implies that f is continuous by Prop. 1.4.6), it follows from Prop. 1.1.9(c) that the set M is compact. Hence, the sequence {(uk, wk)} has a subsequence that converges to some (0, w) ∈ M. Assume without loss of generality that {(uk, wk)} converges to (0, w). Since (0, w) ∈ M, we get
w* = inf_{(0,w)∈M} w ≤ w = lim infk→∞ wk,
x̂(z) = arg min_{x∈X} φ(x, z),    ẑ(x) = arg max_{z∈Z} φ(x, z).
which is a continuous function in view of the assumption that the functions x̂(z)
and ẑ(x) are continuous over Z and X, respectively. Assume that the compact
interval X is given by [a, b]. We now show that the function f has a fixed point,
i.e., there exists some x∗ ∈ [a, b] such that
f (x∗ ) = x∗ .
Define the function g : X → ℝ by
g(x) = f(x) − x.
Assume that f(a) > a and f(b) < b, since otherwise we are done. We have
or equivalently, if x = x̂(z) and z = ẑ(x). Therefore, from Eq. (2.24), we see that
(x∗ , z ∗ ) is a saddle point of φ.
We now consider the function φ(x, z) = x2 + z 2 over X = [0, 1] and Z =
[0, 1]. For each z ∈ [0, 1], the function φ(·, z) is minimized over [0, 1] at a unique
point x̂(z) = 0, and for each x ∈ [0, 1], the function φ(x, ·) is maximized over
[0, 1] at a unique point ẑ(x) = 1. These two curves intersect at (x∗ , z ∗ ) = (0, 1),
which is the unique saddle point of φ.
Let X and Z be closed and convex sets. Then, for each z ∈ Z, the function tz : ℝn → (−∞, ∞] defined by
tz(x) = φ(x, z) if x ∈ X, and tz(x) = ∞ otherwise,
and
−∞ < sup inf φ(x, z).
z∈Z x∈X
where RX is the recession cone of the convex set X and N (Q) is the null space
of the matrix Q. Similarly, for each z ∈ Z, the constancy space of the function
tz is given by
Ltz = LX ∩ N (Q) ∩ {y | y Dz = 0},
then it follows from the Saddle Point Theorem part (a), that the set of saddle
points of φ is nonempty and compact. [In particular, the condition given in Eq.
(2.25) holds when Q and R are positive definite matrices, or if X and Z are
compact.]
Similarly, if
then it follows from the Saddle Point Theorem part (b), that the set of saddle
points of φ is nonempty.
Convex Analysis and
Optimization
Chapter 3 Solutions
Dimitri P. Bertsekas
with
(a) Let x̂ be the projection of x on C, which exists and is unique since C is closed
and convex. By the Projection Theorem (Prop. 2.2.1), we have
(x − x̂) (y − x̂) ≤ 0, ∀ y ∈ C.
(x − x̂) x̂ = 0.
(x − x̂) y ≤ 0, ∀ y ∈ C,
implying that x − x̂ ∈ C ∗ .
Conversely, if x̂ ∈ C, (x − x̂) x̂ = 0, and x − x̂ ∈ C ∗ , then it follows that
(x − x̂) (y − x̂) ≤ 0, ∀ y ∈ C,
x1 ∈ C, (x − x1 ) x1 = 0, x − x1 ∈ C ∗ .
x − y ∈ C = (C ∗ )∗ , y (x − y) = 0, y ∈ C ∗.
x1 ∈ C, (x − x1 ) x1 = 0, x − x1 ∈ C ∗ ,
x2 ∈ C ∗ , (x − x2 ) x2 = 0, x − x2 ∈ C
are satisfied, so that by part (a), x1 and x2 are the projections of x on C and
C ∗ , respectively. Hence, property (i) holds.
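As a simple illustration of this decomposition (added here, not part of the exercise), let C = {x ∈ ℝn | x ≥ 0}, so that C* = {y ∈ ℝn | y ≤ 0}. For any x, the vectors x1 and x2 with components
x1i = max{xi, 0},   x2i = min{xi, 0},   i = 1, . . . , n,
satisfy x1 ∈ C, x2 ∈ C*, x = x1 + x2, and x1 x2 = 0, so by part (a), x1 and x2 are the projections of x on C and C*, respectively.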
3.2
If a ∈ C* + {x | ||x|| ≤ γ/β}, then
Since C is a closed convex cone, by the Polar Cone Theorem (Prop. 3.1.1), we
have (C ∗ )∗ = C, implying that for all x in C with x ≤ β,
Hence,
a x = (â + ā) x ≤ γ, ∀ x ∈ C with ||x|| ≤ β,
thus implying that
max_{||x||≤β, x∈C} a x ≤ γ.
a (β ā/||ā||) = (â + ā) (β ā/||ā||) = β ||ā|| ≤ γ,
implying that ||ā|| ≤ γ/β, and showing that a ∈ C* + {x | ||x|| ≤ γ/β}.
3.3
y x ≤ 0, (−y) x ≤ 0, ∀ x ∈ C,
implying that
y x = 0, ∀ x ∈ C. (3.1)
Let the dimension of the subspace aff(C) be m. By Prop. 1.4.1, there exist vectors
x0 , x1 , . . . , xm in ri(C) such that x1 − x0 , . . . , xm − x0 span aff(C). Thus, for any
z ∈ aff(C), there exist scalars β1 , . . . , βm such that
z = Σ_{i=1}^m βi (xi − x0).
By using this relation and Eq. (3.1), for any z ∈ aff(C), we obtain
y z = Σ_{i=1}^m βi y (xi − x0) = 0,
implying that y ∈ aff(C)⊥. Hence, L_{C*} ⊂ aff(C)⊥.
Conversely, let y ∈ aff(C)⊥, so that in particular, we have
y x = 0, (−y) x = 0, ∀ x ∈ C.
dim(C) + dim(LC ∗ ) = n.
Furthermore, since
Lconv(C) ⊂ Lcl(conv(C)) ,
it follows that
dim(C ∗ ) + dim Lconv(C) ≤ dim(C ∗ ) + dim Lcl(conv(C)) = n.
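As a quick check of the relation dim(C) + dim(L_{C*}) = n (an illustration added here, not part of the exercise), take C = {(x1, 0) | x1 ≥ 0} in ℝ2. Then C* = {(y1, y2) | y1 ≤ 0}, whose lineality space is L_{C*} = {(0, y2) | y2 ∈ ℝ}, so that dim(C) + dim(L_{C*}) = 1 + 1 = 2 = n.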
(a) It suffices to consider the case where m = 2. Let (y1 , y2 ) ∈ (C1 × C2 )∗ . Then,
we have (y1 , y2 ) (x1 , x2 ) ≤ 0 for all (x1 , x2 ) ∈ C1 × C2 , or equivalently
y1 x1 + y2 x2 ≤ 0, ∀ x1 ∈ C1 , ∀ x2 ∈ C2 .
Conversely, let y1 ∈ C1∗ and y2 ∈ C2∗ . Then, we have
implying that (y1 , y2 ) ∈ (C1 × C2 )∗ , and showing that C1∗ × C2∗ ⊂ (C1 × C2 )∗ .
(b) A vector y belongs to the polar cone of ∪i∈I Ci if and only if y x ≤ 0 for all x ∈ Ci and all i ∈ I, which is equivalent to having y ∈ Ci* for every i ∈ I. Hence, y belongs to (∪i∈I Ci)* if and only if y belongs to ∩i∈I Ci*.
(c) Let y ∈ (C1 + C2 )∗ , so that
y (x1 + x2 ) ≤ 0, ∀ x1 ∈ C1 , ∀ x2 ∈ C2 . (3.2)
y x1 ≤ 0, ∀ x1 ∈ C1 ,
y x2 ≤ 0, ∀ x2 ∈ C2 .
y x1 ≤ 0, ∀ x1 ∈ C1 ,
y x2 ≤ 0, ∀ x2 ∈ C2 ,
implying that
y (x1 + x2 ) ≤ 0, ∀ x1 ∈ C1 , ∀ x2 ∈ C2 .
By taking the polars and by using the Polar Cone Theorem, we obtain
(C1 ∩ C2)* = ((C1* + C2*)*)* = cl(conv(C1* + C2*)).
Suppose now that ri(C1) ∩ ri(C2) ≠ Ø. We will show that C1* + C2* is closed by using Exercise 1.43. According to this exercise, if for any nonempty closed convex sets C̄1 and C̄2 in ℝn, the equality y1 + y2 = 0 with y1 ∈ R_{C̄1} and y2 ∈ R_{C̄2} implies that y1 and y2 belong to the lineality spaces of C̄1 and C̄2, respectively, then the vector sum C̄1 + C̄2 is closed.
Let y1 + y2 = 0 with y1 ∈ R_{C1*} and y2 ∈ R_{C2*}. Because C1* and C2* are closed convex cones, we have R_{C1*} = C1* and R_{C2*} = C2*, so that y1 ∈ C1* and y2 ∈ C2*. The lineality space of a cone is the set of vectors y such that y and −y belong to the cone, so that in view of the preceding discussion, to show that C1* + C2* is closed, it suffices to prove that −y1 ∈ C1* and −y2 ∈ C2*.
Since y1 = −y2 and y1 ∈ C1∗ , it follows that
y2 x ≥ 0, ∀ x ∈ C1 , (3.3)
y2 x ≤ 0, ∀ x ∈ C2 ,
y2 x = 0, ∀ x ∈ C1 ∩ C2 . (3.4)
In view of the fact ri(C1) ∩ ri(C2) ≠ Ø, and Eqs. (3.3) and (3.4), it follows that
the linear function y2 x attains its minimum over the convex set C1 at a point in
the relative interior of C1 , implying that y2 x = 0 for all x ∈ C1 (cf. Prop. 1.4.2).
Therefore, y2 ∈ C1∗ and since y2 = −y1 , we have −y1 ∈ C1∗ . By exchanging the
roles of y1 and y2 in the preceding analysis, we similarly show that −y2 ∈ C2∗ ,
completing the proof.
(e) By drawing the cones C1 and C2, it can be seen that ri(C1) ∩ ri(C2) = Ø and
C1 ∩ C2 = {(x1, x2, x3) | x1 = 0, x2 = −x3, x3 ≤ 0},
C1* = {(y1, y2, y3) | y1^2 + y2^2 ≤ y3^2, y3 ≥ 0},
C2* = {(z1, z2, z3) | z1 = 0, z2 = z3}.
may fail.
3.5 (Linear Transformations and Polar Cones)
Therefore, a ∈ (cl(A K*))*, and since (cl(A K*))* ⊂ (A K*)*, it follows that a ∈ (A K*)*. In view of Eq. (3.5) and the Polar Cone Theorem (Prop. 3.1.1), we have
(A K*)* = A−1 · (K*)* = A−1 · K,
implying that a ∈ A−1 · K. Because y ∈ (A−1 · K)*, it follows that y a ≤ 0, contradicting Eq. (3.6). Hence, we must have y ∈ cl(A K*), showing that
(A−1 · K)* ⊂ cl(A K*).
Similar to the preceding analysis, since (A−1 · K)∗ is a cone, it can be seen that
implying that a ∈ ((A−1 · K)*)*. Since K is a closed convex cone and A is a linear (and therefore continuous) transformation, the set A−1 · K is a closed convex cone. Furthermore, by the Polar Cone Theorem, we have that ((A−1 · K)*)* = A−1 · K. Therefore, a ∈ A−1 · K, implying that Aa ∈ K. Since y ∈ A K*, we have y = A v for some v ∈ K*, and it follows that
y a = (A v) a = v Aa ≤ 0,
contradicting Eq. (3.7). Hence, we must have y ∈ (A−1 · K)*, implying that
A K* ⊂ (A−1 · K)*.
cl(A K ∗ ) = A K ∗ .
(−y) x ≥ 0, ∀ x ∈ K. (3.8)
For y ∈ N (A ), we have −y ∈ N (A ) and since N (A ) = R(A)⊥ , it follows that
In view of the relation ri(K) ∩ R(A) ≠ Ø, and Eqs. (3.8) and (3.9), the linear
function (−y) x attains its minimum over the convex set K at a point in the
relative interior of K, implying that (−y) x = 0 for all x ∈ K (cf. Prop. 1.4.2).
Hence (−y) ∈ K ∗ , so that y ∈ LK ∗ and because y ∈ N (A ), we see that y ∈
LK ∗ ∩N (A ). The reverse inclusion follows directly from the relation LK ∗ ⊂ RK ∗ ,
thus completing the proof.
which when combined with the preceding relation yields cl(C ∗ − C ∗ ) = n .
(b) ⇒ (c) Since C is a closed convex cone, by the polar cone operations of Exercise
3.4, it follows that
(C ∩ (−C))* = cl(C* − C*) = ℝn.
By taking the polars and using the Polar Cone Theorem (Prop. 3.1.1), we obtain
C ∩ (−C) = ((C ∩ (−C))*)* = {0}. (3.10)
a x̂ ≥ a x, ∀ x ∈ C∗ − C∗.
(C ∗ − C ∗ )∗ = (C ∗ )∗ ∩ (−C ∗ )∗ = C ∩ (−C).
(v + δ y/||y||) x ≤ 0, ∀ x ∈ C, ∀ y ∈ ℝn, y ≠ 0.
Letting y = x for x ≠ 0, this implies that
v x + δ||x|| ≤ 0, ∀ x ∈ C, x ≠ 0,
and hence
v x ≤ −δ||x||, ∀ x ∈ C.
Multiplying the preceding relation with −1 and letting x̂ = −v, we obtain
x̂ x ≥ δ||x||, ∀ x ∈ C.
Then, D is a closed convex set since it is the intersection of the closed convex cone C and the closed convex set {y | x̂ y = 1}. Obviously, 0 ∉ D. Thus, to show that D is a base for C, it remains to prove that C = cone(D). Take any x ∈ C. If x = 0, then x ∈ cone(D) and we are done, so assume that x ≠ 0. We have by hypothesis
x̂ x ≥ δ||x|| > 0, ∀ x ∈ C, x ≠ 0,
so we may define ŷ = x/(x̂ x). Clearly, ŷ ∈ D and x = (x̂ x)ŷ with x̂ x > 0, showing that x ∈ cone(D) and that C ⊂ cone(D). Since D ⊂ C, the inclusion cone(D) ⊂ C is obvious. Thus, C = cone(D) and D is a base for C. Furthermore, for every y in D, since y is also in C, we have
1 = x̂ y ≥ δ||y||,
λk d = µk yk, ∀ k.
Therefore, yk = (λk/µk) d ∈ D for all k and because D is bounded, the sequence {yk} has a subsequence converging to some y ∈ cl(D). Without loss of generality, we may assume that yk → y, which in view of yk = (λk/µk) d for all k, implies that y = αd and αd ∈ cl(D) for some α ≥ 0. Furthermore, by the definition of base, we have 0 ∉ cl(D), so that α > 0. Similar to the preceding, by replacing d with −d, we can show that α̃(−d) ∈ cl(D) for some positive scalar α̃. Therefore, αd ∈ cl(D) and α̃(−d) ∈ cl(D) with α > 0 and α̃ > 0. Since D is convex, its closure cl(D) is also convex, implying that 0 ∈ cl(D), contradicting the definition of a base.
Hence, the cone C must be pointed.
3.7
for some vectors aj in n . By Farkas’ Lemma [Prop. 3.2.1(b)], we have
C ∗ = cone {a1 , . . . , ar } ,
so the polar cone of a polyhedral cone is finitely generated. Conversely, using the
Polar Cone Theorem, we have
∗
cone {a1 , . . . , ar } = x | aj x ≤ 0, j = 1, . . . , r ,
so the polar of a finitely generated cone is polyhedral. Thus, a closed convex cone
is polyhedral if and only if its polar cone is finitely generated. By the Minkowski-
Weyl Theorem [Prop. 3.2.1(c)], a cone is finitely generated if and only if it is
polyhedral. Therefore, a closed convex cone is polyhedral if and only if its polar
cone is polyhedral.
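For instance (an illustration added here, not part of the exercise), consider the polyhedral cone C = {x ∈ ℝ2 | −x1 ≤ 0, x1 − x2 ≤ 0} = {x | 0 ≤ x1 ≤ x2}. By Farkas' Lemma, its polar is the finitely generated cone C* = {µ1(−1, 0) + µ2(1, −1) | µ1 ≥ 0, µ2 ≥ 0}, which, by the Minkowski-Weyl Theorem, is itself polyhedral, in agreement with the equivalence just established.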
3.8
implying that
limk→∞ ||y − (1/k) yk|| = 0.
Therefore, the sequence {(1/k) yk} converges to y. Since yk ∈ C for all k ≥ 1, the sequence {(1/k) yk} is in C, and by the closedness of C, it follows that y ∈ C.
Hence, RP ⊂ C.
(b) Any point in P has the form v + y with v ∈ conv({v1, . . . , vm}) and y ∈ C, or equivalently
v + y = (1/2) v + (1/2)(v + 2y),
with v and v + 2y being two distinct points in P if y ≠ 0. Therefore, none of the points v + y, with v ∈ conv({v1, . . . , vm}) and y ∈ C, is an extreme point of P if y ≠ 0. Hence, an extreme point of P must be in the set {v1, . . . , vm}. Since by definition, an extreme point of P is not a convex combination of points in P, an extreme point of P must be equal to some vi that cannot be expressed as a convex combination of the vectors vj with vj ≠ vi.
3.9 (Polyhedral Cones and Sets under Linear Transformations)
C = {x | x = Σ_{j=1}^r µj aj, µj ≥ 0, j = 1, . . . , r},
AC = {y | y = Ax, x ∈ C} = {y | y = Σ_{j=1}^r µj Aaj, µj ≥ 0, j = 1, . . . , r},
A−1 · K = {x | Ax ∈ K}
= x | dj Ax ≤ 0, j = 1, . . . , r
= x | (A dj ) x ≤ 0, j = 1, . . . , r ,
P = {x | x = Σ_{j=1}^m µj vj + y, Σ_{j=1}^m µj = 1, µj ≥ 0, j = 1, . . . , m, y ∈ C},
AP = {z | z = Ax, x ∈ P}
   = {z | z = Σ_{j=1}^m µj Avj + Ay, Σ_{j=1}^m µj = 1, µj ≥ 0, j = 1, . . . , m, Ay ∈ AC}.
AP = {z | z = Σ_{j=1}^m µj wj + u, Σ_{j=1}^m µj = 1, µj ≥ 0, j = 1, . . . , m, u ∈ AC}
   = conv({w1, . . . , wm}) + AC,
where w1 , . . . , wm ∈ m . By part (a), the cone AC is polyhedral, implying by the
Minkowski-Weyl Theorem [Prop. 3.2.1(c)] that AC is finitely generated. Hence,
the set AP has a Minkowski-Weyl representation and therefore, it is polyhedral
(cf. Prop. 3.2.2).
Let also Q be a polyhedral set in m given by
Q = y | dj y ≤ bj , j = 1, . . . , r ,
A−1 · Q = {x | Ax ∈ Q}
= x | dj Ax ≤ bj , j = 1, . . . , r
= x | (A dj ) x ≤ bj , j = 1, . . . , r ,
3.10
where a1 , . . . , ar1 and ã1 , . . . , ãr2 are some vectors in n1 and n2 , respectively.
Define
aj = (aj , 0), ∀ j = 1, . . . , r1 ,
aj = (0, ãj ), ∀ j = r1 + 1, . . . , r1 + r2 .
aj x1 ≤ 0, ∀ j = 1, . . . , r1 ,
ãj x2 ≤ 0, ∀ j = r1 + 1, . . . , r1 + r2 ,
or equivalently
aj (x1 , x2 ) ≤ 0, ∀ j = 1, . . . , r1 + r2 .
Therefore,
C1 × C2 = x ∈ n1 +n2 | aj x ≤ 0, j = 1, . . . , r1 + r2 ,
By part (a), the Cartesian product C1 × C2 is a polyhedral cone in n+n .
Under the linear transformation A that maps (x1 , x2 ) ∈ n+n into x1 + x2 ∈
n , the image A · (C1 × C2 ) is the set C1 + C2 , which is a polyhedral cone by
Exercise 3.9(a).
(c) Let P1 and P2 be polyhedral sets in n1 and n2 , respectively, given by
P1 = x1 ∈ n1 | aj x1 ≤ bj , j = 1, . . . , r1 ,
P2 = x2 ∈ n2 | ãj x2 ≤ b̃j , j = 1, . . . , r2 ,
where a1 , . . . , ar1 and ã1 , . . . , ãr2 are some vectors in n1 and n2 , respectively,
and bj and b̃j are some scalars. By defining
aj = (aj , 0), b j = bj , ∀ j = 1, . . . , r1 ,
3.11
P = {x | x = Σ_{j=1}^m µj vj + y, Σ_{j=1}^m µj = 1, µj ≥ 0, j = 1, . . . , m, y ∈ C},
C = {y | y = Σ_{i=1}^r λi ai, λi ≥ 0, i = 1, . . . , r},
P = {x | x = Σ_{j=1}^m µj vj + Σ_{i=1}^r λi ai, Σ_{j=1}^m µj = 1, µj ≥ 0, ∀ j, λi ≥ 0, ∀ i}.
We claim that
cone(P) = cone({v1, . . . , vm, a1, . . . , ar}).
Since P ⊂ cone({v1, . . . , vm, a1, . . . , ar}), it follows that
cone(P) ⊂ cone({v1, . . . , vm, a1, . . . , ar}).
Conversely, let y ∈ cone({v1, . . . , vm, a1, . . . , ar}). Then, we have
y = Σ_{j=1}^m µj vj + Σ_{i=1}^r λi ai,
with µj ≥ 0 and λi ≥ 0 for all i and j. If µj = 0 for all j, then y = Σ_{i=1}^r λi ai ∈ C, and since C = RP (cf. Exercise 3.8), it follows that y ∈ RP. Because the origin belongs to P and y ∈ RP, we have 0 + y ∈ P, implying that y ∈ P, and consequently y ∈ cone(P). If µj > 0 for some j, then by setting µ = Σ_{j=1}^m µj, µ̄j = µj/µ for all j, and λ̄i = λi/µ for all i, we obtain
y = µ (Σ_{j=1}^m µ̄j vj + Σ_{i=1}^r λ̄i ai),
where µ > 0, µ̄j ≥ 0 with Σ_{j=1}^m µ̄j = 1, and λ̄i ≥ 0. Therefore y = µ x̄ with x̄ ∈ P and µ > 0, implying that y ∈ cone(P) and showing that
cone({v1, . . . , vm, a1, . . . , ar}) ⊂ cone(P).
J = {j | bj = 0}.
Conversely, let x ∈ {x | aj x ≤ 0, j ∈ J}. We will show that x ∈ cone(P). If x ∈ P, then x ∈ cone(P) and we are done, so assume that x ∉ P, implying that the set
J̄ = {j ∉ J | aj x > bj} (3.11)
is nonempty. Let
µ = min_{j∈J̄} bj/(aj x),
and note that µ ∈ (0, 1). Then
aj (µx) ≤ 0, ∀ j ∈ J,
aj (µx) ≤ bj, ∀ j ∈ J̄.
For j ∉ J ∪ J̄ with aj x ≤ 0 < bj, since µ > 0, we still have aj (µx) ≤ 0 < bj. For j ∉ J ∪ J̄ with 0 < aj x ≤ bj, since µ < 1, we have 0 < aj (µx) < bj. Therefore, µx ∈ P, implying that x = (1/µ)(µx) ∈ cone(P). It follows that
{x | aj x ≤ 0, j ∈ J} ⊂ cone(P),
If J = Ø, then we will show that cone(P) = ℝn. To see this, take any x ∈ ℝn. If x ∈ P, then clearly x ∈ cone(P), so assume that x ∉ P, implying that the set J̄ as defined in Eq. (3.11) is nonempty. Note that bj > 0 for all j, since J is empty. The rest of the proof is similar to the preceding case.
As an example where cone(P) is not polyhedral when P does not contain the origin, consider the polyhedral set P ⊂ ℝ2 given by
P = {(x1, x2) | x1 ≥ 0, x2 = 1}.
Then, we have
cone(P) = {(x1, x2) | x1 > 0, x2 > 0} ∪ {(x1, x2) | x1 = 0, x2 ≥ 0},
f2(x) = max{ā1 x + b̄1, . . . , ām̄ x + b̄m̄}, ∀ x ∈ dom(f2),
where ai and āi are vectors in ℝn, and bi and b̄i are scalars. The domain of f1 + f2 coincides with dom(f1) ∩ dom(f2), which is polyhedral by Exercise 3.10(d). Furthermore, we have for all x ∈ dom(f1 + f2),
f1(x) + f2(x) = max{a1 x + b1, . . . , am x + bm} + max{ā1 x + b̄1, . . . , ām̄ x + b̄m̄}
             = max_{1≤i≤m, 1≤j≤m̄} {ai x + bi + āj x + b̄j}
             = max_{1≤i≤m, 1≤j≤m̄} {(ai + āj) x + (bi + b̄j)}.
3.14 (Existence of Minima of Polyhedral Functions)
If the set of minima of f over P is nonempty, then evidently inf x∈P f (x) must
be finite.
Conversely, suppose that inf x∈P f (x) is finite. Since f is a polyhedral
function, by Prop. 3.2.3, we have
f(x) = max{a1 x + b1, . . . , am x + bm}, ∀ x ∈ dom(f),
minimize y
subject to aj x + bj ≤ y, j = 1, . . . , m, x ∈ P, y ∈ ℝ.
minimize c z
subject to z ∈ P̂ ,
where P̂ is polyhedral (P̂ ≠ Ø since P ≠ Ø). Furthermore, because inf_{x∈P} f(x) is finite, it follows that inf_{z∈P̂} c z is also finite. Thus, by Prop. 2.3.4 of Chapter 2, the set Z* of minimizers of c z over P̂ is nonempty, and the nonempty set {x | z = (x, y), z ∈ Z*} is the set of minimizers of f over P.
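To make this reformulation concrete (a small illustration added here, not part of the original solution), consider minimizing the polyhedral function f(x) = max{x − 1, −2x + 3} over P = {x | 0 ≤ x ≤ 4}. The corresponding linear program in the variables z = (x, y) is: minimize y subject to x − 1 ≤ y, −2x + 3 ≤ y, 0 ≤ x ≤ 4. The sketch below assumes SciPy is available.

from scipy.optimize import linprog

c = [0.0, 1.0]                       # objective: minimize y
A_ub = [[1.0, -1.0],                 #  x - y <= 1   (i.e.,  x - 1 <= y)
        [-2.0, -1.0]]                # -2x - y <= -3 (i.e., -2x + 3 <= y)
b_ub = [1.0, -3.0]
bounds = [(0.0, 4.0), (None, None)]  # x in P, y free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)  # approximately (4/3, 1/3)

The minimum of f over P is attained at x = 4/3, where the two affine pieces cross, with optimal value 1/3, in agreement with the LP solution.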
We use induction on the dimension of the set X. Suppose that the dimension
of X is 0. Then, X consists of a single point, which is the global minimum of f
over X.
Assume that, for some l < n, f attains its minimum over every set X of
dimension less than or equal to l that is specified by linear inequality constraints,
and is such that f is bounded over X. Let X be of the form
X = {x | aj x ≤ bj , j = 1, . . . , r},
have dimension l + 1, and be such that f is bounded over X. We will show that
f attains its minimum over X.
If X is a bounded polyhedral set, f attains a minimum over X by Weier-
strass’ Theorem. We thus assume that X is unbounded. Using the the Minkowski-
Weyl representation, we can write X as
X = {x | x = v + αy, v ∈ V, y ∈ C, α ≥ 0},
where V is the convex hull of finitely many vectors and C is the intersection of a
finitely generated cone with the surface of the unit sphere {x | x = 1}. Then,
for any x ∈ X and y ∈ C, the vector x + αy belongs to X for every positive scalar
α and
f (x + αy) = f (x) + α(c + x Q)y + α2 y Qy.
Since the minimization in the right hand side is over a compact set, it follows
from Weierstrass’ Theorem and the preceding relation that the minimum of f
over X is attained.
Next, assume that there exists some y ∈ C such that y Qy = 0. From
Exercise 3.8, it follows that y belongs to the recession cone of X, denoted by RX .
If y is in the lineality space of X, denoted by LX , the vector x + αy belongs to
X for every x ∈ X and every scalar α, and we have
Let S = {γy | γ ∈ } be the subspace generated by y and consider the following
decomposition of X:
X = S + (X ∩ S ⊥ ),
(cf. Prop. 1.5.4). Then, we can write any x ∈ X as x = z+αy for some z ∈ X ∩S ⊥
and some scalar α, and it follows from Eq. (3.12) that f (x) = f (z), which implies
that
inf f (x) = inf f (x).
x∈X x∈X∩S ⊥
It can be seen that the dimension of set X ∩ S ⊥ is smaller than the dimension
of set X. To see this, note that S ⊥ contains the subspace parallel to the affine
hull of X ∩ S ⊥ . Therefore, y does not belong to the subspace parallel to the
affine hull of X ∩ S ⊥ . On the other hand, y belongs to the subspace parallel to
the affine hull of X, hence showing that the dimension of set X ∩ S ⊥ is smaller
than the dimension of set X. Since X ∩ S ⊥ ⊂ X, f is bounded over X ∩ S ⊥ ,
so by using the induction hypothesis, it follows that f attains its minimum over
X ∩ S ⊥ , which, in view of the preceding relation, is also the minimum of f over
X.
Finally, assume that y is not in LX, i.e., y ∈ RX, but −y ∉ RX. The
recession cone of X is of the form
RX = {y | aj y ≤ 0, j = 1, . . . , r}.
Since y ∈ RX , we have
aj y ≤ 0, ∀ j = 1, . . . , r,
and since −y ∉ RX, the index set
J = {j | aj y < 0}
is nonempty.
Let {xk} be a minimizing sequence, i.e.,
f(xk) → f*,
where f* = inf_{x∈X} f(x). Suppose that for each k, we start at xk and move along −y as far as possible without leaving the set X, up to the point where we encounter the vector
x̄k = xk − βk y,
where βk is the nonnegative scalar given by
βk = min_{j∈J} (aj xk − bj)/(aj y).
By construction of the sequence {x̄k}, it follows that there exists some j0 ∈ J such that aj0 x̄k = bj0 for all k in an infinite index set K ⊂ {0, 1, . . .}. By reordering the linear inequalities if necessary, we can assume that j0 = 1, i.e.,
a1 x̄k = b1, ∀ k ∈ K.
Consider the set X̄ = {x ∈ X | a1 x = b1}, and note that {x̄k}K ⊂ X̄. The dimension of X̄ is smaller than the dimension of X. To see this, note that the set {x | a1 x = b1} contains X̄, so that a1 is orthogonal to the subspace SX̄ that is parallel to aff(X̄). Since a1 y < 0, it follows that y ∉ SX̄. On the other hand, y belongs to SX, the subspace that is parallel to aff(X), since for all k, we have xk ∈ X and xk − βk y ∈ X.
Since X̄ ⊂ X, f is also bounded over X̄, so it follows from the induction hypothesis that f attains its minimum over X̄ at some x̄*. Because {x̄k}K ⊂ X̄, and using also Eq. (3.13), we have
3.16
Assume that P has an extreme point, say v. Then, by Prop. 3.3.3(a), the set
Av = aj | aj v = bj , j ∈ {1, . . . , r}
L = {x + λd | λ ∈ },
3.17
has two extreme points, (0,0) and (0,1). Its image AC ⊂ is given by
AC = {x1 | x1 ≥ 0},
3.18
For the sets C1 and C2 as given in this exercise, the set C1 ∪ C2 is compact, and
its convex hull is also compact by Prop. 1.3.2 of Chapter 1. The set of extreme
points of conv(C1 ∪ C2 ) is not closed, since it consists of the two end points of the
line segment C1 , namely (0, 0, −1) and (0, 0, 1), and all the points x = (x1 , x2 , x3 )
such that
x ≠ 0, (x1 − 1)^2 + x2^2 = 1, x3 = 0.
3.19
By Prop. 3.3.2, a polyhedral set has a finite number of extreme points. Con-
versely, let P be a compact convex set having a finite number of extreme points
{v1 , . . . , vm }. By the Krein-Milman Theorem (Prop. 3.3.1), a compact
convex set
is equal to the convex hull of its extreme points, so that P = conv {v1 , . . . , vm } ,
which is a polyhedral set by the Minkowski-Weyl Representation Theorem (Prop.
3.2.2).
As an example showing that the assertion fails if compactness of the set
is replaced by a weaker assumption that the set is closed and contains no lines,
consider the set D ⊂ 3 given by
D = {(x1, x2, x3) | x1^2 + x2^2 ≤ 1, x3 = 1}.
Let C = cone(D). It can be seen that C is not a polyhedral set. On the other hand,
C is closed, convex, does not contain a line, and has a unique extreme point at
the origin.
[For a more formal argument, note that if C were polyhedral, then the set
D = C ∩ {(x1, x2, x3) | x3 = 1}
would also be polyhedral by Exercise 3.10(d), since both C and {(x1, x2, x3) | x3 = 1} are polyhedral sets. Thus, by Prop. 3.2.2, it would follow that D has a
finite number of extreme points. But this is a contradiction because the set of
extreme points of D coincides with (x1 , x2 , x3 ) | x21 + x22 = 1, x3 = 1 , which
contains an infinite number of points. Thus, C is not a polyhedral cone, and
therefore not a polyhedral set, while C is closed, convex, does not contain a line,
and has a unique extreme point at the origin.]
3.20 (Faces)
aj v = bj , ∀ j = 1, . . . , n
[cf. Prop. 3.3.3(a)]. Define the vector a ∈ ℝn, the scalar b, and the hyperplane H as follows:
a = (1/n) Σ_{j=1}^n aj,   b = (1/n) Σ_{j=1}^n bj,   H = {x | a x = b}.
Then, we have
a v = b,
aj v = bj , ∀ j = 1, . . . , n. (3.14)
By multiplying each of the above inequalities with 1/n and by summing the
obtained inequalities, we obtain
1 1
n n
aj v < bj ,
n n
j=1 j=1
implying that a v < b, which contradicts the fact that v ∈ H. Hence, Eq. (3.14)
holds, and since the vectors a1 , . . . , an are linearly independent, it follows that
v = v, showing that P ∩ H = {v}.
As discussed in Section 3.3, every extreme point of P is a relative boundary
point of P . Since every relative boundary point of P is also a boundary point of
P , it follows that every extreme point of P is a boundary point of P . Thus, v is
a boundary point of P , and as shown earlier, H passes through v and contains P
in one of its halfspaces. By definition, it follows that P ∩ H = {v} is a face of P .
(c) Since P is not an affine set, it cannot consist of a single point, so we must
have dim(P ) > 0. Let P be given by
P = x | aj x ≤ bj , j = 1, . . . , r ,
for some vectors aj ∈ n and scalars bj . Also, let A be the matrix with rows aj
and b be the vector with components bj , so that
P = {x | Ax ≤ b}.
there exists an inequality aj0 x ≤ bj0 that is not an implicit equality of the
system Ax ≤ b. Consider the set
F = x ∈ P | aj0 x = bj0 .
P = {x | Ax ≤ b},
where J ⊂ {1, . . . , r}. From this it will follow that the number of distinct faces
of P is finite.
By removing the redundant inequalities if necessary, we may assume that
the system Ax ≤ b defining P is nonredundant. Let F be a face of P , so that
F = P ∩ H, where H is a hyperplane that passes through a boundary
point of P
and contains P in one of its halfspaces. Let H = {x | c x = c x̄} for a nonzero vector c ∈ ℝn and a boundary point x̄ of P, so that
F = {x ∈ P | c x = c x̄}
and
c x ≤ c x̄, ∀ x ∈ P.
These relations imply that the set of points x such that Ax ≤ b and c x ≤ c x̄ coincides with P, and since the system Ax ≤ b is nonredundant, it follows that c x ≤ c x̄ is a redundant inequality of the system Ax ≤ b and c x ≤ c x̄. Therefore, the inequality c x ≤ c x̄ is implied by the inequalities of Ax ≤ b, so that there exists some µ ∈ ℝr with µ ≥ 0 such that
Σ_{j=1}^r µj aj = c,   Σ_{j=1}^r µj bj = c x̄.
implying that
F = {x ∈ P | aj x = bj, j ∈ J}.
Conversely, let F be a nonempty set given by
F = {x ∈ P | aj x = bj, j ∈ J},
Then, we have
{x ∈ P | aj x = bj, j ∈ J} = {x ∈ P | c x = β},
[cf. Eq. (3.15) where µj = 1 for all j ∈ J]. Let H = {x | c x = β}, so that in
view of the preceding relation, we have that F = P ∩ H. Since every point of F
is a boundary point of P , it follows that H passes through a boundary point of
P . Furthermore, for every x ∈ P , we have aj x ≤ bj for all j ∈ J, implying that
c x ≤ β for every x ∈ P . Thus, H contains P in one of its halfspaces. Hence, F
is a face.
Assume that x∗ is an extreme point of P and let y ∗ = f (x∗ ). We will show that
y ∗ is an extreme point of Q. Since x∗ is an extreme point of P , by Exercise
3.20(b), it is also a face of P , and therefore, there exists a vector c ∈ n such
that
c x < c x*, ∀ x ∈ P, x ≠ x*.
For any y ∈ Q with y ≠ y*, we have
f(g(y)) = y ≠ y* = f(x*),
implying that
g(y) ≠ g(y*) = x*, with g(y) ∈ P.
Hence,
c g(y) < c g(y*), ∀ y ∈ Q, y ≠ y*.
Let the affine function g be given by g(y) = By + d for some n × m matrix B and vector d ∈ ℝn. Then, we have
implying that
(B c) y < (B c) y*, ∀ y ∈ Q, y ≠ y*.
If y* were not an extreme point of Q, then we would have y* = αy1 + (1 − α)y2 for some distinct points y1, y2 ∈ Q, y1 ≠ y*, y2 ≠ y*, and α ∈ (0, 1), so that
g(x, z) = x, ∀ (x, z) ∈ Q.
Evidently, f and g are affine functions. Furthermore, clearly
f (x) ∈ Q, g f (x) = x, ∀ x ∈ P,
g(x, z) ∈ P, f g(x, z) = x, ∀ (x, z) ∈ Q.
Hence, P and Q are isomorphic.
3.22 (Unimodularity I)
Suppose that the solution of the system Ax = b has integer components for every vector b ∈ ℝn with integer components. Since A is invertible, it follows that the vector A−1 b has integer components for every b ∈ ℝn with integer components. For i = 1, . . . , n,
let ei be the vector with ith component equal to 1 and all other components equal
to 0. Then, for b = ei , the vectors A−1 ei , i = 1, . . . , n, have integer components,
implying that the columns of A−1 are vectors with integer components, so that
A−1 has integer entries. Therefore, det(A−1 ) is integer, and since det(A) is
also integer and det(A) · det(A−1 ) = 1, it follows that either det(A) = 1 or
det(A) = −1, showing that A is unimodular.
Suppose now that A is unimodular. Take any vector b ∈ n with integer
components, and for each i ∈ {1, . . . , n}, let Ai be the matrix obtained from A
by replacing the ith column of A with b. Then, according to Cramer’s rule, the
components of the solution x̂ of the system Ax = b are given by
x̂i = det(Ai)/det(A), i = 1, . . . , n.
Since each matrix Ai has integer entries, it follows that det(Ai ) is integer for all
i = 1, . . . , n. Furthermore, because A is invertible and unimodular, we have either
det(A) = 1 or det(A) = −1, implying that the vector x̂ has integer components.
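For example (an illustration added here, not part of the exercise), the matrix A with rows (2, 1) and (1, 1) is unimodular, since det(A) = 1. Its inverse has rows (1, −1) and (−1, 2), with integer entries, so x̂ = A−1 b has integer components for every integer vector b; e.g., for b = (3, 2) we get x̂ = (1, 1).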
3.23 (Unimodularity II)
(a) The proof is straightforward from the definition of the totally unimodular
matrix and the fact that B is a submatrix of A if and only if B is a submatrix
of A .
(b) Suppose that A is totally unimodular. Let J be a subset of {1, . . . , n}. Define
z by zj = 1 if j ∈ J, and zj = 0 otherwise. Also let w = Az, ci = di = (1/2)wi if wi is even, and ci = (1/2)(wi − 1) and di = (1/2)(wi + 1) if wi is odd. Consider the polyhedral set
P = {x | c ≤ Ax ≤ d, 0 ≤ x ≤ z},
and note that P ≠ Ø because (1/2)z ∈ P. Since A is totally unimodular, the
polyhedron P has integer extreme points. Let x̂ ∈ P be one of them. Because
0 ≤ x̂ ≤ z and x̂ has integer components, it follows that x̂j = 0 for j ∉ J and x̂j ∈ {0, 1} for j ∈ J. Therefore, zj − 2x̂j = ±1 for j ∈ J. Define J1 = {j ∈ J | zj − 2x̂j = 1} and J2 = {j ∈ J | zj − 2x̂j = −1}. We have
Σ_{j∈J1} aij − Σ_{j∈J2} aij = Σ_{j∈J} aij (zj − 2x̂j)
                           = [Az]i − 2[Ax̂]i
                           = wi − 2[Ax̂]i,
where [Ax]i denotes the ith component of the vector Ax. If wi is even, then since ci ≤ [Ax̂]i ≤ di and ci = di = (1/2)wi, it follows that [Ax̂]i = (1/2)wi, so that
Therefore,
|Σ_{j∈J1} aij − Σ_{j∈J2} aij| ≤ 1, ∀ i = 1, . . . , m. (3.16)
Suppose now that the matrix A is such that any J ⊂ {1, . . . , n} can be
partitioned into two subsets so that Eq. (3.16) holds. We prove that A is totally
unimodular, by showing that each of its square submatrices is unimodular, i.e.,
the determinant of every square submatrix of A is -1, 0, or 1. We use induction
on the size of the square submatrices of A.
To start the induction, note that for J ⊂ {1, . . . , n} with J consisting of a
single element, from Eq. (3.16) we obtain aij ∈ {−1, 0, 1} for all i and j. Assume
now that the determinant of every (k − 1) × (k − 1) submatrix of A is -1, 0, or
1. Let B be a k × k submatrix of A. If det(B) = 0, then we are done, so assume
that B is invertible. Our objective is to prove that | det B| = 1. By Cramer’s
B∗
rule and the induction hypothesis, we have B −1 = det(B) , where b∗ij ∈ {−1, 0, 1}.
By the definition of B , we have Bb1 = det(B)e1 , where b∗1 is the first column of
∗ ∗
k
[Bb∗1 ]i = bij b∗j1 = bij − bij = 0, ∀ i = 2, . . . , k.
j=1 j∈J 1 j∈J 2
Thus, the cardinality of the set J is even, so that for any partition (J̃1 , J̃2 ) of J,
it follows that j∈J̃ bij − j∈J̃ bij is even for all i = 2, . . . , k. By assumption,
1 2
there is a partition (J1 , J2 ) of J such that
bij − bij ≤ 1 ∀ i = 1, . . . , k, (3.17)
j∈J1 j∈J2
implying that
bij − bij = 0, ∀ i = 2, . . . , k. (3.18)
j∈J1 j∈J2
the 1 × 1 submatrices of A are the entries of A, which are -1, 0, or 1. Suppose
that the determinant of each (k − 1) × (k − 1) submatrix of A is -1, 0, or 1, and
consider a k × k submatrix B of A. If B has a zero column, then det(B) = 0
and we are done. If B has a column with a single nonzero component (1 or -1),
then by expanding its determinant along that column and by using the induction
hypothesis, we see that det(B) = 1 or det(B) = −1. Finally, if each column of
B has exactly two nonzero components (one 1 and one -1), the sum of its rows
is zero, so that B is singular and det(B) = 0, completing the proof and showing
that A is totally unimodular.
(b) The proof is based on induction as in part (a). The 1 × 1 submatrices of A
are the entries of A, which are 0 or 1. Suppose now that the determinant of each
(k−1)×(k−1) submatrix of A is -1, 0, or 1, and consider a k×k submatrix B of A.
Since in each column of A, the entries that are equal to 1 appear consecutively, the
same is true for the matrix B. Take the first column b1 of B. If b1 = 0, then B is
singular and det(B) = 0. If b1 has a single nonzero component, then by expanding
the determinant of B along b1 and by using the induction hypothesis, we see
that det(B) = 1 or det(B) = −1. Finally, let b1 have more than one nonzero
component (its nonzero entries are 1 and appear consecutively). Let l and p be
rows of B such that bi1 = 0 for all i < l and i > p, and bi1 = 1 for all l ≤ i ≤ p.
By multiplying the lth row of B with (-1) and by adding it to the l + 1st, l + 2nd,
. . ., kth row of B, we obtain a matrix B such that det(B) = det(B) and the first
column b1 of B has a single nonzero component. Furthermore, the determinant
of every square submatrix of B is -1, 0, or 1 (this follows from the fact that the
determinant of a square matrix is unaffected by adding a scalar multiple of a
row of the matrix to some of its other rows, and from the induction hypothesis).
Since b1 has a single nonzero component, by expanding the determinant of B
along b1 , it follows that det(B) = 1 or det(B) = −1, implying that det(B) = 1 or
det(B) = −1, completing the induction and showing that A is totally unimodular.
|Σ_{i∈I1} aij − Σ_{i∈I2} aij| ≤ 1, ∀ j = 1, . . . , n.
Since aij ∈ {−1, 0, 1} and exactly two of a1j, . . . , amj are nonzero for each j, it follows that
Σ_{i∈I1} aij − Σ_{i∈I2} aij = 0, ∀ j = 1, . . . , n.
Take any j ∈ {1, . . . , n}, and let l and p be such that aij = 0 for all i ≠ l and i ≠ p, so that in view of the preceding relation and the fact aij ∈ {−1, 0, 1}, we
see that: if alj = −apj , then both l and p are in the same subset (I1 or I2 ); if
alj = apj , then l and p are not in the same subset.
Suppose now that the rows of A can be divided into two subsets such
that for each column the following property holds: if the two nonzero entries in
the column have the same sign, they are in different subsets, and if they have
the opposite sign, they are in the same subset. By multiplying all the rows in one of the subsets by −1, we obtain a matrix Ā with entries āij ∈ {−1, 0, 1}, and exactly one 1 and exactly one −1 in each of its columns. Therefore, by Exercise 3.24(a), Ā is totally unimodular, so that every square submatrix of Ā has determinant −1, 0, or 1. Since the determinant of a square submatrix of Ā and the determinant of the corresponding submatrix of A differ only in sign, it follows that every square submatrix of A has determinant −1, 0, or 1, showing that A is totally unimodular.
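A standard instance of a matrix with exactly one 1 and one −1 in each column [cf. Exercise 3.24(a)] is the node-arc incidence matrix of a directed graph (an illustration added here, not part of the exercise). For the graph with nodes {1, 2, 3} and arcs (1, 2), (2, 3), (1, 3), the incidence matrix has columns (1, −1, 0), (0, 1, −1), and (1, 0, −1), so it is totally unimodular.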
(a) Assume that there exist x̂ ∈ n and µ ∈ r such that both conditions (i) and
(ii) hold, i.e.,
aj x̂ < 0, ∀ j = 1, . . . , r, (3.19)
µ ≠ 0, µ ≥ 0, Σ_{j=1}^r µj aj = 0. (3.20)
Since µ ≥ 0 and µ ≠ 0, multiplying the inequalities in Eq. (3.19) with the corresponding µj and summing, we obtain
Σ_{j=1}^r µj aj x̂ < 0.
On the other hand, Eq. (3.20) implies that
Σ_{j=1}^r µj aj x̂ = 0,
which is a contradiction. Hence, both conditions (i) and (ii) cannot hold simul-
taneously.
The proof will be complete if we show that the conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by
C1 = {w ∈ ℝr | aj x ≤ wj, j = 1, . . . , r, x ∈ ℝn},
C2 = {ξ ∈ ℝr | ξj < 0, j = 1, . . . , r}.
It can be seen that both C1 and C2 are convex. Furthermore, because the condi-
tion (i) does not hold, C1 and C2 are disjoint sets. Therefore, by the Separating
Hyperplane Theorem (Prop. 2.4.2), C1 and C2 can be separated, i.e., there exists
a nonzero vector µ ∈ r such that
µ w ≥ µ ξ, ∀ w ∈ C1 , ∀ ξ ∈ C2 ,
implying that
inf µ w ≥ µ ξ, ∀ ξ ∈ C2 .
w∈C1
Since each component ξj of ξ ∈ C2 can be any negative scalar, for the preceding
relation to hold, µj must be nonnegative for all j. Furthermore, by letting ξ → 0,
in the preceding relation, it follows that
inf µ w ≥ 0,
w∈C1
implying that
µ1 w1 + · · · + µr wr ≥ 0, ∀ w ∈ C1 .
By setting wj = aj x for all j, we obtain
(µ1 a1 + · · · + µr ar) x ≥ 0, ∀ x ∈ ℝn,
which implies that
µ1 a1 + · · · + µr ar = 0.
Hence, the condition (ii) holds, showing that the conditions (i) and (ii) cannot
fail to hold simultaneously.
Alternative proof : We will show the equivalent statement of part (b), i.e., that
a polyhedral cone contains an interior point if and only if the polar C ∗ does not
contain a line. This is a special case of Exercise 3.2 (the dimension of C plus the
dimension of the lineality space of C ∗ is n), as well as Exercise 3.6(d), but we
will give an independent proof.
Let
C = x | aj x ≤ 0, j = 1, . . . , r ,
where aj ≠ 0 for all j. Assume that C contains an interior point, and to arrive at a contradiction, assume that C* contains a line. Then there exists a d ≠ 0 such that d and −d belong to C*, i.e., d x ≤ 0 and −d x ≤ 0 for all x ∈ C, so that d x = 0 for all x ∈ C. Thus for the interior point x ∈ C, we have d x = 0, and since d ∈ C* and d = Σ_{j=1}^r µj aj for some µj ≥ 0, we have
Σ_{j=1}^r µj aj x = 0.
This is a contradiction, since x is an interior point of C, and we have aj x < 0 for
all j.
Conversely, assume that C ∗ does not contain a line. Then by Prop. 3.3.1(b),
C ∗ has an extreme point, and since the origin is the only possible extreme point
of a cone, it follows that the origin is an extreme point of C*, which is the cone generated by {a1, . . . , ar}. Therefore 0 ∉ conv({a1, . . . , ar}), and there exists a hyperplane that strictly separates the origin from conv({a1, . . . , ar}). Thus, there exists a vector x such that y x < 0 for all y ∈ conv({a1, . . . , ar}), so in particular,
aj x < 0, ∀ j = 1, . . . , r,
and x is an interior point of C.
(b) Let C be a polyhedral cone given by
C = {x | aj x ≤ 0, j = 1, . . . , r},
int(C) = {x | aj x < 0, j = 1, . . . , r},
so that C has nonempty interior if and only if the condition (i) of part (a) holds. By Farkas' Lemma [Prop. 3.2.1(b)], the polar cone of C is given by
C* = {x | x = Σ_{j=1}^r µj aj, µj ≥ 0, j = 1, . . . , r}.
We now show that C* contains a line if and only if there is a µ ∈ ℝr such that µ ≠ 0, µ ≥ 0, and Σ_{j=1}^r µj aj = 0 [condition (ii) of part (a) holds]. Suppose that C* contains a line, i.e., a set of the form {x + αz | α ∈ ℝ}, where x ∈ C* and z ≠ 0. Then z and −z are directions of recession of C*, and since C* is a closed convex cone, both z and −z belong to C*, so that
z = Σ_{j=1}^r µj aj, µj ≥ 0 for all j, µj ≠ 0 for some j,
−z = Σ_{j=1}^r µ̄j aj, µ̄j ≥ 0 for all j, µ̄j ≠ 0 for some j.
Thus, Σ_{j=1}^r (µj + µ̄j) aj = 0, where (µj + µ̄j) ≥ 0 for all j and (µj + µ̄j) ≠ 0 for at least one j, showing that the condition (ii) of part (a) holds.
r
Conversely, suppose that µ a = 0 with µj ≥ 0 for all j and µj = 0
j=1 j j
for some j. Assume without loss of generality that µ1 > 0, so that
µj
−a1 = aj ,
µ1
j =1
with µj /µ1 ≥ 0 for all j, which implies that −a1 ∈ C ∗ . Since a1 ∈ C ∗ , −a1 ∈ C ∗ ,
and a1 = 0, it follows that C ∗ contains a line, completing the proof.
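To illustrate the two alternatives (an example added here, not part of the exercise): for a1 = (1, 0) and a2 = (−1, 0), condition (ii) holds with µ = (1, 1), since a1 + a2 = 0, and indeed there is no x with a1 x < 0 and a2 x < 0. For a1 = (1, 0) and a2 = (0, 1), condition (i) holds with x̂ = (−1, −1), and indeed no µ ≥ 0 with µ ≠ 0 satisfies µ1 a1 + µ2 a2 = 0.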
3.27 (Linear System Alternatives)
Assume that there exist x̂ ∈ n and µ ∈ r such that both conditions (i) and
(ii) hold, i.e.,
aj x̂ ≤ bj, ∀ j = 1, . . . , r, (3.21)
µ ≥ 0, Σ_{j=1}^r µj aj = 0, Σ_{j=1}^r µj bj < 0. (3.22)
By multiplying the inequalities in Eq. (3.21) with the corresponding µj and summing, we obtain
Σ_{j=1}^r µj aj x̂ ≤ Σ_{j=1}^r µj bj.
On the other hand, Eq. (3.22) yields
Σ_{j=1}^r µj aj x̂ = 0 > Σ_{j=1}^r µj bj,
which is a contradiction. Hence, both conditions (i) and (ii) cannot hold simul-
taneously.
The proof will be complete if we show that conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by
P1 = {ξ ∈ ℝr | ξj ≤ 0, j = 1, . . . , r},
P2 = {w ∈ ℝr | aj x − bj = wj, j = 1, . . . , r, x ∈ ℝn}.
Clearly, P1 is a polyhedral set. For the set P2, we have
P2 = {w ∈ ℝr | Ax − b = w, x ∈ ℝn} = R(A) − b,
where A is the matrix with rows aj and b is the vector with components bj .
Thus, P2 is an affine set and is therefore polyhedral. Furthermore, because the
condition (i) does not hold, P1 and P2 are disjoint polyhedral sets, and they
can be strictly separated [Prop. 2.4.3 under condition (5)]. Hence, there exists a
vector µ ∈ r such that
sup µ ξ < inf µ w.
ξ∈P1 w∈P2
Since each component ξj of ξ ∈ P1 can be any negative scalar, for the preceding
relation to hold, µj must be nonnegative for all j. Furthermore, since 0 ∈ P1 , it
follows that
0 < inf µ w,
w∈P2
implying that
0 < µ 1 w1 + · · · + µ r wr , ∀ w ∈ P2 .
By setting wj = aj x − bj for all j, we obtain
µ1 b1 + · · · + µr br < (µ1 a1 + · · · + µr ar ) x, ∀ x ∈ n .
Since this relation holds for all x ∈ n , we must have
µ1 a1 + · · · + µr ar = 0,
implying that
µ1 b1 + · · · + µr br < 0.
Hence, the condition (ii) holds, showing that the conditions (i) and (ii) cannot
fail to hold simultaneously.
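As a small illustration (added here, not part of the exercise), take r = 2, a1 = 1, a2 = −1, b1 = 0, b2 = −1 in ℝ. The system x ≤ 0, −x ≤ −1 has no solution, and condition (ii) holds with µ = (1, 1): µ1 a1 + µ2 a2 = 0 while µ1 b1 + µ2 b2 = −1 < 0.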
Assume that there exist x̂ ∈ C and µ ∈ r such that both conditions (i) and (ii)
hold, i.e.,
fj(x̂) < 0, ∀ j = 1, . . . , r, (3.23)
µ ≠ 0, µ ≥ 0, Σ_{j=1}^r µj fj(x̂) ≥ 0. (3.24)
Since µ ≥ 0 and µ ≠ 0, multiplying the inequalities in Eq. (3.23) with the corresponding µj and summing yields
Σ_{j=1}^r µj fj(x̂) < 0,
contradicting the last relation in Eq. (3.24). Hence, both conditions (i) and (ii)
cannot hold simultaneously.
The proof will be complete if we show that conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by
P = {ξ ∈ ℝr | ξj ≤ 0, j = 1, . . . , r},
C1 = {w ∈ ℝr | fj(x) < wj, j = 1, . . . , r, x ∈ C}.
The set P is polyhedral, while C1 is convex by the convexity of C and fj for all j.
Furthermore, since condition (i) does not hold, P and C1 are disjoint, implying
that ri(C1 ) ∩ P = Ø. By the Polyhedral Proper Separation Theorem (cf. Prop.
3.5.1), the polyhedral set P and convex set C1 can be properly separated by a
hyperplane that does not contain C1 , i.e., there exists a vector µ ∈ r such that
Since each component ξj of ξ ∈ P can be any negative scalar, the first relation
implies that µj ≥ 0 for all j, while the second relation implies that µ = 0.
Furthermore, since µ ξ ≤ 0 for all ξ ∈ P and 0 ∈ P , it follows that
sup µ ξ = 0 ≤ inf µ w,
ξ∈P w∈C1
implying that
0 ≤ µ1 w1 + · · · + µr wr , ∀ w ∈ C1 .
By letting wj → fj (x) for all j, we obtain
f = µ1 f1 + · · · + µr fr
C̃ = C ∩ dom(f1 ) ∩ · · · ∩ dom(fr ).
By Exercise 1.27, the function f is nonnegative over cl(C̃). Given that ri(C) ⊂ dom(fi) for all i, we have ri(C) ⊂ C̃, and therefore
C ⊂ cl(ri(C)) ⊂ cl(C̃).
Hence, f is nonnegative over C and condition (ii) holds, showing that the condi-
tions (i) and (ii) cannot fail to hold simultaneously.
Assume that there exist x̂ ∈ C and µ ∈ r such that both conditions (i) and (ii)
hold, i.e.,
r
r
µj fj (x̂) < 0,
j=1
contradicting the last relation in Eq. (3.26). Hence, both conditions (i) and (ii)
cannot hold simultaneously.
The proof will be complete if we show that conditions (i) and (ii) cannot
fail to hold simultaneously. Assume that condition (i) fails to hold, and consider
the sets given by
P = {ξ ∈ r | ξj ≤ 0, j = 1, . . . , r},
C1 = w ∈ r | fj (x) < wj , j = 1, . . . , r, fj (x) = wj , j = r + 1, . . . , r, x ∈ C .
The set P is polyhedral, and it can be seen that C1 is convex, since C and f1, . . . , fr are convex, and fr+1, . . . , fr̄ are affine. Furthermore, since the con-
dition (i) does not hold, P and C1 are disjoint, implying that ri(C1 ) ∩ P = Ø.
Therefore, by the Polyhedral Proper Separation Theorem (cf. Prop. 3.5.1), the
polyhedral set P and convex set C1 can be properly separated by a hyperplane
that does not contain C1 , i.e., there exists a vector µ ∈ r such that
Since each component ξj of ξ ∈ P can be any negative scalar, the first relation
implies that µj ≥ 0 for all j. Therefore, µ ξ ≤ 0 for all ξ ∈ P and since 0 ∈ P , it
follows that
sup µ ξ = 0 ≤ inf µ w.
ξ∈P w∈C1
0 ≤ µ1 w1 + · · · + µr wr , ∀ w ∈ C1 ,
f = µ1 f1 + · · · + µr fr
C̄ = C ∩ dom(f1) ∩ · · · ∩ dom(fr).
By Exercise 1.27, f is nonnegative over cl(C̄). Given that ri(C) ⊂ dom(fi) for all i = 1, . . . , r, we have ri(C) ⊂ C̄, and therefore
C ⊂ cl(ri(C)) ⊂ cl(C̄).
so that
inf_{x∈C} {µr+1 fr+1(x) + · · · + µr̄ fr̄(x)} = µr+1 fr+1(x̄) + · · · + µr̄ fr̄(x̄) = 0,
with x̄ ∈ ri(C). Thus, the affine function µr+1 fr+1 + · · · + µr̄ fr̄ attains its minimum value over C at a point in the relative interior of C. Hence, by Prop. 1.4.2 of Chapter 1, the function µr+1 fr+1 + · · · + µr̄ fr̄ is constant over C, i.e.,
This contradicts the second relation in (3.27). Hence, not all µ1 , . . . , µr are zero,
showing that the condition (ii) holds, and proving that the conditions (i) and (ii)
cannot fail to hold simultaneously.
(a) If two elementary vectors z and z̄ had the same support, the vector z − γz̄ would have smaller support than z and z̄ for a suitable scalar γ. If z and z̄ are not scalar multiples of each other, then z − γz̄ ≠ 0, which contradicts the definition of an elementary vector.
(b) We note that either y is elementary or else there exists a nonzero vector z
with support strictly contained in the support of y. Repeating this argument for
at most n − 1 times, we must obtain an elementary vector.
(c) We first show that every nonzero vector y ∈ S has the property that there
exists an elementary vector of S that is in harmony with y and has support that
is contained in the support of y.
We show this by induction on the number of nonzero components of y. Let Vk be the subset of nonzero vectors in S that have k or less nonzero components, and let k̄ be the smallest k for which Vk is nonempty. Then, by part (b), every vector y ∈ Vk̄ must be elementary, so it has the desired property. Assume that all vectors in Vk have the desired property for some k ≥ k̄. We let y be a vector in Vk+1 and we show that it also has the desired property. Let z be an elementary
vector whose support is contained in the support of y. By using the negative of
z if necessary, we can assume that yj zj > 0 for at least one index j. Then there exists a largest value of γ, call it γ̄, such that
The vector y − γ̄z is in harmony with y and has support that is strictly contained in the support of y. Thus either y − γ̄z = 0, in which case the elementary vector z is in harmony with y and has support equal to the support of y, or else y − γ̄z is nonzero. In the latter case, we have y − γ̄z ∈ Vk, and by the induction hypothesis, there exists an elementary vector z̄ that is in harmony with y − γ̄z and has support that is contained in the support of y − γ̄z. The vector z̄ is also in harmony with y and has support that is contained in the support of y. The induction is complete.
Consider now the given nonzero vector x ∈ S, and choose any elementary
vector z 1 of S that is in harmony with x and has support that is contained in
the support of x (such a vector exists by the property just shown). By using the
negative of z 1 if necessary, we can assume that xj z 1j > 0 for at least one index j.
Let γ̄ be the largest value of γ such that
xj − γ z1j ≥ 0, ∀ j with xj > 0,
xj − γ z1j ≤ 0, ∀ j with xj < 0.
The vector x − z̄1, where
z̄1 = γ̄ z1,
is in harmony with x and has support that is strictly contained in the support of x. There are two cases: (1) x = z̄1, in which case we are done, or (2) x ≠ z̄1, in which case we replace x by x − z̄1 and we repeat the process. Eventually, after m steps where m ≤ n (since each step reduces the number of nonzero components by at least one), we will end up with the desired decomposition x = z̄1 + · · · + z̄m.
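As a concrete instance of this decomposition (an example added here, not part of the exercise), let S = {x ∈ ℝ3 | x1 + x2 + x3 = 0}. Its elementary vectors are the nonzero multiples of (1, −1, 0), (1, 0, −1), and (0, 1, −1), each having the minimal support size of two. The vector x = (2, −1, −1) ∈ S decomposes as x = z1 + z2 with z1 = (1, −1, 0) and z2 = (1, 0, −1), and both z1 and z2 are elementary vectors in harmony with x, with supports contained in the support of x.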
For simplicity, assume that B is the Cartesian product of bounded open intervals,
so that B has the form
where bj and bj are some scalars. The proof is easily modified for the case where
B has a different form.
Since B ∩ S ⊥ = Ø, there exists a hyperplane that separates B and S ⊥ . The
normal of this hyperplane is a nonzero vector d ∈ S such that
t d ≤ 0, ∀ t ∈ B.
t d < 0, ∀ t ∈ B.
Equivalently, we have
Σ_{j: dj>0} (b̄j − ε) dj + Σ_{j: dj<0} (bj + ε) dj < 0, (3.28)
for all ε > 0 such that bj + ε < b̄j − ε. Let
d = z1 + · · · + zm,
where the last equality holds because the vectors z i are in harmony with d and
their supports are contained in the support of d. From the preceding relation,
we see that for at least one elementary vector zi, we must have
0 > Σ_{j: zji>0} (b̄j − ε) zji + Σ_{j: zji<0} (bj + ε) zji,
for all ε > 0 that are sufficiently small and are such that bj + ε < b̄j − ε, or equivalently
0 > t zi, ∀ t ∈ B.
Bi = (0, ∞) if i = k, and Bi = [0, ∞) if i ≠ k,
does not intersect the subspace S, so by the result of Exercise 3.31, there exists
a vector z of S ⊥ such that x z < 0 for all x ∈ B. For this to hold, we must have
z ∈ B ∗ or equivalently z ≤ 0, while by choosing x = (0, . . . , 0, 1, 0, . . . , 0) ∈ B,
with the 1 in the kth position, the inequality x z < 0 yields zk < 0. Thus
assertion (2) holds with y = −z. Similarly, we show that if (2) does not hold,
then (1) must hold.
Let now I be the set of indices k such that (1) holds, and for each k ∈ I, let x(k) be a vector in S such that x(k) ≥ 0 and xk(k) > 0 (note that we do not exclude the possibility that one of the sets I and Ī is empty). Let Ī be the set of indices such that (2) holds, and for each k ∈ Ī, let y(k) be a vector in S⊥ such that y(k) ≥ 0 and yk(k) > 0. From what has already been shown, I and Ī are disjoint, I ∪ Ī = {1, . . . , n}, and the vectors
x = Σ_{k∈I} x(k),   y = Σ_{k∈Ī} y(k),
satisfy
xi > 0, ∀ i ∈ I, xi = 0, ∀ i ∈ Ī,
yi = 0, ∀ i ∈ I, yi > 0, ∀ i ∈ Ī.
The uniqueness of I and Ī follows from their construction and the preceding arguments. In particular, if for some k ∈ Ī, there existed a vector x ∈ S with x ≥ 0 and xk > 0, then since for the vector y(k) of S⊥ we have y(k) ≥ 0 and yk(k) > 0, assertions (1) and (2) must hold simultaneously, which is a contradiction.
The last assertion follows from the fact that for each k, exactly one of the
assertions (1) and (2) holds.
(b) Consider the subspace
S = {(x, w) | Ax − bw = 0, x ∈ ℝn, w ∈ ℝ}.
Its orthogonal complement is the range of the transpose of the matrix [A −b], so it has the form
S⊥ = {(A z, −b z) | z ∈ ℝm}.
By applying the result of part (a) to the subspace S, we obtain a partition of the
index set {1, . . . , n + 1} into two subsets. There are two possible cases:
(1) The index n + 1 belongs to the first subset.
(2) The index n + 1 belongs to the second subset.
In case (2), the two subsets are of the form I and Ī ∪ {n + 1} with I ∪ Ī = {1, . . . , n},
and by the last assertion of part (a), we have w = 0 for all (x, w) such that
x ≥ 0, w ≥ 0 and Ax − bw = 0. This, however, contradicts the fact that the
set F = {x | Ax = b, x ≥ 0} is nonempty. Therefore, case (1) holds, i.e., the
index n + 1 belongs to the first index subset. In particular, we have that there
exist disjoint index sets I and Ī with I ∪ Ī = {1, . . . , n}, and vectors (x, w) with Ax − bw = 0, and z ∈ ℝm such that
w > 0, b z = 0,
xi > 0, ∀ i ∈ I, xi = 0, ∀ i ∈ Ī,
yi = 0, ∀ i ∈ I, yi > 0, ∀ i ∈ Ī,
where y = A z. By dividing (x, w) with w if needed, we may assume that w = 1
so that Ax − b = 0, and the result follows.
41
Convex Analysis and
Optimization
Chapter 4 Solutions
Dimitri P. Bertsekas
with
(a) Since f (x; 0) = 0, the relation f (x; λy) = λf (x; y) clearly holds for λ = 0
and all y ∈ n . Choose λ > 0 and y ∈ n . By the definition of directional
derivative, we have
f′(x; λy) = inf_{α>0} [ f( x + α(λy) ) − f(x) ] / α = λ inf_{α>0} [ f( x + (αλ)y ) − f(x) ] / (αλ).
By setting β = αλ, we obtain
f′(x; λy) = λ inf_{β>0} [ f(x + βy) − f(x) ] / β = λ f′(x; y).
(b) Let (y1 , w1 ) and (y2 , w2 ) be two points in epi f (x; ·) , and let γ be a scalar
with γ ∈ (0, 1). Consider a point (yγ , wγ ) given by
(yγ, wγ) = γ(y1, w1) + (1 − γ)(y2, w2).
Combining the preceding two relations, we see that for all α ≤ α1 and α ≤ α2 ,
[ f(x + αyγ) − f(x) ] / α ≤ γ [ f(x + α1 y1) − f(x) ] / α1 + (1 − γ) [ f(x + α2 y2) − f(x) ] / α2.
By taking the infimum over α, and then over α1 and α2 , we obtain
f (x; yγ ) ≤ γf (x; y1 ) + (1 − γ)f (x; y2 ) ≤ γw1 + (1 − γ)w2 = wγ ,
where in the last inequality we use the fact (y1, w1), (y2, w2) ∈ epi( f′(x; ·) ). Hence
the point (yγ , wγ ) belongs to epi f (x; ·) , implying that f (x; ·) is a convex
function.
(c) Since f (x; 0) = 0 and (1/2)y + (1/2)(−y) = 0, it follows that
f x; (1/2)y + (1/2)(−y) = 0, ∀ y ∈ n .
By part (b), the function f (x; ·) is convex, so that
0 ≤ (1/2)f (x; y) + (1/2)f (x; −y),
and
−f (x; −y) ≤ f (x; y).
(d) Let a vector y be in the level set y | f (x; y) ≤ 0 , and let λ > 0. By part
(a),
f (x; λy) = λf (x; y) ≤ 0,
so that λy also belongs to this level set, which is therefore a cone. By part (b),
the function f (x; ·) is convex, implying that the level set y | f (x; y) ≤ 0 is
convex.
Since dom(f ) = n , f (x; ·) is a real-valued function, and since
it is convex,
by Prop. 1.4.6, it is also continuous over n . Therefore the level set y | f (x; y) ≤
0 is closed.
We now show that
∗
y | f (x; y) ≤ 0 = cl cone ∂f (x) .
and the desired relation follows by the Polar Cone Theorem [Prop. 3.1.1(b)].
3
4.2 (Chain Rule for Directional Derivatives)
showing that F is directionally differentiable at x and that the given chain rule
holds.
4.3
or equivalently
d x − f (x) ≥ d y − f (y), ∀ y ∈ n .
Therefore, d ∈ n is a subgradient of f at x if and only if
d x − f (x) = max d y − f (y) .
y
4.4
‖z‖ ≥ d′z, ∀ z ∈ ℜ^n.
By letting z = d in this relation, we obtain ‖d‖ ≤ 1, showing that ∂f(0) ⊂ { d | ‖d‖ ≤ 1 }.
On the other hand, for any d ∈ ℜ^n with ‖d‖ ≤ 1, we have
d′z ≤ ‖d‖ · ‖z‖ ≤ ‖z‖, ∀ z ∈ ℜ^n,
which is equivalent to f(0) + d′z ≤ f(z) for all z, so that d ∈ ∂f(0), and therefore
{ d | ‖d‖ ≤ 1 } ⊂ ∂f(0).
4
Note that an alternative proof is obtained by writing
x = max x z,
z≤1
4.5
When f is defined on the real line, by Prop. 4.2.1, ∂f (x) is a compact interval of
the form
∂f (x) = [α, β].
By Prop. 4.2.2, we have
f′(x; y) = max_{d∈∂f(x)} y′d, ∀ y ∈ ℜ^n,
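As a quick numerical illustration (not part of the original solution), the following sketch checks the formula above for the scalar function f(x) = |x| at x = 0, where ∂f(0) = [α, β] = [−1, 1]; the function and the tolerance are illustrative choices.

```python
# Numerical sketch (illustration only): for f(x) = |x| at x = 0 the subdifferential
# is [alpha, beta] = [-1, 1], and f'(0; y) = |y| = max_{d in [-1, 1]} y * d.
import numpy as np

def dir_derivative(f, x, y, t=1e-8):
    # one-sided difference quotient approximating inf_{a>0} (f(x + a*y) - f(x)) / a
    return (f(x + t * y) - f(x)) / t

f = abs
alpha, beta = -1.0, 1.0            # ∂f(0) = [-1, 1]
for y in (-2.0, -0.5, 0.0, 1.0, 3.0):
    support = max(y * d for d in (alpha, beta))   # the max over an interval is attained at an endpoint
    assert abs(dir_derivative(f, 0.0, y) - support) < 1e-6
print("f'(0; y) = max_{d in ∂f(0)} y*d verified for sample directions")
```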
4.6
5
4.7
Let x and y be any two points in the set X. Since ∂f (x) is nonempty, by using
the subgradient inequality, it follows that
implying that
According to Prop. 4.2.3, the set ∪x∈X ∂f (x) is bounded, so that for some con-
stant L > 0, we have
‖d‖ ≤ L, ∀ d ∈ ∂f(x), ∀ x ∈ X,   (4.1)
and therefore,
f(x) − f(y) ≤ L‖x − y‖.
By exchanging the roles of x and y, we similarly obtain f(y) − f(x) ≤ L‖x − y‖, so that
|f(x) − f(y)| ≤ L‖x − y‖, ∀ x, y ∈ X.
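A small numerical sketch of this Lipschitz bound, under the assumption that f is the polyhedral function f(x) = max_i (a_i′x + b_i) (an illustrative choice, not the book's setting): every subgradient of f is a convex combination of "active" vectors a_i, so L = max_i ‖a_i‖ satisfies Eq. (4.1).

```python
# Numerical sketch (illustration only): for f(x) = max_i (a_i'x + b_i), every
# subgradient lies in conv{a_i : i active}, so L = max_i ||a_i|| bounds all
# subgradient norms and |f(x) - f(y)| <= L ||x - y|| should hold everywhere.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))     # rows are the vectors a_i
b = rng.standard_normal(5)
f = lambda x: np.max(A @ x + b)
L = np.max(np.linalg.norm(A, axis=1))

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    assert abs(f(x) - f(y)) <= L * np.linalg.norm(x - y) + 1e-12
print("Lipschitz bound holds on all sampled pairs, with L =", L)
```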
implying that
f x + α(z − x) − f (x)
f (x; z − x) = inf ≥ d (z − x) > −∞.
α>0 α
6
Furthermore, for all α ∈ (0, 1), and x, z ∈ dom(f ), the vector x+α(z −x) belongs
to dom(f ). Therefore, for all α ∈ (0, 1),
f x + α(z − x) − f (x)
< ∞,
α
implying that
f (x; z − x) < ∞.
for some scalar β ≥ 0. Because βf (x; y) = f (x; βy) for all β ≥ 0 and y ∈ n
[see Exercise 4.1(a)], it follows that
ν = f (x) + f x; β(x − x) = f (x) + f x; z − x),
contradicting Eq. (4.2), and thus showing that C and P do not have any common
point. Hence, ri(C) and P do not have any common point, so by the Polyhedral
Proper Separation Theorem (Prop. 3.5.1), the polyhedral set P and the convex
set C can be properly separated by a hyperplane that does not contain C, i.e.,
there exists a vector (a, γ) ∈ n+1 such that
a z + γν ≥ a x + β(x − x) + γ f (x) + βf (x; x − x) , ∀ (z, ν) ∈ C, ∀ β ≥ 0,
(4.3)
inf a z + γν < sup a z + γν , (4.4)
(z,ν)∈C (z,ν)∈C
We cannot have γ < 0 since then the left-hand side of Eq. (4.3) could be made
arbitrarily small by choosing ν sufficiently large. Also if γ = 0, then for β = 1,
from Eq. (4.3) we obtain
a (z − x) ≥ 0, ∀ z ∈ dom(f ).
7
Since x ∈ ri dom(f ) , we have that the linear function a z attains its minimum
over dom(f ) at a point in the relative interior of dom(f ). By Prop. 1.4.2, it
follows that a z is constant over dom(f ), i.e., a z = a x for all z ∈ dom(f ),
contradicting Eq. (4.4). Hence, we must have γ > 0 and by dividing with γ in
Eq. (4.3), we obtain
a z + ν ≥ a x + β(x − x) + f (x) + βf (x; x − x), ∀ (z, ν) ∈ C, ∀ β ≥ 0,
Because
f x + λ(z − x) − f (x)
f (z) − f (x) = f x + (z − x) − f (x) ≥ inf = f (x; z − x),
λ>0 λ
it follows that
Finally, by using the fact f(z) = ∞ for all z ∉ dom(f), we see that
It will suffice to prove the result for the case where f = f1 + f2 . If d1 ∈ ∂f1 (x)
and d2 ∈ ∂f2 (x), then by the subgradient inequality, it follows that
g1 (y) = f1 (x + y) − f1 (x) − d y, ∀ y,
g2 (y) = f2 (x + y) − f2 (x), ∀ y.
8
Then, for the function g = g1 + g2 , we have g(0) = 0 and by using d ∈ ∂f (x), we
obtain
g(y) = f (x + y) − f (x) − d y ≥ 0, ∀ y. (4.5)
Consider the convex sets
C1 = { (y, µ) ∈ ℜ^{n+1} | y ∈ dom(g1), µ ≥ g1(y) },
C2 = { (u, ν) ∈ ℜ^{n+1} | u ∈ dom(g2), ν ≤ −g2(u) },
and note that
ri(C1) = { (y, µ) ∈ ℜ^{n+1} | y ∈ ri( dom(g1) ), µ > g1(y) },
ri(C2) = { (u, ν) ∈ ℜ^{n+1} | u ∈ ri( dom(g2) ), ν < −g2(u) }.
Suppose that there exists a vector (ŷ, µ̂) ∈ ri(C1 ) ∩ ri(C2 ). Then,
yielding
g(ŷ) = g1 (ŷ) + g2 (ŷ) < 0,
which contradicts Eq. (4.5). Therefore, the sets ri(C1 ) and ri(C2 ) are disjoint,
and by the Proper Separation (Prop. 2.4.6), the two convex sets C1 and C2 can
be properly separated, i.e., there exists a vector (w, γ) ∈ n+1 such that
sup wy> inf w u,
y∈dom(g1 ) u∈dom(g2 )
imply that
dom(f1) and dom(f2) are properly separated. But this is impossible
since ri( dom(f1) ) and ri( dom(f2) ) have a point in common. Hence γ > 0, and
by dividing in Eq. (4.6) with γ and by setting b = w/γ, we obtain
inf b y + µ ≥ sup b u + ν .
(y,µ)∈C1 (u,ν)∈C2
9
Since g1 (0) = 0 and g2 (0) = 0, we have (0, 0) ∈ C1 ∩ C2 , implying that
b y + µ ≥ 0 ≥ b u + ν, ∀ (y, µ) ∈ C1 , ∀ (u, ν) ∈ C2 .
g2 (u) ≥ b ν, ∀ u ∈ dom(g2 ),
and by using the definitions of g1 and g2 , we see that
implying that g(ŷ) = g1 (ŷ) + g2 (ŷ) < 0 and contradicting Eq. (4.5). Therefore,
by the Polyhedral Proper Separation Theorem (Prop. 3.5.1), the convex set C1
and the polyhedral set C2 can be properly separated by a hyperplane that does
not contain C1 , i.e., there exists a vector (w, γ) ∈ n+1 such that
inf w y + γµ ≥ sup w u + γν ,
(y,µ)∈C1 (u,ν)∈C2
inf w y + γµ < sup w y + γµ .
(y,µ)∈C1 (y,µ)∈C1
inf w y ≥ sup w u,
y∈dom(g1 ) u∈dom(g2 )
10
inf w y < sup w y.
y∈dom(g1 ) y∈dom(g1 )
it follows that dom(f1 ) and dom(f2 ) are properly separated by a hyperplane that
does not contain dom(f1 ), while dom(f2 ) is polyhedral since f2 is polyhedral [see
Prop. 3.2.3]. Therefore,
by the Polyhedral Proper Separation Theorem (Prop.
3.5.1), we have that ri dom(f1 ) ∩ dom(f2 ) = Ø, which is a contradiction. Hence
γ > 0, and the remainder of the proof is similar to the preceding one.
We note that dom(F) is nonempty since it contains the inverse image under A
of the common point of the range of A and the relative interior of dom(f ). In
particular, F is proper. We fix an x in dom(F ). If d ∈ A ∂f (Ax), there exists a
g ∈ ∂f (Ax) such that d = A g. We have for all z ∈ m ,
where the inequality follows from the fact g ∈ ∂f (Ax). Hence, d ∈ ∂F (x), and
we have A ∂f (Ax) ⊂ ∂F (x).
We next show the reverse inclusion. By using a translation argument if
necessary, we may assume that x = 0 and F (0) = 0. Let d ∈ ∂F (0). Then we
have
F (z) − z d ≥ 0, ∀ z ∈ n ,
or
f (Az) − z d ≥ 0, ∀ z ∈ n ,
or
f (y) − z d ≥ 0, ∀ z ∈ n , y = Az,
or
H(y, z) ≥ 0, ∀ z ∈ n , y = Az,
where the function H : m × n → (−∞, ∞] has the form
H(y, z) = f (y) − z d.
Since the range of A contains a point in ri dom(f ) , and dom(H) = dom(f )×n ,
we see that the set (y, z) ∈ dom(H) | y = Az contains a point in the relative
interior of dom(H). Hence, we can apply the Nonlinear Farkas’ Lemma [part (b)]
with the following identification:
11
In this case, we have
x ∈ C | g1 (x) ≤ 0, g2 (x) ≤ 0 = (y, z) ∈ dom(H) | Az − y = 0 .
As asserted earlier, this set contains a relative interior point of C, thus implying
that the set
Q∗ = µ ≥ 0 | H(y, z) + µ1 g1 (y, z) + µ2 g2 (y, z) ≥ 0, ∀ (y, z) ∈ dom(H)
is nonempty. Hence, there exists (µ1 , µ2 ) such that
f (y) − z d + (µ1 − µ2 ) (Az − y) ≥ 0, ∀ (y, z) ∈ dom(H).
Since dom(H) = m
× , by letting λ = µ1 − µ2 , we obtain
n
f (y) − z d + λ (Az − y) ≥ 0, ∀ y ∈ m , z ∈ n ,
or equivalently
f(y) ≥ λ′y + z′( d − A′λ ), ∀ y ∈ ℜ^m, z ∈ ℜ^n.
Because this relation holds for all z, we have d = A′λ, implying that
f(y) ≥ λ′y, ∀ y ∈ ℜ^m,
which shows that λ ∈ ∂f(0). Hence d ∈ A′∂f(0), thus completing the proof.
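The easy inclusion A′∂f(Ax) ⊂ ∂F(x) can be checked numerically; the sketch below (an illustration only) uses f(y) = ‖y‖₁ and a random matrix A, picks a subgradient g of f at Ax, and verifies the subgradient inequality for d = A′g.

```python
# Numerical sketch (illustration only): for f(y) = ||y||_1 and F(x) = f(Ax),
# take g = sign(Ax) in ∂f(Ax) (assumes no zero components) and verify that
# d = A'g satisfies F(z) >= F(x) + d'(z - x) for sampled z.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
g = np.sign(A @ x)                  # a subgradient of ||.||_1 at Ax
d = A.T @ g                         # candidate element of ∂F(x)

F = lambda z: np.linalg.norm(A @ z, 1)
for _ in range(1000):
    z = rng.standard_normal(3)
    assert F(z) >= F(x) + d @ (z - x) - 1e-10
print("A'∂f(Ax) ⊂ ∂F(x): subgradient inequality holds at all sampled z")
```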
4.11
Suppose that the set ∪_{x∈X} ∂_ε f(x) is unbounded for some ε > 0. Then, there exist
a sequence {xk} ⊂ X, and a sequence {dk} such that dk ∈ ∂_ε f(xk) for all k and
‖dk‖ → ∞. Without loss of generality, we may assume that dk ≠ 0 for all k,
and we denote yk = dk/‖dk‖. Since both {xk} and {yk} are bounded, they must
contain convergent subsequences, and without loss of generality, we may assume
that xk converges to some x and yk converges to some y with ‖y‖ = 1. Since
dk ∈ ∂_ε f(xk) for all k, it follows that
f(xk + yk) ≥ f(xk) + dk′yk − ε = f(xk) + ‖dk‖ − ε.
By letting k → ∞ and by using the continuity of f, we obtain f(x + y) = ∞, a
contradiction. Hence, the set ∪_{x∈X} ∂_ε f(x) must be bounded for all ε > 0.
4.12
12
4.13 (Continuity Properties of ε-Subdifferential [Nur77])
Since the sequence {xk } is bounded, by Exercise 4.11, the sequence {dk } is also
bounded and therefore, it has a limit point d. Taking the limit in the preceding
relation along a subsequence of {dk } converging to d, we obtain
f (y) ≥ f (x) + d (y − x) − , ∀ y ∈ n ,
Let d ∈ ∂δ f (x) for some scalar δ satisfying 0 < δ < . Then, by the definition of
-subgradient, we have
implying that
∪0<δ< ∂δ f (x) ⊂ ∂ f (x).
Since ∂ f (x) is closed, by taking the closures of both sides in the preceding
relation, we obtain
cl ∪0<δ< ∂δ f (x) ⊂ ∂ f (x).
Conversely, assume to arrive at a contradiction that there is a vector d ∈
∂_ε f(x) with d ∉ cl( ∪_{0<δ<ε} ∂_δ f(x) ). Note that the set ∪_{0<δ<ε} ∂_δ f(x) is bounded
since it is contained in the compact set ∂_ε f(x). Furthermore, we claim that
∪0<δ< ∂δ f (x) is convex. Indeed if d1 and d2 belong to this set, then d1 ∈ ∂δ1 f (x)
and d2 ∈ ∂δ2 f (x) for some positive scalars δ1 and δ2 . Without loss of generality,
let δ1 ≤ δ2 . Then, by Eq. (4.8), it follows that d1 , d2 ∈ ∂δ2 f (x), which is a convex
set by Prop. 4.3.1(a). Hence, λd1 + (1 − λ)d2 ∈ ∂δ2 f (x) for all λ ∈ [0, 1], implying
that λd1 + (1 − λ)d2 ∈ ∪0<δ< ∂δ f (x) for all λ ∈ [0, 1], and showing that the set
∪0<δ< ∂δ f (x) is convex.
The vector d and the convex and compact set cl( ∪_{0<δ<ε} ∂_δ f(x) ) can be
strongly separated (see Exercise 2.17), i.e., there exists a vector b ∈ ℜ^n such that
b′d > max_{ g ∈ cl( ∪_{0<δ<ε} ∂_δ f(x) ) } b′g.
13
By Prop. 4.3.1(a), we have
inf_{α>0} [ f(x + αb) − f(x) + δ ] / α = max_{g∈∂_δ f(x)} b′g,
so that
b′d > inf_{α>0} [ f(x + αb) − f(x) + δ ] / α + 2β, ∀ δ, 0 < δ < ε.
Let {δk } be a positive scalar sequence converging to . In view of the preceding
relation, for each δk , there exists a small enough αk > 0 such that
αk b d ≥ f (x + αk b) − f (x) + δk + β. (4.9)
g ≤ L, ∀ g ∈ ∂ f (x).
Let
γk = f (xk ) − f (x) + L xk − x , ∀ k. (4.10)
Since xk → x, by continuity of f , it follows that γk → 0 as k → ∞, so that
k = − γk converges to . Let {ki } ⊂ {0, 1, . . .} be an index sequence such that
{ ki } is positive and monotonically increasing to , i.e.,
implying that for a given vector d ∈ ∂ f (x), there exists a sequence {dki } such
that
dki → d with dki ∈ ∂k f (x), ∀ i. (4.12)
i
14
There remains to show that dki ∈ ∂ f (xki ) for all i. Since dki ∈ ∂k f (x),
i
it follows that for all i and y ∈ n ,
Using this relation, the definition of γk [cf. Eq. (4.10)], and the fact k = − γk
for all k, from Eq. (4.13) we obtain for all i and y ∈ n ,
g(t) = ϕ(t) − ϕ(a) − [ ( ϕ(b) − ϕ(a) ) / ( b − a ) ] (t − a),
and note that g is convex and g(a) = g(b) = 0. We first show that g attains its
minimum over ℜ at some point t∗ ∈ [a, b]. For t < a, we have
a = [ (b − a)/(b − t) ] t + [ (a − t)/(b − t) ] b,
so that, by the convexity of g and the relation g(a) = g(b) = 0,
0 = g(a) ≤ [ (b − a)/(b − t) ] g(t) + [ (a − t)/(b − t) ] g(b) = [ (b − a)/(b − t) ] g(t),
implying that g(t) ≥ 0 for t < a. Similarly, for t > b, we have
b = [ (b − a)/(t − a) ] t + [ (t − b)/(t − a) ] a,
implying that g(t) ≥ 0 for t > b. Therefore g(t) ≥ 0 for t ∉ (a, b), while
g(a) = g(b) = 0. Hence
min_{t∈ℜ} g(t) = min_{t∈[a,b]} g(t).   (4.14)
15
Because g is convex over , it is also continuous over , and since [a, b] is compact,
the set of minimizers of g over [a, b] is nonempty. Thus, in view of Eq. (4.14),
there exists a scalar t∗ ∈ [a, b] such that g(t∗ ) = mint∈ g(t). If t∗ ∈ (a, b), then
we are done. If t∗ = a or t∗ = b, then since g(a) = g(b) = 0, it follows that
every t ∈ [a, b] attains the minimum of g over , so that we can replace t∗ by a
point in the interval (a, b). Thus, in any case, there exists t∗ ∈ (a, b) such that
g(t∗ ) = mint∈ g(t).
We next show that
( ϕ(b) − ϕ(a) ) / ( b − a ) ∈ ∂ϕ(t∗).
The function g is the sum of the convex function ϕ and the linear (and therefore
smooth) function −[ ( ϕ(b) − ϕ(a) ) / ( b − a ) ] (t − a). Thus the subdifferential ∂g(t∗) is the sum
of the subdifferential ∂ϕ(t∗) and the gradient −( ϕ(b) − ϕ(a) ) / ( b − a ) (see Prop. 4.2.4), i.e.,
∂g(t∗) = ∂ϕ(t∗) − ( ϕ(b) − ϕ(a) ) / ( b − a ).
Since t∗ minimizes g over ℜ, we have 0 ∈ ∂g(t∗), implying that
( ϕ(b) − ϕ(a) ) / ( b − a ) ∈ ∂ϕ(t∗).
16
4.15 (Steepest Descent Direction of a Convex Function)
Note that the problem statement in the book contains a typo: d̄/‖d̄‖ should be
replaced by −d̄/‖d̄‖.
The sets d | d ≤ 1 and ∂f (x) are compact, and the function φ(d, g) =
d g is linear in each variable when the other variable is fixed, so that φ(·, g) is
convex and closed for all g, while the function −φ(d, ·) is convex and closed for
all d. Thus, by Prop. 2.6.9, the order of min and max can be interchanged,
From the generic characterization of a saddle point (cf. Prop. 2.6.1), it follows
that the set of saddle points of d g is D∗ × G∗ , where D∗ is the set of minima
of f′(x; d) subject to ‖d‖ ≤ 1 [cf. Eq. (4.15)], and G∗ is the set of minima of
‖g‖ subject to g ∈ ∂f(x) [cf. Eq. (4.16)], i.e., G∗ consists of the unique vector g∗
of minimum norm on ∂f(x). Furthermore, again by Prop. 2.6.1, every d∗ ∈ D∗
must minimize d′g∗ subject to ‖d‖ ≤ 1, so it must satisfy d∗ = −g∗/‖g∗‖.
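A small numerical sketch of this characterization, for the illustrative choice f(x) = max(a1′x, a2′x) at x = 0 (so that ∂f(0) is the segment conv{a1, a2}): the minimum-norm subgradient g∗ is the projection of the origin on the segment, and −g∗/‖g∗‖ attains the smallest directional derivative among unit directions.

```python
# Numerical sketch (illustration only): f(x) = max(a1'x, a2'x), x = 0,
# ∂f(0) = conv{a1, a2}.  The min-norm subgradient g* is the projection of the
# origin on the segment [a1, a2]; -g*/||g*|| should minimize f'(0; d) over ||d|| = 1.
import numpy as np

a1, a2 = np.array([2.0, 1.0]), np.array([-1.0, 2.0])
# projection of the origin on the segment a1 + t (a2 - a1), t in [0, 1]
t = np.clip(-(a1 @ (a2 - a1)) / np.sum((a2 - a1) ** 2), 0.0, 1.0)
g_star = a1 + t * (a2 - a1)
d_star = -g_star / np.linalg.norm(g_star)

fprime = lambda d: max(a1 @ d, a2 @ d)          # directional derivative of f at 0
angles = np.linspace(0, 2 * np.pi, 10000, endpoint=False)
best = min(fprime(np.array([np.cos(th), np.sin(th)])) for th in angles)
assert fprime(d_star) <= best + 1e-9 and best - fprime(d_star) < 1e-3
print("steepest descent direction -g*/||g*|| =", d_star)
```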
Suppose
that
the process does not terminate in a finite number of steps, and let
(wk , gk ) be the sequence generated by the algorithm. Since wk is the projection
of the origin on the set conv{g1 , . . . , gk−1 }, by the Projection Theorem (Prop.
2.2.1), we have
(g − wk ) wk ≥ 0, ∀ g ∈ conv{g1 , . . . , gk−1 },
implying that
gi′wk ≥ ‖wk‖² ≥ ‖g∗‖² > 0, ∀ i = 1, . . . , k − 1, ∀ k ≥ 1,   (4.17)
where g ∗ ∈ ∂f (x) is the vector with minimum norm in ∂f (x). Note that g ∗ > 0
because x does not minimize f . The sequences {wk } and {gk } are contained in
∂f (x), and since ∂f (x) is compact, {wk } and {gk } have limit points in ∂f (x).
17
Without loss of generality, we may assume that these sequences converge, so that
for some ŵ, ĝ ∈ ∂f (x), we have
which in view of Eq. (4.17) implies that ĝ ŵ > 0. On the other hand, because
none of the vectors (−wk ) is a descent direction of f at x, we have f (x; −wk ) ≥ 0,
so that
gk (−wk ) = max g (−wk ) = f (x; −wk ) ≥ 0.
g∈∂f (x)
Suppose
that
the process does not terminate in a finite number of steps, and let
(wk , gk ) be the sequence generated by the algorithm. Since wk is the projection
of the origin on the set conv{g1 , . . . , gk−1 }, by the Projection Theorem (Prop.
2.2.1), we have
(g − wk ) wk ≥ 0, ∀ g ∈ conv{g1 , . . . , gk−1 },
implying that
gi wk ≥ wk 2
≥ g∗ 2
> 0, ∀ i = 1, . . . , k − 1, ∀ k ≥ 1, (4.18)
where g ∗ ∈ ∂ f (x) is the vector with minimum norm in ∂ f (x). Note that
g ∗ > 0 because x is not an -optimal solution, i.e., f (x) > inf z∈ n f (z) + [see
Prop. 4.3.1(b)]. The sequences {wk } and {gk } are contained in ∂ f (x), and since
∂ f (x) is compact [Prop. 4.3.1(a)], {wk } and {gk } have limit points in ∂ f (x).
Without loss of generality, we may assume that these sequences converge, so that
for some ŵ, ĝ ∈ ∂ f (x), we have
which in view of Eq. (4.18) implies that ĝ ŵ > 0. On the other hand, because
none of the vectors (−wk ) is an -descent direction of f at x, by Prop. 4.3.1(a),
we have
f (x − αwk ) − f (x) +
gk (−wk ) = max g (−wk ) = inf ≥ 0.
g∈∂f (x) α>0 α
18
4.18
TC (x) = n .
‖x‖² + 2αx′y + α²‖y‖² ≤ 1, ∀ α, 0 < α ≤ ᾱ,
2x′y + α‖y‖² ≤ 0, ∀ α, 0 < α ≤ ᾱ.
(c) Let C be a closed halfspace given by C = x | a x ≤ b with a nonzero vector
a ∈ n and a scalar b. For x ∈ int(C), i.e., a x < b, we have FC (x) = n and
since C is convex, by Props. 4.6.2(a) and 4.6.3, we have
TC (x) = cl FC (x) = n , NC (x) = TC (x)∗ = {0}.
FC (x) = y | a y ≤ 0 .
19
By Prop. 4.6.2(a), it follows that
TC (x) = cl FC (x) = y | a y ≤ 0 ,
while by Prop. 4.6.3 and the Farkas’ Lemma [Prop. 3.2.1(b)], it follows that
NC (x) = TC (x)∗ = cone {a} .
(d) For x ∈ C with x ∈ int(C), i.e., xi > 0 for all i ∈ I, we have FC (x) = n .
Then, by using Props. 4.6.2(a) and 4.6.3, we obtain
TC (x) = cl FC (x) = n , NC (x) = TC (x)∗ = {0}.
or equivalently
TC (x) = y | y ei ≤ 0, ∀ i ∈ Ax ,
where ei ∈ ℜ^n is the vector whose ith component is 1 and all other components
are 0. By Prop. 4.6.3, we further have NC(x) = TC(x)∗, while by the Farkas'
Lemma [Prop. 3.2.1(b)], we see that TC(x)∗ = cone( {ei | i ∈ Ax} ), implying
that
NC(x) = cone( {ei | i ∈ Ax} ).
4.19
(a) ⇒ (b) Let x ∈ ri(C) and let S be the subspace that is parallel to aff(C).
Then, for every y ∈ S, x + αy ∈ ri(C) for all sufficiently small positive scalars
α, implying that y ∈ FC (x) and showing that S ⊂ FC (x). Furthermore, by the
definition of the set of feasible directions, it follows that if y ∈ FC (x), then there
exists α > 0 such that x + αy ∈ C for all α ∈ (0, α]. Hence y ∈ S, implying that
FC (x) ⊂ S. This and the relation S ⊂ FC (x) show that FC (x) = S. Since C is
convex, by Prop. 4.6.2(a), it follows that
TC (x) = cl FC (x) = S,
20
showing that NC (x) is a subspace.
(c) ⇒ (a) Let NC (x) be a subspace, and to arrive at a contradiction suppose that
x is not a point in the relative interior of C. Then, by the Proper Separation
Theorem (Prop. 2.4.5), the point x and the relative interior of C can be properly
separated, i.e., there exists a vector a ∈ n such that
sup a y ≤ a x, (4.19)
y∈C
(−a) (x − y) ≤ 0, ∀ y ∈ C. (4.21)
a (x − y) ≤ 0, ∀ y ∈ C.
a (x − y) = 0, ∀ y ∈ C,
FC (x) = {y | Ay = 0} = N (A),
21
4.21 (Tangent and Normal Cones of Level Sets)
Let
C = z | f (z) ≤ f (x) .
We first show that
cl FC (x) = y | f (x; y) ≤ 0 .
Let y ∈ FC (x) be arbitrary. Then, by the definition of FC (x), there exists a
scalar α such that x + αy ∈ C for all α ∈ (0, α]. By the definition of C, it follows
that f (x + αy) ≤ f (x) for all α ∈ (0, α], implying that
f′(x; y) = inf_{α>0} [ f(x + αy) − f(x) ] / α ≤ 0.
Therefore y ∈ y | f (x; y) ≤ 0 , thus showing that
FC (x) ⊂ y | f (x; y) ≤ 0 .
By Exercise 4.1(d), the set y | f (x; y) ≤ 0 is closed, so that by taking closures
in the preceding relation, we obtain
cl FC (x) ⊂ y | f (x; y) ≤ 0 .
To show the converse inclusion, let y be such that f (x; y) < 0, so that for all
small enough α ≥ 0, we have
y | f (x; y) ≤ 0 = cl y | f (x; y) < 0 ⊂ cl FC (x) .
Hence
cl FC (x) = y | f (x; y) ≤ 0 .
Since C is convex, by Prop. 4.6.2(c), we have cl FC (x) = TC (x). This and
the preceding relation imply that
TC (x) = y | f (x; y) ≤ 0 .
22
Furthermore, by Exercise 4.1(d), we have that
{ y | f′(x; y) ≤ 0 } = ( cl( cone( ∂f(x) ) ) )∗,
implying that
NC (x) = cl cone ∂f (x) .
If x does not minimize f over n , then the subdifferential ∂f (x) does not
contain the origin. Furthermore, by Prop. 4.2.1, ∂f (x) is nonempty and compact,
implying by Exercise 1.32(a) that the cone generated by ∂f (x) is closed. There-
fore, in this case, the closure operation in the preceding relation is unnecessary,
i.e.,
NC (x) = cone ∂f (x) .
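A numerical sketch of the relation NC(x) = cone(∂f(x)) (illustration only): for f(z) = ‖z‖₁ and a point x with nonzero components, ∂f(x) = {sign(x)}, and the normal cone inequality d′(z − x) ≤ 0 can be checked on sampled points z of the level set C.

```python
# Numerical sketch (illustration only): f(z) = ||z||_1, x has nonzero components,
# so ∂f(x) = {sign(x)} and N_C(x) = cone({sign(x)}) for C = {z | f(z) <= f(x)}.
# The normal cone inequality d'(z - x) <= 0 is checked on random z in C.
import numpy as np

rng = np.random.default_rng(2)
x = np.array([1.0, -2.0, 0.5])
d = 3.0 * np.sign(x)                        # an element of cone(∂f(x))
level = np.linalg.norm(x, 1)

count = 0
while count < 1000:
    z = rng.uniform(-level, level, size=3)
    if np.linalg.norm(z, 1) <= level:       # accept only points z in C
        assert d @ (z - x) <= 1e-12
        count += 1
print("normal cone inequality holds on all sampled points of C")
```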
4.22
It suffices to consider the case m = 2. From the definition of the cone of feasible
directions, it can be seen that
FC1 ×C2 (x1 , x2 ) = FC1 (x1 ) × FC2 (x2 ).
By taking the closure of both sides in the preceding relation, and by using the fact
that the closure of the Cartesian product of two sets coincides with the Cartesian
product of their closures (see Exercise 1.37), we obtain
cl FC1 ×C2 (x1 , x2 ) = cl FC1 (x1 ) × cl FC2 (x2 ) .
Since C1 and C2 are convex, by Prop. 4.6.2(c), we have
TC1 (x1 ) = cl FC1 (x1 ) , TC2 (x2 ) = cl FC2 (x2 ) .
Furthermore, the Cartesian product C1 ×C2 is also convex, and by Prop. 4.6.2(c),
we also have
TC1 ×C2 (x1 , x2 ) = cl FC1 ×C2 (x1 , x2 ) .
By combining the preceding three relations, we obtain
TC1 ×C2 (x1 , x2 ) = TC1 (x1 ) × TC2 (x2 ).
By taking polars in the preceding relation, we obtain
∗
TC1 ×C2 (x1 , x2 )∗ = TC1 (x1 ) × TC2 (x2 ) ,
and because the polar of the Cartesian product of two cones coincides with the
Cartesian product of their polar cones (see Exercise 3.4), it follows that
TC1 ×C2 (x1 , x2 )∗ = TC1 (x1 )∗ × TC2 (x2 )∗ .
Since the sets C1 , C2 , and C1 × C2 are convex, by Prop. 4.6.3, we have
TC1 ×C2 (x1 , x2 )∗ = NC1 ×C2 (x1 , x2 ),
TC1 (x1 )∗ = NC1 (x1 ), TC2 (x2 )∗ = NC2 (x2 ),
so that
NC1 ×C2 (x1 , x2 ) = NC1 (x1 ) × NC2 (x2 ).
23
4.23 (Tangent and Normal Cone Relations)
In particular, this relation holds for every x ∈ dom(f ) and since dom(f ) =
C1 ∩ C2 , we obtain
implying that
and since ∗
NC1 (x) + NC2 (x) = NC1 (x)∗ ∩ NC2 (x)∗
(see Exercise 3.4), we obtain
24
NC1 (x)∗ = TC1 (x), NC2 (x)∗ = TC2 (x).
In view of Eq. (4.24), it follows that
(b) Let x1 ∈ C1 and x2 ∈ C2 be arbitrary. Since C1 and C2 are convex, the sum
C1 + C2 is also convex, so that by Prop. 4.6.3, we have
z ∈ NC1 +C2 (x1 + x2 ) ⇐⇒ z (y1 + y2 ) − (x1 + x2 ) ≤ 0, ∀ y1 ∈ C1 , ∀ y2 ∈ C2 ,
(4.25)
z1 ∈ NC1 (x1 ) ⇐⇒ z1 (y1 − x1 ) ≤ 0, ∀ y1 ∈ C1 , (4.26)
z2 ∈ NC2 (x2 ) ⇐⇒ z2 (y2 − x2 ) ≤ 0, ∀ y2 ∈ C2 . (4.27)
If z ∈ NC1 +C2 (x1 + x2 ), then by using y2 = x2 in Eq. (4.25), we obtain
z (y1 − x1 ) ≤ 0, ∀ y1 ∈ C1 ,
implying that z ∈ NC1 (x1 ). Similarly, by using y1 = x1 in Eq. (4.25), we see that
z ∈ NC2 (x2 ). Hence z ∈ NC1 (x1 ) ∩ NC2 (x2 ) implying that
Conversely, let z ∈ NC1 (x1 ) ∩ NC2 (x2 ), so that both Eqs. (4.26) and (4.27)
hold, and by adding them, we obtain
z (y1 + y2 ) − (x1 + x2 ) ≤ 0, ∀ y1 ∈ C1 , ∀ y2 ∈ C2 .
Therefore, in view of Eq. (4.25), we have z ∈ NC1 +C2 (x1 + x2 ), showing that
Hence
NC1 +C2 (x1 + x2 ) = NC1 (x1 ) ∩ NC2 (x2 ).
By taking polars in this relation, we obtain
∗
NC1 +C2 (x1 + x2 )∗ = NC1 (x1 ) ∩ NC2 (x2 ) .
Since NC1 (x1 ) and NC2 (x2 ) are closed convex cones, by Exercise 3.4, it follows
that
NC1 +C2 (x1 + x2 )∗ = cl NC1 (x1 )∗ + NC2 (x2 )∗ .
The sets C1 , C2 , and C1 + C2 are convex, so that by Prop. 4.6.3, we have
25
NC1 +C2 (x1 + x2 )∗ = TC1 +C2 (x1 + x2 ),
implying that
TC1 +C2 (x1 + x2 ) = cl TC1 (x1 ) + TC2 (x2 ) .
(c) Let x ∈ C be arbitrary. Since C is convex, its image AC under the linear
transformation A is also convex, so by Prop. 4.6.3, we have
which is equivalent to
is the same as
(A z) (v − x) ≤ 0, ∀ v ∈ C,
and since C is convex, by Prop. 4.6.3, this is equivalent to A z ∈ NC (x). Thus,
which together with the fact A z ∈ NC (x) if and only if z ∈ (A )−1 · NC (x) yields
26
4.24 [GoT71], [RoW98]
We assume for simplicity that all the constraints are inequalities. Consider the
scalar function θ0 : [0, ∞) →
defined by
θ0 (r) = sup y (x − x∗ ), r ≥ 0.
x∈C, x−x∗ ≤r
that θ0 (r) = o(r), which implies that θ0 is differentiable at r = 0 with ∇θ0 (0) = 0.
Thus, the function F0 defined by
F0 (x) = θ0 ( x − x∗ ) − y (x − x∗ )
is differentiable at x∗ , attains a global minimum over C at x∗ , and satisfies
−∇F0 (x∗ ) = y.
If F0 were smooth we would be done, but since it need not even be contin-
uous, we will successively perturb it into a smooth function. We first define the
function θ1 : [0, ∞) → ℜ by
θ1(r) = (1/r) ∫_r^{2r} θ0(s) ds if r > 0,   θ1(r) = 0 if r = 0
(the integral above is well-defined since the function θ0 is nondecreasing). The
function θ1 is seen to be nondecreasing and continuous, and satisfies
0 ≤ θ0 (r) ≤ θ1 (r), ∀ r ≥ 0,
θ1 (0) = 0, and ∇θ1 (0) = 0. Thus the function
F1 (x) = θ1 ( x − x∗ ) − y (x − x∗ )
has the same significant properties for our purposes as F0 [attains a global mini-
mum over C at x∗ , and has −∇F1 (x∗ ) = y], and is in addition continuous.
We next define the function θ2 : [0, ∞) → ℜ by
θ2(r) = (1/r) ∫_r^{2r} θ1(s) ds if r > 0,   θ2(r) = 0 if r = 0.
Again θ2 is seen to be nondecreasing, and satisfies
0 ≤ θ1 (r) ≤ θ2 (r), ∀ r ≥ 0,
θ2 (0) = 0, and ∇θ2 (0) = 0. Also, because θ1 is continuous, θ2 is smooth, and so
is the function F2 given by
F2 (x) = θ2 ( x − x∗ ) − y (x − x∗ ).
The function F2 fulfills all the requirements of the proposition, except that it may
have global minima other than x∗ . To ensure the uniqueness of x∗ we modify F2
as follows:
F (x) = F2 (x) + x − x∗ 2 .
The function F is smooth, attains a strict global minimum over C at x∗ , and
satisfies −∇F (x∗ ) = y.
27
4.25
minimize x1 − x2 + x2 − x3 + x3 − x1
subject to xi ∈ Ci , i = 1, 2, 3,
with the additional condition that x1 , x2 and x3 do not lie on the same line.
Suppose that (x∗1 , x∗2 , x∗3 ) defines an optimal triangle. Then, x∗1 solves the problem
4.26
28
By taking polars in this relation and by using the Farkas' Lemma [Prop. 3.2.1(b)],
we obtain
TX(x∗)∗ = { Σ_{j∈A(x∗)} µj aj | µj ≥ 0, ∀ j ∈ A(x∗) },
0 ∈ ∂f (x∗ ) + TX (x∗ )∗ .
In view of Eq. (4.31) and the definition of A(x∗ ), it follows that x∗ minimizes f
over X if and only if there exist µ∗1 , . . . , µ∗r such that
(i) µ∗j ≥ 0 for all j = 1, . . . , r, and µ∗j = 0 for all j such that aj x∗ < bj .
(ii) 0 ∈ ∂f(x∗) + Σ_{j=1}^r µ∗_j aj.
4.27 (Quasiregularity)
ξ k → 0, xk → x∗ ,
and
xk − x∗ y
= + ξk . (4.32)
xk − x∗ y
By the mean value theorem, we have for all k
where x̃k is a vector that lies on the line segment joining xk and x∗ . Using Eq.
(4.32), the last relation can be written as
xk − x∗
f (xk ) = f (x∗ ) + ∇f (x̃k ) y k , (4.33)
y
where
yk = y + y ξk .
29
(b) Assume first that there are no equality constraints. Let x ∈ X and let y be
a nonzero tangent of X at x. Then there exists a sequence {ξ k } and a sequence
{xk } ⊂ X such that xk = x for all k,
ξ k → 0, xk → x,
and
xk − x y
= + ξk .
xk − x y
By the mean value theorem, we have for all j and k
where x̃k is a vector that lies on the line segment joining xk and x. This relation
can be written as
xk − x
∇gj (x̃k ) y k ≤ 0,
y
where y k = y + ξ k y , or equivalently
∇gj (x̃k ) y k ≤ 0, yk = y + ξk y .
Taking the limit as k → ∞, we obtain ∇gj (x) y ≤ 0 for all j, thus proving that
y ∈ V (x), and TX (x) ⊂ V (x). If there are some equality constraints hi (x) = 0,
they can be converted to the two inequality constraints hi (x) ≤ 0 and −hi (x) ≤ 0,
and the result follows similarly.
(c) Assume first that there are no equality constraints. From part (a), we have
D(x∗ ) ∩ V (x∗ ) = Ø, which is equivalent to having ∇f (x∗ ) y ≥ 0 for all y with
∇gj (x∗ ) y ≤ 0 for all j ∈ A(x∗ ). By Farkas’ Lemma, this is equivalent to the
existence of Lagrange multipliers µ∗j with the properties stated in the exercise. If
there are some equality constraints hi (x) = 0, they can be converted to the two
inequality constraints hi (x) ≤ 0 and −hi (x) ≤ 0, and the result follows similarly.
30
Convex Analysis and
Optimization
Chapter 5 Solutions
Dimitri P. Bertsekas
with
Proof: Assume the contrary. Then for every integer k, there exists a vector xk
with xk = 1 such that
xk P xk + kxk Qxk ≤ 0.
Since, by the positive semidefiniteness of Q, xk Qxk ≥ 0, we see that {xk Qxk }K
must converge to zero, for otherwise the left-hand side of the above inequality
would be ∞. Therefore, x Qx = 0 and since P is positive definite, we obtain
x P x > 0. This contradicts Eq. (5.1). Q.E.D.
Lc(x, λ) = f(x) + λ′h(x) + (c/2)‖h(x)‖²,
minimize f(x) + (c/2)‖h(x)‖²
subject to h(x) = 0,
which has the same local minima as our original problem of minimizing f (x)
subject to h(x) = 0. The gradient and Hessian of Lc with respect to x are
∇x Lc(x, λ) = ∇f(x) + ∇h(x)( λ + c h(x) ),
∇²xx Lc(x, λ) = ∇²f(x) + Σ_{i=1}^m ( λi + c hi(x) ) ∇²hi(x) + c ∇h(x)∇h(x)′.
∗ ∗
In particular, if x and λ satisfy the given conditions, we have
∇x Lc (x∗ , λ∗ ) = ∇f (x∗ ) + ∇h(x∗ ) λ∗ + ch(x∗ ) = ∇x L(x∗ , λ∗ ) = 0, (5.2)
m
By assumption, we have that y ∇2xx L(x∗ , λ∗ )y > 0 for all y = 0 such that
y ∇h(x∗ )∇h(x∗ ) y = 0, so by applying Lemma 5.1 with P = ∇2xx L(x∗ , λ∗ ) and
Q = ∇h(x∗ )∇h(x∗ ) , it follows that there exists a c such that
Lc(x, λ∗) ≥ Lc(x∗, λ∗) + (γ/2)‖x − x∗‖², ∀ x with ‖x − x∗‖ < ε.
Since for all x with h(x) = 0 we have Lc (x, λ∗ ) = f (x), ∇λ L(x∗ , λ∗ ) = h(x∗ ) = 0,
it follows that
f(x) ≥ f(x∗) + (γ/2)‖x − x∗‖², ∀ x with h(x) = 0, and ‖x − x∗‖ < ε.
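A numerical sketch of the convexification effect used above (an illustrative example, not the one in the book): for the problem of minimizing −x1x2 subject to x1 + x2 − 2 = 0, with x∗ = (1, 1) and λ∗ = 1, the Lagrangian Hessian is indefinite but becomes positive definite once the term c∇h(x∗)∇h(x∗)′ is added with c large enough.

```python
# Numerical sketch (illustration only): minimize -x1*x2 s.t. h(x) = x1 + x2 - 2 = 0.
# At x* = (1, 1), lambda* = 1, the Lagrangian Hessian is indefinite, but it is
# positive definite on {y | ∇h(x*)'y = 0}, and adding c ∇h ∇h' convexifies it.
import numpy as np

hess_L = np.array([[0.0, -1.0], [-1.0, 0.0]])     # ∇²xx L(x*, λ*)
grad_h = np.array([1.0, 1.0])                      # ∇h(x*)

print("eigenvalues of ∇²xx L:", np.linalg.eigvalsh(hess_L))          # one negative
for c in (0.25, 0.5, 1.0, 10.0):
    hess_Lc = hess_L + c * np.outer(grad_h, grad_h)                  # ∇²xx Lc at x*
    print(f"c = {c:5.2f}  eigenvalues of ∇²xx Lc:", np.linalg.eigvalsh(hess_Lc))
# the output shows ∇²xx Lc becomes positive definite once c exceeds 1/2 (cf. Lemma 5.1)
```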
minimize f (x)
subject to h1 (x) = 0, . . . , hm (x) = 0, (5.4),
g1 (x) + z12 = 0, . . . , gr (x) + zr2 = 0,
3
We will show that (x∗ , z ∗ ) and (λ∗ , µ∗ ) satisfy the sufficiency conditions of Ex-
ercise 5.1, thus showing that (x∗ , z ∗ ) is a strict local minimum of problem (5.4),
proving that x∗ is a strict local minimum of the original inequality-constrained
problem.
Let L(x, z, λ, µ) be the Lagrangian function for this problem, i.e.,
L(x, z, λ, µ) = f(x) + Σ_{i=1}^m λi hi(x) + Σ_{j=1}^r µj ( gj(x) + z²_j ).
We have
∇(x,z) L(x∗ , z ∗ , λ∗ , µ∗ ) = ∇x L(x∗ , z ∗ , λ∗ , µ∗ ) , ∇z L(x∗ , z ∗ , λ∗ , µ∗ )
= ∇x L(x∗ , λ∗ , µ∗ ) , 2µ∗1 z1∗ , . . . , 2µ∗r zr∗
= [0, 0],
= [0, 0].
Hence the first order conditions of the sufficiency conditions for equality-constrained
problems, given in Exercise 5.1, are satisfied.
We next show that for all (y, w) ≠ (0, 0) satisfying
we have
( y′  w′ ) [ ∇²xx L(x∗, λ∗, µ∗)   0 ;   0   diag( 2µ∗1, . . . , 2µ∗r ) ] ( y ; w ) > 0.   (5.6)
Let (y, w) ≠ (0, 0) be a vector satisfying Eq. (5.5). We have that z∗_j = 0
for all j ∈ A(x∗ ), so it follows from Eq. (5.5) that
4
Hence, if y ≠ 0, it follows by assumption that
which implies, by Eq. (5.7) and the assumption µ∗j ≥ 0 for all j, that (y, w)
satisfies Eq. (5.6), proving our claim.
If y = 0, it follows that wk ≠ 0 for some k = 1, . . . , r. In this case, by using
Eq. (5.5), we have
2zj∗ wj = 0, j = 1, . . . , r,
from which we obtain that zk∗ must be equal to 0, and hence k ∈ A(x∗ ). By
assumption, we have that
r
We first prove the result for the special case of equality-constrained problems.
Proposition 5.3: Let x∗ and λ∗ be a local minimum and Lagrange multiplier,
respectively, satisfying the second order sufficiency conditions of Exercise 5.1,
and assume that the gradients ∇hi (x∗ ), i = 1, . . . , m, are linearly independent.
Consider the family of problems
minimize f (x)
(5.8)
subject to h(x) = u,
∇p(u) = −λ(u),
5
For each fixed u, this system represents n + m equations with n + m unknowns
– the vectors x and λ. For u = 0 the system has the solution (x∗ , λ∗ ). The
corresponding (n + m) × (n + m) Jacobian matrix with respect to (x, λ) is given
by 2
∇xx L(x∗ , λ∗ ) ∇h(x∗ )
J= .
∇h(x∗ ) 0
∇h(x∗ ) y = 0. (5.11)
y ∇2xx L(x∗ , λ∗ )y = 0.
In view of Eq. (5.11), it follows that y = 0, for otherwise our second order suffi-
ciency assumption would be violated. Since y = 0, Eq. (5.10) yields ∇h(x∗ )z = 0,
which in view of the linear independence of the columns ∇hi (x∗ ), i = 1, . . . , m,
of ∇h(x∗ ), yields z = 0. Thus, we obtain y = 0, z = 0, which is a contradiction.
Hence, J is nonsingular.
Returning now to the system (5.9), it follows from the nonsingularity of J
and the Implicit Function Theorem that for all u in some open sphere S centered
at u = 0, there exist x(u) and λ(u) such that x(0) = x∗ , λ(0) = λ∗ , the functions
x(·) and λ(·) are continuously differentiable, and
∇f x(u) + ∇h x(u) λ(u) = 0, (5.12)
h x(u) = u.
For u sufficiently close to 0, the vectors x(u) and λ(u) satisfy the second order
sufficiency conditions for problem (5.8), since they satisfy them by assumption for
u = 0. This is straightforward to verify by using our continuity assumptions. [If
it were not true, there would exist a sequence {uk } with uk → 0, and a sequence
{y k } with y k = 1 and ∇h x(uk ) y k = 0 for all k, such that
y k ∇2xx L x(uk ), λ(uk ) y k ≤ 0, ∀ k.
6
By differentiating the relation h x(u) = u, it follows that
I = ∇u h x(u) = ∇x(u)∇h x(u) , (5.13)
where I is the m × m identity matrix. Finally, by using the chain rule, we have
∇p(u) = ∇u f x(u) = ∇x(u)∇f x(u) .
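A numerical sketch of the sensitivity formula ∇p(u) = −λ(u) (illustration only): for the family of problems minimize (1/2)‖x‖² subject to a′x = u, both p(u) and λ(u) are available in closed form, and a finite-difference check confirms the formula.

```python
# Numerical sketch (illustration only): for minimize (1/2)||x||^2 s.t. a'x = u,
# one gets x(u) = u a/||a||^2, p(u) = u^2/(2||a||^2), lambda(u) = -u/||a||^2,
# so that ∇p(u) = -lambda(u), in agreement with the sensitivity result above.
import numpy as np

a = np.array([1.0, 2.0, -2.0])
na2 = a @ a

def p(u):                      # optimal cost as a function of the perturbation u
    return 0.5 * u ** 2 / na2

def lam(u):                    # Lagrange multiplier of the perturbed problem
    return -u / na2

for u in (-1.0, 0.0, 0.7, 3.0):
    dp = (p(u + 1e-6) - p(u - 1e-6)) / 2e-6        # central difference for ∇p(u)
    assert abs(dp + lam(u)) < 1e-6
print("∇p(u) = -λ(u) verified at sample values of u")
```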
We next use the preceding result to show the corresponding result for
inequality-constrained problems. We assume that x∗ and (λ∗ , µ∗ ) are a local
minimum and Lagrange multiplier, respectively, of the problem
minimize f (x)
subject to h1 (x) = 0, . . . , hm (x) = 0, (5.15)
g1 (x) ≤ 0, . . . , gr (x) ≤ 0,
and they satisfy the second order sufficiency conditions of Exercise 5.2. We also
assume that the gradients ∇hi (x∗ ), i = 1, . . . , m, ∇gj (x∗ ), j ∈ A(x∗ ) are linearly
independent, i.e., x∗ is regular. We consider the equality-constrained problem
minimize f (x)
subject to h1 (x) = 0, . . . , hm (x) = 0, (5.16)
g1 (x) + z12 = 0, . . . , gr (x) + zr2 = 0,
minimize f (x)
subject to hi (x) = ui , i = 1, . . . , m, (5.17)
gj (x) + zj2 = vj , j = 1, . . . , r,
parametrized by u and v.
7
Using Prop. 5.3, given in the beginning of this exercise, we have that there
exists an open sphere S centered at (u, v) = (0, 0) such that for every (u, v) ∈ S
there is an x(u, v) ∈ n , z(u, v) ∈ r and λ(u, v) ∈ m , µ(u, v) ∈ r , which are
a local minimum and associated Lagrange multiplier vectors of problem (5.17).
We claim that the vectors x(u, v) and λ(u, v) ∈ m , µ(u, v) ∈ r are a
local minimum and Lagrange multiplier vector for the problem
minimize f (x)
subject to hi (x) = ui , ∀ i = 1, . . . , m, (5.18)
gj (x) ≤ vj , ∀ j = 1, . . . , r.
m
r
∇f x(u, v) + λi (u, v)∇hi x(u, v) + µj (u, v)∇gj x(u, v) = 0,
i=1 j=1
A x(u, v) = j | gj x(u, v) = vj ,
Thus, to show λ(u, v) and µ(u, v) are Lagrange multipliers for problem (5.18),
there remains to show the nonnegativity of µ(u, v). For this purpose we use the
second order necessary condition for the equivalent equality constrained problem
(5.17). It yields
∇2xx L x(u, v), λ(u, v), µ(u, v) 0
2µ1 (u, v) 0 ... 0
y
(y
w )
0 2µ2 (u, v) ... 0
w ≥ 0,
0 .. .. .. ..
. . . .
0 0 ... 2µr (u, v)
(5.20)
for all y ∈ n
and w ∈ r
satisfying
∇h x(u, v) y = 0, ∇gj x(u, v) y + 2zj (u, v)wj = 0,
j ∈ A x(u, v) .
(5.21)
Next let us select, for every j ∈ A( x(u, v) ), a vector (y, w) with y = 0, wj ≠ 0,
wk = 0 for all k ≠ j. Such a vector satisfies the condition of Eq. (5.21). By using
such a vector in Eq. (5.20), we obtain 2µj (u, v)wj2 ≥ 0, and
µj (u, v) ≥ 0, ∀ j ∈ A x(u, v) .
8
Furthermore, by Prop. 5.3 given in the beginning of this exercise, it follows
that x(·, ·), λ(·, ·), and µ(·, ·) are continuously differentiable in S and we have
x(0, 0) = x∗ , λ(0, 0) = λ∗ , µ(0, 0) = µ∗ . In addition, for all (u, v) ∈ S, there
holds
∇u p(u, v) = −λ(u, v),
∇v p(u, v) = −µ(u, v),
where p(u, v) is the optimal cost of problem (5.17), parameterized by (u, v), which
is the same as the optimal cost of problem (5.18), completing our proof.
We have
f(x∗) = f(x∗) + µ∗′g(x∗)
= min_{x∈X} { f(x) + µ∗′g(x) }
≤ min_{x∈X, g(x)≤0} { f(x) + µ∗′g(x) }
≤ min_{x∈X, g(x)≤0} f(x)
≤ f(x∗),
where the first equality follows from the hypothesis, which implies that µ∗ g(x∗ ) =
0, the next-to-last inequality follows from the nonnegativity of µ∗ , and the last
inequality follows from the feasibility of x∗. It follows that equality holds through-
out, and x∗ is an optimal solution.
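A small numerical illustration of this sufficiency argument (not part of the original solution): for minimize x² subject to 1 − x ≤ 0 with X = ℜ, the multiplier µ∗ = 2 and the point x∗ = 1 satisfy the hypotheses, and the chain of inequalities certifies optimality.

```python
# Numerical sketch (illustration only): minimize f(x) = x^2 over X = R subject to
# g(x) = 1 - x <= 0.  With x* = 1, mu* = 2 we have mu* g(x*) = 0 and x* minimizes
# f + mu* g over X, so the chain of inequalities above certifies optimality of x*.
import numpy as np

f = lambda x: x ** 2
g = lambda x: 1.0 - x
x_star, mu_star = 1.0, 2.0

grid = np.linspace(-5, 5, 100001)
assert abs(mu_star * g(x_star)) < 1e-12                        # complementary slackness
lagr = f(grid) + mu_star * g(grid)
assert np.all(lagr >= f(x_star) + mu_star * g(x_star) - 1e-9)  # x* minimizes f + mu* g
feasible = grid[g(grid) <= 0]
assert np.all(f(feasible) >= f(x_star) - 1e-9)                 # hence x* is optimal
print("optimality of x* certified by the multiplier mu* = 2")
```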
5.5
1
− µ∗ 2 = sup infn L0 (d, µ)
2 µ≥0 d∈
where the inequality follows from the minimax inequality (cf. Chapter 2). For
any d ∈ n , the supremum of L0 (d, µ) over µ ≥ 0 is attained at
µj = (aj d)+ , j = 1, . . . , r.
9
the maximum subject to µj ≥ 0 is attained for µj = (aj d)+ ]. Hence, it follows
that, for any d ∈ n ,
1 + 2
r
the unconstrained minimum dγ, we take the gradient of Lγ(d, µγ) and set it equal
to 0. This yields
dγ = − ( a0 + Σ_{j=1}^r µ^γ_j aj ) / γ.
Hence,
Lγ(dγ, µγ) = − ‖ a0 + Σ_{j=1}^r µ^γ_j aj ‖² / (2γ) − (1/2)‖µγ‖².
We also have
1
− µ∗ 2 = sup infn L0 (d, µ)
2 µ≥0 d∈
= Lγ (d , µγ ),
γ
where the first two relations follow from part (a), thus yielding the desired rela-
tion.
(d) From part (c), we have
−(1/2)‖µγ‖² ≥ Lγ(dγ, µγ) = − ‖ a0 + Σ_{j=1}^r µ^γ_j aj ‖² / (2γ) − (1/2)‖µγ‖² ≥ −(1/2)‖µ∗‖².   (5.24)
From this, we see that µγ ≤ µ∗ , so that µγ remains bounded as γ → 0. By
taking the limit above as γ → 0, we see that
r
lim a0 + µγj aj = 0,
γ→0
j=1
10
so any limit point of {µγ}, call it µ̄, satisfies a0 + Σ_{j=1}^r µ̄j aj = 0. Since
µγ ≥ 0, it follows that µ̄ ≥ 0, so µ̄ ∈ M. We also have ‖µ̄‖ ≤ ‖µ∗‖ (since
‖µγ‖ ≤ ‖µ∗‖), so by using the minimum norm property of µ∗, we conclude that
any limit point µ̄ of {µγ} must be equal to µ∗. Thus, µγ → µ∗. From Eq. (5.24),
we then obtain
Lγ(dγ, µγ) → −(1/2)‖µ∗‖².   (5.25)
(e) Equations (5.23) and (5.25), together with part (b), show that
[thus proving that (d∗ , µ∗ ) is a saddle point of L0 (d, µ)], and that
minimize x1 + x2
subject to x1 ≤ 0, x2 ≤ 0, −x1 − x2 ≤ 0.
The only feasible vector is x∗ = (0, 0), which is therefore also the optimal solution
of this problem. The vector (1, 1, 2) is a Lagrange multiplier vector which satisfies
strict complementarity. However, it is not possible to find a vector that violates
simultaneously all the constraints, showing that this Lagrange multiplier vector
is not informative.
For the converse statement, consider the example of Fig. 5.1.3. The La-
grange multiplier vectors, that involve three nonzero components out of four, are
informative, but they do not satisfy strict complementarity.
5.7
minimize Σ_{i=1}^n fi(xi)
subject to x ∈ S, xi ∈ Xi, i = 1, . . . , n,
11
z1 , . . . , zn and the linear constraints xi = zi , i = 1, . . . , n, while replacing the
constraint x ∈ S with z ∈ S, so that the problem becomes
minimize Σ_{i=1}^n fi(xi)   (5.26)
subject to z ∈ S, xi ∈ Xi, xi = zi, i = 1, . . . , n.
S = {y | aj y = 0, ∀ j = 1, . . . , m}.
Xi = {y | ci ≤ y ≤ di }.
With the previous identifications, the constraint set of problem (5.26) can be
described alternatively as
n
minimize fi (xi )
i=1
subject to aj z = 0, j = 1, . . . , m,
ci ≤ xi ≤ di , i = 1, . . . , n,
xi = zi , i = 1, . . . , n
5.8
m
12
(b) There exists a d ∈ NX (x∗ )∗ = TX (x∗ ) (since X is regular at x∗ ) such that
To arrive at a contradiction, assume that CQ6 does not hold, i.e., there are
scalars λ1 , . . . , λm , µ1 , . . . , µr , not all of them equal to zero, such that
(i) m
r
(ii) µj ≥ 0 for all j = 1, . . . , r, and µj = 0 for all j ∉ A(x∗).
In view of our assumption that X is regular at x∗ , condition (i) can be
written as
m
r
or equivalently,
m
r
Since not all the λi and µj are equal to 0, we conclude that µj > 0 for at least
one j ∈ A(x∗ ); otherwise condition (a) of CQ5a would be violated. Since µ∗j ≥ 0
for all j, with µ∗j = 0 for j ∉ A(x∗) and µ∗j > 0 for at least one j, we obtain
m
r
where d ∈ TX (x∗ ) is the vector in condition (b) of CQ5a. But this contradicts
Eq. (5.27), showing that CQ6 holds.
Conversely, assume that CQ6 holds. It can be seen that this implies
condition (a) of CQ5a. Let H denote the subspace spanned by the vectors
∇h1 (x∗ ), . . . , ∇hm (x∗ ), and let G denote the cone generated by the vectors
∇gj (x∗ ), j ∈ A(x∗ ). Then, the orthogonal complement of H is given by
H ⊥ = y | ∇hi (x∗ ) y = 0, ∀ i = 1, . . . , m ,
13
Under CQ6, we have int(G∗ ) = Ø, since otherwise the vectors ∇gj (x∗ ), j ∈ A(x∗ )
would be linearly dependent, contradicting CQ6. Similarly, under CQ6, we have
H ⊥ ∩ int(G∗ ) = Ø. (5.28)
To see this, assume the contrary, i.e., H ⊥ and int(G∗ ) are disjoint. The sets H ⊥
and int(G∗ ) are convex, therefore by the Separating Hyperplane Theorem, there
exists some nonzero vector ν such that
ν x ≤ ν y, ∀ x ∈ H ⊥ , ∀ y ∈ int(G∗ ),
or equivalently,
ν (x − y) ≤ 0, ∀ x ∈ H ⊥ , ∀ y ∈ G∗ ,
which implies, using also Exercise 3.4., that
ν ∈ (H ⊥ − G∗ )∗ = H ∩ (−G).
But this contradicts CQ6, and proves Eq. (5.28).
Finally, we show that CQ6 implies condition (b) of CQ5a. Assume, to
arrive at a contradiction, that condition (b) of CQ5a does not hold. This implies
that
NX (x∗ )∗ ∩ H ⊥ ∩ int G∗ = Ø.
Since X is regular at x∗ , the preceding is equivalent to
TX (x∗ ) ∩ H ⊥ ∩ int G∗ = Ø.
The regularity of X at x∗ implies that TX (x∗ ) is convex. Similarly, since the
interior of a convex set is convex
and
the intersection of two convex sets is convex,
it follows that the set H ⊥ ∩ int G∗ is convex. It is also nonempty by Eq. (5.28).
Thus, by the Separating Hyperplane Theorem, there exists some vector a = 0
such that
a x ≤ a y, ∀ x ∈ TX (x∗ ), ∀ y ∈ H ⊥ ∩ int G∗ ,
or equivalently,
a (x − y) ≤ 0, ∀ x ∈ TX (x∗ ), ∀ y ∈ H ⊥ ∩ G∗ ,
which implies that ∗
a ∈ TX (x∗ ) − (H ⊥ ∩ G∗ ) .
We have
∗
TX (x∗ ) − (H ⊥ ∩ G∗ ) = TX (x∗ )∗ ∩ −(H ⊥ ∩ G∗ )∗
= TX (x∗ )∗ ∩ − cl(H + G)
= TX (x∗ )∗ ∩ −(H + G)
= NX (x∗ ) ∩ −(H + G) ,
where the second equality follows since H ⊥ and G∗ are closed and convex, and
the third equality follows since H and G are both polyhedral cones (cf. Chapter
3). Combining the preceding relations, it follows that there exists a nonzero
vector a that belongs to the set
NX (x∗ ) ∩ −(H + G) .
But this contradicts CQ6, thus completing our proof.
14
5.9 (Minimax Problems)
minimize max f1 (x), . . . , fp (x)
subject to x ∈ X.
minimize z
subject to x ∈ X, fi (x) ≤ z, i = 1, . . . , p,
z ∗ = max f1 (x∗ ), . . . , fp (x∗ ) .
and
NX× (x∗ , z ∗ )∗ = NX (x∗ )∗ × . (5.30)
Let d = (0, 1). By Eq. (5.30), this vector belongs to the set NX× (x∗ , z ∗ )∗ , and
also
[ ∇fi(x∗)′  −1 ] (0, 1)′ = −1 < 0, ∀ i = 1, . . . , p.
Hence, CQ5a is satisfied, which together with Eq. (5.29) implies that there exists
a nonnegative vector µ∗ = (µ∗1 , . . . , µ∗p ) such that
(i) − Σ_{j=1}^p µ∗_j ∇fj(x∗) ∈ NX(x∗).
(ii) Σ_{j=1}^p µ∗_j = 1.
(iii) For all j = 1, . . . , p, if µ∗j > 0, then
fj (x∗ ) = max f1 (x∗ ), . . . , fp (x∗ ) .
15
5.10 (Exact Penalty Functions)
where
C = X ∩ x | h1 (x) = 0, . . . , hm (x) = 0 ∩ x | g1 (x) ≤ 0, . . . , gr (x) ≤ 0 ,
λi ∈ [−1, 1] if hi (x∗ ) = 0,
µj ∈ [0, 1] if gj (x∗ ) = 0.
By the definition of R-multipliers, the preceding relations imply that the vector
(λ∗ , µ∗ ) = c(λ, µ) is an R-multiplier for problem (5.31) such that
(b) Assume that the functions f and the gj are convex, the functions hi are
linear, and the set X is convex. Since x∗ is a local minimum of problem (5.31),
and (λ∗ , µ∗ ) is a corresponding Lagrange multiplier vector, we have by definition
that
m
∗
r
16
Since x∗ is feasible for the original problem, and (λ∗ , µ∗ ) satisfy Eq. (5.32), we
have for all x ∈ X,
FC (x∗ )= f (x∗ )
m
r
= Fc (x),
(a) The hypothesis implies that for every smooth cost function f for which x∗ is
a local minimum there exist scalars λ∗1 , . . . , λ∗m and µ∗1 , . . . , µ∗r satisfying
∗
m
r
µ∗j ≥ 0, ∀ j = 1, . . . , r,
µ∗j = 0, / A(x∗ ),
∀j∈
where
A(x∗ ) = j | gj (x∗ ) = 0, j = 1, . . . , r .
We claim that TX (x∗ ) ⊂ V (x∗ ). To see this, let y be a nonzero vector that
belongs to TX (x∗ ). Then, there exists a sequence {xk } ⊂ X such that xk = x∗
for all k and
xk − x∗ y
→ .
xk − x∗ y
17
Since xk ∈ X, for all i = m + 1, . . . , m and k, we have
∇hi(x∗)′ (xk − x∗)/‖xk − x∗‖ + o(‖xk − x∗‖)/‖xk − x∗‖ = 0.
∇gj(x∗)′ (xk − x∗)/‖xk − x∗‖ + o(‖xk − x∗‖)/‖xk − x∗‖ ≤ 0.
Equation (5.35) and the preceding relation imply that y ∈ V (x∗ ), showing that
TX (x∗ ) ⊂ V (x∗ ).
Hence Eq. (5.34) implies that
∗
m
r
18
Convex Analysis and
Optimization
Chapter 6 Solutions
Dimitri P. Bertsekas
with
6.1
maximize µ1 + µ2
subject to µ1 ≥ 0, µ2 ≥ 0, −µ1 + µ2 − 1 ≤ 0, µ2 + 1 ≤ 0.
implying that
m
r
For any x ∈ X, we have hi (x) = 0 for all i = m + 1, . . . , m, and gj (x) ≤ 0 for all
j = r + 1, . . . , r, so that µ∗j gj (x) ≤ 0 for all j = r + 1, . . . , r. Therefore, it follows
from the preceding relation that
∗
m
r
2
Taking the infimum over all x ∈ X, it follows that
∗
m
r
≤ inf f (x)
x∈X, hi (x)=0, i=1,...,m
gj (x)≤0, j=1,...,r
=f ∗ .
Hence, equality holds throughout above, showing that the scalars λ∗1 , . . . , λ∗m ,
µ∗1 , . . . , µ∗r constitute a geometric multiplier for the original representation.
Consider the extended representation of the problem in which the linear inequal-
ities that represent the polyhedral part are lumped with the remaining linear
inequality constraints. From Prop. 6.3.1, finiteness of the optimal value implies
that there exists an optimal solution and a geometric multiplier. From Exercise
6.2, it follows that there exists a geometric multiplier for the original representa-
tion of the problem.
6.4 (Sensitivity)
We have
f = inf f (x) + µ g(x) − u ,
x∈X
f̃ = inf f (x) + µ̃ g(x) − ũ .
x∈X
We have
f − f̃ = inf f (x) + µ g(x) − u − inf f (x) + µ̃ g(x) − ũ
x∈X x∈X
= inf f (x) + µ g(x) − u − inf f (x) + µ̃ g(x) − u + µ̃ (ũ − u)
x∈X x∈X
3
6.5
If cj − Σ_{i=1}^m µi aij ≠ 0 for some j, then q(µ) = −∞. Thus the dual problem is
maximize Σ_{i=1}^m µi bi
subject to Σ_{i=1}^m µi aij = cj, j = 1, . . . , n,   µ ≥ 0.
min −b µ,
Aµ=c,µ≥0
If ai x − bi < 0 for any i, then p(x) = −∞. Thus the dual of (D) is
maximize − c x
subject to A x ≥ b,
or
minimize c x
subject to A x ≥ b.
Aµ = c.
4
The Lagrangian optimality condition for (D) is
Next, consider
If cj − Σ_{i=1}^m µi aij < 0 for some j, then q(µ) = −∞. Thus the dual problem is
maximize Σ_{i=1}^m µi bi
subject to Σ_{i=1}^m µi aij ≤ cj, j = 1, . . . , n,   µ ≥ 0.
min −b µ,
Aµ≤c,µ≥0
If ai x − bi < 0 for any i, then p(x) = −∞. Thus the dual of (D) is
maximize − c x
subject to A x ≥ b, x≥0
or
minimize c x
subject to A x ≥ b, x ≥ 0.
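The primal-dual pair just derived can be checked numerically; the sketch below (an illustration only, using scipy's LP solver) builds a random instance that is feasible on both sides and confirms that the optimal values coincide, as linear programming duality asserts.

```python
# Numerical sketch (illustration only) of the pair
#   (P)  minimize c'x   subject to A x >= b, x >= 0,
#   (D)  maximize b'mu  subject to A'mu <= c, mu >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))
b = A @ rng.uniform(1, 2, size=6) - 1.0        # a strictly feasible primal point exists
c = A.T @ rng.uniform(1, 2, size=4) + 1.0      # a strictly feasible dual point exists

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None))    # min c'x, Ax >= b, x >= 0
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(0, None))     # max b'mu, A'mu <= c, mu >= 0
print("primal value:", primal.fun, " dual value:", -dual.fun)
assert abs(primal.fun - (-dual.fun)) < 1e-5
```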
The Lagrangian optimality condition for (P ) is
∗
m
m
5
from which we obtain the complementary slackness conditions for (P ):
m
m
c− µ∗i ai ≥ 0.
i=1
min ζ,
ζe≥A x
n
x =1, xi ≥0
i=1 i
whose optimal value is equal to minx∈X maxz∈Z x Az. Introduce dual variables
z ∈ m and ξ ∈ , corresponding to the constraints A x − ζe ≤ 0 and i=1 xi =
n
max ξ,
ξe≤Az, z∈Z
6
6.7 (Goldman-Tucker Complementarity Theorem [GoT56])
7
6.8
The problem of finding the minimum distance from the origin to a line is written
as
min (1/2)‖x‖²
subject to Ax = b,
where A is a 2 × 3 matrix with full rank, and b ∈ 2 . Let f ∗ be the optimal value
and consider the dual function
q(λ) = min_x { (1/2)‖x‖² + λ′(Ax − b) }.
By Prop. 6.3.1, since the optimal value is finite, it follows that this problem
has no duality gap.
Let V ∗ be the supremum over all distances of the origin from planes that
contain the line {x | Ax = b}. Clearly, we have V ∗ ≤ f ∗ , since the distance to the
line {x | Ax = b} cannot be smaller than the distance to the plane that contains
the line.
We now note that any plane of the form {x | p′Ax = p′b}, where p ∈ ℜ^2,
contains the line {x | Ax = b}, so we have for all p ∈ ℜ^2,
V(p) ≡ min_{p′Ax = p′b} (1/2)‖x‖² ≤ V∗.
Since there is no duality gap for the original problem, we have supλ q(λ) = f ∗ , it
follows that equality holds throughout above. Hence V ∗ = f ∗ , which was to be
proved.
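A numerical sketch of this result (illustration only): for a specific line {x | Ax = b} in ℜ³, the minimum-norm point is A′(AA′)⁻¹b, and sweeping p over a grid of directions shows that the distances to the planes {x | p′Ax = p′b} approach, but never exceed, the distance to the line.

```python
# Numerical sketch (illustration only): distance from the origin to the line
# {x | Ax = b} equals the supremum of distances to planes containing the line.
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  3.0]])
b = np.array([2.0, 1.0])
x_min = A.T @ np.linalg.solve(A @ A.T, b)      # least-norm solution of Ax = b

def plane_dist(p):
    # distance from the origin to the plane {x | p'Ax = p'b}
    return abs(p @ b) / np.linalg.norm(A.T @ p)

angles = np.linspace(0, np.pi, 20000)
best = max(plane_dist(np.array([np.cos(t), np.sin(t)])) for t in angles)
print("distance to line:", np.linalg.norm(x_min), " best plane distance:", best)
assert best <= np.linalg.norm(x_min) + 1e-12 and best > np.linalg.norm(x_min) - 1e-4
```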
6.9
minimize Σ_{i=0}^m fi(xi)   (6.4)
subject to xi ∈ Xi, i = 0, . . . , m,   xi = x0, i = 1, . . . , m.
8
By relaxing the equality constraints, we obtain the dual function
m
m
= inf f0 (x) − (λ1 + · · · λm ) x + inf fi (x) + λi x ,
x∈X0 x∈Xi
i=1
which is of the form given in the exercise. Note that the infima above are attained
since fi are continuous (being convex functions over n ) and Xi are compact
polyhedra.
Because the primal problem involves minimization of the continuous func-
m
tion f (x) over the compact set ∩m
i=0 i i=0 Xi , a primal optimal solution exists.
Applying Prop. 6.4.2 to problem (6.4), we see that there is no duality gap and
there exists at least one geometric multiplier, which is a dual optimal solution.
6.10
M= µ ≥ 0 f ∗ = inf f (x) + µ g(x) .
x∈X
We will show that if the set M is nonempty and compact, then the Slater condi-
tion holds. Indeed, if this were not so, then 0 would not be an interior point of
the set
D = u | there exists some x ∈ X such that g(x) ≤ u .
µ g(x) ≥ 0, ∀ x ∈ X.
Hence, it follows that (µ + γµ) ∈ M for all γ ≥ 0, which contradicts the bound-
edness of M .
9
6.11 (Inconsistent Convex Systems of Inequalities)
6.12
− −c + µj aj ∈ ri(N ).
j=1
subject to d ∈ N ∗ ,
has an optimal solution, which we denote by d∗ . Consider the set
r
M= µ ≥ 0 − −c + µj aj ∈N ,
j=1
10
6.13 (Pareto Optimality)
(a) Assume that x∗ is not a Pareto optimal solution. Then there is a vector
x̄ ∈ X such that either
f1(x̄) ≤ f1(x∗), f2(x̄) < f2(x∗),
or
f1(x̄) < f1(x∗), f2(x̄) ≤ f2(x∗).
Multiplying the left relation by λ∗1, the right relation by λ∗2, and adding the
two in either case yields
We first show that A is convex. Indeed, let (a1 , a2 ), and (b1 , b2 ) be elements of
A, and let (c1 , c2 ) = α(a1 , a2 ) + (1 − α)(b1 , b2 ) for any α ∈ [0, 1]. Then for some
xa ∈ X, xb ∈ X, we have f1 (xa ) ≤ a1 , f2 (xa ) ≤ a2 , f1 (xb ) ≤ b1 , and f2 (xb ) ≤ b2 .
Let xc = αxa + (1 − α)xb . Since X is convex, xc ∈ X. Since f is convex, we also
have
f1 (xc ) ≤ c1 , and f2 (xc ) ≤ c2 .
Hence, (c1 , c2 ) ∈ A and it follows
that A is a convex set.
For any x ∈ X, we have f1 (x), f2 (x) ∈ A. In addition, f1 (x∗ ), f2 (x∗ )
is in the boundary of A. [If this were not the case, then either (1) or (2) would
hold and x∗ would not be Pareto optimal.] Then by the Supporting Hyperplane
Theorem, there exists λ∗1 and λ∗2 , not both equal to 0, such that
Since
z1 and z2 can be made arbitrarily large, we must have λ∗1, λ∗2 ≥ 0. Since
f1 (x), f2 (x) ∈ A, the above equation yields
or, equivalently,
min λ∗1 f1 (x) + λ∗2 f2 (x) ≥ λ∗1 f1 (x∗ ) + λ∗2 f2 (x∗ ).
x∈X
11
(c) Generalization of (a): If x∗ is a vector in X, and λ∗1, . . . , λ∗m are positive
scalars such that
Σ_{i=1}^m λ∗_i fi(x∗) = min_{x∈X} Σ_{i=1}^m λ∗_i fi(x),
then x∗ is a Pareto optimal solution.
where aij are vectors in n and bij are scalars. Hence the constraint functions
can equivalently be represented as
aij x + bij ≤ 0, ∀ i = 1, . . . , m, ∀ j = 1, . . . , r.
By assumption, the set X is a polyhedral set, and the cost function f is a poly-
hedral function, hence convex over n . Therefore, we can use the Strong Duality
Theorem for linear constraints (cf. Prop. 6.4.2) to conclude that there is no du-
ality gap and there exists at least one geometric multiplier, i.e., there exists a
nonnegative vector µ such that
f ∗ = inf f (x) + µ g(x) .
x∈X
Let p(u) denote the primal function for this problem. The preceding relation
implies that
p(0) − µ u = inf f (x) + µ g(x) − u
x∈X
≤ inf f (x) + µ g(x) − u
x∈X, g(x)≤u
≤ inf f (x)
x∈X, g(x)≤u
= p(u),
which, in view of the assumption that p(0) is finite, shows that p(u) > −∞ for
all u ∈ r .
12
The primal function can be obtained by partial minimization as
where
F (x, u) = f (x) if gj (x) ≤ uj ∀ j, x ∈ X,
∞ otherwise.
Since, by assumption f is polyhedral, the gj are polyhedral (which implies that
the level sets of the gj are polyhedral), and X is polyhedral, it follows that
F (x, u) is a polyhedral function. Since we have also shown that p(u) > −∞ for
all u ∈ r , we can use Exercise 3.13 to conclude that the primal function p is
polyhedral, and therefore also closed. Since p(0), the optimal value, is assumed
finite, it follows that p is proper.
13
Convex Analysis and
Optimization
Chapter 7 Solutions
Dimitri P. Bertsekas
with
we have the inequality x λ ≤ f (x) + g(λ). In view of this inequality, the equality
x λ = f (x) + g(λ) of (i) is equivalent to the inequality
x λ − f (x) ≥ g(λ) = sup z λ − f (z) ,
z∈n
or
x λ − f (x) ≥ z λ − f (z), ∀ z ∈ n ,
or
f (z) ≥ f (x) + λ (z − x), ∀ z ∈ n ,
which is equivalent to (ii). Since f is closed, f is equal to the conjugate of g,
so by using the equivalence of (i) and (ii) with the roles of f and g reversed, we
obtain the equivalence of (i) and (iii).
(b) A vector x∗ minimizes f if and only if 0 ∈ ∂f (x∗ ), which by part (a), is true
if and only if x∗ ∈ ∂g(0).
(c) The result follows by combining part (b) and Prop. 4.4.2.
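A numerical sketch of the equivalence of (i)-(iii) (illustration only): for f(x) = (1/2)x′Qx with Q positive definite, the conjugate is g(λ) = (1/2)λ′Q⁻¹λ, and equality in the Fenchel inequality x′λ ≤ f(x) + g(λ) holds exactly when λ = Qx (equivalently, λ ∈ ∂f(x), or x ∈ ∂g(λ)).

```python
# Numerical sketch (illustration only): f(x) = (1/2) x'Qx, g(lam) = (1/2) lam'Q^{-1}lam.
# Fenchel equality x'lam = f(x) + g(lam) holds iff lam = Qx.
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
Q = M @ M.T + np.eye(3)                 # a positive definite matrix
Qinv = np.linalg.inv(Q)
f = lambda x: 0.5 * x @ Q @ x
g = lambda lam: 0.5 * lam @ Qinv @ lam

x = rng.standard_normal(3)
lam = Q @ x                              # lam is the (unique) subgradient of f at x
assert abs(x @ lam - (f(x) + g(lam))) < 1e-10          # Fenchel equality holds

lam_other = lam + np.array([0.1, 0.0, 0.0])            # not a subgradient at x
assert x @ lam_other < f(x) + g(lam_other)             # strict Fenchel inequality
print("equality in x'λ <= f(x) + g(λ) holds iff λ = Qx, as expected")
```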
7.2
7.3
2
7.4 (Finiteness of the Optimal Dual Value)
and note that −q̃ is closed and convex, and that by the calculation of Example
7.1.6, we have
q̃(µ) = inf r p(u) + µ u , ∀ µ ∈ r . (1)
u∈
Since q̃(µ) ≤ p(0) for all µ ∈ r , given the feasibility of the problem [i.e.,
p(0) < ∞], we see that q ∗ is finite if and only if −q̃ is proper. From Eq. (1), −q̃
is the conjugate of p(−u), and by the Conjugacy Theorem [Prop. 7.1.1(b)], −q̃ is
proper if and only if p is proper. Hence, (i) is equivalent to (ii).
We note that the epigraph of p is the closure of M . Hence, given the
feasibility of the problem, (ii) is equivalent to the closure of M not containing a
vertical line. Since M is convex, its closure does not contain a line if and only if
M does not contain a line (since the closure and the relative interior of M have
the same recession cone). Hence (ii) is equivalent to (iii).
(a) We have
h(λ) = sup λ u − p(u)
u
= sup λ u − inf F (x, u)
u x
= sup λ u − F (x, u)
x,u
= G(0, λ).
Also
q(λ) = inf {w + λ u}
(u,w)∈M
= inf F (x, u) + λ u
x,u
= − sup −λ u − F (x, u)
x,u
= −G(0, −λ).
Consider the constrained minimization problem of Example 7.1.6:
minimize f (x)
subject to x ∈ X, g(x) ≤ 0,
and define
F (x, u) = f (x) if x ∈ X and g(x) ≤ u,
∞ otherwise.
3
Then p is the primal function of the constrained minimization problem. Consider
now q(λ), the cost function of the max crossing problem corresponding to M . For
λ ≥ 0, q(λ) is equal to the dual function value of the constrained optimization
problem, and otherwise q(λ) is equal to −∞. Thus, the relations h(λ) = G(0, λ)
and q(λ) = −G(0, −λ) proved earlier, show the relation proved in Example 7.1.6,
i.e., that q(λ) = −h(−λ).
(b) Let
M = (u, w) | there is an x such that F (x, u) ≤ w .
Then the corresponding min common value is
inf w = inf F (x, 0) = p(0).
{(x,w) | F (x,0)≤w} x
Since p(0) is the min common value corresponding to epi(p), the min common
values corresponding to the two choices for M are equal. Similarly, we show that
the cost functions of the max crossing problem corresponding to the two choices
for M are equal.
(c) If F (x, u) = f1 (x) − f2 (Qx + u), we have
p(u) = inf f1 (x) − f2 (Qx + u) ,
x
so p(0), the min common value, is equal to the primal optimal value in the Fenchel
duality framework. By part (a), the max crossing value is
q ∗ = sup −h(−λ) ,
λ
= g2 (λ) − g1 (Qλ),
where g1 and g2 are the conjugate convex and conjugate concave functions of f1
and f2 , respectively:
g1 (λ) = sup x λ − f1 (x) , g2 (λ) = inf z λ − f2 (z) .
x z
Thus, no duality
gap in the min common/max crossing framework [i.e., p(0) =
q ∗ = supλ −h(−λ) ] is equivalent to no duality gap in the Fenchel duality
framework.
The minimax framework of Section 2.6.1 (using the notation of that section)
is obtained for
F (x, u) = sup φ(x, z) − u z .
z∈Z
4
7.6
By Exercise 1.35,
cl f1 + cl (−f2 ) = cl (f1 − f2 ).
Furthermore,
infn cl (f1 − f2 )(x) = infx∈n f1 (x) − f2 (x) .
x∈
Thus, we may replace f1 and −f2 with their closures, and the result follows by
applying Minimax Theorem III.
and
0 if x ∈ S,
f2 (x) =
−∞ otherwise.
The corresponding conjugate concave and convex functions g2 and g1 are
0 if λ ∈ S ⊥ ,
inf λ x =
x∈S −∞ / S⊥,
if λ ∈
By the Primal Fenchel Duality Theorem (Prop. 7.2.1), the dual problem has an
optimal solution and there is no duality gap if the functions fi are convex over
Xi and one of the following two conditions holds:
(1) The subspace S contains a point in the relative interior of X1 × · · · × Xn .
(2) The intervals Xi are closed (so that the Cartesian product X1 × · · · × Xn is
a polyhedral set) and the functions fi are convex over the entire real line.
These conditions correspond to the two conditions for no duality gap given fol-
lowing Prop. 7.2.1.
5
7.8 (Network Optimization and Kirchhoff ’s Laws)
maximize q(v)
subject to no constraints on p,
where
qij(vi − vj) = min_{xij∈ℜ} { (1/2) Rij x²ij − (vi − vj + tij) xij }.
Since the primal cost functions fij are real-valued and convex over the entire real
line, there is no duality gap. The necessary and sufficient conditions for a set of
variables {xij | (i, j) ∈ A} and {vi | i ∈ N } to be an optimal solution-Lagrange
multiplier pair are:
(1) The set of variables {xij | (i, j) ∈ A} must be primal feasible, i.e., Kirch-
hoff’s current law must be satisfied.
(2)
1 2
xij ∈ arg min Rij yij − (vi − vj + tij )yij , ∀ (i, j) ∈ A,
yij ∈ 2
6
Since f ∗ = p∗ , we see that f ∗ = inf x∈X f (x) + µ g(x) if and only if p∗ =
inf u∈P p(u) + µ u . In other words, the two problems have the same geometric
multipliers.
(b) This part was proved by the preceding argument.
(c) From Example 7.1.6, we have that −q(−µ) is the conjugate convex function
of p. Let us view the dual problem as the minimization problem
minimize − q(−µ)
(1)
subject to µ ≤ 0.
Its dual problem is obtained by forming the conjugate convex function of its
primal function, which is p, based on the analysis of Example 7.1.6, and the
closedness and convexity of p. Hence the dual of the dual problem (1) is
maximize − p(u)
subject to u ≤ 0
and the optimal solutions to this problem are the geometric multipliers to problem
(1).
(a) Define
X = (x, u, t) | x ∈ n , uj = Aj x + bj , tj = ej x + dj , j = 1, . . . , r ,
C = (x, u, t) | x ∈ n , uj ≤ tj , j = 1, . . . , r .
It can be seen that X is convex and C is a cone. Therefore the modified problem
can be written as
minimize f (x)
subject to x ∈ X ∩ C,
The vectors (λ, z, w) in the dual cone of C are those satisfying
    λ'x + Σ_{j=1}^r zj'uj + Σ_{j=1}^r wj tj ≥ 0,   ∀ (x, u, t) ∈ C.
By the conic duality theory of Section 7.2.2, the dual problem is given by
    minimize Σ_{j=1}^r ( zj'bj + wj dj )
    subject to Σ_{j=1}^r ( Aj'zj + wj ej ) = c,   ||zj|| ≤ wj,   j = 1, . . . , r.
If there exists a feasible solution of the modified primal problem satisfying strictly
all the inequality constraints, then the relative interior condition ri(X) ∩ ri(C) ≠ Ø
is satisfied, and there is no duality gap. Similarly, if there exists a feasible solution
of the dual problem satisfying strictly all the inequality constraints, there is no
duality gap.
    minimize x_{n+1}
    subject to ||P0^{1/2} x + P0^{−1/2} q0|| ≤ x_{n+1},
               ||Pi^{1/2} x + Pi^{−1/2} qi|| ≤ ( qi'Pi^{−1} qi − ri )^{1/2},   i = 1, . . . , p.
The optimal values of this problem and the original problem are equal up
to a constant and a square root. The above problem is of the type described
in Exercise 7.10. To see this, define Ai = [ Pi^{1/2} | 0 ], bi = Pi^{−1/2} qi, ei = 0, di = ( qi'Pi^{−1} qi − ri )^{1/2} for i = 1, . . . , p, A0 = [ P0^{1/2} | 0 ], b0 = P0^{−1/2} q0, e0 = (0, . . . , 0, 1), d0 = 0, and c = (0, . . . , 0, 1). Its dual is given by
    maximize −Σ_{i=1}^p ( qi'Pi^{−1/2} zi + ( qi'Pi^{−1} qi − ri )^{1/2} wi ) − q0'P0^{−1/2} z0
    subject to Σ_{i=0}^p Pi^{1/2} zi = 0,   ||z0|| ≤ 1,   ||zi|| ≤ wi,   i = 1, . . . , p.
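To make the correspondence with Exercise 7.10 concrete, the sketch below checks the algebraic identity behind the reformulation, namely ||P^{1/2}x + P^{−1/2}q||² = x'Px + 2q'x + q'P^{−1}q, on a randomly generated positive definite P; the random instance and the assumption that the original constraints have the quadratic form x'Px + 2q'x + r ≤ 0 are ours, made purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3

    # Illustrative data (our own choice): P symmetric positive definite, q a vector, r < 0.
    M = rng.standard_normal((n, n))
    P = M @ M.T + n * np.eye(n)
    q = rng.standard_normal(n)
    r = -10.0                                      # chosen so that q'P^{-1}q - r > 0

    w, V = np.linalg.eigh(P)
    P_half = V @ np.diag(np.sqrt(w)) @ V.T         # P^{1/2}
    P_mhalf = V @ np.diag(1.0 / np.sqrt(w)) @ V.T  # P^{-1/2}

    # Identity behind the reformulation: assuming a quadratic constraint
    # x'Px + 2q'x + r <= 0, it is equivalent to
    # ||P^{1/2}x + P^{-1/2}q|| <= (q'P^{-1}q - r)^{1/2}.
    x = rng.standard_normal(n)
    lhs = np.linalg.norm(P_half @ x + P_mhalf @ q) ** 2
    rhs = x @ P @ x + 2 * q @ x + q @ np.linalg.solve(P, q)
    print(np.isclose(lhs, rhs), np.sqrt(q @ np.linalg.solve(P, q) - r))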
7.12 (Minimizing the Sum or the Maximum of Norms [LVB98])
    minimize Σ_{i=1}^p ||Fi x + gi||
    subject to x ∈ ℜ^n.
This problem is equivalent to
    minimize Σ_{i=1}^p ti
    subject to ||Fi x + gi|| ≤ ti,   i = 1, . . . , p.
Define
    X = { (x, u, t) | x ∈ ℜ^n, ui = Fi x + gi, ti ∈ ℜ, i = 1, . . . , p },
and
    C = { (x, u, t) | ||ui|| ≤ ti, i = 1, . . . , p }.
The dual problem is then
    maximize −Σ_{i=1}^p gi'zi
    subject to Σ_{i=1}^p Fi'zi = 0,   ||zi|| ≤ 1,   i = 1, . . . , p.
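As a sanity check on this dual, consider the special instance p = 2, F1 = I, F2 = −I (chosen purely for illustration): dual feasibility forces z1 = z2 = z with ||z|| ≤ 1, the dual value is −(g1 + g2)'z with maximum ||g1 + g2||, and the primal point x = (g2 − g1)/2 attains the same value, so there is no duality gap on this instance.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    g1, g2 = rng.standard_normal(n), rng.standard_normal(n)

    # Primal: minimize ||x + g1|| + ||-x + g2||   (F1 = I, F2 = -I; illustrative instance).
    x = (g2 - g1) / 2
    primal = np.linalg.norm(x + g1) + np.linalg.norm(-x + g2)

    # Dual: maximize -(g1'z1 + g2'z2) subject to z1 - z2 = 0, ||z1|| <= 1, ||z2|| <= 1,
    # so z1 = z2 = z and the best z is -(g1 + g2)/||g1 + g2||, with value ||g1 + g2||.
    dual = np.linalg.norm(g1 + g2)

    print(np.isclose(primal, dual))   # the primal and dual values agree on this instance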
Now, consider the problem
    minimize max_{1≤i≤p} ||Fi x + gi||
    subject to x ∈ ℜ^n.
This problem is equivalent to
    minimize x_{n+1}
    subject to ||Fi x + gi|| ≤ x_{n+1},   i = 1, . . . , p,
or equivalently
    minimize e_{n+1}'x
    subject to ||Ai x + gi|| ≤ e_{n+1}'x,   i = 1, . . . , p,
where x ∈ ℜ^{n+1}, Ai = (Fi, 0), and e_{n+1} = (0, . . . , 0, 1) ∈ ℜ^{n+1}. Evidently, this
is a second-order cone programming problem. From Exercise 7.10 we have that
its dual problem is given by
    maximize −Σ_{i=1}^p gi'zi
    subject to Σ_{i=1}^p ( ( Fi'zi, 0 ) + e_{n+1} wi ) = e_{n+1},   ||zi|| ≤ wi,   i = 1, . . . , p,
or equivalently
    maximize −Σ_{i=1}^p gi'zi
    subject to Σ_{i=1}^p Fi'zi = 0,   Σ_{i=1}^p wi = 1,   ||zi|| ≤ wi,   i = 1, . . . , p.
For v ∈ C^p we have
    ||v||_1 = Σ_{i=1}^p |vi| = Σ_{i=1}^p || ( Re(vi), Im(vi) ) ||,
where Re(vi ) and Im(vi ) denote the real and the imaginary parts of vi , respec-
tively. Then the complex l1 approximation problem is equivalent to
    minimize Σ_{i=1}^p || ( Re(ai x − bi), Im(ai x − bi) ) ||          (1)
    subject to x ∈ C^n,
where ai is the i-th row of A (A is a p × n matrix). Note that
    [ Re(ai x − bi) ]   [ Re(ai)  −Im(ai) ] [ Re(x) ]   [ Re(bi) ]
    [ Im(ai x − bi) ] = [ Im(ai)   Re(ai) ] [ Im(x) ] − [ Im(bi) ].
By introducing new variables y = ( Re(x), Im(x) ), problem (1) can be rewritten as
    minimize Σ_{i=1}^p ||Fi y + gi||
    subject to y ∈ ℜ^{2n},
where
    Fi = [ Re(ai)  −Im(ai) ]        gi = − [ Re(bi) ]          (2)
         [ Im(ai)   Re(ai) ] ,              [ Im(bi) ] .
According to Exercise 7.12, the dual problem is given by
    maximize Σ_{i=1}^p ( Re(bi), Im(bi) ) zi
    subject to Σ_{i=1}^p Fi'zi = 0,   ||zi|| ≤ 1,   i = 1, . . . , p,
where Fi is given by Eq. (2).
Similarly, the complex l∞ approximation problem is equivalent to
    minimize max_{1≤i≤p} ||Fi y + gi||
    subject to y ∈ ℜ^{2n},
where Fi and gi are given by Eq. (2). From Exercise 7.12, it follows that the dual problem is
    maximize Σ_{i=1}^p ( Re(bi), Im(bi) ) zi
    subject to Σ_{i=1}^p Fi'zi = 0,   Σ_{i=1}^p wi = 1,   ||zi|| ≤ wi,   i = 1, . . . , p,
where zi ∈ ℜ^2 for all i.
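The correspondence between the complex residuals and the real data of Eq. (2) can be verified numerically; the random A, b, and x in the sketch below are illustrative choices only.

    import numpy as np

    rng = np.random.default_rng(2)
    p, n = 5, 3
    A = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
    b = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    y = np.concatenate([x.real, x.imag])           # y = (Re(x), Im(x))

    total = 0.0
    for i in range(p):
        ai, bi = A[i], b[i]
        Fi = np.block([[ai.real.reshape(1, -1), -ai.imag.reshape(1, -1)],
                       [ai.imag.reshape(1, -1),  ai.real.reshape(1, -1)]])   # Eq. (2)
        gi = -np.array([bi.real, bi.imag])
        # ||F_i y + g_i|| equals |a_i x - b_i|, the magnitude of the i-th complex residual.
        assert np.isclose(np.linalg.norm(Fi @ y + gi), abs(ai @ x - bi))
        total += np.linalg.norm(Fi @ y + gi)

    print(np.isclose(total, np.sum(np.abs(A @ x - b))))   # the two l1 objectives agree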
7.14
    Σ_{j=1}^r uj µj* ≤ Σ_{j=1}^r Pj(uj),   ∀ u ∈ ℜ^r,
and is equivalent to
    uj µj* ≤ Pj(uj),   ∀ uj ∈ ℜ,  ∀ j = 1, . . . , r.
7.15 [Ber99b]
    minimize f(x, y)
    subject to x ∈ X,  gj(x, y) ≤ 0,  j = 1, . . . , r,          (1)
where X is a convex subset of ℜ^n, and for each y ∈ Y , f(·, y) and gj(·, y) are
real-valued functions that are convex over X. We assume that for each y ∈ Y ,
this program has a finite optimal value, denoted by f ∗ (y). Let c > 0 denote a
penalty parameter and assume that the penalized problem
    minimize f(x, y) + c ||g+(x, y)||
    subject to x ∈ X          (2)
has a finite optimal value, thereby coming under the framework of Section 7.3.
By Prop. 7.3.1, we have
    f*(y) = inf_{x∈X} { f(x, y) + c ||g+(x, y)|| },   ∀ y ∈ Y,          (3)
if and only if
    u'µ*(y) ≤ c ||u+||,   ∀ u ∈ ℜ^r,
so this bound holds if and only if there exists a uniform bounding constant c > 0
such that
    u'µ*(y) ≤ c ||u+||,   ∀ u ∈ ℜ^r,  ∀ y ∈ Y.          (5)
Thus the bound (4) holds if and only if for every y ∈ Y , it is possible to select a geometric multiplier µ*(y) of the parametric problem (1) such that the set { µ*(y) | y ∈ Y } is bounded.
Let us now specialize the preceding discussion to the parametric program
    minimize f(x, y) = ||y − x||
    subject to x ∈ X,  gj(x) ≤ 0,  j = 1, . . . , r,          (6)
whose optimal value is the distance d(y) of y from the set { z ∈ X | g(z) ≤ 0 }. In this case, the bound (4) takes the form
    d(y) ≤ ||y − x|| + c ||g+(x)||,   ∀ x ∈ X,  ∀ y ∈ Y.          (7)
This bound holds if a geometric multiplier µ*(y) of the projection problem (6) can be found such that Eq. (5) holds. We will now show the reverse assertion.
Indeed, assume that for some c, Eq. (7) holds, and to arrive at a contra-
diction, assume that there exist x ∈ X and y ∈ Y such that
    d(y) > ||y − x|| + c ||g+(x)||.
    ≥ inf_{z∈X, g(z)≤0} ||y − z||,
Using Prop. 7.3.1, this implies that there exists a geometric multiplier µ*(y) such that
    u'µ*(y) ≤ c ||u+||,   ∀ u ∈ ℜ^r,  ∀ y ∈ Y.
This in turn implies the boundedness of the set { µ*(y) | y ∈ Y }.
Convex Analysis and
Optimization
Chapter 8 Solutions
Dimitri P. Bertsekas
with
8.1
    ∇q(µ*)'(µ* − µ) ≥ 0,   ∀ µ ≥ 0.
If µ∗j = 0, then by letting µ = µ∗ + γej for a scalar γ ≥ 0, and the vector ej whose
jth component is 1 and the other components are 0, from the preceding relation
we obtain ∂q(µ∗ )/∂µj ≤ 0. Similarly, if µ∗j > 0, then by letting µ = µ∗ + γej
for a sufficiently small scalar γ (small enough so that µ∗ + γej ∈ M ), from the
preceding relation we obtain ∂q(µ∗ )/∂µj = 0. Hence
    ∂q(µ*)/∂µj ≤ 0,   ∀ j = 1, . . . , r,
    ∂q(µ*)/∂µj = 0,   ∀ j with µj* > 0.
Furthermore,
    ∇q(µ*) = g(x*),
for some vector x∗ ∈ X such that q(µ∗ ) = L(x∗ , µ∗ ). This and the preceding
two relations imply that x∗ and µ∗ satisfy the necessary and sufficient optimality
conditions for an optimal solution-geometric multiplier pair (cf. Prop. 6.2.5). It
follows that there is no duality gap, a contradiction.
Consider the incremental subgradient method with the stepsize α and the starting
point x = (αM C0 , αM C0 ), and the following component processing order:
M components of the form |x1 | [endpoint is (0, αM C0 )],
M components of the form |x1 + 1| [endpoint is (−αM C0 , αM C0 )],
M components of the form |x2 | [endpoint is (−αM C0 , 0)],
M components of the form |x2 + 1| [endpoint is (−αM C0 , −αM C0 )],
M components of the form |x1 | [endpoint is (0, −αM C0 )],
M components of the form |x1 − 1| [endpoint is (αM C0 , −αM C0 )],
M components of the form |x2 | [endpoint is (αM C0 , 0)], and
M components of the form |x2 − 1| [endpoint is (αM C0 , αM C0 )].
With this processing order, the method returns to x at the end of a cycle.
Furthermore, the smallest function value within the cycle is attained at points
(±αMC0, 0) and (0, ±αMC0), and is equal to 4MC0 + 2αM²C0². The optimal function value is f* = 4MC0, so that
    2αM²C0² = (1/16) (αC²/2),
and therefore
    lim inf_{k→∞} f(ψi,k) ≥ f* + βαC²/2,
with β = 1/16.
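The cycle above can be traced numerically. The sketch below assumes, consistently with the listed endpoints (the component definitions themselves are not reproduced here), that the components have the form C0|x1|, C0|x1 ± 1|, C0|x2|, C0|x2 ± 1|, each group appearing M times in the stated order with constant stepsize α, and it reproduces both the return to the starting point and the value 4MC0 + 2αM²C0² attained at the points (±αMC0, 0) and (0, ±αMC0).

    import numpy as np

    # Assumed component set (an assumption for this illustration): 2M copies of C0|x1|
    # and of C0|x2|, and M copies each of C0|x1 + 1|, C0|x1 - 1|, C0|x2 + 1|, C0|x2 - 1|,
    # so that f* = 4*M*C0 is attained at x* = (0, 0).
    M, C0, alpha = 10, 1.0, 0.01

    def comp(coord, shift):
        # value and a subgradient of the component C0*|x[coord] + shift|
        value = lambda x: C0 * abs(x[coord] + shift)
        def subgrad(x):
            g = np.zeros(2)
            g[coord] = C0 * np.sign(x[coord] + shift)
            return g
        return value, subgrad

    order = ([comp(0, 0.0)] * M + [comp(0, 1.0)] * M + [comp(1, 0.0)] * M + [comp(1, 1.0)] * M +
             [comp(0, 0.0)] * M + [comp(0, -1.0)] * M + [comp(1, 0.0)] * M + [comp(1, -1.0)] * M)

    f = lambda x: sum(value(x) for value, _ in order)

    x0 = np.array([alpha * M * C0, alpha * M * C0])
    x = x0.copy()
    best = f(x)
    for _, subgrad in order:                       # one cycle of the incremental method
        x = x - alpha * subgrad(x)
        best = min(best, f(x))

    print(np.allclose(x, x0))                                        # the cycle returns to x
    print(np.isclose(best, 4 * M * C0 + 2 * alpha * M**2 * C0**2))   # smallest value in the cycle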
    d^k = g^k + β^k d^{k−1},
we obtain
    (µ* − µ^k)'d^k = (µ* − µ^k)'g^k + β^k (µ* − µ^k)'d^{k−1}.          (8.1)
We further have
Since
    ||µ^{k−1} − µ^k|| ≤ s^{k−1} ||d^{k−1}||,
it follows that
    (µ* − µ^k)'d^k ≥ (µ* − µ^k)'g^k.
Similar to the proof of Prop. 8.2.1, it can be seen that this relation holds for k = 0.
For k > 0, by using the nonexpansive property of the projection operation, we
obtain
we further obtain
implying that
    ||µ* − µ^{k+1}||² < ||µ^k − µ*||².
We next prove that
    (µ* − µ^k)'d^k / ||d^k||  ≥  (µ* − µ^k)'g^k / ||g^k||.
It suffices to show that
    ||d^k|| ≤ ||g^k||,
since this inequality and Eq. (8.0) imply the desired relation. If g^k'd^{k−1} ≥ 0, then by the definition of d^k and β^k, we have that d^k = g^k, and we are done, so assume that g^k'd^{k−1} < 0. We then have
    ||d^k||² = ||g^k||² + 2β^k g^k'd^{k−1} + (β^k)² ||d^{k−1}||².
Since β^k = −γ g^k'd^{k−1} / ||d^{k−1}||², it follows that
    2β^k g^k'd^{k−1} + (β^k)² ||d^{k−1}||² = 2β^k g^k'd^{k−1} − γβ^k g^k'd^{k−1} = (2 − γ) β^k g^k'd^{k−1}.
Furthermore, since g^k'd^{k−1} < 0, β^k ≥ 0, and γ ∈ [0, 2], we see that
    2β^k g^k'd^{k−1} + (β^k)² ||d^{k−1}||² ≤ 0,
implying that
    ||d^k||² ≤ ||g^k||².
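The inequality ||d^k|| ≤ ||g^k|| can also be spot-checked numerically over random directions; the trials below are purely illustrative and use β^k = −γ g^k'd^{k−1}/||d^{k−1}||² when g^k'd^{k−1} < 0 and β^k = 0 otherwise, as in the argument above.

    import numpy as np

    rng = np.random.default_rng(3)

    # Random spot check of ||d|| <= ||g|| for d = g + beta*d_prev, with
    # beta = -gamma*(g'd_prev)/||d_prev||^2 when g'd_prev < 0 (beta = 0 otherwise)
    # and gamma in [0, 2].
    ok = True
    for _ in range(10000):
        n = int(rng.integers(2, 10))
        g = rng.standard_normal(n)
        d_prev = rng.standard_normal(n)
        gamma = rng.uniform(0.0, 2.0)
        gd = g @ d_prev
        beta = -gamma * gd / (d_prev @ d_prev) if gd < 0 else 0.0
        d = g + beta * d_prev
        ok = ok and (np.linalg.norm(d) <= np.linalg.norm(g) + 1e-12)
    print(ok)   # True: the update never increases the norm of the direction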
subject to x ∈ X,
where
    fi(x) = max_{Bi λi ≤ di} (bi − Ax)'λi,   i = 1, . . . , m,
and the outcome i occurs with probability πi . Assume that for each outcome
i ∈ {1, ..., m} and each vector x ∈ ℜ^n, the maximum in the expression for fi(x) is attained at some λi(x). Then, the vector −A'λi(x) is a subgradient of fi at x.
One possible form of the randomized incremental subgradient method is
    x^{k+1} = PX( x^k − αk (g^k − A'λ^k_{ωk}) ),
8.5
Consider the cutting plane method applied to the following one-dimensional prob-
lem
    maximize q(µ) = −µ²,
    subject to µ ∈ [0, 1].
Suppose that the method is started at µ0 = 0, so that the initial polyhedral ap-
proximation is Q1 (µ) = 0 for all µ. Suppose also that in all subsequent iterations,
when maximizing Qk (µ), k = 0, 1, ..., over [0, 1], we choose µk to be the largest of
all maximizers of Qk (µ) over [0, 1]. We will show by induction that in this case,
we have µk = 1/2^{k−1} for k = 1, 2, ....
Since Q1 (µ) = 0 for all µ, the set of maximizers of Q1 (µ) = 0 over [0, 1] is
the entire interval [0, 1], so that the largest maximizer is µ1 = 1. Suppose now
that µi = 1/2^{i−1} for i = 1, ..., k. Then
    Q_{k+1}(µ) = min_{0≤i≤k} { q(µi) + ∇q(µi)(µ − µi) } = min_{0≤i≤k} { µi² − 2µi µ }.
The maximum value of Q_{k+1}(µ) over [0, 1] is 0 and it is attained at any point in the interval [0, µk/2]. By the induction hypothesis, we have µk = 1/2^{k−1}, implying that the largest maximizer of Q_{k+1}(µ) over [0, 1] is µ_{k+1} = 1/2^k.
Hence, in this case, the cutting plane method generates an infinite sequence
{µk } converging to the optimal solution µ∗ = 0, thus showing that the method
need not terminate finitely even if it starts at an optimal solution.
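The induction can also be traced numerically; the sketch below runs the cutting plane iteration for q(µ) = −µ² on [0, 1], starting from µ0 = 0 and always selecting the largest maximizer of the current polyhedral model on a fine grid, and the iterates track 1/2^{k−1} up to the grid resolution.

    import numpy as np

    # Cutting plane model for maximizing q(mu) = -mu^2 over [0, 1]:
    # the cuts are q(mu_i) + q'(mu_i)(mu - mu_i), with q'(mu) = -2*mu.
    q = lambda mu: -mu**2
    dq = lambda mu: -2.0 * mu

    mus = [0.0]                                    # mu_0 = 0
    grid = np.linspace(0.0, 1.0, 100001)
    for k in range(1, 9):
        # Q_k is the polyhedral model built from mu_0, ..., mu_{k-1}
        Q = np.min([q(m) + dq(m) * (grid - m) for m in mus], axis=0)
        mu_k = grid[np.isclose(Q, Q.max())].max()  # largest maximizer of Q_k over [0, 1]
        mus.append(mu_k)
        print(k, mu_k, 1.0 / 2 ** (k - 1))         # mu_k is (approximately) 1/2^(k-1)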
    ≤ f(x^k) + µ'g(x^k)
    = f(x^k) + µ^k'g(x^k) + g(x^k)'(µ − µ^k)
    = q(µ^k) + g(x^k)'(µ − µ^k),
where
    s^k = ( q(µ*) − q(µ^k) ) / ||g^k||²,
so that
    ||µ^{k+1} − µ*||² ≤ ||µ^k − µ*||² − ( q(µ*) − q(µ^k) )² / ||g^k||²,
and hence
    ||µ^{k+1} − µ*|| ≤ ||µ^k − µ*||.