NLP Notes

Convex Sets

The concept of convexity is of great importance in the study of optimization problems.


Convex sets, polyhedral sets, and separation of disjoint convex sets are used frequently
in the analysis of mathematical programming problems, the characterization of their
optimal solutions, and the development of computational procedures.

Under convex sets, we shall study

• Convex Hulls

• Closure and interior of a convex set

• Separation and support of convex sets

• Convex cones and polarity

• Polyhedral sets, extreme points, and extreme directions

• Linear programming and the simplex method

Definition 1.1 (Convex Set). A nonempty set S in Rn is said to be convex if the line
segment joining any two points in the set is contained in S. Equivalently, if x, y ∈ S
and λ ∈ [0, 1], then λx + (1 − λ)y ∈ S.

Examples of Convex Sets:

1. S = {(x, y, z) : x + 2y − z = 4} ⊂ R3

2. S = {(x, y, z) : x + 2y − z ≤ 4} ⊂ R3

3. S = {(x, y, z) : x + 2y − z ≤ 4, 2x − y + z ≤ 6}

4. S = {(x, y) : y ≥ |x|} ⊂ R2

5. S = {(x, y) : x2 + y 2 ≤ 4} ⊂ R2

6. S is the set of solutions to the linear programming problem: Minimize ct x subject to Ax = b, x ≥ 0.
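As a quick numerical illustration of Definition 1.1 (a sketch only; the membership test and the sampling are illustrative choices, not part of the notes), random convex combinations of points of example 5, the disk x2 + y 2 ≤ 4, stay in the disk:

    import numpy as np

    rng = np.random.default_rng(0)

    def in_S(p):
        # membership test for example 5: the closed disk x^2 + y^2 <= 4
        return p[0]**2 + p[1]**2 <= 4.0 + 1e-12

    # sample pairs of points of the disk, form random convex combinations,
    # and verify that every combination stays in the disk
    for _ in range(1000):
        x = rng.uniform(-2, 2, size=2)
        y = rng.uniform(-2, 2, size=2)
        if not (in_S(x) and in_S(y)):
            continue
        lam = rng.uniform(0, 1)
        z = lam * x + (1 - lam) * y      # the point lambda*x + (1 - lambda)*y
        assert in_S(z), "convexity violated (should never happen for the disk)"

Such a check can refute convexity of a set (by finding a violating pair) but of course never proves it.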

Definition 1.2. Let x1 , x2 , . . . , xk be in Rn . Then x = Σ_{i=1}^{k} λi xi is called a convex combination of the k points provided λi ≥ 0 for each i and Σ_{i=1}^{k} λi = 1. If the nonnegativity condition on the λi s is dropped, then the combination is called an affine combination.

Exercise 1.1. Show that a set S is convex if, and only if, for any positive integer k, any convex combination of any k points in S is in S.

Lemma 1.1. If S and T are convex sets, then

1. S ∩ T is convex,

2. S + T = {x + y : x ∈ S, y ∈ T } is convex,

3. S − T = {x − y : x ∈ S, y ∈ T } is convex.

Definition 1.3 (Convex Hull). Let S be a nonempty set in Rn . The convex hull of S is defined as the set of all convex combinations of points of S. The convex hull of S is denoted by H(S). Note that x ∈ H(S) if, and only if, there exist x1 , x2 , . . . , xk (k a positive integer) in S such that x is a convex combination of x1 , x2 , . . . , xk .

Exercise 1.2. What is the convex hull of three noncollinear points in R2 ?

Definition 1.4 (Polytope and Simplex). The convex hull of a finite number of


points x1 , x2 , . . . , xk+1 in Rn is called a polytope. The polytope is called a simplex
with vertices x1 , x2 , . . . , xk+1 provided xk+1 − x1 , xk+1 − x2 , . . . , xk+1 − xk are linearly
independent.

Exercise 1.3. Is the polytope formed by (1, 1), (2, 2), (0, 0) a simplex? Draw a simplex
in R2 . Can you draw a simplex in R2 with four vertices in R2 ?

Theorem 1.1 (Caratheodory). Let S ⊆ Rn . If x ∈ H(S), then x can be written as


a convex combination of n + 1 points from S.

Proof. Let x ∈ H(S). By definition, there exist k + 1 points x1 , x2 , . . . , xk+1 in S (k is a nonnegative integer) such that x = Σ_{i=1}^{k+1} λi xi with λi > 0 for all i. If k ≤ n, there is nothing to prove. Suppose k ≥ n + 1. It follows that xk+1 − x1 , xk+1 − x2 , . . . , xk+1 − xk are linearly dependent. This implies Σ_{i=1}^{k} µi (xk+1 − xi ) = 0 for some real numbers µ1 , µ2 , . . . , µk , at least one of them different from zero. Let µk+1 = − Σ_{i=1}^{k} µi . Then Σ_{i=1}^{k+1} µi = 0 and Σ_{i=1}^{k+1} µi xi = 0. Since the λi s are positive, we can choose a constant c such that βi = λi − cµi ≥ 0 for all i and equal to zero for at least one i. Note that

x = Σ_{i=1}^{k+1} λi xi = Σ_{i=1}^{k+1} λi xi − c Σ_{i=1}^{k+1} µi xi = Σ_{i=1}^{k+1} βi xi , with Σ_{i=1}^{k+1} βi = Σ_{i=1}^{k+1} λi = 1.

From the above, it follows that x can be written as a convex combination of k points
from S. If k ≤ n, the proof is complete. Otherwise we repeat the argument to show that
x can be written as a convex combination of fewer than k points from S. This argument
can be continued until x can be written as a convex combination of n + 1 points from
S. 2
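The proof above is constructive. The following sketch implements that argument in numpy (the function name, the null-space computation via SVD, and the test data are implementation choices of this note, not part of the original text): it repeatedly removes one point from a convex combination until at most n + 1 points remain.

    import numpy as np

    def caratheodory_reduce(points, lam, tol=1e-10):
        """Given x = sum lam_i * points_i (lam_i >= 0, sum lam_i = 1),
        return an equivalent convex combination using at most n+1 points."""
        pts = [np.asarray(p, dtype=float) for p in points]
        lam = np.asarray(lam, dtype=float)
        n = pts[0].size
        while len(pts) > n + 1:
            # find mu with sum_i mu_i = 0 and sum_i mu_i x_i = 0, mu != 0:
            # a nonzero vector in the null space of the (n+1) x k matrix M
            M = np.vstack([np.column_stack(pts), np.ones(len(pts))])
            _, _, Vt = np.linalg.svd(M)
            mu = Vt[-1]                      # k > n+1 guarantees a null direction
            if not np.any(mu > tol):
                mu = -mu                     # make sure some mu_i is positive
            # largest c with beta_i = lam_i - c*mu_i >= 0 for all i
            pos = mu > tol
            c = np.min(lam[pos] / mu[pos])
            beta = lam - c * mu
            keep = beta > tol                # at least one beta_i becomes zero
            pts = [p for p, k_ in zip(pts, keep) if k_]
            lam = beta[keep]
            lam = lam / lam.sum()            # guard against round-off drift
        return pts, lam

    # example in R^2: a combination of 5 points reduced to at most 3
    pts = [np.array(p) for p in [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2)]]
    lam = np.full(5, 0.2)
    x = sum(l * p for l, p in zip(lam, pts))
    new_pts, new_lam = caratheodory_reduce(pts, lam)
    assert np.allclose(sum(l * p for l, p in zip(new_lam, new_pts)), x)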

Definition 1.5. Let S ⊆ Rn . A point x ∈ Rn is said to be an interior point of S if


Nε (x) ⊆ S for some ε > 0, where Nε (x) = {u ∈ Rn : ||u − x|| < ε}. A point x is said
to be in the closure of S if there exists a sequence x1 , x2 , . . . of points from S such that
the sequence converges to x. The set of all interior points of S is denoted by int(S) and
the set of all points in the closure of S is denoted by Cl(S). The set of all points in the
intersection of Cl(S) and Cl(S c ) is called the boundary of S and is denoted by ∂S (S c
stands for complement of S in Rn ). A set S is said to be open if int(S) = S; and S is
said to be closed if Cl(S) = S. A set S is said to be compact if it is closed and bounded.

Exercise 1.4. Show that every polytope in Rn is a compact convex set.

Theorem 1.2. Let S be a nonempty convex set in Rn with nonempty interior. Let
x1 ∈ Cl(S) and let x2 ∈ int(S). Then λx1 + (1 − λ)x2 ∈ int(S) for all λ ∈ (0, 1).

Proof. Since x2 ∈ int(S), there exists an ε > 0 such that Nε (x2 ) ⊂ S. Fix any λ ∈ (0, 1) and let y = λx1 + (1 − λ)x2 . We will show that y ∈ int(S) by showing that Nδ (y) ⊂ S, where δ = (1 − λ)ε.

Let z ∈ Nδ (y). Since x1 ∈ Cl(S), there exists a z1 ∈ S such that ||z1 − x1 || < (δ − ||z − y||)/λ.

Let z2 = (z − λz1 )/(1 − λ). Then

||z2 − x2 || = ||(z − λz1 )/(1 − λ) − x2 ||
            = ||[(z − λz1 ) − (y − λx1 )]/(1 − λ)||        (since x2 = (y − λx1 )/(1 − λ))
            = [1/(1 − λ)] ||(z − y) + λ(x1 − z1 )||
            ≤ [1/(1 − λ)] [||z − y|| + λ||x1 − z1 ||]
            < [1/(1 − λ)] [||z − y|| + (δ − ||z − y||)] = δ/(1 − λ) = ε.

Hence z2 ∈ Nε (x2 ) ⊂ S. Since z = λz1 + (1 − λ)z2 with z1 , z2 ∈ S, convexity of S gives z ∈ S. Thus Nδ (y) ⊂ S and y ∈ int(S). 2

Corollary 1.1. If S is a convex set, then so is int(S).

Corollary 1.2. If S is a convex set with nonempty interior, then Cl(S) is a convex set.

Corollary 1.3. If S is a convex set with nonempty interior, then Cl(int(S)) = Cl(S).

Corollary 1.4. If S is a convex set with nonempty interior, then int(Cl(S)) = int(S).

Separation and Support of Convex Sets

The notion of separation and support of disjoint convex sets plays an important role in
deriving results regarding optimality conditions and also in computing optimal solutions.

Theorem 1.3. Let S be a nonempty closed convex set in Rn and let y ∈ Rn \ S. Then, there exists a unique point x̄ ∈ S with minimum distance from y. Furthermore, x̄ is the minimizing point if, and only if, (x − x̄)t (x̄ − y) ≥ 0 for all x ∈ S.

Proof. Let γ = inf{||x − y|| : x ∈ S}. Since S is closed and y ∉ S, γ > 0. It follows that there
exists a sequence xk ∈ S such that ||xk − y|| → γ. Clearly xk is a bounded sequence and
hence must have a convergent subsequence. Without loss of generality, we may assume
that xk itself converges to a point x̄. Since S is closed, x̄ ∈ S and ||x̄ − y|| = γ.

Suppose there is another point z ∈ S such that ||z − y|| = γ. Since S is convex, (z + x̄)/2 ∈ S and

γ ≤ ||y − (z + x̄)/2|| ≤ (1/2)(||y − z|| + ||y − x̄||) = γ.

Since equality holds in the triangle inequality, y − z = p(y − x̄) for some real number p. Since ||y − z|| = ||y − x̄|| = γ, it follows that either p = 1 or p = −1. If p = −1, then we must have y = (z + x̄)/2, which in turn implies that y ∈ S, which is a contradiction. It follows that p = 1 and hence z = x̄.

Next, suppose x̄ is such that (x − x̄)t (x̄ − y) ≥ 0 for all x ∈ S. Let x ∈ S. Then,

||x − y||2 = ||x − x̄ + x̄ − y||2 = ||x − x̄||2 + ||x̄ − y||2 + 2(x − x̄)t (x̄ − y).

Since ||x − x̄||2 ≥ 0 and (x − x̄)t (x̄ − y) ≥ 0, it follows from the above that ||x − y||2 ≥ ||x̄ − y||2 and hence x̄ is the distance minimizing point.

Next, assume that x̄ is the distance minimizing point. Fix x ∈ S and λ ∈ (0, 1).
Since S is convex, x̄ + λ(x − x̄) ∈ S and since x̄ is the minimizing point

||y − x̄ − λ(x − x̄)||2 ≥ ||y − x̄||2

Also
||y − x̄ − λ(x − x̄)||2 = ||y − x̄||2 + λ2 ||x − x̄||2 + 2λ(x̄ − y)t (x − x̄)

Rearranging and noting that x̄ is the minimizing point, we get

λ2 ||x − x̄||2 + 2λ(x̄ − y)t (x − x̄) = ||y − x̄ − λ(x − x̄)||2 − ||y − x̄||2 ≥ 0

Dividing both sides by λ and taking limit as λ → 0, we get (x − x̄)t (x̄ − y) ≥ 0 and this
completes the proof. 2
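A numerical illustration of Theorem 1.3 (a sketch; the choice of the set S as a box and the point y are illustrative, and for a box the projection reduces to coordinate-wise clipping): the minimum-distance point satisfies the characterization (x − x̄)t (x̄ − y) ≥ 0 on sampled points of S.

    import numpy as np

    rng = np.random.default_rng(1)

    # S is the unit box [0, 1]^2 (a closed convex set); the minimum-distance
    # point from y is obtained coordinate-wise by clipping (specific to boxes)
    y = np.array([2.0, -0.5])
    x_bar = np.clip(y, 0.0, 1.0)        # the point x̄ of Theorem 1.3, here (1, 0)

    # check the variational inequality (x - x_bar)^t (x_bar - y) >= 0 over S
    for _ in range(1000):
        x = rng.uniform(0.0, 1.0, size=2)
        assert (x - x_bar) @ (x_bar - y) >= -1e-12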

Definition 1.6 (Hyperplane). A hyperplane in Rn is a set of the form H = {x :


pt x = α}, where p is a nonzero vector in Rn and α is a real number. The sets

H + = {x : pt x ≥ α} and H − = {x : pt x ≤ α}

are called closed half spaces, and the sets

{x : pt x > α} and {x : pt x < α}

are called open half spaces.

The vector p is called the normal to the hyperplane H.

A hyperplane passing through a point x̄ can be written as H = {x : pt (x − x̄) = 0}.

Definition 1.7 (Separation). Let S and T be two nonempty subsets of Rn . A


hyperplane H = {x : pt x = α} is said to separate S and T if pt x ≥ α for all x ∈ S, and pt x ≤ α for all x ∈ T . The separation is said to be proper if S ∪ T ⊈ H. The separation is said to be strict if pt x > α for all x ∈ S, and pt x < α for all x ∈ T . The separation is said to be strong if there exists an ε > 0 such that pt x > α + ε for all x ∈ S, and pt x ≤ α for all x ∈ T .

Theorem 1.4. Let S be a nonempty closed convex set in Rn and let y ∈ Rn \ S. Then there exist a nonzero vector p and a real number α such that pt y > α ≥ pt x for all x ∈ S.

Proof. From Theorem 1.3, there exists a unique x̄ ∈ S such that

(x − x̄)t (x̄ − y) ≥ 0 for all x ∈ S.

Taking p = y − x̄ and α = pt x̄, we have p 6= 0 and pt x ≤ α for all x ∈ S. Also,


0 < pt p = pt (y − x̄) = pt y − pt x̄ = pt y − α which implies pt y > α. 2

Corollary 1.5. Every closed convex set in Rn is the intersection of all halfspaces
containing it.

Proof. Let S be a nonempty closed convex set in Rn . It suffices to show that if y is in the intersection of all halfspaces containing S, then y ∈ S. Suppose, to the contrary, y ∉ S. From the theorem, there exist a nonzero vector p and a real number α such that pt y > α ≥ pt x for all x ∈ S. Note that the halfspace H = {x : pt x ≤ α} contains S. Since y is in the intersection of halfspaces containing S, y must belong to this halfspace. But this implies that pt y ≤ α, which is a contradiction. 2

Corollary 1.6. Let S be a nonempty set in Rn and let y 6∈ Cl(H(S)). Then, there
exists a hyperplane that strongly separates S and y.

Proof. Exercise.

Theorem 1.5 (Farkas' Lemma). Let A be an m × n real matrix and let c ∈ Rn .


Then, exactly one of the following systems has a solution:

System 1. Ax ≤ 0 and ct x > 0 for some x ∈ Rn

System 2. At y = c and y ≥ 0 for some y ∈ Rm

Proof. If both systems have solutions, say, x and y, then 0 < ct x = y t Ax ≤ 0. The
last inequality follows from Ax ≤ 0 and y ≥ 0. From this contradiction, it follows that
both systems cannot have solutions simultaneously.

To complete the proof, assume that System 2 has no solution. Let S = {u : u = At y for some y ≥ 0}. Since System 2 has no solution, c ∉ S. From Theorem 1.4 (S is a closed convex set), there exist a nonzero x and a number α such that xt u ≤ α for all u ∈ S and xt c > α. Since 0 ∈ S, α ≥ 0. Also, xt At y ≤ α for all y ≥ 0, which forces xt At ≤ 0 (otherwise, scaling up y would violate the bound). Thus, we have x satisfying Ax ≤ 0 and ct x > 0. 2
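The alternative stated by Farkas' lemma can be checked numerically with a linear program (a sketch; the use of scipy.optimize.linprog with the "highs" method assumes a reasonably recent scipy, and the matrix A and vector c below are made-up illustrative data). System 2 asks whether c is a nonnegative combination of the rows of A, which is an LP feasibility question; if that LP is infeasible, the lemma guarantees System 1 is solvable.

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0,  2.0],
                  [-1.0, 1.0]])
    c = np.array([1.0, 1.0])

    # System 2: A^t y = c, y >= 0.  Solve a feasibility LP with zero objective.
    res = linprog(c=np.zeros(A.shape[0]), A_eq=A.T, b_eq=c,
                  bounds=[(0, None)] * A.shape[0], method="highs")

    if res.status == 0:
        print("System 2 has a solution y =", res.x)
    else:
        print("System 2 is infeasible; by Farkas' lemma, System 1 "
              "(Ax <= 0, c^t x > 0) must have a solution.")

For this particular A and c the LP is infeasible, and indeed x = (1, −0.6) satisfies System 1.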

Corollary 1.7. (Gordan’s Theorem). Let A be an m × n real matrix. Then, exactly


one of the following systems has a solution:

System 1. Ax < 0 for some x ∈ Rn

System 2. At y = 0 and y ≥ 0 for some nonzero y ∈ Rm

Proof. Exercise. Deduce the proof from Farkas' lemma.

Corollary 1.8. Let A be an m × n real matrix and let c ∈ Rn . Then, exactly one of
the following systems has a solution:

System 1. Ax ≤ 0, x ≥ 0, ct x > 0 for some x ∈ Rn

System 2. At y ≥ c and y ≥ 0 for some y ∈ Rm

Proof. Exercise (Use [At − I] in System 2).

Corollary 1.9. Let A be an m × n matrix, let B be an l × n matrix and let c ∈ Rn .


Then, exactly one of the following systems has a solution:

System 1. Ax ≤ 0, Bx = 0, ct x > 0 for some x ∈ Rn

System 2. At y + B t z = c and y ≥ 0 for some y ∈ Rm and z ∈ Rl .

Proof. Exercise (use [At  Bt  −Bt ] in place of At and write z = z+ − z− in System 2).

Support of Sets at Boundary Points

Definition 1.8. Let S be a nonempty set in Rn , and let x̄ ∈ ∂S. A hyperplane H = {x : pt (x − x̄) = 0} is called a supporting hyperplane of S at x̄ if either S ⊆ H + or S ⊆ H − . If, in addition, S ⊈ H, then H is called a proper supporting hyperplane of S at x̄.

Exercise 1.5. Show that the hyperplane H = {x : pt (x − x̄) = 0} is a supporting


hyperplane of S at x̄ ∈ ∂S if, and only if, either pt x̄ = inf{pt x : x ∈ S} or pt x̄ =
sup{pt x : x ∈ S}.

It will be shown that convex sets have supporting hyperplanes at each of the boundary
points.

Theorem 1.6. Let S be a nonempty convex set in Rn , and let x̄ ∈ ∂S. Then, there
exists a supporting hyperplane H = {x : pt (x − x̄) = 0} that supports S at x̄.

Proof. Since x̄ ∈ ∂S, there is a sequence yk ∉ Cl(S) such that yk → x̄. Since Cl(S) is a closed convex set, for each k there exists, by Theorem 1.4, a nonzero pk such that ptk x < ptk yk for all x ∈ S. We may assume, without loss of generality, that ||pk || = 1 for each k and that pk is a convergent sequence. Let p be the limit of pk . Note that p ≠ 0 as ||p|| = 1. Taking limits in the above inequality, we get pt x ≤ pt x̄ for all x ∈ S. Therefore, H = {x : pt (x − x̄) = 0} supports S at x̄. 2
Exercises

E1.6. Let S be a nonempty convex set in Rn . If x̄ 6∈ int(S), then show that there exists
a nonzero p such that pt (x − x̄) ≤ 0 for all x ∈ S.

E1.7. Let S be a nonempty set in Rn . If y 6∈ Cl(H(S)), then show that there exists a
hyperplane that separates y and S.

E1.8. Let S be a nonempty set in Rn , and let x̄ ∈ ∂S ∩ ∂H(S). Show that there exists
a hyperplane that supports S at x̄.
Separation of Two Convex Sets

Theorem 1.7. Let S and T be two nonempty disjoint convex sets in Rn . Then there
exists a hyperplane that separates S and T , that is, there exists a nonzero p such that

pt x ≤ pt y for all x ∈ S and for all y ∈ T .
Proof. Let U = S − T . Then U is a convex set and 0 ∉ U as S and T are disjoint. In particular, 0 ∉ int(U ). Therefore, by Exercise E1.6, there exists a nonzero p such that pt u ≤ 0 for all u ∈ U . That is, pt (x − y) ≤ 0 for all x ∈ S and for all y ∈ T . Hence the theorem. 2
Exercises

E1.9. How will you define proper, strict and strong separation of sets?

E1.10. Let S and T be two nonempty convex sets in Rn with int(T ) 6= ∅ and S ∩
int(T ) = ∅. Show that there exists a hyperplane that separates S and T .

E1.11. Deduce Gordan’s theorem using results on separation of convex sets.

E1.12. Let S and T be two nonempty sets in Rn with their convex hulls having
nonempty interiors. Assume that H(S) ∩ int(H(T )) = ∅. Show that there exists a
hyperplane that separates S and T .
Strong Separation of Convex Sets

Theorem 1.8. Let S and T be two nonempty disjoint closed convex sets in Rn . If S is bounded, then there exists a hyperplane that strongly separates S and T , that is, there exist a nonzero p and an ε > 0 such that

pt x ≥ ε + pt y for all x ∈ S and for all y ∈ T .

Proof. Let U = S − T . Then U is a convex set and 0 ∉ U as S and T are disjoint. Note that U is a closed set. To see this, let uk ∈ U be a sequence converging to u. Then, for each k, there exist xk ∈ S and y k ∈ T such that uk = xk − y k . Since S is compact, we may assume, without loss of generality, that xk converges to some x ∈ S (as S is closed). This implies that y k = xk − uk → x − u. Since T is closed, y := x − u ∈ T . Hence u = x − y and u ∈ U . From Theorem 1.4, there exist a nonzero p and a number ε such that pt u ≥ ε for all u ∈ U and ε > pt 0 = 0. That is, pt (x − y) ≥ ε for all x ∈ S and for all y ∈ T , and hence pt x ≥ ε + pt y for all x ∈ S and for all y ∈ T . 2

Exercise 1.13. Prove or disprove: If S and T are two nonempty disjoint closed convex
sets, then there exists a hyperplane that separates S and T .
Convex Cones, Polarity and Polyhedral Sets

Definition 1.9. A nonempty set C in Rn is called a cone with vertex zero if x ∈ C implies λx ∈ C for all λ ≥ 0. If, in addition, C is convex, then C is called a convex cone.

Draw the graphs of the following sets and check which of these are cones: (i) S =
{λ(1, 3) : λ ≥ 0}, (ii) S = {λ(1, 3) + β(2, 1) : λ, β ≥ 0}.

Exercise 1.14. Let S be a nonempty set in Rn . The polar cone of S, denoted by S ∗ ,


is defined as the set {p ∈ Rn : pt x ≤ 0 for all x ∈ S}.

Lemma 1.2. Let S and T be nonempty sets in Rn . The following statements hold
good:

1. S ∗ is a closed convex cone.

2. S ⊆ S ∗∗ , where S ∗∗ is the polar cone of S ∗ .

3. S ⊆ T implies T ∗ ⊆ S ∗ .

Proof. 1. Let x ∈ S ∗ and λ ≥ 0. Let y ∈ S. By definition xt y ≤ 0 and hence (λx)t y ≤ 0. Thus, λx ∈ S ∗ and S ∗ is a cone. Moreover, S ∗ is the intersection of the closed half spaces {p : pt y ≤ 0}, y ∈ S, and hence S ∗ is closed and convex.

2. Let x ∈ S. To show that x ∈ S ∗∗ we need to show that for any y ∈ S ∗ , xt y ≤ 0.


Fix any y ∈ S ∗ . Since y ∈ S ∗ , for every u ∈ S, y t u ≤ 0. Since x ∈ S, y t x ≤ 0. Hence,
x ∈ S ∗∗ and S ⊆ S ∗∗ .

3. Let y ∈ T ∗ . To show that y ∈ S ∗ , we need to show that for any x ∈ S, y t x ≤ 0. Fix


x ∈ S. Since S ⊆ T , x ∈ T . Since y ∈ T ∗ , y t x ≤ 0. 2

Theorem 1.9. Let C be a closed convex cone. Then C = C ∗∗ .

Proof. From the Lemma, C ⊆ C ∗∗ . Conversely, let y ∈ C ∗∗ . To the contrary, assume that y ∉ C. From Theorem 1.4, there exists a nonzero p such that pt y > pt x for all x ∈ C. This implies pt x ≤ 0 for all x ∈ C (for if pt x > 0 for some x ∈ C, then we can choose a λ > 0 such that λpt x > pt y, so the inequality is violated by λx, which is in C as C is a cone). Hence, p ∈ C ∗ . Since y ∈ C ∗∗ , we must have pt y ≤ 0. Since 0 ∈ C, we must also have pt y > 0. From this contradiction, it follows that C = C ∗∗ . 2

Definition 1.10. Let A be an m × n real matrix and let b ∈ Rm . The set S = {x : Ax = b, x ≥ 0} is called a polyhedral set.

Definition 1.11. Let S ⊆ Rn and x̄ ∈ S. Say that x̄ is an extreme point of S if the


following implication holds:

[x, y ∈ S, λ ∈ (0, 1), x̄ = λx + (1 − λ)y] ⇒ x = y = x̄.

Definition 1.12. Let S ⊆ Rn . A nonzero d ∈ Rn is called a direction of S if ∀ x ∈ S


and ∀ λ > 0, x + λd ∈ S. Two directions, d and f are said to be distinct if d 6= λf for
any λ > 0. An extreme direction of S is a direction of S that cannot be written as a
nonnegative linear combination of two distinct directions of S.

Theorem 1.10. Let A be an m × n real matrix of rank m and let b ∈ Rm . Let


S = {x : Ax = b, x ≥ 0}. Then, x̄ ∈ S is an extreme point of S if, and only if, the columns of A corresponding to the positive coordinates of x̄ are linearly independent.
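Theorem 1.10 suggests a brute-force way to list the extreme points of S = {x : Ax = b, x ≥ 0}: try every choice of m columns of A, solve the corresponding square system, and keep the nonnegative solutions. The sketch below (exponential in n, so only for small illustrative examples; the function name and the data A, b are made up here) does exactly that.

    import numpy as np
    from itertools import combinations

    def extreme_points(A, b, tol=1e-9):
        """Enumerate the extreme points of {x : Ax = b, x >= 0}
        via basic feasible solutions (Theorem 1.10)."""
        m, n = A.shape
        points = []
        for cols in combinations(range(n), m):
            B = A[:, cols]
            if abs(np.linalg.det(B)) < tol:           # columns not independent
                continue
            x = np.zeros(n)
            x[list(cols)] = np.linalg.solve(B, b)     # basic solution
            if np.all(x >= -tol) and not any(np.allclose(x, p) for p in points):
                points.append(x)                      # basic feasible -> extreme point
        return points

    # example: the "triangle" {x in R^3 : x1 + x2 + x3 = 1, x >= 0}
    A = np.array([[1.0, 1.0, 1.0]])
    b = np.array([1.0])
    for p in extreme_points(A, b):
        print(p)    # the three unit vectors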

Theorem 1.11. Let A be an m × n real matrix of rank m and let b ∈ Rm . Let


S = {x : Ax = b, x ≥ 0}. Then, d ∈ S is an extreme direction of S if, and only if,
there exist a nonsingular submatrix B of A of order m, a column Aj of A not in B and
a positive number µ such that dB = −µB −1 Aj , dj = µ, and all other coordinates of d
are zero.
Representation Theorem

Theorem 1.12. Let A be an m × n real matrix of rank m and let b ∈ Rm . Let


S = {x : Ax = b, x ≥ 0}. Let x1 , x2 , . . . , xg be the extreme points and d1 , d2 , . . . , dh be
the extreme directions of the set S. Then,
S = { Σ_{i=1}^{g} λi xi + Σ_{j=1}^{h} µj dj : λi , µj ≥ 0 ∀ i, j, and Σ_{i=1}^{g} λi = 1 }.

Exercise 1.15. Show that S has a finite number of extreme points and extreme directions.

Exercise 1.16. Show that (see Theorem 1.12) either

ct xi = min{ct x : Ax = b, x ≥ 0} for some i

or ct dj < 0 for some j.

Exercises

E1.17. Let S be a compact set in Rn . Show that H(S) is closed. Is this result true if S is only closed and not bounded?

E1.18. Show that the system Ax ≤ 0 and ct x > 0 has a solution x ∈ Rn , where

A = [ 1  −1  −1 ]
    [ 2   2   0 ]

and c = (1, 0, 5)t .

E1.19. Let A be a p × n matrix and B be a q × n matrix. Show that exactly one of the
following systems has a solution:

System 1. Ax < 0, Bx = 0 for some x ∈ Rn

System 2. At u + B t v = 0 for some (u, v), u 6= 0, v ≥ 0

E1.20. Let S and T be two nonempty convex sets in Rn . Show that there exists a
hyperplane that separates S and T if, and only if, inf{||x−y|| : x ∈ S, y ∈ T } > 0.

E1.21. Let S and T be two nonempty disjoint convex sets in Rn . Show that there exist
nonzero vectors p and q such that

pt x + q t y ≥ 0 for all x ∈ S and all y ∈ T.

E1.22. Let C and D be convex cones in Rn . Show that C + D is a convex cone and
that C + D = H(C ∪ D).

E1.23. Let C be a convex cone in Rn . Show that C + C ∗ = Rn . Is this representation


unique?

Convex Functions

Convex functions play an important role in optimization and in the development of computational algorithms. In this chapter we shall look at convex functions, their properties and some generalizations of convex functions. Concavity is the flip side of convexity: if a function f is convex, then −f is concave. Almost all the results that we derive for convex functions can be stated for concave functions with the necessary modifications. Under convex functions we shall study

1. Basic definitions of convex functions

2. Subgradients of convex functions

3. Differentiable convex functions

4. Maxima and minima of convex functions

5. Generalizations of convex functions

Definition 2.1. Let S be a nonempty convex set in Rn . A function f : S → R is said


to be convex on S if

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) for all x, y ∈ S and all λ ∈ (0, 1)

Some examples of convex functions are

1. f (x) = 3x + 4

2. f (x) = |x|

3. f (x) = x2 − 2x

4. f (x) = xt Ax where A is a positive semidefinite matrix of order n and x ∈ Rn .


The function f (x) = −√x is a convex function on R+ .
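The defining inequality of Definition 2.1 can be spot-checked numerically (a sketch only; random sampling can refute convexity but never prove it, and the helper below is not part of the notes). Here it is applied to f (x) = |x| and f (x) = x2 − 2x from the list above, and to x3 as a non-example.

    import numpy as np

    rng = np.random.default_rng(2)

    def check_convexity(f, sampler, trials=5000, tol=1e-12):
        """Return False if a violation of the convexity inequality is found."""
        for _ in range(trials):
            x, y = sampler(), sampler()
            lam = rng.uniform(0.0, 1.0)
            if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + tol:
                return False
        return True

    sample = lambda: rng.uniform(-10.0, 10.0)
    print(check_convexity(abs, sample))                    # True:  f(x) = |x|
    print(check_convexity(lambda x: x**2 - 2*x, sample))   # True:  f(x) = x^2 - 2x
    print(check_convexity(lambda x: x**3, sample))         # False: x^3 is not convex on R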

Lemma 2.1. Let f : S → R be a convex function, where S is a convex subset of Rn .


Then for any real number α, the level set fα defined by fα = {x : f (x) ≤ α} is a convex
set.

Proof. Let x, y ∈ fα and let λ ∈ (0, 1). Then, f (x) ≤ α and f (y) ≤ α. By convexity of f ,

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) ≤ λα + (1 − λ)α = α,

and hence λx + (1 − λ)y ∈ fα . 2

Exercise 2.1. Show that f : S → R is a convex function if, and only if, for all λi ∈ (0, 1) with Σ λi = 1 and for all xi ∈ S, f (Σ λi xi ) ≤ Σ λi f (xi ).

Continuity of Convex Functions

Theorem 2.1. Let S be a convex set in Rn with nonempty interior and let f : S → R
be a convex function. Then f is continuous on the interior of S.

Proof. Let x̄ ∈ int(S). Fix ε > 0. Since x̄ ∈ int(S), there exists a µ > 0 such that ||x − x̄|| < µ implies x ∈ S. Let θ = 1 + maxi {max{f (x̄ + µei ) − f (x̄), f (x̄ − µei ) − f (x̄)}}, where ei is the ith column of the identity matrix. Since f is convex and x̄ = (1/2)(x̄ + µei ) + (1/2)(x̄ − µei ), either f (x̄ + µei ) − f (x̄) ≥ 0 or f (x̄ − µei ) − f (x̄) ≥ 0. Therefore, θ > 0. Let δ = min{µ/n, εµ/(nθ)}. Now, fix any x such that ||x − x̄|| < δ. We will show that |f (x) − f (x̄)| ≤ ε.

Let αi = |xi − x̄i |/µ. Then µ(Σ αi2 )^{1/2} = ||x − x̄|| < δ ≤ µ/n, which in turn implies αi ≤ 1/n for each i. Thus, for each i, 0 ≤ nαi ≤ 1. Let z i be the vector with its ith coordinate equal to µ(xi − x̄i )/|xi − x̄i | (and zero if xi = x̄i ), and all other coordinates zero. Then x = x̄ + Σ αi z i and

f (x) = f (x̄ + Σ αi z i )
      = f ((1/n) Σ [x̄ + nαi z i ])
      ≤ (1/n) Σ f (x̄ + nαi z i )
      = (1/n) Σ f ((1 − nαi )x̄ + nαi (x̄ + z i ))
      ≤ (1/n) Σ [(1 − nαi )f (x̄) + nαi f (x̄ + z i )]
      = f (x̄) + (1/n) Σ nαi [f (x̄ + z i ) − f (x̄)]                     (1)

Note that each αi < δ/µ and hence Σ αi < nδ/µ ≤ ε/θ. Since f (x̄ + z i ) − f (x̄) < θ, rewriting (1) we get

f (x) − f (x̄) ≤ (1/n) Σ nαi θ = θ Σ αi ≤ ε.                             (2)

Next, let y = 2x̄ − x. Then, ||y − x̄|| < δ and hence, by the same argument,

f (y) − f (x̄) ≤ ε.                                                      (3)

Since x̄ = (1/2)x + (1/2)y, we have

f (x̄) ≤ (1/2)f (x) + (1/2)f (y).                                        (4)

Combining (3) and (4), we get f (x̄) − f (x) ≤ ε. This completes the proof of the theorem. 2
Directional Derivative of Convex Functions

Definition 2.2. Let S be a nonempty set in Rn and let f : S → R be a function.


Let x̄ ∈ S. A nonzero vector d ∈ Rn is said to be a feasible direction of S at x̄ if
there exists a δ > 0 such that x̄ + λd ∈ S for all λ ∈ (0, δ). Furthermore, for a feasible
direction d of S at x̄, f is said to have a directional derivative at x̄ in the direction
d if the following limit exists:

f (x̄; d) = lim_{λ→0+} [f (x̄ + λd) − f (x̄)] / λ.

Note that we use the notation f (x̄; d) to denote the directional derivative of f at x̄ in the direction d.

If the function f is convex and is defined globally (that is, S = Rn ), then the directional
derivative exists at all x ∈ Rn . However, when S is not whole of Rn , the directional
derivative may not exist on ∂S.

Lemma 2.2. Let f : Rn → R be a convex function. Consider any point x̄ ∈ Rn and a


direction d. Then, f (x̄; d) exists.

Proof. Let λ2 > λ1 > 0. Then

f (x̄ + λ1 d) = f [(λ1 /λ2 )(x̄ + λ2 d) + (1 − λ1 /λ2 )x̄]
             ≤ (λ1 /λ2 ) f (x̄ + λ2 d) + (1 − λ1 /λ2 ) f (x̄)      (by convexity of f ).

Rearranging the terms in the above inequality, we get

[f (x̄ + λ1 d) − f (x̄)] / λ1 ≤ [f (x̄ + λ2 d) − f (x̄)] / λ2 .       (5)

Let g(λ) = [f (x̄ + λd) − f (x̄)] / λ. Then g is a nondecreasing function of λ over R+ .

Also, by convexity of f , for any λ > 0 we have

f (x̄) = f [(λ/(1 + λ))(x̄ − d) + (1/(1 + λ))(x̄ + λd)]
      ≤ (λ/(1 + λ)) f (x̄ − d) + (1/(1 + λ)) f (x̄ + λd).             (6)

Rearranging the terms in (6), we get

g(λ) = [f (x̄ + λd) − f (x̄)] / λ ≥ f (x̄) − f (x̄ − d).

Thus, g(λ) is bounded below, and since g is nondecreasing, lim_{λ→0+} g(λ) exists. 2
Subgradients of Convex Functions

Definition 2.3. Let f : S → R. The set {(x, f (x)) : x ∈ S} ⊆ Rn+1 is called the graph
of the function. Furthermore, the sets {(x, y) : x ∈ S and y ≥ f (x)} and {(x, y) : x ∈ S
and y ≤ f (x)} are called the epigraph and hypograph of f respectively.

We shall denote the epigraph of a function by epi(f ) and its hypograph by hyp(f ).

Theorem 2.2. Let S be a nonempty convex set of Rn and let f : S → R be a function.


Then, f is convex if, and only if, epi(f ) is a convex set.

Proof. Assume f is convex. Let (x, y), (u, v) ∈ epi(f ) where x, u ∈ S. Let λ ∈ (0, 1).
Then, we have
f (x) ≤ y and f (u) ≤ v.

Since f is convex,

f (λx + (1 − λ)u) ≤ λf (x) + (1 − λ)f (u) ≤ λy + (1 − λ)v.

Hence λ(x, y) + (1 − λ)(u, v) ∈ epi(f ).

Conversely, assume that epi(f ) is convex. Let x, u ∈ S and let λ ∈ (0, 1). Let y =
f (x) and let v = f (u). Then, (x, y) and (u, v) are in epi(f ). Since epi(f ) is convex,
λ(x, y) + (1 − λ)(u, v) ∈ epi(f ). Hence

λy + (1 − λ)v ≥ f (λx + (1 − λ)u) or f (λx + (1 − λ)u) ≤ λf (x) + (1 − λ)f (u).

It follows that f is convex. 2

16
Definition 2.4. Let S be a nonempty convex set in Rn and let f : S → R be convex.
Then a vector ξ ∈ Rn is called a subgradient of f at a point x̄ ∈ S if

f (x) ≥ f (x̄) + ξ t (x − x̄) for all x ∈ S. (7)
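For instance, for f (x) = |x| on R, every ξ ∈ [−1, 1] is a subgradient at x̄ = 0. A small numerical check of inequality (7) (a sketch; the grid and candidate values are illustrative):

    import numpy as np

    f = abs
    x_bar = 0.0
    xs = np.linspace(-5.0, 5.0, 1001)

    for xi in (-1.0, -0.3, 0.0, 0.7, 1.0):        # candidate subgradients at 0
        assert np.all(f(xs) >= f(x_bar) + xi * (xs - x_bar))   # inequality (7)

    # xi = 1.5 is not a subgradient at 0: the inequality fails for some x > 0
    assert not np.all(np.abs(xs) >= 1.5 * xs)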

Definition 2.5. Let S be a nonempty convex set in Rn and let f : S → R. Say that
f is concave on S if −f is convex. If f is a concave function, then a vector ξ ∈ Rn is
called a subgradient of f at a point x̄ ∈ S if

f (x) ≤ f (x̄) + ξ t (x − x̄) for all x ∈ S. (8)

Exercise 2.2. Analyze the convexity of the function h(x) = min{f (x), g(x)} where
f (x) = 4 − |x| and g(x) = 4 − (x − 2)2 , x ∈ R.

Theorem 2.3. Let S be a nonempty convex set of Rn and let f : S → R be a convex


function. Then for every x̄ ∈ int(S), f has a subgradient ξ at x̄.

Proof. Since f is convex, epi(f ) is convex by Theorem 2.2. Note that the point (x̄, f (x̄))
is a point on the boundary of epi(f ). By Theorem 1.6, there exists a hyperplane that
supports epi(f ) at (x̄, f (x̄)). That is, there will exist a nonzero (p, q) with p ∈ Rn and
q ∈ R such that

pt x̄ + qf (x̄) ≥ pt x + qy for all x ∈ S and all y ≥ f (x). (9)

We claim that q 6= 0. To the contrary, assume q = 0. Then from (9), we have pt x̄ ≥ pt x


for all x ∈ S. As x̄ is an interior point of S, x̄ + δp ∈ S for all positive δ sufficiently
small. This means pt x̄ ≥ pt (x̄ + δp) for a δ > 0. This implies pt p ≤ 0 which in turn
implies p = 0 leading to a contradiction that (p, q) = 0. Hence, it follows that q 6= 0.
This clubbed with (9) implies that q < 0. Dividing both sides of (9) by q and letting
ξ = −(1/q) p, we have

−ξ t x̄ + f (x̄) ≤ −ξ t x + y for all x ∈ S and all y ≥ f (x).

Letting y = f (x) in the above inequality we get

f (x) ≥ f (x̄) + ξ t (x − x̄) for all x ∈ S.

Thus, ξ is a subgradient of f at x̄. 2

Definition 2.6. Let S be a nonempty convex set in Rn and let f : S → R. Say that f
is strictly convex on S if

f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y) for all x, y ∈ S, x 6= y, and all λ ∈ (0, 1)

Corollary 2.1. Let S be a nonempty convex set of Rn and let f : S → R be a strictly convex function. Then for every x̄ ∈ int(S), there exists a ξ such that

f (x) > f (x̄) + ξ t (x − x̄) for all x ∈ S \ {x̄}.

Proof. By Theorem 2.3, there exists a ξ such that

f (x) ≥ f (x̄) + ξ t (x − x̄) (10)

for all x ∈ S. Assume, if possible, that equality holds in (10) for some u ∈ S with u ≠ x̄, that is, f (u) = f (x̄) + ξ t (u − x̄). Let λ ∈ (0, 1). By strict convexity of f , we have

f (λu + (1 − λ)x̄) < λf (u) + (1 − λ)f (x̄)

= λf (x̄) + λξ t (u − x̄) + (1 − λ)f (x̄)

= f (x̄) + λξ t (u − x̄)

Taking x = λu + (1 − λ)x̄ in (10), we have

f (λu + (1 − λ)x̄) ≥ f (x̄) + λξ t (u − x̄) > f (λu + (1 − λ)x̄)

which is a contradiction. Corollary follows. 2

Theorem 2.4. Let S be a nonempty convex set of Rn and let f : S → R. Suppose


for every x ∈ int(S), f has a subgradient ξ at x. That is, suppose for each x ∈ int(S),
there exists a ξx such that

f (u) ≥ f (x) + ξxt (u − x) for all u ∈ S.

Then f is convex on int(S).

Proof. Let x, y ∈ int(S) and let λ ∈ (0, 1). Note that x̄ = λx + (1 − λ)y is in int(S).
From the hypothesis, there exists a ξ such that

f (u) ≥ f (x̄) + ξ t (u − x̄) for all u ∈ S. (11)

Note that x − x̄ = (1 − λ)(x − y) and y − x̄ = −λ(x − y). Substituting x and y for u in


(11) we get
f (x) ≥ f (x̄) + (1 − λ)ξ t (x − y) (12)

and
f (y) ≥ f (x̄) − λξ t (x − y) (13)

Multiplying (12) by λ both sides and (13) by (1 − λ) both sides and adding the resulting
inequalities, we get

λf (x) + (1 − λ)f (y) ≥ f (x̄) = f (λx + (1 − λ)y).

Hence, f is convex on int(S). 2

Exercise 2.3. Consider the function f defined on S = {(x, y) : 0 ≤ x, y ≤ 1} as follows:

f (x, y) = 0 for 0 ≤ x ≤ 1, 0 < y ≤ 1, and f (x, y) = 1/4 − (x − 1/2) for 0 ≤ x ≤ 1, y = 0.

Is f a convex function? Does f have subgradient vectors at all interior points? If so,
what are they?

Differentiable Convex Functions

Definition 2.7. Let S be a set in Rn with nonempty interior and let f : S → R. Let
x̄ ∈ int(S). Say that f is differentiable at x̄ if there exists a vector ∇f (x̄), called the
gradient vector of f at x̄, and there exists a function α : Rn → R, such that

f (x) = f (x̄) + ∇f (x̄)t (x − x̄) + ||x − x̄||α(x̄, x − x̄)

where lim_{x→x̄} α(x̄, x − x̄) = 0. If T is an open subset of S, then f is said to be differentiable


on T if f is differentiable at each point in T .

Remark 2.1. When f is differentiable at x̄, the gradient vector is unique and is given by

∇f (x̄) = ( ∂f (x̄)/∂x1 , ∂f (x̄)/∂x2 , . . . , ∂f (x̄)/∂xn )t .

Theorem 2.5. Let S be a nonempty convex set of Rn and let f : S → R be convex.


Suppose f is differentiable at x̄ ∈ int(S). Then f has a unique subgradient at x̄ and is
equal to the gradient of f at x̄.

Proof. Let ξ be a subgradient of f at x̄ (exists by Theorem 2.3.) so that

f (x) ≥ f (x̄) + ξ t (x − x̄) for all x ∈ S. (14)

Since f is differentiable at x̄,

f (x) = f (x̄) + ∇f (x̄)t (x − x̄) + ||x − x̄||α(x̄, x − x̄) (15)

Subtracting (15) from (14), we get

(ξ − ∇f (x̄))t (x − x̄) − ||x − x̄||α(x̄, x − x̄) ≤ 0 (16)

Letting x = x̄ + δ(ξ − ∇f (x̄)) in the above inequality (for sufficiently small δ > 0), we
get
δ(ξ − ∇f (x̄))t (ξ − ∇f (x̄)) − δ||ξ − ∇f (x̄)||α(x̄, δ(ξ − ∇f (x̄))) ≤ 0 (17)

Dividing both sides by δ and taking limit as δ → 0+ , we get

(ξ − ∇f (x̄))t (ξ − ∇f (x̄)) ≤ 0

which implies ξ = ∇f (x̄). 2

Theorem 2.6. Let S be a nonempty open convex set of Rn and let f : S → R be


differentiable on S. Then f is convex if, and only if, for every x̄ ∈ S

f (x) ≥ f (x̄) + ∇f (x̄)t (x − x̄) for all x ∈ S.

Similarly, f is strictly convex if, and only if, for every x̄ ∈ S

f (x) > f (x̄) + ∇f (x̄)t (x − x̄) for all x ∈ S, x 6= x̄.

Proof. Exercise.

Remark 2.2. Consider the optimization problem: Minimize f (x), x ∈ X, where


X ⊆ Rn . Note that the right hand sides of the inequalities of the above theorem
provide lower bounds on f . Furthermore, the bounds are affine functions of x. This
aspect is very useful in developing algorithms to solve the optimization problems.

Remark 2.3. Consider the system of nonlinear constraints defined by the set

X = {x : gi (x) ≤ 0, i = 1, 2, .., m, }

where gi s are differentiable convex functions on X. Instead of looking at the nonlinear


system of constraints, we may first try and solve a linear system of constraints (linear
program) by looking at the problem

Y = {x : gi (x̄) + ∇gi (x̄)t (x − x̄) ≤ 0, i = 1, 2, .., m.}

Note that X ⊆ Y as x ∈ X implies gi (x) ≤ 0 and by the above theorem

gi (x̄) + ∇gi (x̄)t (x − x̄) ≤ gi (x) ≤ 0,

which implies x ∈ Y . In other words, we first solve a linear program over the bigger set Y , which is a polyhedral approximation of X, and then try to push this bigger set towards X in successive iterations. Here Y is called a relaxation of X.
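A minimal sketch of this outer linearization (the single constraint g, the linearization point x̄, and the sampling grid are illustrative assumptions, not from the text): every point of X satisfies the linearized constraint, confirming X ⊆ Y.

    import numpy as np

    # one convex constraint g(x) <= 0 with g(x) = x1^2 + x2^2 - 1 (X is the unit disk)
    g = lambda x: x[0]**2 + x[1]**2 - 1.0
    grad_g = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

    x_bar = np.array([0.5, 0.5])            # linearization point
    g_lin = lambda x: g(x_bar) + grad_g(x_bar) @ (x - x_bar)   # defines the relaxation Y

    # every feasible point of X also satisfies the linearized constraint
    rng = np.random.default_rng(3)
    for _ in range(2000):
        x = rng.uniform(-1.0, 1.0, size=2)
        if g(x) <= 0.0:                      # x in X
            assert g_lin(x) <= 1e-12         # then x in Y, by Theorem 2.6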

Theorem 2.7. Let S be a nonempty open convex set of Rn and let f : S → R be


differentiable on S. Then f is convex if, and only if, for every x, y ∈ S, we have

[∇f (y) − ∇f (x)]t (y − x) ≥ 0.

Similarly, f is strictly convex if, and only if, for every x, y ∈ S, x 6= y, we have

[∇f (y) − ∇f (x)]t (y − x) > 0.

Proof. Fix x, y ∈ S and let x̄ = λx + (1 − λ)y, where λ ∈ (0, 1). From Theorem 2.6, f
is convex if, and only if, for all v ∈ S,

f (u) ≥ f (v) + ∇f (v)t (u − v) for all u ∈ S.

Substituting x and y for u and v in the above inequality, we get

f (x) ≥ f (y) + ∇f (y)t (x − y)

and
f (y) ≥ f (x) + ∇f (x)t (y − x).

Combining these two inequalities, we can write

f (x) ≥ f (x) + ∇f (x)t (y − x) + ∇f (y)t (x − y)

which is same as
[∇f (y) − ∇f (x)]t (y − x) ≥ 0.

Therefore, convexity of f implies the above inequality.

Conversely, assume that for every u, v ∈ S, we have

[∇f (v) − ∇f (u)]t (v − u) ≥ 0.

Fix x, y ∈ S. By Mean Value Theorem,

f (x) = f (y) + ∇f (x̄)t (x − y), (18)

where x̄ = λx + (1 − λ)y for some λ ∈ (0, 1).

From the hypothesis, we have

(1 − λ)[∇f (x) − ∇f (x̄)]t (x − y) ≥ 0

Dividing by (1 − λ) and substituting for ∇f (x̄) from (18), we get f (y) ≥ f (x) +
∇f (x)t (y − x). By an earlier theorem, it follows that f is convex. 2

Exercise 2.4. Prove the above theorem for strict convexity part.

Definition 2.8. Let S be a set in Rn with nonempty interior and let f : S → R.


Let x̄ ∈ int(S). Say that f is twice differentiable at x̄ if there exist a vector ∇f (x̄), a
symmetric matrix H(x̄), called the Hessian matrix, and a function α : Rn → R, such
that

f (x) = f (x̄) + ∇f (x̄)t (x − x̄) + (1/2)(x − x̄)t H(x̄)(x − x̄) + ||x − x̄||2 α(x̄, x − x̄)

for each x ∈ S and limx→x̄ α(x̄, x − x̄) = 0.

Remark 2.4. When the function f is twice differentiable, the Hessian matrix is the n × n symmetric matrix of second partial derivatives, that is,

H(x̄) = [ ∂²f (x̄)/∂xi ∂xj ], i, j = 1, 2, . . . , n,

whose (i, j) entry is ∂²f (x̄)/∂xi ∂xj .

Exercise 2.5. Find the Hessian matrix of f (x, y) = 2x + 6y − 2x2 − 3y 2 + 4xy.

Exercise 2.6. Find the gradient and Hessian matrix of f (x) = ct x + xt Ax where A is
an n × n matrix and c, x ∈ Rn .

Definition 2.9. A square matrix A of order n is said to be positive semidefinite (positive


definite) if xt Ax ≥ 0 for all x (xt Ax > 0 for all x 6= 0).

Theorem 2.8. Let S be a nonempty open convex set of Rn and let f : S → R be
twice differentiable on S. Then f is convex if, and only if, the Hessian matrix of f is
positive semidefinite at each point in S.

Proof. Assume that f is convex. Fix an x̄ ∈ S. Then for any x ∈ Rn , x̄ + λx ∈ S for all λ sufficiently small. We have

f (x̄ + λx) ≥ f (x̄) + λ∇f (x̄)t x (19)

and, by twice differentiability of f ,

f (x̄ + λx) = f (x̄) + λ∇f (x̄)t x + (1/2)λ2 xt H(x̄)x + λ2 ||x||2 α(x̄, λx). (20)

Combining (19) and (20) we get

(1/2)λ2 xt H(x̄)x + λ2 ||x||2 α(x̄, λx) ≥ 0.

Dividing both sides by λ2 and taking the limit as λ → 0, we get xt H(x̄)x ≥ 0. As x was arbitrary, it follows that H(x̄) is positive semidefinite.

Next, assume that H(x) is positive semidefinite for all x ∈ S. Fix any x, y ∈ S. By the mean value theorem, we have

f (x) = f (y) + ∇f (y)t (x − y) + (1/2)(x − y)t H(x̄)(x − y),

where x̄ = λx + (1 − λ)y for some λ ∈ (0, 1). Since x̄ ∈ S, (x − y)t H(x̄)(x − y) ≥ 0 and
hence f (x) ≥ f (y) + ∇f (y)t (x − y) and by Theorem 2.6 it follows that f is convex. 2
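In practice, positive semidefiniteness of the Hessian can be checked through its eigenvalues. A sketch (the function below is an illustrative quadratic-plus-exponential chosen for this note, not one from the text):

    import numpy as np

    # f(x1, x2) = exp(x1) + x1^2 + 2*x2^2 - x1*x2; its Hessian is
    # [[exp(x1) + 2, -1], [-1, 4]], which is positive definite everywhere,
    # so f is (strictly) convex by Theorems 2.8 and 2.9.
    def hessian(x):
        return np.array([[np.exp(x[0]) + 2.0, -1.0],
                         [-1.0,               4.0]])

    rng = np.random.default_rng(4)
    for _ in range(100):
        x = rng.uniform(-3.0, 3.0, size=2)
        eigvals = np.linalg.eigvalsh(hessian(x))   # symmetric matrix, use eigvalsh
        assert np.all(eigvals > 0.0)               # positive definite at x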

Theorem 2.9. Let S be a nonempty open convex set of Rn and let f : S → R be


twice differentiable on S. If the Hessian matrix of f is positive definite at each point in
S, then f is strictly convex.

Exercise 2.7. Prove the above theorem.

Exercise 2.8. Let f : Rn → R be a function. For any x, d ∈ Rn , define the function


g(x,d) : R → R by g(x,d) (λ) = f (x + λd). Show

1. f is convex if, and only if, g(x,d) is convex for all x ∈ Rn and for all nonzero
d ∈ Rn .

2. f is strictly convex if, and only if, g(x,d) is strictly convex for all x ∈ Rn and for
all nonzero d ∈ Rn .

Minima and Maxima of Convex Functions

Let f : Rn → R be a function and let S ⊆ Rn . Consider the optimization problem: Minimize f (x) subject to x ∈ S.

Definition 2.10. Every x ∈ S is called a feasible solution to the optimization problem. An x̄ ∈ S is called a solution to the problem (also called a global optimal solution or simply an optimal solution) if f (x) ≥ f (x̄) for all x ∈ S. An x̄ ∈ S is said to be a local optimal solution (or local minimum) if there exists an ε > 0 such that f (x) ≥ f (x̄) for all x ∈ S with ||x − x̄|| < ε. An x̄ ∈ S is said to be a strict local optimal solution (or strict local minimum) if there exists an ε > 0 such that f (x) > f (x̄) for all x ∈ S with ||x − x̄|| < ε and x ≠ x̄. An x̄ ∈ S is said to be a strong local optimal solution (or strong local minimum) if there exists an ε > 0 such that x̄ is the only local optimal solution in S ∩ Nε (x̄), where Nε (x̄) is the ε-neighbourhood of x̄.

Exercise 2.9. Give an example to distinguish strict and strong local optimal solutions. Show that every strong local optimal solution is a strict local optimal solution. Is the converse true?

Theorem 2.10. Let f : S → R be a convex function, where S is a nonempty convex


subset of Rn . Suppose x̄ ∈ S is a local optimal solution to the optimization problem:
Minimize f (x) subject to x ∈ S.

1. Then, x̄ is a global optimal solution.

2. If x̄ is a strict local minimum, then x̄ is the unique global optimal solution.

3. If f is strictly convex, then x̄ is the unique global optimal solution.

Proof.
1. To the contrary, assume that f (y) < f (x̄) for some y ∈ S. For λ ∈ (0, 1), λy + (1 −
λ)x̄ ∈ S, and by convexity of f ,

f (λy + (1 − λ)x̄) ≤ λf (y) + (1 − λ)f (x̄)

< λf (x̄) + (1 − λ)f (x̄) = f (x̄).

For λ sufficiently close to 0, λy + (1 − λ)x̄ ∈ S can be made arbitrarily close to x̄ which
will contradict local optimality of x̄.

Exercise 2.10. Prove parts 2 and 3 of the above theorem.

Theorem 2.11. Let f : Rn → R be a convex function, where S is a nonempty convex


subset of Rn . An x̄ ∈ S is an optimal solution to the problem of minimizing f (x) subject to x ∈ S if, and only if, f has a subgradient ξ at x̄ such that ξ t (x − x̄) ≥ 0 for all x ∈ S.

Proof. Suppose f has a subgradient ξ at x̄ such that ξ t (x − x̄) ≥ 0 for all x ∈ S. Then,

f (x) ≥ f (x̄) + ξ t (x − x̄) for all x ∈ S.

Since ξ t (x − x̄) ≥ 0 for all x ∈ S, we have f (x) ≥ f (x̄) for all x ∈ S and hence x̄ is a
solution to the problem.

Conversely, assume that x̄ ∈ S is an optimal solution to the problem. Define the


sets
U = {(x − x̄, y) : x ∈ Rn , y > f (x) − f (x̄)}

V = {(x − x̄, y) : x ∈ S, y ≤ 0}

Since S is convex, it follows that V is convex. Using convexity of f , it can be checked


that U is convex (Check this! ). Since x̄ is an optimal solution to the problem, it follows
that U ∩ V = ∅. From Theorem 1.7, it follows that there exists a non-zero vector (β, µ)
and a number α such that

β t (x − x̄) + µy ≤ α for all x ∈ Rn and y > f (x) − f (x̄), (21)

and
β t (x − x̄) + µy ≥ α for all x ∈ S and y ≤ 0. (22)

If µ > 0, then (22) will be violated for large negative y. Hence µ ≤ 0. Letting x = x̄ and y = ε > 0 in (21), we get µε ≤ α for every ε > 0; letting ε → 0+ gives α ≥ 0. Taking x = x̄ and y = 0 in (22), we get α ≤ 0. Hence α = 0.

Suppose µ = 0. Taking x = x̄ + β in (21), we get β t β ≤ 0 which in turn implies


(β, µ) = 0, a contradiction. It follows that µ < 0.

Dividing both sides of (22) by −µ and letting ξ = −(1/µ)β, we get

ξ t (x − x̄) − y ≥ 0 for all x ∈ S and all y ≤ 0.

Taking y = 0 in the above inequality, we get

ξ t (x − x̄) ≥ 0 for all x ∈ S.

Since x̄ is optimal f (x) − f (x̄) ≥ 0 for all x ∈ S. Dividing both sides of (21) by −µ and
rearranging we get

ξ t (x − x̄) ≤ y for all x ∈ Rn and y > f (x) − f (x̄).

Taking x ∈ S and taking limit as y → f (x) − f (x̄) we get,

ξ t (x − x̄) ≤ f (x) − f (x̄) for all x ∈ S.

Thus, ξ is a subgradient of f at x̄ with ξ t (x − x̄) ≥ 0 for all x ∈ S. 2

Corollary 2.2. Let f : Rn → R be a convex function, where S is a nonempty open convex subset of Rn . An x̄ ∈ S is an optimal solution to the problem of minimizing f (x) subject to x ∈ S if, and only if, f has a zero subgradient at x̄.

Proof. By the theorem, x̄ is optimal if, and only if, f has a subgradient ξ at x̄ with ξ t (x − x̄) ≥ 0 for all x ∈ S. If ξ = 0, this condition holds trivially. Conversely, take x = x̄ − λξ with λ > 0 small enough that x ∈ S (possible since S is open); then ξ t (x − x̄) = −λξ t ξ ≥ 0 forces ξ = 0. 2

Corollary 2.3. Let f : Rn → R be a differentiable convex function, where S is a nonempty convex subset of Rn . Then, an x̄ ∈ S is an optimal solution to the problem of minimizing f (x) subject to x ∈ S if, and only if, ∇f (x̄)t (x − x̄) ≥ 0 for all x ∈ S. In addition, if S is open, then x̄ is an optimal solution if, and only if, ∇f (x̄) = 0.

Proof. Exercise.

Method of Feasible Directions

Consider the problem: Minimize f (x) subject to x ∈ S. We shall assume that both S and f are convex. To find a solution to this problem, we start with an x̄ ∈ S. If x̄ is not an optimal solution to the problem, then there must be an x ∈ S such that f decreases in the direction x − x̄ from x̄ (otherwise, x̄ would be a local optimal solution and, by convexity, also a global optimal solution). Note that in this case, x − x̄ is a feasible direction of S at x̄ (recall Definition 2.2 of a feasible direction). To obtain the direction, we solve the following optimization problem:

Minimize f [x̄ + λ(x − x̄)] subject to x ∈ S. (23)

Suppose f is a convex function over the entire Rn and we are interested in the problem: Minimize f (x) subject to x ∈ S, where S is an arbitrary set, not necessarily convex.
Again, let us start our search for an optimal solution from a point x̄ ∈ S. If x̄ is an
optimal solution to the problem, then any y ∈ Rn with f (y) < f (x̄) must not be in S.
Let y ∈ Rn be such that f (y) < f (x̄). Then, if x̄ is optimal to the problem, we must
have
f (x̄) > f (y) ≥ f (x̄) + ∇f (x̄)t (y − x̄)

which implies that ∇f (x̄)t (y − x̄) < 0. In other words, the hyperplane H = {u : ∇f (x̄)t (u − x̄) = 0} separates the set of all y that are better than x̄ (that is, f (y) < f (x̄)) from S. Thus, if x̄ is an optimal solution, then we must have ∇f (x̄)t (x − x̄) ≥ 0 for all x ∈ S. Therefore, the problem reduces to

Minimize ∇f (x̄)t (x − x̄) subject to x ∈ S. (24)

Note that (24) has a linear objective function, and if S is a polyhedral set, then the
problem reduces to a Linear Programming problem.
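Problem (24) is exactly the direction-finding step of the conditional gradient (Frank–Wolfe) method. A minimal sketch for a convex quadratic over a box (the objective, the box S, and the step-size rule 2/(k+2) are illustrative choices, not prescribed by the text):

    import numpy as np

    # minimize f(x) = ||x - p||^2 over the box S = [0, 1]^2, with p outside the box
    p = np.array([1.5, -0.25])
    f = lambda x: np.sum((x - p)**2)
    grad_f = lambda x: 2.0 * (x - p)

    x = np.array([0.0, 1.0])                 # starting point in S
    for k in range(200):
        g = grad_f(x)
        # linear subproblem (24): minimize g^t (y - x) over y in S.
        # For a box this is solved coordinate-wise at a vertex of S.
        y = np.where(g > 0.0, 0.0, 1.0)
        if g @ (y - x) >= -1e-10:            # no descent direction: x is optimal
            break
        lam = 2.0 / (k + 2.0)                # standard Frank-Wolfe step size
        x = x + lam * (y - x)

    print(x)   # approaches the minimizer, the projection of p onto the box, (1, 0)

The linear subproblem here is solved in closed form because S is a box; for a general polyhedral S it would be the LP mentioned above.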

Theorem 2.12. Consider the problem of minimizing f (x) subject to x ∈ S, where f is a convex and twice differentiable function and S is a convex set, and suppose that there exists an optimal solution x̄. Then, the set of alternative optimal solutions to the problem is given by

V = {x ∈ S : ∇f (x̄)t (x − x̄) ≤ 0, and ∇f (x) = ∇f (x̄)}.

Proof. Let U be the set of all optimal solutions to the problem. Note that U 6= ∅ as
x̄ ∈ U . Consider any y ∈ V . By the convexity of f and the definition of V , we have
y ∈ S and

f (x̄) ≥ f (y) + ∇f (y)t (x̄ − y) = f (y) + ∇f (x̄)t (x̄ − y) ≥ f (y)

and hence we must have y ∈ U . Thus, V ⊆ U .

To prove the converse, let y ∈ U . Then, y ∈ S and f (y) = f (x̄). This means that
f (y) ≥ f (x̄)+∇f (x̄)t (y − x̄) or that ∇f (x̄)t (y − x̄) ≤ 0. Since x̄ is optimal, we must have

∇f (x̄)t (y − x̄) ≥ 0 and hence ∇f (x̄)t (y − x̄) = 0. By interchanging the roles of y and
x̄, we must have ∇f (y)t (y − x̄) = 0. Therefore,

[∇f (x̄) − ∇f (y)]t (x̄ − y) = 0

Note that

∇f (x̄) − ∇f (y) = [∇f (y + λ(x̄ − y))]_{λ=0}^{λ=1} = ∫_{λ=0}^{λ=1} H[y + λ(x̄ − y)](x̄ − y) dλ = G(x̄ − y),

where G = ∫_{λ=0}^{λ=1} H[y + λ(x̄ − y)] dλ (the integral is performed element-wise on the Hessian matrix). Observe that G is positive semidefinite. It follows that (x̄ − y)t G(x̄ − y) = 0, which in turn implies G(x̄ − y) = 0 and hence ∇f (y) = ∇f (x̄). 2

Quasiconvex Functions

Definition 2.11. Let S be a nonempty convex subset of Rn . A function f : S → R


is said to be quasiconvex if

f (λx + (1 − λ)y) ≤ max(f (x), f (y))

∀x, y ∈ S, ∀λ ∈ [0, 1].

The function f is said be quasiconcave if −f is quasiconvex.

Exercise 2.11. Show that f is quasiconvex if, and only if, its level sets are convex.

Theorem 2.13. Let S be a nonempty open convex subset of Rn . Let f : S → R be


a function differentiable on S. Then f is quasiconvex if, and only if, the following
implication holds good:

∀x, y ∈ S, f (x) ≤ f (y) ⇒ ∇f (y)t (x − y) ≤ 0. (25)

Proof. Assume f is quasiconvex. Fix x, y ∈ S. We may assume f (x) ≤ f (y). For any
λ ∈ (0, 1), by differentiability of f ,

f (λx + (1 − λ)y) − f (y) = λ∇f (y)t (x − y) + λ || x − y || α(y; λ(x − y)),

where α(y; λ(x − y)) → 0 as λ → 0. Since f is quasiconvex, the LHS is nonpositive, and
this implies
λ∇f (y)t (x − y) + λ || x − y || α(y; λ(x − y)) ≤ 0.

Dividing both sides by λ and taking the limit as λ → 0 we get

∇f (y)t (x − y) ≤ 0.

Conversely, assume that (25) holds. We need to show that f is quasiconvex. Take x, y ∈ S. We may assume f (x) ≤ f (y). Suppose there exists a λ ∈ (0, 1) such that

f (z) > f (y) where z = λx + (1 − λ)y.

Since f is differentiable on S, it is continuous on S. This implies, there exists a δ ∈ (0, 1)


such that
f [µz + (1 − µ)y] > f (y) ∀µ ∈ [δ, 1] ( as f (z) > f (y)),

f [δz + (1 − δ)y] < f (z)

The last inequality follows as δz + (1 − δ)y is close to y for δ small, and f (z) > f (y).
By mean value theorem,

∇f (u)t ((1 − δ)(z − y)) = f (z) − f (δz + (1 − δ)y) > 0,

where u = µz + (1 − µ)y for some µ ∈ (δ, 1). This implies, as z − y = λ(x − y),
∇f (u)t (x − y) > 0.
On the other hand, as f (u) > f (y) ≥ f (x), from (25), ∇f (u)t (x − u) ≤ 0. As x − u =
(1 − λµ)(x − y), the last inequality implies ∇f (u)t (x − y) ≤ 0 which is a contradiction.
It follows that f is quasiconvex. 2

Theorem 2.14. Let S be a nonempty compact polyhedral set in Rn . Let f : Rn → R


be a quasiconvex and continuous function on S. Consider the problem of maximizing
f (x) subject to x ∈ S. There exists an optimal solution x̄ to the problem which is an
extreme point of S.

Proof. Since S is compact, it has no directions and every point of S is a convex combination of its extreme points. Let x1 , x2 , . . . , xp be the extreme points of S. Let xq be such that f (xq ) = max{f (xi ) : 1 ≤ i ≤ p}. Given x ∈ S, we can write x = Σ_{i=1}^{p} λi xi , a convex combination of the extreme points of S. Note that, as f is quasiconvex,

f (x) = f (Σ_{i=1}^{p} λi xi ) ≤ max{f (xi ) : 1 ≤ i ≤ p} = f (xq ).

Therefore, xq is an optimal solution to the problem and the proof is complete.

One of the sufficient conditions for a local optimal solution to be a global optimal
solution is that the function f is strictly quasiconvex.

Definition 2.12. Let f : S → R be a function where S is a nonempty convex set in Rn .


The function f is said to be strictly quasiconvex if for each x, y ∈ S with f (x) 6= f (y),

f (λx + (1 − λ)y) < max(f (x), f (y)) ∀λ ∈ (0, 1).

Theorem 2.15. Let S be a nonempty convex set in Rn and let f : Rn → R be a strictly


quasiconvex function. Consider the problem of minimizing f (x) subject to x ∈ S. If x̄
is a local optimal solution to the problem, then it is also a global optimal solution to
the problem.

Proof. Let x̄ be a local optimal solution to the problem. Suppose y ∈ S is such that f (y) < f (x̄). For any λ ∈ (0, 1), by strict quasiconvexity of f ,

f [x̄ + λ(y − x̄)] < f (x̄).

For λ sufficiently small, this will imply that x̄ + λ(y − x̄) is locally better than x̄,
contradicting local optimality of x̄. It follows that x̄ is a global optimal solution to the
problem. 2

If a function is strictly convex, then it is also convex. However, a strictly quasiconvex function need not be a quasiconvex function. Counterexample: Let S = [−1, 1] and let f (x) = 0 for all x ≠ 0 and f (0) = 1.

Theorem 2.16. Let S be a nonempty convex set in Rn and let f : Rn → R be a strictly


quasiconvex function. If f is continuous on S, then it is quasiconvex.

Proof. Let x, y ∈ S be such that f (x) = f (y). Suppose there exists a λ ∈ (0, 1) such
that f (λx + (1 − λ)y) > f (x). Let z = λx + (1 − λ)y. Since f is continuous, there
exists a µ ∈ (0, 1), such that f (z) > f [µx + (1 − µ)z] > f (x) = f (y). Note that z is
convex combination of µx + (1 − µ)z and y. Since f [µx + (1 − µ)z] > f (y), by strict
quasiconvexity of f , f (z) < f [µx + (1 − µ)z]. From this contradiction it follows that we
cannot find a λ ∈ (0, 1) such that f (λx + (1 − λ)y) > f (x), and hence f is quasiconvex.

We have seen that a local minimum for a strictly quasiconvex function is also a global
minimum. When can we say it is unique?

Definition 2.13. Let f : S → R be a function where S is a nonempty convex set in


Rn . The function f is said to be strongly quasiconvex if for each x, y ∈ S with x 6= y,

f (λx + (1 − λ)y) < max(f (x), f (y)) ∀λ ∈ (0, 1).

Note that

1. Every strictly convex function is strongly quasiconvex.

2. Every strongly quasiconvex function is strictly quasiconvex.

3. Every strongly quasiconvex function is quasiconvex.

Theorem 2.17. Let S be a nonempty convex set in Rn and let f : Rn → R be a strongly


quasiconvex function. Consider the problem of minimizing f (x) subject to x ∈ S. If x̄
is a local optimal solution to the problem, then it is the unique global optimal solution to the problem.

Proof. Exercise.

Definition 2.14. Let S be a nonempty open convex set in Rn and let f : S → R be


differentiable on S. The function f is said to be pseudoconvex on S if for each x, y ∈ S,
the implication holds:

f (x) < f (y) ⇒ ∇f (y)t (x − y) < 0.

Theorem 2.18. Let S be a nonempty open convex set in Rn and let f : Rn → R


be a differentiable pseudoconvex function. Then f is both strictly quasiconvex and
quasiconvex.

Proof. We first show that f is strictly quasiconvex. To the contrary, assume x, y ∈ S
such that f (x) 6= f (y) and f (z) ≥ max{f (x), f (y)} where z = λx + (1 − λ)y for some
λ ∈ (0, 1). Assume, without loss of generality, f (x) < f (y). Then, we have

f (z) ≥ f (y) > f (x).

By pseudoconvexity of f , ∇f (z)t (x − z) < 0. Since x − z = −(1 − λ)(y − z)/λ, this


implies ∇f (z)t (y − z) > 0. This in turn implies, by pseudoconvexity of f , f (y) ≥ f (z)
and hence f (z) = f (y). Since ∇f (z)t (y − z) > 0, there exists a u = µz + (1 − µ)y with
µ ∈ (0, 1) such that
f (u) > f (z) = f (y).

By pseudoconvexity of f , ∇f (u)t (y − u) < 0. Similarly, ∇f (u)t (z − u) < 0. Thus, we


have
∇f (u)t (y − u) < 0 and ∇f (u)t (z − u) < 0.

But these two inequalities are contradicting each other as y − u = µ(u − z)/(1 − µ). It
follows that f is strictly quasiconvex.
Since f is differentiable, it is continuous, and hence, by Theorem 2.16, f is also quasiconvex. 2

Definition 2.15. Let S be a nonempty open convex set in Rn and let f : S → R be differentiable on S. The function f is said to be strictly pseudoconvex on S if for each x, y ∈ S with x ≠ y, the following implication holds:

f (x) ≤ f (y) ⇒ ∇f (y)t (x − y) < 0.

Theorem 2.19. Let S be a nonempty open convex set in Rn and let f : Rn → R be a


differentiable strictly pseudoconvex function. Then f is strongly quasiconvex.

Proof. Exercise.

Exercises:

1. Suppose f : Rn → R is twice differentiable. If z is such that ∇f (z) vanishes, then show that lim_{λ→0} [f (z + λd) − f (z)] / λ2 exists for any d ∈ Rn .

2. Show that every convex function is strictly quasiconvex as well as quasiconvex.

3. Show that every differentiable convex function is pseudoconvex.

4. Define various types of convexity at a point and examine which of the results
developed so far hold good for functions having convexity (of different types) at a
point.

5. Let c, d ∈ Rn and let α, β ∈ R.


Let S = {x : dt x + β > 0}. Consider the function f : S → R defined by

f (x) = (ct x + α) / (dt x + β).

Show that f is pseudoconvex.

3. NonLinear Programming
and
Necessary and Sufficient Conditions
for Optimality

• Unconstrained Optimization
Minimize f (x), x ∈ Rn .

• Constrained Optimization
Minimize f (x) subject to x ∈ S or
With Inequality Constraints
Minimize f (x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m,
x ∈ X ⊆ Rn
With Inequality and Equality Constraints
Minimize f (x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m,
hi (x) = 0, i = 1, 2, . . . , l,
x ∈ X ⊆ Rn

Unconstrained Optimization

First Order Necessary Conditions for Optimality.

Theorem 3.1. Suppose f : Rn → R is differentiable at z ∈ Rn . If there is a vector


d ∈ Rn such that ∇f (z)t d < 0, then there exists a δ > 0 such that f (z + λd) < f (z) for
each λ ∈ (0, δ), so that d is a descent direction of f at z.

Proof. Using differentiability of f at z, we can write

[f (z + λd) − f (z)] / λ = ∇f (z)t d + ||d|| α(z, λd).

Since ∇f (z)t d < 0 and α(z, λd) → 0 as λ → 0, there exists a δ > 0 such that the RHS of the above equation is negative for all λ ∈ (0, δ). The result follows. 2

Corollary 3.1. Suppose f : Rn → R is differentiable at z ∈ Rn . If z is a local


minimum, then ∇f (z) = 0.

Proof. Since z is a local minimum, for any d, f (z + λd) ≥ f (z) for all λ > 0 sufficiently small, which in turn implies (by Theorem 3.1) that ∇f (z)t d ≥ 0 for all d. Take d = −∇f (z). 2

Unconstrained Optimization

Second Order Necessary Conditions for Optimality.

Theorem 3.2. Suppose f : Rn → R is twice differentiable at z ∈ Rn . If z is a local


minimum, then ∇f (z) = 0 and H(z) is positive semidefinite.

Proof. Using the twice differentiability of f at z and the fact that ∇f (z) = 0 (Corollary 3.1), for any d we can write

[f (z + λd) − f (z)] / λ2 = (1/2) dt H(z)d + ||d||2 α(z, λd).

Since z is a local minimum, f (z + λd) ≥ f (z) for all λ > 0 sufficiently small. Taking the limit as λ → 0, it follows that dt H(z)d ≥ 0 and hence the result follows. 2

Example 3.1. Minimize f (x) = (x2 − 1)3 , x ∈ R.


∇f (x) = 6x(x2 − 1)2 ; ∇f (x) = 0 for x = −1, 0, 1. H(x) = 24x2 (x2 − 1) + 6(x2 − 1)2
and H(−1) = H(1) = 0 and H(0) = 6. Verify that z = 0 is the local (global) minimum.
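A numerical companion to Example 3.1 (a sketch using only numpy; the grid used for the final check is an arbitrary choice): evaluate the first and second derivatives at the three critical points and compare function values on a grid.

    import numpy as np

    f   = lambda x: (x**2 - 1)**3
    df  = lambda x: 6 * x * (x**2 - 1)**2                        # gradient
    d2f = lambda x: 24 * x**2 * (x**2 - 1) + 6 * (x**2 - 1)**2   # Hessian (1 x 1)

    for z in (-1.0, 0.0, 1.0):
        print(z, df(z), d2f(z))     # df = 0 at all three; d2f = 0, 6, 0 respectively

    # H(0) = 6 > 0, so z = 0 is a (strict) local minimum by Theorem 3.3;
    # at z = +/- 1 the second-order test is inconclusive, and f(+/-1) = 0 > f(0) = -1.
    xs = np.linspace(-2, 2, 2001)
    print(xs[np.argmin(f(xs))], f(xs).min())   # numerically close to (0, -1)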

Unconstrained Optimization

Sufficient Conditions for Optimality

Theorem 3.3. Suppose f : Rn → R is twice differentiable at z ∈ Rn . If ∇f (z) = 0


and H(z) is positive definite, then z is a local minimum.

Proof. If z is not a local minimum, then there exists a sequence xk → z such that f (xk ) < f (z) for each k. Write xk = z + λk dk , where λk = ||xk − z|| and dk = (xk − z)/||xk − z||. Using the hypotheses, we can write

0 > [f (xk ) − f (z)] / (λk )2 = (1/2) dtk H(z)dk + ||dk ||2 α(z, λk dk ).

We may assume, without loss of generality, that dk → d for some d with ||d|| = 1. Taking limits as k → ∞, we get dt H(z)d ≤ 0, which contradicts the positive definiteness of H(z). 2
Reexamine Example 3.1.

Theorem 3.4. Suppose f : Rn → R is pseudoconvex at z. Then z is global optimum


if, and only if, ∇f (z) = 0.

Optimization With Inequality Constraints

Consider the problem: Minimizing f (x) subject to x ∈ S.


For any z ∈ cl(S), the set of feasible directions of S at z is defined by

D(z) = {d : d ≠ 0, and z + λd ∈ S ∀λ ∈ (0, δ) for some δ > 0}.

Similarly, define the set of descent directions of f at z by F (z) = {d : ∇f (z)t d < 0}.
Note that D(z) is a cone if z ∈ S. If z is a local optimum, then D(z) ∩ F (z) = ∅.

Optimization With Inequality Constraints

Theorem 3.5. Consider the problem of minimizing f (x) subject to x ∈ S, where S is


nonempty set in Rn and f : Rn → R is differentiable at z ∈ S. If z is a local optimum
solution to the problem, then D(z) ∩ F (z) = ∅.

Proof. Suppose d ∈ D(z) ∩ F (z). This means we can find a λ > 0 arbitrarily small
satisfying f (z + λd) < f (z) (because d is a direction of descent) and z + λd ∈ S (because
d is a feasible direction). This contradicts the local optimality of z. It follows that
D(z) ∩ F (z) = ∅.

Consider the Problem (PI): Minimize f (x) subject to


gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X ⊆ Rn .
Here, each of f and gi s is a function from Rn to R and X is a nonempty open set in
Rn .
So, the set of feasible solutions is given by

S = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m}.

Optimization With Inequality Constraints

When the NLP is specified as in the above problem, the necessary geometric condition
for local optimality (D(z) ∩ F (z) = ∅) can be reduced to an algebraic condition.

Theorem 3.6. Consider the problem PI stated above. Suppose z is a feasible point to
the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at
z and that gi s for i not in I are continuous at z. If z is a local optimal solution to the
problem, then F (z) ∩ G(z) = ∅, where G(z) = {d : ∇gi (z)t d < 0, ∀i ∈ I}.

Proof. From an earlier result, D(z) ∩ F (z) = ∅. Will show that G(z) ⊆ D(z). Let
d ∈ G(z). As X is open, there exists a δ1 > 0 such that z + λd ∈ X ∀λ ∈ (0, δ1 ). For
i ∉ I, as gi (z) < 0 and gi is continuous at z, there exists a δ2 > 0 such that gi (z + λd) <
0 ∀ λ ∈ (0, δ2 ). For i ∈ I, as ∇gi (z)t d < 0, d is a descent direction of gi at z and hence
there exists a δ3 > 0 such that gi (z + λd) < gi (z) = 0 ∀ λ ∈ (0, δ3 ). From these inferences, we conclude
that d ∈ D(z) and hence the result follows.

Example 3.2.
Minimize (x − 3)2 + (y − 2)2
subject to x2 + y 2 ≤ 5
x+y ≤3
x, y ≥ 0
Analyze the optimality at the points z = (9/5, 6/5)t and u = (2, 1)t .

∇f (z) = (−12/5, −8/5)t and ∇g2 (z) = (1, 1)t .

Note that F (z) ∩ G(z) ≠ ∅ and hence z cannot be an optimal solution.

∇f (u) = (−2, −2)t , ∇g1 (u) = (4, 2)t and ∇g2 (u) = (1, 1)t .

Note that F (u) ∩ G(u) = ∅ and hence u may be an optimal solution, but this cannot be
guaranteed from F (u) ∩ G(u) = ∅ alone, as it is only a necessary condition.
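Whether F (z) ∩ G(z) = ∅ can also be tested numerically: the strict system ∇f (z)t d < 0, ∇gi (z)t d < 0 (i ∈ I) has a solution if, and only if, the small auxiliary linear program below has a positive optimal value. This is a sketch; the box bound |di | ≤ 1, the slack variable s and the use of scipy are devices introduced here only for illustration.

import numpy as np
from scipy.optimize import linprog

def strict_system_feasible(rows):
    # True iff there is a d with r^t d < 0 for every row r.
    # Maximize s subject to r^t d <= -s, |d_i| <= 1, 0 <= s <= 1;
    # the strict system is solvable iff the optimal s is positive.
    rows = np.asarray(rows, dtype=float)
    n = rows.shape[1]
    A_ub = np.hstack([rows, np.ones((rows.shape[0], 1))])   # r^t d + s <= 0
    b_ub = np.zeros(rows.shape[0])
    c = np.zeros(n + 1)
    c[-1] = -1.0                                             # maximize s
    bounds = [(-1, 1)] * n + [(0, 1)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[-1] > 1e-9

# Example 3.2 at z = (9/5, 6/5): only g2 is active
print(strict_system_feasible([[-12/5, -8/5], [1, 1]]))      # True:  F(z) and G(z) intersect
# Example 3.2 at u = (2, 1): g1 and g2 are both active
print(strict_system_feasible([[-2, -2], [4, 2], [1, 1]]))   # False: F(u) and G(u) do not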

Effect of the Form of Constraints

The utility of the necessary condition of the above theorem, i.e., F (z) ∩ G(z) = ∅, may depend on how the constraints are expressed.

Example 3.3.
Minimize (x − 1)2 + (y − 1)2
subject to (x + y − 1)3 ≤ 0
x, y ≥ 0
In this case, the necessary condition will hold good for each feasible (x, y) satisfying
x + y = 1. Now consider the same problem expressed as
Minimize (x − 1)2 + (y − 1)2
subject to x+y ≤1
x, y ≥ 0
Verify that the necessary condition is satisfied only at the point (1/2, 1/2).
Note that when ∇f (z) = 0 or ∇gi (z) = 0 for i ∈ I, the necessary condition developed
above is of no use.

Fritz John Conditions

Theorem 3.7. Consider the problem PI stated earlier. Suppose z is a feasible point
to the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable
at z and that gi s for i ∉ I are continuous at z. If z is a local optimal solution to the
problem, then there exist constants u0 and ui for i ∈ I such that
u0 ∇f (z) + Σ_{i∈I} ui ∇gi (z) = 0
u0 , ui ≥ 0, for i ∈ I
(u0 , uI ) ≠ 0

Furthermore, if gi s for i ∉ I are also differentiable at z, then there exist u0 ∈ R and


u ∈ Rm such that
u0 ∇f (z) + Σ_{i=1}^{m} ui ∇gi (z) = 0          (26)
ui gi (z) = 0, for i = 1, 2, . . . , m,          (27)
u0 , ui ≥ 0, for i = 1, 2, . . . , m,          (28)
(u0 , ut ) ≠ 0          (29)

Proof. Let k = | I | and let A be the n × (k + 1) matrix whose first column is
∇f (z) and whose remaining columns are ∇gi (z), i ∈ I. From the previous theorem, we know that

F (z) ∩ G(z) = ∅. This is equivalent to saying there exists no d satisfying ∇f (z)t d < 0
and ∇gi (z)t d < 0 for each i ∈ I. In other words, the system At d < 0 has no solution.
By Gordan’s theorem, there exists a nonzero nonnegative vector p ∈ Rk+1 satisfying
Ap = 0. Taking u0 = p1 , ui = pi+1 , the first assertion of the theorem follows. For the
second assertion, take ui = 0 for i ∉ I.

The ui s in (26) in the statement of theorem are called the Lagrangian multipliers.
The condition ui gi (z) = 0, i = 1, 2, . . . , m, is called the complementary slackness con-
dition.

Example 3.4.
Minimize (x − 3)2 + (y − 2)2
subject to x2 + y 2 ≤ 5
x + 2y ≤ 4
x, y ≥ 0
   x              I                  ∇f (x)        ∇gi1 (x)      ∇gi2 (x)
   z = (2, 1)t    i1 = 1, i2 = 2     (−2, −2)t     (4, 2)t       (1, 2)t
   w = (0, 0)t    i1 = 3, i2 = 4     (−6, −4)t     (−1, 0)t      (0, −1)t
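The table can be backed up by a short multiplier computation (a sketch; the helper kt_multipliers below is introduced here only for illustration). Normalizing u0 = 1 in the Fritz John system, nonnegative multipliers exist at z = (2, 1) but not at w = (0, 0); since ∇g3 (w) and ∇g4 (w) are linearly independent, u0 = 0 is ruled out as well, so the Fritz John conditions fail at w and w cannot be a local minimum.

import numpy as np

def kt_multipliers(grad_f, active_grads):
    # Solve grad_f + sum_i u_i * grad_g_i = 0 for u, with u0 normalized to 1.
    A = np.column_stack(active_grads)                       # columns = active gradients
    u, *_ = np.linalg.lstsq(A, -np.asarray(grad_f, dtype=float), rcond=None)
    return u

# point z = (2, 1): active constraints g1, g2
u = kt_multipliers([-2, -2], ([4, 2], [1, 2]))
print(u, np.all(u >= 0))                                    # [1/3, 2/3], True

# point w = (0, 0): active constraints g3 = -x, g4 = -y
u = kt_multipliers([-6, -4], ([-1, 0], [0, -1]))
print(u, np.all(u >= 0))                                    # [-6, -4], False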

Example 3.5.
Minimize −x
subject to y − (1 − x)3 ≤ 0
y≥0

Note that z = (1, 0)t is the optimal solution to the problem (draw the feasible region
and check this) and the Fritz John conditions hold good at this point. Here I = {1, 2},
∇f (z) = (−1, 0)t , ∇g1 (z) = (0, 1)t and ∇g2 (z) = (0, −1)t . For the Fritz John condition
we must have

u0 (−1, 0)t + u1 (0, 1)t + u2 (0, −1)t = (0, 0)t ,
which holds good only if u0 = 0.

Example 3.6.
Minimize −x
subject to x+y ≤0
y≥0

Note that the Fritz John conditions hold good at z = (0, 0)t with u0 = u1 = u2 = α for any
real α > 0.
∇f (z) = (−1, 0)t , ∇g1 (z) = (1, 1)t , ∇g2 (z) = (0, −1)t .

Kuhn-Tucker Necessary Conditions

Note that in examples 3.4 and 3.6, u0 is positive. But in example 3.5, u0 = 0. In
example 3.5, the ∇gi (z)s for i ∈ I are linearly dependent, but not in the other two
examples. Note that when u0 = 0 in Fritz John condition, the condition only talks
about the constraints. With an additional assumption, the Fritz John condition can be
improved. This is due to Kuhn and Tucker.

Theorem 3.8. Consider the problem PI stated earlier. Suppose z is a feasible point to
the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at z
and that gi s for i ∉ I are continuous at z. Also assume that ∇gi (z)s for i ∈ I are linearly
independent. If z is a local optimal solution to the problem, then there exist constants
ui for i ∈ I such that
∇f (z) + Σ_{i∈I} ui ∇gi (z) = 0
ui ≥ 0, for i ∈ I.

Furthermore, if gi s for i ∉ I are also differentiable at z, then there exist u ∈ Rm


such that
∇f (z) + Σ_{i=1}^{m} ui ∇gi (z) = 0          (30)
ui gi (z) = 0, for i = 1, 2, . . . , m,          (31)
ui ≥ 0, for i = 1, 2, . . . , m.          (32)

Proof. Get u0 and ui s as in the previous theorem. Note that u0 > 0, as ∇gi (z)s for
i ∈ I would become linearly dependent otherwise. Since u0 > 0, we can as well assume
that it is equal to one without loss of generality. The second assertion of the theorem
can be established as in the previous theorem.

Note that a geometric interpretation of the Kuhn-Tucker conditions is that if z
is a local optimum, then the gradient vector of the objective function at z with its
sign reversed is contained in the cone generated by the gradient vectors of the binding
constraints (follows from (30) above).
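This cone condition can be checked numerically by a nonnegative least-squares fit: −∇f (z) lies in the cone generated by the gradients of the binding constraints exactly when the fit leaves zero residual. The sketch below uses the data of Example 3.4 at z = (2, 1); scipy.optimize.nnls is merely a convenient tool here, not part of the notes.

import numpy as np
from scipy.optimize import nnls

grad_f = np.array([-2.0, -2.0])                  # gradient of f in Example 3.4 at z = (2, 1)
A = np.column_stack([[4.0, 2.0], [1.0, 2.0]])    # gradients of the binding constraints

u, residual = nnls(A, -grad_f)                   # best nonnegative combination of the columns
print(u, residual < 1e-9)                        # [0.333..., 0.666...], True: -grad_f is in the cone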

Kuhn-Tucker Sufficient Conditions

Theorem 3.9. Consider the problem PI stated earlier. Suppose z is a feasible point
to the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable
at z and that gi s for i ∉ I are continuous at z. Further, assume that f is pseudoconvex
at z and that gi is quasiconvex at z for each i ∈ I. If the Kuhn-Tucker
conditions hold good at z, that is, there exist ui ≥ 0 for each i ∈ I such that
∇f (z) + Σ_{i∈I} ui ∇gi (z) = 0, then z is a global optimal solution to the problem.

Proof. Let x be any feasible solution to PI. Then, for i ∈ I, gi (x) ≤ 0 = gi (z). By
quasiconvexity of gi at z,

gi (z + λ(x − z)) = gi (λx + (1 − λ)z) ≤ max{gi (x), gi (z)} = gi (z)

for all λ ∈ (0, 1). Thus gi does not increase in the direction x − z and hence we must
have ∇gi (z)t (x − z) ≤ 0. This implies Σ_{i∈I} ui ∇gi (z)t (x − z) ≤ 0. Since ∇f (z) +
Σ_{i∈I} ui ∇gi (z) = 0, it follows that ∇f (z)t (x − z) ≥ 0. Since f is pseudoconvex, f (x) ≥
f (z). It follows that z is a global minimum.

Consider the Problem (PIE):


Minimize f (x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m,
hi (x) = 0, i = 1, 2, . . . , l, x ∈ X ⊆ Rn .
Here, each of f , gi s and hi s is a function from Rn to R and X is a nonempty open set
in Rn .
So, the set of feasible solutions is given by

S = {x ∈ X : gi (x) ≤ 0, i = 1, . . . , m; hi (x) = 0, i = 1, . . . , l}.

Treating Equalities as Inequalities

Consider the problem: Minimize f (x) subject to
g(x) = 0, x ∈ X, X is a nonempty subset in Rn .
Letting g1 (x) = g(x) and g2 (x) = −g(x), the above problem can be stated as
Minimize f (x) subject to
gi (x) ≤ 0, i = 1, 2, x ∈ X.
Note that G(z) = ∅ here (since ∇g2 (z) = −∇g1 (z), no d can satisfy both ∇g1 (z)t d < 0 and
∇g2 (z)t d < 0), and hence the optimality conditions developed above are of no use.

Optimality Conditions for PIE

Theorem 3.10. Consider the problem PIE stated above. Suppose z is a local optimal
solution to the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are
differentiable at z and that gi s for i ∉ I are continuous at z. Further, assume that hi is
continuously differentiable at z for i = 1, 2, . . . , l. If ∇hi (z)s, i = 1, 2, . . . , l, are linearly
independent, then F (z) ∩ G(z) ∩ H(z) = ∅, where
F (z) = {d : ∇f (z)t d < 0}.
G(z) = {d : ∇gi (z)t d < 0, ∀i ∈ I}.
H(z) = {d : ∇hi (z)t d = 0, for i = 1, 2, . . . , l}.

Fritz John Necessary Conditions

Theorem 3.11. Consider the problem PIE. Suppose z is a feasible point to the problem.
Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at z and that
gi s for i ∉ I are continuous at z. Further, assume that hi is continuously differentiable
at z for i = 1, 2, . . . , l. If z is a local optimal solution to the problem, then there exist
constants u0 , ui for i ∈ I and vi , i = 1, 2, . . . l such that
u0 ∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^{l} vi ∇hi (z) = 0
u0 , ui ≥ 0, for i ∈ I
(u0 , uI , v) ≠ 0,

where uI is the vector of ui s corresponding to I and


v = (v1 , . . . , vl )t .

Furthermore, if gi s for i ∉ I are also differentiable at z, then there exist u0 ∈ R and
u ∈ Rm such that
u0 ∇f (z) + Σ_{i=1}^{m} ui ∇gi (z) + Σ_{i=1}^{l} vi ∇hi (z) = 0          (34)
ui gi (z) = 0, for i = 1, . . . , m,          (35)
u0 , ui ≥ 0, for i = 1, . . . , m,          (36)
(u0 , ut , v t ) ≠ 0.          (37)

Proof. If ∇hi (z), i = 1, . . . , l, are linearly dependent, then there exist vi , i = 1, . . . , l,
not all of them equal to zero, such that Σ_{i=1}^{l} vi ∇hi (z) = 0. Taking u0 and the ui s to be
zero, we see that z satisfies the necessary conditions.

Suppose ∇hi (z), i = 1, . . . , l, are linearly independent. Let A be the matrix whose first
column is ∇f (z) and whose remaining columns are ∇gi (z), i ∈ I. Let B be the matrix
whose ith column is ∇hi (z), i = 1, . . . , l. Then from the previous theorem, there is no d
which satisfies
At d < 0 and B t d = 0.

Define the sets S = {(p, q) : p = At d, q = B t d, d ∈ Rn } and T = {(p, q) : p < 0, q = 0}.


Note that S and T are disjoint convex sets. Therefore, by a separation theorem, there exists a vector (u0 , utI , v t ) ≠ 0 such that

(u0 , utI )At d + v t B t d ≥ (u0 , utI )p + v t q  ∀ d ∈ Rn , ∀ (p, q) ∈ cl(T ).

Since (p, 0) ∈ cl(T ) for p with arbitrarily large negative components, it follows that
(u0 , utI ) ≥ 0. Since (0, 0) ∈ cl(T ),
(u0 , utI )At d + v t B t d ≥ 0 for all d ∈ Rn .
This implies (u0 , utI )At + v t B t = 0. From this the theorem follows.

Remark 3.1. Note that the Lagrangian multipliers associated with hi s are unrestricted
in sign.

Exercise. Write the Fritz John’s conditions in the vector notation.

Example 3.7.

Minimize (x − 3)2 + (y − 2)2
subject to x2 + y 2 ≤ 5
x + 2y = 4
x, y ≥ 0
Analyze at z = (2, 1)t .
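For Example 3.7 at z = (2, 1), only g1 (that is, x² + y² ≤ 5) is binding among the inequalities, and the equality x + 2y = 4 holds. A sketch of the corresponding Kuhn-Tucker computation (gradients evaluated by hand, as in Example 3.4): the inequality multiplier comes out nonnegative, while the equality multiplier is unrestricted in sign.

import numpy as np

grad_f = np.array([-2.0, -2.0])       # f = (x-3)^2 + (y-2)^2 at z = (2, 1)
grad_g1 = np.array([4.0, 2.0])        # g1 = x^2 + y^2 - 5 (active at z)
grad_h = np.array([1.0, 2.0])         # h = x + 2y - 4

A = np.column_stack([grad_g1, grad_h])
u1, v = np.linalg.solve(A, -grad_f)   # grad_f + u1*grad_g1 + v*grad_h = 0
print(u1, v, u1 >= 0)                 # 1/3, 2/3, True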

Example 3.8.
Minimize −x
subject to y − (1 − x)3 = 0
y≥0

Analyze at z = (1, 0)t .

Kuhn-Tucker Necessary Conditions

Theorem 3.12. Consider the problem PIE. Suppose z is a feasible point to the problem.
Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at z and that gi s
for i ∉ I are continuous at z. Further, assume that hi is continuously differentiable at z
for i = 1, 2, . . . , l. Also assume that ∇gi (z)s for i ∈ I and ∇hi (z)s, i = 1, . . . , l, are linearly
independent.

If z is a local optimal solution to the problem, then there exist constants ui for i ∈ I
and vi , i = 1, 2, . . . l such that

∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^{l} vi ∇hi (z) = 0
ui ≥ 0, for i ∈ I,

where uI is the vector of ui s corresponding to I and


v = (v1 , . . . , vl )t .

Furthermore, if gi s for i ∉ I are also differentiable at z, then there exist u ∈ Rm and
v ∈ Rl such that
∇f (z) + Σ_{i=1}^{m} ui ∇gi (z) + Σ_{i=1}^{l} vi ∇hi (z) = 0          (38)
ui gi (z) = 0, for i = 1, . . . , m,          (39)
ui ≥ 0, for i = 1, . . . , m.          (40)

Example 3.9.
Minimize −x
subject to x+y ≤0
y≥0

Kuhn-Tucker Sufficient Conditions

Theorem 3.13. Consider the problem PIE. Suppose z is a feasible point to the problem.
Let I = {i : gi (z) = 0}. Suppose that the Kuhn-Tucker conditions hold good at z, i.e.,
there exist scalars ui ≥ 0, i ∈ I, and vi , i = 1, . . . , l, such that

∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^{l} vi ∇hi (z) = 0.

Let J = {i : vi > 0} and K = {i : vi < 0}. Assume that f is pseudoconvex at z,


gi s are quasiconvex at z for i ∈ I, hi s are quasiconvex at z for i ∈ J and that hi s are
quasiconcave at z for i ∈ K. Then z is a global optimal solution to the problem.


Proof. Let x be any feasible solution to PIE. Then, for i ∈ I, gi (x) ≤ 0 = gi (z). By
quasiconvexity of gi at z,

gi (z + λ(x − z)) = gi (λx + (1 − λ)z) ≤ max{gi (x), gi (z)} = gi (z)

for all λ ∈ (0, 1). Thus gi does not increase in the direction x − z and hence we must
have ∇gi (z)t (x − z) ≤ 0. This implies Σ_{i∈I} ui ∇gi (z)t (x − z) ≤ 0.

Similarly, using quasiconvexity of hi s for i ∈ J and quasiconcavity of hi s for i ∈ K, we


can show that
∇hi (z)t (x − z) ≤ 0 for i ∈ J and

∇hi (z)t (x − z) ≥ 0 for i ∈ K.

From the above inequalities, we can conclude

[ Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^{l} vi ∇hi (z) ]t (x − z) ≤ 0.

Since ∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^{l} vi ∇hi (z) = 0, it follows that ∇f (z)t (x − z) ≥ 0.

Since f is pseudoconvex, f (x) ≥ f (z). It follows that z is a global minimum.

4. Duality in NLP

We shall refer to PIE as the Primal Problem. We shall write this problem in the
vector notation as
Minimize f (x)
subject to
g(x) ≤ 0,
h(x) = 0, x ∈ X ⊆ Rn .
Here g(x) = (g1 (x), g2 (x), . . . , gm (x))t and
h(x) = (h1 (x), h2 (x), . . . , hl (x))t
A number of problems closely associated with primal problem, called the dual problems,
have been proposed in the literature. Lagrangian Dual problem is one of these problems
which has played a significant role in the development of algorithms for

• large-scale linear programming problems,

• convex and nonconvex nonlinear problems,

• discrete optimization problems.

The Lagrangian Dual Problem of PIE

Maximize θ(u, v)
subject to
u ≥ 0, where

θ(u, v) = inf{f (x) + Σ_{i=1}^{m} ui gi (x) + Σ_{i=1}^{l} vi hi (x) : x ∈ X}.

In the vector notation, the Lagrangian dual is written as Maximize θ(u, v)


subject to u ≥ 0,
where θ(u, v) = inf{f (x) + ut g(x) + v t h(x) : x ∈ X}.

Since the dual maximizes the infimum, the dual is sometimes called the max-min prob-
lem.

Note that

• the lagrangian dual objective function θ(u, v) incorporates constraint functions of


the primal, the objective function of the primal, and the lagrangian multipliers of
the primal encountered in the optimality conditions,

• the lagrangian multipliers associated with ‘≤’ constraints (g(x) ≤ 0), namely the
ui s are nonnegative and those associated with the ‘=’ constraints (h(x) = 0),
namely vi s are unrestricted in sign,

• the lagrangian dual objective function θ(u, v) may be −∞ for a fixed vector (u, v),
because it is the infimum of a functional expression over a set X,

• the lagrangian dual of a PIE is generally not unique, as it depends on which
constraints we treat as gi s, which as hi s, and which we absorb into the set X,

• the choice of a lagrangian dual would affect the solution process using the dual
approach to solve the primal.

Geometric Interpretation of Lagrangian Dual

Consider the problem


Minimize f (x)
subject to
g(x) ≤ 0,
x ∈ X ⊆ Rn .
Here, both f and g are functions from Rn to R. There is only one inequality constraint
and no equality constraints.
For each x ∈ X, the two-tuple (g(x), f (x)) can be plotted on the two dimensional plane.
Let G = {(z1 , z2 ) : z1 = g(x), z2 = f (x), x ∈ X}. A solution to the primal problem is
that x which corresponds to (z1 , z2 ) in G such that z1 ≤ 0 and z2 is minimum.
The lagrangian dual objective function for this problem is given by θ(u) = inf{f (x) +
ug(x) : x ∈ X}, where u is nonnegative. That is, θ(u) = inf{z2 + uz1 : (z1 , z2 ) ∈ G},
where u is nonnegative.

Note that for each u ≥ 0 fixed, the dual objective value is the intercept of the line z2 +uz1
supporting G (from below) on the z2 axis. Therefore, the dual problem is equivalent to
finding the slope of the supporting hyperplane of G such that the intercept on the z2
axis is maximal. In the accompanying figure (not reproduced here), the dual objective function attains its maximum
at the point (z̄1 , z̄2 ) where the optimal supporting line touches G, with ū as the dual optimal solution. Also, in this case, the optimum dual objective value
coincides with the optimum primal objective value.

Example 4.1.
Minimize f (x, y) = x2 + y 2
subject to g(x, y) = −x − y + 4 ≤ 0
x, y ≥ 0
Verify that (2, 2)t is the optimum solution to this problem with optimal objective value
equal to 8.
Taking X = {(x, y) : x ≥ 0, y ≥ 0}, the dual function is given by

θ(u) = inf{x2 + y 2 + u(−x − y + 4) : x ≥ 0, y ≥ 0}

= inf{x² − ux : x ≥ 0} + inf{y² − uy : y ≥ 0} + 4u
= −(1/2) u² + 4u, for u ≥ 0;
= 4u, for u < 0.

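The closed-form dual function just derived can be cross-checked numerically, and maximizing it recovers the primal optimal value 8 at ū = 4, so there is no duality gap in Example 4.1. This is a sketch; the grid of x values and the use of scipy's minimize_scalar are choices made only for this illustration.

import numpy as np
from scipy.optimize import minimize_scalar

def theta(u):
    # closed-form dual function of Example 4.1
    return -0.5 * u**2 + 4 * u if u >= 0 else 4 * u

# numerical cross-check: inf over x, y >= 0 of the Lagrangian, taken on a grid
xs = np.linspace(0, 10, 2001)
for u in (0.0, 2.0, 4.0, 6.0):
    num = min(x**2 - u * x for x in xs) + min(y**2 - u * y for y in xs) + 4 * u
    print(u, theta(u), round(float(num), 3))

# the dual maximum: theta'(u) = -u + 4 = 0 at u = 4, with theta(4) = 8
res = minimize_scalar(lambda u: -theta(u), bounds=(0, 10), method="bounded")
print(res.x, -res.fun)                      # approximately 4 and 8 (no duality gap)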
Duality Theorems and Saddle Point Optimality

Theorem 4.1 (The Weak Duality Theorem). Let x be a feasible solution to PIE
and let (u, v) with u ≥ 0 be a feasible solution to its Lagrangian dual. Then f (x) ≥ θ(u, v).

Corollary 4.1. inf{f (x) : x ∈ X, g(x) ≤ 0, h(x) = 0} ≥ sup{θ(u, v) : u ≥ 0}.

Corollary 4.2. If x is a feasible solution to PIE and (u, v) is a solution to its Lagrangian
dual such that f (x) ≤ θ(u, v), then x is optimal for PIE and (u, v) is optimal for the
dual.

Corollary 4.3. If inf{f (x) : x ∈ X, g(x) ≤ 0, h(x) = 0} = −∞, then θ(u, v) = −∞


for each u ≥ 0.

Corollary 4.4. If sup{θ(u, v) : u ≥ 0} = ∞, then PIE has no feasible solution.

Duality Gap

Remark: Note that in PIE, it was assumed that the set X was a nonempty open set.
However, for the dual formulation, the openness of X is not required. In fact, X may
even be a discrete/finite set. Check that the weak duality theorem and its corollaries hold
good for any nonempty set X. Henceforth, we shall refer to PIE without the openness
assumption of X as the primal problem.

The dual optimal objective value may be strictly less than the primal optimal objective
value. In this case we say that there is a duality gap. Analyze the following example.

Example 4.2.
Minimize f (x, y) = −2x + y
subject to h(x, y) = x + y − 3 = 0
(x, y) ∈ X = {(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)} .
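A short enumeration for Example 4.2 (a sketch; the grid of multiplier values v is an arbitrary discretization) exhibits the duality gap: the primal optimum over the feasible points of X is −3, while sup θ(v) equals −6.

import numpy as np

X = [(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)]
f = lambda x, y: -2 * x + y
h = lambda x, y: x + y - 3

primal = min(f(x, y) for (x, y) in X if h(x, y) == 0)          # -3, attained at (2, 1)

theta = lambda v: min(f(x, y) + v * h(x, y) for (x, y) in X)   # Lagrangian dual function
vs = np.linspace(-10, 10, 4001)
dual = max(theta(v) for v in vs)                               # -6, attained near v = 2

print(primal, dual)      # -3 versus -6: a duality gap of 3 (the set X is not convex)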

The Strong Duality Theorem asserts that, under some convexity assumptions and a
constraint qualification, the primal optimal objective value is equal to the dual optimal
objective value.

Theorem 4.2 (The Strong Duality Theorem). Let X be a nonempty convex set
in Rn , let f : Rn → R and g : Rn → Rm be convex, and let h : Rn → Rl be affine, that
is, h is of the form h(x) = Ax − b. Suppose that the following constraint qualification
holds true. There exists a z ∈ X such that g(z) < 0 and h(z) = 0, and 0 ∈ int(h(X)),
where h(X) = {h(x) : x ∈ X}. Then

inf{f (x) : x ∈ X, g(x) ≤ 0, h(x) = 0} = sup{θ(u, v) : u ≥ 0}

Furthermore, if the infimum is finite, then the sup{θ(u, v) : u ≥ 0} is attained at (ū, v̄)
with ū ≥ 0. If the infimum is attained at x̄, then ūt g(x̄) = 0.

Lemma: Let X be a nonempty convex set in Rn , let α : Rn → R and g : Rn → Rm be
convex, and let h : Rn → Rl be affine. Consider the two systems:
System 1. α(x) < 0, g(x) ≤ 0, h(x) = 0 for some x ∈ X
System 2. u0 α(x) + ut g(x) + v t h(x) ≥ 0 ∀x ∈ X
(u0 , u) ≥ 0, (u0 , u, v) ≠ 0
If System 1 has no solution, then System 2 has a solution. Conversely, if System 2 has a solution with u0 > 0, then System 1 has no solution.

Proof. Suppose that System 1 has no solution. Consider the set

Λ = {(p, q, r) : p > α(x), q ≥ g(x), r = h(x) for some x ∈ X}

Since α and g are convex and h is affine, Λ is convex. Since System 1 has no solution,
the vector (0, 0, 0) ∈ R1+m+l does not belong to Λ. By a separation theorem, there exists
a non-zero vector (u0 , u, v) ∈ R1+m+l such that

u0 p + ut q + v t r ≥ 0 ∀ (p, q, r) ∈ cl(Λ) (41)

Fix any x ∈ X. Note that (α(x), g(x), h(x)) ∈ cl(Λ) and (p, q, h(x)) ∈ cl(Λ) for all
(p, q) > (α(x), g(x)).

Since the components of (p, q) may be taken arbitrarily large, (41) forces (u0 , u) ≥ 0. Letting (p, q) decrease to (α(x), g(x)) in (41) shows u0 α(x) + ut g(x) + v t h(x) ≥ 0 for each x ∈ X. Hence (u0 , u, v) is a solution to System 2.

To prove the converse, assume System 2 has a solution (u0 , u, v) with u0 > 0.

Let x ∈ X be such that g(x) ≤ 0 and h(x) = 0. Since (u0 , u, v) solves System 2, we
have
u0 α(x) + ut g(x) + v t h(x) ≥ 0.

Since g(x) ≤ 0, h(x) = 0 and u ≥ 0, it follows u0 α(x) ≥ 0. Since u0 > 0, we must have
α(x) ≥ 0. It follows that System 1 has no solution.

Proof of Strong Duality Theorem.

Let µ = inf{f (x) : x ∈ X, g(x) ≤ 0, h(x) = 0}. If µ = −∞, then by a corollary


of the weak duality theorem, sup{θ(u, v) : u ≥ 0} = −∞. So, let us consider the case where µ is
finite. Consider the system:
finite. Consider the system:

f (x) − µ < 0, g(x) ≤ 0, h(x) = 0, x ∈ X.

By the definition of µ, this system has no solution. Hence, from the Lemma, there
exists a nonzero vector (u0 , u, v) ∈ R1+m+l with (u0 , u) ≥ 0 such that

u0 (f (x) − µ) + ut g(x) + v t h(x) ≥ 0 ∀ x ∈ X (42)

We first show that u0 > 0. To the contrary, assume u0 = 0. By the hypothesis of
the theorem, there is a z ∈ X satisfying g(z) < 0 and h(z) = 0. Substituting z in the above
inequality, we get ut g(z) ≥ 0. This implies, as u ≥ 0 and g(z) < 0, that u = 0. From (42),
it follows v t h(x) ≥ 0 ∀x ∈ X. Since 0 ∈ int(h(X)), there exists an x ∈ X such that
h(x) = −λv where λ is a small positive real. This implies, v t (−λv) ≥ 0 which in turn
implies v = 0. Thus, (u0 , u, v) = 0 which is a contradiction. Hence, u0 > 0. Without
loss of generality, we may assume that u0 = 1 and write

f (x) + ut g(x) + v t h(x) ≥ µ ∀x ∈ X. (43)

This implies
θ(u, v) = inf{f (x) + ut g(x) + v t h(x) : x ∈ X} ≥ µ.

From weak duality theorem, it follows that

inf{f (x) : x ∈ X, g(x) ≤ 0, h(x) = 0} = sup{θ(u, v) : u ≥ 0}

Finally, suppose x̄ is an optimal solution to the primal problem, that is, x̄ ∈ X, g(x̄) ≤
0, h(x̄) = 0 and f (x̄) = µ. Substituting x̄ in (43) (written with the multipliers (ū, v̄) obtained above), we get ūt g(x̄) ≥ 0. Since ū ≥ 0 and
g(x̄) ≤ 0, ūt g(x̄) = 0.

Remark. The constraint qualification that 0 ∈ int(h(X)) used in Strong Duality


Theorem automatically holds good if X = Rn . To see this, note that we may assume
without loss of generality that the matrix A defining h(x) is of full row rank. If y ∈
Rl , then y = h(x) where x = At (AAt )−1 (y + b), and hence h(X) = Rl . Therefore,
0 ∈ int(h(X)).
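A quick numerical illustration of this remark (the matrix A, the vector b and the target y below are random choices made only for this sketch): for any y, the point x = At (AAt )−1 (y + b) satisfies h(x) = Ax − b = y, so h maps Rn onto the whole target space.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))          # an arbitrary full-row-rank 2 x 4 matrix
b = rng.standard_normal(2)
y = rng.standard_normal(2)               # an arbitrary target value for h(x)

x = A.T @ np.linalg.inv(A @ A.T) @ (y + b)
print(np.allclose(A @ x - b, y))         # True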

Saddle Point Optimality criteria

An important consequence of strong duality theorem is the saddle point optimality
criteria. The existence of a saddle point asserts optimal solutions to both the primal
and dual problems and that the optimal objective values of the two problems are equal.
This does not require any convexity assumptions made in the strong duality theorem.
However, under the convexity assumptions one can assert the existence of a saddle point.

Saddle Point Theorem. Let X be a nonempty set in Rn , and let f : Rn → R,


g : Rn → Rm and h : Rn → Rl . Suppose there exist x̄ ∈ X and (ū, v̄) with ū ≥ 0 so
that
φ(x̄, u, v) ≤ φ(x̄, ū, v̄) ≤ φ(x, ū, v̄) (44)

for all x ∈ X, for all u ≥ 0 and for all v, where φ(x, u, v) = f (x) + ut g(x) + v t h(x).
Then, x̄ and (ū, v̄) are optimal solutions to the primal and dual problems respectively.

Conversely, suppose that X, f, g are convex and that h is affine (i.e., h(x) = Ax − b).
Further, assume that there exists a z ∈ X such that g(z) < 0 and h(z) = 0, and that
0 ∈ int(h(X)). If x̄ is optimal solution to the primal problem, then there exists (ū, v̄)
with ū ≥ 0, so that (44) holds true.

Proof. Suppose there exist x̄ ∈ X and (ū, v̄) with ū ≥ 0 such that (44) holds good.
Since
f (x̄) + ut g(x̄) + v t h(x̄) = φ(x̄, u, v) ≤ φ(x̄, ū, v̄)

for all u ≥ 0 and all v ∈ Rl , it follows (by letting the components of u and v grow arbitrarily large in magnitude) that g(x̄) ≤ 0 and h(x̄) = 0. Therefore, x̄ is a feasible
solution to the primal problem. Putting u = 0 in the above inequality, it follows that
ūt g(x̄) ≥ 0. Since ū ≥ 0 and g(x̄) ≤ 0, ūt g(x̄) = 0. From (44), for each x ∈ X, we have

f (x̄) = f (x̄) + ūt g(x̄) + v̄ t h(x̄)

= φ(x̄, ū, v̄)

≤ φ(x, ū, v̄) = f (x) + ūt g(x) + v̄ t h(x) (45)

Since (45) holds good for all x ∈ X, it follows that f (x̄) ≤ θ(ū, v̄). Since x̄ is feasible to
the primal and ū ≥ 0, from a corollary to the weak duality theorem it follows that x̄
and (ū, v̄) are optimal to the primal and the dual problems respectively.

Conversely, suppose that x̄ is an optimal solution to the primal problem. By strong


duality theorem, there exists (ū, v̄) with ū ≥ 0 such that f (x̄) = θ(ū, v̄) and ūt g(x̄) = 0.

By definition of θ, we must have

f (x̄) = θ(ū, v̄) ≤ f (x) + ūt g(x) + v̄ t h(x) ∀ x ∈ X

But since ūt g(x̄) = 0,

φ(x̄, ū, v̄) = f (x̄) + ūt g(x̄) + v̄ t h(x̄) ≤ φ(x, ū, v̄) ∀ x ∈ X.

Again, for any u ≥ 0 and any v, since g(x̄) ≤ 0, h(x̄) = 0 and ūt g(x̄) = 0,

φ(x̄, u, v) = f (x̄) + ut g(x̄) + v t h(x̄) ≤ f (x̄) = φ(x̄, ū, v̄).

Thus, φ(x̄, u, v) ≤ φ(x̄, ū, v̄) ≤ φ(x, ū, v̄).
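For Example 4.1, x̄ = (2, 2)t and ū = 4 form a saddle point of φ(x, u) = x² + y² + u(−x − y + 4) over X = {(x, y) : x, y ≥ 0} and u ≥ 0. The sketch below samples points to check both inequalities of (44); the sampling grids are arbitrary choices made only for this illustration.

import numpy as np

phi = lambda x, y, u: x**2 + y**2 + u * (-x - y + 4)
x_bar, y_bar, u_bar = 2.0, 2.0, 4.0          # primal and dual optima of Example 4.1
mid = phi(x_bar, y_bar, u_bar)               # = 8

# phi(x_bar, u) <= phi(x_bar, u_bar) for all u >= 0 (with equality here, since g(x_bar) = 0)
print(all(phi(x_bar, y_bar, u) <= mid + 1e-12 for u in np.linspace(0, 50, 501)))

# phi(x_bar, u_bar) <= phi(x, u_bar) for all x in X, sampled on a grid
grid = np.linspace(0, 6, 121)
print(all(mid <= phi(x, y, u_bar) + 1e-12 for x in grid for y in grid))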

Relationship Between Saddle Point Criteria and Kuhn-Tucker Conditions

Theorem. Let S = {x ∈ X : g(x) ≤ 0, h(x) = 0}, and consider the primal prob-
lem, minimize f (x) subject to x ∈ S. Suppose that x̄ ∈ S satisfies the Kuhn-Tucker
conditions, that is, there exist ū ≥ 0 and v̄ such that

∇f (x̄) + ∇g(x̄)ū + ∇h(x̄)v̄ = 0, ūt g(x̄) = 0 (46)

Suppose that f, gi , i ∈ I are convex at x̄, where I = {i : gi (x̄) = 0}. Further suppose
that if v̄i ≠ 0, then hi is affine. Then, (x̄, ū, v̄) is a saddle point, that is,

φ(x̄, u, v) ≤ φ(x̄, ū, v̄) ≤ φ(x, ū, v̄)

for all x ∈ X, for all u ≥ 0 and for all v, where φ(x, u, v) = f (x) + ut g(x) + v t h(x).

Conversely, suppose that (x̄, ū, v̄), with x̄ ∈ int(X) and ū ≥ 0, is a saddle point. Then
x̄ is feasible to the primal problem and furthermore, (x̄, ū, v̄) satisfies the Kuhn-Tucker
conditions given in (46).

Proof. Suppose that (x̄, ū, v̄) with x̄ ∈ S and ū ≥ 0 satisfies the Kuhn-Tucker conditions
(46). By convexity of f and gi , i ∈ I, at x̄, and since hi is affine whenever v̄i ≠ 0, we have

f (x) ≥ f (x̄) + ∇f (x̄)t (x − x̄) (47)

gi (x) ≥ gi (x̄) + ∇gi (x̄)t (x − x̄) for i ∈ I (48)

hi (x) = hi (x̄) + ∇hi (x̄)t (x − x̄) for v̄i ≠ 0          (49)

for all x ∈ X. Note that ūi = 0 for i ∉ I by complementary slackness. Multiplying (48) by ūi ≥ 0 and (49) by v̄i , adding these to (47), and using the
hypothesis (46), it follows that φ(x̄, ū, v̄) ≤ φ(x, ū, v̄) for all x ∈ X.

Since g(x̄) ≤ 0, h(x̄) = 0 and ūt g(x̄) = 0, it follows that φ(x̄, u, v) ≤ φ(x̄, ū, v̄) for
all u ≥ 0. Hence, (x̄, ū, v̄) satisfies the saddle point condition.

To prove the converse, suppose that (x̄, ū, v̄), with x̄ ∈ int(X) and ū ≥ 0, is a saddle
point. Since φ(x̄, u, v) ≤ φ(x̄, ū, v̄) for all u ≥ 0 and all v, it follows g(x̄) ≤ 0, h(x̄) = 0
and ūt g(x̄) = 0. This shows that x̄ is a feasible solution to the primal. Since φ(x̄, ū, v̄) ≤
φ(x, ū, v̄) for all x ∈ X, x̄ is a local optimal solution to the problem: minimize φ(x, ū, v̄)
subject to x ∈ X. Since x̄ ∈ int(X), ∇x φ(x̄, ū, v̄) = 0, that is, ∇f (x̄) + ∇g(x̄)ū +
∇h(x̄)v̄ = 0. It follows that (46) holds good.

Remark. We see that under certain convexity assumptions, the Lagrangian multipliers
of Kuhn-Tucker conditions also serve as the multipliers in the saddle point criteria.
Conversely, the multipliers of the saddle point criteria are the Lagrangian multipliers
of the Kuhn-Tucker conditions. Also, note that the dual variables turn out to be the
Lagrangian multipliers.

Properties of the Dual Function

For the problems with zero duality gap, one way of solving the primal problem is to
obtain the solution via the dual problem. In order to solve the dual problem one has to
understand the properties of the dual objective function. We shall derive some properties
of the dual under the assumption that the set X is a compact set. As one can always impose
bounds on the variables x, this assumption is a reasonable one to make.

For ease of notation, we shall combine the vector functions g and h into β, i.e., β(x) =
(g(x)t , h(x)t )t and combine the dual variable vectors u and v into w, i.e., w = (ut , v t )t .

The first property of the dual objective function is that it is concave over the entire
Rm+l , which in turn implies that any local optimal solution of the dual maximization
problem is a global optimal solution.

Theorem. Let X be a nonempty compact set in Rn . Let f : Rn → R, and β : Rn →
Rm+l be continuous. Then, θ defined by

θ(w) = inf{f (x) + wt β(x) : x ∈ X}

is concave over Rm+l .

Proof. Since X is compact and since f and β are continuous, θ is a real valued function
on Rm+l . For any w1 , w2 ∈ Rm+l and for any λ ∈ (0, 1), we have

θ[λw1 + (1 − λ)w2 ]
= inf{f (x) + [λw1 + (1 − λ)w2 ]t β(x) : x ∈ X}
= inf{λ[f (x) + w1t β(x)] + (1 − λ)[f (x) + w2t β(x)] : x ∈ X}
≥ λ inf{f (x) + w1t β(x) : x ∈ X}

+(1 − λ) inf{f (x) + w2t β(x) : x ∈ X}


= λθ(w1 ) + (1 − λ)θ(w2 ).

When X is compact and f and β are continuous, the infimum defined by θ(w) =
inf{f (x) + wt β(x) : x ∈ X} is attained at some x ∈ X for each w. We shall define
the set X(w) = {x ∈ X : f (x) + wt β(x) = θ(w)}. If X(w) is a singleton set, then θ is
differentiable at w.

Lemma. Let X be a nonempty compact set in Rn and let f : Rn → R, β : Rn → Rm+l


be continuous. Let w̄ ∈ Rm+l be such that X(w̄) is a singleton, say {x̄}. If {wk } is any
sequence such that wk → w̄, then any sequence {xk }, with xk ∈ X(wk ) for each k,
converges to x̄.

Proof. Suppose xk does not converge to x̄. Since X is compact, we may assume without
loss of generality that xk converges to z ∈ X where z 6= x̄. For each k, as xk ∈ X(wk )

f (xk ) + β(xk )t wk ≤ f (x̄) + β(x̄)t wk

Taking the limit as k → ∞, we get f (z) + β(z)t w̄ ≤ f (x̄) + β(x̄)t w̄. This implies
z ∈ X(w̄) = {x̄}, contradiction.

Theorem. Let X be a nonempty compact set in Rn and let f : Rn → R, β : Rn → Rm+l
be continuous. Let w̄ ∈ Rm+l be such that X(w̄) is a singleton, say X(w̄) = {x̄}. Then θ is differentiable
at w̄ with gradient ∇θ(w̄) = β(x̄).

Proof. Since f and β are continuous, and X is compact, for any w there exists xw ∈
X(w). From the definition of θ, the following inequalities hold good:

θ(w) − θ(w̄) ≤ f (x̄) + wt β(x̄) − f (x̄) − w̄t β(x̄)

= (w − w̄)t β(x̄) (50)

θ(w̄) − θ(w) ≤ f (xw ) + w̄t β(xw ) − f (xw ) − wt β(xw )

= (w̄ − w)t β(xw ) (51)

From (50) and (51) and the Schwarz inequality, it follows that

0 ≥ θ(w) − θ(w̄) − (w − w̄)t β(x̄)

≥ (w − w̄)t [β(xw ) − β(x̄)]

≥ − || w − w̄ || || β(xw ) − β(x̄) ||

This further implies that

0 ≥ [θ(w) − θ(w̄) − (w − w̄)t β(x̄)] / || w − w̄ || ≥ − || β(xw ) − β(x̄) ||          (52)

As w → w̄, by the Lemma and by continuity of β, β(xw ) → β(x̄). From (52), we get

lim_{w→w̄} [θ(w) − θ(w̄) − (w − w̄)t β(x̄)] / || w − w̄ || = 0.
Hence θ is differentiable at w̄ with gradient β(x̄).
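A numerical sketch of this theorem, using a compact variant of Example 4.1 with X = [0, 10] × [0, 10] (the compact box, the starting point and the finite-difference step are choices made only for this illustration): the derivative of θ at a chosen w̄ computed by finite differences agrees with β evaluated at the unique minimizer, where here β = g.

import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2                      # objective of Example 4.1
g = lambda x: -x[0] - x[1] + 4                       # single constraint function (beta = g)

def theta(u):
    # theta(u) = inf{ f(x) + u*g(x) : x in X }, with X = [0, 10] x [0, 10]
    res = minimize(lambda x: f(x) + u * g(x), x0=[1.0, 1.0],
                   bounds=[(0, 10), (0, 10)], method="L-BFGS-B")
    return res.fun, res.x

u_bar = 3.0
_, x_bar = theta(u_bar)                              # unique minimizer (u_bar/2, u_bar/2)
step = 1e-4
slope = (theta(u_bar + step)[0] - theta(u_bar - step)[0]) / (2 * step)
print(slope, g(x_bar))                               # both approximately 4 - u_bar = 1.0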
