NLP Notes
• Convex Hulls
Definition 1.1 (Convex Set). A nonempty set S in Rn is said to be convex if the line
segment joining any two points in the set is contained in S. Equivalently, if x, y ∈ S
and λ ∈ [0, 1], then λx + (1 − λ)y ∈ S.
Verify that the following sets are convex:
1. S = {(x, y, z) : x + 2y − z = 4} ⊂ R3
2. S = {(x, y, z) : x + 2y − z ≤ 4} ⊂ R3
3. S = {(x, y, z) : x + 2y − z ≤ 4, 2x − y + z ≤ 6}
4. S = {(x, y) : y ≥ |x|} ⊂ R2
5. S = {(x, y) : x² + y² ≤ 4} ⊂ R2
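The definition can also be probed numerically. The following Python sketch (numpy assumed; the sampling box is an arbitrary choice) randomly tests the segment condition for a membership test; it can only suggest convexity, not prove it.

    import numpy as np

    def seems_convex(in_set, dim, n_trials=10000, rng=np.random.default_rng(0)):
        # Randomly test the definition: for x, y in S and lambda in [0, 1],
        # the point lambda*x + (1 - lambda)*y should again lie in S.
        for _ in range(n_trials):
            x, y = rng.uniform(-5, 5, (2, dim))
            if in_set(x) and in_set(y):
                lam = rng.uniform()
                if not in_set(lam * x + (1 - lam) * y):
                    return False   # found a violating segment
        return True                # no counterexample found (not a proof)

    # Set 5 above: the disc x^2 + y^2 <= 4 is convex.
    print(seems_convex(lambda p: p @ p <= 4, dim=2))    # True
    # The exterior of the unit disc is not convex.
    print(seems_convex(lambda p: p @ p >= 1, dim=2))    # False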
Definition 1.2. Let x1 , x2 , . . . , xk be in Rn . Then x = Σ_{i=1}^k λi xi is called a convex
combination of the k points provided λi ≥ 0 for each i and Σ_{i=1}^k λi = 1.
Exercise 1.1. Show that a set S is convex if, and only if, for any positive integer k,
any convex combination of any k points in S is in S.
Exercise 1.2. Let S and T be convex sets in Rn . Show that
1. S ∩ T is convex,
2. S + T = {x + y : x ∈ S, y ∈ T } is convex,
3. S − T = {x − y : x ∈ S, y ∈ T } is convex.
Exercise 1.3. Is the polytope formed by (1, 1), (2, 2), (0, 0) a simplex? Draw a simplex
in R2 . Can you draw a simplex in R2 with four vertices?
are linearly dependent. This implies Σ_{i=1}^k µi (xk+1 − xi ) = 0 for some real numbers
µ1 , µ2 , . . . , µk , at least one of them different from zero. Let µk+1 = −Σ_{i=1}^k µi . Then
Σ_{i=1}^{k+1} µi = 0 and Σ_{i=1}^{k+1} µi xi = 0. Since the λi s are positive, we can choose a constant c
such that βi = λi − cµi ≥ 0 for all i and equal to zero for at least one i. Note that
x = Σ_{i=1}^{k+1} λi xi = Σ_{i=1}^{k+1} λi xi − c Σ_{i=1}^{k+1} µi xi = Σ_{i=1}^{k+1} βi xi .
From the above, it follows that x can be written as a convex combination of k points
from S. If k ≤ n + 1, the proof is complete. Otherwise we repeat the argument to show that
x can be written as a convex combination of fewer than k points from S. This argument
can be continued until x can be written as a convex combination of n + 1 points from
S. 2
Theorem 1.2. Let S be a nonempty convex set in Rn with nonempty interior. Let
x1 ∈ Cl(S) and let x2 ∈ int(S). Then λx1 + (1 − λ)x2 ∈ int(S) for all λ ∈ (0, 1).
Proof. Since x2 ∈ int(S), there exists an ε > 0 such that Nε (x2 ) ⊂ S. Fix any λ ∈ (0, 1)
and let y = λx1 + (1 − λ)x2 . We will show that y ∈ int(S) by showing that Nδ (y) ⊂ S,
where δ = (1 − λ)ε.
Let z ∈ Nδ (y). Since x1 ∈ Cl(S), there exists a z1 ∈ S such that ||z1 − x1 || < (δ − ||z − y||)/λ.
Let z2 = (z − λz1 )/(1 − λ). Then
||z2 − x2 || = ||(z − λz1 )/(1 − λ) − x2 ||
= ||[(z − λz1 ) − (y − λx1 )]/(1 − λ)||
= (1/(1 − λ)) ||(z − y) + λ(x1 − z1 )||
≤ (1/(1 − λ)) [||z − y|| + λ||x1 − z1 ||]
< (1/(1 − λ)) [||z − y|| + (δ − ||z − y||)] = δ/(1 − λ) = ε.
Hence z2 ∈ Nε (x2 ) ⊂ S. Since z = λz1 + (1 − λ)z2 with z1 , z2 ∈ S, convexity of S gives
z ∈ S. Thus Nδ (y) ⊂ S and y ∈ int(S). 2
Corollary 1.2. If S is a convex set with nonempty interior, then Cl(S) is a convex set.
Corollary 1.3. If S is a convex set with nonempty interior, then Cl(int(S)) = Cl(S).
Corollary 1.4. If S is a convex set with nonempty interior, then int(Cl(S)) = int(S).
The notion of separation and support of disjoint convex sets plays an important role in
deriving results regarding optimality conditions and also in computing optimal solutions.
Theorem 1.3. Let S be a nonempty closed convex set in Rn and let y ∈ Rn \ S. Then,
there exists a unique point x̄ ∈ S with minimum distance from y. Furthermore, x̄ is
the minimizing point if, and only if, (x − x̄)t (x̄ − y) ≥ 0 for all x ∈ S.
Proof. Let γ = inf x∈S ||x − y||. Since S is closed and y ∉ S, γ > 0. It follows that there
exists a sequence xk ∈ S such that ||xk − y|| → γ. Clearly xk is a bounded sequence and
hence must have a convergent subsequence. Without loss of generality, we may assume
that xk itself converges to a point x̄. Since S is closed, x̄ ∈ S and ||x̄ − y|| = γ.
Next, suppose x̄ ∈ S is such that (x − x̄)t (x̄ − y) ≥ 0 for all x ∈ S. Let x ∈ S. Then,
||x − y||² = ||(x − x̄) + (x̄ − y)||² = ||x − x̄||² + ||x̄ − y||² + 2(x − x̄)t (x̄ − y).
Since ||x − x̄||² ≥ 0 and (x − x̄)t (x̄ − y) ≥ 0, it follows from the above that ||x − y||² ≥
||x̄ − y||², and hence x̄ is the distance minimizing point.
Next, assume that x̄ is the distance minimizing point. Fix x ∈ S and λ ∈ (0, 1).
Since S is convex, x̄ + λ(x − x̄) ∈ S, and since x̄ is the minimizing point,
||y − x̄ − λ(x − x̄)||² ≥ ||y − x̄||².
Also
||y − x̄ − λ(x − x̄)||² = ||y − x̄||² + λ²||x − x̄||² + 2λ(x̄ − y)t (x − x̄),
so that
λ²||x − x̄||² + 2λ(x̄ − y)t (x − x̄) = ||y − x̄ − λ(x − x̄)||² − ||y − x̄||² ≥ 0.
Dividing both sides by λ and taking the limit as λ → 0, we get (x − x̄)t (x̄ − y) ≥ 0 and this
completes the proof. 2
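As a small numerical illustration of Theorem 1.3, take S to be the box [0, 1]³ (a closed convex set), for which the closest point to y is obtained by clipping each coordinate. The sketch below (numpy assumed; the point y is an arbitrary choice) checks the characterizing inequality (x − x̄)t (x̄ − y) ≥ 0 on random points of S.

    import numpy as np

    rng = np.random.default_rng(1)
    y = np.array([2.0, -0.5, 0.3])   # a point outside the box S = [0, 1]^3
    xbar = np.clip(y, 0.0, 1.0)      # clipping gives the closest point of S to y

    # Theorem 1.3: xbar is the minimizer iff (x - xbar)^t (xbar - y) >= 0 for all x in S.
    worst = min((rng.uniform(0, 1, 3) - xbar) @ (xbar - y) for _ in range(10000))
    print(xbar, worst >= -1e-12)     # all sampled values are (numerically) nonnegative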
A hyperplane H = {x : pt x = α}, where p is a nonzero vector in Rn and α is a real number,
determines the two closed halfspaces
H + = {x : pt x ≥ α} and H − = {x : pt x ≤ α}.
Theorem 1.4. Let S be a nonempty closed convex set in Rn and let y ∈ Rn \ S.
Then there exist a nonzero vector p and a real number α such that for all x ∈ S,
pt y > α ≥ pt x.
Corollary 1.5. Every closed convex set in Rn is the intersection of all halfspaces
containing it.
Corollary 1.6. Let S be a nonempty set in Rn and let y ∉ Cl(H(S)). Then, there
exists a hyperplane that strongly separates S and y.
Proof. Exercise.
Theorem (Farkas' Theorem). Let A be an m × n real matrix and let c ∈ Rn . Then exactly
one of the following two systems has a solution:
System 1: Ax ≤ 0 and ct x > 0 for some x ∈ Rn ;
System 2: At y = c and y ≥ 0 for some y ∈ Rm .
Proof. If both systems have solutions, say, x and y, then 0 < ct x = y t Ax ≤ 0. The
last inequality follows from Ax ≤ 0 and y ≥ 0. From this contradiction, it follows that
both systems cannot have solutions simultaneously.
Next, suppose that System 2 has no solution. Let S = {x : x = At y for some y ≥ 0}. Then S
is a closed convex set and c ∉ S. From Theorem 1.4, there
exists a nonzero x and a number α such that xt u ≤ α for all u ∈ S and xt c > α. Since
0 ∈ S, α ≥ 0. It follows that xt At y ≤ 0 for all y ≥ 0, and hence xt At ≤ 0. Thus, we
have x satisfying Ax ≤ 0 and ct x > 0. 2
Corollary 1.8. Let A be an m × n real matrix and let c ∈ Rn . Then, exactly one of
the following systems has a solution:
Let S be a nonempty set in Rn and let x̄ ∈ ∂S. A hyperplane H = {x : pt (x − x̄) = 0}, with p
nonzero, is said to support S at x̄ if either S ⊆ H + or S ⊆ H − . If, in addition, S ⊈ H, then
H is called a proper supporting hyperplane of S at x̄.
See Figure-6.
It will be shown that convex sets have supporting hyperplanes at each of the boundary
points.
Theorem 1.6. Let S be a nonempty convex set in Rn , and let x̄ ∈ ∂S. Then, there
exists a supporting hyperplane H = {x : pt (x − x̄) = 0} that supports S at x̄.
Proof. Since x̄ ∈ ∂S, there is a sequence yk ∉ Cl(S) such that yk → x̄. Since Cl(S) is a
closed convex set, for each k, there exists a nonzero pk such that pkt x < pkt yk for all x ∈ S.
We may assume, without loss of generality, that ||pk || = 1 for each k and that pk is a
convergent sequence. Let p be the limit of pk . Note that p ≠ 0 as ||p|| = 1. Taking
limits in the above inequality, we get pt x ≤ pt x̄. Therefore, H = {x : pt (x − x̄) = 0}
supports S at x̄. 2
Exercises
E1.6. Let S be a nonempty convex set in Rn . If x̄ ∉ int(S), then show that there exists
a nonzero p such that pt (x − x̄) ≤ 0 for all x ∈ S.
E1.7. Let S be a nonempty set in Rn . If y ∉ Cl(H(S)), then show that there exists a
hyperplane that separates y and S.
E1.8. Let S be a nonempty set in Rn , and let x̄ ∈ ∂S ∩ ∂H(S). Show that there exists
a hyperplane that supports S at x̄.
Separation of Two Convex Sets
Theorem 1.7. Let S and T be two nonempty disjoint convex sets in Rn . Then there
exists a hyperplane that separates S and T , that is, there exists a nonzero p such that
pt x ≤ pt y for all x ∈ S and for all y ∈ T .
Proof. Let U = S − T . Then U is a convex set and 0 ∉ U as S and T are disjoint.
In particular, 0 ∉ int(U ). Therefore, there exists a nonzero p such that pt u ≤ 0 for all
u ∈ U . That is, pt (x − y) ≤ 0 for all x ∈ S and for all y ∈ T . Hence the theorem. 2
Exercises
E1.9. How will you define proper, strict and strong separation of sets?
E1.10. Let S and T be two nonempty convex sets in Rn with int(T ) ≠ ∅ and S ∩
int(T ) = ∅. Show that there exists a hyperplane that separates S and T .
E1.12. Let S and T be two nonempty sets in Rn with their convex hulls having
nonempty interiors. Assume that H(S) ∩ int(H(T )) = ∅. Show that there exists a
hyperplane that separates S and T .
Strong Separation of Convex Sets
Theorem 1.8. Let S and T be two nonempty disjoint closed convex sets in Rn . If S is
bounded, then there exists a hyperplane that strongly separates S and T , that is, there
exist a nonzero p and an ε > 0 such that
pt x ≥ ε + pt y for all x ∈ S and for all y ∈ T .
Proof. Let U = S − T . Then U is a convex set and 0 ∉ U as S and T are disjoint. Note
that U is a closed set. To see this, let uk ∈ U be a sequence converging to u. Then,
for each k, there exist xk ∈ S and y k ∈ T such that uk = xk − y k . Since S is compact,
we may assume, without loss of generality, that xk converges to some x ∈ S (as S is
closed). This implies that y k = xk − uk → x − u. Since T is closed, y = x − u ∈ T . Hence
u = x − y and u ∈ U . From Theorem 1.4, there exist a nonzero p and a number ε such that
pt u ≥ ε for all u ∈ U and pt 0 < ε. That is, pt (x − y) ≥ ε for all x ∈ S and for all y ∈ T . Hence
pt x ≥ ε + pt y for all x ∈ S and for all y ∈ T . 2
Exercise 1.13. Prove or disprove: If S and T are two nonempty disjoint closed convex
sets, then there exists a hyperplane that separates S and T .
Convex Cones, Polarity and Polyhedral Sets
Definition. A nonempty set C in Rn is said to be a cone if x ∈ C
implies λx ∈ C for all λ ≥ 0. If, in addition, C is convex, then C is called a convex
cone.
Draw the graphs of the following sets and check which of these are cones: (i) S =
{λ(1, 3) : λ ≥ 0}, (ii) S = {λ(1, 3) + β(2, 1) : λ, β ≥ 0}.
Lemma 1.2. Let S and T be nonempty sets in Rn . The following statements hold
good:
3. S ⊆ T implies T ∗ ⊆ S ∗ .
Definition 1.10. Let A be an m × n real matrix and let b ∈ Rm . The set S = {x :
Ax = b, x ≥ 0} is called a polyhedral set.
Exercise 1.15. Show that S has a finite number of extreme points and extreme directions.
Exercises
E1.17. Let S be a compact set in Rn . Show that H(S) is closed. Is this result true if
S is only closed and not bounded?
E1.18. Show that the system Ax ≤ 0 and ct x > 0 has a solution x ∈ Rn , where A is the 2 × 3
matrix with rows (1, −1, −1) and (2, 2, 0), and c = (1, 0, 5)t .
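The claim in E1.18 can be checked numerically (this is only a sanity check, not the proof asked for). The sketch below, assuming scipy is available, maximizes ct x over {Ax ≤ 0} intersected with an arbitrary box; a strictly positive optimal value exhibits a solution of the system.

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1, -1, -1], [2, 2, 0]])
    c = np.array([1, 0, 5])

    # maximize c^t x  <=>  minimize -c^t x, subject to A x <= 0 and a bounding box
    res = linprog(-c, A_ub=A, b_ub=np.zeros(2), bounds=[(-1, 1)] * 3)
    print(res.x, c @ res.x)   # c^t x > 0 at the returned x, so the system has a solution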
E1.19. Let A be a p × n matrix and B be a q × n matrix. Show that exactly one of the
following systems has a solution:
E1.20. Let S and T be two nonempty convex sets in Rn . Show that there exists a
hyperplane that separates S and T if, and only if, inf{||x−y|| : x ∈ S, y ∈ T } > 0.
E1.21. Let S and T be two nonempty disjoint convex sets in Rn . Show that there exist
nonzero vectors p and q such that
E1.22. Let C and D be convex cones in Rn . Show that C + D is a convex cone and
that C + D = H(C ∪ D).
Convex Functions
Definition 2.1. Let S be a nonempty convex set in Rn and let f : S → R. The function f is
said to be convex on S if
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) for all x, y ∈ S and all λ ∈ (0, 1).
The following functions are convex on R:
1. f (x) = 3x + 4
2. f (x) = |x|
3. f (x) = x² − 2x
The function f (x) = −√x is a convex function on R+ .
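Convexity of this function on R+ can be probed numerically by sampling the defining inequality, as in the short Python check below (numpy assumed; the sampling range is an arbitrary choice).

    import numpy as np

    f = lambda x: -np.sqrt(x)
    rng = np.random.default_rng(2)
    x, y, lam = rng.uniform(0, 10, 10000), rng.uniform(0, 10, 10000), rng.uniform(0, 1, 10000)

    # f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) should hold for every sampled triple
    gap = lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)
    print(gap.min() >= -1e-12)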
The level sets fα = {x ∈ S : f (x) ≤ α} of a convex function f are convex.
Proof. Let x, y ∈ fα and let λ ∈ (0, 1). Then, f (x) ≤ α and f (y) ≤ α. By convexity of
f,
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) ≤ λα + (1 − λ)α = α,
and hence λx + (1 − λ)y ∈ fα . 2
Exercise 2.1. Show that f : S → R is a convex function if, and only if, for all λi ∈ (0, 1)
with Σi λi = 1 and for all xi ∈ S, f (Σi λi xi ) ≤ Σi λi f (xi ).
Theorem 2.1. Let S be a convex set in Rn with nonempty interior and let f : S → R
be a convex function. Then f is continuous on the interior of S.
Proof. Let x̄ ∈ int(S). Fix ε > 0. Since x̄ ∈ int(S), there exists a µ > 0 such that ||x − x̄|| < µ
implies x ∈ S. Let θ = 1 + maxi {max{f (x̄ + µei ) − f (x̄), f (x̄ − µei ) − f (x̄)}}, where ei
is the ith column of the identity matrix. Since f is convex and x̄ = ((x̄ + µei ) + (x̄ − µei ))/2,
either f (x̄ + µei ) − f (x̄) ≥ 0 or f (x̄ − µei ) − f (x̄) ≥ 0. Therefore, θ > 0. Let δ = min{µ/n, εµ/(nθ)}.
Now, fix any x such that ||x − x̄|| < δ. We will show that |f (x) − f (x̄)| < ε.
Let αi = |xi − x̄i |/µ. Then µ(Σi αi²)^{1/2} = ||x − x̄|| < µ/n, which in turn implies αi ≤ 1/n for
each i. Thus, for each i, 0 ≤ nαi ≤ 1. Let z^i be the vector with its ith coordinate as
µ(xi − x̄i )/|xi − x̄i |, and all other coordinates as zero. Then, x = x̄ + Σi αi z^i .

f (x) = f (x̄ + Σi αi z^i )
= f ((1/n) Σi [x̄ + nαi z^i ])
≤ (1/n) Σi f (x̄ + nαi z^i )
= (1/n) Σi f ((1 − nαi )x̄ + nαi (x̄ + z^i ))
≤ (1/n) Σi [(1 − nαi )f (x̄) + nαi f (x̄ + z^i )]
= f (x̄) + (1/n) Σi nαi [f (x̄ + z^i ) − f (x̄)] (1)

Note that αi < δ/µ for each i, and hence Σi αi < nδ/µ ≤ ε/θ. Therefore, rewriting (1), we get

f (x) − f (x̄) ≤ (1/n) Σi nαi θ ≤ ε (2)
Next, let y = 2x̄ − x. Then, ||y − x̄|| < δ and hence, by the same argument,
f (y) − f (x̄) ≤ ε. (3)
Since x̄ = (1/2)x + (1/2)y, we have
f (x̄) ≤ (1/2)f (x) + (1/2)f (y). (4)
Combining (3) and (4), we get f (x̄) − f (x) ≤ ε. This completes the proof of the theorem. 2
Directional Derivative of Convex Functions
If the function f is convex and is defined globally (that is, S = Rn ), then the directional
derivative exists at all x ∈ Rn . However, when S is not the whole of Rn , the directional
derivative may not exist on ∂S.
Let d be a direction and let 0 < λ1 < λ2 . Then
f (x̄ + λ1 d) = f [(λ1 /λ2 )(x̄ + λ2 d) + (1 − λ1 /λ2 )x̄]
≤ (λ1 /λ2 ) f (x̄ + λ2 d) + (1 − λ1 /λ2 ) f (x̄) (by convexity of f ),
so that [f (x̄ + λ1 d) − f (x̄)]/λ1 ≤ [f (x̄ + λ2 d) − f (x̄)]/λ2 .
Let g(λ) = [f (x̄ + λd) − f (x̄)]/λ. Then g is a nondecreasing function of λ over R+ .
Moreover, for λ > 0,
f (x̄) = f [(λ/(1 + λ))(x̄ − d) + (1/(1 + λ))(x̄ + λd)]
≤ (λ/(1 + λ)) f (x̄ − d) + (1/(1 + λ)) f (x̄ + λd), (6)
so that
g(λ) = [f (x̄ + λd) − f (x̄)]/λ ≥ f (x̄) − f (x̄ − d).
Thus, g(λ) is bounded below and hence limλ→0+ g(λ) exists. 2
Subgradients of Convex Functions
Definition 2.3. Let f : S → R. The set {(x, f (x)) : x ∈ S} ⊆ Rn+1 is called the graph
of the function. Furthermore, the sets {(x, y) : x ∈ S and y ≥ f (x)} and {(x, y) : x ∈ S
and y ≤ f (x)} are called the epigraph and hypograph of f respectively.
We shall denote the epigraph of a function by epi(f ) and its hypograph by hyp(f ).
Theorem 2.2. Let S be a nonempty convex set in Rn and let f : S → R. Then f is convex if, and
only if, epi(f ) is convex.
Proof. Assume f is convex. Let (x, y), (u, v) ∈ epi(f ) where x, u ∈ S. Let λ ∈ (0, 1).
Then, we have
f (x) ≤ y and f (u) ≤ v.
Since f is convex,
f (λx + (1 − λ)u) ≤ λf (x) + (1 − λ)f (u) ≤ λy + (1 − λ)v,
and hence λ(x, y) + (1 − λ)(u, v) = (λx + (1 − λ)u, λy + (1 − λ)v) ∈ epi(f ). Thus epi(f ) is convex.
Conversely, assume that epi(f ) is convex. Let x, u ∈ S and let λ ∈ (0, 1). Let y =
f (x) and let v = f (u). Then, (x, y) and (u, v) are in epi(f ). Since epi(f ) is convex,
λ(x, y) + (1 − λ)(u, v) ∈ epi(f ). Hence
f (λx + (1 − λ)u) ≤ λy + (1 − λ)v = λf (x) + (1 − λ)f (u),
and f is convex. 2
Definition 2.4. Let S be a nonempty convex set in Rn and let f : S → R be convex.
Then a vector ξ ∈ Rn is called a subgradient of f at a point x̄ ∈ S if
f (x) ≥ f (x̄) + ξ t (x − x̄) for all x ∈ S.
Definition 2.5. Let S be a nonempty convex set in Rn and let f : S → R. Say that
f is concave on S if −f is convex. If f is a concave function, then a vector ξ ∈ Rn is
called a subgradient of f at a point x̄ ∈ S if f (x) ≤ f (x̄) + ξ t (x − x̄) for all x ∈ S.
Exercise 2.2. Analyze the convexity of the function h(x) = min{f (x), g(x)} where
f (x) = 4 − |x| and g(x) = 4 − (x − 2)2 , x ∈ R.
Proof. Since f is convex, epi(f ) is convex by Theorem 2.2. Note that the point (x̄, f (x̄))
is a point on the boundary of epi(f ). By Theorem 1.6, there exists a hyperplane that
supports epi(f ) at (x̄, f (x̄)). That is, there will exist a nonzero (p, q) with p ∈ Rn and
q ∈ R such that
Definition 2.6. Let S be a nonempty convex set in Rn and let f : S → R. Say that f
is strictly convex on S if
f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y) for all x, y ∈ S, x ≠ y, and all λ ∈ (0, 1)
Corollary 2.1. Let S be a nonempty convex set of Rn and let f : S → R be a convex
function. Then for every x̄ ∈ int(S), there exists a ξ such that
f (x) ≥ f (x̄) + ξ t (x − x̄) for all x ∈ S.
If, in addition, f is strictly convex, then the inequality above is strict for all x ≠ x̄.
Proof. The first assertion follows from the preceding theorem. For the second, assume, if
possible, that equality holds for some u ∈ S, u ≠ x̄, that is, f (u) = f (x̄) + ξ t (u − x̄). Let
λ ∈ (0, 1). By strict convexity of f , we have
f (x̄ + λ(u − x̄)) < (1 − λ)f (x̄) + λf (u) = f (x̄) + λξ t (u − x̄),
which contradicts the subgradient inequality f (x̄ + λ(u − x̄)) ≥ f (x̄) + λξ t (u − x̄). 2
Proof. Let x, y ∈ int(S) and let λ ∈ (0, 1). Note that x̄ = λx + (1 − λ)y is in int(S).
From the hypothesis, there exists a ξ such that
and
f (y) ≥ f (x̄) − λξ t (x − y) (13)
Multiplying (12) by λ both sides and (13) by (1 − λ) both sides and adding the resulting
inequalities, we get
λf (x) + (1 − λ)f (y) ≥ f (x̄) = f (λx + (1 − λ)y),
and hence f is convex. 2
Is f a convex function? Does f have subgradient vectors at all interior points? If so,
what are they?
Definition 2.7. Let S be a set in Rn with nonempty interior and let f : S → R. Let
x̄ ∈ int(S). Say that f is differentiable at x̄ if there exists a vector ∇f (x̄), called the
gradient vector of f at x̄, and there exists a function α : Rn → R, such that
f (x) = f (x̄) + ∇f (x̄)t (x − x̄) + ||x − x̄|| α(x̄, x − x̄) for all x ∈ S,
where α(x̄, x − x̄) → 0 as x → x̄.
Remark 2.1. When f is differentiable at x̄, then the gradient vector is unique and is
given by
∇f (x̄) = (∂f (x̄)/∂x1 , ∂f (x̄)/∂x2 , . . . , ∂f (x̄)/∂xn )t .
Subtracting (15) from (14), we get
(ξ − ∇f (x̄))t (x − x̄) − ||x − x̄|| α(x̄, x − x̄) ≤ 0.
Letting x = x̄ + δ(ξ − ∇f (x̄)) in the above inequality (for sufficiently small δ > 0), we
get
δ(ξ − ∇f (x̄))t (ξ − ∇f (x̄)) − δ||ξ − ∇f (x̄)|| α(x̄, δ(ξ − ∇f (x̄))) ≤ 0 (17)
Dividing (17) by δ and taking the limit as δ → 0, we get
(ξ − ∇f (x̄))t (ξ − ∇f (x̄)) ≤ 0,
which implies ξ = ∇f (x̄).
Proof. Exercise.
Remark 2.3. Consider the system of nonlinear constraints defined by the set
X = {x : gi (x) ≤ 0, i = 1, 2, . . . , m}
Note that X ⊆ Y : x ∈ X implies gi (x) ≤ 0, and by the above theorem the linearized
constraints defining Y are also satisfied, which implies x ∈ Y . In other words, we first
solve a linear program over the bigger set Y , which is a polyhedral approximation of X,
and then try to push this bigger set towards X in successive iterations. Here Y is called
a relaxation of X.
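The relaxation idea in the remark can be turned into a simple cutting-plane scheme: solve a linear program over the current polyhedral relaxation, and if its solution violates a convex constraint, add the linearization of that constraint as a new cut. The sketch below (scipy assumed; the objective, the single constraint g(x) = ||x||² − 1 ≤ 0 and the bounding box are illustrative choices) follows this outline.

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-1.0, -1.0])        # minimize c^t x
    g = lambda x: x @ x - 1.0         # convex constraint g(x) <= 0 (unit disc)
    dg = lambda x: 2.0 * x            # its gradient
    bounds = [(-2, 2), (-2, 2)]

    A_cuts, b_cuts, x = [], [], None
    for _ in range(30):
        res = linprog(c, A_ub=np.array(A_cuts) if A_cuts else None,
                      b_ub=np.array(b_cuts) if b_cuts else None, bounds=bounds)
        x = res.x
        if g(x) <= 1e-6:              # feasible for the true constraint: stop
            break
        # add the cut  g(x) + dg(x)^t (y - x) <= 0,  i.e.  dg(x)^t y <= dg(x)^t x - g(x)
        A_cuts.append(dg(x))
        b_cuts.append(dg(x) @ x - g(x))
    print(x, c @ x)                   # approaches (1/sqrt 2, 1/sqrt 2) with value about -1.414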
Similarly, f is strictly convex if, and only if, for every x, y ∈ S, x ≠ y, we have
f (y) > f (x) + ∇f (x)t (y − x).
Proof. Fix x, y ∈ S and let x̄ = λx + (1 − λ)y, where λ ∈ (0, 1). From Theorem 2.6, f
is convex if, and only if, f (v) ≥ f (u) + ∇f (u)t (v − u) for all u, v ∈ S.
Suppose f is convex. Then
f (x) ≥ f (y) + ∇f (y)t (x − y)
and
f (y) ≥ f (x) + ∇f (x)t (y − x).
Adding the two inequalities gives ∇f (y)t (x − y) + ∇f (x)t (y − x) ≤ 0, which is the same as
[∇f (y) − ∇f (x)]t (y − x) ≥ 0.
Conversely, suppose [∇f (v) − ∇f (u)]t (v − u) ≥ 0 for all u, v ∈ S. Fix x, y ∈ S. By the Mean
Value Theorem,
f (y) = f (x) + ∇f (x̄)t (y − x), (18)
where x̄ = λx + (1 − λ)y for some λ ∈ (0, 1). Applying the monotonicity assumption to x and x̄,
and noting that x̄ − x = (1 − λ)(y − x), we get (1 − λ)[∇f (x̄) − ∇f (x)]t (y − x) ≥ 0.
Dividing by (1 − λ) and substituting for ∇f (x̄)t (y − x) from (18), we get f (y) ≥ f (x) +
∇f (x)t (y − x). By an earlier theorem, it follows that f is convex. 2
Exercise 2.4. Prove the above theorem for strict convexity part.
f (x) = f (x̄) + ∇f (x̄)t (x − x̄) + (1/2)(x − x̄)t H(x̄)(x − x̄) + ||x − x̄||² α(x̄, x − x̄)
Remark 2.4. When the function f is twice differentiable, the Hessian matrix is given
by
H(x̄) = [∂²f (x̄)/∂xi ∂xj ], the n × n matrix whose (i, j)th entry is the second partial
derivative ∂²f (x̄)/∂xi ∂xj .
Exercise 2.6. Find the gradient and Hessian matrix of f (x) = ct x + xt Ax where A is
an n × n matrix and c, x ∈ Rn .
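For the function in Exercise 2.6 one expects ∇f (x) = c + (A + At )x and H(x) = A + At (a constant matrix). A quick finite-difference check of the gradient formula, on randomly generated data, is sketched below (numpy assumed).

    import numpy as np

    rng = np.random.default_rng(3)
    n = 4
    A, c = rng.normal(size=(n, n)), rng.normal(size=n)   # A need not be symmetric
    f = lambda x: c @ x + x @ A @ x

    x, eps = rng.normal(size=n), 1e-6
    num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])

    # gradient = c + (A + A^t) x; the Hessian is the constant matrix A + A^t
    print(np.allclose(num_grad, c + (A + A.T) @ x, atol=1e-5))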
Theorem 2.8. Let S be a nonempty open convex set of Rn and let f : S → R be
twice differentiable on S. Then f is convex if, and only if, the Hessian matrix of f is
positive semidefinite at each point in S.
Proof. Suppose f is convex. Let x̄ ∈ S and let x ∈ Rn . Since S is open, x̄ + λx ∈ S for all
λ > 0 sufficiently small, and by twice differentiability
f (x̄ + λx) = f (x̄) + λ∇f (x̄)t x + (1/2)λ² xt H(x̄)x + λ²||x||² α(x̄, λx) (20)
Since f is convex, f (x̄ + λx) ≥ f (x̄) + λ∇f (x̄)t x, and hence
(1/2)λ² xt H(x̄)x + λ²||x||² α(x̄, λx) ≥ 0
Dividing by λ² and taking the limit as λ → 0, we get xt H(x̄)x ≥ 0. Hence H(x̄) is positive
semidefinite.
Next, assume that H(x) is positive semidefinite for all x ∈ S. Fix any x, y ∈ S. By
mean value theorem, we have
f (x) = f (y) + ∇f (y)t (x − y) + (1/2)(x − y)t H(x̄)(x − y),
where x̄ = λx + (1 − λ)y for some λ ∈ (0, 1). Since x̄ ∈ S, (x − y)t H(x̄)(x − y) ≥ 0 and
hence f (x) ≥ f (y) + ∇f (y)t (x − y) and by Theorem 2.6 it follows that f is convex. 2
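Theorem 2.8 suggests a practical test: compute the Hessian and check that its eigenvalues are nonnegative. The sketch below (numpy assumed; the function f (x, y) = x⁴ + (x − y)² is an illustrative choice) samples points and checks positive semidefiniteness numerically.

    import numpy as np

    # f(x, y) = x^4 + (x - y)^2 has Hessian [[12 x^2 + 2, -2], [-2, 2]],
    # which is positive semidefinite everywhere, so f is convex on R^2.
    def hessian(x, y):
        return np.array([[12 * x**2 + 2.0, -2.0], [-2.0, 2.0]])

    pts = np.random.default_rng(4).uniform(-5, 5, size=(1000, 2))
    print(all(np.linalg.eigvalsh(hessian(px, py)).min() >= -1e-10 for px, py in pts))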
Theorem. Let f : Rn → R. For x ∈ Rn and a nonzero d ∈ Rn , define g(x,d) (λ) = f (x + λd),
λ ∈ R. Then the following hold good:
1. f is convex if, and only if, g(x,d) is convex for all x ∈ Rn and for all nonzero
d ∈ Rn .
2. f is strictly convex if, and only if, g(x,d) is strictly convex for all x ∈ Rn and for
all nonzero d ∈ Rn .
Minima and Maxima of Convex Functions
Exercise 2.9. Give an example to distinguish strict and strong local optimal solutions.
Show that every strong local optimal solution is a strict local optimal solution. Is the converse
true?
Proof.
1. To the contrary, assume that f (y) < f (x̄) for some y ∈ S. For λ ∈ (0, 1), λy + (1 −
λ)x̄ ∈ S, and by convexity of f ,
f (λy + (1 − λ)x̄) ≤ λf (y) + (1 − λ)f (x̄) < f (x̄).
For λ sufficiently close to 0, λy + (1 − λ)x̄ can be made arbitrarily close to x̄, which
contradicts the local optimality of x̄.
Proof. Suppose f has a subgradient ξ at x̄ such that ξ t (x − x̄) ≥ 0 for all x ∈ S. Then, by the
subgradient inequality,
f (x) ≥ f (x̄) + ξ t (x − x̄) for all x ∈ S.
Since ξ t (x − x̄) ≥ 0 for all x ∈ S, we have f (x) ≥ f (x̄) for all x ∈ S and hence x̄ is a
solution to the problem.
V = {(x − x̄, y) : x ∈ S, y ≤ 0}
and
β t (x − x̄) + µy ≥ α for all x ∈ S and y ≤ 0. (22)
If µ > 0, then (22) will be violated for large negative y. Hence µ ≤ 0. Letting x = x̄
and y = > 0 in (21), we get µ ≤ α. This implies, α ≥ 0. Taking x = x̄ and y = 0 in
(22), we get α ≤ 0. Hence α = 0.
Taking y = 0 in the above inequality, we get
Since x̄ is optimal f (x) − f (x̄) ≥ 0 for all x ∈ S. Dividing both sides of (21) by −µ and
rearranging we get
Proof. By the theorem, x̄ is optimal if, and only if, ξ t (x − x̄) ≥ 0 for all x ∈ S. Since S is open,
we can take x = x̄ − λξ with λ > 0 small enough that x ∈ S. Then ξ t (x − x̄) = −λ||ξ||² ≥ 0,
which is possible if, and only if, ξ = 0. 2
Proof. Exercise.
Consider the problem: Minimize f (x) subject to x ∈ S. We shall assume that both S and
f are convex. To find a solution to this problem, we start with an x̄ ∈ S. If x̄ is not
an optimal solution to the problem, then there must be an x ∈ S such that f decreases
in the direction x − x̄ from x̄ (otherwise, x̄ would be a local optimal solution and, by
convexity, also a global optimal solution). Note that in this case, x − x̄ is a feasible
direction of S at x̄ (recall Definition 2.2 of a feasible direction). To obtain such a
direction, we solve the following optimization problem:
Suppose f is a convex function over the entire Rn and we are interested in the problem:
Minimize f (x) subject to x ∈ S, where S is an arbitrary set, not necessarily convex.
Again, let us start our search for an optimal solution from a point x̄ ∈ S. If x̄ is an
optimal solution to the problem, then any y ∈ Rn with f (y) < f (x̄) must not be in S.
Let y ∈ Rn be such that f (y) < f (x̄). Then, if x̄ is optimal to the problem, we must
have
f (x̄) > f (y) ≥ f (x̄) + ∇f (x̄)t (y − x̄)
which implies that ∇f (x̄)t (y − x̄) < 0. In other words, the hyperplane
H = {u : ∇f (x̄)t (u − x̄) = 0} separates the set of all y's that are better than x̄
(that is, f (y) < f (x̄)) from S. Thus, if x̄ is an optimal solution, then we must have
∇f (x̄)t (x − x̄) ≥ 0 for all x ∈ S. Therefore, the problem reduces to
Minimize ∇f (x̄)t (x − x̄) subject to x ∈ S. (24)
Note that (24) has a linear objective function, and if S is a polyhedral set, then the
problem reduces to a Linear Programming problem.
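This direction-finding idea is the core of the conditional gradient (Frank-Wolfe) method: minimize the linearized objective over S by an LP, then step towards the LP solution. A minimal sketch, on an assumed toy problem (projecting a point p onto the probability simplex, scipy available), is given below.

    import numpy as np
    from scipy.optimize import linprog

    p = np.array([0.6, 0.1, 0.8])
    grad = lambda x: 2 * (x - p)            # gradient of f(x) = ||x - p||^2

    x = np.array([1.0, 0.0, 0.0])           # a starting point in the simplex S
    A_eq, b_eq = np.ones((1, 3)), np.array([1.0])
    for k in range(1, 200):
        res = linprog(grad(x), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
        s = res.x                           # vertex of S minimizing the linearized objective
        if grad(x) @ (s - x) >= -1e-9:      # no descent direction: x is (near) optimal
            break
        x = x + (2.0 / (k + 2)) * (s - x)   # standard diminishing step size
    print(x)                                # approaches (0.4, 0, 0.6)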
Proof. Let U be the set of all optimal solutions to the problem. Note that U ≠ ∅ as
x̄ ∈ U . Consider any y ∈ V . By the convexity of f and the definition of V , we have
y ∈ S and
To prove the converse, let y ∈ U . Then, y ∈ S and f (y) = f (x̄). This means that
f (y) ≥ f (x̄)+∇f (x̄)t (y − x̄) or that ∇f (x̄)t (y − x̄) ≤ 0. Since x̄ is optimal, we must have
∇f (x̄)t (y − x̄) ≥ 0 and hence ∇f (x̄)t (y − x̄) = 0. By interchanging the roles of y and
x̄, we must have ∇f (y)t (y − x̄) = 0. Therefore,
Note that
Quasiconvex Functions
Definition. Let S be a nonempty convex set in Rn and let f : S → R. The function f is said
to be quasiconvex on S if f (λx + (1 − λ)y) ≤ max{f (x), f (y)} for all x, y ∈ S and all λ ∈ (0, 1).
Exercise 2.11. Show that f is quasiconvex if, and only if, its level sets are convex.
Proof. Assume f is quasiconvex. Fix x, y ∈ S. We may assume f (x) ≤ f (y). For any
λ ∈ (0, 1), by differentiability of f ,
f (y + λ(x − y)) − f (y) = λ∇f (y)t (x − y) + λ || x − y || α(y; λ(x − y)),
where α(y; λ(x − y)) → 0 as λ → 0. Since f is quasiconvex, the LHS is nonpositive, and
this implies
λ∇f (y)t (x − y) + λ || x − y || α(y; λ(x − y)) ≤ 0.
Dividing by λ and taking the limit as λ → 0, we get
∇f (y)t (x − y) ≤ 0.
Conversely, assume that (25) holds. We need to show that f is quasiconvex. Take x, y ∈ S.
We may assume f (x) ≤ f (y). Suppose there exists a λ ∈ (0, 1) such that
f (z) > f (y), where z = y + λ(x − y).
Since f is continuous, for δ ∈ (0, 1) sufficiently small,
f (z) > f (δz + (1 − δ)y).
The last inequality follows as δz + (1 − δ)y is close to y for δ small, and f (z) > f (y).
By the mean value theorem,
f (z) − f (δz + (1 − δ)y) = (1 − δ)∇f (u)t (z − y) > 0,
where u = µz + (1 − µ)y for some µ ∈ (δ, 1). This implies, as z − y = λ(x − y),
∇f (u)t (x − y) > 0.
On the other hand, as f (u) > f (y) ≥ f (x), from (25), ∇f (u)t (x − u) ≤ 0. As x − u =
(1 − λµ)(x − y), the last inequality implies ∇f (u)t (x − y) ≤ 0 which is a contradiction.
It follows that f is quasiconvex. 2
Proof. Since S is compact, it has no directions and every point of S is a convex
combination of its extreme points. Let x1 , x2 , . . . , xp be the extreme points of S. Let xq
Pp
be such that f (xq ) = max{f (xi ) : 1 ≤ i ≤ p}. Given x ∈ S, we can write x = i=1 λi xi ,
a convex combination of extreme points of S. Note that as f is quasiconvex,
Xp
f (x) = f ( λi xi ) ≤ max{f (xi ) : 1 ≤ i ≤ p} = f (xq ).
i=1
One of the sufficient conditions for a local optimal solution to be a global optimal
solution is that the function f is strictly quasiconvex.
Proof. Let x̄ be a local optimal solution to the problem. Suppose y ∈ S is such that
f (y) < f (x̄). For any λ ∈ (0, 1), by strict quasiconvexity of f ,
f (x̄ + λ(y − x̄)) < max{f (x̄), f (y)} = f (x̄).
For λ sufficiently small, this gives points arbitrarily close to x̄ with smaller objective value,
contradicting local optimality of x̄. It follows that x̄ is a global optimal solution to the
problem. 2
Proof. Let x, y ∈ S be such that f (x) = f (y). Suppose there exists a λ ∈ (0, 1) such
that f (λx + (1 − λ)y) > f (x). Let z = λx + (1 − λ)y. Since f is continuous, there
exists a µ ∈ (0, 1), such that f (z) > f [µx + (1 − µ)z] > f (x) = f (y). Note that z is
convex combination of µx + (1 − µ)z and y. Since f [µx + (1 − µ)z] > f (y), by strict
quasiconvexity of f , f (z) < f [µx + (1 − µ)z]. From this contradiction it follows that we
cannot find a λ ∈ (0, 1) such that f (λx + (1 − λ)y) > f (x), and hence f is quasiconvex.
We have seen that a local minimum for a strictly quasiconvex function is also a global
minimum. When can we say it is unique?
Note that
Proof. Exercise.
Proof. We first show that f is strictly quasiconvex. To the contrary, assume x, y ∈ S
such that f (x) ≠ f (y) and f (z) ≥ max{f (x), f (y)} where z = λx + (1 − λ)y for some
λ ∈ (0, 1). Assume, without loss of generality, f (x) < f (y). Then, we have
But these two inequalities contradict each other as y − u = µ(u − z)/(1 − µ). It
follows that f is strictly quasiconvex.
Since f is differentiable, it is continuous and hence quasiconvex.
Proof. Exercise.
Exercises:
4. Define various types of convexity at a point and examine which of the results
developed so far hold good for functions having convexity (of different types) at a
point.
f (x) = (ct x + α)/(dt x + β).
3. Nonlinear Programming and Necessary and Sufficient Conditions for Optimality
• Unconstrained Optimization
Minimize f (x), x ∈ Rn .
• Constrained Optimization
Minimize f (x) subject to x ∈ S or
With Inequality Constraints
Minimize f (x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m,
x ∈ X ⊆ Rn
With Inequality and Equality Constraints
Minimize f (x)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m,
hi (x) = 0, i = 1, 2, . . . , l,
x ∈ X ⊆ Rn
Unconstrained Optimization
[f (z + λd) − f (z)]/λ = ∇f (z)t d + || d || α(z, λd).
Since ∇f (z)t d < 0 and α(z, λd) → 0 as λ → 0, there exists a δ > 0 such that the RHS
of the above equation is negative for all λ ∈ (0, δ). The result follows. 2
Proof. Since z is a local minimum, for any d, f (z + λd) ≥ f (z) ∀λ sufficiently small,
which in turn implies ∇f (z)t d ≥ 0 for all d. Take d = −∇f (z). 2
Unconstrained Optimization
Proof. Using differentiability of f at z and the hypothesis ∇f (z) = 0, for any d we can
write
[f (z + λd) − f (z)]/λ² = (1/2) dt H(z)d + || d ||² α(z, λd).
Since z is a local minimum, f (z + λd) ≥ f (z) ∀λ > 0 sufficiently small. Taking limit as
λ → 0, it follows that dt H(z)d ≥ 0 and hence the result follows. 2
Unconstrained Optimization
Proof. If z is not a local minimum, then there exists a sequence xk → z such that
f (xk ) < f (z) for each k. Using the hypotheses, we can write
[f (z + λdk ) − f (z)]/λ² = (1/2) dtk H(z)dk + || dk ||² α(z, λdk ),
where dk = (xk − z)/ || xk − z ||.
We may assume, without loss of generality, dk → d for some d 6= 0. Taking limits as
λ → 0, we get dt H(z)d ≤ 0 which is a contradiction to positive definiteness of H(z). 2
Reexamine Example 3.1.
Given a nonempty set S and a point z ∈ S, define the set of feasible directions of S at z by
D(z) = {d ≠ 0 : z + λd ∈ S for all λ ∈ (0, δ) for some δ > 0}. Similarly, define the set of
descent directions of f at z by F (z) = {d : ∇f (z)t d < 0}.
Note that D(z) is a cone if z ∈ S. If z is a local optimum, then D(z) ∩ F (z) = ∅.
Proof. Suppose d ∈ D(z) ∩ F (z). This means we can find a λ > 0 arbitrarily small
satisfying f (z + λd) < f (z) (because d is a direction of descent) and z + λd ∈ S (because
d is a feasible direction). This contradicts the local optimality of z. It follows that
D(z) ∩ F (z) = ∅.
Consider the problem (PI): Minimize f (x) subject to x ∈ S, where X is a nonempty open set
in Rn and
S = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m}.
Optimization With Inequality Constraints
When the NLP is specified as in the above problem, the necessary geometric condition
for local optimality (D(z) ∩ F (z) = ∅) can be reduced to an algebraic condition.
Theorem 3.6. Consider the problem PI stated above. Suppose z is a feasible point to
the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at
z and that gi s for i not in I are continuous at z. If z is a local optimal solution to the
problem, then F (z) ∩ G(z) = ∅, where G(z) = {d : ∇gi (z)t d < 0, ∀i ∈ I}.
Proof. From an earlier result, D(z) ∩ F (z) = ∅. Will show that G(z) ⊆ D(z). Let
d ∈ G(z). As X is open, there exists a δ1 > 0 such that z + λd ∈ X ∀λ ∈ (0, δ1 ). For
i ∉ I, as gi (z) < 0 and gi is continuous at z, there exists a δ2 > 0 such that gi (z + λd) <
0 ∀λ ∈ (0, δ2 ). For i ∈ I, as ∇gi (z)t d < 0, d is a descent direction of gi at z and hence
there exists a δ3 > 0 such that gi (z + λd) < gi (z) = 0 ∀λ ∈ (0, δ3 ). From these inferences, we conclude
that d ∈ D(z) and hence the result follows.
Example 3.2.
Minimize (x − 3)2 + (y − 2)2
subject to x2 + y 2 ≤ 5
x+y ≤3
x, y ≥ 0
Analyze the optimality at the points z = (9/5, 6/5)t and u = (2, 1)t .
∇f (z) = (−12/5, −8/5)t and ∇g2 (z) = (1, 1)t .
Note that F (z) ∩ G(z) ≠ ∅ and hence z cannot be an optimal solution.
∇f (u) = (−2, −2)t , ∇g1 (u) = (4, 2)t and ∇g2 (u) = (1, 1)t .
Note that F (u) ∩ G(u) = ∅ and hence u may be an optimal solution, but this cannot be
guaranteed from F (u) ∩ G(u) = ∅ alone, as it is only a necessary condition.
Example 3.3.
Minimize (x − 1)2 + (y − 1)2
subject to (x + y − 1)3 ≤ 0
x, y ≥ 0
In this case, the necessary condition will hold good for each feasible (x, y) satisfying
x + y = 1. Now consider the same problem expressed as
Minimize (x − 1)2 + (y − 1)2
subject to x+y ≤1
x, y ≥ 0
Verify that the necessary condition is satisfied only at the point (1/2, 1/2).
Note that when ∇f (z) = 0 or ∇gi (z) = 0 for i ∈ I, the necessary condition developed
above is of no use.
Theorem 3.7. Consider the problem PI stated earlier. Suppose z is a feasible point
to the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable
at z and that gi s for i ∉ I are continuous at z. If z is a local optimal solution to the
problem, then there exist constants u0 and ui for i ∈ I such that
u0 ∇f (z) + Σ_{i∈I} ui ∇gi (z) = 0
u0 , ui ≥ 0, for i ∈ I
(u0 , uI ) ≠ 0.
Furthermore, if the gi s for i ∉ I are also differentiable at z, then there exist u0 ∈ R and
u ∈ Rm such that
u0 ∇f (z) + Σ_{i=1}^m ui ∇gi (z) = 0 (26)
ui gi (z) = 0, for i = 1, 2, . . . , m, (27)
u0 , ui ≥ 0, for i = 1, 2, . . . , m, (28)
(u0 , ut ) ≠ 0 (29)
Proof. Let k =| I | and let A be the n × (k + 1) matrix with its first column as
∇f (z) and its ith column as ∇gi (z), i ∈ I. From the previous theorem, we know that
F (z) ∩ G(z) = ∅. This is equivalent to saying there exists no d satisfying ∇f (z)t d < 0
and ∇gi (z)t d < 0 for each i ∈ I. In other words, the system At d < 0 has no solution.
By Gordan’s theorem, there exists a nonzero nonnegative vector p ∈ Rk+1 satisfying
Ap = 0. Taking u0 = p1 , ui = pi+1 , the first assertion of the theorem follows. For the
second assertion, take ui = 0 for i ∉ I.
The ui s in (26) in the statement of the theorem are called the Lagrangian multipliers.
The condition ui gi (z) = 0, i = 1, 2, . . . , m, is called the complementary slackness con-
dition.
Example 3.4.
Minimize (x − 3)2 + (y − 2)2
subject to x2 + y 2 ≤ 5
x + 2y ≤ 4
x, y ≥ 0
x             I                 ∇f (x)        ∇gi1 (x)     ∇gi2 (x)
z = (2, 1)t   i1 = 1, i2 = 2    (−2, −2)t     (4, 2)t      (1, 2)t
w = (0, 0)t   i1 = 3, i2 = 4    (−6, −4)t     (−1, 0)t     (0, −1)t
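For z = (2, 1)t in Example 3.4 the multipliers can be computed by solving the 2 × 2 linear system coming from the two active constraints; the check below (numpy assumed) confirms that they are nonnegative, so the Kuhn-Tucker conditions hold at z.

    import numpy as np

    grad_f  = np.array([-2.0, -2.0])   # gradient of the objective at z = (2, 1)
    grad_g1 = np.array([ 4.0,  2.0])   # gradient of x^2 + y^2 - 5 at z
    grad_g2 = np.array([ 1.0,  2.0])   # gradient of x + 2y - 4 at z

    # solve  grad f + u1 grad g1 + u2 grad g2 = 0
    u = np.linalg.solve(np.column_stack([grad_g1, grad_g2]), -grad_f)
    print(u, (u >= 0).all())           # u = [1/3, 2/3], both nonnegative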
Example 3.5.
Minimize −x
subject to y − (1 − x)3 ≤ 0
y≥0
Note that z = (1, 0)t is the optimal solution to the problem (draw the feasible region
and check this) and the Fritz John conditions hold good at this point. Here I = {1, 2},
∇f (z) = (−1, 0)t , ∇g1 (z) = (0, 1)t and ∇g2 (z) = (0, −1)t . For the Fritz John condition
we must have
u0 (−1, 0)t + u1 (0, 1)t + u2 (0, −1)t = (0, 0)t ,
which holds good only if u0 = 0.
Example 3.6.
Minimize −x
subject to x+y ≤0
y≥0
Note that the Fritz John condition holds good at z = (0, 0)t with u0 = u1 = u2 = α for any
real α > 0. Here
∇f (z) = (−1, 0)t , ∇g1 (z) = (1, 1)t , ∇g2 (z) = (0, −1)t .
Note that in examples 3.4 and 3.6, u0 is positive. But in example 3.5, u0 = 0. In
example 3.5, the ∇gi (z)s for i ∈ I are linearly dependent, but not in the other two
examples. Note that when u0 = 0 in Fritz John condition, the condition only talks
about the constraints. With an additional assumption, the Fritz John condition can be
improved. This is due to Kuhn and Tucker.
Theorem 3.8. Consider the problem PI stated earlier. Suppose z is a feasible point to
the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at z
and that gi s for i ∉ I are continuous at z. Also assume that the ∇gi (z)s for i ∈ I are linearly
independent. If z is a local optimal solution to the problem, then there exist constants
ui for i ∈ I such that
∇f (z) + Σ_{i∈I} ui ∇gi (z) = 0
ui ≥ 0, for i ∈ I.
Furthermore, if the gi s for i ∉ I are also differentiable at z, then there exist ui ,
i = 1, 2, . . . , m, such that
∇f (z) + Σ_{i=1}^m ui ∇gi (z) = 0 (30)
ui gi (z) = 0, for i = 1, 2, . . . , m, (31)
ui ≥ 0, for i = 1, 2, . . . , m. (32)
Proof. Get u0 and ui s as in the previous theorem. Note that u0 > 0, as ∇gi (z)s for
i ∈ I would become linearly dependent otherwise. Since u0 > 0, we can as well assume
that it is equal to one without loss of generality. The second assertion of the theorem
can be established as in the previous theorem.
Note that a geometric interpretation of the Kuhn-Tucker conditions is that if z
is a local optimum, then the gradient vector of the objective function at z with its
sign reversed is contained in the cone generated by the gradient vectors of the binding
constraints (follows from (30) above).
Theorem 3.9. Consider the problem PI stated earlier. Suppose z is a feasible point
to the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable
at z and that gi s for i ∉ I are continuous at z. Further, assume that f is pseudoconvex
at z and gi is differentiable and quasiconvex at z for each i ∈ I. If the Kuhn-Tucker
conditions hold good at z, that is, there exist ui ≥ 0 for each i ∈ I such that
∇f (z) + Σ_{i∈I} ui ∇gi (z) = 0,
then z is a global optimal solution to the problem.
Proof. Let x be any feasible solution to PI. Then, for i ∈ I, gi (x) ≤ 0 = gi (z). By
quasiconvexity of gi at z,
gi (z + λ(x − z)) ≤ max{gi (x), gi (z)} = gi (z)
for all λ ∈ (0, 1). Thus gi does not increase in the direction x − z and hence we must
have ∇gi (z)t (x − z) ≤ 0. This implies Σ_{i∈I} ui ∇gi (z)t (x − z) ≤ 0. Since ∇f (z) +
Σ_{i∈I} ui ∇gi (z) = 0, it follows that ∇f (z)t (x − z) ≥ 0. Since f is pseudoconvex at z,
f (x) ≥ f (z), and hence z is a global optimal solution to the problem. 2
Consider the problem: Minimize f (x) subject to
g(x) = 0, x ∈ X, X is a nonempty subset in Rn .
Letting g1 (x) = g(x) and g2 (x) = −g(x), the above problem can be stated as
Minimize f (x) subject to
gi (x) ≤ 0, i = 1, 2, x ∈ X.
Note that G(z) = ∅ and hence the optimality conditions developed above are of
no use.
Theorem 3.10. Consider the problem PIE stated above. Suppose z is a local optimal
solution to the problem. Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are
differentiable at z and that gi s for i ∉ I are continuous at z. Further, assume that hi is
continuously differentiable at z for i = 1, 2, . . . , l. If ∇hi (z)s, i = 1, 2, . . . , l, are linearly
independent, then F (z) ∩ G(z) ∩ H(z) = ∅, where
F (z) = {d : ∇f (z)t d < 0}.
G(z) = {d : ∇gi (z)t d < 0, ∀i ∈ I}.
H(z) = {d : ∇hi (z)t d = 0, for i = 1, 2, . . . , l}.
Theorem 3.11. Consider the problem PIE. Suppose z is a feasible point to the problem.
Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at z and that
gi s for i ∉ I are continuous at z. Further, assume that hi is continuously differentiable
at z for i = 1, 2, . . . , l. If z is a local optimal solution to the problem, then there exist
constants u0 , ui for i ∈ I and vi , i = 1, 2, . . . l such that
u0 ∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z) = 0
u0 , ui ≥ 0, for i ∈ I
(u0 , uI , v) ≠ 0.
Furthermore, if the gi s for i ∉ I are also differentiable at z, then there exist u0 ∈ R and
u ∈ Rm such that
u0 ∇f (z) + Σ_{i=1}^m ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z) = 0 (34)
ui gi (z) = 0, for i = 1, . . . , m, (35)
u0 , ui ≥ 0, for i = 1, . . . , m, (36)
(u0 , ut , v t ) ≠ 0. (37)
Suppose ∇hi (z), i = 1, . . . , l, are linearly independent. Let A be the matrix whose first
column is ∇f (z) and the remaining columns being ∇gi (z), i ∈ I. Let B be the matrix
whose ith column is ∇hi (z), i = 1, . . . , l. Then from the previous theorem, there is no d
which satisfies
At d < 0 and B t d = 0.
Since (p, 0) ∈ Cl(T ) for arbitrarily large negative, it follows that (u0 , utI ) ≥ 0. Since
(0, 0) ∈ cl(T ),
(u0 , utI )At d + v t B t d ≥ 0 for all d ∈ Rn .
This implies (u0 , utI )At + v t B t = 0. From this the theorem follows.
Remark 3.1. Note that the Lagrangian multipliers associated with hi s are unrestricted
in sign.
Example 3.7.
Minimize (x − 3)2 + (y − 2)2
subject to x2 + y 2 ≤ 5
x + 2y = 4
x, y ≥ 0
Analyze the optimality at z = (2, 1)t .
Example 3.8.
Minimize −x
subject to y − (1 − x)3 = 0
y≥0
Kuhn-Tucker Necessary Conditions
Theorem 3.12. Consider the problem PIE. Suppose z is a feasible point to the problem.
Let I = {i : gi (z) = 0}. Assume f and gi s for i ∈ I are differentiable at z and that gi s
for i ∉ I are continuous at z. Further, assume that hi is continuously differentiable at z
for i = 1, 2, . . . , l. Also assume that ∇gi s for i ∈ I and ∇hi (z)s, i = 1, . . . , l, are linearly
independent.
If z is a local optimal solution to the problem, then there exist constants ui for i ∈ I
and vi , i = 1, 2, . . . l such that
∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z) = 0
ui ≥ 0, for i ∈ I.
Furthermore, if the gi s for i ∉ I are also differentiable at z, then there exist u ∈ Rm and
v ∈ Rl such that
∇f (z) + Σ_{i=1}^m ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z) = 0 (38)
ui gi (z) = 0, for i = 1, . . . , m, (39)
ui ≥ 0, for i = 1, . . . , m, (40)
Example 3.9.
Minimize −x
subject to x+y ≤0
y≥0
Kuhn-Tucker Sufficient Conditions
Theorem 3.13. Consider the problem PIE. Suppose z is a feasible point to the problem.
Let I = {i : gi (z) = 0}. Suppose that the Kuhn-Tucker conditions hold good at z, i.e.,
there exist scalars ui , i ∈ I and vi , i = 1, . . . , l, such that
∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z) = 0.
Kuhn-Tucker Sufficient Conditions
Theorem 3.14. Consider the problem PIE. Suppose z is a feasible point to the problem.
Let I = {i : gi (z) = 0}. Suppose that the Kuhn-Tucker conditions hold good at z, i.e.,
there exist scalars ui , i ∈ I and vi , i = 1, . . . , l, such that
∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z) = 0.
Proof. Let x be any feasible solution to PIE. Then, for i ∈ I, gi (x) ≤ 0 = gi (z). By
quasiconvexity of gi at z,
gi (z + λ(x − z)) ≤ max{gi (x), gi (z)} = gi (z)
for all λ ∈ (0, 1). Thus gi does not increase in the direction x − z and hence we must
have ∇gi (z)t (x − z) ≤ 0. This implies Σ_{i∈I} ui ∇gi (z)t (x − z) ≤ 0. Since each hi is affine
and hi (x) = hi (z) = 0, we also have ∇hi (z)t (x − z) = 0. Therefore
[Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z)]t (x − z) ≤ 0.
Since ∇f (z) + Σ_{i∈I} ui ∇gi (z) + Σ_{i=1}^l vi ∇hi (z) = 0, it follows that ∇f (z)t (x − z) ≥ 0.
Since f is pseudoconvex at z, f (x) ≥ f (z), and hence z is a global optimal solution to the
problem. 2
4. Duality in NLP
We shall refer to PIE as the Primal Problem. We shall write this problem in the
vector notation as
Minimize f (x)
subject to
g(x) ≤ 0,
h(x) = 0, x ∈ X ⊆ Rn .
Here g(x) = (g1 (x), g2 (x), . . . , gm (x))t and
h(x) = (h1 (x), h2 (x), . . . , hl (x))t
A number of problems closely associated with the primal problem, called dual problems,
have been proposed in the literature. The Lagrangian dual problem is one of these problems
and has played a significant role in the development of algorithms for solving the primal
problem. It is stated as follows:
Maximize θ(u, v)
subject to
u ≥ 0, where
θ(u, v) = inf{f (x) + Σ_{i=1}^m ui gi (x) + Σ_{i=1}^l vi hi (x) : x ∈ X}.
Since the dual maximizes the infimum, the dual is sometimes called the max-min prob-
lem.
Note that
• the lagrangian multipliers associated with ‘≤’ constraints (g(x) ≤ 0), namely the
ui s are nonnegative and those associated with the ‘=’ constraints (h(x) = 0),
namely vi s are unrestricted in sign,
• the lagrangian dual objective function θ(u, v) may be −∞ for a fixed vector (u, v),
because it is the infimum of a functional expression over a set X,
• the choice of a lagrangian dual would affect the solution process using the dual
approach to solve the primal.
Note that for each u ≥ 0 fixed, the dual objective value is the intercept of the line z2 +uz1
supporting G (from below) on the z2 axis. Therefore, the dual problem is equivalent to
finding the slope of the supporting hyperplane of G such that the intercept on the z2
axis is maximal. Note that at (z̄1 , z̄2 ) the dual objective function attains its maximum
with ū as the dual optimal solution. Also, in this case, the optimum dual objective value
coincides with the optimum primal objective value.
Example 4.1.
Minimize f (x, y) = x2 + y 2
subject to g(x, y) = −x − y + 4 ≤ 0
x, y ≥ 0
Verify that (2, 2)t is the optimum solution to this problem with optimal objective value
equal to 8.
Taking X = {(x, y) : x ≥ 0, y ≥ 0}, the dual function is given by
θ(u) = inf{x² + y² + u(−x − y + 4) : x ≥ 0, y ≥ 0}
= inf{x² − ux : x ≥ 0} + inf{y² − uy : y ≥ 0} + 4u
= −u²/2 + 4u for u ≥ 0, and 4u for u < 0.
The maximum of θ over u ≥ 0 is attained at ū = 4 with θ(ū) = 8, which coincides with the
optimal primal objective value.
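The computation of θ(u) and its maximum can also be reproduced numerically, as in the sketch below (scipy assumed; the grid over which the inner infimum is taken is an arbitrary discretization of X).

    import numpy as np
    from scipy.optimize import minimize_scalar

    grid = np.linspace(0, 10, 2001)              # sample points for x >= 0 (and y >= 0)
    def theta(u):
        inner = (grid**2 - u * grid).min()       # inf{ t^2 - u t : t >= 0 } on the grid
        return 2 * inner + 4 * u                 # the x- and y-infima coincide

    res = minimize_scalar(lambda u: -theta(u), bounds=(0, 10), method="bounded")
    print(res.x, theta(res.x))                   # roughly u = 4 and theta = 8, the primal value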
Theorem 4.1 (The Weak Duality Theorem). Let x be a feasible solution to PIE
and let (u, v), u ≥ 0, be a feasible solution to its Lagrangian dual. Then f (x) ≥ θ(u, v).
Corollary 4.2. If x is a feasible solution to PIE and (u, v) is a solution to its Lagrangian
dual such that f (x) ≤ θ(u, v), then x is optimal for PIE and (u, v) is optimal for the
dual.
Duality Gap
Remark: Note that in PIE, it was assumed that the set X was a nonempty open set.
However, for the dual formulation, the openness of X is not required. In fact, X may
even be a discrete/finite set. Check that the weak duality theorem and its corollaries hold
good for any nonempty set X. Henceforth, we shall refer to PIE without the openness
assumption of X as the primal problem.
The dual optimal objective value may be strictly less than the primal optimal objec-
tive value. In this case we say that there is duality gap. Analyze the following example.
Example 4.2.
Minimize f (x, y) = −2x + y
subject to h(x, y) = x + y − 3 = 0
(x, y) ∈ X = {(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)} .
The Strong Duality Theorem asserts that, under some convexity assumptions and a
constraint qualification, the primal optimal objective value is equal to the dual optimal
objective value.
Theorem 4.2 (The Strong Duality Theorem). Let X be a nonempty convex set
in Rn , let f : Rn → R and g : Rn → Rm be convex, and let h : Rn → Rl be affine, that
is, h is of the form h(x) = Ax − b. Suppose that the following constraint qualification
holds true. There exists a z ∈ X such that g(z) < 0 and h(z) = 0, and 0 ∈ int(h(X)),
where h(X) = {h(x) : x ∈ X}. Then
inf{f (x) : x ∈ X, g(x) ≤ 0, h(x) = 0} = sup{θ(u, v) : u ≥ 0}.
Furthermore, if the infimum is finite, then sup{θ(u, v) : u ≥ 0} is attained at some (ū, v̄)
with ū ≥ 0. If the infimum is attained at x̄, then ūt g(x̄) = 0.
Lemma: Let X be a nonempty convex set in Rn , let α : Rn → R and g : Rn → Rm be
convex, and let h : Rn → Rl be affine. Consider the two systems:
System 1. α(x) < 0, g(x) ≤ 0, h(x) = 0 for some x ∈ X
System 2. u0 α(x) + ut g(x) + v t h(x) ≥ 0 ∀x ∈ X
(u0 , u) ≥ 0, (u0 , u, v) ≠ 0
If System 1 has no solution, then System 2 has a solution. The converse is true if u0 > 0.
Since α and g are convex and h is affine, Λ is convex. Since System 1 has no solution,
the vector (0, 0, 0) ∈ R1+m+l does not belong to Λ. By a separation theorem, there exists
a non-zero vector (u0 , u, v) ∈ R1+m+l such that
u0 p + ut q + v t r ≥ 0 for all (p, q, r) ∈ cl(Λ). (16)
Fix any x ∈ X. Note that (α(x), g(x), h(x)) ∈ cl(Λ) and (p, q, h(x)) ∈ cl(Λ) for all
(p, q) > (α(x), g(x)).
From this and (16), it follows that (u0 , u) ≥ 0. It follows that (u0 , u, v) is a solution to
System 2.
To prove the converse, assume System 2 has a solution (u0 , u, v) with u0 > 0.
Let x ∈ X be such that g(x) ≤ 0 and h(x) = 0. Since (u0 , u, v) solves System 2, we
have
u0 α(x) + ut g(x) + v t h(x) ≥ 0.
Since g(x) ≤ 0, h(x) = 0 and u ≥ 0, it follows u0 α(x) ≥ 0. Since u0 > 0, we must have
α(x) ≥ 0. It follows that System 1 has no solution.
finite. Consider the system:
f (x) − µ < 0, g(x) ≤ 0, h(x) = 0 for some x ∈ X.
By the definition of µ, this system has no solution. Hence, from the Lemma, there
exists a nonzero vector (u0 , u, v) ∈ R1+m+l with (u0 , u) ≥ 0 such that
u0 (f (x) − µ) + ut g(x) + v t h(x) ≥ 0 for all x ∈ X. (17)
We first show that u0 > 0. To the contrary, assume u0 = 0. From the hypothesis of
the theorem, that z ∈ X satisfies g(z) < 0 and h(z) = 0. Substituting z in the above
inequality, we get ut g(z) ≥ 0. This implies, as u ≥ 0 and g(z) < 0, u = 0. From (17),
it follows v t h(x) ≥ 0 ∀x ∈ X. Since 0 ∈ int(h(X)), there exists an x ∈ X such that
h(x) = −λv where λ is a small positive real. This implies, v t (−λv) ≥ 0 which in turn
implies v = 0. Thus, (u0 , u, v) = 0 which is a contradiction. Hence, u0 > 0. Without
loss of generality, we may assume that u0 = 1 and write
f (x) + ut g(x) + v t h(x) ≥ µ for all x ∈ X. (18)
This implies
θ(u, v) = inf{f (x) + ut g(x) + v t h(x) : x ∈ X} ≥ µ.
Finally, suppose x̄ is an optimal solution to the primal problem, that is, x̄ ∈ X, g(x̄) ≤
0, h(x̄) = 0 and f (x̄) = µ. Substituting x̄ in (18), we get ūt g(x̄) ≥ 0. Since ū ≥ 0 and
g(x̄) ≤ 0, ūt g(x̄) = 0.
An important consequence of strong duality theorem is the saddle point optimality
criteria. The existence of a saddle point asserts optimal solutions to both the primal
and dual problems and that the optimal objective values of the two problems are equal.
This does not require any convexity assumptions made in the strong duality theorem.
However, under the convexity assumptions one can assert the existence of a saddle point.
Theorem (Saddle Point Optimality Criteria). Suppose there exist x̄ ∈ X and (ū, v̄) with
ū ≥ 0 such that
φ(x̄, u, v) ≤ φ(x̄, ū, v̄) ≤ φ(x, ū, v̄) (19)
for all x ∈ X, for all u ≥ 0 and for all v, where φ(x, u, v) = f (x) + ut g(x) + v t h(x).
Then, x̄ and (ū, v̄) are optimal solutions to the primal and dual problems respectively.
Conversely, suppose that X, f, g are convex and that h is affine (i.e., h(x) = Ax − b).
Further, assume that there exists a z ∈ X such that g(z) < 0 and h(z) = 0, and that
0 ∈ int(h(X)). If x̄ is an optimal solution to the primal problem, then there exists (ū, v̄)
with ū ≥ 0, so that (19) hold true.
Proof. Suppose there exist x̄ ∈ X and (ū, v̄) with ū ≥ 0 such that (19) hold good.
Since
f (x̄) + ut g(x̄) + v t h(x̄) = φ(x̄, u, v) ≤ φ(x̄, ū, v̄)
for all u ≥ 0 and all v ∈ Rl , it follows that g(x̄) ≤ 0 and h(x̄) = 0. Therefore, x̄ is a
feasible solution to the primal problem. Putting u = 0 in the above inequality, it follows that
ūt g(x̄) ≥ 0. Since ū ≥ 0 and g(x̄) ≤ 0, ūt g(x̄) = 0. From (19), for each x ∈ X, we have
f (x̄) = φ(x̄, ū, v̄) ≤ φ(x, ū, v̄) = f (x) + ūt g(x) + v̄ t h(x). (20)
Since (20) holds good for all x ∈ X, it follows that f (x̄) ≤ θ(ū, v̄). Since x̄ is feasible to
the primal and ū ≥ 0, from a corollary to the weak duality theorem it follows that x̄
and (ū, v̄) are optimal to the primal and the dual problems respectively.
By definition of θ, we must have
φ(x̄, ū, v̄) = f (x̄) + ūt g(x̄) + v̄ t h(x̄) ≤ φ(x, ū, v̄) ∀x ∈ X.
Again,
Theorem. Let S = {x ∈ X : g(x) ≤ 0, h(x) = 0}, and consider the primal prob-
lem, minimize f (x) subject to x ∈ S. Suppose that x̄ ∈ S satisfies the Kuhn-Tucker
conditions, that is, there exist ū ≥ 0 and v̄ such that
∇f (x̄) + ∇g(x̄)ū + ∇h(x̄)v̄ = 0 and ūt g(x̄) = 0. (21)
Suppose that f, gi , i ∈ I are convex at x̄, where I = {i : gi (x̄) = 0}. Further suppose
that if v̄i ≠ 0, then hi is affine. Then, (x̄, ū, v̄) is a saddle point, that is,
φ(x̄, u, v) ≤ φ(x̄, ū, v̄) ≤ φ(x, ū, v̄)
for all x ∈ X, for all u ≥ 0 and for all v, where φ(x, u, v) = f (x) + ut g(x) + v t h(x).
Conversely, suppose that (x̄, ū, v̄), with x̄ ∈ int(X) and ū ≥ 0, is a saddle point. Then
x̄ is feasible to the primal problem and furthermore, (x̄, ū, v̄) satisfies the Kuhn-Tucker
conditions given in (21).
Proof. Suppose that (x̄, ū, v̄) with x̄ ∈ S and ū ≥ 0 satisfy Kuhn-Tucker conditions,
(21). By convexity of f , gi , i ∈ I at x̄, and since hi is affine whenever v̄i ≠ 0, we have
f (x) ≥ f (x̄) + ∇f (x̄)t (x − x̄), (22)
gi (x) ≥ gi (x̄) + ∇gi (x̄)t (x − x̄), i ∈ I, (23)
hi (x) = hi (x̄) + ∇hi (x̄)t (x − x̄), whenever v̄i ≠ 0, (24)
for all x ∈ X. Multiplying (23) by ūi ≥ 0, (24) by v̄i and adding, and using the
hypothesis (20), it follows that φ(x̄, ū, v̄) ≤ φ(x, ū, v̄) for all x ∈ X.
Since g(x̄) ≤ 0, h(x̄) = 0 and ūt g(x̄) = 0, it follows that φ(x̄, u, v) ≤ φ(x̄, ū, v̄) for
all u ≥ 0. Hence, (x̄, ū, v̄) satisfies the saddle point condition.
To prove the converse, suppose that (x̄, ū, v̄), with x̄ ∈ int(X) and ū ≥ 0, is a saddle
point. Since φ(x̄, u, v) ≤ φ(x̄, ū, v̄) for all u ≥ 0 and all v, it follows g(x̄) ≤ 0, h(x̄) = 0
and ūt g(x̄) = 0. This shows that x̄ is a feasible solution to the primal. Since φ(x̄, ū, v̄) ≤
φ(x, ū, v̄) for all x ∈ X, x̄ is a local optimal solution to the problem: minimize φ(x, ū, v̄)
subject to x ∈ X. Since x̄ ∈ int(X), ∇x φ(x̄, ū, v̄) = 0, that is, ∇f (x̄) + ∇g(x̄)ū +
∇h(x̄)v̄ = 0. It follows that (21) holds good.
Remark. We see that under certain convexity assumptions, the Lagrangian multipliers
of Kuhn-Tucker conditions also serve as the multipliers in the saddle point criteria.
Conversely, the multipliers of the saddle point criteria are the Lagrangian multipliers
of the Kuhn-Tucker conditions. Also, note that the dual variables turn out to be the
Lagrangian multipliers.
Properties of the Dual Function
For the problems with zero duality gap, one way of solving the primal problem is to
obtain the solution via the dual problem. In order to solve the dual problem one has to
understand the properties of the dual objective function. We shall derive some properties
of the dual under the assumption that the set X is a compact set. As one can always impose
bounds on the variables x, this assumption is a reasonable one to make.
For ease of notation, we shall combine the vector functions g and h into β, i.e., β(x) =
(g(x)t , h(x)t )t and combine the dual variable vectors u and v into w, i.e., w = (ut , v t )t .
The first property of the dual objective function is that it is concave over the entire
Rm+l , which in turn asserts that any local optimal solution is a global optimal solution to
the dual maximization problem.
Theorem. Let X be a nonempty compact set in Rn . Let f : Rn → R, and β : Rn →
Rm+l be continuous. Then θ, defined by θ(w) = inf{f (x) + wt β(x) : x ∈ X}, is concave on Rm+l .
Proof. Since X is compact and since f and β are continuous, θ is a real valued function
on Rm+l . For any w1 , w2 ∈ Rm+l and for any λ ∈ (0, 1), we have
θ[λw1 + (1 − λ)w2 ] = inf{f (x) + [λw1 + (1 − λ)w2 ]t β(x) : x ∈ X}
= inf{λ[f (x) + w1t β(x)] + (1 − λ)[f (x) + w2t β(x)] : x ∈ X}
≥ λ inf{f (x) + w1t β(x) : x ∈ X} + (1 − λ) inf{f (x) + w2t β(x) : x ∈ X}
= λθ(w1 ) + (1 − λ)θ(w2 ).
Hence θ is concave. 2
When X is compact and f and β are continuous, the infimum defining θ(w) =
inf{f (x) + wt β(x) : x ∈ X} is attained at some x ∈ X for each w. We shall define
the set X(w) = {x ∈ X : f (x) + wt β(x) = θ(w)}. If X(w) is a singleton set, then θ is
differentiable at w.
Proof. Suppose xk does not converge to x̄. Since X is compact, we may assume without
loss of generality that xk converges to z ∈ X where z ≠ x̄. For each k, as xk ∈ X(wk ),
f (xk ) + β(xk )t wk ≤ f (x̄) + β(x̄)t wk .
Taking the limit as k → ∞, we get f (z) + β(z)t w̄ ≤ f (x̄) + β(x̄)t w̄. This implies
z ∈ X(w̄) = {x̄}, contradiction.
Theorem. Let X be a nonempty compact set in Rn and let f : Rn → R, β : Rn → Rm+l
be continuous. Let w̄ ∈ Rm+l be such that X(w̄) = {x̄} is a singleton. Then θ is differentiable
at w̄ with gradient ∇θ(w̄) = β(x̄).
Proof. Since f and β are continuous, and X is compact, for any w there exists xw ∈
X(w). From the definition of θ, the following inequalities hold good:
≥ − || w − w̄ || || β(xw ) − β(x̄) ||
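When X(w) is a singleton the gradient formula ∇θ(w̄) = β(x̄) suggests a simple dual ascent scheme: minimize the Lagrangian to get x(w), then move w along β(x(w)). A sketch for Example 4.1 (plain Python; the step size is an arbitrary choice) is given below.

    # Dual gradient ascent for Example 4.1: theta'(u) = g(x(u)), where
    # x(u) = (u/2, u/2) minimizes x^2 + y^2 + u(-x - y + 4) over X = {x, y >= 0}.
    g = lambda x, y: -x - y + 4.0
    u, step = 0.0, 0.2
    for _ in range(200):
        x = y = max(u / 2.0, 0.0)            # unique minimizer of the Lagrangian
        u = max(u + step * g(x, y), 0.0)     # ascend along theta'(u), keep u >= 0
    print(u, x, y)                           # approaches u = 4 and (x, y) = (2, 2)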