
15.093 Optimization Methods

Lecture 20: The Conjugate Gradient Algorithm

Optimality conditions for constrained optimization

1 Outline
Slide 1
1. The Conjugate Gradient Algorithm
2. Necessary Optimality Conditions
3. Sufficient Optimality Conditions
4. Convex Optimization
5. Applications

2 The Conjugate Gradient Algorithm


2.1 Quadratic functions
Slide 2
$$\min f(x) = \tfrac{1}{2}\, x^\top Q x + c^\top x$$

Definition: $d_1, \ldots, d_n$ are Q-conjugate if $d_i \neq 0$ and $d_i^\top Q d_j = 0$ for $i \neq j$.

Proposition: If $d_1, \ldots, d_n$ are Q-conjugate, then $d_1, \ldots, d_n$ are linearly independent.
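A quick numerical illustration of both statements (a minimal sketch; the matrix and directions are my own example, not from the lecture — for a symmetric positive definite $Q$, its eigenvectors are Q-conjugate):

```python
import numpy as np

# Illustrative check of Q-conjugacy (assumed example).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5 * np.eye(5)              # a symmetric positive definite Q
_, V = np.linalg.eigh(Q)                 # columns of V: eigenvectors of Q
G = V.T @ Q @ V                          # matrix of inner products d_i' Q d_j

print(np.allclose(G - np.diag(np.diag(G)), 0))   # True: d_i' Q d_j = 0 for i != j
print(np.linalg.matrix_rank(V) == 5)             # True: the directions are linearly independent
```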

2.2 Motivation
Slide 3
Given Q-conjugate $d_1, \ldots, d_n$ and $x^k$, compute

$$\min_{\alpha} f(x^k + \alpha d_k) = c^\top x^k + \alpha\, c^\top d_k + \tfrac{1}{2}(x^k + \alpha d_k)^\top Q (x^k + \alpha d_k) = f(x^k) + \alpha \nabla f(x^k)^\top d_k + \tfrac{1}{2}\alpha^2\, d_k^\top Q d_k$$

Solution:

$$\hat{\alpha}_k = \frac{-\nabla f(x^k)^\top d_k}{d_k^\top Q d_k}, \qquad x^{k+1} = x^k + \hat{\alpha}_k d_k$$

2.3 Expanding Subspace


Theorem
Slide 4
$d_1, \ldots, d_n$ are Q-conjugate. Then $x^{k+1}$ solves

$$\min f(x) \quad \text{s.t.} \quad x = x^1 + \sum_{j=1}^{k} \alpha_j d_j$$

Moreover, $x^{n+1} = x^*$.

2.4 The Algorithm
Slide 5
Step 0 Given $x^1$, set $k := 1$, $d_1 = -\nabla f(x^1)$.
Step 1 For $k = 1, \ldots, n$ do:
If $\|\nabla f(x^k)\| \le \epsilon$, stop; else:

$$\hat{\alpha}_k = \arg\min_{\alpha} f(x^k + \alpha d_k) = \frac{-\nabla f(x^k)^\top d_k}{d_k^\top Q d_k}$$

$$x^{k+1} = x^k + \hat{\alpha}_k d_k$$

$$d_{k+1} = -\nabla f(x^{k+1}) + \lambda_k d_k, \qquad \lambda_k = \frac{\nabla f(x^{k+1})^\top Q d_k}{d_k^\top Q d_k}$$
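These steps translate directly into code; the following is a minimal sketch (the function name, tolerance, and test data are my own, and the gradient of the quadratic is taken as $\nabla f(x) = Qx + c$):

```python
import numpy as np

def conjugate_gradient_quadratic(Q, c, x1, eps=1e-10):
    """CG for min (1/2) x'Qx + c'x, following Steps 0-1 above."""
    x = np.asarray(x1, dtype=float)
    g = Q @ x + c                        # gradient at x^1
    d = -g                               # d_1 = -grad f(x^1)
    for _ in range(len(c)):              # at most n iterations
        if np.linalg.norm(g) <= eps:
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)      # exact step length along d_k
        x = x + alpha * d
        g = Q @ x + c
        lam = (g @ Qd) / (d @ Qd)        # keeps d_{k+1} Q-conjugate to d_k
        d = -g + lam * d
    return x

# Usage: the unconstrained minimizer satisfies Qx* = -c.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([-1.0, -2.0])
print(conjugate_gradient_quadratic(Q, c, np.zeros(2)))  # matches np.linalg.solve(Q, -c)
```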

2.5 Correctness
Slide 6
Theorem: The directions $d_1, \ldots, d_n$ generated by the algorithm are Q-conjugate.

2.6 Convergence Properties


Slide 7
• This is a finite algorithm.
• If there are only $k$ distinct eigenvalues of $Q$, the CGA finds an optimal solution in $k$ steps.
• Idea of pre-conditioning (a minimal sketch follows below): consider

$$\min f(Sx) = \tfrac{1}{2}(Sx)^\top Q (Sx) + c^\top S x$$

so that the number of distinct eigenvalues of $S^\top Q S$ is small.
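As a rough numerical illustration of this idea (an assumed example, not from the lecture): in the ideal case $S \approx Q^{-1/2}$ gives $S^\top Q S \approx I$, a single distinct eigenvalue; a cheap practical surrogate is a diagonal (Jacobi) scaling, which tends to tighten the spectrum when the diagonal of $Q$ varies widely.

```python
import numpy as np

# Assumed illustrative example: Jacobi (diagonal) scaling as a simple pre-conditioner.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))
Q = A @ A.T + np.diag(np.arange(1.0, 11.0) ** 3)     # SPD matrix with widely varying diagonal
S = np.diag(1.0 / np.sqrt(np.diag(Q)))               # S'QS then has unit diagonal
print(np.linalg.cond(Q), np.linalg.cond(S @ Q @ S))  # eigenvalue spread before / after scaling
```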

2.7 Example
Slide 8
$$f(x) = \tfrac{1}{2}\, x^\top Q x - c^\top x$$

$$Q = \begin{pmatrix}
35 & 19 & 22 & 28 & 16 & 3 & 16 & 6 & 4 & 4 \\
19 & 43 & 33 & 19 & 5 & 2 & 5 & 4 & 0 & 0 \\
22 & 33 & 40 & 29 & 12 & 7 & 6 & 2 & 2 & 4 \\
28 & 19 & 29 & 39 & 16 & 7 & 14 & 6 & 2 & 4 \\
16 & 5 & 12 & 16 & 12 & 4 & 8 & 2 & 4 & 8 \\
3 & 2 & 7 & 7 & 4 & 5 & 1 & 0 & 1 & 4 \\
16 & 5 & 6 & 14 & 8 & 1 & 12 & 2 & 2 & 4 \\
6 & 4 & 2 & 6 & 2 & 0 & 2 & 4 & 0 & 0 \\
4 & 0 & 2 & 2 & 4 & 1 & 2 & 0 & 2 & 4 \\
4 & 0 & 4 & 4 & 8 & 4 & 4 & 0 & 4 & 16
\end{pmatrix}
\quad \text{and} \quad
c = \begin{pmatrix} -1 \\ 0 \\ 0 \\ -3 \\ 0 \\ -2 \\ 0 \\ -6 \\ -7 \\ -4 \end{pmatrix}$$
$$\kappa(Q) \approx 17{,}641, \qquad \delta = \left(\frac{\kappa(Q) - 1}{\kappa(Q) + 1}\right)^2 = 0.999774$$

Slide 9

 k    f(x^k)          f(x^k) − f(x*)    [f(x^k) − f(x*)] / [f(x^{k−1}) − f(x*)]
 1    12.000000       2593.726852       1.000000
 2    8.758578        2590.485430       0.998750
 3    1.869218        2583.596069       0.997341
 4    −12.777374      2568.949478       0.994331
 5    −30.479483      2551.247369       0.993109
 6    −187.804367     2393.922485       0.938334
 7    −309.836907     2271.889945       0.949024
 8    −408.590428     2173.136424       0.956532
 9    −754.887518     1826.839334       0.840646
10    −2567.158421    14.568431         0.007975
11    −2581.711672    0.015180          0.001042
12    −2581.726852    −0.000000         −0.000000
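The quoted rate bound $\delta$ is simple arithmetic from $\kappa(Q)$; the observed per-step ratios in the last column all sit below this bound, with the sharp drop at step 10 reflecting finite termination (a quick check; the slide's $\kappa$ is rounded, hence the small difference in the last digit):

```python
kappa = 17641.0                                # condition number quoted on Slide 9
delta = ((kappa - 1.0) / (kappa + 1.0)) ** 2
print(delta)                                   # about 0.999773, vs. 0.999774 on the slide
```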

2.8 General problems


Slide 10
Step 0 Given $x^1$, set $k := 1$, $d_1 = -\nabla f(x^1)$.
Step 1 If $\|\nabla f(x^k)\| \le \epsilon$, stop; else:

$$\hat{\alpha}_k = \arg\min_{\alpha} f(x^k + \alpha d_k) \quad \left(= \frac{-\nabla f(x^k)^\top d_k}{d_k^\top Q d_k} \text{ in the quadratic case}\right)$$

$$x^{k+1} = x^k + \hat{\alpha}_k d_k$$

$$d_{k+1} = -\nabla f(x^{k+1}) + \lambda_k d_k, \qquad \lambda_k = \frac{\|\nabla f(x^{k+1})\|^2}{\|\nabla f(x^k)\|^2}$$

Step 2 $k \leftarrow k + 1$, go to Step 1.
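A sketch of this scheme for a general smooth $f$ (my own transcription: the exact minimization over $\alpha$ is replaced by a numerical one-dimensional search, and the test function is only an illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x1, eps=1e-6, max_iter=1000):
    """Conjugate gradient for general smooth f, following Steps 0-2 above."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        alpha = minimize_scalar(lambda a: f(x + a * d)).x   # approx. argmin_a f(x + a d)
        x = x + alpha * d
        g_new = grad(x)
        lam = (g_new @ g_new) / (g @ g)    # ||grad f(x^{k+1})||^2 / ||grad f(x^k)||^2
        d = -g_new + lam * d
        g = g_new
    return x

# Illustrative use on the Rosenbrock function; the minimizer is (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, [-1.2, 1.0]))
```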

3 Necessary Optimality Conditions
3.1 Nonlinear Optimization
Slide 11
$$\min f(x) \quad \text{s.t.} \quad g_j(x) \le 0,\ j = 1, \ldots, p; \qquad h_i(x) = 0,\ i = 1, \ldots, m$$

$$P = \{x \mid g_j(x) \le 0,\ j = 1, \ldots, p;\ h_i(x) = 0,\ i = 1, \ldots, m\}$$

3.2 The KKT conditions
Slide 12
Discovered by Karush, Kuhn, and Tucker in the 1950s.
Theorem
If
• $\bar{x}$ is a local minimum of P
• $I = \{j \mid g_j(\bar{x}) = 0\}$ is the set of tight constraints
• Constraint qualification condition (CQC): the vectors $\nabla g_j(\bar{x})$, $j \in I$, and $\nabla h_i(\bar{x})$, $i = 1, \ldots, m$, are linearly independent
Slide 13
Then, there exist vectors $(u, v)$:

1. $\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$
2. $u \ge 0$
3. $u_j g_j(\bar{x}) = 0,\ j = 1, \ldots, p$

3.3 Some Intuition from LO


Slide 14
Linearize the functions in the neighborhood of the solution $\bar{x}$. The problem becomes:

$$\min\ f(\bar{x}) + \nabla f(\bar{x})^\top (x - \bar{x})$$
$$\text{s.t.}\ g_j(\bar{x}) + \nabla g_j(\bar{x})^\top (x - \bar{x}) \le 0, \quad j \in I$$
$$h_i(\bar{x}) + \nabla h_i(\bar{x})^\top (x - \bar{x}) = 0, \quad i = 1, \ldots, m$$

Slide 15
This is an LO problem. Dual feasibility:

$$\sum_{j \in I} \hat{u}_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} \hat{v}_i \nabla h_i(\bar{x}) = \nabla f(\bar{x}), \qquad \hat{u}_j \le 0$$

Change to $u_j = -\hat{u}_j$, $v_i = -\hat{v}_i$ to obtain:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0, \qquad u_j \ge 0$$

3.4 Example 1
Slide 16
$$\min f(x) = (x_1 - 12)^2 + (x_2 + 6)^2$$
$$\text{s.t.}\ h_1(x) = 8x_1 + 4x_2 - 20 = 0$$
$$g_1(x) = x_1^2 + x_2^2 + 3x_1 - 4.5x_2 - 6.5 \le 0$$
$$g_2(x) = (x_1 - 9)^2 + x_2^2 - 64 \le 0$$

$x = (2, 1)^\top$; $g_1(x) = 0$, $g_2(x) = -14$, $h_1(x) = 0$.
Slide 17

• $I = \{1\}$
• $\nabla f(x) = (-20, 14)^\top$; $\nabla g_1(x) = (7, -2.5)^\top$
• $\nabla g_2(x) = (-14, 2)^\top$; $\nabla h_1(x) = (8, 4)^\top$
• $u_1 = 4$, $u_2 = 0$, $v_1 = -1$
• $\nabla g_1(x)$, $\nabla h_1(x)$ linearly independent
• $\nabla f(x) + u_1 \nabla g_1(x) + u_2 \nabla g_2(x) + v_1 \nabla h_1(x) = 0$ (verified numerically below):

$$\begin{pmatrix} -20 \\ 14 \end{pmatrix} + 4 \begin{pmatrix} 7 \\ -2.5 \end{pmatrix} + 0 \begin{pmatrix} -14 \\ 2 \end{pmatrix} + (-1) \begin{pmatrix} 8 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
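A minimal sketch reproducing this check from the problem data of Slide 16 (the gradients are differentiated by hand):

```python
import numpy as np

x = np.array([2.0, 1.0])                                # candidate point
grad_f  = np.array([2 * (x[0] - 12), 2 * (x[1] + 6)])   # (-20, 14)
grad_g1 = np.array([2 * x[0] + 3, 2 * x[1] - 4.5])      # (7, -2.5)
grad_g2 = np.array([2 * (x[0] - 9), 2 * x[1]])          # (-14, 2)
grad_h1 = np.array([8.0, 4.0])
u1, u2, v1 = 4.0, 0.0, -1.0

print(grad_f + u1 * grad_g1 + u2 * grad_g2 + v1 * grad_h1)   # [0. 0.]
```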

3.5 Example 2
Slide 18
$$\max x^\top Q x \quad \text{s.t.}\ x^\top x \le 1$$

$Q$ arbitrary; not a convex optimization problem. Equivalently,

$$\min -x^\top Q x \quad \text{s.t.}\ x^\top x \le 1$$

3.5.1 KKT
Slide 19
$$-2Qx + 2ux = 0$$
$$x^\top x \le 1$$
$$u \ge 0$$
$$u(1 - x^\top x) = 0$$
Slide 20

3.5.2 Solutions of KKT


• $x = 0$, $u = 0$, Obj $= 0$.
• $x \neq 0 \Rightarrow Qx = ux \Rightarrow x$ is an eigenvector of $Q$ with non-negative eigenvalue $u$.
• $x^\top Q x = u\, x^\top x = u$.
• Thus, pick the largest nonnegative eigenvalue $\hat{u}$ of $Q$. The solution is the corresponding eigenvector $\hat{x}$ normalized such that $\hat{x}^\top \hat{x} = 1$. If all eigenvalues are negative, $\hat{x} = 0$ (see the sketch below).
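This recipe translates directly into code (a minimal sketch; the matrix is an arbitrary symmetric example, since only the symmetric part of $Q$ matters in $x^\top Q x$):

```python
import numpy as np

Q = np.array([[1.0, 2.0], [2.0, -3.0]])        # assumed symmetric example
eigvals, eigvecs = np.linalg.eigh(Q)           # eigh returns eigenvalues in ascending order
if eigvals[-1] > 0:
    x_hat = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue, unit norm
else:
    x_hat = np.zeros(Q.shape[0])               # all eigenvalues negative: the optimum is x = 0

print(x_hat @ Q @ x_hat, eigvals[-1])          # with this Q, the optimal value is the largest eigenvalue
```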

3.6 Are CQC Necessary?
Slide 21
$$\min x_1 \quad \text{s.t.}\ x_1^2 - x_2 \le 0,\ x_2 = 0$$

The feasible space is the single point $(0, 0)$.
KKT:

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} + u \begin{pmatrix} 2x_1 \\ -1 \end{pmatrix} + v \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

KKT multipliers do not exist, while $(0, 0)^\top$ is still a local minimum. Check $\nabla g_1(0, 0) = (0, -1)^\top$ and $\nabla h_1(0, 0) = (0, 1)^\top$: they are linearly dependent (the CQC fails), and since both have first component $0$, no choice of $u, v$ can cancel the $1$ in the first component of $\nabla f$.

3.7 Constraint Qualification


Slide 22
Slater condition: There exists an x0 such that gj (x0 ) < 0, j = 1, . . . , p, and
hi (x0 ) = 0 for all i = 1, . . . , m.

Theorem Under the Slater condition the KKT conditions are necessary.

4 Sufficient Optimality Conditions
Slide 23
Theorem If
• $\bar{x}$ is feasible for P
• The feasible set P is convex and $f(x)$ is convex
• There exist vectors $(u, v)$, $u \ge 0$:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$$

$$u_j g_j(\bar{x}) = 0, \quad j = 1, \ldots, p$$

Then, $\bar{x}$ is a global minimum of P.

4.1 Proof
Slide 24
• Let $x \in P$. Then $(1 - \lambda)\bar{x} + \lambda x \in P$ for $\lambda \in [0, 1]$.
• $g_j(\bar{x} + \lambda(x - \bar{x})) \le 0 \Rightarrow$

$$\nabla g_j(\bar{x})^\top (x - \bar{x}) \le 0$$

• Similarly, $h_i(\bar{x} + \lambda(x - \bar{x})) = 0 \Rightarrow$

$$\nabla h_i(\bar{x})^\top (x - \bar{x}) = 0$$

Slide 25
• Thus,

$$\nabla f(\bar{x})^\top (x - \bar{x}) = -\left( \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) \right)^\top (x - \bar{x}) \ge 0$$

$\Rightarrow f(x) \ge f(\bar{x})$.

5 Convex Optimization
Slide 26
• The KKT conditions are always necessary under CQC.
• The KKT conditions are sufficient for convex optimization problems.
• The KKT conditions are necessary and sufficient for convex optimization problems under CQC.
• For $\min f(x)$ s.t. $Ax = b$, $x \ge 0$, with $f(x)$ convex, the KKT conditions are necessary and sufficient even without CQC.

5.0.1 Separating hyperplanes


Slide 27
Theorem Let $S$ be a nonempty closed convex subset of $\mathbb{R}^n$ and let $x^* \in \mathbb{R}^n$ be a vector that does not belong to $S$. Then, there exists some vector $c \in \mathbb{R}^n$ such that $c^\top x^* < c^\top x$ for all $x \in S$.
Proof in BT, p. 170.
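A small numerical illustration of the theorem (an assumed example, not from the text: $S$ is the closed unit ball, and the separating direction is obtained from the projection of $x^*$ onto $S$):

```python
import numpy as np

x_star = np.array([3.0, 1.0])                # a point outside the unit ball S
proj = x_star / np.linalg.norm(x_star)       # projection of x* onto S
c = proj - x_star                            # candidate separating direction

rng = np.random.default_rng(0)
pts = rng.standard_normal((1000, 2))
pts /= np.maximum(1.0, np.linalg.norm(pts, axis=1, keepdims=True))   # sample points of S

print(c @ x_star < (pts @ c).min())          # True: c'x* < c'x for the sampled x in S
```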

5.1 Sketch of the Proof under convexity


Slide 28
• Suppose $\bar{x}$ is a local (and thus global) optimal solution.
• The system $f(x) < f(\bar{x})$, $g_j(x) \le 0$, $j = 1, \ldots, p$, $h_i(x) = 0$, $i = 1, \ldots, m$, is infeasible.
• Let $U = \{(u_0, u, v) \mid \text{there exists } x: f(x) < u_0,\ g_j(x) \le u_j,\ h_i(x) = v_i\}$.
• $(f(\bar{x}), 0, 0) \notin U$.
• $U$ is convex.
Slide 29
• By the separating hyperplane theorem, there is a vector $(c_0, c, d)$:

$$c_0 u_0 + \sum_{j=1}^{p} c_j u_j + \sum_{i=1}^{m} d_i v_i > c_0 f(\bar{x}) \quad \forall\ (u_0, u, v) \in U.$$


• $c_0 \ge 0$ and $c_j \ge 0$ for $j \in I$ (constraint $g_j(\bar{x}) \le 0$ tight). Why?

If $(u_0, u, v) \in U$, then $(u_0 + \lambda, u, v) \in U$ for $\lambda \ge 0$. Thus,

$$\forall\ \lambda \ge 0, \quad c_0 (u_0 + \lambda) + \sum_{j=1}^{p} c_j u_j + \sum_{i=1}^{m} d_i v_i > c_0 f(\bar{x}) \ \Rightarrow\ c_0 \ge 0.$$

• Select $(u_0, u, v) = (f(x) + \lambda, g_1(x), \ldots, g_p(x), h_1(x), \ldots, h_m(x)) \in U$ (any $x$, $\lambda > 0$):

$$c_0 (f(x) + \lambda) + \sum_{j=1}^{p} c_j g_j(x) + \sum_{i=1}^{m} d_i h_i(x) > c_0 f(\bar{x})$$

• Take $\lambda \to 0$:

$$c_0 f(x) + \sum_{j=1}^{p} c_j g_j(x) + \sum_{i=1}^{m} d_i h_i(x) \ge c_0 f(\bar{x})$$

• $c_0 > 0$ (constraint qualification needed here).

• Dividing by $c_0$ and setting $u_j = c_j / c_0$, $v_i = d_i / c_0$:

$$f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x) \ge f(\bar{x}), \qquad u_j \ge 0$$


$$f(\bar{x}) + \sum_{j=1}^{p} u_j g_j(\bar{x}) + \sum_{i=1}^{m} v_i h_i(\bar{x}) \le f(\bar{x}) \le f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x).$$

• Thus,

$$f(\bar{x}) = \min_{x} \left( f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x) \right)$$

$$\sum_{j=1}^{p} u_j g_j(\bar{x}) = 0 \ \Rightarrow\ u_j g_j(\bar{x}) = 0$$

• Unconstrained optimality conditions:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$$

6 Applications
6.1 Linear Optimization
Slide 30
$$\min c^\top x \quad \text{s.t.}\ Ax = b,\ x \ge 0$$

Rewritten as

$$\min c^\top x \quad \text{s.t.}\ Ax - b = 0,\ -x \le 0$$

6.1.1 KKT
Slide 31
$$c + A^\top \hat{u} - v = 0$$
$$v \ge 0$$
$$v_j x_j = 0$$
$$Ax - b = 0$$
$$x \ge 0$$

With $u = -\hat{u}$:

$$A^\top u \le c \quad \text{(dual feasibility)}$$
$$(c_j - A_j^\top u)\, x_j = 0 \quad \text{(complementarity)}$$
$$Ax - b = 0 \quad \text{(primal feasibility)}$$
$$x \ge 0 \quad \text{(primal feasibility)}$$
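A tiny numerical instance makes the mapping concrete (an assumed toy LP with the primal and dual solutions supplied by hand, then checked against the conditions above):

```python
import numpy as np

# Assumed toy LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x = np.array([1.0, 0.0])                   # candidate primal optimum
u = np.array([1.0])                        # candidate dual vector

print(np.all(A.T @ u <= c + 1e-12))        # dual feasibility:   A'u <= c
print((c - A.T @ u) @ x)                   # complementarity:    sum_j (c_j - A_j'u) x_j = 0.0
print(A @ x - b, bool(np.all(x >= 0)))     # primal feasibility: Ax = b, x >= 0
```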

6.2 Portfolio Optimization
Slide 32
$x$ = weights of the portfolio

$$\max\ r^\top x - \lambda\, \tfrac{1}{2}\, x^\top Q x \quad \text{s.t.}\ e^\top x = 1$$

$$\min\ -r^\top x + \lambda\, \tfrac{1}{2}\, x^\top Q x \quad \text{s.t.}\ e^\top x = 1$$

6.2.1 KKT
Slide 33
$$-r + \lambda Q x + u e = 0$$

$$x = \frac{1}{\lambda} Q^{-1} (r - u e)$$

$$e^\top x = 1 \ \Rightarrow\ e^\top Q^{-1} (r - u e) = \lambda \ \Rightarrow\ u = \frac{e^\top Q^{-1} r - \lambda}{e^\top Q^{-1} e}$$

As $\lambda$ changes, the tradeoff between risk and return changes, and the allocation changes as well. This is the essence of modern portfolio theory.
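The closed form above is easy to check numerically (a minimal sketch; $Q$, $r$, and $\lambda$ are illustrative data):

```python
import numpy as np

Q = np.array([[0.10, 0.02, 0.01],          # assumed covariance matrix
              [0.02, 0.08, 0.03],
              [0.01, 0.03, 0.12]])
r = np.array([0.05, 0.07, 0.06])           # assumed expected returns
e = np.ones(3)
lam = 2.0                                  # risk-aversion parameter

Qinv = np.linalg.inv(Q)
u = (e @ Qinv @ r - lam) / (e @ Qinv @ e)  # multiplier enforcing e'x = 1
x = (1.0 / lam) * Qinv @ (r - u * e)       # optimal weights

print(x, e @ x)                            # weights sum to 1
print(-r + lam * Q @ x + u * e)            # stationarity residual: ~[0 0 0]
```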

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
