Presentation Opt - Lecture3 - 2019
Andrew Lesniewski
Baruch College
New York
Fall 2019
Outline
1 Linear programming
4 Simplex method
min f (x) = ∑_{j=1}^m cj xj , subject to
∑_{j=1}^m aij xj ≥ bi , for i = 1, . . . , n,
xj ≥ 0, for j = 1, . . . , m.
Example 3. There are p people available to carry out q tasks. How should the
tasks be assigned so that the total value is maximized?
Let cij denote the daily value of person i = 1, . . . , p carrying out task j = 1, . . . , q,
and let xij denote the fraction of person i's workday spent on task j.
This leads to the following optimization problem:
max f (x) = ∑_{i=1}^p ∑_{j=1}^q cij xij , subject to
∑_{j=1}^q xij ≤ 1, for i = 1, . . . , p,
∑_{i=1}^p xij ≤ 1, for j = 1, . . . , q,
xij ≥ 0, for i = 1, . . . , p, j = 1, . . . , q.
The first constraint guarantees that no one works more than a day, while the
second guarantees that at most one full workday of effort in total is devoted to each task.
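As a quick numerical illustration, this LP can be handed directly to an off-the-shelf solver. Below is a minimal sketch using scipy.optimize.linprog with p = q = 2 and a made-up value matrix (the numbers are an assumption for illustration only); since linprog minimizes, the objective is negated.

# Minimal sketch of the assignment LP of Example 3 for p = q = 2.
# The value matrix c_val is made up purely for illustration.
import numpy as np
from scipy.optimize import linprog

c_val = np.array([[3.0, 1.0],    # c_val[i, j] = daily value of person i on task j
                  [2.0, 4.0]])
p, q = c_val.shape

# Variables x[i, j], flattened row by row: (x11, x12, x21, x22).
c = -c_val.flatten()             # maximize sum of c_ij x_ij <=> minimize the negative

A_ub, b_ub = [], []
for i in range(p):               # sum_j x[i, j] <= 1: person i works at most one day
    row = np.zeros(p * q)
    row[i * q:(i + 1) * q] = 1.0
    A_ub.append(row)
    b_ub.append(1.0)
for j in range(q):               # sum_i x[i, j] <= 1: task j gets at most one workday
    row = np.zeros(p * q)
    row[j::q] = 1.0
    A_ub.append(row)
    b_ub.append(1.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
print(res.x.reshape(p, q))       # optimal fractions x_ij
print(-res.fun)                  # total value achieved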
Example 4. There are p factories that supply a certain product, and there are q
sellers to which this product is shipped. How should the market demand be met
at a minimum cost?
Assume that factory i = 1, . . . , p can supply si units of the product, while seller
j = 1, . . . , q requires at least dj units of the product. Let cij be the cost of
transporting one unit of the product from factory i to seller j, and let xij denote the
number of units shipped from factory i to seller j.
This leads to the following optimization problem:
min f (x) = ∑_{i=1}^p ∑_{j=1}^q cij xij , subject to
∑_{j=1}^q xij ≤ si , for i = 1, . . . , p,
∑_{i=1}^p xij ≥ dj , for j = 1, . . . , q,
xij ≥ 0, for i = 1, . . . , p, j = 1, . . . , q.
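The transportation LP can be set up for a solver in the same way. In the minimal sketch below the supplies, demands, and unit costs are made up purely for illustration; the demand constraints ∑_i xij ≥ dj are passed to scipy.optimize.linprog as −∑_i xij ≤ −dj.

# Minimal sketch of the transportation LP of Example 4 (p = 2 factories, q = 2 sellers).
# Supplies, demands, and unit costs below are made up for illustration.
import numpy as np
from scipy.optimize import linprog

cost = np.array([[4.0, 6.0],     # cost[i, j] = cost of shipping one unit from i to j
                 [5.0, 3.0]])
supply = np.array([30.0, 25.0])  # s_i
demand = np.array([20.0, 30.0])  # d_j
p, q = cost.shape

c = cost.flatten()               # variables x[i, j], flattened row by row

A_ub, b_ub = [], []
for i in range(p):               # sum_j x[i, j] <= s_i
    row = np.zeros(p * q)
    row[i * q:(i + 1) * q] = 1.0
    A_ub.append(row)
    b_ub.append(supply[i])
for j in range(q):               # sum_i x[i, j] >= d_j, written as -sum_i x[i, j] <= -d_j
    row = np.zeros(p * q)
    row[j::q] = -1.0
    A_ub.append(row)
    b_ub.append(-demand[j])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
print(res.x.reshape(p, q))       # optimal shipping plan
print(res.fun)                   # minimal total cost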
Example
−x1 + x2 ≤ 1,
x1 , x2 ≥ 0.
Examples
Standard form
An inequality constraint
∑_{j=1}^n aij xj ≤ bi
is converted to an equality constraint by introducing a slack variable si :
∑_{j=1}^n aij xj + si = bi ,
si ≥ 0.
Example 6. Consider the problem:
min −x1 − x2 , subject to 2x1 + x2 ≤ 12, x1 + 2x2 ≤ 9, x1 , x2 ≥ 0.
Introducing slack variables x3 and x4 , this problem can be written in the standard
form:
min −x1 − x2 , subject to
2x1 + x2 + x3 = 12,
x1 + 2x2 + x4 = 9,
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0.
Example
Splitting the free variables as x1 = x1+ − x1− and x2 = x2+ − x2− , the problem can be
written in the standard form:
min x2+ − x2− , subject to
x1+ − x1− + x2+ − x2− − s1 = 1,
x1+ − x1− − x2+ + x2− + s2 = 0,
x1+ ≥ 0, x1− ≥ 0, x2+ ≥ 0, x2− ≥ 0, s1 ≥ 0, s2 ≥ 0.
Let us go back to Example 6 written in the standard form, and consider a few
feasible solutions, such as x = (0, 0, 12, 9), with f (x) = 0, and x = (5, 2, 0, 0), with f (x) = −7.
On the other hand, the constraints and the condition x ≥ 0 yield lower bounds on the
objective function:
−x1 − x2 ≥ −2x1 − x2 − x3 = −12,
and, with a better choice of multipliers,
−x1 − x2 ≥ −x1 − x2 − (1/3)x3 − (1/3)x4
        = −(1/3)(2x1 + x2 + x3 ) − (1/3)(x1 + 2x2 + x4 )
        = −7.
The last lower bound means that f (x) ≥ −7 for any feasible solution.
Since we have already found a feasible solution saturating this bound, namely
x = (5, 2, 0, 0), it means that this x is an optimal solution to the problem.
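This conclusion is easy to confirm with an off-the-shelf solver; a minimal sketch using scipy.optimize.linprog on the standard-form problem:

# Solve Example 6 in standard form and confirm the optimum -7 at x = (5, 2, 0, 0).
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0, 0.0, 0.0])       # objective -x1 - x2
A_eq = np.array([[2.0, 1.0, 1.0, 0.0],     # 2 x1 + x2 + x3 = 12
                 [1.0, 2.0, 0.0, 1.0]])    # x1 + 2 x2 + x4 = 9
b_eq = np.array([12.0, 9.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print(res.x)    # approximately [5. 2. 0. 0.]
print(res.fun)  # approximately -7.0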
To find the best possible lower bound of this form, we look for multipliers y1 , y2 that
maximize 12y1 + 9y2 , subject to
2y1 + y2 ≤ −1,
y1 + 2y2 ≤ −1,
y1 , y2 ≤ 0.
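Solving this small maximization problem numerically recovers the best bound −7, attained at y1 = y2 = −1/3 (the multipliers used above). A minimal sketch with scipy.optimize.linprog, which minimizes, so the objective is negated:

# Best lower bound for Example 6: maximize 12 y1 + 9 y2 subject to the constraints above.
import numpy as np
from scipy.optimize import linprog

c = np.array([-12.0, -9.0])      # minimize -(12 y1 + 9 y2)
A_ub = np.array([[2.0, 1.0],     # 2 y1 +   y2 <= -1
                 [1.0, 2.0]])    #   y1 + 2 y2 <= -1
b_ub = np.array([-1.0, -1.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, 0), method="highs")
print(res.x)     # approximately [-1/3, -1/3]
print(-res.fun)  # approximately -7.0, the primal optimal value again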
Duality
For a general problem (2) (called the primal problem), we consider the
corresponding dual problem:
max bT y, subject to AT y ≤ c.
Duality
Weak duality theorem. Let x be a feasible solution to the primal problem, and let
y be a feasible solution to the dual problem. Then
c T x ≥ bT y. (5)
0 ≤ (c − AT y)T x
= c T x − y T Ax
= c T x − y T b.
Duality
y T (Ax − b) = 0,
(c − AT y)T x = 0.     (6)
AT λ + s = c,
Ax = b,
x ≥ 0, (8)
s ≥ 0,
xi si = 0, for all i = 1, . . . , n.
c T x ∗ = (AT λ∗ + s∗ )T x ∗
       = λ∗T Ax ∗ + s∗T x ∗
       = (Ax ∗ )T λ∗ + x ∗T s∗
       = bT λ∗ ,     (9)
where the last equality uses Ax ∗ = b and x ∗T s∗ = 0.
In other words, the Lagrange multipliers can be identified with the dual variables
y in (4), and bT λ is the objective function for the dual problem!
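For Example 6, the system (8) can be checked directly: take x ∗ = (5, 2, 0, 0) and λ∗ = (−1/3, −1/3), the multipliers found in the lower bound argument. A minimal numerical sketch:

# Check the KKT system (8) for Example 6 at x* = (5, 2, 0, 0), lambda* = (-1/3, -1/3).
import numpy as np

A = np.array([[2.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
b = np.array([12.0, 9.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])

x_star = np.array([5.0, 2.0, 0.0, 0.0])
lam_star = np.array([-1.0 / 3.0, -1.0 / 3.0])

s_star = c - A.T @ lam_star                  # A^T lambda + s = c  =>  s = c - A^T lambda
print(np.allclose(A @ x_star, b))            # Ax = b                   -> True
print(np.all(x_star >= 0), np.all(s_star >= -1e-12))   # x >= 0, s >= 0 -> True True
print(np.allclose(x_star * s_star, 0.0))     # x_i s_i = 0 for all i    -> True
print(b @ lam_star, c @ x_star)              # both approximately -7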
Conversely, we can apply the KKT conditions (via the corresponding Lagrange
function) to the dual problem (3). This leads to the following conditions:
Ax = b,
AT y ≤ c,
(11)
x ≥ 0,
x T (c − AT y) = 0.
The primal-dual relationship is symmetric: by taking the dual of the dual problem,
we recover the original problem.
The dual problem provides a useful intuitive interpretation of the primal problem.
As an illustration, consider the nutrition problem in Example 2.
The dual constraints read ∑_{i=1}^n aij λi ≤ cj , j = 1, . . . , m, and so λi represents
the unit price of nutrient i.
Therefore, the dual objective function ∑_{i=1}^n λi bi represents the cost of the daily
nutrients that the (imagined) salesman of nutrients is trying to maximize.
The optimal values λ∗i of the dual variables are called the shadow prices of the
nutrients i.
Even though the nutrients cannot be directly purchased, the shadow prices
represent their actual market values.
Another way of interpreting λ∗ is as the sensitivities of the optimal value of the
primal problem to changes in the constraint levels bi .
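This sensitivity interpretation is easy to probe numerically. In the sketch below (reusing Example 6), increasing b1 from 12 to 13 should change the optimal value by approximately λ∗1 = −1/3:

# Shadow price check for Example 6: perturb b1 and compare the change in the
# optimal value with lambda*_1 = -1/3.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0, 0.0, 0.0])
A_eq = np.array([[2.0, 1.0, 1.0, 0.0],
                 [1.0, 2.0, 0.0, 1.0]])

def optimal_value(b1):
    res = linprog(c, A_eq=A_eq, b_eq=np.array([b1, 9.0]),
                  bounds=(0, None), method="highs")
    return res.fun

print(optimal_value(13.0) - optimal_value(12.0))   # approximately -1/3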
Polyhedra
Extreme points
Let a1T , . . . , amT denote the row vectors of the matrix A. In terms of these vectors,
the system of active constraints aiT x = bi , i ∈ A(x ∗ ), has a unique solution if and
only if there exist n vectors in the set {ai : i ∈ A(x ∗ )} which are linearly
independent (why?).
We will refer to the constraints as linearly independent if the vectors ai are
linearly independent.
Basic solutions
Extreme points
The proof of the theorem is fun: Let Q be the (nonempty) set of all optimal
solutions, and let v be the optimal value of the objective function c T x.
Then Q = {x ∈ Rn : Ax ≥ b, c T x = v }, which is also a polyhedron. Since
Q ⊂ P and P does not contain a line, Q does not contain a line, and so Q has
an extreme point x ∗ .
We will show that x ∗ is also an extreme point of P.
Assume that x ∗ = λy + (1 − λ)z, for y, z ∈ P and λ ∈ (0, 1). Then
v = c T x ∗ = λc T y + (1 − λ)c T z.
Since c T y ≥ v and c T z ≥ v, both inequalities must in fact be equalities, and so
y, z ∈ Q. Since x ∗ is an extreme point of Q, y and z cannot be distinct, which
shows that x ∗ is an extreme point of P as well.
Degeneracy
x1 + x2 + 2x3 ≤ 8
x2 + 6x3 ≤ 12
x1 ≤ 4
x2 ≤ 6.
x1 , x2 , x3 ≥ 0.
Degeneracy
2x1 + x2 + x3 = 2,
x1 + x2 = 1,
x1 + x3 = 1,
x1 , x2 , x3 ≥ 0.
The corresponding matrix A has rank 2. The first constraint is redundant (it is the
sum of the second and third constraints), and can be eliminated without
changing the problem.
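This is easy to confirm with a quick rank computation (a minimal numpy sketch):

# The constraint matrix above has rank 2, and row 1 equals row 2 + row 3.
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
print(np.linalg.matrix_rank(A))          # 2
print(np.allclose(A[0], A[1] + A[2]))    # True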
B = (Ar1 . . . Arm ).
Basic solutions
Permuting the columns of A we write it in the block form (B N). Under the same
permutation, a vector x can be written in the block form x = (xB , xN ).
The equation Ax = b then reads BxB + NxN = b. Setting the nonbasic variables
xN = 0, this reduces to
BxB = b.
Basic solutions
There is a key link between the geometric concept of a vertex of a polyhedron and
the analytic concept of a BFS, given by the theorem below.
Theorem. A vector x ∗ is a BFS, if and only if it is a vertex in P.
For the proof, we assume that x ∗ is not an extreme point of P, i.e. it can be
represented as x ∗ = λy + (1 − λ)z, with 0 < λ < 1, and distinct y, z ∈ P.
But then also xN∗ = λyN + (1 − λ)zN . However, since xN∗ = 0, and y, z ≥ 0
(since they are elements of P), it follows that also yN = 0 and zN = 0.
Since BxB∗ = b, we also must have ByB = b and BzB = b (because
xN∗ = yN = zN = 0).
This implies that xB∗ = yB = zB (= B −1 b), and so x ∗ = y = z. This contradiction
means that x ∗ is extreme.
Basic solutions
Adjacent BFSs
We will now proceed to describe an algorithm for moving from one BFS to
another, and to decide when to stop the search.
We start with the following definition.
(i) Two BFSs are adjacent, if their basic matrices differ in one basic column
only.
(ii) Let x ∈ P. A vector d ∈ Rn is a feasible direction at x, if there is a positive
number θ such that x + θd ∈ P.
(iii) A vector d ∈ Rn is an improving direction, if c T d < 0.
In other words, moving from x in an improving direction d lowers the value of the
objective function c T x by c T d.
Adjacent BFSs
Ad = 0. (14)
The strategy is, starting from a BFS, to find an improving feasible direction
towards an adjacent BFS.
Adjacent BFSs
We move in the j-th basic direction d = (dB dN ) that has exactly one positive
component corresponding to a non-basic variable.
When moving in the basic direction, the nonbasic variable xj = 0 becomes
positive, while the other nonbasic variables remain zero. We say that xj enters
the basis.
Specifically, we select a nonbasic variable xj and set
dj = 1,
di = 0, for every nonbasic index i ≠ j.
Adjacent BFSs
0 = Ad
  = ∑_{i=1}^n Ai di
  = ∑_{i=1}^m Ari dri + Aj
  = BdB + Aj ,
and so dB = −B −1 Aj .
Adjacent BFSs
Reduced cost
We will now study the effect of moving in the j-th basic direction on the objective
function.
Let x be a basic solution with basis matrix B, and let cB be the vector of the costs
of the basic variables.
For each i = 1, . . . , n the reduced cost c̄i of xi is defined by
c̄i = ci − cBT B −1 Ai .
Example
Example 10. Consider the following problem. For x ∈ R4 ,
min c T x, subject to
x1 + x2 + x3 + x4 = 2,
2x1 + 3x3 + 4x4 = 2,
xi ≥ 0, i = 1, . . . , 4.
Since the corresponding columns of A are linearly independent, we can choose x1
and x2 as our basic variables, with the basis matrix
B = ( 1  1
      2  0 ) .
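The corresponding basic solution is obtained by solving BxB = b and setting the nonbasic variables to zero; a minimal numpy sketch:

# Basic solution of Example 10 for the basis {x1, x2}: solve B xB = b, set xN = 0.
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 1.0],
              [2.0, 0.0, 3.0, 4.0]])
b = np.array([2.0, 2.0])

B = A[:, [0, 1]]                 # basis matrix built from the columns A1, A2
xB = np.linalg.solve(B, b)
x = np.zeros(4)
x[[0, 1]] = xB
print(x)                         # [1. 1. 0. 0.]; nonnegative, hence a BFS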
Example
dB = −B −1 A3
   = − ( 0    1/2 ) ( 1 )
       ( 1   −1/2 ) ( 3 )
   = ( −3/2 )
     (  1/2 ) .
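The same computation in numpy, as a quick check of the arithmetic:

# Third basic direction for Example 10: dB = -B^{-1} A3.
import numpy as np

B = np.array([[1.0, 1.0],
              [2.0, 0.0]])
A3 = np.array([1.0, 3.0])
dB = -np.linalg.solve(B, A3)
print(dB)                        # [-1.5  0.5]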
Optimality condition
Therefore,
c T d = cBT dB + ∑_j cj dj
     = ∑_j (cj − cBT B −1 Aj )dj
     = ∑_j c̄j dj ,
where the sums range over the nonbasic indices j.
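To make this concrete, the sketch below computes the reduced costs for Example 10 with the basis {x1 , x2 }. The slide leaves the cost vector c unspecified, so an illustrative choice is used here (an assumption, not part of the example):

# Reduced costs c_bar_j = c_j - c_B^T B^{-1} A_j for Example 10, basis {x1, x2}.
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 1.0],
              [2.0, 0.0, 3.0, 4.0]])
c = np.array([2.0, 0.0, 1.0, 3.0])       # assumed costs, not given on the slide

basic = [0, 1]
B = A[:, basic]
cB = c[basic]
y = np.linalg.solve(B.T, cB)             # y = B^{-T} c_B, so c_bar = c - A^T y
c_bar = c - A.T @ y
print(c_bar)                             # reduced costs; zero for the basic variables

With this particular choice of c, c̄3 < 0, so the third basic direction computed earlier is an improving direction.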
Step size
Let d be a basic, feasible, improving direction from the current BFS x, and let B
be the basis matrix for x.
We wish to move by the amount of θ > 0 in the direction d in order to find a BFS
x 0 adjacent to x. This takes us to the point x + θ∗ d, where
θ∗ = max{θ ≥ 0 : x + θd ∈ P}.
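Only the basic components of x can decrease along d, so θ∗ is computed by the usual ratio test over the basic components with dB (i) < 0. A minimal sketch, applied to the numbers of Example 10 (xB = (1, 1) as computed in the sketch above, dB = (−3/2, 1/2)):

# Ratio test: largest theta >= 0 such that x + theta d stays feasible.
# Only basic components with dB[i] < 0 can hit their lower bound 0.
import numpy as np

def ratio_test(xB, dB):
    neg = dB < 0
    if not np.any(neg):
        return np.inf, None                  # the direction is unbounded
    ratios = -xB[neg] / dB[neg]
    l = np.argmin(ratios)
    return ratios[l], np.where(neg)[0][l]    # step size and position leaving the basis

theta, leave = ratio_test(np.array([1.0, 1.0]), np.array([-1.5, 0.5]))
print(theta, leave)                          # 2/3 and 0, i.e. x1 leaves the basis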
Step size
xrl + θ∗ drl = 0.
This means that the basic variable xrl has become 0, whereas the nonbasic
variable xj has become positive. This indicates that, in the next iteration, the
index j should replace rl .
In other words, the new basis matrix B̄ is obtained from B by replacing its column
Arl with the column Aj .
The columns Ari , i ≠ l, and Aj are linearly independent and form the new basis
matrix B̄.
The vector y = x + θ∗ d is a BFS corresponding to B̄.
Degenerate problems
As usual, starting the iteration may sometimes not be easy, and finding an initial
BFS may prove challenging.
One strategy is to solve an auxiliary problem.
For example, if we want a BFS with xi = 0, we set the objective function to xi and
find the optimal solution to this problem.
If the optimal value is 0 then we found a BFS with xi = 0, otherwise there is no
such feasible solution.
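A minimal sketch of this idea, reusing the constraints of Example 6 and asking whether a feasible solution with x1 = 0 exists:

# Auxiliary problem: minimize x1 over the feasible set of Example 6.
# If the optimal value is 0, a feasible solution with x1 = 0 exists.
import numpy as np
from scipy.optimize import linprog

A_eq = np.array([[2.0, 1.0, 1.0, 0.0],
                 [1.0, 2.0, 0.0, 1.0]])
b_eq = np.array([12.0, 9.0])
c = np.array([1.0, 0.0, 0.0, 0.0])       # objective: x1

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print(res.fun)   # 0.0, so x1 = 0 is attainable (e.g. x = (0, 0, 12, 9))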