
Convex Optimization

Lecture 02: Chapter 6

Approximation and Fitting

Jingwei Liang
Institute of Natural Sciences and School of Mathematical Sciences

Email: [email protected]
Office: Room 355, No. 6 Science Building
Regression example: least-squares regression, non-negative least squares

Norm approximation

Least-norm approximation

Regularized approximation

Extra materials: image processing, modern regularization, applications, non-smooth optimization
6.1 Regression example
Least-squares regression, non-negative least squares
Least-squares regression

Observation model. Let m ∈ N₊₊. For i = 1, ..., m, given each x_i ∈ R,

    y_i = a x_i + b + ε_i,

with ε_i being random noise.

Matrix-vector representation:

    A = [ x_1 1 ; x_2 1 ; ⋯ ; x_m 1 ] ∈ R^{m×2},   x = [ a ; b ],   y = [ y_1 ; y_2 ; ⋯ ; y_m ],   ε = [ ε_1 ; ε_2 ; ⋯ ; ε_m ].

The system of equations

    a x_1 + b + ε_1 = y_1,  ...,  a x_m + b + ε_m = y_m

then reads y = Ax + ε.

Least-squares regression estimates x from y:

    minimize_{x ∈ R²}  ∥Ax − y∥₂².

Assume that A has full column rank. Then

    x minimizes ∥Ax − y∥₂²  ⟺  Aᵀ(Ax − y) = 0  ⟺  AᵀAx = Aᵀy  ⟺  x = (AᵀA)⁻¹Aᵀy.

If AᵀA is not invertible, the gradient iteration

    x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ − γ_k Aᵀ(Ax⁽ᵏ⁾ − y) → x⋆,

with suitable step sizes γ_k, still converges to x⋆, "a" solution of the problem.

[Figure: data points with the fitted line y = 0.23x − 0.08.]

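To make this concrete, here is a minimal numerical sketch (Python with NumPy; the data are synthetic and the names are illustrative) comparing the normal-equation solution with the gradient iteration above.

```python
# Minimal sketch of the least-squares fit above (assumed synthetic data).
import numpy as np

rng = np.random.default_rng(0)
m = 50
x_data = rng.uniform(0, 3, m)
y_data = 0.23 * x_data - 0.08 + 0.05 * rng.standard_normal(m)   # y_i = a x_i + b + eps_i

A = np.column_stack([x_data, np.ones(m)])        # rows [x_i, 1]
# Closed form via the normal equations (A has full column rank here).
a_hat, b_hat = np.linalg.solve(A.T @ A, A.T @ y_data)

# Gradient descent, which works even when A^T A is singular.
xk = np.zeros(2)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2          # step size below 1 / ||A||^2
for _ in range(5000):
    xk -= gamma * A.T @ (A @ xk - y_data)
print(a_hat, b_hat, xk)                          # the two estimates agree
```
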
Non-negative least squares

Non-negative least-squares regression:

    minimize_{x ∈ R²}  ∥Ax − y∥₂²   subject to   x_i ≥ 0, i = 1, 2.

[Figure: unconstrained fit y = 0.23x − 0.08 versus non-negative fit y = 0.21x − 0.]

The smooth problem becomes non-smooth due to the constraints x_i ≥ 0, i = 1, 2.

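A quick sketch of the non-negative fit, assuming SciPy is available and using synthetic data of the same form as above.

```python
# Minimal sketch of non-negative least squares (assumed synthetic data).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
x_data = rng.uniform(0, 3, 50)
y_data = 0.23 * x_data - 0.08 + 0.05 * rng.standard_normal(50)

A = np.column_stack([x_data, np.ones_like(x_data)])
coef, res_norm = nnls(A, y_data)     # solves min ||Ax - y||_2 subject to x >= 0
print(coef)                          # the intercept is typically clipped at 0, cf. y = 0.21x - 0
```
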
Other constraints on x

More generally, one may impose

    minimize_x  ∥Ax − y∥²   subject to   x ∈ Ω.

6.2 Norm Approximation
Basic norm approximation problem

Problem. Norm approximation problem

    minimize  ∥Ax − b∥,

where A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, and ∥·∥ is a chosen norm on R^m.

The problem is convex, and an optimal solution always exists.

For a candidate x, the residual vector is
    r = Ax − b.

The optimal value is 0 if and only if b ∈ range(A); the case b ∉ range(A) is more interesting.
Of particular interest is the case when A has full column/row rank.

Approximation interpretation
Approximate the vector b by a linear combination of the columns of A.
In this context, the problem is called a regression problem, and the columns of A are called the regressors. The vector
    x_1 a_1 + · · · + x_n a_n
(for an optimal x) is called the regression of b.

Estimation interpretation (linear inverse problem). Consider the linear measurement model

    y = Ax⋆ + v,

where x⋆ is the vector to be estimated, v is measurement error (e.g. additive white Gaussian noise), and y is the measurement.
The estimation problem, or linear inverse problem, is to estimate x⋆ given the measurement y.
Let x̃ be an estimate of x⋆. To quantify the "goodness" of the estimate, one criterion is that Ax̃ be as close to Ax⋆ (namely y) as possible. Therefore, one solves
    minimize  ∥Ax − y∥.

Geometric interpretation. Denote the range

    R = range(A).

Then the norm approximation problem is equivalent to
    minimize  ∥u − b∥   subject to   u ∈ R.
The optimal x̃ is such that
    u = Ax̃
is the projection of b onto R (with respect to the chosen norm).

Remark. Note the relation between the norm and the projection.

Weighted norm approximation problem

    minimize  ∥W(Ax − b)∥,

where W ∈ R^{m×m} is a weight matrix.
This is equivalent to the original problem after redefining WA and Wb as A and b, respectively.

Least-squares approximation. Taking the ℓ2-norm,

    minimize  ∥Ax − b∥₂   ⟺   minimize  ∥Ax − b∥₂² = Σ_i r_i².

This is simply a quadratic problem, since

    f(x) = ∥Ax − b∥₂² = xᵀAᵀAx − 2bᵀAx + ∥b∥₂².

A point x solves the problem if and only if

    ∇f(x) = 2(AᵀAx − Aᵀb) = 0,

which means x satisfies the normal equations
    AᵀAx = Aᵀb.

Chebyshev or minimax approximation. Choosing the ℓ∞-norm,

    minimize  ∥Ax − b∥∞ = max { |r_1|, ..., |r_m| }

is called the Chebyshev approximation problem, or minimax approximation problem.

The problem has an LP formulation:

    minimize  t
    subject to  −t·1 ⪯ Ax − b ⪯ t·1.

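A hedged sketch of this LP with scipy.optimize.linprog; the data A, b below are small illustrative arrays, and the variable stacking z = (x, t) is one possible modeling choice.

```python
# Chebyshev (ell_infinity) approximation as an LP via scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
m, n = A.shape

# Variables z = (x, t): minimize t  subject to  -t*1 <= Ax - b <= t*1.
c = np.concatenate([np.zeros(n), [1.0]])
ones = np.ones((m, 1))
A_ub = np.block([[A, -ones],        #  Ax - t*1 <= b
                 [-A, -ones]])      # -Ax - t*1 <= -b
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x_cheb = res.x[:n]
print(res.fun)                      # optimal ||Ax - b||_inf
```
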
Sum of absolute residuals approximation. Choosing the ℓ1-norm,

    minimize  ∥Ax − b∥₁ = Σ_i |r_i|

is called the sum of (absolute) residuals approximation problem; it yields a robust (why?) estimator.

The problem has an LP formulation:

    minimize  1ᵀt
    subject to  −t ⪯ Ax − b ⪯ t.

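The ℓ1 version has the same structure, with a vector of slack variables t. A minimal sketch, again with linprog and illustrative data:

```python
# Sum-of-absolute-residuals (ell_1) approximation as an LP.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
m, n = A.shape

# Variables z = (x, t): minimize 1^T t  subject to  -t <= Ax - b <= t.
c = np.concatenate([np.zeros(n), np.ones(m)])
I = np.eye(m)
A_ub = np.block([[A, -I],      #  Ax - t <= b
                 [-A, -I]])    # -Ax - t <= -b
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + m))
print(np.abs(A @ res.x[:n] - b).sum())   # optimal ||Ax - b||_1
```
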
[Figure: histograms of the residuals; from top to bottom: Gaussian, bounded, and random sign.]

Penalty function approximation

ℓp-norm equivalence:

    minimize  ( |r_1|^p + · · · + |r_m|^p )^{1/p}   ⟺   minimize  |r_1|^p + · · · + |r_m|^p.

Problem. Penalty function approximation problem

    minimize  ϕ(r_1) + · · · + ϕ(r_m)
    subject to  r = Ax − b,

where ϕ : R → R is called the (residual) penalty function.

We assume that ϕ is convex, so that the optimization problem is still convex. In many cases, ϕ is moreover symmetric, nonnegative, and satisfies ϕ(0) = 0.

Interpretation
Given x, we have the residual vector r = Ax − b.
Each r_i is penalized by ϕ.
The total penalty is ϕ(r_1) + · · · + ϕ(r_m).
We seek x such that the total penalty is minimized.

Remark. Clearly, different choices of ϕ result in different solutions.

Let ϕ(u) = |u|^p, p ≥ 1. The penalty function approximation is then equivalent to ℓp-norm approximation; special cases are p = 1, 2.

The deadzone-linear penalty function (with deadzone width a > 0):
    ϕ(u) = 0           if |u| < a,
           |u| − a     if |u| ≥ a.

The log-barrier penalty function (with limit a > 0):
    ϕ(u) = −a² log( 1 − (u/a)² )   if |u| < a,
           +∞                       if |u| ≥ a.

[Figure: the quadratic, deadzone-linear, and log-barrier penalty functions ϕ(u) on [−1, 1].]

Scaling the penalty function by a positive number does not affect the solution of the penalty function approximation problem. But the shape of ϕ matters!

Example. Penalty function approximation

Problem. A toy example

    Let A ∈ R^{100×30} and b ∈ R^{100}.

Four choices of ϕ:
    Absolute value:   ϕ(u) = |u|.
    Quadratic:        ϕ(u) = u².
    Deadzone-linear:  ϕ(u) = max{ |u| − 1/2, 0 }.
    Log-barrier:      ϕ(u) = −log( 1 − u² ).

[Figure: histograms of the residuals r for the four penalty functions.]

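A hedged modeling sketch of this comparison, assuming CVXPY is installed; the data are random with the same shape as the toy example, and the helper function is purely illustrative.

```python
# Comparing penalty function approximations on random data.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
b = rng.standard_normal(100)

def residuals(penalty):
    """Solve min_x penalty(Ax - b) and return the optimal residual vector."""
    x = cp.Variable(30)
    cp.Problem(cp.Minimize(penalty(A @ x - b))).solve()
    return A @ x.value - b

r_l1   = residuals(lambda r: cp.norm1(r))                              # phi(u) = |u|
r_quad = residuals(lambda r: cp.sum_squares(r))                        # phi(u) = u^2
r_dz   = residuals(lambda r: cp.sum(cp.maximum(cp.abs(r) - 0.5, 0)))   # deadzone-linear
# (A log-barrier version can in principle be modeled similarly with cp.log.)
print(np.abs(r_l1).max(), np.abs(r_quad).max(), np.abs(r_dz).max())
```
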
Sensitivity to outliers or large errors

The quadratic penalty is sensitive to outliers:
    Outliers have much larger errors.
    The quadratic penalty tends to minimize the overall penalty, so a few outliers can dominate the fit.
    One can use a weighted penalty to mitigate the outliers, e.g.
        Σ_i w_i ϕ(r_i),   with   w_i = 1/ϕ(r_i).

[Figure: data with outliers, together with the fits from min ∥Ax − b∥₂² and min ∥Ax − b∥₁.]

Sensitivity to outliers or large errors

Alternative penalty functions:

Truncated quadratic:
    ϕ(u) = u²   if |u| ≤ M,
           M²   if |u| > M.

Huber function:
    ϕ(u) = u²              if |u| ≤ M,
           M(2|u| − M)     if |u| > M.

[Figure: the truncated quadratic and Huber penalties ϕ(u).]

Remark. The Huber function is differentiable!

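A minimal sketch of robust fitting with the Huber penalty above, on synthetic data with a few gross outliers; M and the step size are illustrative choices, and since the Huber objective is differentiable, plain gradient descent applies.

```python
# Huber-penalty regression via gradient descent, compared with least squares.
import numpy as np

rng = np.random.default_rng(0)
m = 100
t = rng.uniform(0, 3, m)
y = 0.5 * t + 1.0 + 0.1 * rng.standard_normal(m)
y[::10] += 5.0                                    # a few gross outliers

A = np.column_stack([t, np.ones(m)])
M = 1.0

def huber_grad(r):
    # Derivative of phi(u) = u^2 for |u| <= M, and M(2|u| - M) for |u| > M.
    return np.where(np.abs(r) <= M, 2 * r, 2 * M * np.sign(r))

x = np.zeros(2)
step = 0.5 / np.linalg.norm(A, 2) ** 2
for _ in range(2000):
    x -= step * A.T @ huber_grad(A @ x - y)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]       # plain least squares, pulled by outliers
print(x, x_ls)
```
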
Small residuals and ℓ1-norm approximation

[Figure: illustration with the line Ax = b.]

Approximation with constraints

It is possible to add constraints to the basic norm approximation problem. When these constraints are convex, the resulting problem is convex. Constraints can:
    rule out certain unacceptable approximations of the vector b, or ensure that the approximator Ax satisfies certain properties;
    encode prior knowledge of the vector x to be estimated, or prior knowledge of the estimation error v;
    arise in a geometric setting, when determining the projection of a point b onto a set more complicated than a subspace, for example a cone or polyhedron.

Examples (a small modeling sketch follows below)
    Non-negativity:                     x ⪰ 0.
    Variable bounds / box constraints:  ℓ ⪯ x ⪯ u.
    Probability distribution:           x ⪰ 0 and 1ᵀx = 1.
    Norm ball constraint:               ∥x − c∥ ≤ d.

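A hedged sketch (assuming CVXPY is installed) of one such constrained approximation, here with the probability-simplex constraint; the data are illustrative.

```python
# Norm approximation with a probability-simplex constraint on x.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x = cp.Variable(5)
objective = cp.Minimize(cp.norm(A @ x - b, 2))
constraints = [x >= 0, cp.sum(x) == 1]      # x lies on the probability simplex
cp.Problem(objective, constraints).solve()
print(x.value)
```
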
6.3 Least-norm Approximation
Least-norm approximation problem

Problem. Least-norm problem

    Basic formulation:
        minimize  ∥x∥
        subject to  Ax = b.

The problem is meaningful if and only if b ∈ range(A).

A solution of the problem is called a least-norm (or minimum-norm) solution of Ax = b.
The interesting case is m < n with rank(A) = m.

Reformulation as a norm approximation problem
Let x₀ be any particular solution of Ax = b and let the columns of Z span the nullspace of A, so every solution of Ax = b can be written as x = x₀ + Zu. Then

    minimize  ∥x₀ + Zu∥

is an (unconstrained) norm approximation problem in u.

Geometric interpretation
The objective is the length of x, i.e. the distance between 0 and x.
The feasible set { x ∈ Rⁿ | Ax = b } is affine.
The least-norm problem seeks the point in the affine set with minimum distance to 0, which is the projection (definition?) of 0 onto the affine set { x ∈ Rⁿ | Ax = b }.

Least-squares solution of linear equations

    minimize  ∥x∥₂²
    subject to  Ax = b.

How does this compare with simply minimizing ∥Ax − b∥²?

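A minimal numerical sketch (assumed random data): the minimum ℓ2-norm solution of an underdetermined system can be computed with the pseudoinverse.

```python
# Minimum ell_2-norm solution of an underdetermined system Ax = b.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 12))      # m < n, full row rank with probability one
b = rng.standard_normal(5)

x_min_norm = np.linalg.pinv(A) @ b    # equals A^T (A A^T)^{-1} b here
print(np.allclose(A @ x_min_norm, b), np.linalg.norm(x_min_norm))
```
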
More generally, one can solve

    minimize  ϕ(x_1) + · · · + ϕ(x_n)
    subject to  Ax = b,

where ϕ : R → R is convex, non-negative, and satisfies ϕ(0) = 0.

Sparse solutions via least ℓ1-norm

    minimize  ∥x∥₁
    subject to  Ax = b,

with A ∈ R^{m×n} and rank(A) = m < n.

There is a finite set of points (basic solutions) satisfying Ax = b that have at most m non-zero elements.

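A hedged sketch of the least ℓ1-norm (basis pursuit) problem as an LP, using the standard split x = x⁺ − x⁻ with x⁺, x⁻ ⪰ 0; the sparse vector below is synthetic.

```python
# Least ell_1-norm solution of Ax = b via linear programming.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 10, 40
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:3] = [1.5, -2.0, 0.7]   # a sparse vector
b = A @ x_true

c = np.ones(2 * n)                       # minimize 1^T x_pos + 1^T x_neg
A_eq = np.hstack([A, -A])                # A (x_pos - x_neg) = b
res = linprog(c, A_eq=A_eq, b_eq=b)      # default bounds enforce x_pos, x_neg >= 0
x_l1 = res.x[:n] - res.x[n:]
print(np.round(x_l1[:5], 3))             # often recovers the sparse x_true
```
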
Illustration

[Figure: the affine set Ax = b, illustrating the least-norm solutions for different norms.]

6.4 Regularized Approximation
Previously

Two scenarios:
    For the approximation problem, b contains errors and possibly b ∉ range(A); one seeks to minimize ∥Ax − b∥.
    For the least-norm problem, b = Ax holds exactly, and one seeks to minimize ∥x∥.
A good approximation Ax ≈ b with small x is less sensitive to errors in A than a good approximation with large x.

For the least-norm problem, we can equivalently rewrite the problem as

    minimize  ∥x∥
    subject to  x ∈ argmin_x ∥Ax − b∥,

which is a special case of a bi-level optimization problem.

[Figure: the functions ϕ(x), γ∥Ax − b∥², and their sum ϕ(x) + γ∥Ax − b∥².]

Bi-criterion formulation

Problem. Bi-criterion formulation

Jointly minimize the two objectives, i.e.

    minimize  ( ∥Ax − b∥, ∥x∥ ),

which is a multi-objective (bi-criterion) optimization problem.

Regularization is a common scalarization method for solving the bi-criterion problem. Through a weighted sum, we arrive at

    minimize  ∥Ax − b∥ + γ∥x∥,

where γ > 0 is the balancing parameter.

A special case:
    minimize  ∥Ax − b∥₂² + γ∥x∥₂².

Interpretations
    In an estimation setting, the extra term penalizing large ∥x∥ can be interpreted as prior knowledge that ∥x∥ is not too large.
    In an optimal design setting, the extra term adds the cost of using large values of the design variables to the cost of missing the target specifications.

Remark. Over-fitting vs. under-fitting...

Tikhonov regularization

    minimize  ∥Ax − b∥₂² + γ∥x∥₂²,

or more generally

    minimize  ∥Ax − b∥₂² + γ∥Lx∥₂²,

where L is a bounded linear mapping.

This quadratic formulation has a closed-form/analytic solution:

    x = (AᵀA + γLᵀL)⁻¹ Aᵀb,

provided that AᵀA + γLᵀL is non-singular.

Smoothing regularization. Take L in the general Tikhonov regularization to be the second-difference operator

    L = [ 1 −2  1  0  ⋯  0  0  0  0 ;
          0  1 −2  1  ⋯  0  0  0  0 ;
          ⋮                        ⋮ ;
          0  0  0  0  ⋯  1 −2  1  0 ;
          0  0  0  0  ⋯  0  1 −2  1 ]  ∈ R^{(n−2)×n}.

One can even combine this with ridge regression:

    minimize  ∥Ax − b∥₂² + γ∥Lx∥₂² + δ∥x∥₂².

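A minimal sketch (assumed random data) that builds the second-difference operator L and solves the Tikhonov problem via its normal equations.

```python
# Tikhonov regularization with a second-difference smoothing operator.
import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 40
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
gamma = 5.0

# Second-difference operator L in R^{(n-2) x n}: rows (..., 1, -2, 1, ...).
L = np.zeros((n - 2, n))
for i in range(n - 2):
    L[i, i:i + 3] = [1.0, -2.0, 1.0]

x_tik = np.linalg.solve(A.T @ A + gamma * L.T @ L, A.T @ b)
print(np.linalg.norm(A @ x_tik - b), np.linalg.norm(L @ x_tik))
```
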
Example. Optimal input design

Linear dynamical system (convolution system) with impulse response (convolution kernel) h:

    y(t) = Σ_{τ=0}^{t} h(τ) u(t − τ),   t = 0, 1, ..., N.

The optimal input design problem seeks the input signal { u(0), ..., u(N) } achieving the following goals:

    Output tracking (small tracking error):   J_track = (1/(N+1)) Σ_{t=0}^{N} ( y(t) − y_des(t) )².
    Small input (low energy):                 J_mag   = (1/(N+1)) Σ_{t=0}^{N} u(t)².
    Small input variations (smoothness):      J_der   = (1/(N+1)) Σ_{t=0}^{N−1} ( u(t+1) − u(t) )².

Regularized least-squares formulation: minimize

    J_track + δ J_mag + γ J_der

for given δ, γ > 0 and impulse response h.

[Figure: the input u(t) and output y(t); top row (δ, γ) = (0, 0.005), middle row (δ, γ) = (0, 0.05), bottom row (δ, γ) = (0.3, 0.05).]

Regularization by ℓ1-norm

Problem. Regularization by ℓ1-norm

    minimize  ∥Ax − b∥₂ + γ∥x∥₁.

Equality constraint

    Least ℓ1-norm:      minimize ∥x∥₁   subject to Ax = b.
    Least cardinality:  minimize ∥x∥₀   subject to Ax = b.
    Least ℓq-norm:      minimize ∥x∥_q  subject to Ax = b.

[Figure: the affine set Ax = b and the corresponding norm balls.]

Noisy measurement b = Ax + ε

    Bi-criterion:     minimize ( ∥Ax − b∥_p, ∥x∥₁ ).
    Scalarization:    minimize ∥Ax − b∥_p + γ∥x∥₁.
    ℓq-constrained:   minimize ∥Ax − b∥_p   subject to ∥x∥_q ≤ κ.

[Figure: ℓ2-norm ball versus ℓ1-norm ball.]

Example: blind image deconvolution

Forward model:
    b = h ⊙ x + ε.

Blind image deconvolution:
    minimize_{x,h}  (1/2)∥b − h ⊙ x∥² + λϕ(x)
    subject to  0 ⪯ x ⪯ 1,  0 ⪯ h ⪯ 1,  ∥h∥₁ = 1.

[Image: NGC224 by the Hubble Space Telescope, from Wikipedia.]

Example: sparse non-negative matrix factorization

Decomposition model:
    b = xᵀy + ε.

Sparse non-negative matrix factorization:
    minimize_{x,y}  (1/2)∥b − xᵀy∥²
    subject to  x, y ⪰ 0,  ∥x_i∥₀ ≤ κ_x, i = 1, ..., r.

[Figure: representing faces via dictionaries.]

Signal de-noising/smoothing

Given x_cor = x⋆ + ε, denoising aims to find an estimate x̂, which resembles x⋆, by solving (the bi-criterion problem)

    minimize  ( ∥x̂ − x_cor∥, ϕ(x̂) ).

The function ϕ : Rⁿ → R is convex and is called the regularization function or smoothing objective.

Define the discrete gradient (finite difference) operator

    ∇ = [ −1  1  0  ⋯  0  0  0 ;
           0 −1  1  ⋯  0  0  0 ;
           ⋮                 ⋮ ;
           0  0  0  ⋯ −1  1  0 ;
           0  0  0  ⋯  0 −1  1 ]  ∈ R^{(n−1)×n}.

[Figure: the original signal x and the corrupted signal x_cor.]

Quadratic smoothing

    ϕ_quad(x) = Σ_{i=1}^{n−1} ( x_{i+1} − x_i )² = ∥∇x∥₂².

Regularization problem:

    minimize  ∥x̂ − x_cor∥₂² + γ∥∇x̂∥₂².

It has a closed-form solution.

Choice of γ:
    If γ = 0, then x̂ = x_cor and ∥x̂ − x_cor∥ = 0.
    If γ → +∞, then x̂ = c·1 for some c ∈ R, and ϕ(x̂) = 0.
    Letting γ range over [0, +∞), we obtain the trade-off curve.

[Figures: the trade-off curve, and the reconstructions for γ = 10³ (top), γ = 10 (middle), γ = 10⁻¹ (bottom).]

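A minimal sketch (synthetic data) of the closed form: the quadratic smoothing problem is solved by x̂ = (I + γ∇ᵀ∇)⁻¹ x_cor.

```python
# Quadratic smoothing via its closed-form solution.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x_true = np.sign(np.sin(np.linspace(0, 4 * np.pi, n)))      # a toy piecewise-constant signal
x_cor = x_true + 0.3 * rng.standard_normal(n)

D = np.diff(np.eye(n), axis=0)          # (n-1) x n first-difference operator
gamma = 10.0
x_hat = np.linalg.solve(np.eye(n) + gamma * D.T @ D, x_cor)
print(np.linalg.norm(x_hat - x_true) < np.linalg.norm(x_cor - x_true))
```
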
Total variation denoising

    ϕ_tv(x) = Σ_{i=1}^{n−1} | x_{i+1} − x_i | = ∥∇x∥₁.

Regularization problem:

    minimize  ∥x̂ − x_cor∥₂² + γ∥∇x̂∥₁.

There is no closed-form solution, except in the 1D case.

[Figure: the original signal x and the corrupted signal x_cor.]

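A hedged sketch of 1D total-variation denoising, assuming CVXPY is installed; the signal and the value of γ are illustrative.

```python
# 1D total-variation denoising as a convex program.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 200
x_true = np.sign(np.sin(np.linspace(0, 4 * np.pi, n)))
x_cor = x_true + 0.3 * rng.standard_normal(n)

D = np.diff(np.eye(n), axis=0)                      # finite-difference operator
x_hat = cp.Variable(n)
gamma = 2.0
obj = cp.Minimize(cp.sum_squares(x_hat - x_cor) + gamma * cp.norm1(D @ x_hat))
cp.Problem(obj).solve()
print(np.linalg.norm(x_hat.value - x_true))         # piecewise-constant estimate
```
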
Trade-off curve for total variation denoising (as before):
    If γ = 0, then x̂ = x_cor and ∥x̂ − x_cor∥ = 0.
    If γ → +∞, then x̂ = c·1 for some c ∈ R, and ϕ(x̂) = 0.
    Letting γ range over [0, +∞), we obtain the trade-off curve.

[Figure: the trade-off curve between ∥x̂ − x_cor∥₂ and ∥∇x̂∥₁.]

[Figure: total variation reconstructions for γ = 10³ (top), γ = 10 (middle), γ = 10⁻¹ (bottom).]

[Figure: for comparison, the corresponding results from quadratic smoothing ∥∇x∥₂².]

6.5 Extra Materials
From Gaussian denoising to heat equation

Image observation model:
    f = ů + ε,
where ε is zero-mean white Gaussian noise.

[Images: original image and noised image.]

Gaussian denoising:
    u = G_σ ⋆ f,
where G_σ is the Gaussian kernel G_σ(x) = (1/(2πσ²)) exp( −∥x∥² / (2σ²) ).

[Images: noised image and the results for σ = 1, 3, 10.]

Heat equation:
    ∂u(x, t)/∂t = ∆u(x, t),   t > 0,
    u(x, 0) = f,
with appropriate boundary conditions. Solution: u(x, t) = G_{√(2t)} ⋆ f.

[Images: noised image and the heat-equation solution at t = 1/2, 4.5, 50.]

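A hedged numerical sketch of this equivalence on a toy image (SciPy assumed available): Gaussian filtering with parameter σ roughly matches explicit Euler diffusion up to time t = σ²/2.

```python
# Gaussian smoothing versus explicit heat-equation diffusion on a toy image.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0            # a white square
f = img + 0.2 * rng.standard_normal(img.shape)               # noisy observation

sigma = 3.0
u_gauss = gaussian_filter(f, sigma=sigma)                     # u = G_sigma * f

# Explicit Euler steps of du/dt = Laplacian(u), run up to t = sigma^2 / 2.
u = f.copy(); dt = 0.2
for _ in range(int(sigma ** 2 / (2 * dt))):
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
           + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
    u += dt * lap
print(np.abs(u - u_gauss).mean())   # roughly comparable (boundary handling differs)
```
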
Non-linear diffusion

Linear diffusion
    PDEs appear as a natural way to denoise.
    Linear PDEs (or convolution) do not preserve edges.
    The heat equation uses
        ∆u(x, t) = div( ∇u(x, t) ) = −∇ᵀ( ∇u(x, t) ).

Goal
    Diffuse along the edges.
    No diffusion across the edges.

Non-linear diffusion [Perona and Malik ’90]

    ∂u(x, t)/∂t = div( ϕ(∥∇u(x, t)∥) ∇u(x, t) ),   t > 0,
    u(x, 0) = f.

Choices of the scalar function ϕ:
    ϕ(s) = 1/√(1 + s)   or   exp(−s).

TV flow [Rudin, Osher & Fatemi ’92]

    ∂u(x, t)/∂t = div( ∇u / ∥∇u∥ ),   t > 0,
    u(x, 0) = f.

ϵ-regularized TV flow:
    ∂u(x, t)/∂t = div( ∇u / √(∥∇u∥² + ϵ²) ).

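A hedged sketch of one explicit Perona-Malik scheme on a toy image, using forward differences for the gradient and backward differences for the divergence; the conductivity is applied to ∥∇u∥² rather than ∥∇u∥, which is an illustrative variant, and the step size is a heuristic choice.

```python
# Explicit Perona-Malik diffusion steps on a synthetic noisy square.
import numpy as np

rng = np.random.default_rng(0)
u = np.zeros((64, 64)); u[16:48, 16:48] = 1.0
u += 0.1 * rng.standard_normal(u.shape)

dt = 0.1
for _ in range(50):
    # Forward differences for the gradient (periodic boundaries for simplicity).
    ux = np.roll(u, -1, axis=1) - u
    uy = np.roll(u, -1, axis=0) - u
    g = 1.0 / np.sqrt(1.0 + ux ** 2 + uy ** 2)       # edge-stopping conductivity
    # Backward differences for the divergence of g * grad(u).
    div = (g * ux - np.roll(g * ux, 1, axis=1)) + (g * uy - np.roll(g * uy, 1, axis=0))
    u += dt * div                                     # smooths flat regions, keeps edges
```
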
[Images: results of non-linear (Perona-Malik) diffusion and of the TV flow.]

Non-linear PDEs (anisotropic diffusion) can preserve the edges of an image.

However, as this is still diffusion, the image becomes constant as t → +∞.

Optimization

Gradient flow
    ∂u(x, t)/∂t = −∇E(u),   t > 0,
    u(x, 0) = f.

Consider E(u) = ½ e(∥∇u∥²). Then

    ∇E(u) = ∇ᵀ( e′(∥∇u∥²) ∇u ).

Heat equation:          e′(∥∇u∥²) = 1.
Non-linear diffusion:   e′(∥∇u∥²) = 1/√(1 + ∥∇u∥²).
TV flow:                e′(∥∇u∥²) = 1/∥∇u∥.

Optimization problem:
    min_u E(u).

Gradient descent with u⁰ = f:
    u^{k+1} = u^k − γ∇E(u^k).

Trivial solution: u(x) = c, x ∈ Ω, for some c ∈ [0, 1].

Problem: the initial condition f is discarded...

Trade-off between fidelity and diffusion:

    min_u  E(u) + (1/(2µ)) ∫_Ω |u − f|² dx.

µ provides a balance between diffusion and fidelity.
    The value of µ depends on the noise level.
    The quadratic fidelity term accounts for noise with bounded L²-norm.

New gradient flow:

    ∂u(x, t)/∂t = −∇E(u) − (1/µ)(u − f),   t > 0,
    u(x, 0) = f.

[Images: results of non-linear diffusion and of the TV flow with the fidelity term.]

From diffusion to regularization

Previously
    To preserve edges, we moved from isotropic diffusion to anisotropic diffusion.
    In other words, there is certain prior information about u that we want to keep.

Regularization
    Promotes prior information in the solution.
    Makes ill-posed problems "solvable" and prevents over-fitting.

Applications
    Signal/image processing, compressed sensing, inverse problems
    Data science, machine learning
    Statistics
    ...

Examples: Tikhonov regularization

From now on: discrete setting...

Example. Tikhonov regularization [Tikhonov ’63]
    ∥Γu∥²,
where Γ is some properly chosen linear operator.

Isotropic heat diffusion corresponds to

    min_u  ∥∇u∥² + (1/(2µ)) ∥f − u∥².

Examples: total variation

Example. Total variation [Rudin, Osher & Fatemi ’92]
    ∥∇u∥₁.
Higher-order TV...

[Images: original image, horizontal gradient, vertical gradient.]

Examples: wavelet frames

Example. Wavelet decomposition [Morlet, Meyer, Mallat, Daubechies and many others]
Family of functions { ψ_{j,k} : j, k ∈ Z },
    ψ_{j,k}(·) = 2^{j/2} ψ(2^j · − k).

[Images: original image and its wavelet coefficients.]

Examples: dictionary

Example. Dictionary [Mairal et al ’09]
Family of filters { ψ_j : j = 1, ..., m },
    min_{D,r}  (1/2)∥u − Dr∥² + µ∥r∥₀ + ι_C(D).

[Images: Gabor filters and a learned redundant dictionary.]

Examples: low rank

Example. Nuclear norm [Recht, Fazel and Parrilo ’10]
Let A ∈ R^{m×n} and let A = USVᵀ be its singular value decomposition. Then
    ∥A∥_* = Σ_{i=1}^{min{m,n}} S_{i,i}.

[Images: rank-20, rank-80, and rank-140 approximations of an image.]

How do we use such regularization?

Examples: others

Other examples
    ℓ1-norm, ℓ1,2-norm, ℓ0 pseudo-norm, ℓp-norm with p ∈ [0, 1]
    Fourier transform, discrete cosine transform
    Curvelets, shearlets...
    Matrices: nuclear norm, rank function
    Constraints: non-negativity, simplex, box constraints
    Physics laws
    ...

How to use them...

Image denoising

Mathematical formulation:
    f = u + ε,
where
    u is the true image, which is piecewise constant — total variation;
    ε is additive noise.

[Images: original image, noised image, denoised image.]

Example. TV + ℓp denoising [Chambolle & Pock ’11]

    min_u  (1/2)∥f − u∥_p^p + µ∥∇u∥₁.

µ provides the balance between fidelity and regularization.

p depends on the noise model:
    additive white Gaussian noise: p = 2;
    sparse noise (salt-and-pepper noise): p = 1.

Medical imaging

Mathematical formulation:
    f = F u + ε,
where
    u is the true image, which is piecewise constant — total variation;
    F is a partial Fourier transform;
    ε is additive noise.

[Images: original image, measurement, reconstruction.]

Example. Wavelet frame MRI reconstruction

    min_u  (1/2)∥f − Fu∥_p^p + µ∥Wu∥₁.

µ provides the balance between fidelity and regularization.

Video decomposition

Mathematical formulation:
    f = l + s + ε,
where
    l is the background, which is low rank — nuclear norm;
    s is the foreground, which is sparse — ℓ1-norm;
    ε is additive white Gaussian noise.

[Figure: video frames f decomposed as background l plus foreground s.]

Example. Principal component pursuit [Candès et al ’11]

    min_{l,s}  (1/2)∥f − l − s∥² + µ∥l∥_* + ν∥s∥₁.

Non-smooth optimization: a typical example

Problem. Non-smooth optimization problem

Let r ∈ N₊₊ and consider

    min_{x_1, x_2, ..., x_r}  Φ(x) ≝ F(x_1, x_2, ..., x_r) + Σ_{i=1}^{r} R_i(K_i x_i),

where
    F is a smooth data-fidelity term...
    R_i are non-smooth regularization terms...
    K_i are linear/non-linear operators...

Applications:
    Signal/image processing, compressed sensing, inverse problems
    Statistics, data science, machine learning
    Control theory, operations research, game theory
    ···

Non-smooth, (non-convex), composite, high dimension.

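As a minimal sketch of how such composite problems are typically handled, here is a proximal-gradient (ISTA-style) iteration for the single-block special case F(x) = (1/2)∥Ax − b∥² and R(x) = λ∥x∥₁ with K = I; the data and λ are illustrative.

```python
# Proximal-gradient iterations for a smooth-plus-nonsmooth composite objective.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = rng.standard_normal(5)
b = A @ x_true
lam = 0.1

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(100)
step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of grad F
for _ in range(500):
    grad = A.T @ (A @ x - b)                  # gradient of the smooth part F
    x = soft_threshold(x - step * grad, step * lam)
print(np.count_nonzero(np.abs(x) > 1e-6))     # a sparse estimate
```
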
Optimization methods

    min_{x_1, x_2, ..., x_r}  F(x_1, x_2, ..., x_r) + Σ_{i=1}^{r} R_i(K_i x_i).

Divide and conquer: handle the pieces separately —
    the partial maps F(x_1, ·, ..., ·), F(·, x_2, ..., ·), ..., F(·, ..., ·, x_r);
    the operators K_1, K_2, ..., K_r and K_1*, K_2*, ..., K_r*;
    the regularizers R_1, R_2, ..., R_r and R_1*, R_2*, ..., R_r*.

Fin
