Cam08 44
Cam08 44
Los Angeles
by
Mingqiang Zhu
2008
c Copyright by
Mingqiang Zhu
2008
The dissertation of Mingqiang Zhu is approved.
Luminita Vese
Lieven Vandenberghe
Achi Brandt
ii
To my wife . . .
who is the sunshine in my life
To my parents . . .
who always have their faith in me
iii
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
iv
3.3 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.1 Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Multilevel Optimizations . . . . . . . . . . . . . . . . . . . . . . . . 87
v
6.2 Multilevel Optimization for 2D Dual TV Model . . . . . . . . . . 92
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
vi
List of Figures
3.1 Denoising test problems. Left: original clean image. Right: noisy
image with Gaussian noise (σ = 20). First row: test problem
1, 256 × 256 cameraman. Middle row: test problem 2, 512 ×
512 barbara. Bottom row: test problem 3, 512 × 512 boat . . . . 46
4.1 The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Bar-
bara”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2 The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01. . . . 66
4.4 Plots of duality gap vs. CPU time cost. Problem 1-3 are shown in
the order of top to bottom. . . . . . . . . . . . . . . . . . . . . . . 71
vii
5.2 The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Bar-
bara”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.3 The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01. . . . 83
5.4 The denoised images with TV terms. left column: isotropic TV,
right column: anisotropic TV. . . . . . . . . . . . . . . . . . . . . 84
7.1 The original clean images and noisy images for our test problems.
Left: 128 × 128 “shape”; right: 256 × 256 “cameraman”. . . . . . 116
7.2 Plot of relative duality gap v.s. Iterations. Top : test problem 1,
Bottom: test problem 2. . . . . . . . . . . . . . . . . . . . . . . . 117
viii
List of Tables
3.1 Number of iterations and CPU times (in seconds) for problem 1. . 47
3.2 Number of iterations and CPU times (in seconds) for problem 2. . 47
3.3 Number of iterations and CPU times (in seconds) for problem 3. . 47
4.1 Number of iterations and CPU times (in seconds) for problem 1.
∗ = initial αk scaled by 0.5 at each iteration. . . . . . . . . . . . . 68
4.2 Number of iterations and CPU times (in seconds) for problem 2.
∗ = initial αk scaled by 0.5 at each iteration. . . . . . . . . . . . . 69
4.3 Number of iterations and CPU times (in seconds) for problem 3.
∗ = initial αk scaled by 0.5 at each iteration. . . . . . . . . . . . 70
ix
Acknowledgments
I would like to thank my advisor, Professor Tony Chan, for his wise guidance and
generous support throughout my graduate studies at UCLA. His enthusiasm and
insight provided invaluable help in reaching closure on this thesis topic. I feel
paticulary grateful about his effort to meet with me regularly to discuss about
my research project despite of his busy schedule in NSF.
I would like to thank Professor Andrea Bertozz and Professor Stanley Osher
for their help in my studies and research. I also appreciate the helpful discussions
and suggestions I received from postdoctoral researchers and graduate students
in our reserach group and the math department, which includes Jerome Darbon,
Xavier Bresson, Sheshadri Thiruvenkadam, Ernie Esser, Ethan Brown and many
others.
I am also thankful to all of the people who work in the mathematics depart-
x
ment here at UCLA for all of their help, and I would really like to extend my
appreciation to Maggie Albert, Robin Krestalude and Babette Dalton.
xi
Vita
Publications
xii
Abstract of the Dissertation
Mingqiang Zhu
Doctor of Philosophy in Mathematics
University of California, Los Angeles, 2008
Professor Tony F. Chan, Chair
Image restoration models based on total variation (TV) have been popular and
successful since their introduction by Rudin, Osher, and Fatemi (ROF) in 1992.
The nonsmooth TV seminorm allows them to preserve sharp discontinuities
(edges) in an image while removing noise and other unwanted fine scale detail.
On the other hand, the the TV term, which is the L1 norm of the gradient vector,
poses computational challenge in solving those models efficiently. Furthermore,
the global coupling of the gradient operator makes the problem extra harder than
other L1 minimization problems where the variables under the L1 norm are sepa-
rable. In this paper we propose several new algorithms to tackle these difficulties
from different perspectives. Numerical experiments show that they are competi-
tive with the existing popular methods and some of them are significantly faster
despite of their simplicity. The first algorithm we introduce is a primal-dual hy-
brid gradient descent method that alternates between the primal and the dual
updates. It utilizes the information from both the primal and dual variables and
therefore is able to converges faster than the pure primal or pure dual method in
the same category. We then proposed gradient projection (GP) methods to solve
xiii
the dual problem of the ROF model based on the special structure of the dual
constraints. We also test variants of GP algorithms with different step selection
strategies, including techniques based on the Barzilai-Borwein method. In this
same line, a block co-ordinate descent method is proposed for solving the dual
ROF problem. The subproblem at each single block can be solved exactly using
different techniques. We also propose a basic multilevel optimization framework
for the dual formulation, aiming to speedup our solution process for large scale
problems. Finally, we study the connections between some existing methods and
give an improvement to CGM method based on the primal-dual interior-point
algorithms.
xiv
CHAPTER 1
Introduction
1
• |Ω| is the area of the domain Ω.
q
|∇u| = (ux ; uy ) = u2x + u2y
• BV(Ω) denotes the collection of all functions (images) in L1 (Ω) with finite
total variations (bounded variations).
It can be shown that W 1,1 (Ω) ⊆ BV(Ω) ⊆ L1 (Ω) and the definition of TV
in (1.2) is equivalent to the objective function of (1.1) when u ∈ W 1,1 . More
theoretical properties of total variation and BV space can be seen in references
[5],[28], [42] and [52]. We also point out that, once we try to solve the problem
in the discrete case, all the above technical restriction on u becomes irrelevant.
It is clear that any discrete image u in R2 has finite (discrete) total variations.
2
The results about existence and uniqueness for the solutions of problem (1.1)
can be seen in [17]. The same paper also shows that the constrained ROF model
(1.1) is equivalent to the following unconstrained model:
λ
min TV[u] + ku − f k22 , (1.4)
u∈BV 2
In fact, both ROF and their subsequent researchers mostly focus on solving
the unconstrained model (1.4) rather than the original constrained model (1.1)
since unconstrained optimization is generally comparatively easier to solve.
The original ROF model was introduced to solve image denosing problems.
But the methodology can be naturally extended to restore blurred and noisy
image by including the known blurring kernal:
λ
min TV[u] + kKu − f k22 . (1.5)
u∈BV 2
Here K is a given linear blurring operator and every other term is defined the
same as in (1.4). In this model, f is formulated as the sum of a Gaussian noise
v and a blurry image K ū resulting from the linear blurring operator K acting on
the clean image ū, i.e., f = K ū + v.
Among all linear blurring operators, many are shift-invariant and can be ex-
pressed in the form of convolution:
Z
(Ku)(x) = (h ∗ u)(x) = h(x − y) u(y) dy, (1.6)
Ω
Over the years, the ROF model has been extended to many other image
processing tasks, including inpaintings, blind-deconvolutions, cartoon-texture de-
compositions and vector valued images. (See [24] and the reference therein for
3
recent developments in TV based image processing.) It has also been modified in
a variety of ways to improve its performance by using iterative refinement with
Bregman distance [61] and the more general inverse scale methods [12] . In this
paper, we shall focus on solving the original ROF restoration problem, but we
point out that some of our ideas can be naturally extended to other relevant
models.
1.2 Duality
The discussion so far, based on the minimization models (1.1) and (1.4), can be
viewed as the primal approach to solving the TV denoising problem. In this
section, we shall discus about the dual formulations of the ROF models (1.4)
and (1.1) and some related analysis. While the dual formulation is not as well
developed as the primal one, it does have some inherent advantages and has been
receiving increasing interest recently (see e.g., [13, 25, 14]). It offers an alternative
formulation which can lead to effective computational algorithms.
First, we shall adopt the definition (1.2) for the TV semi-norm, which we
rewrite here
Z
TV[u] ≡ max −u∇ · w,
w∈W Ω
n q o
1 2 2 2
where W ≡ w w = (w1 , w2 ) ∈ Cc (Ω, R ), |w| = w1 + w2 ≤ 1 (1.7)
λ
Z
min max −u∇ · w + ku − f k22 , (1.8)
u w∈Cc1 (Ω), |w|≤1 Ω 2
where u and w are the primal and dual variables, respectively. The min-max
theorem (see e.g., Proposition 2.4 in [40])allows us to interchange the min and
4
max, to obtain
λ
Z
max min −u∇ · w + ku − f k22 . (1.9)
w∈Cc1 (Ω), |w|≤1 u Ω 2
1
u= f + ∇·w (1.10)
λ
or, equivalently,
1
min k∇ · w + λf k22 . (1.12)
1
w∈Cc (Ω), |w|≤1 2
For a primal-dual feasible pair (u, w), the duality gap G(u, w) is defined to be
the difference between the primal and the dual objectives:
The duality gap bounds the distance to optimality of the primal and dual
objectives. Specifically, if u and w are feasible for the primal (1.4) and dual
(1.11) problems, respectively, we have
5
Optimization theory tells us G(u, w) ≥ 0, and moreover, (u, w) are optimal if
and only if G(u, w) = 0.
We can derive the dual formulation of the constrained ROF model (1.1) in
exactly the same way. First of all, the constrained ROF model (1.1) can be
rewrite into the following min-max problem using the general definition of TV in
(1.7):
Z
min max −u∇ · w, (1.16)
u∈U w∈W Ω
where U ≡ {u ∈ BV(Ω) : ku − f k22 ≤ |Ω|σ 2 } and W is defined as in (1.7).
Again, the min-max theorem allows us to interchange the min and max in the
above problem and we obtain the equivalent
Z
max min −u∇ · w. (1.17)
w∈W u∈U Ω
6
Similarly, the duality gap G(u, w) for any feasible primal-dual pair (u, w) is de-
fined as
Comparing (1.18) with (1.10) or comparing (1.21) with (1.15) we find the
suitable fidelity parameter λ that leads to equivalent models (1.4) and (1.1) are
given as
k∇ · w ∗ k2
λ= 1 , (1.22)
σ |Ω| 2
where w ∗ is the dual optimizer of problem (1.19).
From the computational point of view, the primal and dual formulations pose
different challenges for computing their optimality solutions (see Table 1.1). The
total variation term in the primal formulation is non-smooth at where |∇u| =
0, which makes the derivative-based methods impossible without an artificial
smoothing parameter. The dual formulation imposes constraints which usually
require extra effort compared to unconstrained optimizations. Being quadratic,
the dual energy is less nonlinear than the primal energy, but the rank-deficient
operator ∇· makes the dual minimizers possibly non-unique. Finally, they share
7
the same problem of spatial stiffness due to the global couplings in their energy
functions, which presents a challenge to any algorithm in order to control the
computational complexity that scales reasonably bounded with the number of
pixels.
For the sake of simplicity, we assume that the domain Ω is squares, and define
a regular n × n grid of pixels, indexed as (i, j), for i = 1, 2, . . . , n, j = 1, 2, . . . , n.
We represent images images as two-dimensional matrices of dimension n × n,
8
where ui,j represents the value of the function u at pixel (i, j). (Adaptation to
less regular domains is not difficult in principle.) To define the discrete total
variation, we introduce a discrete gradient operator, whose two components at
each pixel (i, j) are defined as follows:
ui+1,j − ui,j if i < n
(∇u)1i,j = (1.23a)
0 if i = n
ui,j+1 − ui,j if j < n
(∇u)2i,j = (1.23b)
0 if j = n.
where | · | is the Euclidean (ℓ2 ) norm in R2 . Note that this norm is not a smooth
function of its argument. It has the classic “ice-cream cone” shape, nondifferen-
tiable when its argument vector is zero.
(and similarly for objects in Rn×n×2), we have from definition of the discrete
divergence operator that for any u ∈ Rn×n and w ∈ Rn×n×2 , that h∇u, wi =
hu, −∇ · wi. It is easy to check that the divergence operator can be defined
explicitly as follows:
1 1
w − wi−1,j if 1 < i < n w 2 − wi,j−1
2
if 1 < j < n
i,j i,j
(∇ · w)i,j = 1
wi,j if i = 1 + 2
wi,j if j = 1
−w 1
if i = n −w 2
if j = n.
i−1,j i,j−1
(1.24)
9
1 2
It is worth noticing that the values {wn,j : j = 1 · · · , n} and {wi,n : i = 1, · · · , n}
are not relevant in calculating the divergence ∇ · w in (1.24). Therefore, the
discrete dual problem only have essentially 2n(n − 1) instead of 2n2 unknowns.
Non-the-less, the boundary points are usually included in w (with values being
0) to make the discrete dual problem consistent in form with the primal problem.
y(j−1)n+i = ui,j , 1 ≤ i, j ≤ n.
Using this notation, the discrete version of the unconstrained primal ROF model
(1.4) can be written as follows:
N
X λ
min kATl yk + ky − zk2 , (1.26)
y∈RN
l=1
2
where (and from now on) we use k · k do denote the Euclidean norm (ℓ2 norm)
in Euclidean space Rm with any finite dimension.
10
vectors xl ∈ R2 , l = 1, 2, . . . , N, as follows:
1
wi,j
x(j−1)n+i = , 1 ≤ i, j ≤ n.
2
wi,j
The complete vector x ∈ R2N of unknowns for the discretized dual problem is
then obtained by concatenating these subvectors:
x = (x1 ; x2 ; . . . ; xN ).
Similarly, the duality gap in (1.13) has the following discrete form
N
X λ
1
2
G(y, x) = kATl yk − xTl ATl y +
y − z + Ax
, (1.29)
l=1
2 λ
By the same token, the constrained ROF model (1.1) has the following dis-
cretization for its primal formulation, dual formulation, primal-dual gap and
primal-dual optimality conditions:
11
Primal problem:
N
X
min kATl yk
y
l=1
subject to y ∈ Y ≡ {y ∈ RN : ky − zk ≤ nσ} (1.31)
Dual problem:
max −nσkAxk + z T Ax
x
Duality Gap:
N
X
G(y, x) = kATl yk − xTl ATl y + nσkAxk + (y − z)T Ax , (1.33)
l=1
In this paper, we try to develop some efficient algorithms to solve the total vari-
ation based image restoration models. The paper is organized as follows:
Chapter 1 gives a short introduction to the total variation based image restora-
tion models and BV space. It covers the application of duality theory to
the ROF model and derives the dual problem. We then analyze the dif-
ferent computational challenges facing the primal and dual formulation of
ROF model and end the chapter by introduction necessary notations and
discretization for the study numerical algorithms.
12
Chapter 2 offers a brief survey of some existing popular methods. At the end
of chapter 2, we also give remarks on exiting methods and motivations on
developing new algorithms.
Chapter 5 proposes a block coordinate descent method to solve the dual for-
mulation of the ROF model. Global convergence of the algorithm are
proved and results of numerical experiments are shown. Dual formula-
tion of anisotropic ROF model are also studied for its simpler constraints
structure.
Chapter 6 tries to speedup the algorithm in solving the dual problem of the
ROF model by applying some basic multilevel optimization techniques.
Chapter 7 studies the connections between the CGM method and SOCP method
for solving the ROF model. An improvement of CGM algorithm based on
primal-dual interior-point method is aslo proposed.
13
in this paper and some existing popular algorithms. It ends with some final
conclusions and remarks.
14
CHAPTER 2
λ
Z
min |∇u|β + ku − f k22 . (2.1)
u Ω 2
The above problem has a convex differentiable objective function and hence its
optimality solution is given by the following Euler-Lagrange equation (first-order
condition)
∇u
∇· − λ(u − f ) = 0. (2.2)
|∇u|β
In their original paper [60], Rudin et al. proposed the use of artificial time
marching to solve the Euler-Lagrange equation (2.2) which is equivalent to the
steepest descent of the energy function. More precisely, consider the image as a
function of space and time and seek the steady state of the equation
∂u ∇u
=∇· − λ(u − f ) (2.3)
∂t |∇u|β
15
the ROF model is guaranteed to be decreasing provided that the time step is small
enough and the solution will tend to the unique minimizer as time increases. The
method is (asymptotically) slow due to the CFL stability constraints Courant-
Friedrichs-Lewy (CFL) condition ∆t ≤ c|∇u|β (∆x)2 for some constant c > 0 (see
[59]), which puts a very tight bound on the time step when the solution develop
flat regions (where |∇u| ≈ 0). Hence, this scheme is useful in practice only when
low-accuracy solutions suffice. Sometimes even the cost of computing a visually
satisfactory image is too great.
To relax the CFL condition, Marquina and Osher use, in [59], a ”precondition-
ing” technique to cancel singularities due to the degenerate diffusion coefficient
1
|∇u|
:
∂u ∇u
= |∇u| ∇ · − λ(u − f ) (2.4)
∂t |∇u|β
which can also be viewed as mean curvature motion with a forcing term −λ(u−f ).
To bypass the stability constraint in time marching schemes, Vogel and Oman
proposed in [66] a “lagged diffusivity” fixed point iteration scheme which solves
the stationary Euler-Lagrange equation directly. The main idea is to linearize the
1
Euler-Lagrange equation by lagging the diffusion coefficients |∇u|β
at a previous
iteration. The k-th iterate is obtained by solving the sparse linear system (see
[66], [67])
∇uk
∇· k−1
− λ(uk − f ) = 0 (2.5)
|∇u |β
The above linear system is block tri-diagonal and can be solved by various fast
direct/iterative solvers. While global convergence has been proven by different
authors(see e.g. [33], [17], [27], [39]), the outer iteration is only linearly conver-
gent and the convergence slows down as β decreases. Moreover, the inner linear
16
system is difficult to solve efficiently because of the highly varying and possibly
1
degenerate coefficient |∇u|β
, as well as spatial stiffness of the ellipticity of the
differential operator.
It is now clear to us that the above system is a β-regularized version of the true
primal-dual system (1.15), which we rewrite here:
|∇u|w − ∇u = 0
(2.7)
∇ · w − λ(u − f ) = 0
System 2.1.3 and (1.15) are the same except the the norm |∇u| in is smoothed
17
with β. When β ↓ 0, the solution of 2.1.3 will converge to the solution of (1.15),
i.e., the true optimality solution of the ROF model.
As discussed in [25] and [11], this primal-dual system is better suited for
Newton’s method than the primal Euler-Lagrange equation (2.2) since the (u, w)
system is intuitively more “linear” than the primal system in u only. Furthermore,
compared to fixed-point methods and time-marching methods that are at best
linearly convergent, the CGM method was shown in [25] to be empirically globally
convergent with a quadratic rate. Moreover, the convergent rate deteriorates
only moderately with decreasing β and in practice small values of β can be used
without too much loss in efficiency.
There is still a need to solve an elliptic linear system at every Newton iteration,
but through an elimination of the update on w, the Newton iteration can be
implemented by solving only one linear system involving the update for u
where J is positive definite and r is the steepest ascent direction of the primal
objective function(i.e. r is the LHS of equation (2.2). This ensures that the New-
ton update is a decent direction and makes the cost per iteration comparable to
that for the fixed point method. For applications where a high accuracy solution
is needed, eg. in an automated image processing setting without human inter-
vention, fast convergent methods such as CGM will have a significant advantage
over slower linearly convergent methods.
The main advantage of the dual formulation (1.11) or (1.12) is that the objective
function is nicely quadratic and hence there is no issue with non-differentiability,
18
or the need for numerical regularization, as in the primal formulation. However,
this comes with the price of dealing with the constraints on the dual w. In fact,
there are as many constraints as there are number of pixels and one does not
know in advance where the constraints are active. There are several “standard”
numerical approaches to solving nonlinear constrained optimization (1.11), e.g.
penalty, barrier, augmented Lagrangian methods and interior-point methods(see
e.g. [53]). Carter did some early work in this direction in her thesis [13].
αi,j ≥ 0 (2.8c)
Now Chambolle made the following original observation about Lagrange mul-
tipliers α. The complementarity conditions (2.8b) are satisfied as either of the
following two cases:
αi,j = 0 or |wi,j | = 1 (2.9)
19
In either case, we have
α = |∇(∇ · w − λf )| , (2.12)
or,
w n + τ H(w n )
w n+1 = , (2.15)
1 + τ |H(w n )|
where H(w) = ∇(∇ · w − λf ) and τ is a time step chosen suitably small for
convergence.
20
2.1.5 Second-Order Cone Programming
Recently, Goldfarb and Yin proposed a new approach in [43] for various total
variation models based on second-order cone programming(SOCP). They model
problem (1.1) as a second-order cone program, which is then solved by modern
interior-point methods.
s.t. ku − f k2 ≤ σ 2 (2.16)
They let v to be the noise variable: v = f − u and let (pi,j , qi,j ) to be the
discrete ∇u by forward differentiation:
ui+1,j − ui,j if i < n
pi,j =
0 if i = n
ui,j+1 − ui,j if j < n
qi,j =
0 if j = n
They then introduce a new variable t as an upper bound of the gradient norm
|∇u|:
q
|(∇u)i,j | = p2i,j + qi,j
2
≤ ti,j
21
As a consequence, problem (7.6) is transformed to
n
X
min ti,j
i,j=1
s.t. ui,j + vi,j = fi,j for i, j = 1, · · · , n
pn,j = 0 for j = 1, · · · , n,
qi,n = 0 for i = 1, · · · , n,
v0 = σ
(7.8) is in standard form of SOCP and there are massive literature about how
to solve this class of problems (see e.g. [3]). The dual formulation can also be
transformed into SOCP (see [43]). SOCPs can be solved efficiently in practice
and, in theory, in polynomial time by interior-point methods. For example, path-
following interior-point methods generate a sequence of interior points that follow
22
the so-called central path toward the optimality. At each iteration, a set of
primal-dual equations, which depend on current variable values and a duality gap
parameter, are solved by one step of Newton’s iteration to yield an improving
search direction. Like CGM, SOCP need to solve a linear system at each iteration
but can produce highly accurate solutions with low residuals or duality gaps.
In some sense, the SOCP framework bears many similarities with the earlier
primal-dual Newton method discussed in section 2.1.3 First of all, they all
apply Newton’s method to compute the update search direction at each iteration.
Secondly, they all use a dual variable and solve the primal-dual system. In the
SOCP formulation, we do not need a regularization parameter β. However, when
we apply interior-point method to solve SOCP, the primal-dual system that we
solve does depend on an artificial parameter that measures the duality gap. This
duality parameter is similar to β in CGM, and the only difference is that β is fixed
in CGM while the duality parameter dynamically decreases during the solution
process of SOCP. Another interesting point of view here is that SOCP minimize
t, an upper bound of |∇u| instead of minimizing |∇u| itself, to get rid of the
non-smoothness of the objective function, and this bears similarity to the early
p
idea of minimizing |∇u|2 + β, which is also an upper bound of |∇u|.
23
system (2.5) in the fixed point method. Here we rewrite down the system (2.5)
∇uk
∇· − λ(uk − f ) = 0 (2.19)
|∇uk−1|β
However, there are two main drawbacks of this approach. First, the linearized
system (2.19) has very “nasty” coefficients (especially when β is small), making
it difficult to derive efficient multigrid methods for its solution. Secondly, the
outer fixed point iterations is still only linear convergent.
Recently [63] and [41] considered using the nonlinear multigrid method to
solve the non-linear Euler-Lagrange equation (2.2). However, here one has to
take a relatively large β to achieve convergence.
24
noisy f uCD uTrue
6
1
1 1.5 2 2.5 3 3.5 4
Figure 2.1: Illustration of how coordinate descent might get stuck at a wrong
solution.
All the existing multigrid methods aim to solve the primal formulations,
either the direct minimization problem(1.4) or the associated Euler-Lagrange
equation(2.2). One possible alternative is to apply multilevel optimization tech-
nique to the dual problem (1.12). The advantage of the dual multilevel method
is that the dual relaxation will not get stuck in a “local minimizer” as the for-
mulation is smooth. However, the real challenge here is how to deal with the
25
constraints. It is not clear that if we can reduce the number of the constraints in
a coarser level as we do to the unknowns. So for this technique to work, one has
to think creatively about the constraints, not only for our problem, but also for
general constrained optimizations.
Recently graph cut algorithms have been introduced as a fast and exact solver for
the discrete anisotropic TVL2 /TVL1 models (see [37, 38, 16, 72, 50] and many
others).
R
If the anisotropic TV ( Ω |ux | + |uy |) is used to replace the isotropic TV
R
( Ω
|∇u|) and the argument u is restricted to have discrete values, we have the
following discrete integer programming problem for the anisotropic ROF model:
λ
Z
min E(u) = (|ux | + |uy |) dxdy + ku − f k2 , (2.20)
u∈{0,1,··· ,L−1} Ω 2
where L = 28 for 8-bit images and L = 216 for 16-bit images.
The practical use of graph cut algorithm to solve (2.20) relies on the decom-
position of an image into its level sets and hence mapping the original problem
(2.20) into optimizations of independent binary problems. Using the discrete co-
area formula (see e.g., [37]), the objective function E in (2.20) can be decomposed
into a sum of independent functions E t which only depends on the sublevel set
of u:
L−2
X
E(u) = E t (ut ) + C.
t=0
constant independent of u.
To minimize E(·) one can minimize all binary problem E t (·) independently.
Thus we get a family {ūt } which are respectively minimizers of E t (·). Clearly
26
the summation will be minimized and thus we have a minimizer of provided this
family {ūt } is monotone:
ūt ≤ ūs ∀ t < s. (2.21)
If property (2.21) holds then the optimal solution ū is given by the recon-
struction formula from sublevel sets: ū(x) = min{t : ūt (x) = 1} . If property
(2.21) does not hold, then the family {ūt } is not a function.
Property (2.21) is proved to be true by various authors ([37, 16, 48]). Hence
the above optimization strategy is valid. Now note that each E t (ut ) is a binary
MRF with an Ising prior model and be solved efficiently using graph cut algo-
rithms. In [15], the author also did some comparison experiments regarding the
efficiency of the discrete graph cuts algorithm and some of methods we discussed
in this section.
2.2 Remarks
1. Recently, many variants and extensions to the original ROF model (1.4)
have been developed, which include TV and Elastica inpainting ([29], [26]),
blind deconvolution ([30]), multi-channel TV ([10]), L1 fidelity ([57], [22]),
multiscale texture decomposition ([52], [65]) and iterative regularization us-
ing Bregman distance function. Although the algorithms we listed in this
paper are all designed to solve the original ROF model (1.4), it can be
naturally adapted to other recently developed models we mentioned above
where the main computational challenge is still due to the total variation
term.
2. While each of the above methods might have its own comparative advan-
27
tage in certain situations, some benchmark rules still apply. Independent
of the mutigrid technique, there are two general categories: those that need
solve a linear system at each iteration and those that need not. To obtain a
visually satisfactory solution, the methods in the latter class are preferred
for their simplicity and fast initial convergence rate. Out of this category,
Chambolle’s method becomes very popular because of its simplicity, robust-
ness, no need to use numerical regularization and relative fast convergence.
On the other hand, if our goal is to obtain a state-of-art highly accurate
solution with low residual/duality gap, the locally superlinear Newton type
methods will win eventually even though they have to solve a linear system
at every iteration. In this category, The CGM method and SOCP are most
widely used for their robustness and fast asymptotic convergence. Finally,
if the problem is of very large scale, we need to apply multilevel schemes to
reduce complexity.
28
scale problems. To illustrate the high memory requirements of implicit
schemes, we note that an image of size 512 × 512 is close to the limit of
what the SOCP solver MOSEK can handle on a workstation with 2GB of
memory.
In this chapter, we shall develop some simple yet efficient algorithms. They
are explicit so the memory requirement is low and each iteration only takes
O(N) operations. They converge very fast to visually satisfactory solutions
and also have much improved asymptotical convergence rate compared with
existing explicit methods. In all of our proposed algorithms, we try to ex-
ploit the use of the dual variable since a pure primal formulation usually
requires some numerical smoothing parameter that would prevent the re-
sulting algorithm from converging to the true optimizer. In other words,
our algorithm is either a primal-dual type algorithm or simply a duality
based algorithm.
29
CHAPTER 3
In this chapter, we propose a simple yet very efficient algorithm for total varia-
tion (TV) minimizations with applications in the image processing realm. This
descent-type algorithm alternates between the primal and dual formulations and
exploit the information from both the primal and dual variables. Therefore, it
is able to converge significantly faster than either pure primal/dual gradient de-
scent methods and some other popular existing methods as demonstrated in the
numerical experiments. Finally, we show that this idea works for other optimiza-
tion problems whose objective also involves the non-smooth L1 norm as in the
TV term.
Previous developed gradient descent type methods are the primal time marching
method in [60] and Chambolle’s duality based semi-implicit gradient descent type
method in [14]. In this chapter, we refer to these two methods as the primal
gradient descent algorithm and dual gradient descent algorithm. They are showed
briefly as follows.
Primal ROF:
N
X λ
min kATl yk + ky − zk2 , (3.1)
y∈RN
l=1
2
30
Dual ROF:
where
Al ylk
kAl ylk k
, if Al ylk 6= 0
xkl = (3.5)
any element in the unit ball B(0, 1) ⊂ R2 , else
We notice that the primal gradient algorithm (3.3) or (3.4) works exclusively
on the primal formulation while the dual gradient algorithm focus exclusively on
the dual formulation. There is no cross communications between the primal vari-
able and dual variable. Our approach, on the other hand, work on the primal-dual
formulation and try to exploit the information of the primal and dual variables
simultaneously. We hope the use of both primal and dual information will speed
up the gradient descent algorithm.
Our approach can be most effectively illustrated under the setting of the
primal-dual formulation (1.8), which has the following discrete form:
λ
min max Φ(y, x) := y T Ax + ky − zk2 (3.7)
y∈RN x∈X 2
31
where
xk+1 = PX xk + τk λAT y k ,
(3.9)
where τk is the (dual) stepsize and PX denotes the projection onto the
set X which can be simply computed in our case (see later remarks). The
factor λ is used here so that the stepsize τk is sensitive to different problems
or different scales of gray levels, which also explains the same situation in
(3.11).
Algorithm PDHGD
32
Step 0. Initialization. Pick y 0 and a feasible x0 ∈ X, set k ← 0.
Step 2. Updating.
xk+1 = PX xk + τk λAT y k
(3.12a)
1
y k+1 = (1 − θk )y k + θk (z − Axk+1 ) (3.12b)
λ
3.2 Remarks
1
xk+1 = arg max Φ(y k , x) − kx − xk k2 (3.13a)
x∈X 2λτk
λ(1 − θk )
y k+1 = arg min Φ(y, xk+1) + ky − y k k2 (3.13b)
y∈R N 2θk
33
3. Both problem (3.8) and (3.10) can be solved exactly, which would yield the
following updating formula (taking τk = ∞ and θk = 1 in (3.23)):
ATl y k
xk+1
l = , for l = 1, · · · , N (3.15a)
kATl y k k
1
y k+1 = z − Axk+1 . (3.15b)
λ
However, we choose not to do so since the above algorithm does not con-
verge.
As a special case, if we only solve subproblem (3.8) exactly (taking τk = ∞
in (3.23) ), the resulting algorithm would be
1 X A AT y k
l l
y k+1 = y k − θk + y k
− z , (3.16)
λ kATl y k k
The above algorithm is exactly a subgradient descent method for the primal
formulation (4.1).
Hence, the primal subgradient descent method and the dual projected gradi-
ent descent method are two special cases of our algorithm, which correspond
to taking special stepsizes τk = ∞ and θk = 1 respectively in (3.23).
34
τ θ rather than to τ or θ individually. For example, our experiments show
that convergence can be obtained for the stepsize pair (2, 0.2) as well as for
(4, 0.1). Finally, the numerical results also reveal that a pair of relatively
small τ and large θ gives faster initial convergence rate and the opposite
choice gives faster asymptotic convergence. Therefor, we can optimize the
performance of the algorithm through some strategy of choosing (τk , θk ),
although simple fixed stepsizes might already give satisfactory results.
3.3 Extensions
where Y ≡ {y ∈ RN : ky − zk ≤ nσ}.
The above problem (3.18) has the following primal-dual and dual formulations
Both the primal problem (3.18) and dual problem (3.20) are difficult to solve
since their objectives are non-smooth. However, our proposed approach based on
35
the primal-dual formulation (3.19) are simply
τk T k
xk+1 = PX xk + A y (3.21a)
σ
y k+1 = PY y k + σθk Axk+1 ,
(3.21b)
y−z
PY (y) = z + (3.22)
max ky − zk/(nσ), 1
The full primal-dual hybrid gradient descent method for constrained ROF
model is shown as follows:
Algorithm PDHGDC
Step 2. Updating.
τk T k
xk+1 = PX xk + A y (3.23a)
σ
y k+1 = PY y k + σθk Axk+1 ,
(3.23b)
TV Debluring Model
The total variation based image restoration model (1.4) can be extended to
36
recover blurry and noisy image f by solving the following problem:
λ
Z
(P) min |∇u| + kKu − f k22 (3.24)
u Ω 2
where K is a given linear blurring operator and every other term is defined the
same as in (1.4). In this model, f is formulated as the sum of a Gaussian noise
v and a blurry image Ku resulting from the linear blurring operator K acting on
the clean image ū, i.e., f = Ku + v.
Among all linear blurring operators, many are shift-invariant and can be ex-
pressed in the form of convolution:
Z
(Ku)(x) = (h ∗ u)(x) = h(x − y) u(y) dy, (3.25)
Ω
λ
(PD) min max y T Ax + kBy − zk2 (3.27)
y∈R N x∈X 2
1 λ
(D) max − kB −1 Ax − λzk + kzk2 . (3.28)
x∈X 2λ 2
37
given as follows
xk+1 = PX xk + τk λAT y k
(3.29a)
1
y k+1 = y k − θk Axk+1 + B T (By k − z) . (3.29b)
λ
The full primal-dual hybrid gradient descent method for TV debluring is:
Step 2. Updating.
xk+1 = PX xk + τk λAT y k
(3.30a)
1
y k+1 = y k − θk Axk+1 + B T (By k − z) . (3.30b)
λ
Our method is related to projection type methods existing in the literature for
finding saddle points and, more generally, solutions to variational inequalities.
In this section, we shall discuss very briefly about the framework of projection
methods for solving variational inequalities and point out the connections and
38
difference between our method and previous work. We refer interested readers to
the survey papers [45] and [71] for the background of this area.
Let H be a real Hilbert space (in our case, Rn ), whose inner product and
norm are denoted by h·i, and k · k respectively. Let K be a closed convex set in H
and F be a mapping from H into itself. We now consider the problem of finding
v ∗ ∈ K such that
hv − v ∗ , F (v ∗ )i ≥ 0, ∀ v ∈ K. (3.31)
The above problem is called a variational inequality problem with v ∗ being one
of its solution. We denote the above variational inequality problem by VI(K, F ).
In most real applications, K is convex and F satisfy some monotonicity and
Lipschitz continuity properties, which we defined as follows:
Definition 1. F is said to be
where
y Φy (x, y)
v= F (v) = and K = Y × X.
x −Φx (x, y)
39
In particular Our ROF problem (3.19) and (3.7) can both be transformed
into a variational inequality problem VI(K, F ) in (3.32) with F and K defined
as follows.
v ∗ = PK v ∗ − αF (v ∗)
for any α > 0.
The fixed-point formulation in the above lemma suggests the simple iterative
algorithm of solving for u∗ .
VI Algorithm 1.
v k+1 = PK v k − αk F (v k ) .
(3.33)
40
VI Algorithm 2.
v k+1 = PK v k − αk F (v k+1) .
(3.34)
VI Algorithm 3.
v̄ k = PK v k − αk F (v k )
(3.35a)
v k+1 = PK v k − αk F (v̄ k )
(3.35b)
There are many other variants of the original extragradient algorithm with
different predictor search rule and corrector step size aiming to improve perfor-
mance (see [58] and [71]). New related developments in this direction can also
be found in [54] and [58], where the final solution is obtained by averaging along
the solution path.
Our ROF problem (3.19) and (3.7) can both be transformed into a variational
inequality problem VI(K, F ) with a monotone and Lipschitz continuous mapping
41
F . Some of the existing algorithms can be applied directly with proved global
convergence. However, numerical experiments show that none of these existing
methods has comparable performance to our algorithm. There are many possi-
ble explanations for this. First of all, in the variational inequality setting the
variables y and x are combined as one variable u and have to be updated in one
step with same steplength; while in our approach the primal y and dual x are
updated alternatively in a Gauss-seidal type of way with freedom to choose their
own step sizes. More importantly, all the existing algorithms are developed to
solve variational inequalities as a general class; while our method exploits the par-
ticular information of the problem, including the bilinear function F and special
structure of the set K, which allow us to choose optimal step size to improve the
performance. On the other hand, our approach is lacking of a global convergence
proof which would be useful to provide some benchmark rules and can help us
better understand how the algorithm works.
We test three problems on image denoising. The original clean images and the
input noisy images are shown in Figure 7.1. The size of the two test problems are
256 × 256, 512 × 512 and 512 × 512 respectively. The noisy images are generated
by adding Gaussian noises with standard deviation σ = 20 to the original clean
images.
42
The parameter λ in the unconstrained ROF model (3.1) is inverse related to
the noise level σ and usually need to be tuned for each individual image. In our
case, λ is chosen in the following way. We first compute the constrained ROF
model by algorithm PDHGDC for the optimality solution (y ∗, x∗ ). Then the
particular λ that will make the unconstrained ROF model (3.1) equivalent to the
constrained model (3.18) is given by
kAx∗ k
λ= . (3.36)
nσ
For our test problems, the parameters λ obtained in the above way are 0.053, 0.037
and 0.049 respectively.
Although suitable constant stepsizes will give good convergence results, the
power of our proposed algorithm shall be most exploit with some optimal strategy
of choosing stepsizes (τk , θk ). Throughout the experiments, we use the following
stepsize strategy:
5
Algorithm PDHGD τk = 0.2 + 0.08k, θk = (0.5 − 15+k
)/τk ;
In Chambolle’s method, we take the time step to be 0.248 for its near optimal
performance. In the CGM implementation, we used a direct solver for the linear
system at each iteration (the conjugate gradient iterative solver was slower on
43
these examples). The smooth parameter β is dynamically updated based on
duality gap rather than fixed. In particular we take β (0) = 100 and let β (k) =
(k) 2
β (k−1) · GG(k−1) . We noticed that this simple strategy of updating β borrowed
from interior-point methods outperforms the classical CGM measured by the
decrease of duality gap.
Tables 6.1 and 6.2 report numbers of iterations and CPU times required by
two P DHD algorithms as well as by Chambolle’s algorithm and CGM method
for the relative duality gap to achieve certain threshold. In all codes, we used the
same starting point (y 0, x0 ) = (z, 0). (Convergence does not depend on initial
conditions though.) We vary the threshold tol from 10−2 to 10−6 , producing
results of increasingly high accuracy as tol is decreased.
The results in the tables demonstrates that our proposed approaches is very
competitive to existing methods. They are the winners for all tests with different
stopping criterions tol = 10−2 , 10−4 , 10−6. It is significantly faster than Cham-
bolle’s method to obtain medium-high accurate solutions and significantly faster
than CGM method to obtain low-medium accurate solutions.
Figure 3.2 shows the denoised images obtained at different values of tol.
Note that visually there is little difference between the results obtained with two
tolerance values 10−2 and 10−4. Smaller values of tol do not produce further
44
visual differences.
Figure 7.2 plots the relative duality gap against the number of iterations
for Chambolle’s method as well as for the PDHGD algorithm. It shows that
the PDHGD converges much faster than Chambolle’s method, especially when
a medium-high accurate solution is required.
3.6 Conclusion
We have proposed a primal-dual hybrid gradient descent method to solve the total
variation based image restoration model of Rudin, Osher and Fatemin (ROF) [60].
The algorithm tries to improve performance by alternating between the primal
and dual variable and exploit information from both variables. We compare
our method with two popular existing approaches proposed by Chambolle [14]
and Chan, Golub, and Mulet [25] and show our method is consistently faster
than earlier approaches in all experiments with different stopping criterions. The
complexity of our method is O(N) at each iteration, in fact its cost is basically the
same as Chamboll’s method at each iteration; while the cost of the CGM method
3
at each iteration is O(N 2 ) for solving the block tri-diagonal linear system.
Our algorithm can be applied to solve both the unconstrained ROF and con-
strained ROF model and, in theory, it can be applied to solve other TV min-
imization model or L1 minimization problem by transforming it to a min-max
form. We also pointed out that our algorithm is related to existing projection
type methods for solving variational inequalities.
45
Figure 3.1: Denoising test problems. Left: original clean image. Right:
noisy image with Gaussian noise (σ = 20). First row: test problem 1,
256 × 256 cameraman. Middle row: test problem 2, 512 × 512 barbara. Bottom
row: test problem 3, 512 × 512 boat
46
Table 3.1: Number of iterations and CPU times (in seconds) for problem 1.
tol = 10−2 tol = 10−4 tol = 10−6
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 45 1.77 1213 61.3 22597 1159
PDHD 14 0.45 73 2.28 328 10.4
PDHDC 14 0.46 70 2.27 308 9.98
CGM 6 20.3 14 49.8 19 68.0
Table 3.2: Number of iterations and CPU times (in seconds) for problem 2.
tol = 10−2 tol = 10−4 tol = 10−6
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 141 26.5 3083 578 47935 8988
PDHD 25 3.66 117 23.7 541 110
PDHDC 24 3.66 113 22.9 519 108
CGM 7 191 15 417 23 642
Table 3.3: Number of iterations and CPU times (in seconds) for problem 3.
tol = 10−2 tol = 10−4 tol = 10−6
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 61 11.4 1218 228 22235 4168
PDHD 16 2.36 72 10.6 320 47.2
PDHDC 16 2.38 71 10.7 316 47.5
CGM 7 189 14 382 20 547
47
Figure 3.2: The denoised images with different level of termination criterions.
left column: tol = 10−2 , right column: tol = 10−4 .
48
0
10
Chambolle
PDHGD
−2
−4
10
−6
10
−8
10
100 200 300 400 500 600 700 800
Iterations
0
10
Chambolle
PDHGD
−2
Relative Duality Gap
10
−4
10
−6
10
−8
10
100 200 300 400 500 600 700 800
Iterations
0
10
Chambolle
PDHGD
−2
Relative Duality Gap
10
−4
10
−6
10
−8
10
100 200 300 400 500 600 700 800
Iterations
P (y )−D(x k k)
Figure 3.3: Plot of relative duality gap49 D(xk )
v.s. Iterations. Top left: test
problem 1. Top right: test problem2. Bottom: test problem 3.
CHAPTER 4
The discrete primal and dual formulations of the unconstrained ROF model is
given as
N
X λ
(P) min kATl yk2 + ky − zk22 (4.1)
y 2
l=1
1
(D) min kAx − λzk2
x∈X 2
where X ≡ x = (x1 ; · · · ; xN ) : xl ∈ R2 , kxl k ≤ 1 for l = 1, · · · , N ,
(4.2)
Before we discuss about the proposed algorithm we make several remarks on the
discretized problems (4.1), (4.2) and prove a general convergence result that will
be useful later. It is easy to verify that both problems can be obtained from the
function ℓ : RN × X → R defined as follows:
λ
ℓ(y, x) := xT AT v + ky − zk22 . (4.3)
2
50
The primal problem (4.1) is simply
It is easy to verify that the conditions (H1), (H2), (H3), and (H4) of [47, pp. 333-
334] are satisfied by this setting. Thus, it follows from [47, Chapter VII, Theo-
rem 4.3.1] that ℓ has a nonempty convex set of saddle points (ȳ, x̄) ∈ RN × X.
Moreover, from [47, Chapter IV, Theorem 4.2.5] and compactness of X, the point
(ȳ, x̄) ∈ RN × X is a saddle point if and only if ȳ solves (4.1) and x̄ solves (4.2).
Note that by strict convexity of the objective in (4.1), the solution ȳ of (4.1) is
in fact uniquely defined. For any saddle point (ȳ, x̄), we have that ℓ(ȳ, x̄) ≤ ℓ(y, x̄)
for all v ∈ RN , that is, ȳ is a minimizer of ℓ(·, x̄). Thus, from optimality conditions
for ℓ(·, x̄), the following relationship is satisfied for the unique solution ȳ of (4.1)
and for any solution x̄ of (4.2):
By uniqueness of ȳ, it follows that Ax̄ is constant for all solutions x̄ of (4.2).
The following general convergence result will be useful in our analysis of al-
gorithms in Section 4.2.
51
Proof. Note first that all stationary points of (4.2) are in fact (global) solutions
of (4.2), by convexity.
Suppose for contradiction that v k 6→ ȳ. Then we can choose ǫ > 0 and a
subsequence S such that kv k − ȳk2 ≥ ǫ for all k ∈ S. Since all xk belong to
the bounded set X, the sequence {xk } is bounded, so {v k } is bounded also. In
particular, the subsequence {v k }k∈S must have an accumulation point v̂, which
must satisfy kv̂ − ȳk2 ≥ ǫ > 0. By restricting S if necessary, we can assume that
limk∈S v k = v̂. By boundedness of {xk }, we can further restrict S to identify a
point x̂ ∈ X such that limk∈S xk = x̂. By (4.5), we thus have
4.2.1 Introduction
For the general problem of optimizing a smooth function over a closed convex
52
set, that is,
min F (x) (4.7)
x∈X
for some αk > 0. Here, PX denotes the projection onto the set X. Since X is
closed and convex, the operator PX is uniquely defined, but in order for the gradi-
ent projection approach to make practical sense, this operator must also be easy
to compute. For this reason, gradient projection approaches have been applied
most often to problems with separable constraints, where X can be expressed
as a Cartesian product of low-dimensional sets. In our case (4.2), X is a cross
product of unit balls in R2 , so computation of PX requires only O(N) operations.
Bertsekas [7] gives extensive background on gradient projection algorithms.
From now on, we will focus on the solution of problem (4.2), which we restate
here:
1
min F (x) := kAx − λgk22, (4.10)
x∈X 2
where the compact set X ⊂ R2N is defined in (4.2). In this section, we discuss
GP techniques for solving this problem. Our approaches move from iterate xk to
the next iterate xk+1 using the scheme (4.8)-(4.9).
53
This operation projects each 2 × 1 subvector of x separately onto the unit ball
in R2 . It is worth pointing out here that this structure of the dual constraints,
which makes the gradient projection approach practical, also enables Chambolle
to develop an analytical formula for the Lagrange multipliers in [14].
Our approaches below differ in their rules for choosing the step parameters
αk and γk in (4.8) and (4.9).
We next consider three gradient projection frameworks that encompass our gra-
dient projection algorithms, and present convergence results for methods in these
frameworks.
Framework GP-NoLS
Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax .
Choose x0 and set k ← 0.
54
Framework GP-ProjArc also sets γk ≡ 1, but chooses αk by a backtracking
line search to satisfy an Armijo criterion, which enforces monotonic decrease of
F . This approach is referred to by Bertsekas [7, p. 236] as Armijo Rule Along
the Projection Arc.
Framework GP-ProjArc
Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax ,
and choose ρ ∈ (0, 1) and µ ∈ (0, 12 ). Choose x0 and set k ← 0.
F (xk (ρm ᾱk )) ≤ F (xk ) − µ∇F (xk )T (xk − xk (ρm ᾱk )),
Framework GP-LimMin
55
Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax .
Choose x0 and set k ← 0.
k k −(δ k )T ∇F (xk )
γk,opt = arg min F (x + γδ ) = (4.12)
kAδ k k22
so the Lipschitz constant for ∇F is kAT Ak2 , which is bounded by 8 (see [14,
p. 92]). It follows immediately from [7, Proposition 2.3.2] that every accumulation
point of {xk } is stationary for (4.10) provided that 0 < α < .25. The result now
follows immediately from Proposition 1.
56
The upper bound of .25 in Theorem 1 is tight; we observe in practice that the
method is unstable evn for τ = .251.
Proof. Proposition 2.3.3 of Bertsekas [7], with minor modifications for the vari-
able choice of ᾱk within the range [αmin , αmax ], shows that all limit points of {xk }
are stationary. The result then follows from Proposition 1.
Proof. Proposition 2.3.1 of Bertsekas [7], with minor modifications for the vari-
able choice of ᾱk within the range [αmin , αmax ], shows that all limit points of {xk }
are stationary. The result then follows from Proposition 1.
57
4.2.3 Barzilai-Borwein Strategies
where
∆xk−1 := xk − xk−1 , ∆g k−1 := ∇F (xk ) − ∇F (xk−1 ),
so our desired property on α is that α−1 ∆xk−1 ≈ ∆g k−1 . Note that for the F we
consider here (4.10), we have ∆g k−1 = AT A∆xk−1 .
which yields
k∆xk−1 k22 k∆xk−1 k22
αk,1 = = . (4.13)
h∆xk−1 , ∆g k−1i kA∆xk−1 k22
An alternative formula is obtained similarly, by doing a least-squares fit to α
rather than α−1 , to obtain
h∆xk−1 , ∆g k−1i kA∆xk−1 k22
αk,2 = arg min k∆xk−1 − α∆g k−1k22 = = . (4.14)
α∈R k∆g k−1 k22 kAT A∆xk−1 k22
These step lengths were shown in [6] to be effective on simple problems; a par-
tial analysis explaining the behavior was given. Numerous variants have been
proposed recently, and subject to with theoretical and computational evaluation.
The BB stepsize rules have also been extended to constrained optimization, par-
ticularly to bound-constrained quadratic programming; see, for example [36] and
[62]. The same formulae (4.13) and (4.14) can be used in these cases.
58
Other variants of Barzilai-Borwein schemes have been proposed by other au-
thors in other contexts. The cyclic Barzilai-Borwein (CBB) method proves to
have better performance than the standard BB in many cases (see for example
[35] and the references therein). In this approach, we recalculate the BB stepsize
from one of the formulae (4.13) or (4.14) at only every mth iteration, for some
integer m. At intervening steps, we simply use the last calculated value of αk .
There are alternating Barzilai-Borwein (ABB) schemes that switch between the
definitions (4.13) and (4.14), either adaptively or by following a fixed schedule.
We discuss here the variants of gradient projection that were implemented in our
computational testing.
Algorithm GPLS. This algorithm falls into GP-PA, where we choose the ini-
tial steplength ᾱk at each iteration by predicting what the steplength would be
if no new constraints were to become active on this step. Specifically, we define
the vector g k by
(∇F (xk ))l ,
if kxkl k2 < 1 or (∇F (xk ))Tl xkl > 0,
k
gi =
I − xk (xk )T (∇F (xk )) , otherwise.
l l l
59
Algorithm GPBB-NM. This is a nonmonotone Barzilai-Borwein method in
GP-NoLS, in which we obtain the step αk via the formula (4.13), projected if
necessary onto the interval [αmin , αmax ].
BB
αml+i = αml+1 for l = 0, 1, 2, . . . and i = 1, 2, . . . , m − 1,
BB
where αml+1 is obtained from (4.13) with k = ml + 1, restricted to the interval
[αmin , αmax ].
60
(a) γk,opt < γl and αk = αk,1; or
where γk,opt is obtained from the limited minimization rule (4.12). We refer
interested readers to [62] for the rationale of the criterion. In any case, the
chosen αk is adjusted to ensure that it lies in the interval [αmin , αmax ].
0 ≤ zl ⊥ kxl k2 − 1 ≤ 0, l = 1, 2, . . . , N,
where the scalars zl are Lagrange multipliers for the constraints kxl k22 ≤ 1, l =
1, 2, . . . , N, and the operator ⊥ indicates that at least one of its two operands
must be zero. At iteration k, we compute an estimate of the active set Ak ⊂
{1, 2, . . . , N}, which are those indices for which we believe that kxl k22 = 1 at the
solution. In our implementation, we choose this set as follows:
61
The SQP step is a Newton-like step for the following system of nonlinear equa-
tions, from the current estimates xk and zlk , l = 1, 2, . . . , N:
zl = 0, l∈
/ Ak . (4.16c)
Using z̃lk+1 to denote the values of zl at the next iterate, and d˜k to denote the step
in xk , a “second-order” step can be obtained from (4.16) by solving the following
system for d˜k and z̃lk+1 , l = 1, 2, . . . , N:
ATl Ad˜k + 2z̃lk+1 d˜kl = −ATl [Axk − λg] − 2xkl z̃lk+1 , l = 1, 2, . . . , N, (4.17a)
z̃lk+1 = 0, l ∈
/ Ak . (4.17c)
αk−1 dkl + 2zlk+1 dkl = −ATl [Axk − λg] − 2xkl zlk+1 , l = 1, 2, . . . , N, (4.18a)
zlk+1 = 0, l ∈
/ Ak . (4.18c)
Considering indices l ∈ Ak , we take the inner product of (4.18a) with xkl and use
(4.18b) and (4.15) to obtain:
We obtain the steps dkl for these indices by substituting this expression in (4.18a):
62
In fact, because of (4.18c), this same formula holds for l ∈
/ Ak , when it reduces
to the usual negative-gradient step
63
• Chambolle’s semi-implicit gradient descent method [14];
We report on a subset of these tests here, including the gradient projection vari-
ants that gave consistently good results across the three test problems.
In Chambolle’s method, we take the step to be 0.248 for near optimal perfor-
mance, although global convergence is proved in [14] only for steps in the range
(0, .125). We use the same value αk = 0.248 in Algorithm GPCL, as it appears
to be near optimal in this case as well.
For all gradient projection variants, we set αmin = 10−5 and αmax = 105 .
(Performances are insensitive to these choices, as long as αmin is sufficiently small
and αmax sufficiently large.) In Algorithm GPLS, we used ρ = 0.5 and µ = 10−4 .
In Algorithm GPABB, we set γl = 0.1 and γu = 5.
We also tried variants of the GPBB methods in which the initial choice of
αk was scaled by a factor of 0.5 at every iteration. We found that this variant
often enhanced performance. This fact is not too surprising, as we can see from
Section 4.3 that the curvature of the boundary of constraint set X suggests that
it is appropriate to add positive diagonal elements to the Hessian approximation,
which corresponds to decreasing the value of αk .
In the CGM implementation, we used a direct solver for the linear system at each iteration, as the conjugate gradient iterative solver (which is an option in the CGM code) was slower on these examples. The smoothing parameter β is dynamically updated from iteration to iteration based on the duality gap. In particular, we take β_0 = 100 and let β_k = β_{k−1}(G_k/G_{k−1})², where G_k and G_{k−1} are the duality gaps for the past two iterations. This simple strategy for updating β, which is borrowed from interior-point methods, outperforms the classical CGM approach, producing a faster decrease in the duality gap.
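As an illustration, a minimal MATLAB sketch of this duality-gap-driven update of β is given below. The function handles cgm_step and duality_gap are placeholders standing in for one CGM Newton step and for the evaluation of the primal-dual gap; they are not part of the text and their signatures are assumptions of this sketch.

    % Hedged sketch: dynamic update of the CGM smoothing parameter beta.
    function [u, w] = cgm_dynamic_beta(u, w, cgm_step, duality_gap, tol, maxit)
        beta  = 100;                        % beta_0
        Gprev = duality_gap(u, w);          % gap at the starting point
        G0    = Gprev;
        for k = 1:maxit
            [u, w] = cgm_step(u, w, beta);  % one Newton step on the smoothed system
            G      = duality_gap(u, w);
            beta   = beta * (G / Gprev)^2;  % beta_k = beta_{k-1} (G_k / G_{k-1})^2
            Gprev  = G;
            if G / G0 < tol, break; end     % relative duality gap criterion
        end
    end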
All methods are coded in MATLAB. It is likely that the performance could be improved by recoding in C or C++, but we believe that the improvement would be fairly uniform across all the algorithms.
Tables 4.1, 4.2, and 4.3 report the number of iterations and average CPU times over ten runs, where each run adds a different random noise vector to the true image. In all codes we used the starting point x^{(0)} = 0 and the relative duality gap stopping criterion. We vary the threshold tol from 10^{−2} to 10^{−6}, producing results of increasingly high accuracy as tol is decreased.
Figure 4.3 shows the denoised images obtained at different values of tol. Note that visually there is little difference between the results obtained with the two tolerance values 10^{−2} and 10^{−4}; smaller values of tol do not produce further visual differences.
The tables show that on all problems the proposed gradient projection algorithms are competitive with Chambolle's method, and that some variants are significantly faster, especially when only moderate accuracy is required of the solutions. Two variants stood out as good performers: the GPBB-NM variant and the GPBB-M(3) variant in which the initial choice of α_k was scaled by 0.5 at each iteration. For all tests with tol = 10^{−2}, tol = 10^{−3}, and tol = 10^{−4}, the winner was one of the gradient-projection Barzilai-Borwein strategies.
65
Figure 4.1: The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Barbara”.
Figure 4.2: The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01.
66
Figure 4.3: The denoised images at different levels of the termination criterion. Left column: tol = 10^{−2}; right column: tol = 10^{−4}.
67
Table 4.1: Number of iterations and CPU times (in seconds) for problem 1. ∗ =
initial αk scaled by 0.5 at each iteration.
tol = 10−2 tol = 10−3 tol = 10−4 tol = 10−6
Algorithms Iter Time Iter Time Iter Time Iter Time
Chambolle 18 0.12 169 1.15 1082 7.23 23884 154
GPCL 39 0.25 136 0.90 736 4.71 16196 110
GPLS 23 0.33 181 2.66 845 12.8 17056 260
GPBB-M 12 0.16 152 1.82 836 10.1 17464 167
GPBB-M (2) 18 0.17 187 1.80 941 9.00 21146 193
GPBB-M(3) 13 0.11 92 0.78 287 2.44 3749 32.2
GPBB-M(3)∗ 11 0.09 49 0.41 190 1.60 2298 19.7
GPBB-NM 10 0.09 48 0.46 217 2.10 3857 32.3
GPABB 13 0.16 58 0.66 245 2.80 2355 24.0
SQPBB-M 13 0.17 47 0.66 196 2.81 3983 61.0
CGM 6 4.05 9 5.90 12 8.00 18 12.1
68
Table 4.2: Number of iterations and CPU times (in seconds) for problem 2. ∗ =
initial αk scaled by 0.5 at each iteration.
tol = 10−2 tol = 10−3 tol = 10−4 tol = 10−6
Algorithms Iter Time Iter Time Iter Time Iter Time
Chambolle 27 1.05 164 6.40 815 31.80 15911 628
GPCL 32 1.26 112 4.31 540 21.0 11434 452
GPLS 20 1.85 132 14.8 575 66 12892 1531
GPBB-M 20 1.14 124 7.01 576 32.7 11776 674
GPBB-M (2) 20 1.12 72 4.05 245 13.9 4377 251
GPBB-M(3) 20 1.12 77 4.33 345 19.5 3522 200
GPBB-M(3)∗ 17 0.95 47 2.65 162 9.17 1766 100
GPBB-NM 16 0.85 48 2.53 178 9.52 2802 150
GPABB 16 1.06 47 3.16 168 11.4 1865 127
SQPBB-M 14 1.10 41 3.44 152 13.5 2653 245
CGM 6 22.30 10 37.5 13 48.8 19 71.0
69
Table 4.3: Number of iterations and CPU times (in seconds) for problem 3. ∗ =
initial αk scaled by 0.5 at each iteration.
tol = 10−2 tol = 10−3 tol = 10−4 tol = 10−6
Algorithms Iter Time Iter Time Iter Time Iter Time
Chambolle 27 5.46 131 28.3 534 112 8314 1781
GPCL 24 4.65 80 17.2 328 69.1 5650 1212
GPLS 34 10.9 86 32.5 322 128 5473 2258
GPBB-M 20 5.50 84 24.9 332 98.2 5312 1576
GPBB-M (2) 20 5.40 56 16.3 160 46.6 4408 1290
GPBB-M(3) 20 5.38 67 19.5 174 50.5 2533 738
GPBB-M(3)∗ 17 4.58 41 11.9 131 38.0 1104 321
GPBB-NM 15 3.92 40 11.3 115 32.4 1371 388
GPABB 14 4.57 35 12.3 122 42.8 1118 394
SQPBB-M 14 4.91 40 15.8 109 44.6 1556 646
CGM 7 168 10 216 14 302 21 441
70
[Three semi-log plots of duality gap versus CPU time (seconds), with curves for BB-NM, Chambolle, and CGM.]
Figure 4.4: Plots of duality gap vs. CPU time. Problems 1–3 are shown in order from top to bottom.
71
CHAPTER 5
5.1 Framework
In this section, we shall apply the block coordinate descent (BCD) method, also known as the block nonlinear Gauss–Seidel method, to solve the dual ROF problem (1.12). The general BCD framework considers the problem

    minimize F(x)
    subject to x ∈ X = X_1 × X_2 × · · · × X_n ⊂ R^m,   (5.1)

and generates iterates by the updates

    x_i^{k+1} = arg min_{y_i ∈ X_i} F(x_1^{k+1}, · · · , x_{i−1}^{k+1}, y_i, x_{i+1}^k, · · · , x_n^k),   (5.2)

which cycle through the components of x, starting from a given initial point x^0 ∈ X and generating a sequence {x^k} with x^k = (x_1^k, · · · , x_n^k).
72
The discrete dual formulation for TV restoration is

    min_w ‖∇ · w + z‖²   subject to ‖w_{i,j}‖ ≤ 1 for i, j = 1, · · · , n,   (5.3)

where w = (w^1, w^2) denotes the discrete dual variable (5.4) and the discrete divergence operator is given by

    (∇ · w)_{i,j} = ( w^1_{i,j} − w^1_{i−1,j} if 1 < i < n;  w^1_{i,j} if i = 1;  −w^1_{i−1,j} if i = n )
                  + ( w^2_{i,j} − w^2_{i,j−1} if 1 < j < n;  w^2_{i,j} if j = 1;  −w^2_{i,j−1} if j = n ).   (5.5)
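For concreteness, a minimal MATLAB sketch of the discrete divergence (5.5) is given below. It assumes w^1 and w^2 are stored as n × n arrays with the convention, adopted later in this chapter, that the unused boundary values w^1_{n,j} and w^2_{i,n} are kept at zero, so the three cases of (5.5) collapse into one difference.

    % Hedged sketch of the discrete divergence (5.5).
    % w1, w2 are n-by-n arrays; w1(n,:) and w2(:,n) are kept at zero.
    function d = div2d(w1, w2)
        n  = size(w1, 1);
        d1 = w1 - [zeros(1, n); w1(1:n-1, :)];   % w1(i,j) - w1(i-1,j), with w1(0,:) = 0
        d2 = w2 - [zeros(n, 1), w2(:, 1:n-1)];   % w2(i,j) - w2(i,j-1), with w2(:,0) = 0
        d  = d1 + d2;
    end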
Following the idea of the block coordinate descent method (5.2), we fix w at all pixels except some interior point (i, j) (with i, j ∉ {1, n}), which leads to a local minimization problem in R²:

    min_{‖w_{i,j}‖≤1} φ(w_{i,j}) = [w^1_{i,j} + w^2_{i,j} − (w^1_{i−1,j} + w^2_{i,j−1} − z_{i,j})]²
                                 + [w^1_{i,j} − (w^1_{i+1,j} + w^2_{i+1,j} − w^2_{i+1,j−1} + z_{i+1,j})]²   (5.6)
                                 + [w^2_{i,j} − (w^2_{i,j+1} + w^1_{i,j+1} − w^1_{i−1,j+1} + z_{i,j+1})]².

The local minimization problems at the boundary points can be modified accordingly, following the definition of the divergence operator (5.5); we describe them briefly here.
The dual ROF model (5.3) has separable, simple Euclidean ball constraints, which make each sub-minimization problem (5.6) fairly easy to solve at each 'coordinate'. This simple structure of the constraints, which makes the BCD algorithm practical here, is also what makes gradient projection practical and enables Chambolle to derive an analytical formula for the Lagrange multipliers.
When i = 1:

    min_{‖w_{i,j}‖≤1} φ(w_{i,j}) = [w^1_{i,j} + w^2_{i,j} − (w^2_{i,j−1} − z_{i,j})]²
                                 + [w^1_{i,j} − (w^1_{i+1,j} + w^2_{i+1,j} − w^2_{i+1,j−1} + z_{i+1,j})]²   (5.7)
                                 + [w^2_{i,j} − (w^2_{i,j+1} + w^1_{i,j+1} + z_{i,j+1})]².

When j = 1:

    min_{‖w_{i,j}‖≤1} φ(w_{i,j}) = [w^1_{i,j} + w^2_{i,j} − (w^1_{i−1,j} − z_{i,j})]²
                                 + [w^1_{i,j} − (w^1_{i+1,j} + w^2_{i+1,j} + z_{i+1,j})]²   (5.8)
                                 + [w^2_{i,j} − (w^2_{i,j+1} + w^1_{i,j+1} − w^1_{i−1,j+1} + z_{i,j+1})]².

Following (5.5), the boundary values {w^1_{n,j} : j = 1, · · · , n} and {w^2_{i,n} : i = 1, · · · , n} are not relevant in our problem and we simply set them to 0. Hence the sub-minimization problems on the boundaries i = n and j = n degenerate to the following 1-D problems.

When i = n:

    min_{|w^2_{i,j}|≤1} φ(w^2_{i,j}) = [w^2_{i,j} − (w^2_{i,j−1} + w^1_{i−1,j} − z_{i,j})]²
                                     + [w^2_{i,j} − (w^2_{i,j+1} − w^1_{i−1,j+1} + z_{i,j+1})]².   (5.9)

When j = n:

    min_{|w^1_{i,j}|≤1} φ(w^1_{i,j}) = [w^1_{i,j} − (w^1_{i−1,j} + w^2_{i,j−1} − z_{i,j})]²
                                     + [w^1_{i,j} − (w^1_{i+1,j} − w^2_{i+1,j−1} + z_{i+1,j})]².   (5.10)
5.2 Implementation
The sub-optimization problems (5.6)–(5.10) generated by the BCD algorithm are ball-constrained quadratic minimization problems with at most two unknowns. We shall now give a generic algorithm to solve each of these subproblems. We solve subproblems (5.6)–(5.8) in detail in the following discussion, using (5.6) as an example. The subproblems (5.9) and (5.10) are scalar minimizations and can be solved very easily.
Problem (5.9):

    v = ½ (w^2_{i,j−1} + w^2_{i,j+1} + w^1_{i−1,j} − w^1_{i−1,j+1} + z_{i,j+1} − z_{i,j}),
    w^2_{i,j} = min( max{−1, v}, 1 ).

Problem (5.10):

    v = ½ (w^1_{i−1,j} + w^1_{i+1,j} + w^2_{i,j−1} − w^2_{i+1,j−1} + z_{i+1,j} − z_{i,j}),
    w^1_{i,j} = min( max{−1, v}, 1 ).
75
where w_{i,j} = (p, q) and

    a = w^1_{i+1,j} + w^2_{i+1,j} − w^2_{i+1,j−1} + z_{i+1,j},
    b = w^2_{i,j+1} + w^1_{i,j+1} − w^1_{i−1,j+1} + z_{i,j+1},
    c = w^1_{i−1,j} + w^2_{i,j−1} − z_{i,j}.

To solve problem (5.11), we first solve the unconstrained problem and obtain, by the first-order conditions,

    p = (2a − b + c)/3,   (5.12)
    q = (2b − a + c)/3.   (5.13)

If p² + q² ≤ 1, this already solves (5.11). Otherwise the constraint is active, and the KKT conditions become

    (λ + 2)p + q = a + c,
    p + (λ + 2)q = b + c,
    p² + q² = 1,
    λ > 0,

where λ is the Lagrange multiplier. The first two equations in the KKT system yield

    p + q = (a + b + 2c)/(λ + 3),
    p − q = (a − b)/(λ + 1).   (5.14)

Substituting (5.14) into p² + q² = 1 gives the scalar equation

    f(λ) = A/(λ + 3)² + B/(λ + 1)² − 1 = 0,   (5.15)

where A = (a + b + 2c)²/2 and B = (a − b)²/2.
[Plot of f(λ) versus λ on [0, 1.5] for a typical subproblem.]
Substituting the solution λ* of (5.15) into the system (5.14) then gives the optimal w_{i,j}.
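A minimal MATLAB sketch of this two-variable subproblem solver is given below. It assumes the constants a, b, c are computed as above, writes the subproblem (5.6) in the variables p, q as min_{p²+q²≤1} [p + q − c]² + [p − a]² + [q − b]², and uses Newton's method on (5.15) started at λ = 0; the starting guess and the function name local_solve are illustrative assumptions, not necessarily the choices made in our code.

    % Hedged sketch: solve min_{p^2+q^2<=1} [p+q-c]^2 + [p-a]^2 + [q-b]^2.
    function [p, q] = local_solve(a, b, c)
        p = (2*a - b + c) / 3;                 % unconstrained minimizer, (5.12)
        q = (2*b - a + c) / 3;                 % (5.13)
        if p^2 + q^2 <= 1, return; end         % interior solution
        A = 0.5 * (a + b + 2*c)^2;  B = 0.5 * (a - b)^2;
        lam = 0;                               % starting guess for Newton's method
        for it = 1:50
            f  = A/(lam + 3)^2 + B/(lam + 1)^2 - 1;       % f(lambda), (5.15)
            if abs(f) < 1e-12, break; end
            fp = -2*A/(lam + 3)^3 - 2*B/(lam + 1)^3;      % f'(lambda)
            lam = lam - f/fp;
        end
        s = (a + b + 2*c)/(lam + 3);           % p + q, from (5.14)
        d = (a - b)/(lam + 1);                 % p - q
        p = (s + d)/2;  q = (s - d)/2;
    end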
Algorithm BCD
Step 1. Updating. For i, j = 1, · · · , n, update w^{(k+1)}_{i,j} as

    w^{(k+1)}_{i,j} = arg min_{‖w_{i,j}‖≤1} ‖∇ · (w^{(k+1)}_{1,1}, · · · , w_{i,j}, · · · , w^{(k)}_{n,n}) + z‖².

For the anisotropic TV model, the corresponding dual problem is

    min D[w] = ‖∇ · w + z‖
    subject to |w^1_{i,j}| ≤ 1 and |w^2_{i,j}| ≤ 1 for i, j = 1, · · · , n,   (5.16)

where w and ∇ · w are the same as defined in (5.4) and (5.5).
The constraints in problem (5.16) are separable at the scalar level, which makes the subproblem (5.2) of the BCD algorithm even easier to solve; in fact, the algorithm is simply the coordinate descent method. We briefly describe the coordinate descent algorithm for the anisotropic dual ROF model as follows.
78
Suppose (i, j) is not on the boundary. Fixing all other arguments except w^1_{i,j}, we have the following sub-minimization problem:

    min_{|w^1_{i,j}|≤1}  [w^1_{i,j} − (w^1_{i−1,j} − w^2_{i,j} + w^2_{i,j−1} − z_{i,j})]²
                       + [w^1_{i,j} − (w^1_{i+1,j} + w^2_{i+1,j} − w^2_{i+1,j−1} + z_{i+1,j})]²   (5.17)
                       + other terms not dependent on w^1_{i,j}.

Similarly, fixing all other arguments except w^2_{i,j} gives

    min_{|w^2_{i,j}|≤1}  [w^2_{i,j} − (w^2_{i,j−1} + w^1_{i−1,j} − w^1_{i,j} − z_{i,j})]²
                       + [w^2_{i,j} − (w^2_{i,j+1} + w^1_{i,j+1} − w^1_{i−1,j+1} + z_{i,j+1})]²   (5.19)
                       + other terms not dependent on w^2_{i,j}.
The boundary points can be treated in the same way as in the isotropic case. We can also unify all the points into one framework by setting w^1_{0,j}, w^1_{n,j}, w^2_{i,0}, w^2_{i,n} to 0 for i, j = 1, · · · , n − 1, and iterating through i = 1, · · · , n − 1; j = 1, · · · , n for w^1_{i,j} and through i = 1, · · · , n; j = 1, · · · , n − 1 for w^2_{i,j}.
The full BCDaniso algorithm is given as follows. We here eliminate the iteration superscript k and use a single w to describe the algorithm.
Algorithm BCDaniso
Step 0. Initialization. Pick an initial feasible w. Set the boundary points w^1_{0,j}, w^1_{n,j}, w^2_{i,0} and w^2_{i,n} to 0 for i, j = 1, · · · , n − 1.
Step 1. Updating. For i = 1, · · · , n − 1 and j = 1, · · · , n, update w^1_{i,j} by

    v = ½ (w^1_{i−1,j} + w^1_{i+1,j} − w^2_{i,j} + w^2_{i,j−1} + w^2_{i+1,j} − w^2_{i+1,j−1} + z_{i+1,j} − z_{i,j}),
    w^1_{i,j} = min( max{−1, v}, 1 ).   (5.21)

Then, for i = 1, · · · , n and j = 1, · · · , n − 1, update w^2_{i,j} by

    v = ½ (w^2_{i,j−1} + w^2_{i,j+1} + w^1_{i,j+1} − w^1_{i,j} + w^1_{i−1,j} − w^1_{i−1,j+1} + z_{i,j+1} − z_{i,j}),
    w^2_{i,j} = min( max{−1, v}, 1 ).   (5.22)
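For illustration, one Gauss–Seidel sweep of Algorithm BCDaniso can be written in MATLAB as in the sketch below. The storage layout is an assumption of this sketch: W1 is an (n+1)-by-n array with W1(i+1, j) = w^1_{i,j} for i = 0, · · · , n, and W2 is an n-by-(n+1) array with W2(i, j+1) = w^2_{i,j} for j = 0, · · · , n; the padded first and last rows/columns stay at zero as in Step 0.

    % Hedged sketch: one Gauss-Seidel sweep of Algorithm BCDaniso.
    function [W1, W2] = bcd_aniso_sweep(W1, W2, z)
        n = size(z, 1);
        for j = 1:n
            for i = 1:n-1                       % update w1_{i,j} by (5.21)
                v = 0.5 * ( W1(i,j) + W1(i+2,j) - W2(i,j+1) + W2(i,j) ...
                          + W2(i+1,j+1) - W2(i+1,j) + z(i+1,j) - z(i,j) );
                W1(i+1,j) = min(max(-1, v), 1);
            end
        end
        for j = 1:n-1
            for i = 1:n                         % update w2_{i,j} by (5.22)
                v = 0.5 * ( W2(i,j) + W2(i,j+2) + W1(i+1,j+1) - W1(i+1,j) ...
                          + W1(i,j) - W1(i,j+1) + z(i,j+1) - z(i,j) );
                W2(i,j+1) = min(max(-1, v), 1);
            end
        end
    end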
The convergence of the block coordinate descent method follows directly from [44] or Section 2.7 of [7]. We quote the relevant results from [44] as follows.
Definition 2. Following the convention in (5.1), let i ∈ {1, 2, · · · , n}; we say that f is strictly quasiconvex with respect to x_i ∈ X_i on X if for every x ∈ X and y_i ∈ X_i with y_i ≠ x_i we have
80
Proposition 2. Suppose that the function f is strictly quasiconvex with respect to x_i on X for each i = 1, 2, · · · , n − 2 in the sense of Definition 2, and that the sequence {x^k} generated by the BCD method (5.2) has limit points. Then every limit point x̄ of {x^k} is a critical point of Problem (5.1).
Theorem 4. The BCD and BCDaniso algorithms described in the above sections are globally convergent.
Proof. The proof follows directly from Proposition 2. First of all, it is easy to check that the energy function of the dual ROF model is strictly quasiconvex with respect to each corresponding 'coordinate' in both the isotropic and the anisotropic case; hence Proposition 2 applies. Secondly, X is compact in both cases, so every sequence of iterates {w^{(k)}} has a limit point w̄. It follows from Proposition 2 that the limit point w̄ is a stationary point, and hence a global minimizer of the given convex problem. Finally, by the monotonicity of the algorithm, F(w^{(k+1)}) ≤ F(w^{(k)}), we have that F(w^{(k)}) ↓ F*, the global minimum energy.
81
experiments. This parameter is inversely related to the noise level σ and usually needs to be tuned for each individual image to get an optimal visual result. For the BCD algorithm for the isotropic ROF model, the subproblems at each coordinate block are solved by the approach described in Section 5.2. The inner Newton method used to compute λ* at each coordinate block is stopped when |f(λ)| < 10^{−12}.
Figure 5.4 shows the denoised images obtained by different models: the
isotropic ROF model and the anisotropic ROF model.
Tables 5.1–5.6 report the number of iterations and average CPU times over ten runs, where each run adds a different random noise vector to the true image. In all codes we used the starting point w^{(0)} = 0 and the relative duality gap stopping criterion. We vary the threshold tol from 10^{−2} to 10^{−4}, producing results of increasingly high accuracy as tol is decreased.
The tables show that on all problems the proposed block coordinate descent algorithms are competitive with Chambolle's method, and that they are faster in most situations. The tables also show that the advantage of the BCD algorithm over Chambolle's algorithm is more pronounced for anisotropic TV, due to the simpler constraints.
82
Figure 5.2: The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Barbara”.
Figure 5.3: The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01.
83
Figure 5.4: The denoised images with TV terms. left column: isotropic TV, right
column: anisotropic TV.
84
Table 5.1: Iterations & CPU costs, isotropic TV, problem 1.
tol = 10−2 tol = 10−3 tol = 10−4
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 35 0.23 323 2.19 1779 11.7
BCD 11 0.22 81 1.83 403 9.50
85
Table 5.4: Iterations & CPU costs, anisotropic TV, problem 1.
tol = 10−2 tol = 10−3 tol = 10−4
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 48 0.39 430 4.41 2042 20.6
BCD 7 0.06 49 0.39 247 1.97
86
CHAPTER 6
Multilevel Optimizations
As we have seen in Section 2.1.6, various multigrid (multilevel) methods have been developed that try to solve the total variation denoising problem efficiently with scalable computational complexity. Most existing multilevel methods are based on the primal formulation (1.4) and/or its associated Euler-Lagrange equation. The non-differentiability of the primal energy usually makes it difficult to develop efficient multigrid algorithms; most of the time, finding a good smoother for the primal problem is already challenging enough.
87
denoising model, which is (z = λf ) :
We shall now apply the multilevel technique to solve the above problem (6.1). Let k be the level of our multilevel algorithm, with k = 1 being the finest level and k = log₂ n + 1 being the coarsest level. At level k, the uniform block size is b = 2^{k−1} and the number of blocks is τ_k = n/b.
Suppose w is an intermediate solution (at the finest level) and its modification at the next coarser level is given by δw = P c = [c_1, c_1, c_2, c_2, · · · , c_m, c_m], where m = n/2, c = [c_1, c_2, · · · , c_m] contains the unknowns on the coarser level, and P is the (piecewise constant) prolongation operator from the coarser level to the finer level.
88
The objective function with the coarser level correction is then given as

    φ(w + P c) = ‖∇ · (w + P c) + z‖²
               = (c_1 + w_1 + z_1)²
               + (c_2 − c_1 + w_3 − w_2 + z_3)²
               + · · ·
               + (−c_m − w_n + z_{n+1})²
               + terms independent of c,

where z̃ is given by

    z̃_1 = w_1 + z_1
89
    L_{2j−1} ≤ c_j + w_{2j−1} ≤ R_{2j−1},
    L_{2j} ≤ c_j + w_{2j} ≤ R_{2j},        for j = 1, · · · , m.

The above constraints are equivalent to the following coarser level constraints

    L̃_j ≤ c_j ≤ R̃_j,   j = 1, · · · , m,

where

    L̃_j = max(L_{2j−1} − w_{2j−1}, L_{2j} − w_{2j}),
    R̃_j = min(R_{2j−1} − w_{2j−1}, R_{2j} − w_{2j}).   (6.4)
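In MATLAB the coarse-level bounds (6.4) can be formed in a vectorized way, as in the small sketch below (an illustration only; L, R and w are the fine-level bound vectors and the current iterate, all of even length n).

    % Hedged sketch: coarse-level bounds (6.4) from fine-level L, R and w.
    Lt = max(L(1:2:end-1) - w(1:2:end-1), L(2:2:end) - w(2:2:end));
    Rt = min(R(1:2:end-1) - w(1:2:end-1), R(2:2:end) - w(2:2:end));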
Putting the above information together, the problem of solving for the coarser level correction is the following minimization:

Now, problem (6.5) has exactly the same form as the original problem (6.1), except that it is on a coarser level. Hence we can apply the same smoother to get an approximate solution of (6.5) and then pass to the even coarser grid for corrections. We can apply the same technique recursively until we reach the coarsest level, and then add the corrections back level by level until we reach the finest level. Each cycle of going from the finest level to the coarsest level and back is called a V-cycle in multigrid jargon. The complexity of each V-cycle is O(n).
The full multilevel algorithm ML1D using the recursive V-cycle is shown below. Here we use

    w = Smoother(w^0, z, L, R, iter)

to denote an approximate intermediate solution of (6.1) obtained by applying a specified smoother (relaxation) iter times with initial guess w^0; the actual level of the problem is implied by the context. We use

    w = Vcycle(w^0, z, L, R)

to denote the outcome of applying one V-cycle to solve the original problem (6.1).
function w = Vcycle(w^0, z, L, R)
    n = length(w^0);
    if n == 1
        w_1 = min( max( L_1, (z_2 − z_1)/2 ), R_1 );
        return;
    else
        w = Smoother(w^0, z, L, R, iter1);
        m = n/2;
        for j = 1 : m
            L̃_j = max(L_{2j−1} − w_{2j−1}, L_{2j} − w_{2j});
            R̃_j = min(R_{2j−1} − w_{2j−1}, R_{2j} − w_{2j});
        end
        z̃ = R(∇ · w + z);
        c = Vcycle(0, z̃, L̃, R̃);
        w = w + P c;
        w = Smoother(w, z, L, R, iter2);
        return;
    end
91
The full multilevel algorithm ML1D is shown below.
Algorithm ML1D
92
    (∇ · w)_{i,j} = ( w^1_{i,j} − w^1_{i−1,j} if 1 < i < n;  w^1_{i,j} if i = 1;  −w^1_{i−1,j} if i = n )
                  + ( w^2_{i,j} − w^2_{i,j−1} if 1 < j < n;  w^2_{i,j} if j = 1;  −w^2_{i,j−1} if j = n ).   (6.8)
The multilevel formulation for the 2D problem is more complicated and tedious than that of the 1D problem. First let us fix some conventions. We set the sizes of the matrices w^1, w^2 and z to be n × (n + 1), (n + 1) × n and (n + 1) × (n + 1) respectively, with n being a power of 2. Let m = n/2.
We follow the same setup as in the 1D case and let w^1, w^2 be the intermediate solution on the finest level. We then use c^1, c^2 to denote the corrections on the next coarser level. We again use the piecewise constant interpolation P to map the coarser level correction to the finer level, i.e.
    C = [ c_{1,1}  c_{1,2}  · · ·              P C = [ c_{1,1}  c_{1,1}  c_{1,2}  c_{1,2}  · · ·
          c_{2,1}  c_{2,2}  · · ·       ⇒              c_{1,1}  c_{1,1}  c_{1,2}  c_{1,2}  · · ·
            ⋮        ⋮      ⋱   ]                      c_{2,1}  c_{2,1}  c_{2,2}  c_{2,2}  · · ·
                                                       c_{2,1}  c_{2,1}  c_{2,2}  c_{2,2}  · · ·
                                                         ⋮        ⋮        ⋮        ⋮      ⋱   ]
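In MATLAB this piecewise constant prolongation can be written compactly with a Kronecker product, as in the following two-line sketch (an illustration only); each coarse entry c_{i,j} is copied into a 2-by-2 block of the fine grid.

    % Hedged sketch: piecewise constant prolongation of a coarse correction C.
    PC = kron(C, ones(2));   % ones(2) is the 2-by-2 matrix of ones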
To rearrange the above objective function, we realize that the indices (2i, 2j), (2i−1, 2j), (2i, 2j − 1) and (2i − 1, 2j − 1) should be treated differently. If we work out every detail, we obtain the following formulation for the energy function (6.9):
    φ(w + P c) = ‖∇ · c + z̃‖²
               + Σ_{j=1}^{m} ‖∇ · c^1(:, j) + F(:, j)‖²
               + Σ_{i=1}^{m} ‖∇ · c^2(i, :) + G(i, :)‖²
               + other terms independent of c.   (6.10)
In (6.10) and later equations, we use the same notation ∇· for both the 2D divergence operator defined in (6.8) and the 1D divergence operator defined in (6.2); the intended meaning is easy to figure out from the context. Here, the first occurrence is the 2D operator and the rest are 1D.
    S = ∇ · w + z,
    z̃ = S(1 : 2 : n + 1, 1 : 2 : n + 1),
    F = S(1 : 2 : n + 1, 2 : 2 : n),        (6.11)
    G = S(2 : 2 : n, 1 : 2 : n + 1).
Motivated by equation (6.10), we rewrite the original dual ROF model (6.6) to
facilitate the recursive multigrid V-cycle algorithm as follows.
94
Then the objective function with coarser level corrections can be written as

    φ(w + P c) = ‖∇ · c + z̃‖²
               + Σ_{j=1}^{m} ‖∇ · c^1(:, j) + F(:, j)‖²
               + Σ_{i=1}^{m} ‖∇ · c^2(i, :) + G(i, :)‖²
               + 2γ Σ_{j=1}^{m} ‖∇ · c^1(:, j) + Q(:, j)‖²
               + 2γ Σ_{i=1}^{m} ‖∇ · c^2(i, :) + T(i, :)‖²
               + other terms independent of c,   (6.13)

or

    φ(w + P c) = ‖∇ · c + z̃‖²
               + γ̃ Σ_{j=1}^{m} ‖∇ · c^1(:, j) + X̃(:, j)‖²
               + γ̃ Σ_{i=1}^{m} ‖∇ · c^2(i, :) + Ỹ(i, :)‖²
               + other terms independent of c,   (6.14)
where F, G and z̃ are given in (6.11), and Q, T, X̃, Ỹ and γ̃ are given as follows:

    γ̃ = 1 + 2γ,
    Q = ½ [ U(1 : 2 : n + 1, 1 : 2 : n − 1) + U(1 : 2 : n + 1, 2 : 2 : n) ],
    T = ½ [ V(1 : 2 : n − 1, 1 : 2 : n + 1) + V(2 : 2 : n, 1 : 2 : n + 1) ],   (6.15)
    X̃ = (F + 2γQ)/γ̃,
    Ỹ = (G + 2γT)/γ̃.
The objective function (6.14) is exactly of the same form as the original energy in (6.12), except that it is on the coarser level.
95
The constraints on the coarser level variable c are given as

Now we try to simplify the above constraints on the coarser level so that we can reduce the overall complexity of the problem by a factor of 2 × 2. Unfortunately, unlike the 1D problem (6.3), there is no simple way to achieve this. It is easy to see from (6.16) that the feasible set for each c_{i,j} is the intersection of 4 unit disks, which in general cannot be expressed by a single constraint.
There are several plausible approaches to this problem. For example, instead of trying to simplify the constraints (6.16) into an equivalent single constraint (i.e., the old constraints (6.16) are satisfied if and only if the new constraint is satisfied), we can relax this equivalence to a new constraint that is sufficient but not necessary, as follows:
If we choose the new constraints on the coarser level as above, then the optimization problem of finding the coarser level correction has exactly the same form as the original problem on the finest level. We can apply some smoother (e.g., the BCD algorithm of Chapter 5) for several iterations to obtain an intermediate solution on the coarser level and pass it to the next, even coarser, level recursively. However, the problem is that the constraints (6.17) may be too restrictive for the algorithm to make any useful corrections on the coarser level. An extreme case is when R̃_{i,j} = 0, in which case no correction is allowed on that block.
96
We notice that the anisotropic TV model does not have this problem and is therefore, in this sense, more suitable for our multilevel optimization algorithm. The dual formulation of the anisotropic ROF model has exactly the same objective function as the isotropic ROF model (6.12), but its constraints are simple bound constraints. The dual anisotropic ROF model reads as follows:

    subject to   L^1_{i,j} ≤ w^1_{i,j} ≤ R^1_{i,j}   for 1 ≤ i ≤ n, 1 ≤ j ≤ n + 1,
                 L^2_{i,j} ≤ w^2_{i,j} ≤ R^2_{i,j}   for 1 ≤ i ≤ n + 1, 1 ≤ j ≤ n,

where X, Y and γ are the same as in (6.12). L^1, L^2 are matrices with all entries equal to −1, and R^1, R^2 are matrices with all entries equal to 1.
From the above analysis, it is not hard to see that the problem of finding the coarser level correction c is the following minimization problem:

    min  ‖∇ · c + z̃‖²
       + γ̃ Σ_{j=1}^{m} ‖∇ · c^1(:, j) + X̃(:, j)‖²
       + γ̃ Σ_{i=1}^{m} ‖∇ · c^2(i, :) + Ỹ(i, :)‖²   (6.19)
97
where X̃, Ỹ , γ̃ are the same as in (6.15) and L̃, R̃ are given as follows:
Now the coarser level correction problem (6.19) has exactly the same formulation as the original problem (6.18), so we can apply the same strategy recursively and obtain the multilevel V-cycle algorithm as in the 1D case.
Tables 6.1, 6.2, and 6.3 report the number of iterations and average CPU times over ten runs, where each run adds a different random noise vector to the true image. In all codes we used the starting point w^{(0)} = 0 and the relative duality gap stopping criterion. We vary the threshold tol from 10^{−2} to 10^{−4}, producing results of increasingly high accuracy as tol is decreased.
These tables show that the improvement over the BCD algorithm obtained by adopting the multilevel optimization framework is not significant. This might be due to
98
Table 6.1: Iterations & CPU costs, anisotropic TV, problem 1.
tol = 10−2 tol = 10−3 tol = 10−4
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
ML BCD 2 0.06 11 0.30 50 1.42
BCD 7 0.06 49 0.39 247 1.97
99
CHAPTER 7
In Chapter 2, we gave a survey of some existing algorithms for solving total variation based image restoration models, including the CGM method and the SOCP approach. We also pointed out that SOCP and CGM bear many similarities: both methods apply Newton's method to some primal-dual system to compute the update at each iteration. In this section, we shall investigate the connections between these two methods in more detail.
Preliminaries of SOCP
We remind the reader that we follow the MATLAB convention of using “ , ”
for adjoining vectors and matrices in a row, and “ ; ” for adjoining them in a
column. Thus, for any vectors x, y and z, the following are synonymous:
    (x; y; z) = (x^T, y^T, z^T)^T = [ x
                                      y
                                      z ]
100
We also use ⊕ to denote joining matrices diagonally; that is, for two matrices A and B,

    A ⊕ B = [ A  0
              0  B ].
Now we use the notation K^n to denote the Lorentz cone (ice-cream cone) in R^n, i.e.

    K^n ≡ { x = (x_0; x̄) ∈ R^n : ‖x̄‖ ≤ x_0 }.

The standard-form SOCP problem we consider is

    min Σ_{i=1}^{r} c_i^T x_i   subject to   Σ_{i=1}^{r} A_i x_i = b,   x = (x_1; · · · ; x_r) ∈ K,   (7.1)

where

    K = K^{n_1} × K^{n_2} × · · · × K^{n_r}.
101
Observe that x ∈ K (x ∈ int K) if and only if Arw(x) is positive semidefinite (positive definite).
We also use Arw(·) in the block sense; that is, if x = (x_1; · · · ; x_r) with x_i ∈ R^{n_i} for i = 1, · · · , r, then
For now we assume that all vectors consist of a single block x = (x_0; x̄). For two vectors x and y we define the multiplication

    x ◦ y ≡ (x^T y; x_0 ȳ + y_0 x̄) = [ x^T y
                                        x_0 y_1 + y_0 x_1
                                        ⋮
                                        x_0 y_n + y_0 x_n ].
We write y ≡ x^{−1} if and only if x ◦ y = y ◦ x = e. It is clear that

    x^{−1} = (1/det(x)) (x_0; −x̄).
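As a small illustration, the following MATLAB sketch implements the single-block product x ◦ y and the inverse x^{−1}; it assumes the usual conventions e = (1; 0; · · · ; 0) and det(x) = x_0² − ‖x̄‖², which are stated here as assumptions since they appear only implicitly above.

    % Hedged sketch of the single-block Jordan product and inverse for K^n.
    % x and y are column vectors (x0; xbar) of the same length.
    function z = soc_prod(x, y)
        z = [x' * y; x(1) * y(2:end) + y(1) * x(2:end)];
    end

    function xinv = soc_inv(x)
        detx = x(1)^2 - norm(x(2:end))^2;   % assumed definition of det(x)
        xinv = [x(1); -x(2:end)] / detx;    % x^{-1} = (x0; -xbar) / det(x)
    end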
102
SOCP problems are usually solved by primal-dual path-following interior-point methods. Note that for x ∈ int K the function −ln(det(x)) is a convex barrier function for K. If we replace the second-order cone constraints by x_i ∈ int K^{n_i} and add the logarithmic barrier term −µ Σ ln(det(x_i)) to the objective, we obtain the barrier problem

    (P_µ)   min   Σ_{i=1}^{r} c_i^T x_i − µ Σ_{i=1}^{r} ln(det(x_i))
            s.t.  Σ_{i=1}^{r} A_i x_i = b,
                  x_i ∈ int K^{n_i} for i = 1, · · · , r.   (7.2)
The first-order optimality conditions of (7.2) are

    Σ_{i=1}^{r} A_i x_i = b,
    c_i − A_i^T y − 2µ x_i^{−1} = 0,   for i = 1, · · · , r.

Setting z_i = 2µ x_i^{−1}, these conditions can be written as

    Σ_{i=1}^{r} A_i x_i = b,
    c_i − A_i^T y − z_i = 0,   for i = 1, · · · , r,   (7.4)
    x_i ◦ z_i = 2µe,   for i = 1, · · · , r.
For every µ > 0 one can show that the above system (7.4) has a unique solution (x_µ, y_µ, z_µ). The trajectory of points (x_µ, y_µ, z_µ) satisfying (7.4) is called the primal-dual central path, or simply the central path, associated with the SOCP problem (7.1).
Applying Newton's method to (7.4) results in a linear system for the update direction (∆x, ∆y, ∆z):

    [ A       0    0      ] [ ∆x ]   [ b − Ax        ]
    [ 0       A^T  I      ] [ ∆y ] = [ c − A^T y − z ]   (7.5)
    [ Arw(z)  0    Arw(x) ] [ ∆z ]   [ 2µe − x ◦ z   ]
    s.t. ‖u − f‖² ≤ σ²   (7.6)
They let v be the noise variable, v = f − u, and let (p_{i,j}, q_{i,j}) be the discrete ∇u obtained by forward differences:

    p_{i,j} = u_{i+1,j} − u_{i,j} if i < n,   p_{i,j} = 0 if i = n,
    q_{i,j} = u_{i,j+1} − u_{i,j} if j < n,   q_{i,j} = 0 if j = n.

They then impose the second-order cone constraints √(p_{i,j}² + q_{i,j}²) ≤ t_{i,j}.

    p_{n,j} = 0 for j = 1, · · · , n,
    q_{i,n} = 0 for i = 1, · · · , n,
    v_0 = σ
105
SOCP:

    min Σ_{1≤i,j≤n} t_{i,j}

Problem (7.8) is in the standard form of an SOCP problem (7.1). To see this, we introduce the following notation and variables:

    n × n : image size,
    N + 1 : number of cones, N + 1 = n² + 1 (N = n²),
    x_k = (t_{i,j}; p_{i,j}; q_{i,j}),  for k = 1, 2, · · · , N,
    x_{N+1} = (v_0; v) ∈ R^{N+1},
    c_k = (1; 0; 0) ∈ R³,  for k = 1, 2, · · · , N,
    c_{N+1} = (0; · · · ; 0) ∈ R^{N+1},

    Ā_k = [ 0 0 0
            ⋮ ⋮ ⋮
            0 1 0
            0 0 1
            ⋮ ⋮ ⋮
            0 0 0 ] ∈ R^{M×3},   i.e., Ā_k(2k − 1 : 2k, 2 : 3) = I_2,  for k = 1, · · · , N,

    Ā_{N+1} = [ 0  A^T
                1  0  ],   where A is defined in Chapter 1 in (1.25) and (1.27),

    Ā = (Ā_1, Ā_2, · · · , Ā_{N+1}),
    x = (x_1; x_2; · · · ; x_{N+1}),
    c = (c_1; · · · ; c_{N+1}),
    b = (A^T f; σ),
    K = K³ × · · · × K³ × K^{N+1}.
Using the above notation and variables, we can rewrite problem (7.8) into the
following standard form of SOCP:
min cT x
subject to Āx = b,
x ∈ K.
We now follow the general procedure of primal-dual path-following interior-point methods to solve (7.8). We shall show that the KKT system of (7.8) that we solve at each iteration of the interior-point method is equivalent to the CGM system developed in [25].
    1 − 2µ t_{i,j} / ( t_{i,j}² − (p_{i,j}² + q_{i,j}²) ) = 0,   (7.10a)
    −α_{i,j} + 2µ p_{i,j} / ( t_{i,j}² − (p_{i,j}² + q_{i,j}²) ) = 0,   (7.10b)
    −γ_{i,j} + 2µ q_{i,j} / ( t_{i,j}² − (p_{i,j}² + q_{i,j}²) ) = 0,   (7.10c)
where α, γ and λ are the Lagrange multipliers for the constraints (7.9b), (7.9c) and (7.9d), respectively. Note that in both (7.9) and (7.10) we omit the special cases for points on the boundary of the image domain (i = n or j = n), purely for simplicity.
which is equivalent to

    t_{i,j} = √( p_{i,j}² + q_{i,j}² + µ² ) + µ,

since t_{i,j} ≥ 0. Hence, the first three equations in the above KKT system, equations (7.10a)–(7.10c), are equivalent to

    t_{i,j} = √( p_{i,j}² + q_{i,j}² + µ² ) + µ
109
From [25], the CGM system for the constrained ROF model is

    √( |(∇u)_{i,j}|² + β ) w_{i,j} − (∇u)_{i,j} = 0,

Note that system (7.11) is exactly the same as (7.12) except that √(|∇u|² + µ²) + µ is replaced by √(|∇u|² + β). In other words, the SOCP approach and the CGM method actually solve similar systems at each iteration. The difference is that the barrier parameter µ in the SOCP KKT system decreases to 0 following some reduction schedule, while the smoothing parameter β in CGM is a fixed small number.
so that the system becomes smooth and the point x is forced to stay inside the interior of the feasible set

    X = { x = (x_1; · · · ; x_N) : ‖x_i‖ ≤ 1 for i = 1, · · · , N }.
The CGM method [25] then solves the following primal-dual system using Newton's method:

    √( ‖A_i^T y‖² + β² ) x_i − A_i^T y = 0,   i = 1, . . . , N,   (7.14a)
    Ax + λy − λy_0 = 0.   (7.14b)
In [25] the parameter β is fixed, preventing the solution of the above system
from converging to the true optimizer, which is a solution of system (7.13). More-
over, if we pick a very small β > 0 to reduce this effect, the convergence of the
method can become slow.
where (x_β, y_β) solves the equations (7.14) for each β > 0. As β → 0, (x_β, y_β) converges to the optimal solution. Unlike [25], where β is a constant, the β in our algorithm is a measure of the duality gap and is updated accordingly at each iteration.
111
Let us define the following notation:

    ρ_i^β = √( ‖A_i^T y‖² + β² );   (7.15)

    E_β = Diag( ρ_i^β I_2 ),   F_β = Diag( I_2 − (1/ρ_i^β) x_i y^T A_i ).   (7.16)
Applying Newton's method to the above equations gives the following linear system for the updates (∆x, ∆y):

    [ A     λ I_N    ] [ ∆x ]   [ r_d ]
    [ E_β   −F_β A^T ] [ ∆y ] = [ r_c ],   (7.18)

where

    r_d = λ(y_0 − y) − Ax   and   r_c = A^T y − E_β x.   (7.19)
We choose different steplength rules to update the primal y and the dual x. For the dual, to ensure strict feasibility (i.e., ‖x_i‖ < 1 for i = 1, . . . , N), we define the updating rule as

    x̃ = x + min(1, 0.99 α_max) ∆x,   (7.22)
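The quantity α_max in (7.22) is the largest step for which every block stays inside the unit ball. One common way to compute it, solving ‖x_i + α ∆x_i‖² = 1 per block and taking the smallest positive root, is sketched below in MATLAB; the exact rule used in our code may differ, so this is an illustrative assumption.

    % Hedged sketch: largest alpha with ||x_i + alpha*dx_i|| <= 1 for every block.
    % X and DX are 2-by-N arrays whose i-th columns are x_i and dx_i.
    function amax = max_step(X, DX)
        amax = inf;
        for i = 1:size(X, 2)
            a = DX(:,i)' * DX(:,i);
            b = 2 * X(:,i)' * DX(:,i);
            c = X(:,i)' * X(:,i) - 1;                       % c <= 0 since x_i is feasible
            if a > 0
                alpha = (-b + sqrt(b^2 - 4*a*c)) / (2*a);   % positive root of the quadratic
                amax  = min(amax, alpha);
            end
        end
    end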
112
Predictor-Corrector Enhancement. We now discuss how to adapt Mehrotra's predictor-corrector method to our problem. One motivation of this method is to estimate the centering parameter β from the performance of the predictor step (i.e., the affine-scaling step in LP). The other important idea is to take the second-order terms into account in the corrector step, so as to compensate for some of the nonlinearity in the system. A key point is that the predictor and corrector share the same matrix factorization, so the extra work in the corrector step is small. We refer to [70] for an excellent discussion of Mehrotra's method.
Let ỹ and x̃ be the primal and dual variables updated in the predictor step. As a natural generalization of Mehrotra's method in LP, a heuristic value of the centering parameter β to be used in the corrector step can be defined by

where

    gap(x, y) = Σ_{i=1}^{N} ( ‖A_i^T y‖ − x_i^T (A_i^T y) ).   (7.24)
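For illustration, the gap measure (7.24) can be evaluated with a single line of MATLAB, as in the sketch below; V and X are assumed storage conventions of this sketch, not notation from the text.

    % Hedged sketch of the gap measure (7.24).
    % V(:,i) holds A_i' * y and X(:,i) holds x_i, both as 2-by-N arrays.
    g = sum( sqrt(sum(V.^2, 1)) - sum(X .* V, 1) );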
Substituting the updated variables (x + ∆x, y + ∆y) into (7.14a), with β replaced by the corrector value β̃, gives

    ρ_i^{β̃} √( 1 + 2 (y^T A_i A_i^T ∆y)/(ρ_i^{β̃})² + ‖A_i^T ∆y‖²/(ρ_i^{β̃})² ) (x_i + ∆x_i) − (A_i^T y + A_i^T ∆y) = 0.
Rearranging terms, we obtain

    E_β̃ ∆x − F_β̃ A^T ∆y = (r̃_c)_i + h_i^{(1)}.   (7.25)

It is not practical to solve (7.25) directly with β̃ on the left-hand side, since the factorization of G_β has already been computed in the predictor step using the previous value β. Alternatively, we modify (7.25) to be

    E_β ∆x − F_β A^T ∆y = (r̃_c)_i + h_i^{(1)} + h_i^{(2)},   (7.27)

where

    r_cc = (r̃_c)_i + h_i^{(1)} + h_i^{(2)}.   (7.31)
114
7.2.1 Numerical Experiments
We test our primal-dual interior point algorithm on two image denoising problems, and we compare its performance with that of the CGM method. The original clean images and the input noisy images are shown in Figure 7.1. The sizes of the two test problems are 128 × 128 and 256 × 256, respectively. The noisy images are generated by adding Gaussian noise to the clean images using the MATLAB function imnoise, with the variance parameter set to 0.01. The fidelity parameter λ is taken to be 0.045 throughout the experiments.
Figure 7.2 plots the relative duality gap against the number of iterations for the CGM method as well as for the primal-dual interior point method.
115
Figure 7.1: The original clean images and noisy images for our test problems.
Left: 128 × 128 “shape”; right: 256 × 256 “cameraman”.
116
[Two semi-log plots of relative duality gap versus iterations, comparing CGM (β = 10^{−6}) with the interior point method.]
Figure 7.2: Plots of relative duality gap vs. iterations. Top: test problem 1; bottom: test problem 2.
117
CHAPTER 8
We have proposed some new algorithms to solve total variation based image restoration models, including the primal-dual hybrid gradient descent method, the duality-based gradient projection method, and the dual block coordinate descent method. All of them are very efficient and competitive with existing popular methods. These methods are based either on the dual formulation or on the primal-dual formulation, so they do not need a smoothing parameter for the total variation term, which would prevent an algorithm from converging to the true optimal solution. They are explicit first-order methods, which are simple to implement, take only a few sweeps at each iteration, and do not require a huge amount of memory; hence they are very suitable for solving large-scale image restoration problems.
118
over the uni-level scheme is limited, which implies that more sophisticated coarsening and interpolation schemes are needed to exploit the advantages of multigrid algorithms.
We also studied the connection between the CGM method and the SOCP approach for solving the ROF model, and proposed an improvement of the CGM algorithm based on primal-dual interior-point methods.
119
References
[1] R. Acar & C. R. Vogel. Analysis of total variation penalty methods for ill-
posed problems, Inverse Problems , 10 (1994) 1217 - 1229.
[2] S. T. Acton, Multigrid anisotropic diffusion, IEEE Trans. Imag. Proc.,
3(1998), 280-291.
[3] F. Alizadeh & D. Goldfarb. Second-order cone programming, Mathematical
Programming, 95 (2003), 3-51.
[4] K. Andersen, E. Christiansen, A. Conn & M. Overton. An efficient primal-
dual interior-point method for minimizing a sum of Euclidean norms, SIAM
J. Sci. Comput., 22 (2000), pp. 243-262.
[5] G. Aubert & P. Kornprobst. Mathematical Problems in Image Processing -
Partial Differential Equations and the Calculus of Variations, 2nd edition.
Springer, Applied Mathematical Sciences, Vol 147, 2006.
[6] J. Barzilai and J. Borwein. Two point step size gradient methods, IMA
Journal of Numerical Analysis 8 (1988), pp. 141–148.
[7] D. P. Bertsekas. Nonlinear Programming, 2nd ed., Athena Scientific, Boston,
1999.
[8] Bertsekas, D. P. & Tsitsiklis, J. N., Parallel and Distributed Computation:
Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ, 1989.
[9] E. G. Birgin, J. M. Martinez & M. Raydan. Nonmonotone spectral projected
gradient methods on convex sets, SIAM Journal on Optimization 10 (2000),
pp. 1196–1211.
[10] P. Blomgren & T. Chan. Color TV: Total Variation Methods for Restoration
of Vector-Valued Images, IEEE Trans. Image Proc., 7(1998), 304-309.
[11] P. Blomgren, T. F. Chan, P. Mulet, L. Vese & W. L. Wan. Variational PDE
models and methods for image processing, in: Research Notes in Mathemat-
ics, 420 (2000), 43-67, Chapman & Hall/CRC.
[12] M. Burger, S. Osher, J. Xu, & G. Gilboa, Nonlinear inverse scale space
methods for image restoration. Lecture Notes in Computer Science, Vol.
3752, pp. 25-36, 2005.
[13] J. L. Carter. Dual method for total variation-based image restoration, UCLA
CAM Report 02-13, 2002.
120
[14] A. Chambolle. An algorithm for total variation minimization and applica-
tions, J. Math. Imag. Vis. 20 (2004), pp. 89–97.
[15] A. Chambolle. Total variation minimization and a class of binary MRF mod-
els. In: Energy Minimization Methods in Computer Vision and Pattern
Recognition, pp. 136-152, Springer Berlin, 2005.
[17] A. Chambolle & P. L. Lions, Image recovery via total variation minimization
and related problems, Numer. Math. 76 (1997), pp. 167–188.
[19] T. F. Chan, K. Chen, & X. C. Tai. Nonlinear multilevel scheme for solving
the total variation image minimization problem, in: Image Processing Based
on Partial Differential Equations, Springer Berlin Heidelberg, 2005.
[21] T. F. Chan & K. Chen, An optimization based total variation image de-
noising. SIAM J. Multiscale Modeling and Simulation, Vol 5(2), pp.615-645,
2006.
[23] T. Chan, S. Esedoglu, & F. E. Park. A Fourth Order Dual Method for
Staircase Reduction in Texture Extraction and Image Restoration Problems,
UCLA CAM Report 05-28, 2005.
[24] T. F. Chan, S. Esedoglu, F. Park & A. Yip. Total variation image restoration:
overview and recent developments. In Handbook of Mathematical Models
in Computer Vision. Springer Verlag, 2005. Edt. by: N. Paragios, Y. Chen,
O. Faugeras.
121
[25] T. F. Chan, G. H. Golub & P. Mulet. A nonlinear primal dual method for
total variation based image restoration, SIAM J. Sci. Comput. 20 (1999),
pp. 1964–1977.
[26] T. Chan, S. Kang, & J. Shen. Euler’s Elastica and Curvature-based Image
Inpainting. SIAM J. Appl. Math., 63(2):564-592, 2002.
[27] T. F. Chan & P. Mulet. On the Convergence of the Lagged Diffusivity Fixed
Point Method in Total Variation Image Restoration, SIAM J. Numer. Anal-
ysis, 36 (1999), 354-367.
[28] T. F. Chan & J. Shen. Image processing and analysis: variational, PDE,
wavelet and stochastic methods. SIAM, Philadelphia, 2005.
[30] T. Chan & C. Wong. Total Variation Blind Deconvolution, IEEE Trans.
Image Process., 7(1998), 370-375.
[31] T. F. Chan, M. Zhu. Fast Algorithms for Total Variation-Based Image Pro-
cessing. To appear in: The Proceedings of 4th ICCM, Hangzhou, China
2007.
[34] K. Chen & X-C Tai, A Nonlinear Multigrid Method For Total Variation
Minimization From Image Restoration. Journal of Scientific Computing, Vol.
33 (2), pp.115-138, 2007.
[35] Y.-H. Dai, W. W. Hager, K. Schittkowski, & H Zhang. The cyclic Barzilai-
Borwein method for unconstrained optimization, IMA J. Num. Ana. 26
(2006), pp. 604–627.
[36] Y.-H. Dai & R. Fletcher. Projected Barzilai-Borwein methods for large-
scale box constrained quadratic programming, Numerische Mathematik 100
(2005), pp. 21–47.
122
[37] J. Darbon & M. Sigelle. Exact optimization of discrete constrained total
variation minimization problems. In: R. Klette and J. Zunic, editors, Tenth
International Workshop on Combinatorial Image Analysis, volume 3322 of
LNCS (2004), 548-557.
[38] J. Darbon & M. Sigelle. A fast and exact algorithm for total variation min-
imization. In: J. S. Marques, N. Pérez de la Blanca, and P. Pina, editors,
2nd Iberian Conference on Pattern Recognition and Image Analysis, volume
3522 of LNCS (2005) 351-359.
[40] I. Ekeland & R. Témam. Convex Analysis and Variational Problems. SIAM
Classics in Applied Mathematics, 1999.
[43] D. Goldfarb & W. Yin. Second-order cone programming methods for total
variation-based image restoration, SIAM J. Sci. Comput. 27 (2005), pp. 622–
645.
[45] P.T. Harker & J.S. Pang. Finite-dimensional variational inequality and non-
linear complementarity problems: a survey of theory, algorithms and appli-
cations. Math. Programming 48 (1990), 161-220.
123
[49] A. N. Iusem. On the convergence properties of the projected gradient method
for convex optimization, Computational and Applied Mathematics 22 (2003),
pp. 37–52.
[50] H. Ishikawa, Exact optimization for Markov random fields with convex pri-
ors, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.
25, No. 10, pp. 1333–1336, 2003.
[51] G. Korpelevich. The extragradient method for finding saddle points and
other problems. Ekonomika i Matematicheskie Metody 12 (1976), 747-756.
[55] Yu. Nesterov. Dual extrapolation and its applications for solving varia-
tional inequalities and related problems. Math. Program., Ser. B 109 (2007),
319–344.
[59] S. Osher & A. Marquina. Explicit algorithms for a new time dependent model
based on level set motion for nonlinear deblurring and noise removal, SIAM
J. Sci. Comput. 22 (2000), pp. 387-405.
[60] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise
removal algorithms, Physica D 60 (1992), pp. 259–268.
124
[62] T. Serafini, G. Zanghirati, & L. Zanni. Gradient projection methods for large
quadratic programs and applications in training support vector machines,
Optimization Methods and Software 20 (2004), pp. 353–378.
[65] L. Vese & S. Osher. Modeling Textures with Total Variation Minimization
and Oscillating Patterns in Image Processing, J. Math. Imaging Vision, 20
(2004), 7-18.
[66] C. R. Vogel & M. E. Oman. Iterative methods for total variation denoising,
SIAM J. Sci. Stat. Comput. 17 (1996), pp. 227–238.
[71] N. Xiu & J. Zhang. Some recent advances in projection-type methods for
variational inequalities. J. of Comp. and Applied Math. 152 (2003), 559-585.
125