
University of California

Los Angeles

Fast Numerical Algorithms for


Total Variation Based Image Restoration

A dissertation submitted in partial satisfaction


of the requirements for the degree
Doctor of Philosophy in Mathematics

by

Mingqiang Zhu

2008
© Copyright by

Mingqiang Zhu
2008
The dissertation of Mingqiang Zhu is approved.

Luminita Vese

Lieven Vandenberghe

Achi Brandt

Tony F. Chan, Committee Chair

University of California, Los Angeles


2008

To my wife . . .
who is the sunshine in my life
To my parents . . .
who always have their faith in me

Table of Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Total Variation Based Image Restoration Models . . . . . . . . . 1

1.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Challenges in Computing TV models . . . . . . . . . . . . . . . . 7

1.4 Discretization and Notations . . . . . . . . . . . . . . . . . . . . . 8

1.5 Organizations of This Paper . . . . . . . . . . . . . . . . . . . . . 12

2 A Brief Survey of Existing Algorithms . . . . . . . . . . . . . . . 15

2.1 Numerical Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 15

2.1.1 Time Marching Schemes . . . . . . . . . . . . . . . . . . . 15

2.1.2 Fixed Point Iteration . . . . . . . . . . . . . . . . . . . . . 16

2.1.3 Primal-Dual Newton Method (CGM) . . . . . . . . . . . . 17

2.1.4 Duality-Based Gradient Descent Method . . . . . . . . . . 18

2.1.5 Second-Order Cone Programming . . . . . . . . . . . . . . 21

2.1.6 Multigrid Methods . . . . . . . . . . . . . . . . . . . . . . 23

2.1.7 Graph Cut Algorithm for Anisotropic ROF . . . . . . . . . 26

2.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.3 Motivations for Developing New Methods . . . . . . . . . . . . . . 28

3 Primal-Dual Hybrid Gradient Descent Method . . . . . . . . . . 30

3.1 The Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . 30

3.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.3 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.4 Theoretical Connections . . . . . . . . . . . . . . . . . . . . . . . 38

3.5 Implementation Details and Numerical Experiments . . . . . . . . 42

3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4 Duality Based Gradient Projection Method . . . . . . . . . . . . 50

4.1 A Fundamental Convergence Result . . . . . . . . . . . . . . . . . 50

4.2 Gradient Projection Algorithms . . . . . . . . . . . . . . . . . . . 52

4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.2.2 Three Frameworks . . . . . . . . . . . . . . . . . . . . . . 54

4.2.3 Barzilai-Borwein Strategies . . . . . . . . . . . . . . . . . . 58

4.2.4 Implemented Variants of Gradient Projection . . . . . . . 59

4.3 A Sequential Quadratic Programming Algorithm . . . . . . . . . . 61

4.4 Computational Experiments . . . . . . . . . . . . . . . . . . . . . 63

5 Block Coordinate Descent (BCD) Method . . . . . . . . . . . . . 72

5.1 Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.3 BCD for Anisotropic TV . . . . . . . . . . . . . . . . . . . . . . . 78

5.4 Convergence Theory . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.5 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . 81

6 Multilevel Optimizations . . . . . . . . . . . . . . . . . . . . . . . . 87

6.1 Multilevel Optimization for 1D Dual TV Model . . . . . . . . . . 87

6.2 Multilevel Optimization for 2D Dual TV Model . . . . . . . . . . 92

6.3 Numerical Experiments and Comments . . . . . . . . . . . . . . . 98

7 Connections and Improvements of Some Existing Methods . . 100

7.1 Connections between CGM and SOCP . . . . . . . . . . . . . . . 100

7.2 An improvement of CGM Method . . . . . . . . . . . . . . . . . . 110

7.2.1 Numerical Experiments . . . . . . . . . . . . . . . . . . . . 115

8 Conclusions and Discussions . . . . . . . . . . . . . . . . . . . . . . 118

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

List of Figures

2.1 Illustration of how coordinate descent might get stuck at a wrong


solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1 Denoising test problems. Left: original clean image. Right: noisy
image with Gaussian noise (σ = 20). First row: test problem
1, 256 × 256 cameraman. Middle row: test problem 2, 512 ×
512 barbara. Bottom row: test problem 3, 512 × 512 boat . . . . 46

3.2 The denoised images with different level of termination criterions.


left column: tol = 10−2 , right column: tol = 10−4 . . . . . . . . 48
3.3 Plot of relative duality gap (P(y^k) − D(x^k)) / D(x^k) vs. iterations. Top left:
test problem 1. Top right: test problem 2. Bottom: test problem 3. . . . 49

4.1 The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Bar-
bara”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.2 The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01. . . . 66

4.3 The denoised images with different level of termination criterions.


left column: tol = 10−2 , right column: tol = 10−4 . . . . . . . . 67

4.4 Plots of duality gap vs. CPU time cost. Problem 1-3 are shown in
the order of top to bottom. . . . . . . . . . . . . . . . . . . . . . . 71

5.1 Convergence of Newton’s method . . . . . . . . . . . . . . . . . . 77

5.2 The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Bar-
bara”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.3 The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01. . . . 83

5.4 The denoised images with TV terms. left column: isotropic TV,
right column: anisotropic TV. . . . . . . . . . . . . . . . . . . . . 84

7.1 The original clean images and noisy images for our test problems.
Left: 128 × 128 “shape”; right: 256 × 256 “cameraman”. . . . . . 116

7.2 Plot of relative duality gap v.s. Iterations. Top : test problem 1,
Bottom: test problem 2. . . . . . . . . . . . . . . . . . . . . . . . 117

List of Tables

1.1 Computational Challenges for Primal and Dual ROF . . . . . . . 8

3.1 Number of iterations and CPU times (in seconds) for problem 1. . 47

3.2 Number of iterations and CPU times (in seconds) for problem 2. . 47

3.3 Number of iterations and CPU times (in seconds) for problem 3. . 47

4.1 Number of iterations and CPU times (in seconds) for problem 1.
∗ = initial αk scaled by 0.5 at each iteration. . . . . . . . . . . . . 68

4.2 Number of iterations and CPU times (in seconds) for problem 2.
∗ = initial αk scaled by 0.5 at each iteration. . . . . . . . . . . . . 69

4.3 Number of iterations and CPU times (in seconds) for problem 3.
∗ = initial αk scaled by 0.5 at each iteration. . . . . . . . . . . . 70

5.1 Iterations & CPU costs, isotropic TV, problem 1. . . . . . . . . . 85

5.2 Iterations & CPU costs, isotropic TV, problem 2. . . . . . . . . . 85

5.3 Iterations & CPU costs, isotropic TV, problem 3. . . . . . . . . . 85

5.4 Iterations & CPU costs, anisotropic TV, problem 1. . . . . . . . 86

5.5 Iterations & CPU costs, anisotropic TV, problem 2. . . . . . . . . 86

5.6 Iterations & CPU costs, anisotropic TV, problem 3. . . . . . . . . 86

6.1 Iterations & CPU costs, anisotropic TV, problem 1. . . . . . . . 99

6.2 Iterations & CPU costs, anisotropic TV, problem 2. . . . . . . . . 99

6.3 Iterations & CPU costs, anisotropic TV, problem 3. . . . . . . . . 99

Acknowledgments

I would like to thank my advisor, Professor Tony Chan, for his wise guidance and
generous support throughout my graduate studies at UCLA. His enthusiasm and
insight provided invaluable help in reaching closure on this thesis topic. I feel
particularly grateful for his effort to meet with me regularly to discuss my
research project despite his busy schedule at the NSF.

I would like to express my gratitude to my other committee members, Pro-


fessor Achi Brandt, Professor Luminita Vese and Professor Lieven Vandenberghe
for their great help, encouragement and valuable discussions on my studies and
research.

I am thankful to my coauthor, Professor Stephen Wright, for his stimulating
discussions. The opportunity of collaborating with him led to an important part
of my dissertation.

I would like to thank Professor Andrea Bertozzi and Professor Stanley Osher
for their help in my studies and research. I also appreciate the helpful discussions
and suggestions I received from postdoctoral researchers and graduate students
in our research group and the math department, including Jerome Darbon,
Xavier Bresson, Sheshadri Thiruvenkadam, Ernie Esser, Ethan Brown and many
others.

I am grateful to my friends in IPAM 1129: Connie, Ronald, Igor, Bin, Jian,
Yu, and Yunho. I thank them for their care and for all the helpful and insightful
discussions during lunch and tea time. My thanks also go to IPAM and all the
staff working there for providing such a wonderful working environment to all
the participants.

I am also thankful to all of the people who work in the mathematics department
here at UCLA for all of their help, and I would really like to extend my
appreciation to Maggie Albert, Robin Krestalude and Babette Dalton.

Vita

1980 Born, Huzhou, Zhejiang, China.

2001 B.S. in Automotive Engineering


Tsinghua University, China.

2004 M.A. in Mathematics


Arizona State University, Tempe, AZ

2005 M.A. in Mathematics


University of California, Los Angeles

2004-2008 Teaching Assistant, Research Assistant


Department of Mathematics
University of California, Los Angeles

Publications

Abstract of the Dissertation

Fast Numerical Algorithms for


Total Variation Based Image Restoration
by

Mingqiang Zhu
Doctor of Philosophy in Mathematics
University of California, Los Angeles, 2008
Professor Tony F. Chan, Chair

Image restoration models based on total variation (TV) have been popular and
successful since their introduction by Rudin, Osher, and Fatemi (ROF) in 1992.
The nonsmooth TV seminorm allows them to preserve sharp discontinuities
(edges) in an image while removing noise and other unwanted fine scale detail.
On the other hand, the TV term, which is the L1 norm of the gradient vector,
poses a computational challenge in solving those models efficiently. Furthermore,
the global coupling of the gradient operator makes the problem harder than
other L1 minimization problems where the variables under the L1 norm are separable.
In this paper we propose several new algorithms to tackle these difficulties
from different perspectives. Numerical experiments show that they are competitive
with the existing popular methods and some of them are significantly faster
despite their simplicity. The first algorithm we introduce is a primal-dual hybrid
gradient descent method that alternates between the primal and the dual
updates. It utilizes the information from both the primal and dual variables and
therefore is able to converge faster than the pure primal or pure dual method in
the same category. We then propose gradient projection (GP) methods to solve

the dual problem of the ROF model based on the special structure of the dual
constraints. We also test variants of GP algorithms with different step selection
strategies, including techniques based on the Barzilai-Borwein method. Along the
same line, a block coordinate descent method is proposed for solving the dual
ROF problem. The subproblem at each single block can be solved exactly using
different techniques. We also propose a basic multilevel optimization framework
for the dual formulation, aiming to speed up the solution process for large scale
problems. Finally, we study the connections between some existing methods and
give an improvement of the CGM method based on primal-dual interior-point
algorithms.

CHAPTER 1

Introduction

1.1 Total Variation Based Image Restoration Models

Image restoration is one of the fundamental problems in digital image processing.


Variational models have been extremely successful in a wide variety of image
restoration problems and remain one of the most active areas of research in
mathematical image processing and computer vision. The most fundamental
image restoration problem is perhaps denoising. It forms a significant preliminary
step in many machine vision tasks such as object detection and recognition.
Total variation (TV)-based image restoration models were first introduced by
Rudin, Osher, and Fatemi (ROF) in their pioneering work [60]. It was designed
with the explicit goal of preserving sharp discontinuities (edges) in an image
while removing noise and other unwanted fine-scale detail. ROF formulated the
following minimization problem:
  min_{u ∈ BV(Ω)} ∫_Ω |∇u|   s.t.   ‖u − f‖₂² ≤ |Ω|σ².    (1.1)

Here ∫_Ω |∇u| is the total variation (TV) of u and will be denoted as TV[u].
Other notations in (1.1) are explained as follows:

• Ω is the image domain, which will be taken to be a bounded domain in Rn


with Lipschitz boundary. Usually Ω is simply a rectangle in R2 , modeling
the computer screen.

• |Ω| is the area of the domain Ω.

• The function f : Ω → R represents the given observed image. f ∈ L2 (Ω).

• σ is an estimate of the standard deviation of the noise in the image f .

• The notation | · | represents the Euclidean (ℓ2) norm on R², i.e.

  |∇u| = |(u_x ; u_y)| = √(u_x² + u_y²)

• The notation k · k represents the usual L2 norm on L2 (Ω).

• BV(Ω) denotes the collection of all functions (images) in L1 (Ω) with finite
total variations (bounded variations).

The total variation of u has a more formal definition in mathematics:

  TV[u] ≡ max { ∫_Ω −u ∇·w : w = (w₁, w₂) ∈ C_c¹(Ω, R²), |w| ≤ 1 }.    (1.2)

This definition of TV is more general since it does not require u to be (almost


everywhere) differentiable. In fact, we only require u to be in the BV space
BV(Ω), whose definition uses the above definition of TV:
  BV(Ω) ≡ { u : u ∈ L¹(Ω), TV[u] < ∞ }.    (1.3)

It can be shown that W 1,1 (Ω) ⊆ BV(Ω) ⊆ L1 (Ω) and the definition of TV
in (1.2) is equivalent to the objective function of (1.1) when u ∈ W 1,1 . More
theoretical properties of total variation and BV space can be seen in references
[5], [28], [42] and [52]. We also point out that, once we try to solve the problem
in the discrete case, all the above technical restrictions on u become irrelevant.
It is clear that any discrete image u has finite (discrete) total variation.

The results about existence and uniqueness for the solutions of problem (1.1)
can be seen in [17]. The same paper also shows that the constrained ROF model
(1.1) is equivalent to the following unconstrained model:

  min_{u ∈ BV} TV[u] + (λ/2) ‖u − f‖₂²,    (1.4)

for some suitable Lagrange multiplier λ.

In fact, both ROF and their subsequent researchers mostly focus on solving
the unconstrained model (1.4) rather than the original constrained model (1.1)
since unconstrained optimization is generally comparatively easier to solve.

The original ROF model was introduced to solve image denoising problems.
But the methodology can be naturally extended to restore a blurred and noisy
image by including the known blurring kernel:

  min_{u ∈ BV} TV[u] + (λ/2) ‖Ku − f‖₂².    (1.5)

Here K is a given linear blurring operator and every other term is defined the
same as in (1.4). In this model, f is formulated as the sum of a Gaussian noise
v and a blurry image K ū resulting from the linear blurring operator K acting on
the clean image ū, i.e., f = K ū + v.

Among all linear blurring operators, many are shift-invariant and can be ex-
pressed in the form of convolution:
  (Ku)(x) = (h ∗ u)(x) = ∫ h(x − y) u(y) dy,    (1.6)

where h is the given point spread function (PSF) associated with K.
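As a concrete illustration (not part of the original presentation), such a shift-invariant blur can be applied in MATLAB by two-dimensional convolution; the 9 × 9 Gaussian PSF below is a hypothetical choice made only for this sketch:

  % Apply a shift-invariant blur K to an image u via (1.6); h is an assumed Gaussian PSF.
  [X, Y]   = meshgrid(1:128);
  u        = double((X - 64).^2 + (Y - 64).^2 < 40^2);   % simple synthetic test image
  [Xh, Yh] = meshgrid(-4:4);
  h        = exp(-(Xh.^2 + Yh.^2) / (2*2^2));
  h        = h / sum(h(:));                              % normalize the point spread function
  Ku       = conv2(u, h, 'same');                        % (Ku)(x) = (h*u)(x)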

Over the years, the ROF model has been extended to many other image
processing tasks, including inpaintings, blind-deconvolutions, cartoon-texture de-
compositions and vector valued images. (See [24] and the reference therein for

recent developments in TV based image processing.) It has also been modified in
a variety of ways to improve its performance by using iterative refinement with
Bregman distance [61] and the more general inverse scale methods [12] . In this
paper, we shall focus on solving the original ROF restoration problem, but we
point out that some of our ideas can be naturally extended to other relevant
models.

1.2 Duality

The discussion so far, based on the minimization models (1.1) and (1.4), can be
viewed as the primal approach to solving the TV denoising problem. In this
section, we shall discuss the dual formulations of the ROF models (1.4)
and (1.1) and some related analysis. While the dual formulation is not as well
developed as the primal one, it does have some inherent advantages and has been
receiving increasing interest recently (see e.g., [13, 25, 14]). It offers an alternative
formulation which can lead to effective computational algorithms.

First, we shall adopt the definition (1.2) for the TV semi-norm, which we
rewrite here
  TV[u] ≡ max_{w ∈ W} ∫_Ω −u ∇·w,
  where W ≡ { w : w = (w₁, w₂) ∈ C_c¹(Ω, R²), |w| = √(w₁² + w₂²) ≤ 1 }.    (1.7)

With this definition of TV, the ROF model (1.4) becomes

  min_u max_{w ∈ C_c¹(Ω), |w| ≤ 1} ∫_Ω −u ∇·w + (λ/2) ‖u − f‖₂²,    (1.8)

where u and w are the primal and dual variables, respectively. The min-max
theorem (see e.g., Proposition 2.4 in [40])allows us to interchange the min and

max, to obtain

  max_{w ∈ C_c¹(Ω), |w| ≤ 1} min_u ∫_Ω −u ∇·w + (λ/2) ‖u − f‖₂².    (1.9)

The inner minimization problem can be solved exactly as follows:

  u = f + (1/λ) ∇·w,    (1.10)

leading to the following dual formulation:

  max_{w ∈ C_c¹(Ω), |w| ≤ 1} D(w) := (λ/2) [ ‖f‖₂² − ‖(1/λ) ∇·w + f‖₂² ],    (1.11)

or, equivalently,
  min_{w ∈ C_c¹(Ω), |w| ≤ 1} (1/2) ‖∇·w + λf‖₂².    (1.12)

For a primal-dual feasible pair (u, w), the duality gap G(u, w) is defined to be
the difference between the primal and the dual objectives:

  G(u, w) = P(u) − D(w)
          = ∫_Ω |∇u| + (λ/2) ‖u − f‖₂² − (λ/2) [ ‖f‖₂² − ‖(1/λ) ∇·w + f‖₂² ]
          = ∫_Ω ( |∇u| − ∇u·w ) + (λ/2) ‖(1/λ) ∇·w + f − u‖₂².    (1.13)

The duality gap bounds the distance to optimality of the primal and dual
objectives. Specifically, if u and w are feasible for the primal (1.4) and dual
(1.11) problems, respectively, we have

0 ≤ P (u) − O ∗ ≤ G(u, w), (1.14a)

0 ≤ O ∗ − D(w) ≤ G(u, w), (1.14b)

where O ∗ is the (common) primal-dual optimal objective value. We make use of


the duality gap in the termination criterion in our numerical experiments.

Optimization theory tells us G(u, w) ≥ 0, and moreover, (u, w) are optimal if
and only if G(u, w) = 0.

Notice here |w| ≤ 1, and it follows that |∇u| − ∇u·w ≥ 0 for all x ∈ Ω. Therefore
G(u, w) = 0 if and only if

  |∇u| − ∇u·w = 0   and   (1/λ) ∇·w + f − u = 0.

It is also clear that |∇u| − ∇u·w = 0 is equivalent to |∇u|w − ∇u = 0.

Hence, (u, w) is optimal if and only if the following system holds:

  |∇u| w − ∇u = 0,
  ∇·w − λ(u − f) = 0.    (1.15)

The above system (1.15) is referred to as the primal-dual system.

We can derive the dual formulation of the constrained ROF model (1.1) in
exactly the same way. First of all, the constrained ROF model (1.1) can be
rewritten as the following min-max problem using the general definition of TV in
(1.7):

  min_{u ∈ U} max_{w ∈ W} ∫_Ω −u ∇·w,    (1.16)

where U ≡ { u ∈ BV(Ω) : ‖u − f‖₂² ≤ |Ω|σ² } and W is defined as in (1.7).

Again, the min-max theorem allows us to interchange the min and max in the
above problem and we obtain the equivalent
  max_{w ∈ W} min_{u ∈ U} ∫_Ω −u ∇·w.    (1.17)

Now the inner minimization problem in (1.17) can be solved exactly as

  u = f + √|Ω| σ ∇·w / ‖∇·w‖₂.    (1.18)

Substituting the above result back into problem (1.17), we obtain the dual formulation

  max_{w ∈ W} D[w] ≡ −√|Ω| σ ‖∇·w‖₂ − ∫_Ω f ∇·w.    (1.19)

Similarly, the duality gap G(u, w) for any feasible primal-dual pair (u, w) is defined as

  G(u, w) ≡ P[u] − D[w]
          = ∫_Ω |∇u| + ∫_Ω f ∇·w + √|Ω| σ ‖∇·w‖₂
          = ∫_Ω ( |∇u| − ∇u·w ) + [ √|Ω| σ ‖∇·w‖₂ + ∫_Ω (f − u) ∇·w ].    (1.20)

Setting G(u, w) = 0 gives the following optimality conditions:

  |∇u| w − ∇u = 0,
  ∇·w − ( ‖∇·w‖₂ / (σ |Ω|^{1/2}) ) (u − f) = 0.    (1.21)

Comparing (1.18) with (1.10), or comparing (1.21) with (1.15), we find that the
fidelity parameter λ that makes models (1.4) and (1.1) equivalent is given by

  λ = ‖∇·w*‖₂ / (σ |Ω|^{1/2}),    (1.22)

where w* is the dual optimizer of problem (1.19).

1.3 Challenges in Computing TV models

From the computational point of view, the primal and dual formulations pose
different challenges for computing their optimal solutions (see Table 1.1). The
total variation term in the primal formulation is non-smooth where |∇u| = 0,
which makes derivative-based methods impossible without an artificial
smoothing parameter. The dual formulation imposes constraints, which usually
require extra effort compared to unconstrained optimization. Being quadratic,
the dual energy is less nonlinear than the primal energy, but the rank-deficient
operator ∇· makes the dual minimizers possibly non-unique. Finally, they share
the same problem of spatial stiffness due to the global coupling in their energy
functions, which presents a challenge to any algorithm that aims to keep the
computational complexity reasonably bounded in the number of pixels.

Primal Problem (1.4) Dual Problem (1.11)


· Nondifferentiable at |∇u| = 0 · Non-uniqueness due to rank-deficient ∇·
· Highly nonlinear · Extra constraints
· Spatial stiffness · Spatial stiffness

Table 1.1: Computational Challenges for Primal and Dual ROF

1.4 Discretization and Notations

Before describing the numerical algorithms, let us choose an appropriate way


to discretize the continuous ROF model and fix the main relevant notational
conventions.

Often in this paper we need to concatenate vectors and matrices, in both


column-wise or row-wise fashion. We follow the MATLAB convention of using
“,” for adjoining vectors and matrices in a row, and “;” for adjoining them in a
column. Thus, for any vectors x, y and z, the following are synonymous:

  (x; y; z) = (x^T, y^T, z^T)^T,

i.e., the column vector obtained by stacking x, y and z vertically.

For the sake of simplicity, we assume that the domain Ω is square, and define
a regular n × n grid of pixels, indexed as (i, j), for i = 1, 2, . . . , n, j = 1, 2, . . . , n.
We represent images as two-dimensional matrices of dimension n × n,

where ui,j represents the value of the function u at pixel (i, j). (Adaptation to
less regular domains is not difficult in principle.) To define the discrete total
variation, we introduce a discrete gradient operator, whose two components at
each pixel (i, j) are defined as follows:

  (∇u)¹_{i,j} = u_{i+1,j} − u_{i,j} if i < n,   and 0 if i = n;    (1.23a)
  (∇u)²_{i,j} = u_{i,j+1} − u_{i,j} if j < n,   and 0 if j = n.    (1.23b)

(Thus ∇u ∈ R^{n×n×2}.) The discrete TV of u is then defined by

  TV(u) = Σ_{1≤i,j≤n} |(∇u)_{i,j}|,

where | · | is the Euclidean (ℓ2 ) norm in R2 . Note that this norm is not a smooth
function of its argument. It has the classic “ice-cream cone” shape, nondifferen-
tiable when its argument vector is zero.
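To make the discretization concrete, here is a small MATLAB sketch (our own illustrative code, not taken from the experiments reported later) of the forward-difference gradient (1.23) and the resulting discrete TV:

  % Forward-difference gradient (1.23) and discrete TV of an n-by-n image u.
  u  = rand(8);                                  % any test image
  p1 = [diff(u,1,1); zeros(1,size(u,2))];        % (grad u)^1: u(i+1,j) - u(i,j), 0 in the last row
  p2 = [diff(u,1,2), zeros(size(u,1),1)];        % (grad u)^2: u(i,j+1) - u(i,j), 0 in the last column
  tv = sum(sum(sqrt(p1.^2 + p2.^2)));            % TV(u) = sum over (i,j) of |(grad u)_{i,j}|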

The discrete divergence operator is defined, by analogy with the continuous


setting, as the negative adjoint of the gradient operator, that is, ∇· = −∇∗ .
Defining the inner product of two objects in R^{n×n} as

  ⟨u, v⟩ = Σ_{1≤i,j≤n} u_{i,j} v_{i,j}

(and similarly for objects in R^{n×n×2}), we have from the definition of the discrete
divergence operator that ⟨∇u, w⟩ = ⟨u, −∇·w⟩ for any u ∈ R^{n×n} and w ∈ R^{n×n×2}.
It is easy to check that the divergence operator can be defined
explicitly as follows:

  (∇·w)_{i,j} =  { w¹_{i,j} − w¹_{i−1,j}   if 1 < i < n        { w²_{i,j} − w²_{i,j−1}   if 1 < j < n
                 { w¹_{i,j}                if i = 1       +    { w²_{i,j}                if j = 1
                 { −w¹_{i−1,j}             if i = n            { −w²_{i,j−1}             if j = n.
                                                                                             (1.24)

It is worth noticing that the values {w¹_{n,j} : j = 1, · · · , n} and {w²_{i,n} : i = 1, · · · , n}
are not relevant in calculating the divergence ∇·w in (1.24). Therefore, the
discrete dual problem essentially has only 2n(n − 1) instead of 2n² unknowns.
Nonetheless, the boundary points are usually included in w (with values being
0) to make the discrete dual problem consistent in form with the primal problem.
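The explicit formula (1.24) can be checked numerically against the defining adjoint relation ⟨∇u, w⟩ = ⟨u, −∇·w⟩; a short MATLAB sketch with arbitrary test data (illustrative only):

  % Discrete divergence (1.24) and a numerical check of <grad u, w> = <u, -div w>.
  n  = 8;  u = rand(n);  w1 = rand(n);  w2 = rand(n);
  p1 = [diff(u,1,1); zeros(1,n)];                          % forward-difference gradient (1.23)
  p2 = [diff(u,1,2), zeros(n,1)];
  d  = [w1(1,:); diff(w1(1:n-1,:),1,1); -w1(n-1,:)] ...    % first-component part of (1.24)
     + [w2(:,1), diff(w2(:,1:n-1),1,2), -w2(:,n-1)];       % second-component part of (1.24)
  lhs = sum(sum(p1.*w1 + p2.*w2));                         % <grad u, w>
  rhs = -sum(sum(u.*d));                                   % <u, -div w>
  fprintf('adjointness error: %g\n', abs(lhs - rhs));      % of the order of machine precision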

To describe the problem in matrix algebra language, we reorder the image


matrix u (resp. f ) in row-wise fashion into a vector y (resp. z), associating the
(i, j) element of the two-dimensional structure with the element (j − 1)n + i of
the vector structure, as follows:

y(j−1)n+i = ui,j , 1 ≤ i, j ≤ n.

We have y ∈ R^N, where N = n². The (i, j) component of the gradient (1.23)
can thus be represented as A_l^T y, a multiplication of the vector y ∈ R^N by the
matrix A_l^T ∈ R^{2×N}, where A_l ∈ R^{N×2}, for l = 1, 2, . . . , N:

  A_l^T y =  { (y_{l+1} − y_l ; y_{l+n} − y_l)   if l mod n ≠ 0 and l + n ≤ N
             { (0 ; y_{l+n} − y_l)               if l mod n = 0 and l + n ≤ N
             { (y_{l+1} − y_l ; 0)               if l mod n ≠ 0 and l + n > N
             { (0 ; 0)                           if l mod n = 0 and l + n > N.    (1.25)

Using this notation, the discrete version of the unconstrained primal ROF model
(1.4) can be written as follows:
  min_{y ∈ R^N} Σ_{l=1}^N ‖A_l^T y‖ + (λ/2) ‖y − z‖²,    (1.26)

where (and from now on) we use ‖ · ‖ to denote the Euclidean norm (ℓ2 norm)
in Euclidean space Rm with any finite dimension.

Similarly, we restructure the dual variable w into a collection of
vectors x_l ∈ R², l = 1, 2, . . . , N, as follows:

  x_{(j−1)n+i} = (w¹_{i,j} ; w²_{i,j}),   1 ≤ i, j ≤ n.

The complete vector x ∈ R2N of unknowns for the discretized dual problem is
then obtained by concatenating these subvectors:

x = (x1 ; x2 ; . . . ; xN ).

We also form the matrix A by concatenating the matrices Al , l = 1, 2, . . . , N


defined in (1.25), that is,

A = (A1 , . . . , AN ) ∈ RN ×2N . (1.27)

In this notation, the divergence ∇ · w is simply −Ax, so the discretization of the


dual ROF model (1.12) is

  min_{x ∈ X} ‖Ax − λz‖²,
  where X = { x : x ∈ R^{2N}, ‖x_l‖ ≤ 1 for l = 1, 2, · · · , N }.    (1.28)

Similarly, the duality gap in (1.13) has the following discrete form:

  G(y, x) = Σ_{l=1}^N ( ‖A_l^T y‖ − x_l^T A_l^T y ) + (λ/2) ‖y − z + (1/λ) Ax‖²,    (1.29)

and the primal-dual optimality condition (1.15) has the discrete form

  ‖A_l^T y‖ − x_l^T A_l^T y = 0   for l = 1, · · · , N,
  y − z + (1/λ) Ax = 0.    (1.30)
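Since the gap (1.29) is what is later used as a termination criterion, it may help to see it written out on the image grid; in the following MATLAB sketch (illustrative, with our own variable names) A^T y corresponds to the gradient of the image u and Ax to minus the divergence of (w1, w2):

  % Discrete duality gap (1.29) evaluated on the image grid.
  n  = 64;  lambda = 0.05;                       % illustrative size and fidelity parameter
  f  = rand(n);                                  % noisy input image (the vector z)
  u  = f;  w1 = zeros(n);  w2 = zeros(n);        % any primal iterate and feasible dual iterate
  p1 = [diff(u,1,1); zeros(1,n)];                % gradient of u, (1.23)
  p2 = [diff(u,1,2), zeros(n,1)];
  d  = [w1(1,:); diff(w1(1:n-1,:),1,1); -w1(n-1,:)] ...
     + [w2(:,1), diff(w2(:,1:n-1),1,2), -w2(:,n-1)];        % divergence of w, (1.24), so Ax = -d
  gap = sum(sum(sqrt(p1.^2 + p2.^2) - (p1.*w1 + p2.*w2))) ...
      + (lambda/2)*sum(sum((u - f - d/lambda).^2));         % G(y,x) of (1.29)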

By the same token, the constrained ROF model (1.1) has the following dis-
cretization for its primal formulation, dual formulation, primal-dual gap and
primal-dual optimality conditions:

Primal problem:

  min_y Σ_{l=1}^N ‖A_l^T y‖
  subject to y ∈ Y ≡ { y ∈ R^N : ‖y − z‖ ≤ nσ }.    (1.31)

Dual problem:

  max_x −nσ ‖Ax‖ + z^T Ax
  subject to x ∈ X ≡ { x ∈ R^{2N} : ‖x_l‖₂ ≤ 1, l = 1, 2, . . . , N }.    (1.32)

Duality gap:

  G(y, x) = Σ_{l=1}^N ( ‖A_l^T y‖ − x_l^T A_l^T y ) + [ nσ ‖Ax‖ + (y − z)^T Ax ].    (1.33)

Primal-dual optimality conditions:

  ‖A_l^T y‖ x_l − A_l^T y = 0   for l = 1, · · · , N,
  y − z + (nσ/‖Ax‖) Ax = 0.    (1.34)

1.5 Organizations of This Paper

In this paper, we try to develop some efficient algorithms to solve the total vari-
ation based image restoration models. The paper is organized as follows:

Chapter 1 gives a short introduction to the total variation based image restora-
tion models and BV space. It covers the application of duality theory to
the ROF model and derives the dual problem. We then analyze the different
computational challenges facing the primal and dual formulations of the
ROF model and end the chapter by introducing the necessary notation and
discretization for the study of numerical algorithms.

Chapter 2 offers a brief survey of some existing popular methods. At the end
of chapter 2, we also give remarks on existing methods and motivations for
developing new algorithms.

Chapter 3 proposes a simple yet efficient primal-dual hybrid gradient descent
method for solving the ROF model. Results of numerical experiments
demonstrate its fast convergence. Connections to projection-type methods
for solving variational inequalities and extensions to other models are
studied.

Chapter 4 introduces gradient projection methods to solve the dual formulation


of the ROF model. We test variants of GP with different step selection and
line search strategies, including techniques based on the Barzilai-Borwein
method. Global convergence can in most cases be proved by appealing to
existing theory. Numerical results are shown in demonstrating performance.

Chapter 5 proposes a block coordinate descent method to solve the dual formulation
of the ROF model. Global convergence of the algorithm is proved
and results of numerical experiments are shown. The dual formulation
of the anisotropic ROF model is also studied for its simpler constraint
structure.

Chapter 6 tries to speed up the algorithms for solving the dual problem of the
ROF model by applying some basic multilevel optimization techniques.

Chapter 7 studies the connections between the CGM method and the SOCP method
for solving the ROF model. An improvement of the CGM algorithm based on
the primal-dual interior-point method is also proposed.

Chapter 8 gives a comprehensive comparison of the newly developed methods

in this paper and some existing popular algorithms. It ends with some final
conclusions and remarks.

CHAPTER 2

A Brief Survey of Existing Algorithms

2.1 Numerical Algorithms

2.1.1 Time Marching Schemes


If we replace |∇u| in (1.4) by |∇u|_β = √(|∇u|² + β), we obtain the following
smoothed (β-regularized) ROF model

  min_u ∫_Ω |∇u|_β + (λ/2) ‖u − f‖₂².    (2.1)

The above problem has a convex differentiable objective function, and hence its
optimal solution is given by the following Euler-Lagrange equation (first-order
condition):

  ∇·( ∇u / |∇u|_β ) − λ(u − f) = 0.    (2.2)

In their original paper [60], Rudin et al. proposed the use of artificial time
marching to solve the Euler-Lagrange equation (2.2) which is equivalent to the
steepest descent of the energy function. More precisely, consider the image as a
function of space and time and seek the steady state of the equation

  ∂u/∂t = ∇·( ∇u / |∇u|_β ) − λ(u − f).    (2.3)

In numerical implementation, an explicit time marching scheme with time step ∆t
and space step ∆x (usually 1) is used. Under this method, the objective value of
the ROF model is guaranteed to be decreasing provided that the time step is small
enough, and the solution will tend to the unique minimizer as time increases. The
method is (asymptotically) slow due to the Courant-Friedrichs-Lewy (CFL) stability
condition ∆t ≤ c|∇u|_β (∆x)² for some constant c > 0 (see
[59]), which puts a very tight bound on the time step when the solution develops
flat regions (where |∇u| ≈ 0). Hence, this scheme is useful in practice only when
low-accuracy solutions suffice. Sometimes even the cost of computing a visually
satisfactory image is too great.
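For completeness, a minimal MATLAB sketch of this explicit scheme applied to the β-regularized flow (2.3) follows; the values of ∆t, β and λ are placeholders chosen only for illustration and must in practice respect the CFL-type restriction above:

  % Explicit time marching for (2.3); dt, beta, lambda and niter are illustrative values.
  lambda = 0.05;  beta = 1e-6;  dt = 1e-3;  niter = 500;
  f = rand(128);  u = f;                             % noisy input and initial guess
  for k = 1:niter
      p1 = [diff(u,1,1); zeros(1,size(u,2))];        % grad u, (1.23)
      p2 = [diff(u,1,2), zeros(size(u,1),1)];
      nb = sqrt(p1.^2 + p2.^2 + beta);               % |grad u|_beta
      w1 = p1./nb;  w2 = p2./nb;
      d  = [w1(1,:); diff(w1(1:end-1,:),1,1); -w1(end-1,:)] ...
         + [w2(:,1), diff(w2(:,1:end-1),1,2), -w2(:,end-1)];   % div( grad u / |grad u|_beta )
      u  = u + dt*(d - lambda*(u - f));              % one time-marching (steepest descent) step
  end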

To relax the CFL condition, Marquina and Osher use, in [59], a “preconditioning”
technique to cancel singularities due to the degenerate diffusion coefficient 1/|∇u|:

  ∂u/∂t = |∇u| [ ∇·( ∇u / |∇u|_β ) − λ(u − f) ],    (2.4)

which can also be viewed as mean curvature motion with a forcing term −λ(u − f).

2.1.2 Fixed Point Iteration

To bypass the stability constraint in time marching schemes, Vogel and Oman
proposed in [66] a “lagged diffusivity” fixed point iteration scheme which solves
the stationary Euler-Lagrange equation directly. The main idea is to linearize the
1
Euler-Lagrange equation by lagging the diffusion coefficients |∇u|β
at a previous
iteration. The k-th iterate is obtained by solving the sparse linear system (see
[66], [67])
∇uk 
∇· k−1
− λ(uk − f ) = 0 (2.5)
|∇u |β
The above linear system is block tri-diagonal and can be solved by various fast
direct/iterative solvers. While global convergence has been proven by different
authors (see e.g. [33], [17], [27], [39]), the outer iteration is only linearly convergent,
and the convergence slows down as β decreases. Moreover, the inner linear
system is difficult to solve efficiently because of the highly varying and possibly
degenerate coefficient 1/|∇u|_β, as well as the spatial stiffness of the ellipticity of the
differential operator.
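One simple way to realize this iteration, sketched below in MATLAB, is to apply MATLAB's conjugate gradient solver pcg matrix-free to the lagged linear system; this is only an illustrative implementation under our own choices (tolerances, iteration counts), not the specialized solvers referenced above:

  % Lagged-diffusivity fixed point iteration (2.5); the SPD inner systems are solved by pcg.
  function u = lagged_diffusivity(f, lambda, beta, nouter)
  u = f;
  for k = 1:nouter
      [p1, p2] = fgrad(u);
      c = 1./sqrt(p1.^2 + p2.^2 + beta);           % lagged diffusion coefficient 1/|grad u^{k-1}|_beta
      afun = @(v) applyA(v, c, lambda, size(f));   % v -> (-div(c grad) + lambda I) v
      [v, ~] = pcg(afun, lambda*f(:), 1e-6, 200, [], [], u(:));
      u = reshape(v, size(f));
  end
  end
  function Av = applyA(v, c, lambda, sz)
  u  = reshape(v, sz);
  [p1, p2] = fgrad(u);
  Av = -reshape(fdiv(c.*p1, c.*p2), [], 1) + lambda*v;
  end
  function [p1, p2] = fgrad(u)                     % forward differences, as in (1.23)
  p1 = [diff(u,1,1); zeros(1,size(u,2))];
  p2 = [diff(u,1,2), zeros(size(u,1),1)];
  end
  function d = fdiv(w1, w2)                        % discrete divergence, as in (1.24)
  d = [w1(1,:); diff(w1(1:end-1,:),1,1); -w1(end-1,:)] ...
    + [w2(:,1), diff(w2(:,1:end-1),1,2), -w2(:,end-1)];
  end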

2.1.3 Primal-Dual Newton Method (CGM)

In order to obtain superlinear convergence, a Newton-type algorithm is needed.


However, naive exact Newton’s method is known to have a very small domain of
convergence for equation (2.2) and fails for modest β (see [32], [11]). In [32], Chan
et al tried to combine Newton’s method with a continuation method in the pa-
rameter β, which produced more robustness but still was not overall satisfactory.
Ng et al. proposed in [56] a nonsmooth Newton’s method to solve the original
ROF model which does not require the smoothing parameter β. However, the
number of Newton iterations it required to achieve convergence is still considered
too large.

A much better alternative, introduced by Chan, Golub and Mulet (CGM) in
[25], is to apply Newton’s method to the primal-dual system. By introducing
a new dual variable w := ∇u/|∇u|_β in equation (2.2), they obtained the following
system equivalent to (2.2):

  w |∇u|_β − ∇u = 0,
  ∇·w − λ(u − f) = 0.    (2.6)

It is now clear that the above system is a β-regularized version of the true
primal-dual system (1.15), which we rewrite here:

  |∇u| w − ∇u = 0,
  ∇·w − λ(u − f) = 0.    (2.7)

Systems (2.6) and (2.7) are the same except that the norm |∇u| in (2.7) is smoothed
with β. When β ↓ 0, the solution of (2.6) will converge to the solution of (1.15),
i.e., the true optimal solution of the ROF model.

As discussed in [25] and [11], this primal-dual system is better suited for
Newton’s method than the primal Euler-Lagrange equation (2.2) since the (u, w)
system is intuitively more “linear” than the primal system in u only. Furthermore,
compared to fixed-point methods and time-marching methods that are at best
linearly convergent, the CGM method was shown in [25] to be empirically globally
convergent with a quadratic rate. Moreover, the convergent rate deteriorates
only moderately with decreasing β and in practice small values of β can be used
without too much loss in efficiency.

There is still a need to solve an elliptic linear system at every Newton iteration,
but through an elimination of the update on w, the Newton iteration can be
implemented by solving only one linear system involving the update for u

J(uk , w k ) · δuk = −r(uk ),

where J is positive definite and r is the steepest ascent direction of the primal
objective function (i.e., r is the LHS of equation (2.2)). This ensures that the Newton
update is a descent direction and makes the cost per iteration comparable to
that for the fixed point method. For applications where a high-accuracy solution
is needed, eg. in an automated image processing setting without human inter-
vention, fast convergent methods such as CGM will have a significant advantage
over slower linearly convergent methods.

2.1.4 Duality-Based Gradient Descent Method

The main advantage of the dual formulation (1.11) or (1.12) is that the objective
function is nicely quadratic and hence there is no issue with non-differentiability,

or the need for numerical regularization, as in the primal formulation. However,
this comes at the price of dealing with the constraints on the dual w. In fact,
there are as many constraints as there are pixels, and one does not
know in advance where the constraints are active. There are several “standard”
numerical approaches to solving nonlinear constrained optimization (1.11), e.g.
penalty, barrier, augmented Lagrangian methods and interior-point methods(see
e.g. [53]). Carter did some early work in this direction in her thesis [13].

One advance in this direction was made by Chambolle in [14], in which he


obtained an explicit analytic formula for the Lagrange multipliers for the con-
straints. This idea is also extended to high order dual methods by Chan et. al.
in [23].

The optimality conditions(Karush-Kuhn-Tucker) for the discretized version


of (1.12) is (see [14])

−(∇(∇ · w − λf ))i,j + αi,j wi,j = 0 (2.8a)

αi,j (|wi,j |2 − 1) = 0 (2.8b)

αi,j ≥ 0 (2.8c)

where αi,j are the Lagrange multipliers.

Now Chambolle made the following original observation about the Lagrange multipliers
α. The complementarity conditions (2.8b) are satisfied in either of the
following two cases:

  α_{i,j} = 0   or   |w_{i,j}| = 1.    (2.9)

Applying (2.9) back to (2.8a), we have

  α_{i,j} = 0  ⇒  |(∇(∇·w − λf))_{i,j}| = α_{i,j} |w_{i,j}| = 0 = α_{i,j},    (2.10)
  |w_{i,j}| = 1  ⇒  |(∇(∇·w − λf))_{i,j}| = α_{i,j} |w_{i,j}| = α_{i,j}.    (2.11)

In either case, we have

  α = |∇(∇·w − λf)|,    (2.12)

which yields the following nonlinear algebraic equation in w only:

−∇(∇ · w − λf ) + |∇(∇ · w − λf )|w = 0 (2.13)

It is important to observe that while (2.13) still inherits the non-differentiability of


the primal objective, it is nonetheless “integrated once”, i.e. the nonsmoothness
is in a set of algebraic equations rather than inside the minimization problem.

He then proposed a semi-implicit gradient descent algorithm (notice the LHS
of (2.13) can be viewed as the gradient ascent direction):

  w^{n+1} = w^n − τ [ −∇(∇·w^n − λf) + |∇(∇·w^n − λf)| w^{n+1} ],    (2.14)

or,

  w^{n+1} = ( w^n + τ H(w^n) ) / ( 1 + τ |H(w^n)| ),    (2.15)

where H(w) = ∇(∇·w − λf) and τ is a time step chosen suitably small for
convergence.

It is clear that the constraints |w| ≤ 1 are automatically handled in the
above algorithm, provided the initial guess satisfies them. Global convergence for
τ < 1/8 was also proved in [14]. However, since the linearization is through a
fixed point method, the convergence rate is at best linear. Also, since nothing
is applied to handle the spatial stiffness, the number of iterations will increase
with the number of pixels. In practice, a relatively large number of iterations is
still required. Of course, the main advantage is that sharp discontinuities can be
recovered without numerical regularization and the method is globally convergent
and easy to implement.

2.1.5 Second-Order Cone Programming

Recently, Goldfarb and Yin proposed a new approach in [43] for various total
variation models based on second-order cone programming (SOCP). They model
problem (1.1) as a second-order cone program, which is then solved by modern
interior-point methods.

To reformulate the constrained ROF model

  min Σ_{1≤i,j≤n} ‖(∇u)_{i,j}‖
  s.t. ‖u − f‖² ≤ σ²    (2.16)

into a SOCP problem, the authors in [43] introduced new variables p, q, t, v as


follows.

They let v be the noise variable, v = f − u, and let (p_{i,j}, q_{i,j}) be the
discrete ∇u obtained by forward differences:

  p_{i,j} = u_{i+1,j} − u_{i,j} if i < n,   and 0 if i = n;
  q_{i,j} = u_{i,j+1} − u_{i,j} if j < n,   and 0 if j = n.

They then introduce a new variable t as an upper bound on the gradient norm |∇u|:

  |(∇u)_{i,j}| = √(p²_{i,j} + q²_{i,j}) ≤ t_{i,j},

and this is where the “second-order cone” constraints come from.

As a consequence, problem (2.16) is transformed to

  min Σ_{i,j=1}^n t_{i,j}
  s.t. u_{i,j} + v_{i,j} = f_{i,j}              for i, j = 1, · · · , n,
       −p_{i,j} + u_{i+1,j} − u_{i,j} = 0       for i = 1, · · · , n − 1, j = 1, · · · , n,
       −q_{i,j} + u_{i,j+1} − u_{i,j} = 0       for i = 1, · · · , n, j = 1, · · · , n − 1,
       p_{n,j} = 0                              for j = 1, · · · , n,
       q_{i,n} = 0                              for i = 1, · · · , n,
       v_0 = σ,
       (t_{i,j}; p_{i,j}; q_{i,j}) ∈ K³         for i, j = 1, · · · , n,
       (v_0; v) ∈ K^{n²+1}.    (2.17)

Eliminating u from formulation (2.17), we obtain the following standard-form
SOCP:

  min Σ_{1≤i,j≤n} t_{i,j}
  s.t. p_{i,j} + v_{i+1,j} − v_{i,j} = f_{i+1,j} − f_{i,j}   for 1 ≤ i ≤ n − 1, 1 ≤ j ≤ n,
       q_{i,j} + v_{i,j+1} − v_{i,j} = f_{i,j+1} − f_{i,j}   for 1 ≤ i ≤ n, 1 ≤ j ≤ n − 1,
       p_{n,j} = 0                                           for 1 ≤ j ≤ n,
       q_{i,n} = 0                                           for 1 ≤ i ≤ n,
       v_0 = σ,
       (t_{i,j}; p_{i,j}; q_{i,j}) ∈ K³                      for 1 ≤ i, j ≤ n,
       (v_0; v) ∈ K^{n²+1}.    (2.18)

Problem (2.18) is in the standard form of an SOCP, and there is a vast literature on how
to solve this class of problems (see e.g. [3]). The dual formulation can also be
transformed into an SOCP (see [43]). SOCPs can be solved efficiently in practice
and, in theory, in polynomial time by interior-point methods. For example, path-
following interior-point methods generate a sequence of interior points that follow

the so-called central path toward optimality. At each iteration, a set of
primal-dual equations, which depend on the current variable values and a duality gap
parameter, is solved by one step of Newton’s iteration to yield an improving
search direction. Like CGM, SOCP needs to solve a linear system at each iteration
but can produce highly accurate solutions with low residuals or duality gaps.

In some sense, the SOCP framework bears many similarities to the earlier
primal-dual Newton method discussed in Section 2.1.3. First of all, both
apply Newton’s method to compute the update search direction at each iteration.
Secondly, both use a dual variable and solve the primal-dual system. In the
SOCP formulation, we do not need a regularization parameter β. However, when
we apply an interior-point method to solve the SOCP, the primal-dual system that we
solve does depend on an artificial parameter that measures the duality gap. This
duality parameter is similar to β in CGM, and the only difference is that β is fixed
in CGM while the duality parameter dynamically decreases during the solution
process of the SOCP. Another interesting point of view here is that SOCP minimizes
t, an upper bound of |∇u|, instead of minimizing |∇u| itself, to get rid of the
non-smoothness of the objective function, and this bears similarity to the earlier
idea of minimizing √(|∇u|² + β), which is also an upper bound of |∇u|.

2.1.6 Multigrid Methods

In order to obtain an efficient algorithm with scalable computational complexity,


some sort of multilevel method seems necessary. Interested readers can refer to
the survey [19] for a more detailed account. One approach is to use multigrid
to solve the various linear systems that arise in the methods mentioned above.
One of the earliest attempts along this line can be seen in [2], [18], [68], [69] and
[64], where a linear multigrid method is used as a fast solver for the inner linear

system (2.5) in the fixed point method. Here we rewrite the system (2.5):

  ∇·( ∇u^k / |∇u^{k−1}|_β ) − λ(u^k − f) = 0.    (2.19)

However, there are two main drawbacks of this approach. First, the linearized
system (2.19) has very “nasty” coefficients (especially when β is small), making
it difficult to derive efficient multigrid methods for its solution. Secondly, the
outer fixed point iteration is still only linearly convergent.

Recently [63] and [41] considered using the nonlinear multigrid method to
solve the non-linear Euler-Lagrange equation (2.2). However, here one has to
take a relatively large β to achieve convergence.

Another approach, introduced in [20, 21], is to apply multilevel algorithms
directly to the minimization problem (1.4); it is generalized in [34] to use
piecewise linear coarser level corrections. This approach uses coordinate descent
relaxation of the primal objective as a smoother. It either does not require the
regularization parameter β (in 1D) or works with an extremely small β (in 2D).
However, it is a well known fact that the coordinate descent method is generally not
guaranteed to converge for nonseparable non-smooth optimization problems. As shown in
[13] and [20, 21], primal relaxation usually gets stuck in a “local minimizer” (in
the sense of the algorithm) due to the non-smoothness of the TV term and thus
fails to converge to the true solution. Figure 2.1 illustrates how the coordinate
descent method might get stuck at a wrong solution. The key point here is that when
a flat region (non-smoothness) develops in an intermediate solution, changing
the value of any single pixel might not be sufficient to decrease the objective
function. However, the multilevel optimization method based on primal relaxation
is somewhat more robust because the coarse level correction can help the
solution move away from the “local minimizer”. Nevertheless, the multilevel

[Figure 2.1 here: a 1D example comparing the noisy data f, the coordinate descent solution uCD, and the true solution uTrue.]

Figure 2.1: Illustration of how coordinate descent might get stuck at a wrong
solution.

proposed the multilevel method with a non-standard coarsening scheme based


on the “flat” patches of an image. Numerical experiments showed this scheme
has better robustness than the standard coarsening and works for most examples.
The above multilevel optimization technique has two other potential variants that
can handle the difficulty inherited from the non-differentiability. One way is to
avoid getting stuck at a wrong solution by using algebraic multigrid(AMG) to
automatically find a good coarsening scheme. The other way is to incorporate
some randomness in the relaxations to improve robustness, e.g. use simulated
annealing as a smoother.

All the existing multigrid methods aim to solve the primal formulations,
either the direct minimization problem(1.4) or the associated Euler-Lagrange
equation(2.2). One possible alternative is to apply multilevel optimization tech-
nique to the dual problem (1.12). The advantage of the dual multilevel method
is that the dual relaxation will not get stuck in a “local minimizer” as the for-
mulation is smooth. However, the real challenge here is how to deal with the

constraints. It is not clear whether we can reduce the number of constraints on
a coarser level as we do the unknowns. So for this technique to work, one has
to think creatively about the constraints, not only for our problem, but also for
general constrained optimization.

2.1.7 Graph Cut Algorithm for Anisotropic ROF

Recently graph cut algorithms have been introduced as a fast and exact solver for
the discrete anisotropic TVL2 /TVL1 models (see [37, 38, 16, 72, 50] and many
others).
R
If the anisotropic TV ( Ω |ux | + |uy |) is used to replace the isotropic TV
R
( Ω
|∇u|) and the argument u is restricted to have discrete values, we have the
following discrete integer programming problem for the anisotropic ROF model:
λ
Z
min E(u) = (|ux | + |uy |) dxdy + ku − f k2 , (2.20)
u∈{0,1,··· ,L−1} Ω 2
where L = 28 for 8-bit images and L = 216 for 16-bit images.

The practical use of graph cut algorithms to solve (2.20) relies on the decomposition
of an image into its level sets and hence on mapping the original problem
(2.20) into optimizations of independent binary problems. Using the discrete co-area
formula (see e.g., [37]), the objective function E in (2.20) can be decomposed
into a sum of independent functions E^t which only depend on the sublevel sets
of u:

  E(u) = Σ_{t=0}^{L−2} E^t(u^t) + C.

Here, u^t is the characteristic function of the sublevel set, u^t = 1_{u≤t}, and C is a
constant independent of u.

To minimize E(·) one can minimize all the binary problems E^t(·) independently.
Thus we get a family {ū^t}, whose members are respectively minimizers of E^t(·). Clearly
the summation will be minimized, and thus we have a minimizer of E, provided this
family {ū^t} is monotone:

  ū^t ≤ ū^s   ∀ t < s.    (2.21)

If property (2.21) holds, then the optimal solution ū is given by the reconstruction
formula from sublevel sets: ū(x) = min{t : ū^t(x) = 1}. If property
(2.21) does not hold, then the family {ū^t} does not correspond to the sublevel sets of any function.
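As a small illustration of this decomposition (our own sketch, not from the dissertation's experiments), the following MATLAB lines verify that an integer-valued image is recovered exactly from its sublevel-set indicators, using the equivalent reconstruction u = L − 1 − Σ_t u^t:

  % Decompose an 8-bit image into sublevel-set indicators u^t = 1_{u<=t} and reconstruct it.
  L = 256;
  u = randi([0 L-1], 64, 64);          % synthetic integer-valued image
  s = zeros(size(u));
  for t = 0:L-2
      s = s + (u <= t);                % accumulate the indicators u^t
  end
  urec = (L-1) - s;                    % equals min{ t : u^t(x) = 1 }, with min over the empty set = L-1
  disp(isequal(urec, u));              % prints 1: the reconstruction is exact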

Property (2.21) has been proved to be true by various authors ([37, 16, 48]). Hence
the above optimization strategy is valid. Now note that each E^t(u^t) is a binary
MRF with an Ising prior model and can be solved efficiently using graph cut algorithms.
In [15], the author also did some comparison experiments regarding the
efficiency of the discrete graph cut algorithm and some of the methods we discussed
in this section.

2.2 Remarks

1. Recently, many variants and extensions to the original ROF model (1.4)
have been developed, which include TV and Elastica inpainting ([29], [26]),
blind deconvolution ([30]), multi-channel TV ([10]), L1 fidelity ([57], [22]),
multiscale texture decomposition ([52], [65]) and iterative regularization us-
ing the Bregman distance function. Although the algorithms we list in this
paper are all designed to solve the original ROF model (1.4), they can be
naturally adapted to the other recently developed models mentioned above,
where the main computational challenge is still due to the total variation
term.

2. While each of the above methods might have its own comparative advantage
in certain situations, some benchmark rules still apply. Independent
of the multigrid technique, there are two general categories: those that need to
solve a linear system at each iteration and those that do not. To obtain a
visually satisfactory solution, the methods in the latter class are preferred
for their simplicity and fast initial convergence rate. In this category,
Chambolle’s method has become very popular because of its simplicity, robustness,
lack of need for numerical regularization, and relatively fast convergence.
On the other hand, if our goal is to obtain a state-of-the-art highly accurate
solution with a low residual/duality gap, the locally superlinear Newton-type
methods will win eventually even though they have to solve a linear system
at every iteration. In this category, the CGM method and SOCP are most
widely used for their robustness and fast asymptotic convergence. Finally,
if the problem is of very large scale, we need to apply multilevel schemes to
reduce complexity.

2.3 Motivations for Developing New Methods

As we have seen, most existing numerical algorithms to solve ROF models


(1.4) or (1.12) can be loosely divided into two categories: those that need to
solve a linear system of equations at each iteration (implicit) and those that
require only a matrix-vector multiplication in the discrete setting (explicit).
Generally speaking, the implicit methods (e.g. CGM, HS, and SOCP)
have fast asymptotic convergence rates and can provide highly accurate
benchmark solutions. However, explicit methods are preferred in many
situations for their simplicity and their convergence with relatively little
computational effort to medium-accurate and visually satisfactory results.
Their low memory requirements make them even more attractive for large-

scale problems. To illustrate the high memory requirements of implicit
schemes, we note that an image of size 512 × 512 is close to the limit of
what the SOCP solver MOSEK can handle on a workstation with 2GB of
memory.

In this chapter, we shall develop some simple yet efficient algorithms. They
are explicit so the memory requirement is low and each iteration only takes
O(N) operations. They converge very fast to visually satisfactory solutions
and also have a much improved asymptotic convergence rate compared with
existing explicit methods. In all of our proposed algorithms, we try to ex-
ploit the use of the dual variable since a pure primal formulation usually
requires some numerical smoothing parameter that would prevent the re-
sulting algorithm from converging to the true optimizer. In other words,
our algorithm is either a primal-dual type algorithm or simply a duality
based algorithm.

CHAPTER 3

Primal-Dual Hybrid Gradient Descent Method

In this chapter, we propose a simple yet very efficient algorithm for total varia-
tion (TV) minimizations with applications in the image processing realm. This
descent-type algorithm alternates between the primal and dual formulations and
exploits the information from both the primal and dual variables. Therefore, it
is able to converge significantly faster than pure primal or pure dual gradient descent
methods and some other popular existing methods, as demonstrated in the
numerical experiments. Finally, we show that this idea works for other optimiza-
tion problems whose objective also involves the non-smooth L1 norm as in the
TV term.

3.1 The Proposed Algorithm

Previously developed gradient descent type methods are the primal time marching
method in [60] and Chambolle’s duality based semi-implicit gradient descent type
method in [14]. In this chapter, we refer to these two methods as the primal
gradient descent algorithm and the dual gradient descent algorithm. They are shown
briefly as follows.

Primal ROF:

  min_{y ∈ R^N} Σ_{l=1}^N ‖A_l^T y‖ + (λ/2) ‖y − z‖².    (3.1)

Dual ROF:

  min_{x ∈ X} ‖Ax − λz‖²,
  where X = { x : x ∈ R^{2N}, ‖x_l‖ ≤ 1 for l = 1, 2, · · · , N }.    (3.2)

Primal gradient descent algorithm (smoothed with β):

  y^{k+1} = y^k − θ_k ( (1/λ) Σ_{l=1}^N A_l A_l^T y^k / √(‖A_l^T y^k‖² + β) + y^k − z ).    (3.3)
l=1

Primal gradient descent algorithm (unsmoothed, subgradient):

  y^{k+1} = y^k − θ_k ( (1/λ) Σ_{l=1}^N A_l x_l^k + y^k − z ),    (3.4)

where

  x_l^k = A_l^T y^k / ‖A_l^T y^k‖   if A_l^T y^k ≠ 0,   and any element of the unit ball B(0, 1) ⊂ R² otherwise.    (3.5)

Dual gradient descent algorithm (Chambolle):

  x_l^{k+1} = ( x_l^k − τ_k A_l^T(Ax^k − λz) ) / ( 1 + τ_k ‖A_l^T(Ax^k − λz)‖ ).    (3.6)
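Written on the image grid (with the dual variable stored as two arrays w1, w2, so that A^T y corresponds to the image gradient and Ax to minus the divergence), the iteration (3.6) takes the following form in MATLAB; the parameter values are illustrative only:

  % Chambolle-type dual gradient descent (3.6) in image form; tau < 1/8 for convergence.
  lambda = 0.05;  tau = 0.1;  niter = 200;
  f  = rand(128);                                     % noisy input image (the vector z)
  w1 = zeros(size(f));  w2 = zeros(size(f));          % feasible dual initial guess
  for k = 1:niter
      d  = [w1(1,:); diff(w1(1:end-1,:),1,1); -w1(end-1,:)] ...
         + [w2(:,1), diff(w2(:,1:end-1),1,2), -w2(:,end-1)];   % div(w), so Ax = -d
      r  = -d - lambda*f;                             % the residual Ax - lambda*z as an image
      g1 = [diff(r,1,1); zeros(1,size(r,2))];         % A_l'(Ax - lambda*z) <-> gradient of r
      g2 = [diff(r,1,2), zeros(size(r,1),1)];
      ng = sqrt(g1.^2 + g2.^2);
      w1 = (w1 - tau*g1)./(1 + tau*ng);               % semi-implicit update (3.6)
      w2 = (w2 - tau*g2)./(1 + tau*ng);
  end
  d = [w1(1,:); diff(w1(1:end-1,:),1,1); -w1(end-1,:)] ...
    + [w2(:,1), diff(w2(:,1:end-1),1,2), -w2(:,end-1)];
  u = f + d/lambda;                                   % recover the primal image via (1.10)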

We notice that the primal gradient algorithm (3.3) or (3.4) works exclusively
on the primal formulation, while the dual gradient algorithm focuses exclusively on
the dual formulation. There is no cross communication between the primal and
dual variables. Our approach, on the other hand, works on the primal-dual
formulation and tries to exploit the information of the primal and dual variables
simultaneously. We hope the use of both primal and dual information will speed
up the gradient descent algorithm.

Our approach can be most effectively illustrated under the setting of the
primal-dual formulation (1.8), which has the following discrete form:

  min_{y ∈ R^N} max_{x ∈ X} Φ(y, x) := y^T Ax + (λ/2) ‖y − z‖²,    (3.7)

where

  X = { x = (x₁; · · · ; x_N) : x_l ∈ R², ‖x_l‖ ≤ 1 for l = 1, · · · , N }.

Given any intermediate solution (y^k, x^k) at iteration step k, the proposed
algorithm updates the solution as follows.

1. Apply one step of the (projected) gradient ascent method to the maximization
problem

     max_{x ∈ X} Φ(y^k, x).    (3.8)

   The ascent direction is ∇_x Φ(y^k, x^k) = A^T y^k, so we update x as

     x^{k+1} = P_X( x^k + τ_k λ A^T y^k ),    (3.9)

   where τ_k is the (dual) stepsize and P_X denotes the projection onto the
   set X, which can be simply computed in our case (see later remarks). The
   factor λ is used here so that the stepsize τ_k is insensitive to different problems
   or different scales of gray levels, which also explains the same scaling in
   (3.11).

2. Apply one step of the gradient descent method to the minimization problem

     min_{y ∈ R^N} Φ(y, x^{k+1}).    (3.10)

   The gradient direction is ∇_y Φ(y^k, x^{k+1}) = Ax^{k+1} + λ(y^k − z), and therefore
   the update is

     y^{k+1} = y^k − θ_k ( (1/λ) Ax^{k+1} + y^k − z ),    (3.11)

   where θ_k is the (primal) stepsize.

Putting it all together, we have the following algorithm.

Algorithm PDHGD

Step 0. Initialization. Pick y⁰ and a feasible x⁰ ∈ X, set k ← 0.

Step 1. Choose stepsizes τ_k and θ_k.

Step 2. Updating.

  x^{k+1} = P_X( x^k + τ_k λ A^T y^k )    (3.12a)
  y^{k+1} = (1 − θ_k) y^k + θ_k ( z − (1/λ) Ax^{k+1} )    (3.12b)

Step 3. Terminate if a stopping criterion is satisfied;


otherwise set k ← k + 1 and return to step 1.
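A compact MATLAB sketch of the whole loop in image form (A^T y corresponds to the gradient of u, Ax to minus the divergence of (w1, w2)); the fixed stepsizes below are illustrative placeholders within the range discussed in the remarks that follow:

  % Algorithm PDHGD in image form, with fixed illustrative stepsizes tau and theta.
  lambda = 0.05;  tau = 2;  theta = 0.2;  niter = 100;
  f = rand(128);  u = f;                              % primal initial guess y^0 = z
  w1 = zeros(size(f));  w2 = zeros(size(f));          % feasible dual initial guess x^0 = 0
  for k = 1:niter
      % dual ascent step (3.12a): x <- P_X( x + tau*lambda*A'*y )
      g1 = [diff(u,1,1); zeros(1,size(u,2))];         % grad u
      g2 = [diff(u,1,2), zeros(size(u,1),1)];
      w1 = w1 + tau*lambda*g1;   w2 = w2 + tau*lambda*g2;
      nw = max(1, sqrt(w1.^2 + w2.^2));               % projection onto the unit balls, cf. (3.14)
      w1 = w1./nw;  w2 = w2./nw;
      % primal descent step (3.12b): y <- (1-theta)*y + theta*( z - (1/lambda)*A*x )
      d  = [w1(1,:); diff(w1(1:end-1,:),1,1); -w1(end-1,:)] ...
         + [w2(:,1), diff(w2(:,1:end-1),1,2), -w2(:,end-1)];   % div(w), so A*x = -d
      u  = (1-theta)*u + theta*(f + d/lambda);
  end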

3.2 Remarks

1. The hybrid descent algorithm can also be developed as a (primal-dual)
proximal method:

     x^{k+1} = arg max_{x ∈ X} Φ(y^k, x) − (1/(2λτ_k)) ‖x − x^k‖²    (3.13a)
     y^{k+1} = arg min_{y ∈ R^N} Φ(y, x^{k+1}) + (λ(1 − θ_k)/(2θ_k)) ‖y − y^k‖²    (3.13b)

2. The projection P_X in (3.13a) can be computed in the following straightforward
way:

     ( P_X(x) )_l = x_l / max{ ‖x_l‖, 1 },   l = 1, 2, . . . , N.    (3.14)

   Here, since X is a Cartesian product of unit Euclidean balls, the operation
   (3.14) actually projects each 2 × 1 subvector of x separately onto
   the unit ball in R².

3. Both problems (3.8) and (3.10) can be solved exactly, which would yield the
following updating formulas (taking τ_k = ∞ and θ_k = 1 in (3.12)):

     x_l^{k+1} = A_l^T y^k / ‖A_l^T y^k‖,   for l = 1, · · · , N,    (3.15a)
     y^{k+1} = z − (1/λ) Ax^{k+1}.    (3.15b)

However, we choose not to do so since the above algorithm does not con-
verge.
As a special case, if we only solve subproblem (3.8) exactly (taking τ_k = ∞
in (3.12)), the resulting algorithm would be

     y^{k+1} = y^k − θ_k ( (1/λ) Σ_l A_l A_l^T y^k / ‖A_l^T y^k‖ + y^k − z ).    (3.16)

   The above algorithm is exactly a subgradient descent method for the primal
   formulation (3.1).

Another special case is that we solve subproblem (3.10) exactly (taking
θ_k = 1 in (3.12)) and still apply the gradient ascent method to (3.8). The
resulting algorithm is as follows:

     x^{k+1} = P_X( x^k − τ_k A^T(Ax^k − λz) ),    (3.17)

   which is a (projected) gradient descent method for the dual problem (3.2).

   Hence, the primal subgradient descent method and the dual projected gradient
   descent method are two special cases of our algorithm, which correspond
   to taking the special stepsizes τ_k = ∞ and θ_k = 1, respectively, in (3.12).

4. The convergence of the PDHGD algorithm is (empirically) observed for a variety of suitable stepsize pairs (τ, θ). Moreover, the choice of stepsizes is insensitive to different problems or scales of gray levels. In some range, the stability constraint on the stepsizes seems more relevant to the product τθ than to τ or θ individually. For example, our experiments show that convergence can be obtained for the stepsize pair (2, 0.2) as well as for (4, 0.1). Finally, the numerical results also reveal that a pair of relatively small τ and large θ gives a faster initial convergence rate, and the opposite choice gives faster asymptotic convergence. Therefore, we can optimize the performance of the algorithm through some strategy of choosing (τ_k, θ_k), although simple fixed stepsizes might already give satisfactory results.

3.3 Extensions

Our algorithm can be naturally extended to other TV image restoration models


without any major modifications.

Constrained ROF Model


The original constrained primal ROF model (1.1) has the following discrete
form
(P)   min_{y∈Y}  Σ_{l=1}^{N} ‖A_l^T y‖,   (3.18)

where Y ≡ { y ∈ R^N : ‖y − z‖ ≤ nσ }.

The above problem (3.18) has the following primal-dual and dual formulations

(PD)   min_{y∈Y} max_{x∈X}  y^T A x   (3.19)

(D)    max_{x∈X}  −nσ ‖Ax‖ + z^T A x   (3.20)

Both the primal problem (3.18) and the dual problem (3.20) are difficult to solve directly since their objectives are non-smooth. However, the updates of our proposed approach, based on the primal-dual formulation (3.19), are simply

   x^{k+1} = P_X( x^k + (τ_k/σ) A^T y^k )   (3.21a)
   y^{k+1} = P_Y( y^k − σ θ_k A x^{k+1} ),   (3.21b)

where the projection PY is given by

P_Y(y) = z + (y − z) / max{ ‖y − z‖/(nσ), 1 }.   (3.22)

The full primal-dual hybrid gradient descent method for constrained ROF
model is shown as follows:

Algorithm PDHGDC

Step 0. Initialization. Pick y^0 and a feasible x^0 ∈ X; set k ← 0.

Step 1. Choose stepsizes τ_k and θ_k.

Step 2. Updating.

   x^{k+1} = P_X( x^k + (τ_k/σ) A^T y^k )   (3.23a)
   y^{k+1} = P_Y( y^k − σ θ_k A x^{k+1} )   (3.23b)

Step 3. Terminate if a stopping criterion is satisfied; otherwise set k ← k + 1
and return to Step 1.
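The constrained iteration (3.23) differs from PDHGD only in the primal step. A minimal sketch, reusing grad, div and project_unit_balls from the previous sketch, assuming a square n × n image so that the constraint radius in (3.18) is nσ, and using the stepsize rule of Section 3.5:

```python
import numpy as np

def project_fidelity_ball(y, z, radius):
    """Projection P_Y of (3.22) onto {y : ||y - z|| <= radius}."""
    d = y - z
    return z + d / max(np.linalg.norm(d) / radius, 1.0)

def pdhgdc(z, sigma, iters=100):
    """Primal-dual hybrid gradient iteration (3.23) for the constrained ROF model."""
    n = z.shape[0]
    radius = n * sigma                 # constraint radius n*sigma from (3.18)
    y = z.copy()
    x = np.zeros((2,) + z.shape)
    for k in range(iters):
        tau = 0.2 + 0.08 * k           # stepsize rule used in the experiments
        theta = 0.5 / tau
        x = project_unit_balls(x + (tau / sigma) * grad(y))               # (3.23a)
        y = project_fidelity_ball(y + sigma * theta * div(x), z, radius)  # (3.23b): A x = -div x
    return y
```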

TV Deblurring Model

The total variation based image restoration model (1.4) can be extended to recover a blurry and noisy image f by solving the following problem:

(P)   min_u  ∫_Ω |∇u| + (λ/2) ‖Ku − f‖²₂   (3.24)

where K is a given linear blurring operator and every other term is defined as in (1.4). In this model, f is the sum of Gaussian noise v and a blurred image Kū obtained by applying the linear blurring operator K to the clean image ū, i.e., f = Kū + v.

Among all linear blurring operators, many are shift-invariant and can be ex-
pressed in the form of convolution:
(Ku)(x) = (h ∗ u)(x) = ∫ h(x − y) u(y) dy,   (3.25)

where h is the given point spread function (PSF) associated with K.

The discrete form of model (3.24) is


min_{y∈R^N}  Σ_{l=1}^{N} ‖A_l^T y‖ + (λ/2) ‖By − z‖²,   (3.26)

where B is the discretization of the blurring operator K.

The primal-dual and dual formulations of (3.26) can be obtained as follows:

(PD)   min_{y∈R^N} max_{x∈X}  y^T A x + (λ/2) ‖By − z‖²   (3.27)

(D)    max_{x∈X}  −(1/(2λ)) ‖B^{-1} A x − λz‖² + (λ/2) ‖z‖².   (3.28)

However, the blurring matrix B is highly ill-conditioned (non-invertible in some cases), making it difficult if not impossible to compute the inverse B^{-1}. Therefore the dual formulation (3.28) is of little use in practice and sometimes may not even exist. On the other hand, our primal-dual hybrid gradient descent algorithm based on formulation (3.27) still works well. The core updates of the algorithm are given as follows:

   x^{k+1} = P_X( x^k + τ_k λ A^T y^k )   (3.29a)
   y^{k+1} = y^k − θ_k ( (1/λ) A x^{k+1} + B^T (B y^k − z) ).   (3.29b)

The full primal-dual hybrid gradient descent method for TV deblurring is:

Algorithm PDHGD Deblur

Step 0. Initialization. Pick y^0 and a feasible x^0 ∈ X; set k ← 0.

Step 1. Choose stepsizes τ_k and θ_k.

Step 2. Updating.

   x^{k+1} = P_X( x^k + τ_k λ A^T y^k )   (3.30a)
   y^{k+1} = y^k − θ_k ( (1/λ) A x^{k+1} + B^T (B y^k − z) )   (3.30b)

Step 3. Terminate if a stopping criterion is satisfied;
otherwise set k ← k + 1 and return to Step 1.
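A minimal sketch of the deblurring iteration (3.30), assuming the shift-invariant blur (3.25) is implemented as a periodic (FFT-based) convolution, reusing grad, div and project_unit_balls from the earlier sketches; the fixed stepsizes are purely illustrative and are not the ones used in the dissertation's experiments.

```python
import numpy as np

def make_blur(psf, shape):
    """Return B and B^T as functions implementing a periodic convolution with the PSF."""
    pad = np.zeros(shape)
    h, w = psf.shape
    pad[:h, :w] = psf
    pad = np.roll(pad, (-(h // 2), -(w // 2)), axis=(0, 1))    # center the kernel at the origin
    psf_hat = np.fft.fft2(pad)
    B = lambda u: np.real(np.fft.ifft2(psf_hat * np.fft.fft2(u)))
    Bt = lambda u: np.real(np.fft.ifft2(np.conj(psf_hat) * np.fft.fft2(u)))
    return B, Bt

def pdhgd_deblur(z, lam, psf, tau=0.2, theta=0.2, iters=200):
    """Primal-dual hybrid gradient iteration (3.30) for TV deblurring (fixed stepsizes)."""
    B, Bt = make_blur(psf, z.shape)
    y = z.copy()
    x = np.zeros((2,) + z.shape)
    for k in range(iters):
        x = project_unit_balls(x + tau * lam * grad(y))          # (3.30a)
        y = y - theta * (-div(x) / lam + Bt(B(y) - z))           # (3.30b): A x = -div x
    return y
```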

3.4 Theoretical Connections

Our method is related to projection type methods existing in the literature for
finding saddle points and, more generally, solutions to variational inequalities.
In this section, we shall discuss very briefly the framework of projection
methods for solving variational inequalities and point out the connections and
differences between our method and previous work. We refer interested readers to
the survey papers [45] and [71] for the background of this area.

Let H be a real Hilbert space (in our case, R^n), whose inner product and norm are denoted by ⟨·, ·⟩ and ‖·‖, respectively. Let K be a closed convex set in H and F a mapping from H into itself. We consider the problem of finding v* ∈ K such that

   ⟨v − v*, F(v*)⟩ ≥ 0,   ∀ v ∈ K.   (3.31)

The above problem is called a variational inequality problem, with v* being one of its solutions. We denote this variational inequality problem by VI(K, F). In most real applications, K is convex and F satisfies some monotonicity and Lipschitz continuity properties, which we define as follows:

Definition 1. F is said to be

(i) monotone if ⟨u − v, F(u) − F(v)⟩ ≥ 0 for all u, v ∈ H;

(ii) strongly monotone if there exists ν > 0 such that ⟨u − v, F(u) − F(v)⟩ ≥ ν ‖u − v‖² for all u, v ∈ H;

(iii) pseudomonotone if ⟨u − v, F(v)⟩ > 0 ⇒ ⟨u − v, F(u)⟩ ≥ 0 for all u, v ∈ H;

(iv) Lipschitz continuous if there exists L > 0 such that ‖F(u) − F(v)‖ ≤ L ‖u − v‖ for all u, v ∈ H.

Finding a saddle point (y*, x*) of the min-max problem

   min_{y∈Y} max_{x∈X} Φ(y, x)

can be written as a special case of the variational inequality problem:

   find v* ∈ K s.t. ⟨v − v*, F(v*)⟩ ≥ 0   ∀ v ∈ K,   (3.32)

where

   v = (y; x),   F(v) = ( ∇_y Φ(y, x); −∇_x Φ(y, x) )   and   K = Y × X.

In particular, our ROF problems (3.7) and (3.19) can both be transformed into a variational inequality problem VI(K, F) of the form (3.32), with F and K defined as follows.

For the unconstrained ROF (3.7):

   F(v) = ( Ax + λ(y − z); −A^T y )   and   K = R^N × X.

For the constrained ROF (3.19):

   F(v) = ( Ax; −A^T y )   and   K = Y × X.

The variational inequality problem is closely related to fixed-point problems, and fixed-point theory has played an important role in the development of various algorithms for solving variational inequalities. In fact, we have the following well-known result (see, e.g., [8, p. 267]):

Lemma 1. v* is a solution of VI(K, F) if and only if

   v* = P_K( v* − α F(v*) )

for any α > 0.

The fixed-point formulation in the above lemma suggests the following simple iterative algorithm for computing v*.

VI Algorithm 1.
   v^{k+1} = P_K( v^k − α_k F(v^k) ).   (3.33)

The convergence of the above algorithm requires F to be strongly monotone and Lipschitz continuous, which is too restrictive in many cases. An alternative approach is to consider the following 'implicit' iterative scheme.

VI Algorithm 2.
   v^{k+1} = P_K( v^k − α_k F(v^{k+1}) ).   (3.34)

The convergence of this new algorithm only requires monotonicity of F, but it is often difficult to solve the implicit update at each iteration, making it less practical.

To overcome the drawbacks of the projection methods defined in (3.33) and (3.34), Korpelevich [51] first proposed a modified method called the extragradient algorithm. It consists of two projections at each iteration: a predictor step and a corrector step.

VI Algorithm 3.

   v̄^k = P_K( v^k − α_k F(v^k) )   (3.35a)
   v^{k+1} = P_K( v^k − α_k F(v̄^k) )   (3.35b)

Global convergence is proved for the above algorithm if F is pseudomonotone, Lipschitz continuous, or the problem satisfies some local error bound (see [71]), provided the step size α_k is small enough to satisfy

   α_k ‖F(v^k) − F(v̄^k)‖ ≤ µ ‖v^k − v̄^k‖   for some fixed µ ∈ (0, 1),

which can be enforced by a simple Armijo-type line search.
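For comparison with our method, here is a minimal sketch of one extragradient step (3.35) applied to the unconstrained ROF saddle-point mapping F(v) = (Ax + λ(y − z); −A^T y), reusing grad, div and project_unit_balls from the earlier sketches; the stepsize α is assumed given (e.g., from the line search above).

```python
def extragradient_step(y, x, z, lam, alpha):
    """One predictor-corrector step (3.35) for the unconstrained ROF saddle-point VI.
    K = R^N x X, so only the dual component is projected."""
    # predictor (3.35a): v_bar = P_K(v - alpha * F(v))
    yb = y - alpha * (-div(x) + lam * (y - z))        # F_y(v) = A x + lam (y - z), A x = -div x
    xb = project_unit_balls(x + alpha * grad(y))      # -F_x(v) = A^T y = grad y
    # corrector (3.35b): v_new = P_K(v - alpha * F(v_bar))
    y_new = y - alpha * (-div(xb) + lam * (yb - z))
    x_new = project_unit_balls(x + alpha * grad(yb))
    return y_new, x_new
```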

There are many other variants of the original extragradient algorithm with
different predictor search rules and corrector step sizes aiming to improve perfor-
mance (see [58] and [71]). New related developments in this direction can also
be found in [54] and [58], where the final solution is obtained by averaging along
the solution path.

Our ROF problems (3.7) and (3.19) can both be transformed into a variational inequality problem VI(K, F) with a monotone and Lipschitz continuous mapping F. Some of the existing algorithms can be applied directly with proven global convergence. However, numerical experiments show that none of these existing methods has performance comparable to our algorithm. There are several possible explanations for this. First of all, in the variational inequality setting the variables y and x are combined into one variable v and have to be updated in one step with the same steplength, while in our approach the primal y and dual x are updated alternately in a Gauss–Seidel fashion, with the freedom to choose their own step sizes. More importantly, the existing algorithms are developed to solve variational inequalities as a general class, while our method exploits the particular structure of the problem, including the bilinear coupling y^T Ax and the special structure of the set K, which allows us to choose better step sizes and improve the performance. On the other hand, our approach lacks a global convergence proof, which would provide useful guidance on stepsize rules and help us better understand how the algorithm works.

3.5 Implementation Details and Numerical Experiments

We report on computational experiments for three test problems in image denoising. All the programs are run on an IBM T60 notebook PC with a 1.83 GHz Intel Core Duo CPU and 1 GB RAM. All methods are coded in MATLAB. It is expected that the performance can be improved by recoding in C or C++, but we believe that the improvements would be fairly uniform across all the algorithms.

The original clean images and the input noisy images are shown in Figure 3.1. The sizes of the three test problems are 256 × 256, 512 × 512 and 512 × 512, respectively. The noisy images are generated by adding Gaussian noise with standard deviation σ = 20 to the original clean images.

The parameter λ in the unconstrained ROF model (3.1) is inversely related to the noise level σ and usually needs to be tuned for each individual image. In our case, λ is chosen in the following way. We first solve the constrained ROF model by Algorithm PDHGDC for an optimal solution (y*, x*). Then the particular λ that makes the unconstrained ROF model (3.1) equivalent to the constrained model (3.18) is given by

   λ = ‖Ax*‖ / (nσ).   (3.36)

For our test problems, the parameters λ obtained in this way are 0.053, 0.037 and 0.049, respectively.

We tested the following algorithms for denoising problems:

• Chambolle’s semi-implicit gradient descent method [14];

• Primal-dual hybrid descent methods proposed in Section 3.1;

• The CGM method of [25].

Although suitable constant stepsizes already give good convergence results, the power of the proposed algorithms is best exploited with a good strategy for choosing the stepsizes (τ_k, θ_k). Throughout the experiments, we use the following stepsize strategy:

Algorithm PDHGD:   τ_k = 0.2 + 0.08k,   θ_k = (0.5 − 5/(15 + k)) / τ_k;

Algorithm PDHGDC:  τ_k = 0.2 + 0.08k,   θ_k = 0.5 / τ_k.

In Chambolle's method, we take the time step to be 0.248 for its near-optimal performance. In the CGM implementation, we used a direct solver for the linear system at each iteration (the conjugate gradient iterative solver was slower on these examples). The smoothing parameter β is dynamically updated based on the duality gap rather than kept fixed. In particular, we take β^(0) = 100 and let β^(k) = β^(k−1) · (G^(k)/G^(k−1))², where G^(k) denotes the duality gap at iteration k. We noticed that this simple strategy of updating β, borrowed from interior-point methods, outperforms the classical CGM as measured by the decrease of the duality gap.

The decision about when an approximate solution is of sufficiently high quality


to terminate the algorithm can be difficult for general constrained optimization
problems. Often, we wish the approximate solution x to be close to a global
minimizer x∗ and/or the function value F (x) be close to F (x∗ ). In the denoising
case, the duality gap provides a reliable and easily calculated stopping criterion.
G(y k ,xk )
We terminate our program whenever the relative duality gap D(xk )
reaches a
pre-specified tolerance threshold tol.
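As an illustration, the relative duality gap can be evaluated directly from the primal objective (4.1) and the dual objective (3.28) with B = I; the following minimal sketch (reusing grad and div from the earlier sketches) assumes exactly these two expressions for P and D.

```python
import numpy as np

def relative_duality_gap(y, x, z, lam):
    """Relative gap G(y, x)/D(x) for the unconstrained ROF model.
    P(y) = sum_l ||A_l^T y|| + (lam/2)||y - z||^2,
    D(x) = (lam/2)||z||^2 - (1/(2 lam))||A x - lam z||^2   (dual objective, B = I)."""
    g = grad(y)
    P = np.sum(np.sqrt(g[0]**2 + g[1]**2)) + 0.5 * lam * np.sum((y - z)**2)
    r = -div(x) - lam * z                       # A x - lam z, with A x = -div x
    D = 0.5 * lam * np.sum(z**2) - np.sum(r**2) / (2.0 * lam)
    return (P - D) / D
```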

Tables 3.1, 3.2 and 3.3 report the number of iterations and CPU times required by the two proposed algorithms as well as by Chambolle's algorithm and the CGM method for the relative duality gap to reach a certain threshold. In all codes, we used the same starting point (y^0, x^0) = (z, 0). (Convergence does not depend on the initial conditions, though.) We vary the threshold tol from 10^−2 to 10^−6, producing results of increasingly high accuracy as tol is decreased.

The results in the tables demonstrate that our proposed approaches are very competitive with existing methods. They are the winners in all tests for the different stopping criteria tol = 10^−2, 10^−4, 10^−6. They are significantly faster than Chambolle's method when medium-to-high accuracy is required, and significantly faster than the CGM method when low-to-medium accuracy suffices.

Figure 3.2 shows the denoised images obtained at different values of tol. Note that visually there is little difference between the results obtained with the two tolerance values 10^−2 and 10^−4. Smaller values of tol do not produce further visual differences.

Figure 3.3 plots the relative duality gap against the number of iterations for Chambolle's method as well as for the PDHGD algorithm. It shows that PDHGD converges much faster than Chambolle's method, especially when a medium-to-high accuracy solution is required.

3.6 Conclusion

We have proposed a primal-dual hybrid gradient descent method to solve the total variation based image restoration model of Rudin, Osher and Fatemi (ROF) [60]. The algorithm tries to improve performance by alternating between the primal and dual variables and exploiting information from both. We compare our method with two popular existing approaches proposed by Chambolle [14] and Chan, Golub, and Mulet [25], and show that our method is consistently faster than these earlier approaches in all experiments, for all stopping criteria tested. The complexity of our method is O(N) per iteration; in fact, its per-iteration cost is basically the same as that of Chambolle's method, while the cost of the CGM method at each iteration is O(N^{3/2}) for solving the block tri-diagonal linear system.

Our algorithm can be applied to solve both the unconstrained and the constrained ROF model and, in principle, it can be applied to other TV minimization or L1 minimization problems by transforming them to a min-max form. We also pointed out that our algorithm is related to existing projection-type methods for solving variational inequalities.

Figure 3.1: Denoising test problems. Left: original clean image. Right: noisy image with Gaussian noise (σ = 20). Top row: test problem 1, 256 × 256 cameraman. Middle row: test problem 2, 512 × 512 barbara. Bottom row: test problem 3, 512 × 512 boat.

Table 3.1: Number of iterations and CPU times (in seconds) for problem 1.
tol = 10^-2        tol = 10^-4        tol = 10^-6
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 45 1.77 1213 61.3 22597 1159
PDHD 14 0.45 73 2.28 328 10.4
PDHDC 14 0.46 70 2.27 308 9.98
CGM 6 20.3 14 49.8 19 68.0

Table 3.2: Number of iterations and CPU times (in seconds) for problem 2.
tol = 10^-2        tol = 10^-4        tol = 10^-6
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 141 26.5 3083 578 47935 8988
PDHD 25 3.66 117 23.7 541 110
PDHDC 24 3.66 113 22.9 519 108
CGM 7 191 15 417 23 642

Table 3.3: Number of iterations and CPU times (in seconds) for problem 3.
tol = 10^-2        tol = 10^-4        tol = 10^-6
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 61 11.4 1218 228 22235 4168
PDHD 16 2.36 72 10.6 320 47.2
PDHDC 16 2.38 71 10.7 316 47.5
CGM 7 189 14 382 20 547

Figure 3.2: The denoised images at different termination tolerances. Left column: tol = 10^−2; right column: tol = 10^−4.

Figure 3.3: Plot of the relative duality gap (P(y^k) − D(x^k))/D(x^k) vs. iterations for Chambolle's method and PDHGD. Top left: test problem 1. Top right: test problem 2. Bottom: test problem 3.
CHAPTER 4

Duality Based Gradient Projection Method

The discrete primal and dual formulations of the unconstrained ROF model are given as

(P)   min_y  Σ_{l=1}^{N} ‖A_l^T y‖₂ + (λ/2) ‖y − z‖²₂   (4.1)

(D)   min_{x∈X}  (1/2) ‖Ax − λz‖²,   (4.2)

where X ≡ { x = (x_1; · · · ; x_N) : x_l ∈ R², ‖x_l‖ ≤ 1 for l = 1, · · · , N }.

In this chapter, we report on the development, implementation, and testing


of some simple but fast explicit algorithms. These algorithms are based on the
dual formulation (4.2) above, so they do not require any numerical smoothing
parameters that would prevent them from converging to the true optimizer.

4.1 A Fundamental Convergence Result

Before we discuss the proposed algorithms, we make several remarks on the discretized problems (4.1), (4.2) and prove a general convergence result that will be useful later. It is easy to verify that both problems can be obtained from the function ℓ : R^N × X → R defined as follows:

   ℓ(y, x) := x^T A^T y + (λ/2) ‖y − z‖²₂.   (4.3)

The primal problem (4.1) is simply

   min_{y∈R^N} max_{x∈X} ℓ(y, x),

while the dual problem (4.2) is equivalent to

   max_{x∈X} min_{y∈R^N} ℓ(y, x).

It is easy to verify that the conditions (H1), (H2), (H3), and (H4) of [47, pp. 333-
334] are satisfied by this setting. Thus, it follows from [47, Chapter VII, Theo-
rem 4.3.1] that ℓ has a nonempty convex set of saddle points (ȳ, x̄) ∈ RN × X.
Moreover, from [47, Chapter IV, Theorem 4.2.5] and compactness of X, the point
(ȳ, x̄) ∈ RN × X is a saddle point if and only if ȳ solves (4.1) and x̄ solves (4.2).

Note that by strict convexity of the objective in (4.1), the solution ȳ of (4.1) is
in fact uniquely defined. For any saddle point (ȳ, x̄), we have that ℓ(ȳ, x̄) ≤ ℓ(y, x̄)
for all y ∈ R^N; that is, ȳ is a minimizer of ℓ(·, x̄). Thus, from optimality conditions
for ℓ(·, x̄), the following relationship is satisfied for the unique solution ȳ of (4.1)
and for any solution x̄ of (4.2):

Ax̄ + λ(ȳ − z) = 0. (4.4)

By uniqueness of ȳ, it follows that Ax̄ is constant for all solutions x̄ of (4.2).

The following general convergence result will be useful in our analysis of al-
gorithms in Section 4.2.

Proposition 1. Let {x^k} be any sequence with x^k ∈ X for all k = 1, 2, . . . such that all accumulation points of {x^k} are stationary points of (4.2). Then the sequence {v^k} defined by

   v^k = z − (1/λ) A x^k   (4.5)

converges to the unique solution ȳ of (4.1).
Proof. Note first that all stationary points of (4.2) are in fact (global) solutions
of (4.2), by convexity.

Suppose for contradiction that v^k ↛ ȳ. Then we can choose ε > 0 and a subsequence S such that ‖v^k − ȳ‖₂ ≥ ε for all k ∈ S. Since all x^k belong to the bounded set X, the sequence {x^k} is bounded, so {v^k} is bounded also. In particular, the subsequence {v^k}_{k∈S} must have an accumulation point v̂, which must satisfy ‖v̂ − ȳ‖₂ ≥ ε > 0. By restricting S if necessary, we can assume that lim_{k∈S} v^k = v̂. By boundedness of {x^k}, we can further restrict S to identify a point x̂ ∈ X such that lim_{k∈S} x^k = x̂. By (4.5), we thus have

   Ax̂ + λ(v̂ − z) = lim_{k∈S} ( A x^k + λ(v^k − z) ) = 0.   (4.6)

Since x̂ is an accumulation point of the whole sequence, we have by assumption


that x̂ is a stationary point and hence a solution of (4.2). By our observation
following (4.4), we thus have that Ax̂+λ(ȳ−z) = 0, where ȳ is the unique solution
of (4.1). By comparing this expression with (4.6), we obtain the contradiction
v̂ = ȳ, proving the result.

4.2 Gradient Projection Algorithms

4.2.1 Introduction

Our proposed approaches are essentially gradient projection algorithms applied


to (4.2), in which the search path from each iterate is obtained by projecting
negative-gradient (steepest descent) directions onto the feasible set. Various en-
hancements involving different step-length rules and different line-search strate-
gies are important in making the method efficient.

For the general problem of optimizing a smooth function over a closed convex set, that is,

   min_{x∈X} F(x)   (4.7)

(where F : Rm → R is smooth and X is a closed convex subset of Rm ), gradient


projection methods set

xk+1 = xk + γk (xk (αk ) − xk ), (4.8)

for some parameter γk ∈ [0, 1], where

xk (αk ) := PX (xk − αk ∇F (xk )), (4.9)

for some αk > 0. Here, PX denotes the projection onto the set X. Since X is
closed and convex, the operator PX is uniquely defined, but in order for the gradi-
ent projection approach to make practical sense, this operator must also be easy
to compute. For this reason, gradient projection approaches have been applied
most often to problems with separable constraints, where X can be expressed
as a Cartesian product of low-dimensional sets. In our case (4.2), X is a cross
product of unit balls in R2 , so computation of PX requires only O(N) operations.
Bertsekas [7] gives extensive background on gradient projection algorithms.

From now on, we will focus on the solution of problem (4.2), which we restate
here:
   min_{x∈X} F(x) := (1/2) ‖Ax − λg‖²₂,   (4.10)

where g ≡ z denotes the noisy input image and the compact set X ⊂ R^{2N} is defined in (4.2). In this section, we discuss
GP techniques for solving this problem. Our approaches move from iterate xk to
the next iterate xk+1 using the scheme (4.8)-(4.9).

Projection PX on the set X, a Cartesian product of unit Euclidean balls, can


be computed straightforwardly as follows.
   (P_X(x))_l = x_l / max{ ‖x_l‖, 1 },   l = 1, 2, . . . , N.   (4.11)
This operation projects each 2 × 1 subvector of x separately onto the unit ball
in R2 . It is worth pointing out here that this structure of the dual constraints,
which makes the gradient projection approach practical, also enables Chambolle
to develop an analytical formula for the Lagrange multipliers in [14].

Our approaches below differ in their rules for choosing the step parameters
αk and γk in (4.8) and (4.9).

4.2.2 Three Frameworks

We next consider three gradient projection frameworks that encompass our gra-
dient projection algorithms, and present convergence results for methods in these
frameworks.

Framework GP-NoLS chooses αk in some predetermined range and


sets γk ≡ 1.

Framework GP-NoLS

Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax .
Choose x0 and set k ← 0.

Step 1. Choose steplength αk ∈ [αmin , αmax ].

Step 2. Set xk+1 = xk (αk ).

Step 3. Terminate if a stopping criterion is satisfied; otherwise set k ← k + 1


and go to Step 1.

Framework GP-ProjArc also sets γk ≡ 1, but chooses αk by a backtracking
line search to satisfy an Armijo criterion, which enforces monotonic decrease of
F . This approach is referred to by Bertsekas [7, p. 236] as Armijo Rule Along
the Projection Arc.

Framework GP-ProjArc

Step 0. Initialization. Choose parameters α_min, α_max with 0 < α_min < α_max, and choose ρ ∈ (0, 1) and µ ∈ (0, 1/2). Choose x^0 and set k ← 0.

Step 1. Choose initial steplength ᾱk ∈ [αmin , αmax ].

Step 2. Backtracking Line Search. Choose m to be the smallest nonneg-


ative integer such that

F (xk (ρm ᾱk )) ≤ F (xk ) − µ∇F (xk )T (xk − xk (ρm ᾱk )),

where xk (α) is defined as in (4.9);


Set αk = ρm ᾱk and xk+1 = xk (αk ).

Step 3. Terminate if a stopping criterion is satisfied; otherwise set k ← k + 1


and go to Step 1.

Framework GP-LimMin fixes αk at the start of each iteration (possibly using


information gathered on previous steps), but then performs a “limited minimiza-
tion” procedure to find γk , again ensuring decrease of F at every step.

Framework GP-LimMin

Step 0. Initialization. Choose parameters αmin , αmax with 0 < αmin < αmax .
Choose x0 and set k ← 0.

Step 1. Choose steplength αk ∈ [αmin , αmax ]. Compute xk (αk ) and set δ k :=


(xk (αk ) − xk ).

Step 2. Limited Minimizing Line Search. Set xk+1 = xk + γk δ k , with


γk = mid(0, γk,opt , 1) and

   γ_{k,opt} = arg min_γ F(x^k + γ δ^k) = −(δ^k)^T ∇F(x^k) / ‖A δ^k‖²₂   (4.12)

Step 3. Terminate if a stopping criterion is satisfied; otherwise set k ← k + 1


and go to Step 1.

The first algorithm we consider — Algorithm GPCL — is obtained from


Framework GP-NoLS by setting αk equal to the fixed value α > 0 at every step.
Convergence is obtained for all α sufficiently small, as we now show.

Theorem 1. Let {xk } be a sequence generated by Algorithm GPCL. Then if


0 < α < .25, the sequence v k obtained from (4.5) converges to the unique solution
ȳ of (4.1).

Proof. Given any two vectors x′ and x′′ we have that

∇F (x′ ) − ∇F (x′′ ) = AT A(x′ − x′′ ),

so the Lipschitz constant for ∇F is kAT Ak2 , which is bounded by 8 (see [14,
p. 92]). It follows immediately from [7, Proposition 2.3.2] that every accumulation
point of {xk } is stationary for (4.10) provided that 0 < α < .25. The result now
follows immediately from Proposition 1.

The upper bound of 0.25 in Theorem 1 is tight; we observe in practice that the method is unstable even for α = 0.251.

We consider other algorithms in Framework GP-NoLS below, in which αk


takes on different values at each iteration that may violate the bound of .25. These
methods may be non-monotone (the function F may increase on some iterations)
and convergence results cannot be proven in general without the addition of step
acceptance criteria, such as requiring a significant improvement over the worst
function value at the last M points visited for some positive integer M; see [9].
Strategies are also required for modifying steps that fail these criteria.

For algorithms in Framework GP-ProjArc , we have the following convergence


result.

Theorem 2. Let {xk } be a sequence generated by an algorithm in Framework


GP-ProjArc . Then the sequence v k obtained from (4.5) converges to the unique
solution ȳ of (4.1).

Proof. Proposition 2.3.3 of Bertsekas [7], with minor modifications for the vari-
able choice of ᾱk within the range [αmin , αmax ], shows that all limit points of {xk }
are stationary. The result then follows from Proposition 1.

An identical result holds for algorithms in Framework GP-LimMin .

Theorem 3. Let {xk } be a sequence generated by an algorithm in Framework


GP-LimMin . Then the sequence v k obtained from (4.5) converges to the unique
solution ȳ of (4.1).

Proof. Proposition 2.3.1 of Bertsekas [7], with minor modifications for the vari-
able choice of ᾱk within the range [αmin , αmax ], shows that all limit points of {xk }
are stationary. The result then follows from Proposition 1.

4.2.3 Barzilai-Borwein Strategies

We discuss strategies that choose αk using approaches first proposed by Barzilai


and Borwein [6] (BB) and subsequently elaborated by other authors. For the
problem minx∈R2N F (x), the basic BB strategy sets xk+1 ← xk − αk ∇F (xk ),
where αk is chosen so that αk−1 I mimics the behavior of the Hessian ∇2 F over
the previous step. By Taylor’s theorem, we have

∇2 F (xk )∆xk−1 ≈ ∆g k−1, ∆xk−1 ≈ (∇2 F (xk ))−1 ∆g k−1 ,

where
∆xk−1 := xk − xk−1 , ∆g k−1 := ∇F (xk ) − ∇F (xk−1 ),

so our desired property on α is that α−1 ∆xk−1 ≈ ∆g k−1 . Note that for the F we
consider here (4.10), we have ∆g k−1 = AT A∆xk−1 .

One formula for α is obtained by performing a least-squares fit in one variable, as follows:

   α_{k,1} = ( arg min_{τ∈R} ‖τ Δx^{k−1} − Δg^{k−1}‖²₂ )^{−1},

which yields

   α_{k,1} = ‖Δx^{k−1}‖²₂ / ⟨Δx^{k−1}, Δg^{k−1}⟩ = ‖Δx^{k−1}‖²₂ / ‖A Δx^{k−1}‖²₂.   (4.13)

An alternative formula is obtained similarly, by doing a least-squares fit to α rather than α^{−1}, to obtain

   α_{k,2} = arg min_{α∈R} ‖Δx^{k−1} − α Δg^{k−1}‖²₂ = ⟨Δx^{k−1}, Δg^{k−1}⟩ / ‖Δg^{k−1}‖²₂ = ‖A Δx^{k−1}‖²₂ / ‖A^T A Δx^{k−1}‖²₂.   (4.14)
These step lengths were shown in [6] to be effective on simple problems, and a partial analysis explaining this behavior was given. Numerous variants have been proposed recently and subjected to theoretical and computational evaluation.
The BB stepsize rules have also been extended to constrained optimization, par-
ticularly to bound-constrained quadratic programming; see, for example [36] and
[62]. The same formulae (4.13) and (4.14) can be used in these cases.

Other variants of Barzilai-Borwein schemes have been proposed by other au-
thors in other contexts. The cyclic Barzilai-Borwein (CBB) method proves to
have better performance than the standard BB in many cases (see for example
[35] and the references therein). In this approach, we recalculate the BB stepsize
from one of the formulae (4.13) or (4.14) at only every mth iteration, for some
integer m. At intervening steps, we simply use the last calculated value of αk .
There are alternating Barzilai-Borwein (ABB) schemes that switch between the
definitions (4.13) and (4.14), either adaptively or by following a fixed schedule.

4.2.4 Implemented Variants of Gradient Projection

We discuss here the variants of gradient projection that were implemented in our
computational testing.

Algorithm GPLS. This algorithm falls into Framework GP-ProjArc, where we choose the initial steplength ᾱ_k at each iteration by predicting what the steplength would be if no new constraints were to become active on this step. Specifically, we define the vector g^k by

   g_l^k = (∇F(x^k))_l,                              if ‖x_l^k‖₂ < 1 or (∇F(x^k))_l^T x_l^k > 0,
   g_l^k = ( I − x_l^k (x_l^k)^T ) (∇F(x^k))_l,       otherwise.
We then choose the initial guess to be

   ᾱ_k = arg min_α F(x^k − α g^k),

which can be computed explicitly as

   ᾱ_k = (g^k)^T ∇F(x^k) / ‖A g^k‖²₂ = ‖g^k‖²₂ / ‖A g^k‖²₂.

In practice, we find that using (1/2)ᾱ_k as the initial value gives better performance, and backtracking is not necessary in any of our numerical experiments.
Algorithm GPBB-NM. This is a nonmonotone Barzilai-Borwein method in
GP-NoLS, in which we obtain the step αk via the formula (4.13), projected if
necessary onto the interval [αmin , αmax ].
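For illustration, the following is a minimal NumPy sketch of Algorithm GPBB-NM for the dual problem (4.10), using the BB formula (4.13) safeguarded to [α_min, α_max] and reusing grad, div and project_unit_balls from the earlier sketches; the initial step and iteration count are illustrative only.

```python
import numpy as np

def gpbb_nm(z, lam, iters=200, a_min=1e-5, a_max=1e5):
    """Nonmonotone Barzilai-Borwein gradient projection for (4.10):
    min_{x in X} 0.5 * ||A x - lam z||^2, with A x = -div x and grad F(x) = A^T (A x - lam z)."""
    x = np.zeros((2,) + z.shape)
    r = -div(x) - lam * z                    # residual A x - lam z
    g = grad(r)                              # gradient A^T (A x - lam z)
    alpha = 1.0 / 8.0                        # safe initial step, since ||A^T A|| <= 8
    for k in range(iters):
        x_new = project_unit_balls(x - alpha * g)
        r_new = -div(x_new) - lam * z
        g_new = grad(r_new)
        dx, dg = x_new - x, g_new - g
        denom = np.sum(dx * dg)
        if denom > 0:                        # BB step (4.13), safeguarded to [a_min, a_max]
            alpha = min(max(np.sum(dx * dx) / denom, a_min), a_max)
        x, r, g = x_new, r_new, g_new
    return z + div(x) / lam                  # recover the denoised image via (4.5)
```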

Algorithm GPBB-M. A monotone Barzilai-Borwein method in Framework GP-LimMin, in which α_k is obtained as in Algorithm GPBB-NM.

Algorithm GPCBB-m. A monotone cyclic Barzilai-Borwein algorithm in Framework GP-LimMin, in which α_k is recalculated from (4.13) only at every mth iteration. Formally, we set

   α_{ml+i} = α^{BB}_{ml+1}   for l = 0, 1, 2, . . . and i = 1, 2, . . . , m − 1,

where α^{BB}_{ml+1} is obtained from (4.13) with k = ml + 1, restricted to the interval [α_min, α_max].

Algorithm GPABB. A monotone alternating Barzilai-Borwein method in Framework GP-LimMin, in which the technique of Serafini, Zanghirati, and Zanni [62, Section 2.2] is used to switch between the rules (4.13) and (4.14). This technique
makes use of two positive integer parameters nmin and nmax with 0 < nmin ≤ nmax .
Let nα be the number of consecutive iterations that use the same steplength se-
lection rule, (4.13) or (4.14). We switch from one rule to the other at the next
iteration k + 1 if either (i) nα ≥ nmax or (ii) nα ≥ nmin and αk is either a separat-
ing steplength or a bad descent generator. The current step αk is a separating
steplength if it lies between the values generated by the two rules at the next
iteration, that is, αk+1,2 < αk < αk+1,1 . Given two constants γl and γu with
0 < γl ≤ 1 ≤ γu , we say that αk is a bad descent generator if one of the following
conditions holds:

(a) γk,opt < γl and αk = αk,1; or

(b) γk,opt > γu and αk = αk,2.

where γk,opt is obtained from the limited minimization rule (4.12). We refer
interested readers to [62] for the rationale of the criterion. In any case, the
chosen αk is adjusted to ensure that it lies in the interval [αmin , αmax ].

4.3 A Sequential Quadratic Programming Algorithm

We describe here a variation on the techniques of the previous section in which


the curvature of the boundary of the constraint set X is accounted for in com-
puting the search direction. The method can be viewed as a sequential quadratic
programming (SQP) method applied to the dual formulation (4.10). The KKT
optimality conditions for this formulation can be written as follows:

   A_l^T (Ax − λg) + 2 z_l x_l = 0,   l = 1, 2, . . . , N,

   0 ≤ z_l ⊥ ‖x_l‖²₂ − 1 ≤ 0,   l = 1, 2, . . . , N,

where the scalars z_l are Lagrange multipliers for the constraints ‖x_l‖²₂ ≤ 1, l = 1, 2, . . . , N, and the operator ⊥ indicates that at least one of its two operands must be zero. At iteration k, we compute an estimate of the active set A_k ⊂ {1, 2, . . . , N}, containing those indices for which we believe that ‖x_l‖²₂ = 1 at the solution. In our implementation, we choose this set as follows:

   A_k = { l : ‖x_l^k‖₂ = 1 and (x_l^k)^T [∇F(x^k)]_l ≤ 0 }
       = { l : ‖x_l^k‖₂ = 1 and (x_l^k)^T A_l^T (Ax^k − λg) ≤ 0 }.   (4.15)

The SQP step is a Newton-like step for the following system of nonlinear equations, from the current estimates x^k and z_l^k, l = 1, 2, . . . , N:

   A_l^T (Ax − λg) + 2 x_l z_l = 0,   l = 1, 2, . . . , N,   (4.16a)
   ‖x_l‖²₂ − 1 = 0,   l ∈ A_k,   (4.16b)
   z_l = 0,   l ∉ A_k.   (4.16c)

Using z̃_l^{k+1} to denote the values of z_l at the next iterate, and d̃^k to denote the step in x^k, a "second-order" step can be obtained from (4.16) by solving the following system for d̃^k and z̃_l^{k+1}, l = 1, 2, . . . , N:

   A_l^T A d̃^k + 2 z̃_l^{k+1} d̃_l^k = −A_l^T (Ax^k − λg) − 2 x_l^k z̃_l^{k+1},   l = 1, 2, . . . , N,   (4.17a)
   2 (x_l^k)^T d̃_l^k = 0,   l ∈ A_k,   (4.17b)
   z̃_l^{k+1} = 0,   l ∉ A_k.   (4.17c)

We now define Newton-like steps d^k in x, and new iterates z^{k+1} in z, by replacing A^T A by α_k^{−1} I in (4.17a) and solving the following linear system:

   α_k^{−1} d_l^k + 2 z_l^{k+1} d_l^k = −A_l^T (Ax^k − λg) − 2 x_l^k z_l^{k+1},   l = 1, 2, . . . , N,   (4.18a)
   2 (x_l^k)^T d_l^k = 0,   l ∈ A_k,   (4.18b)
   z_l^{k+1} = 0,   l ∉ A_k.   (4.18c)

Considering indices l ∈ A_k, we take the inner product of (4.18a) with x_l^k and use (4.18b) and (4.15) to obtain:

   z_l^{k+1} = −(1/2) (x_l^k)^T A_l^T (Ax^k − λg),   l ∈ A_k.

We obtain the steps d_l^k for these indices by substituting this expression in (4.18a):

   d_l^k = −(α_k^{−1} + 2 z_l^{k+1})^{−1} [ A_l^T (Ax^k − λg) + 2 x_l^k z_l^{k+1} ],   l ∈ A_k.


 

In fact, because of (4.18c), this same formula holds for l ∉ A_k, when it reduces to the usual negative-gradient step

   d_l^k = −α_k A_l^T (Ax^k − λg),   l ∉ A_k.

We define the (nonmonotone) Algorithm SQPBB-NM by making an initial


choice of αk at each iteration according to the formula (4.13), and calculating
xk+1 = xk + dk and z k+1 as described above. In the monotone variant of this
method, known as SQPBB-M, we successively decrease αk by a factor of ρ ∈ (0, 1),
as in Framework GP-ProjArc , and recalculate xk+1 and z k+1 as above, until a
decrease in the objective function is obtained.

We also tried versions of these methods in which αk was recalculated only on


every mth iteration; these are referred to as SQPBB-NM(m) and SQPBB-M(m),
respectively.
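A minimal sketch of one step of the SQP iteration above, reusing grad and div from the earlier sketches; here g denotes the noisy image (as in this section), the active-set tolerance is an assumption, and the stepsize α would be supplied, e.g., by the BB formula (4.13).

```python
import numpy as np

def sqp_step(x, g, lam, alpha):
    """One SQP-like step (4.18) for the dual problem (4.10).
    x has shape (2, m, n); grad F(x) = A^T (A x - lam g) with A x = -div x."""
    gradF = grad(-div(x) - lam * g)                         # shape (2, m, n)
    norm_x = np.sqrt(x[0]**2 + x[1]**2)
    inner = x[0] * gradF[0] + x[1] * gradF[1]               # (x_l)^T [grad F]_l per pixel
    active = (np.abs(norm_x - 1.0) < 1e-12) & (inner <= 0)  # active-set estimate (4.15)
    zmult = np.where(active, -0.5 * inner, 0.0)             # multipliers; zero off the active set
    scale = 1.0 / (1.0 / alpha + 2.0 * zmult)               # (alpha^-1 + 2 z_l)^-1
    d = -scale * (gradF + 2.0 * zmult * x)                  # step d_l^k
    return x + d, zmult
```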

4.4 Computational Experiments

We report on computational experiments for three test problems in image de-


noising. The original clean images and the input noisy images are shown in
Figures 5.2 and 5.3, respectively. The sizes of the discretizations for the three
test problems are 128 × 128, 256 × 256, and 512 × 512, respectively. The noisy
images are generated by adding Gaussian noise to the clean images using the
MATLAB function imnoise, with variance parameter set to 0.01. The fidelity
parameter λ is taken to be 0.045 throughout the experiments. This parameter
is inversely related to the noise level σ and usually needs to be tuned for each
individual image to get an optimal visual result.

We tested the following algorithms:

• Chambolle’s semi-implicit gradient descent method [14];

• many variants of gradient projection proposed in Section 4.2;

• the SQP method of Section 4.3;

• the CGM method of [25].

We report on a subset of these tests here, including the gradient projection vari-
ants that gave consistently good results across the three test problems.

In Chambolle’s method, we take the step to be 0.248 for near optimal perfor-
mance, although global convergence is proved in [14] only for steps in the range
(0, .125). We use the same value αk = 0.248 in Algorithm GPCL, as it appears
to be near optimal in this case as well.

For all gradient projection variants, we set α_min = 10^−5 and α_max = 10^5. (Performance is insensitive to these choices, as long as α_min is sufficiently small and α_max sufficiently large.) In Algorithm GPLS, we used ρ = 0.5 and µ = 10^−4. In Algorithm GPABB, we set γ_l = 0.1 and γ_u = 5.

We also tried variants of the GPBB methods in which the initial choice of
αk was scaled by a factor of 0.5 at every iteration. We found that this variant
often enhanced performance. This fact is not too surprising, as we can see from
Section 4.3 that the curvature of the boundary of constraint set X suggests that
it is appropriate to add positive diagonal elements to the Hessian approximation,
which corresponds to decreasing the value of αk .

In the CGM implementation, we used a direct solver for the linear system
at each iteration, as the conjugate gradient iterative solver (which is an option
in the CGM code) was slower on these examples. The smoothing parameter β is dynamically updated based on the duality gap from iteration to iteration. In particular, we take β_0 = 100 and let β_k = β_{k−1} (G_k / G_{k−1})², where G_k and G_{k−1}
are the duality gaps for the past two iterations. This simple strategy for updating
β, which is borrowed from interior-point methods, outperforms the classical CGM
approach, producing faster decrease in the duality gap.

All methods are coded in MATLAB. It is likely the performance can be im-
proved by recoding in C or C++, but we believe that improvements would be
fairly uniform across all the algorithms.

Tables 4.1, 4.2, and 4.3 report the number of iterations and average CPU times
over ten runs, where each run adds a different random noise vector to the true
image. In all codes, we used the starting point x(0) = 0 in each algorithm and
the relative duality gap stopping criterion. We vary the threshold tol from 10−2
to 10−6 , producing results of increasingly high accuracy as tol is decreased.

Figure 4.3 shows the denoised images obtained at different values of tol. Note that visually there is little difference between the results obtained with the two tolerance values 10^−2 and 10^−4. Smaller values of tol do not produce further visual differences.

The tables show that on all problems, the proposed gradient projection algorithms are competitive with Chambolle's method, and that some variants are significantly faster, especially when moderate accuracy is required for the solutions. Two variants stood out as good performers: the GPBB-NM variant and the GPBB-M(3) variant in which the initial choice of α_k was scaled by 0.5 at each iteration. For all tests with tol = 10^−2, tol = 10^−3, and tol = 10^−4, the winner was one of the gradient-projection Barzilai-Borwein strategies.

For these low-to-moderate accuracy requirements, CGM is generally slower


than the gradient-based methods, particularly on the larger problems. The pic-
ture changes considerably, however, when high accuracy (tol = 10^−6) is required.

Figure 4.1: The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Barbara”.

Figure 4.2: The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01.

The rapid asymptotic convergence of CGM is seen to advantage in this situation, and it outperforms the gradient-based methods in all cases. This result is observed again in Figure 4.4, which plots the duality gap against the CPU time for Chambolle's method, the CGM method, and the GPBB-NM variant of the gradient projection algorithm.

Figure 4.3: The denoised images at different termination tolerances. Left column: tol = 10^−2; right column: tol = 10^−4.

Table 4.1: Number of iterations and CPU times (in seconds) for problem 1. ∗ =
initial αk scaled by 0.5 at each iteration.
tol = 10^-2      tol = 10^-3      tol = 10^-4      tol = 10^-6
Algorithms Iter Time Iter Time Iter Time Iter Time
Chambolle 18 0.12 169 1.15 1082 7.23 23884 154
GPCL 39 0.25 136 0.90 736 4.71 16196 110
GPLS 23 0.33 181 2.66 845 12.8 17056 260
GPBB-M 12 0.16 152 1.82 836 10.1 17464 167
GPBB-M (2) 18 0.17 187 1.80 941 9.00 21146 193
GPBB-M(3) 13 0.11 92 0.78 287 2.44 3749 32.2
GPBB-M(3)∗ 11 0.09 49 0.41 190 1.60 2298 19.7
GPBB-NM 10 0.09 48 0.46 217 2.10 3857 32.3
GPABB 13 0.16 58 0.66 245 2.80 2355 24.0
SQPBB-M 13 0.17 47 0.66 196 2.81 3983 61.0
CGM 6 4.05 9 5.90 12 8.00 18 12.1

Table 4.2: Number of iterations and CPU times (in seconds) for problem 2. ∗ =
initial αk scaled by 0.5 at each iteration.
tol = 10^-2      tol = 10^-3      tol = 10^-4      tol = 10^-6
Algorithms Iter Time Iter Time Iter Time Iter Time
Chambolle 27 1.05 164 6.40 815 31.80 15911 628
GPCL 32 1.26 112 4.31 540 21.0 11434 452
GPLS 20 1.85 132 14.8 575 66 12892 1531
GPBB-M 20 1.14 124 7.01 576 32.7 11776 674
GPBB-M (2) 20 1.12 72 4.05 245 13.9 4377 251
GPBB-M(3) 20 1.12 77 4.33 345 19.5 3522 200
GPBB-M(3)∗ 17 0.95 47 2.65 162 9.17 1766 100
GPBB-NM 16 0.85 48 2.53 178 9.52 2802 150
GPABB 16 1.06 47 3.16 168 11.4 1865 127
SQPBB-M 14 1.10 41 3.44 152 13.5 2653 245
CGM 6 22.30 10 37.5 13 48.8 19 71.0

Table 4.3: Number of iterations and CPU times (in seconds) for problem 3. ∗ =
initial αk scaled by 0.5 at each iteration.
tol = 10^-2      tol = 10^-3      tol = 10^-4      tol = 10^-6
Algorithms Iter Time Iter Time Iter Time Iter Time
Chambolle 27 5.46 131 28.3 534 112 8314 1781
GPCL 24 4.65 80 17.2 328 69.1 5650 1212
GPLS 34 10.9 86 32.5 322 128 5473 2258
GPBB-M 20 5.50 84 24.9 332 98.2 5312 1576
GPBB-M (2) 20 5.40 56 16.3 160 46.6 4408 1290
GPBB-M(3) 20 5.38 67 19.5 174 50.5 2533 738
GPBB-M(3)∗ 17 4.58 41 11.9 131 38.0 1104 321
GPBB-NM 15 3.92 40 11.3 115 32.4 1371 388
GPABB 14 4.57 35 12.3 122 42.8 1118 394
SQPBB-M 14 4.91 40 15.8 109 44.6 1556 646
CGM 7 168 10 216 14 302 21 441


Figure 4.4: Plots of duality gap vs. CPU time for Chambolle's method, CGM, and GPBB-NM. Problems 1–3 are shown in order from top to bottom.

CHAPTER 5

Block Coordinate Descent (BCD) Method

5.1 Framework

In this section, we shall apply the block coordinate descent (BCD), or block nonlinear Gauss–Seidel, method to solve the dual ROF problem (1.12).

Consider a general optimization problem

   minimize F(x)
   subject to x ∈ X = X_1 × X_2 × · · · × X_n ⊆ R^m,   (5.1)

where F : R^m → R is a continuously differentiable function and the feasible set X is the Cartesian product of closed, nonempty and convex subsets X_i ⊆ R^{m_i}, i = 1, · · · , n, with Σ_{i=1}^{n} m_i = m.

If the vector x ∈ R^m is partitioned into n component vectors x_i ∈ R^{m_i}, then the minimization version of the block coordinate descent method for the solution of (5.1) is defined by

   x_i^{k+1} = arg min_{y_i∈X_i} F(x_1^{k+1}, · · · , x_{i−1}^{k+1}, y_i, x_{i+1}^{k}, · · · , x_n^{k}),   (5.2)

which updates the components of x cyclically, starting from a given initial point x^0 ∈ X, and generates a sequence {x^k} with x^k = (x_1^k, · · · , x_n^k).
The discrete dual formulation for TV restoration is

   min φ(w) = ‖∇ · w + z‖²
   subject to ‖w_{i,j}‖ ≤ 1 for i, j = 1, · · · , n,   (5.3)

where z = λf and w = (w_{i,j}) ∈ R^{2×n×n}, with each component w_{i,j} ∈ R², i.e.,

   w_{i,j} = (w_{i,j}^1, w_{i,j}^2)^T,   1 ≤ i, j ≤ n.   (5.4)

The discrete divergence operator ∇· is defined in (1.24), which we rewrite here:

   (∇ · w)_{i,j} =  { w_{i,j}^1 − w_{i−1,j}^1   if 1 < i < n        { w_{i,j}^2 − w_{i,j−1}^2   if 1 < j < n
                    { w_{i,j}^1                 if i = 1        +    { w_{i,j}^2                 if j = 1
                    { −w_{i−1,j}^1              if i = n             { −w_{i,j−1}^2              if j = n.
                                                                                                   (5.5)

Following the idea of the block coordinate descent method (5.2), we fix w at all pixels except some interior point (i, j) (i, j ≠ 1 or n), which leads to a local minimization problem in R²:

   min_{‖w_{i,j}‖≤1} φ(w_{i,j}) = [ w_{i,j}^1 + w_{i,j}^2 − (w_{i−1,j}^1 + w_{i,j−1}^2 − z_{i,j}) ]²
      + [ w_{i,j}^1 − (w_{i+1,j}^1 + w_{i+1,j}^2 − w_{i+1,j−1}^2 + z_{i+1,j}) ]²   (5.6)
      + [ w_{i,j}^2 − (w_{i,j+1}^2 + w_{i,j+1}^1 − w_{i−1,j+1}^1 + z_{i,j+1}) ]²
      + other terms not depending on w_{i,j}.

The local minimization problems at the boundary points can be modified accordingly, following the definition of the divergence operator (5.5); we describe them briefly below.

The dual ROF model (5.3) has separable, simple Euclidean-ball constraints, which makes each sub-minimization problem (5.6) fairly easy to solve at each 'coordinate'. This simple structure of the constraints, which makes the BCD algorithm practical here, is also what makes gradient projection practical and enables Chambolle to derive an analytical formula for the Lagrange multipliers.

When i = 1:

   min_{‖w_{i,j}‖≤1} φ(w_{i,j}) = [ w_{i,j}^1 + w_{i,j}^2 − (w_{i,j−1}^2 − z_{i,j}) ]²
      + [ w_{i,j}^1 − (w_{i+1,j}^1 + w_{i+1,j}^2 − w_{i+1,j−1}^2 + z_{i+1,j}) ]²   (5.7)
      + [ w_{i,j}^2 − (w_{i,j+1}^2 + w_{i,j+1}^1 + z_{i,j+1}) ]²
      + other terms not depending on w_{i,j}.

When j = 1:

   min_{‖w_{i,j}‖≤1} φ(w_{i,j}) = [ w_{i,j}^1 + w_{i,j}^2 − (w_{i−1,j}^1 − z_{i,j}) ]²
      + [ w_{i,j}^1 − (w_{i+1,j}^1 + w_{i+1,j}^2 + z_{i+1,j}) ]²   (5.8)
      + [ w_{i,j}^2 − (w_{i,j+1}^2 + w_{i,j+1}^1 − w_{i−1,j+1}^1 + z_{i,j+1}) ]²
      + other terms not depending on w_{i,j}.

Following (5.5), the boundary values {w_{n,j}^1 : j = 1, · · · , n} and {w_{i,n}^2 : i = 1, · · · , n} are not relevant in our problem, and we simply set them to 0. Hence the sub-minimization problems on the boundaries i = n and j = n degenerate to the following 1-D problems.

When i = n:

   min_{|w_{i,j}^2|≤1} φ(w_{i,j}) = [ w_{i,j}^2 − (w_{i,j−1}^2 + w_{i−1,j}^1 − z_{i,j}) ]²
      + [ w_{i,j}^2 − (w_{i,j+1}^2 − w_{i−1,j+1}^1 + z_{i,j+1}) ]²   (5.9)
      + other terms not depending on w_{i,j}.

When j = n:

   min_{|w_{i,j}^1|≤1} φ(w_{i,j}) = [ w_{i,j}^1 − (w_{i−1,j}^1 + w_{i,j−1}^2 − z_{i,j}) ]²
      + [ w_{i,j}^1 − (w_{i+1,j}^1 − w_{i+1,j−1}^2 + z_{i+1,j}) ]²   (5.10)
      + other terms not depending on w_{i,j}.

5.2 Implementation

The sub-optimization problems (5.6)–(5.10) generated by the BCD algorithm are ball-constrained quadratic minimization problems with at most two unknowns. We now give a generic way to solve each of these subproblems. We treat subproblems (5.6)–(5.8) in detail below, using (5.6) as an example; the subproblems (5.9) and (5.10) are scalar minimizations and can be solved very easily.

Problem (5.9):

   v = (1/2) ( w_{i,j−1}^2 + w_{i,j+1}^2 + w_{i−1,j}^1 − w_{i−1,j+1}^1 + z_{i,j+1} − z_{i,j} ),
   w_{i,j}^2 = min( max{−1, v}, 1 ).

Problem (5.10):

   v = (1/2) ( w_{i−1,j}^1 + w_{i+1,j}^1 + w_{i,j−1}^2 − w_{i+1,j−1}^2 + z_{i+1,j} − z_{i,j} ),
   w_{i,j}^1 = min( max{−1, v}, 1 ).

Problems (5.6)–(5.8): Problem (5.6) can be rewritten as

   min_{p²+q²≤1} φ(p, q) = (p − a)² + (q − b)² + (p + q − c)²   (5.11)

where w_{i,j} = (p, q) and

   a = w_{i+1,j}^1 + w_{i+1,j}^2 − w_{i+1,j−1}^2 + z_{i+1,j},
   b = w_{i,j+1}^2 + w_{i,j+1}^1 − w_{i−1,j+1}^1 + z_{i,j+1},
   c = w_{i−1,j}^1 + w_{i,j−1}^2 − z_{i,j}.

To solve problem (5.11), we first solve its unconstrained version; the first-order conditions give

   p = (2a − b + c)/3,   (5.12)
   q = (2b − a + c)/3.   (5.13)

The above (p, q) is the solution to (5.11) if p² + q² ≤ 1. Otherwise the constraint is active, and we solve the equality-constrained problem via the KKT conditions

   (λ + 2)p + q = a + c,
   p + (λ + 2)q = b + c,
   p² + q² = 1,
   λ > 0,

where λ is the Lagrange multiplier. The first two equations of the KKT system yield

   p + q = (a + b + 2c)/(λ + 3),
   p − q = (a − b)/(λ + 1).   (5.14)

Substituting the above into (p + q)² + (p − q)² = 2, we obtain the following equation for the Lagrange multiplier:

   f(λ) = A/(λ + 3)² + B/(λ + 1)² − 1 = 0,   (5.15)

where A = (1/2)(a + b + 2c)² and B = (1/2)(a − b)².

Equation (5.15) is equivalent to a fourth-order polynomial equation, which can be solved analytically. However, it is more effective to solve it numerically by Newton's method, which converges for (5.15) from the initial condition λ_0 = 0. Instead of giving a rigorous proof, we illustrate the argument with Figure 5.1. First, there exists a λ* > 0 with f(λ*) = 0. Since f(λ) is monotonically decreasing on (−1, ∞), it follows that f(0) > 0, and the plot of f(λ) looks like the one in Figure 5.1. It is then straightforward to see that the Newton iterates λ_k converge to λ* with monotonically decreasing |λ_k − λ*|. In practice it takes only a couple of steps to obtain an accurate approximation of λ*.


Figure 5.1: Convergence of Newton’s method

Substituting the obtained root λ* of (5.15) into the system (5.14) then gives the optimal w_{i,j}.
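For illustration, the following is a minimal sketch of the subproblem solver described above; the tolerance and the iteration cap are illustrative assumptions.

```python
def solve_ball_subproblem(a, b, c, tol=1e-12, max_newton=20):
    """Solve min_{p^2+q^2<=1} (p-a)^2 + (q-b)^2 + (p+q-c)^2, i.e. (5.11),
    via (5.12)-(5.13) and, if the constraint is active, Newton's method on (5.15)."""
    p = (2 * a - b + c) / 3.0
    q = (2 * b - a + c) / 3.0
    if p * p + q * q <= 1.0:
        return p, q
    A = 0.5 * (a + b + 2 * c) ** 2
    B = 0.5 * (a - b) ** 2
    lam = 0.0                                   # initial guess lambda_0 = 0
    for _ in range(max_newton):
        f = A / (lam + 3) ** 2 + B / (lam + 1) ** 2 - 1.0
        if abs(f) < tol:
            break
        df = -2 * A / (lam + 3) ** 3 - 2 * B / (lam + 1) ** 3
        lam -= f / df                           # Newton update
    s = (a + b + 2 * c) / (lam + 3)             # p + q, from (5.14)
    d = (a - b) / (lam + 1)                     # p - q, from (5.14)
    return 0.5 * (s + d), 0.5 * (s - d)
```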

The full BCD algorithm is given as follows.

Algorithm BCD

Step 0. Initialization. Pick an initial feasible w^(0); set k ← 0.

Step 1. Updating. For i, j = 1, · · · , n, update w_{i,j}^{(k+1)} as

   w_{i,j}^{(k+1)} = arg min_{‖w_{i,j}‖≤1} ‖ ∇ · ( w_{1,1}^{(k+1)}, · · · , w_{i,j}, · · · , w_{n,n}^{(k)} ) + z ‖²

by the method described earlier in Section 5.2.

Step 2. Terminate if a stopping criterion is satisfied;
otherwise set k ← k + 1 and return to Step 1.
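Putting the pieces together, the following is a minimal (unoptimized) sketch of one possible implementation of Algorithm BCD. It maintains the residual ∇·w + z incrementally and calls solve_ball_subproblem from the previous sketch; the sweep count and variable names are illustrative, and the divergence convention is (5.5), so w^1 on the last row and w^2 on the last column never enter the objective and stay zero.

```python
import numpy as np

def clip1(v):
    return min(max(v, -1.0), 1.0)

def bcd_isotropic(z, sweeps=50):
    """Block coordinate descent for the dual model (5.3): min ||div w + z||^2, ||w_{ij}|| <= 1."""
    m, n = z.shape
    w1 = np.zeros((m, n)); w2 = np.zeros((m, n))
    r = z.copy()                                  # residual r = div(w) + z, with w = 0 initially
    for _ in range(sweeps):
        for i in range(m):
            for j in range(n):
                p_act, q_act = i < m - 1, j < n - 1
                # remove the current (i, j) contribution from the residual
                if p_act:
                    r[i, j] -= w1[i, j]; r[i + 1, j] += w1[i, j]
                if q_act:
                    r[i, j] -= w2[i, j]; r[i, j + 1] += w2[i, j]
                # local problem (5.6)-(5.10), written in the generic form (5.11)
                if p_act and q_act:
                    p, q = solve_ball_subproblem(r[i + 1, j], r[i, j + 1], -r[i, j])
                elif p_act:
                    p, q = clip1(0.5 * (r[i + 1, j] - r[i, j])), 0.0
                elif q_act:
                    p, q = 0.0, clip1(0.5 * (r[i, j + 1] - r[i, j]))
                else:
                    p, q = 0.0, 0.0
                w1[i, j], w2[i, j] = p, q
                # put the new contribution back into the residual
                if p_act:
                    r[i, j] += p; r[i + 1, j] -= p
                if q_act:
                    r[i, j] += q; r[i, j + 1] -= q
    return w1, w2
```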

5.3 BCD for Anisotropic TV

The discrete dual formulation for the anisotropic TV denoising model is

min D[w] = k∇ · w + zk
1 2
subject to |wi,j | ≤ 1 and |wi,j | ≤ 1 for i, j = 1 · · · , n, (5.16)

where w and the ∇ · w are the same as defined in (5.4) and (5.5).

The constraints in problem (5.16) are separable at the scalar level, which makes the subproblems (5.2) of the BCD algorithm even easier to solve. In fact, the algorithm is simply the coordinate descent method. We briefly describe the coordinate descent algorithm for the anisotropic dual ROF model as follows.

Suppose (i, j) is not on the boundary. Fixing all other variables except w_{i,j}^1, we have the following sub-minimization problem:

   min_{|w_{i,j}^1|≤1} [ w_{i,j}^1 − (w_{i−1,j}^1 − w_{i,j}^2 + w_{i,j−1}^2 − z_{i,j}) ]²
      + [ w_{i,j}^1 − (w_{i+1,j}^1 + w_{i+1,j}^2 − w_{i+1,j−1}^2 + z_{i+1,j}) ]²   (5.17)
      + other terms not depending on w_{i,j}^1,

which can be easily solved as

   v = (1/2) ( w_{i−1,j}^1 + w_{i+1,j}^1 − w_{i,j}^2 + w_{i,j−1}^2 + w_{i+1,j}^2 − w_{i+1,j−1}^2 + z_{i+1,j} − z_{i,j} ),
   w_{i,j}^1 = min( max{−1, v}, 1 ).   (5.18)

Similarly, fixing all other variables except w_{i,j}^2 gives

   min_{|w_{i,j}^2|≤1} [ w_{i,j}^2 − (w_{i,j−1}^2 + w_{i−1,j}^1 − w_{i,j}^1 − z_{i,j}) ]²
      + [ w_{i,j}^2 − (w_{i,j+1}^2 + w_{i,j+1}^1 − w_{i−1,j+1}^1 + z_{i,j+1}) ]²   (5.19)
      + other terms not depending on w_{i,j}^2,

which can be solved as

   v = (1/2) ( w_{i,j−1}^2 + w_{i,j+1}^2 + w_{i,j+1}^1 − w_{i,j}^1 + w_{i−1,j}^1 − w_{i−1,j+1}^1 + z_{i,j+1} − z_{i,j} ),
   w_{i,j}^2 = min( max{−1, v}, 1 ).   (5.20)

The boundary points can be treated in the same way as in the isotropic case. We can also unify all the points into one framework by setting w_{0,j}^1, w_{n,j}^1, w_{i,0}^2 and w_{i,n}^2 to 0, iterating over i = 1, · · · , n − 1; j = 1, · · · , n for w_{i,j}^1, and iterating over i = 1, · · · , n; j = 1, · · · , n − 1 for w_{i,j}^2.

The full BCDaniso algorithm is given as follows. Here we drop the iteration superscript k and use a single w to describe the algorithm.

Algorithm BCDaniso

Step 0. Initialization. Pick an initial feasible w. Set the boundary points w_{0,j}^1, w_{n,j}^1, w_{i,0}^2 and w_{i,n}^2 to 0.

Step 1. Updating. For i = 1, · · · , n − 1 and j = 1, · · · , n, set

   v = (1/2) ( w_{i−1,j}^1 + w_{i+1,j}^1 − w_{i,j}^2 + w_{i,j−1}^2 + w_{i+1,j}^2 − w_{i+1,j−1}^2 + z_{i+1,j} − z_{i,j} ),
   w_{i,j}^1 = min( max{−1, v}, 1 ).   (5.21)

For i = 1, · · · , n and j = 1, · · · , n − 1, set

   v = (1/2) ( w_{i,j−1}^2 + w_{i,j+1}^2 + w_{i,j+1}^1 − w_{i,j}^1 + w_{i−1,j}^1 − w_{i−1,j+1}^1 + z_{i,j+1} − z_{i,j} ),
   w_{i,j}^2 = min( max{−1, v}, 1 ).   (5.22)

Step 2. Terminate if w satisfies the stopping criterion;
otherwise return to Step 1.
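A minimal sketch of Algorithm BCDaniso using the unified ghost-boundary formulation described above; the sweep count is illustrative, and the pixels are indexed 1..n so that the padded entries play the role of the zero boundary values.

```python
import numpy as np

def bcd_anisotropic(z, sweeps=50):
    """Coordinate descent for the anisotropic dual model (5.16) via updates (5.21)-(5.22)."""
    n = z.shape[0]                                   # square n x n image, z = lam * f
    zp = np.zeros((n + 1, n + 1)); zp[1:, 1:] = z    # 1-based padding for z
    w1 = np.zeros((n + 1, n + 1)); w2 = np.zeros((n + 1, n + 1))
    clip = lambda v: min(max(v, -1.0), 1.0)
    for _ in range(sweeps):
        for i in range(1, n):                        # w^1_{i,j}: i = 1..n-1, j = 1..n
            for j in range(1, n + 1):
                v = 0.5 * (w1[i-1, j] + w1[i+1, j] - w2[i, j] + w2[i, j-1]
                           + w2[i+1, j] - w2[i+1, j-1] + zp[i+1, j] - zp[i, j])
                w1[i, j] = clip(v)                   # (5.21)
        for i in range(1, n + 1):                    # w^2_{i,j}: i = 1..n, j = 1..n-1
            for j in range(1, n):
                v = 0.5 * (w2[i, j-1] + w2[i, j+1] + w1[i, j+1] - w1[i, j]
                           + w1[i-1, j] - w1[i-1, j+1] + zp[i, j+1] - zp[i, j])
                w2[i, j] = clip(v)                   # (5.22)
    return w1, w2
```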

5.4 Convergence Theory

The convergence of the block coordinate descent method follows directly from [44] or Section 2.7 of [7]. We quote the relevant results from [44] as follows.

Definition 2. Following the convention in (5.1), let i ∈ {1, 2, · · · , n}; we say that F is strictly quasiconvex with respect to x_i ∈ X_i on X if for every x ∈ X and y_i ∈ X_i with y_i ≠ x_i we have

   F(x_1, · · · , t x_i + (1 − t) y_i, · · · , x_n) < max{ F(x), F(x_1, · · · , y_i, · · · , x_n) }

for all t ∈ (0, 1).

Proposition 2. Suppose that the function F is strictly quasiconvex with respect to x_i on X for each i = 1, 2, · · · , n − 2 in the sense of Definition 2, and that the sequence {x^k} generated by the BCD method (5.2) has limit points. Then every limit point x̄ of {x^k} is a critical point of Problem (5.1).

Theorem 4. The BCD and BCDaniso algorithms described in the above sections are globally convergent.

Proof. The proof follows directly from Proposition 2. First of all, it is easy to check that the energy function of the dual ROF model is strictly quasiconvex with respect to each coordinate block, for both the isotropic and the anisotropic case; hence Proposition 2 applies. Secondly, X is compact in both cases, so every sequence of iterates {w^(k)} has a limit point w̄. It follows from Proposition 2 that the limit point w̄ is a stationary point and hence a global minimizer of the given convex problem. Finally, by the monotonicity of the algorithm, F(w^(k+1)) ≤ F(w^(k)), we have that F(w^(k)) ↓ F*, the global minimum energy.

5.5 Numerical Experiments

We report on computational experiments for three test problems in image denois-


ing. All the programs are run on an IBM T60 notebook PC with a 1.83 GHz Intel Core Duo CPU and 1 GB RAM. All methods are coded in MATLAB. The original
clean images and the input noisy images are shown in Figures 5.2 and 5.3, respec-
tively. The sizes of the discretizations for the three test problems are 128 × 128,
256 × 256, and 512 × 512, respectively. The noisy images are generated by adding
Gaussian noise to the clean images using the MATLAB function imnoise, with
variance parameter set to 0.01. The fidelity parameter λ is taken to be 0.045 for
the isotropic ROF model and 0.05 for the anisotropic ROF model throughout the

experiments. This parameter is inversely related to the noise level σ and usually
needs to be tuned for each individual image to get an optimal visual result. For
the BCD algorithm for the isotropic ROF model, the subproblems at each coordinate block are solved by the approach described in Section 5.2. The inner Newton's method used to compute λ* at each coordinate block is stopped when |f(λ)| < 10^−12.

We tested the following algorithms:

• Chambolle’s semi-implicit gradient descent method [14] for isotropic ROF


model;

• BCD algorithm for isotropic ROF model as described in section 5.2;

• Chambolle’s semi-implicit gradient descent method [14] for anisotropic ROF


model;

• BCD algorithm for anisotropic ROF model as described in section 5.3.

Figure 5.4 shows the denoised images obtained by different models: the
isotropic ROF model and the anisotropic ROF model.

Tables 5.1–5.6 report the number of iterations and average CPU times
over ten runs, where each run adds a different random noise vector to the true
image. In all codes, we used the starting point w (0) = 0 in each algorithm and
the relative duality gap stopping criterion. We vary the threshold tol from 10−2
to 10−4 , producing results of increasingly high accuracy as tol is decreased.

The tables show that on all problems, the proposed block coordinate descent algorithms are competitive with Chambolle's method and are faster in most situations. The tables also show that the advantage of the BCD algorithm over Chambolle's algorithm is more pronounced for anisotropic TV, due to the simpler constraints.

Figure 5.2: The original clean images for our test problems. Left: 128 × 128
“shape”; middle: 256 × 256 “cameraman”; right: 512 × 512 “Barbara”.

Figure 5.3: The input noisy images for our test problems. Gaussian noise is
added using MATLAB function imnoise with variance 0.01.

Figure 5.4: The denoised images obtained with the two TV terms. Left column: isotropic TV; right column: anisotropic TV.

Table 5.1: Iterations & CPU costs, isotropic TV, problem 1.
tol = 10^-2        tol = 10^-3        tol = 10^-4
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 35 0.23 323 2.19 1779 11.7
BCD 11 0.22 81 1.83 403 9.50

Table 5.2: Iterations & CPU costs, isotropic TV, problem 2.


tol = 10^-2        tol = 10^-3        tol = 10^-4
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 47 2.02 276 11.0 1285 50.2
BCD 14 1.59 66 7.47 278 32.3

Table 5.3: Iterations & CPU costs, isotropic TV, problem 3.


tol = 10^-2        tol = 10^-3        tol = 10^-4
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 44 11.1 202 49.6 814 200
BCD 12 6.38 45 24.3 166 92.3

Table 5.4: Iterations & CPU costs, anisotropic TV, problem 1.
tol = 10^{-2}          tol = 10^{-3}          tol = 10^{-4}
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 48 0.39 430 4.41 2042 20.6
BCD 7 0.06 49 0.39 247 1.97

Table 5.5: Iterations & CPU costs, anisotropic TV, problem 2.


tol = 10^{-2}          tol = 10^{-3}          tol = 10^{-4}
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 63 2.70 345 15.1 1279 54.0
BCD 8 0.38 41 1.75 150 6.59

Table 5.6: Iterations & CPU costs, anisotropic TV, problem 3.


tol = 10^{-2}          tol = 10^{-3}          tol = 10^{-4}
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
Chambolle 61 12.8 252 51.2 805 164
BCD 8 1.70 32 6.67 99 19.6

CHAPTER 6

Multilevel Optimizations

As we have seen in section 2.1.6, various multigrid (multilevel) methods have been
developed in an attempt to solve the total variation denoising problem efficiently and with
scalable computational complexity. Most existing multilevel methods are based
on the primal formulation (1.4) and/or its associated Euler-Lagrange equation.
The non-differentiability of the primal energy usually makes it difficult to develop
efficient multigrid algorithms; in most cases, even finding a good smoother for
the primal problem is already challenging.

In this chapter, we derive multilevel optimization algorithms based
on the dual formulation of the ROF model (1.12). We shall see that in this case
the main challenge in developing efficient multilevel methods is the nonlinear
constraints. In other words, we need to find a correct way to represent the
constraints on the coarser level such that the coarse-level correction does not violate
the finer-level constraints.

6.1 Multilevel Optimization for 1D Dual TV Model

As a starting point, we describe the multilevel optimization algorithm applied
to the dual formulation of the one-dimensional total variation based signal
denoising model, which is (with z = λf):

min φ(w) = ‖∇ · w + z‖²

subject to L_j ≤ w_j ≤ R_j   for j = 1, 2, · · · , n,      (6.1)

where L_j = −1, R_j = 1 and the one-dimensional divergence operator ∇· is

(∇ · w)_j = { w_1,              if j = 1,
            { w_j − w_{j−1},    if 2 ≤ j ≤ n,          (6.2)
            { −w_n,             if j = n + 1.
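To make the notation concrete, the following MATLAB sketch (ours, not from the text; the names div1d and phi1d are illustrative) evaluates the 1D divergence (6.2) and the dual objective of (6.1) for a given w and z = λf.

% Illustrative sketch: the 1D divergence (6.2) maps w in R^n to R^(n+1);
% phi1d is the dual objective of (6.1) with z = lambda*f (length n+1).
div1d = @(w) [w(:); 0] - [0; w(:)];        % [w_1; w_2-w_1; ...; w_n-w_{n-1}; -w_n]
phi1d = @(w, z) sum((div1d(w) + z(:)).^2);

n = 8; lambda = 0.045; f = rand(n+1, 1); z = lambda*f;
w = zeros(n, 1);                            % feasible start: -1 <= w_j <= 1
fprintf('phi(0) = %.4f\n', phi1d(w, z));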

We shall now apply the multilevel technique to solve problem (6.1).
Let k be the level index of our multilevel algorithm, with k = 1 being the finest level
and k = log_2 n + 1 the coarsest level. At level k, the uniform block size is
b = 2^{k−1} and the number of blocks is τ_k = n/b.

Suppose w is an intermediate solution (at the finest level) and its modification
at the next coarser level is given by δw = P c = [c_1, c_1, c_2, c_2, · · · , c_m, c_m],
where m = n/2, c = [c_1, c_2, · · · , c_m] collects the unknowns on the coarser level, and P is
the (piecewise constant) prolongation operator from the coarser level to the finer level.

The objective function with the coarser-level correction is then given as

φ(w + P c) = ‖∇ · (w + P c) + z‖²
           = (w_1 + c_1 + z_1)² + [(w_2 + c_1) − (w_1 + c_1) + z_2]²
             + [(w_3 + c_2) − (w_2 + c_1) + z_3]² + [(w_4 + c_2) − (w_3 + c_2) + z_4]²
             + · · ·
             + [(w_{n−1} + c_m) − (w_{n−2} + c_{m−1}) + z_{n−1}]² + [(w_n + c_m) − (w_{n−1} + c_m) + z_n]²
             + [−(w_n + c_m) + z_{n+1}]²
           = (c_1 + w_1 + z_1)²
             + (c_2 − c_1 + w_3 − w_2 + z_3)²
             + · · ·
             + (c_m − c_{m−1} + w_{n−1} − w_{n−2} + z_{n−1})²
             + (−c_m − w_n + z_{n+1})²
             + other terms independent of c
           = ‖∇ · c + z̃‖² + other terms independent of c,

where z̃ is given by

z̃_1 = w_1 + z_1,
z̃_{j+1} = w_{2j+1} − w_{2j} + z_{2j+1}   for j = 1, · · · , m − 1,
z̃_{m+1} = −w_n + z_{n+1},

which is equivalent to z̃ = R(∇ · w + z). Here R is the restriction operator from the
finer level to the coarser level, i.e.,

R(x_1, x_2, x_3, · · · , x_{2l+1}) = x(1 : 2 : 2l + 1) = (x_1, x_3, x_5, · · · , x_{2l+1}).

The finer-level constraints −1 ≤ (w + P c)_j ≤ 1 can be expressed as

L_{2j−1} ≤ c_j + w_{2j−1} ≤ R_{2j−1},
L_{2j}   ≤ c_j + w_{2j}   ≤ R_{2j},       for j = 1, · · · , m.

The above constraints are equivalent to the following coarser-level constraints

L̃_j ≤ c_j ≤ R̃_j   for j = 1, · · · , m,      (6.3)

where

L̃_j = max(L_{2j−1} − w_{2j−1}, L_{2j} − w_{2j}),
R̃_j = min(R_{2j−1} − w_{2j−1}, R_{2j} − w_{2j}).      (6.4)
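The coarsening step (forming z̃ and the bounds (6.4)) costs only O(n) work. The MATLAB fragment below is a sketch with our own names, assuming n is even, w, L, R have length n, z has length n + 1, and div1d is the 1D divergence from the earlier sketch.

% Sketch of the 1D coarsening step (illustrative, not from the text).
m  = n/2;
S  = div1d(w) + z(:);                                      % residual, length n+1
zt = S(1:2:n+1);                                           % z~ = R(div(w) + z)
Lt = max(L(1:2:n-1) - w(1:2:n-1), L(2:2:n) - w(2:2:n));    % L~ as in (6.4)
Rt = min(R(1:2:n-1) - w(1:2:n-1), R(2:2:n) - w(2:2:n));    % R~ as in (6.4)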

Putting the above pieces together, the problem of solving for the coarser-level
correction is the following minimization:

min φ̃(c) = ‖∇ · c + z̃‖²

subject to L̃_j ≤ c_j ≤ R̃_j   for j = 1, · · · , m.      (6.5)

Now, problem (6.5) has exactly the same form as the original problem (6.1),
except that it is posed on a coarser level. Hence we can apply the same smoother to get an
approximate solution of (6.5) and then pass to an even coarser grid for corrections.
We can adopt the same technique recursively until we reach the coarsest
level, and then add the corrections back level by level until we reach the finest level.
Each cycle of going from the finest level to the coarsest level and back to the finest
level is called a V-cycle in multigrid jargon. The complexity of each V-cycle is
O(n).

The full multilevel algorithm ML1D using the recursive V-cycle is shown
below. Here we use

w = Smoother(w^0, z, L, R, iter)

to denote an approximate intermediate solution of (6.1) obtained by applying
a specified smoother (relaxation) iter times with initial guess w^0. The actual level
of the problem is implied by the context.

We then use the recursive function

w = Vcycle(w^0, z, L, R)

to denote the outcome of applying one V-cycle to solve the original problem (6.1).

The recursive function can be defined as follows:

function w = Vcycle(w^0, z, L, R)
    n = length(w^0);
    if n == 1
        w_1 = (z_2 − z_1)/2;
        return;
    else
        w = Smoother(w^0, z, L, R, iter1);
        m = n/2;
        for j = 1 : m
            L̃_j = max(L_{2j−1} − w_{2j−1}, L_{2j} − w_{2j});
            R̃_j = min(R_{2j−1} − w_{2j−1}, R_{2j} − w_{2j});
        end
        z̃ = R(∇ · w + z);
        c = Vcycle(0, z̃, L̃, R̃);
        w = w + P c;
        w = Smoother(w, z, L, R, iter2);
        return;
    end

The full multilevel algorithm ML1D is shown below.

Algorithm ML1D

Step 0. Initialization. Set z = λf, L_j = −1, R_j = 1 for j = 1, · · · , n.
        Pick an initial feasible w^0. Set k = 0.

Step 1. Updating using V-cycle. w^{k+1} = Vcycle(w^k, z, L, R).

Step 2. Terminate if w^{k+1} satisfies the stopping criterion;
        otherwise set k ← k + 1 and return to Step 1.

6.2 Multilevel Optimization for 2D Dual TV Model

We now generalize the idea of multilevel optimization to 2D TV models.

The discrete dual formulation for TV restoration is

min φ(w) = ‖∇ · w + z‖²

subject to ‖w_{i,j}‖ ≤ 1   for i, j = 1, · · · , n,      (6.6)

where z = λf, w = (w_{i,j}) ∈ R^{2×n×n} with each component w_{i,j} ∈ R², i.e.,

w_{i,j} = (w^1_{i,j}; w^2_{i,j}),   1 ≤ i, j ≤ n.      (6.7)

The discrete divergence operator ∇· is defined in (1.24), which we rewrite here:

(∇ · w)_{i,j} = { w^1_{i,j} − w^1_{i−1,j}  if 1 < i < n        { w^2_{i,j} − w^2_{i,j−1}  if 1 < j < n
                { w^1_{i,j}                if i = 1        +   { w^2_{i,j}                if j = 1          (6.8)
                { −w^1_{i−1,j}             if i = n            { −w^2_{i,j−1}             if j = n.
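In MATLAB, this divergence can be assembled as the 1D operator (6.2) applied down the columns of w^1 plus the 1D operator applied along the rows of w^2, matching the per-column/per-row terms that appear in (6.10) below. The following sketch uses our own names and assumes the later convention that w1 is n × (n + 1) and w2 is (n + 1) × n; the exact boundary handling of (6.8) should be checked against it.

% Sketch (not from the text): 2D divergence as columnwise/rowwise 1D divergences.
div1d_cols = @(W) [W; zeros(1, size(W,2))] - [zeros(1, size(W,2)); W];
div2d      = @(w1, w2) div1d_cols(w1) + div1d_cols(w2')';   % (n+1) x (n+1) result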

The multilevel formulation for the 2D problem is more complicated and tedious
than for the 1D problem. First let us fix some conventions. We set the sizes
of the matrices w^1, w^2 and z to be n × (n + 1), (n + 1) × n and (n + 1) × (n + 1),
respectively, with n a power of 2. Let m = n/2.

We follow the same setup as in the 1D case and let w^1, w^2 be the intermediate
solution on the finest level. We then use c^1, c^2 to denote the corrections on the
next coarser level. We again use the piecewise constant interpolation P to map
the coarser-level correction to the finer level, i.e.,

C = [ c_{1,1}  c_{1,2}  · · ·              P C = [ c_{1,1}  c_{1,1}  c_{1,2}  c_{1,2}  · · ·
      c_{2,1}  c_{2,2}  · · ·      ⇒               c_{1,1}  c_{1,1}  c_{1,2}  c_{1,2}  · · ·
        ⋮        ⋮      ⋱  ]                       c_{2,1}  c_{2,1}  c_{2,2}  c_{2,2}  · · ·
                                                   c_{2,1}  c_{2,1}  c_{2,2}  c_{2,2}  · · ·
                                                     ⋮        ⋮        ⋮        ⋮      ⋱  ],

so that each coarse entry is replicated over a 2 × 2 fine block.
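In MATLAB, this piecewise constant prolongation is simply a Kronecker product; the one-liner below (a sketch, not the dissertation's code) replicates each coarse entry over a 2 × 2 fine block.

C  = magic(4);            % example m x m coarse-level correction
PC = kron(C, ones(2));    % fine-level correction: each entry copied into a 2x2 block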

The objective function with the coarser-level correction is

φ(w + P c) = ‖∇ · (w + P c) + z‖².      (6.9)

To rearrange the above objective function, we observe that the indices (2i, 2j), (2i−1, 2j),
(2i, 2j−1) and (2i−1, 2j−1) must be treated differently. If we work out
every detail, we obtain the following formulation for the energy function (6.9):

φ(w + P c) = ‖∇ · c + z̃‖²
             + Σ_{j=1}^{m} ‖∇ · c^1(:, j) + F(:, j)‖²
             + Σ_{i=1}^{m} ‖∇ · c^2(i, :) + G(i, :)‖²
             + other terms independent of c.      (6.10)

In (6.10) and later equations, we use the same notation ∇· for both the 2D
divergence operator defined in (6.8) and the 1D divergence operator defined in (6.2);
it is easy to tell from the context which one is meant. Here, the first occurrence is the
2D operator and the rest are 1D.

In (6.10), c^1 and G are m × (m + 1) matrices, c^2 and F are (m + 1) × m
matrices, and z̃ has dimension (m + 1) × (m + 1). F, G and z̃ are given as follows:

S = ∇ · w + z,
z̃ = S(1 : 2 : n + 1, 1 : 2 : n + 1),
F = S(1 : 2 : n + 1, 2 : 2 : n),      (6.11)
G = S(2 : 2 : n, 1 : 2 : n + 1).
Motivated by equation (6.10), we rewrite the original dual ROF model (6.6) to
facilitate the recursive multigrid V-cycle algorithm as follows.

min φ(w) = ‖∇ · w + z‖²
            + γ Σ_{j=1}^{n} ‖∇ · w^1(:, j) + X(:, j)‖²
            + γ Σ_{i=1}^{n} ‖∇ · w^2(i, :) + Y(i, :)‖²      (6.12)
subject to ‖w_{i,j}‖ ≤ R_{i,j}   for i, j = 1, · · · , n,

where initially, on the finest level, γ = 0, X and Y are (n + 1) × n and n × (n + 1) zero
matrices, and R is an (n + 1) × (n + 1) matrix of all ones.

Then the objective function with the coarser-level corrections can be written as

φ(w + P c) = ‖∇ · c + z̃‖²
             + Σ_{j=1}^{m} ‖∇ · c^1(:, j) + F(:, j)‖²
             + Σ_{i=1}^{m} ‖∇ · c^2(i, :) + G(i, :)‖²
             + 2γ Σ_{j=1}^{m} ‖∇ · c^1(:, j) + Q(:, j)‖²
             + 2γ Σ_{i=1}^{m} ‖∇ · c^2(i, :) + T(i, :)‖²
             + other terms independent of c,      (6.13)

or

φ(w + P c) = ‖∇ · c + z̃‖²
             + γ̃ Σ_{j=1}^{m} ‖∇ · c^1(:, j) + X̃(:, j)‖²
             + γ̃ Σ_{i=1}^{m} ‖∇ · c^2(i, :) + Ỹ(i, :)‖²
             + other terms independent of c,      (6.14)

where F, G and z̃ are given in (6.11), and Q, T, X̃, Ỹ and γ̃ are given as follows:

γ̃ = 1 + 2γ,
Q = ½ ( U(1 : 2 : n + 1, 1 : 2 : n − 1) + U(1 : 2 : n + 1, 2 : 2 : n) ),
T = ½ ( V(1 : 2 : n − 1, 1 : 2 : n + 1) + V(2 : 2 : n, 1 : 2 : n + 1) ),      (6.15)
X̃ = (F + 2γQ)/γ̃,
Ỹ = (G + 2γT)/γ̃.

The objective function (6.14) has exactly the same form as the original energy
in (6.12), except that it is posed on the coarser level.

The constraints on the coarser-level variable c are given as

‖c_{i,j} + w_{2i−1,2j−1}‖ ≤ R_{2i−1,2j−1},
‖c_{i,j} + w_{2i−1,2j}‖   ≤ R_{2i−1,2j},
‖c_{i,j} + w_{2i,2j−1}‖   ≤ R_{2i,2j−1},       for 1 ≤ i, j ≤ m.      (6.16)
‖c_{i,j} + w_{2i,2j}‖     ≤ R_{2i,2j},

We would now like to simplify the above constraints on the coarser level so that
we can reduce the overall complexity of the problem by a factor of 2 × 2. Unfortunately,
there is no simple way to achieve this as we did for the 1D
problem in (6.3). It is easy to see from (6.16) that the feasible set for each c_{i,j}
is the intersection of four unit disks, which in general cannot be expressed by a
single constraint.

There are several plausible ways around this problem. For example, instead
of trying to simplify the constraints (6.16) to an equivalent single constraint (i.e.,
one that is satisfied if and only if the old constraints (6.16) are satisfied), we can
relax this equivalence and use a sufficient but not necessary new constraint:

‖c_{i,j}‖ ≤ R̃_{i,j},      (6.17)

where R̃_{i,j} = min{ R_{k,l} − ‖w_{k,l}‖ : 2i − 1 ≤ k ≤ 2i, 2j − 1 ≤ l ≤ 2j }.

If we choose the new constraints on the coarser level as above, then the optimization
problem for the coarser-level correction has exactly the same form as
the original problem on the finest level. We can apply a few iterations of some smoother
(e.g., the BCD algorithm of Chapter 5) to obtain an intermediate solution on the
coarser level and pass it to the next, even coarser, level recursively. However, the
problem is that the constraints (6.17) may be too restrictive for the algorithm
to make any useful corrections on the coarser level. In the extreme case
R̃_{i,j} = 0, no correction at all is allowed on that block.

We notice that the anisotropic TV model does not suffer from this problem and
is therefore, in this sense, better suited to our multilevel optimization algorithm.
The dual formulation of the anisotropic ROF model has exactly the same
objective function as the isotropic ROF model (6.12), but its constraints
are simple bound constraints. The dual anisotropic ROF model is as
follows:

min φ(w) = ‖∇ · w + z‖²
            + γ Σ_{j=1}^{n} ‖∇ · w^1(:, j) + X(:, j)‖²
            + γ Σ_{i=1}^{n} ‖∇ · w^2(i, :) + Y(i, :)‖²      (6.18)
subject to  L^1_{i,j} ≤ w^1_{i,j} ≤ R^1_{i,j}   for 1 ≤ i ≤ n, 1 ≤ j ≤ n + 1,
            L^2_{i,j} ≤ w^2_{i,j} ≤ R^2_{i,j}   for 1 ≤ i ≤ n + 1, 1 ≤ j ≤ n,

where X, Y and γ are the same as in (6.12), L^1, L^2 are matrices with all entries
being −1, and R^1, R^2 are matrices with all entries being 1.

From the above analysis, it is not hard to see that the problem of finding
the coarser-level correction c is the following minimization problem:

min  ‖∇ · c + z̃‖²
     + γ̃ Σ_{j=1}^{m} ‖∇ · c^1(:, j) + X̃(:, j)‖²
     + γ̃ Σ_{i=1}^{m} ‖∇ · c^2(i, :) + Ỹ(i, :)‖²      (6.19)
subject to  L̃^1_{i,j} ≤ c^1_{i,j} ≤ R̃^1_{i,j}   for 1 ≤ i ≤ m, 1 ≤ j ≤ m + 1,
            L̃^2_{i,j} ≤ c^2_{i,j} ≤ R̃^2_{i,j}   for 1 ≤ i ≤ m + 1, 1 ≤ j ≤ m,

where X̃, Ỹ, γ̃ are the same as in (6.15) and L̃, R̃ are given as follows:

L̃^1_{i,j} = max{ L^1_{k,l} − w^1_{k,l} : 2i − 1 ≤ k ≤ 2i, 2j − 1 ≤ l ≤ 2j },
L̃^2_{i,j} = max{ L^2_{k,l} − w^2_{k,l} : 2i − 1 ≤ k ≤ 2i, 2j − 1 ≤ l ≤ 2j },
R̃^1_{i,j} = min{ R^1_{k,l} − w^1_{k,l} : 2i − 1 ≤ k ≤ 2i, 2j − 1 ≤ l ≤ 2j },
R̃^2_{i,j} = min{ R^2_{k,l} − w^2_{k,l} : 2i − 1 ≤ k ≤ 2i, 2j − 1 ≤ l ≤ 2j }.

Now the coarser-level correction problem (6.19) has exactly the same form as
the original problem (6.18), so we can apply the same strategy recursively
and obtain a multilevel V-cycle algorithm just as in the 1D problem; a vectorized
sketch of the bound-coarsening step is given below.
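The 2 × 2 blockwise max/min defining L̃ and R̃ vectorizes easily in MATLAB. The sketch below uses our own helper names and assumes the slice being coarsened has even dimensions; the boundary rows/columns of w^1 and w^2 need the obvious special handling.

% Sketch (illustrative): coarsen elementwise bounds by min/max over 2x2 blocks.
block_min2 = @(D) min(min(D(1:2:end,1:2:end), D(2:2:end,1:2:end)), ...
                      min(D(1:2:end,2:2:end), D(2:2:end,2:2:end)));
block_max2 = @(D) max(max(D(1:2:end,1:2:end), D(2:2:end,1:2:end)), ...
                      max(D(1:2:end,2:2:end), D(2:2:end,2:2:end)));

Rt1 = block_min2(R1 - w1);   % R~^1, up to boundary handling
Lt1 = block_max2(L1 - w1);   % L~^1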

6.3 Numerical Experiments and Comments

We test our multilevel (ML) optimization algorithm on the anisotropic ROF
model, using the block coordinate descent (BCD) method developed in Chapter 5
as the smoother, and compare the performance of the ML algorithm with that of the
uni-level BCD algorithm. The test problems and parameters are exactly the same
as those in Chapter 5. At each level of the V-cycle inside the ML algorithm, we
apply 3 relaxations before passing to the coarser level and 2 more relaxations
after adding the coarser-level corrections.

Tables 6.1, 6.2, and 6.3 report the number of iterations and the average CPU times
over ten runs, where each run adds a different random noise vector to the true
image. In all codes, we used the starting point x^{(0)} = 0 in each algorithm and
the relative duality gap stopping criterion. We vary the threshold tol from 10^{-2}
to 10^{-4}, producing results of increasingly high accuracy as tol is decreased.

These tables show that the improvement over the BCD algorithm gained by adopting
the multilevel optimization framework is not significant. This might be due to

Table 6.1: Iterations & CPU costs, anisotropic TV, problem 1.
tol = 10^{-2}          tol = 10^{-3}          tol = 10^{-4}
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
ML BCD 2 0.06 11 0.30 50 1.42
BCD 7 0.06 49 0.39 247 1.97

Table 6.2: Iterations & CPU costs, anisotropic TV, problem 2.


tol = 10^{-2}          tol = 10^{-3}          tol = 10^{-4}
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
ML BCD 2 0.31 8 1.22 29 4.13
BCD 8 0.38 41 1.75 150 6.59

Table 6.3: Iterations & CPU costs, anisotropic TV, problem 3.


tol = 10^{-2}          tol = 10^{-3}          tol = 10^{-4}
Algorithms Iter CPU (s) Iter CPU (s) Iter CPU (s)
ML BCD 2 1.47 6 4.35 19 13.4
BCD 8 1.70 32 6.67 99 19.6

the characteristics of this particular problem. In particular, because of the constraints,
the solution and the residuals are not smoothed effectively by any of the available
relaxation algorithms. For future work, more intelligent coarsening and interpolation
schemes must be adopted to fully exploit the advantages of multigrid algorithms.

CHAPTER 7

Connections and Improvements of Some


Existing Methods

7.1 Connections between CGM and SOCP

In Chapter 2, we gave a survey of some existing algorithms for solving total
variation based image restoration models, including the CGM method and the
SOCP approach. We also pointed out that SOCP and CGM bear many similarities:
both methods apply Newton's method to some primal-dual system to compute
the update at each iteration. In this section, we investigate the connection
between these two methods in more detail.

Preliminaries of SOCP
We remind the reader that we follow the MATLAB convention of using “ , ”
for adjoining vectors and matrices in a row, and “ ; ” for adjoining them in a
column. Thus, for any vectors x, y and z, the following are synonymous:

(x; y; z) = (x^T, y^T, z^T)^T,

i.e., the column vector obtained by stacking x, y and z.

We also use ⊕ to denote joining matrices diagonally; that is, for two matrices A and B,

A ⊕ B = [ A  0
          0  B ].

Now we use the notation K^n to denote the Lorentz cone (ice cream cone) in R^n, i.e.,

K^n ≡ { x = (x_0; x̄) ∈ R^n : ‖x̄‖ ≤ x_0 }.

A second-order cone programming (SOCP) problem has the following standard (primal-dual) form:

(Primal)  min c^T x              (Dual)  max b^T y
          s.t. Ax = b,                   s.t. A^T y + z = c,      (7.1)
               x ∈ K;                          z ∈ K,

where

x = (x_1; · · · ; x_r)   with x_i ∈ R^{n_i},
c = (c_1; · · · ; c_r)   with c_i ∈ R^{n_i},
z = (z_1; · · · ; z_r)   with z_i ∈ R^{n_i},
A = (A_1, · · · , A_r)   with A_i ∈ R^{m×n_i},
y, b ∈ R^m,

and K is the Cartesian product of several cones:

K = K^{n_1} × K^{n_2} × · · · × K^{n_r}.

Associated with each vector x = (x_0; x̄) ∈ R^n there is an arrow-shaped matrix
Arw(x) defined as

Arw(x) ≡ [ x_0    x̄^T
           x̄     x_0 I ].

Observe that x ∈ K (x ∈ int K) iff. Arw(x) is positive semidefinite (positive
definite).

We also use Arw(·) in the block sense; that is if x = (x1 ; · · · ; xr ) such that
xi ∈ Rni for i = 1, · · · , r, then

Arw(x) = Arw(x1 ) ⊕ · · · ⊕ Arw(xr ).

There is a particular algebra associated with second-order cones, the understanding
of which sheds light on all aspects of the SOCP problem, from duality
and complementarity properties, to conditions of non-degeneracy, and ultimately
to the design and analysis of interior-point algorithms. This algebra is well known
and is a special case of a so-called Euclidean Jordan algebra (see the literature on
Jordan algebras for a complete study). Here we only borrow some notation and
definitions to facilitate our explanations.

For now we assume that all vectors consist of a single block x = (x_0; x̄). For
two vectors x and y define the following multiplication:

x ◦ y ≡ (x^T y; x_0 ȳ + y_0 x̄).

Let e = (1; 0; · · · ; 0). We define the inverse of x, x^{-1}, by

y ≡ x^{-1}   iff   x ◦ y = y ◦ x = e,

and define the determinant of x, det(x), by

det(x) = x_0² − ‖x̄‖².

It is clear that

x^{-1} = (1/det(x)) (x_0; −x̄).
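To fix ideas, the following MATLAB sketch (ours, not from the text) implements these operations for a single block x = (x_0; x̄) and checks numerically that x ◦ x^{-1} = e whenever det(x) ≠ 0.

% Jordan product, determinant and inverse for one second-order-cone block.
jprod = @(x, y) [x'*y; x(1)*y(2:end) + y(1)*x(2:end)];
jdet  = @(x) x(1)^2 - norm(x(2:end))^2;
jinv  = @(x) [x(1); -x(2:end)] / jdet(x);

x = [3; 1; -2];                        % x0 = 3 > ||xbar||, so x lies in int K^3
e = [1; 0; 0];
disp(norm(jprod(x, jinv(x)) - e))      % should be (numerically) zero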

SOCP problems are usually solved by primal-dual path-following interior-point
methods. Note that for x ∈ int K the function −ln(det(x)) is a convex
barrier function for K. If we replace the second-order cone constraints by x_i ∈
int K^{n_i} and add the logarithmic barrier term −µ Σ_i ln(det(x_i)) to the objective
function of the primal problem, we get

(P_µ)   min  Σ_{i=1}^{r} c_i^T x_i − µ Σ_{i=1}^{r} ln(det(x_i))
        s.t. Σ_{i=1}^{r} A_i x_i = b,
             x_i ∈ int K^{n_i}   for i = 1, · · · , r.      (7.2)

The Karush-Kuhn-Tucker (KKT) optimality conditions for (7.2) are:

Σ_{i=1}^{r} A_i x_i = b,
c_i − A_i^T y − 2µ x_i^{-1} = 0,   for i = 1, · · · , r,      (7.3)
x_i ∈ int K^{n_i}   for i = 1, · · · , r.

If we set z_i = c_i − A_i^T y, then any solution of problem (P_µ) satisfies

Σ_{i=1}^{r} A_i x_i = b,
A_i^T y + z_i = c_i,   for i = 1, · · · , r,
x_i ◦ z_i = 2µe,       for i = 1, · · · , r,      (7.4)
x_i, z_i ∈ int K^{n_i}   for i = 1, · · · , r.

For every µ > 0 one can show that the above system (7.4) has a unique solution
(x_µ, y_µ, z_µ). The trajectory of points (x_µ, y_µ, z_µ) satisfying (7.4) is called the
primal-dual central path, or simply the central path, associated with the SOCP
problem (7.1).

Generally speaking, the strategy of a path-following primal-dual interior-point
method can be sketched as follows. We start with a point near or on the central
path. First, we apply Newton's method to the system (7.4) to get a direction
(∆x, ∆y, ∆z) that reduces the duality gap, and then take a step in this direction,
making sure that the new point is still feasible and in the interior of K. Then we
reduce µ by some factor and repeat the process. With judicious choices of the initial
point, step length and reduction schedule for µ, convergence can be shown
in a polynomial number of iterations.

Applying Newton's method to (7.4) results in a linear system for the update
direction (∆x, ∆y, ∆z):

[ A        0     0      ] [ ∆x ]   [ b − Ax        ]
[ 0        A^T   I      ] [ ∆y ] = [ c − A^T y − z ]      (7.5)
[ Arw(z)   0     Arw(x) ] [ ∆z ]   [ 2µe − x ◦ z   ]

SOCP for ROF and Connections with CGM


To reformulate the constrained ROF model

min  Σ_{1≤i,j≤n} ‖(∇u)_{i,j}‖
s.t. ‖u − f‖² ≤ σ²      (7.6)

as an SOCP problem, the authors in [43] introduced new variables p, q, t, v as
follows. They let v be the noise variable, v = f − u, and let (p_{i,j}, q_{i,j}) be the
discrete ∇u obtained by forward differencing:

 ui+1,j − ui,j if i < n
pi,j =
 0 if i = n

 ui,j+1 − ui,j if j < n
qi,j =
 0 if j = n
q
They then impose the second-order cone constraints p2i,j + qi,j
2
≤ ti,j .
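These forward differences, with the zero last row/column, are one line each in MATLAB; the sketch below uses our own variable names and assumes u is an n × n image.

p = [diff(u, 1, 1); zeros(1, size(u,2))];   % p(i,j) = u(i+1,j) - u(i,j),  p(n,:) = 0
q = [diff(u, 1, 2), zeros(size(u,1), 1)];   % q(i,j) = u(i,j+1) - u(i,j),  q(:,n) = 0
t = sqrt(p.^2 + q.^2);                      % smallest t satisfying the cone constraint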

As a consequence, problem (7.6) is transformed into

min  Σ_{i,j=1}^{n} t_{i,j}
s.t. u_{i,j} + v_{i,j} = f_{i,j}                 for i, j = 1, · · · , n,
     −p_{i,j} + u_{i+1,j} − u_{i,j} = 0          for i = 1, · · · , n − 1, j = 1, · · · , n,
     −q_{i,j} + u_{i,j+1} − u_{i,j} = 0          for i = 1, · · · , n, j = 1, · · · , n − 1,
     p_{n,j} = 0                                 for j = 1, · · · , n,
     q_{i,n} = 0                                 for i = 1, · · · , n,
     v_0 = σ,
     (t_{i,j}; p_{i,j}; q_{i,j}) ∈ K³            for i, j = 1, · · · , n,
     (v_0; v) ∈ K^{n²+1}.      (7.7)

Eliminating u from formulation (7.7), we obtain the following standard-form SOCP:

min  Σ_{1≤i,j≤n} t_{i,j}
s.t. p_{i,j} + v_{i+1,j} − v_{i,j} = f_{i+1,j} − f_{i,j}   for 1 ≤ i ≤ n − 1, 1 ≤ j ≤ n,
     q_{i,j} + v_{i,j+1} − v_{i,j} = f_{i,j+1} − f_{i,j}   for 1 ≤ i ≤ n, 1 ≤ j ≤ n − 1,
     p_{n,j} = 0                                           for 1 ≤ j ≤ n,      (7.8)
     q_{i,n} = 0                                           for 1 ≤ i ≤ n,
     v_0 = σ,
     (t_{i,j}; p_{i,j}; q_{i,j}) ∈ K³                      for 1 ≤ i, j ≤ n,
     (v_0; v) ∈ K^{n²+1}.

Problem (7.8) is in the standard form of an SOCP problem (7.1). To see this,
we introduce the following notation and variables:

n × n : image size;
N + 1 : number of cones, N + 1 = n² + 1 (so N = n²);
M : number of equality constraints (number of rows in Ā), M = 2N + 1;
k : index of blocks, k ≡ k(i, j) = (j − 1)n + i;

x_k = (t_{i,j}; p_{i,j}; q_{i,j})   for k = 1, 2, · · · , N;
c_k = (1; 0; 0)                     for k = 1, · · · , N;
x_{N+1} = (v_0; v) ∈ R^{N+1};
c_{N+1} = (0; · · · ; 0) ∈ R^{N+1};
Ā_k ∈ R^{M×3} with Ā_k(2k − 1 : 2k, 2 : 3) = I_2 and all other entries zero, for k = 1, · · · , N;
Ā_{N+1} = [ 0  A^T
            1  0   ],   where A is defined in Chapter 1 in (1.25) and (1.27);
Ā = (Ā_1, Ā_2, · · · , Ā_{N+1});
x = (x_1; x_2; · · · ; x_{N+1});
c = (c_1; · · · ; c_{N+1});
b = (A^T f; σ);
K = K³ × · · · × K³ × K^{N+1}.

Using the above notation and variables, we can rewrite problem (7.8) into the
following standard form of SOCP:

min cT x

subject to Āx = b,

x ∈ K.

We now follow the general procedure of primal-dual path-following interior-point
methods to solve (7.8). We shall show that the KKT system of (7.8) that we solve
at each iteration of the interior-point method is equivalent to the CGM system
developed in [25].

First, the constraint ‖v‖ ≤ σ is always active at optimality in practice,
and therefore we can replace it by the equality constraint ‖v‖ = σ in (7.8). Replacing
the second-order cone constraints p²_{i,j} + q²_{i,j} ≤ t²_{i,j} by p²_{i,j} + q²_{i,j} < t²_{i,j} and adding
the logarithmic barrier term −µ Σ ln(t²_{i,j} − (p²_{i,j} + q²_{i,j})) to the objective function
of problem (7.8), we get a problem (P_µ) as in (7.2):
min  Σ_{1≤i,j≤n} t_{i,j} − µ Σ_{1≤i,j≤n} ln(t²_{i,j} − p²_{i,j} − q²_{i,j})      (7.9a)
s.t. p_{i,j} + v_{i+1,j} − v_{i,j} = f_{i+1,j} − f_{i,j}   for 1 ≤ i ≤ n − 1, 1 ≤ j ≤ n,      (7.9b)
     q_{i,j} + v_{i,j+1} − v_{i,j} = f_{i,j+1} − f_{i,j}   for 1 ≤ i ≤ n, 1 ≤ j ≤ n − 1,      (7.9c)
     Σ v²_{i,j} − σ² = 0,      (7.9d)
     (t_{i,j}; p_{i,j}; q_{i,j}) ∈ int K³   for 1 ≤ i, j ≤ n.      (7.9e)

The KKT system for problem (7.9) is

1 − 2µ t_{i,j} / (t²_{i,j} − (p²_{i,j} + q²_{i,j})) = 0,      (7.10a)
−α_{i,j} + 2µ p_{i,j} / (t²_{i,j} − (p²_{i,j} + q²_{i,j})) = 0,      (7.10b)
−γ_{i,j} + 2µ q_{i,j} / (t²_{i,j} − (p²_{i,j} + q²_{i,j})) = 0,      (7.10c)
(α_{i,j} − α_{i−1,j}) + (γ_{i,j} − γ_{i,j−1}) + λ v_{i,j} = 0,      (7.10d)
p_{i,j} + v_{i+1,j} − v_{i,j} − f_{i+1,j} + f_{i,j} = 0,      (7.10e)
q_{i,j} + v_{i,j+1} − v_{i,j} − f_{i,j+1} + f_{i,j} = 0,      (7.10f)
Σ v²_{i,j} − σ² = 0,      (7.10g)

where α, γ and λ are the Lagrange multipliers for the constraints (7.9b), (7.9c) and
(7.9d), respectively. Note that in both (7.9) and (7.10) we omit, purely for simplicity,
the special cases for points on the boundary of the image domain (i = n or j = n).

Equation (7.10a) is the same as

t²_{i,j} − 2µ t_{i,j} − (p²_{i,j} + q²_{i,j}) = 0,

which, since t_{i,j} ≥ 0, is equivalent to

t_{i,j} = √(p²_{i,j} + q²_{i,j} + µ²) + µ.

Hence the first three equations in the above KKT system, equations (7.10a)–(7.10c), are equivalent to

t_{i,j} = √(p²_{i,j} + q²_{i,j} + µ²) + µ,
t_{i,j} α_{i,j} − p_{i,j} = 0,
t_{i,j} γ_{i,j} − q_{i,j} = 0.

If we eliminate v via v = f − u, replace (p_{i,j}, q_{i,j}) by (∇u)_{i,j} and introduce w with
w_{i,j} = (α_{i,j}, γ_{i,j}), equations (7.10a)–(7.10c) are equivalent to

(√(|(∇u)_{i,j}|² + µ²) + µ) w_{i,j} − (∇u)_{i,j} = 0,

and equation (7.10d) simply becomes

(∇ · w)_{i,j} − λ(u_{i,j} − f_{i,j}) = 0.

Hence the system (7.10) can be rewritten as the following equivalent system:

(√(|(∇u)_{i,j}|² + µ²) + µ) w_{i,j} − (∇u)_{i,j} = 0,
(∇ · w)_{i,j} − λ(u_{i,j} − f_{i,j}) = 0,      (7.11)
Σ |u_{i,j} − f_{i,j}|² − σ² = 0.

From [25], the CGM system for the constrained ROF model is

√(|(∇u)_{i,j}|² + β) w_{i,j} − (∇u)_{i,j} = 0,
(∇ · w)_{i,j} − λ(u_{i,j} − f_{i,j}) = 0,      (7.12)
Σ |u_{i,j} − f_{i,j}|² − σ² = 0.

Note that system (7.11) is exactly the same as (7.12) except that √(|∇u|² + µ²) + µ is
replaced by √(|∇u|² + β). In other words, the SOCP approach and the CGM method
actually solve similar systems at each iteration. The difference is that the barrier
parameter µ in the SOCP KKT system decreases to 0 following some reduction
schedule, while the smoothing parameter β in CGM is a fixed small number.

7.2 An Improvement of the CGM Method

The primal-dual optimality conditions for the unconstrained ROF model are

‖A_i^T y‖ x_i − A_i^T y = 0,   i = 1, . . . , N,   (feasibility condition)      (7.13a)
Ax + λy − λy_0 = 0.   (complementarity condition)      (7.13b)

In the CGM method [25], the feasibility condition is modified with a parameter β > 0:

√(‖A_i^T y‖² + β²) x_i − A_i^T y = 0,   i = 1, . . . , N,

so that the system becomes smooth and the point x is forced to stay inside
the interior of the feasible set

X = { x = (x_1; · · · ; x_N) : ‖x_i‖ ≤ 1 for i = 1, · · · , N }.

The CGM method [25] then solves the following primal-dual system using Newton's method:

√(‖A_i^T y‖² + β²) x_i − A_i^T y = 0,   i = 1, . . . , N,      (7.14a)
Ax + λy − λy_0 = 0.      (7.14b)

In [25] the parameter β is fixed, which prevents the solution of the above system
from converging to the true optimizer, i.e., a solution of system (7.13). Moreover,
if we pick a very small β > 0 to reduce this effect, the convergence of the
method can become slow.

To overcome this drawback of CGM, we propose a method that dynamically
reduces β during the solution process. In such a scheme, β can decrease to a very
small number, making the computed solution arbitrarily close to the true optimal solution.
Furthermore, warm-starting β from a relatively large value improves the
convergence compared with picking a very small β from the beginning.

Recall that the parameter β in CGM plays a role similar to the barrier parameter µ in
the KKT system of the SOCP formulation. However, the primal-dual interior-point
method used to solve the SOCP has a reduction schedule that drives µ
to 0. Hence, in reducing β to 0 in CGM, we are simply borrowing the same idea
from the primal-dual interior-point method.

Analogously to SOCP, we can define the central path as the trajectory of points

C = {(x_β, y_β) | β > 0},

where (x_β, y_β) solves the equations (7.14) for each β > 0. As β → 0, (x_β, y_β)
converges to the optimal solution. Unlike in [25], where β is a constant,
the β in our algorithm is tied to a measure of the duality gap and is updated accordingly
at each iteration.

Let us define the following notations:
ρ_i^β = √(‖A_i^T y‖² + β²);      (7.15)
E_β = Diag(ρ_i^β I_2),   F_β = Diag( I_2 − (1/ρ_i^β) x_i y^T A_i ).      (7.16)

Then the feasibility and complementarity conditions (7.14) can be written as

[ Ax + λy − λy_0 ]
[ E_β x − A^T y  ] = 0.      (7.17)

Applying Newton's method to the above equations gives the following linear
system for the updates (∆x, ∆y):

[ A     λI_N      ] [ ∆x ]   [ r_d ]
[ E_β   −F_β A^T  ] [ ∆y ] = [ r_c ],      (7.18)

where

r_d = λ(y_0 − y) − Ax   and   r_c = A^T y − E_β x.      (7.19)

The system can be solved as follows:

G_β ∆y = −A E_β^{-1} A^T y − λ(y − y_0);      (7.20)
∆x = E_β^{-1} A^T y + E_β^{-1} F_β A^T ∆y − x,      (7.21)

where G_β = A E_β^{-1} F_β A^T + λ I_N.

We choose different steplength rules to update the primal variable y and the dual variable x.
For the dual, to ensure strict feasibility (i.e., ‖x_i‖ < 1 for i = 1, . . . , N), we define
the updating rule as

x̃ = x + min(1, 0.99 α_max) ∆x,      (7.22)

where α_max = max{α : ‖x_i + α ∆x_i‖ ≤ 1, i = 1, . . . , N}.
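The quantity α_max is computed blockwise: for each i one finds the largest α ≥ 0 with ‖x_i + α∆x_i‖ ≤ 1 (the positive root of a scalar quadratic) and takes the minimum over i. A MATLAB sketch (ours; x and dx are assumed stored as 2 × N arrays, one dual block per column) is:

% Sketch: largest steplength keeping every dual block inside the unit disk.
a = sum(dx.^2, 1);
b = 2*sum(x.*dx, 1);
c = sum(x.^2, 1) - 1;                         % <= 0 for a strictly feasible x
alpha = inf(1, size(x, 2));
i1 = a > 0;
alpha(i1) = (-b(i1) + sqrt(b(i1).^2 - 4*a(i1).*c(i1))) ./ (2*a(i1));
i2 = (a == 0) & (b > 0);                      % degenerate blocks: constraint is linear
alpha(i2) = -c(i2) ./ b(i2);
alpha_max = min(alpha);
x_new = x + min(1, 0.99*alpha_max) * dx;      % the update rule (7.22)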

Predictor-Corrector Enhancement. We now discuss how to adapt Mehrotra's
predictor-corrector method to our problem. One motivation of this method
is to estimate the centering parameter β from the performance of the predictor
step (the affine-scaling step in LP). The other important idea is to take into account
second-order terms in the corrector step, so as to compensate for some of
the nonlinearity in the system. A key point is that the predictor and the corrector
share the same matrix factorization, so the extra work in the corrector step is
small. We refer to [70] for an excellent discussion of Mehrotra's method.

Let ỹ and x̃ be the primal and dual variables updated in the predictor step.
As a natural generalization of Mehrotra’s method in LP, a heuristic value of the
centering parameter β to be used in the corrector step can be defined by

gap(x, y)  gap(x̃, ỹ) 3


β̃ = σ , σ= , (7.23)
N gap(x, y)

where
N 
X 
gap(x, y) = kATi yk − xTi (ATi y) (7.24)
i=1
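Both quantities are cheap to evaluate; a MATLAB sketch (ours; Aty denotes the 2 × N array whose i-th column holds A_i^T y, and x_pred, Aty_pred are the corresponding predictor-step quantities) is:

% Sketch: duality-gap measure (7.24) and Mehrotra-style centering value (7.23).
gapfun = @(x, Aty) sum(sqrt(sum(Aty.^2, 1)) - sum(x.*Aty, 1));

g      = gapfun(x, Aty);             % gap at the current iterate
g_aff  = gapfun(x_pred, Aty_pred);   % gap after the predictor step
sigma  = (g_aff / g)^3;
beta_t = sigma * g / N;              % beta~ used in the corrector step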

To compute the second-order correction, we replace x and y in the centering
condition (7.14a) (with β replaced by β̃) by x + ∆x and y + ∆y, respectively, and obtain,
for i = 1, . . . , N,

√(‖A_i^T (y + ∆y)‖² + β̃²) (x_i + ∆x_i) − A_i^T (y + ∆y) = 0,

i.e.,

ρ_i^{β̃} √( 1 + 2 y^T A_i A_i^T ∆y / (ρ_i^{β̃})² + ‖A_i^T ∆y‖² / (ρ_i^{β̃})² ) (x_i + ∆x_i) − (A_i^T y + A_i^T ∆y) = 0.

Applying a Taylor expansion and neglecting higher-order terms gives

ρ_i^{β̃} ( 1 + y^T A_i A_i^T ∆y / (ρ_i^{β̃})² + ‖A_i^T ∆y‖² / (2(ρ_i^{β̃})²) − (y^T A_i A_i^T ∆y)² / (2(ρ_i^{β̃})⁴) ) (x_i + ∆x_i) − (A_i^T y + A_i^T ∆y) = 0.

Rearranging terms, we obtain

(E_β̃ ∆x − F_β̃ A^T ∆y)_i = (r̃_c)_i + h_i^{(1)},      (7.25)

where r̃_c = A^T y − E_β̃ x and

h_i^{(1)} = − (y^T A_i A_i^T ∆y / ρ_i^{β̃}) ∆x_i − (‖A_i^T ∆y‖² / (2ρ_i^{β̃})) x_i + ((y^T A_i A_i^T ∆y)² / (2(ρ_i^{β̃})³)) x_i.      (7.26)

It is not practical to solve (7.25) directly with β̃ on the left-hand side, since the
factorization of G_β has already been computed in the predictor step using the
previous value β. We therefore modify (7.25) to

(E_β ∆x − F_β A^T ∆y)_i = (r̃_c)_i + h_i^{(1)} + h_i^{(2)},      (7.27)

with the extra corrector term

h_i^{(2)} = (ρ_i^β − ρ_i^{β̃}) ∆x_i + ( 1/ρ_i^β − 1/ρ_i^{β̃} ) (y^T A_i A_i^T ∆y) x_i.      (7.28)
There is no correction for the feasibility updating equation

A∆x + λ∆y = r_d.      (7.29)

Combining (7.27) and (7.29), we have the corrector system

[ A     λI_N      ] [ ∆x ]   [ r_d   ]
[ E_β   −F_β A^T  ] [ ∆y ] = [ r_c^c ],      (7.30)

where

(r_c^c)_i = (r̃_c)_i + h_i^{(1)} + h_i^{(2)}.      (7.31)

This gives the corrector step as

G_β ∆y = −A(E_β^{-1} r_c^c + x) − λ(y − y_0);      (7.32)
∆x = E_β^{-1}(F_β A^T ∆y + r_c^c).      (7.33)

7.2.1 Numerical Experiments

We test our primal-dual interior-point algorithm on two image denoising problems
and compare its performance with that of the CGM method. The original clean
images and the input noisy images are shown in Figure 7.1. The sizes of the two
test problems are 128×128 and 512×512, respectively. The noisy images are
generated by adding Gaussian noise to the clean images using the MATLAB
function imnoise, with the variance parameter set to 0.01. The fidelity parameter
λ is taken to be 0.045 throughout the experiments.

Figure 7.2 plots the relative duality gap against the number of iterations for
the CGM method as well as for the primal-dual interior-point method.

Figure 7.1: The original clean images and noisy images for our test problems.
Left: 128 × 128 “shape”; right: 256 × 256 “cameraman”.

[Two semilogarithmic plots of the relative duality gap versus the iteration count, comparing CGM (β = 10^{-6}) with the primal-dual interior point method.]

Figure 7.2: Plot of relative duality gap vs. iterations. Top: test problem 1; bottom: test problem 2.

CHAPTER 8

Conclusions and Discussions

We have proposed several new algorithms for solving total variation based image
restoration models, including the primal-dual hybrid gradient descent method, the
duality-based gradient projection method and the dual block coordinate descent method.
All of them are very efficient and competitive with existing popular methods.
The proposed methods are based either on the dual formulation or on the primal-dual
formulation, so they do not need any smoothing parameter for the total
variation term, which would prevent an algorithm from converging to the true
optimal solution. They are explicit first-order methods: they are simple to
implement, take only a few sweeps over the image at each iteration, do not require a
large amount of memory, and hence are very suitable for solving large-scale image
restoration problems.

Of all the methods we proposed, the primal-dual hybrid gradient method
is the most efficient in all situations. It requires roughly 20 times less CPU time
to compute a visually satisfactory medium-accuracy solution than
Chambolle's semi-implicit gradient-descent method, and this advantage becomes more
pronounced when higher-accuracy solutions are required. In fact, this method
remains competitive for computing high-accuracy solutions even when compared with
higher-order methods such as CGM.

We proposed a basic multilevel optimization framework for the dual formulation
of the anisotropic ROF model. The improvement of the multilevel scheme
over the uni-level scheme is limited, which implies that more sophisticated
coarsening and interpolation schemes are needed to exploit the advantages of
multigrid algorithms.

We also studied the connection between the CGM method and the SOCP
approach for solving the ROF model, and proposed an improvement of the CGM
algorithm based on primal-dual interior-point methods.
References

[1] R. Acar & C. R. Vogel. Analysis of total variation penalty methods for ill-
posed problems, Inverse Problems , 10 (1994) 1217 - 1229.
[2] S. T. Acton, Multigrid anisotropic diffusion, IEEE Trans. Imag. Proc.,
3(1998), 280-291.
[3] F. Alizadeh & D. Goldfarb. Second-order cone programming, Mathematical
Programming, 95 (2003), 3-51.
[4] K. Andersen, E. Christiansen, A. Conn & M. Overton. An efficient primal-dual
interior-point method for minimizing a sum of Euclidean norms, SIAM J. Sci.
Comput., 22 (2000), pp. 243-262.
[5] G. Aubert & P. Kornprobst. Mathematical Problems in Image Processing -
Partial Differential Equations and the Calculus of Variations, 2nd edition.
Springer, Applied Mathematical Sciences, Vol 147, 2006.
[6] J. Barzilai and J. Borwein. Two point step size gradient methods, IMA
Journal of Numerical Analysis 8 (1988), pp. 141–148.
[7] D. P. Bertsekas. Nonlinear Programming, 2nd ed., Athena Scientific, Boston,
1999.
[8] Bertsekas, D. P. & Tsitsiklis, J. N., Parallel and Distributed Computation:
Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ, 1989.
[9] E. G. Birgin, J. M. Martinez & M. Raydan. Nonmonotone spectral projected
gradient methods on convex sets, SIAM Journal on Optimization 10 (2000),
pp. 1196–1211.
[10] P. Blomgren & T. Chan. Color TV: Total Variation Methods for Restoration
of Vector-Valued Images, IEEE Trans. Image Proc., 7(1998), 304-309.
[11] P. Blomgren, T. F. Chan, P. Mulet, L. Vese & W. L. Wan. Variational PDE
models and methods for image processing, in: Research Notes in Mathemat-
ics, 420 (2000), 43-67, Chapman & Hall/CRC.
[12] M. Burger, S. Osher, J. Xu, & G. Gilboa, Nonlinear inverse scale space
methods for image restoration. Lecture Notes in Computer Science, Vol.
3752, pp. 25-36, 2005.
[13] J. L. Carter. Dual method for total variation-based image restoration, UCLA
CAM Report 02-13, 2002.

[14] A. Chambolle. An algorithm for total variation minimization and applica-
tions, J. Math. Imag. Vis. 20 (2004), pp. 89–97.

[15] A. Chambolle. Total variation minimization and a class of binary MRF mod-
els. In: Energy Minimization Methods in Computer Vision and Pattern
Recognition, pp. 136-152, Springer Berlin, 2005.

[16] A. Chambolle, Total Variation minimization and a class of binary MRF


models, In Springer-Verlag, (ed.), 5th International Workshop on En-
ergy Minimization Methods in Computer Vision and Pattern Recogni-
tion (EMMCVPR), Vol. LNCS 3757, pp. 136–152, 2005.

[17] A. Chambolle & P. L. Lions, Image recovery via total variation minimization
and related problems, Numer. Math. 76 (1997), pp. 167–188.

[18] R. H. Chan, T. F. Chan & W. L. Wan, Multigrid for differential convolution


problems arising from image processing, UCLA CAM Report 97-20, 1997.

[19] T. F. Chan, K. Chen, & X. C. Tai. Nonlinear multilevel scheme for solving
the total variation image minimization problem, in: Image Processing Based
on Partial Differential Equations, Springer Berlin Heidelberg, 2005.

[20] T. F. Chan & K. Chen. On a nonlinear multigrid algorithm with primal


relaxation for the image total variation minimisation, Numerical Algorithms,
41 (2006), 387-411.

[21] T. F. Chan & K. Chen, An optimization based total variation image de-
noising. SIAM J. Multiscale Modeling and Simulation, Vol 5(2), pp.615-645,
2006.

[22] T. F. Chan & S. Esedoglu. Aspects of total variation regularized L1 function


approximation. SIAM Journal on Applied Mathematics, 65:5 (2005), 1817-
1837.

[23] T. Chan, S. Esedoglu, & F. E. Park. A Fourth Order Dual Method for
Staircase Reduction in Texture Extraction and Image Restoration Problems,
UCLA CAM Report 05-28, 2005.

[24] T. F. Chan, S. Esedoglu, F. Park & A. Yip. Total variation image restoration:
overview and rescent developments. In Handbook of Mathematical Models
in Computer Vision. Springer Verlag, 2005. Edt. by: N. Paragios, Y. Chen,
O. Faugeras.

[25] T. F. Chan, G. H. Golub & P. Mulet. A nonlinear primal dual method for
total variation based image restoration, SIAM J. Sci. Comput. 20 (1999),
pp. 1964–1977.

[26] T. Chan, S. Kang, & J. Shen. Euler’s Elastica and Curvature-based Image
Inpainting. SIAM J. Appl. Math., 63(2):564-592, 2002.

[27] T. F. Chan & P. Mulet. On the Convergence of the Lagged Diffusivity Fixed
Point Method in Total Variation Image Restoration, SIAM J. Numer. Anal-
ysis, 36 (1999), 354-367.

[28] T. F. Chan & J. Shen. Image processing and analysis, variational, PDE,
wavelet and stochastic methods. SIAM, Philadelphia, 2005.

[29] T. Chan & J. Shen. Mathematical Models of Local Non-texture Inpaintings,


SIAM J. Appl. Math., 62 (1019-1043), 2001.

[30] T. Chan & C. Wong. Total Variation Blind Deconvolution, IEEE Trans.
Image Process., 7(1998), 370-375.

[31] T. F. Chan, M. Zhu. Fast Algorithms for Total Variation-Based Image Pro-
cessing. To appear in: The Proceedings of 4th ICCM, Hangzhou, China
2007.

[32] T. F. Chan, H. M. Zhou & R. H. Chan, Continuation Method for Total


Variation Denoising Problems, UCLA CAM Report 95-28, 1995.

[33] P. Charbonnier, L. Blanc-Feraud, G. Aubert & M.Barluad. Deterministic


edge-preserving regularization in computed imaging. IEEE Trans. Image
Processing 6 (1997), 298-311.

[34] K. Chen & X-C Tai, A Nonlinear Multigrid Method For Total Variation
Minimization From Image Restoration. Journal of Scientific Computing, Vol.
33 (2), pp.115-138, 2007.

[35] Y.-H. Dai, W. W. Hager, K. Schittkowski, & H Zhang. The cyclic Barzilai-
Borwein method for unconstrained optimization, IMA J. Num. Ana. 26
(2006), pp. 604–627.

[36] Y.-H. Dai & R. Fletcher. Projected Barzilai-Borwein methods for large-
scale box constrained quadratic programming, Numerische Mathematik 100
(2005), pp. 21–47.

[37] J. Darbon & M. Sigelle. Exact optimization of discrete constrained total
variation minimization problems. In: R. Klette and J. Zunic, editors, Tenth
International Workshop on Combinatorial Image Analysis, volume 3322 of
LNCS (2004), 548-557.

[38] J. Darbon & M. Sigelle. A fast and exact algorithm for total variation min-
imization. In: J. S. Marques, N. Prez de la Blanca, and P. Pina, editors,
2nd Iberian Conference on Pattern Recognition and Image Analysis, volume
3522 of LNCS (2005) 351-359.

[39] D. C. Dobson & C. R. Vogel. Convergence of an iterative method for total


variation denoising SIAM J. Numer. Analysis, 34 (1997), 1779-1791.

[40] I. Ekeland & R. Témam. Convex Analysis and Variational Problems. SIAM
Classics in Applied Mathematics, 1999.

[41] C. Frohn-Schauf, S. Henn & K. Witsch. Nonlinear multigrid methods for


total variation image denoising, Comput. Visual Sci., 7 (2004), 199-206.

[42] E. Giusti. Minimal Surfaces and Functions of Bounded Variation.


Birkhäuser, Boston, 1984.

[43] D. Goldfarb & W. Yin. Second-order cone programming methods for total
variation-based image restoration, SIAM J. Sci. Comput. 27 (2005), pp. 622–
645.

[44] L. Grippo & M. Sciandrone. On the convergence of the block nonlinear


Gauss–Seidel method under convex constraints, Operations Research Letters
26 (2000) 127-136.

[45] P.T. Harker & J.S. Pang. Finite-dimensional variational inequality and non-
linear complementarity problems: a survey of theory, algorithms and appli-
cations. Math. Programming 48 (1990), 161-220.

[46] M. Hintermüller & G. Stadler. An infeasible primal-dual algorithm for TV-


based inf-convolution-type image restoration, SIAM J. Sci. Comput. 28
(2006), pp. 1–23.

[47] J.-B. Hiriart-Urruty & C. Lemaréchal. Convex Analysis and Minimization


Algorithms, Volume I, Springer-Verlag, Berlin & Heidelberg, 1993.

[48] D. Hochbaum, An efficient algorithm for image segmentation, Markov Ran-


dom Fields and related problems, Journal of the ACM, Vol. 48, No. 2, pp.
686–701, 2001.

[49] A. N. Iusem. On the convergence properties of the projected gradient method
for convex optimization, Computational and Applied Mathematics 22 (2003),
pp. 37–52.

[50] H. Ishikawa, Exact optimization for Markov random fields with convex pri-
ors, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.
25, No. 10, pp. 1333–1336, 2003.

[51] G. Korpelevich. The extragradient method for finding saddle points and
other problems. Ekonomika i Matematicheskie Metody 12 (1976), 747-756.

[52] Y. Meyer. Oscillating Patterns in Image Processing and Nonlinear Evolution


Equations, University Lecture Series, vol 22. AMS, Providence, 2001.

[53] J. Nocedal, S. J. Wright. Numerical Optimization, Springer, 1999.

[54] A. Nemirovski. Prox-Method with Rate of Convergence O(1/t) for Vari-


ational Inequalities with Lipschitz Continuous Monotone Operators and
Smooth Convex-Concave Saddle Point Problems. SIAM Journal on Opti-
mization 15 (2005), 229-251.

[55] Yu. Nesterov. Dual extrapolation and its applications for solving varia-
tional inequalities and related problems. Math. Program., Ser. B 109 (2007),
319–344.

[56] M. Ng, L. Qi, Y. Yang & Y. Huang. On Semismooth Newton's Methods for


Total Variation Minimization. Journal of Mathematical Imaging and Vision,
3(27), 2007.

[57] M. Nikolova. Minimizers of cost-functions involving nonsmooth data fidelity


terms, SIAM J. Numer. Anal., 40 (2002), 965-994.

[58] M. A. Noor. New extragradient-type methods for general variational inequal-


ities. Journal of Math. Anal. Appl. 277 (2003), 379-394.

[59] S. Osher & A. Marquina. Explicit algorithms for a new time dependent model
based on level set motion for nonlinear deblurring and noise removal, SIAM
J. Sci. Comput. 22 (2000), pp. 387-405.

[60] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise
removal algorithms, Physica D 60 (1992), pp. 259–268.

[61] S. Osher, M. Burger, D. Goldfarb, J. Xu & W. Yin, An Iterative Regu-


larization Method for Total Variation Based Image Restoration. SIAM J.
Multiscale Modeling and Simulation 4(2), 460-489, 2005.

[62] T. Serafini, G. Zanghirati, & L. Zanni. Gradient projection methods for large
quadratic programs and applications in training support vector machines,
Optimzation Methods and Software 20 (2004), pp. 353–378.

[63] J. Savage & K. Chen. An improved and accelerated nonlinear multigrid


method for total-variation denoising, Int. J. Comput. Math., 82 (2005),
1001-1005.

[64] P. S. Vassilevski & J. G. Wade. A comparison of multilevel methods for total


variation regularization, Electronic Trans. On Numer. Anal., 6 (1997), 255-
270.

[65] L. Vese & S. Osher. Modeling Textures with Total Variation Minimization
and Oscillating Patterns In Image Processing, J. Math. Imaging Vision, 20
(2004), 7-18.

[66] C. R. Vogel & M. E. Oman. Iterative methods for total variation denoising,
SIAM J. Sci. Stat. Comput. 17 (1996), pp. 227–238.

[67] C. R. Vogel & M. E. Oman. Fast, robust total variation-based reconstruction


of noisy, blurred images, IEEE Trans. Image Proc., 7 (1998), 813-824.

[68] C. R. Vogel. A multigrid method for total variation-based image denoising,


in: Computation and control IV, 20, Progress in systems and control theory,
Birkhauser, 1995.

[69] C. R. Vogel. Negative results for multilevel preconditioners in image deblur-


ring, in: Scale-space theories in computer vision, Springer-Verlag, 1999.

[70] S. Wright. Primal-Dual Interior-Point Methods, SIAM, Philadelphia,


1997.

[71] N. Xiu & J. Zhang. Some recent advances in projection-type methods for
variational inequalities. J. of Comp. and Applied Math. 152 (2003), 559-585.

[72] B. Zalesky, Network Flow Optimization for Restoration of Images. Journal


of Applied Mathematics, Vol. 2, No. 4, pp. 199–218, 2002.

