Guy Chavent
Nonlinear Least Squares for Inverse Problems: Theoretical Foundations and Step-by-Step Guide for Applications
Scientific Computation
Editorial Board
J.-J. Chattot, Davis, CA, USA
P. Colella, Berkeley, CA, USA
Weinan E, Princeton, NJ, USA
R. Glowinski, Houston, TX, USA
M. Holt, Berkeley, CA, USA
Y. Hussaini, Tallahassee, FL, USA
P. Joly, Le Chesnay, France
H. B. Keller, Pasadena, CA, USA
J. E. Marsden, Pasadena, CA, USA
D. I. Meiron, Pasadena, CA, USA
O. Pironneau, Paris, France
A. Quarteroni, Lausanne, Switzerland
and Politecnico of Milan, Italy
J. Rappaz, Lausanne, Switzerland
R. Rosner, Chicago, IL, USA
P. Sagaut, Paris, France
J. H. Seinfeld, Pasadena, CA, USA
A. Szepessy, Stockholm, Sweden
M. F. Wheeler, Austin, TX, USA
With 25 Figures
Guy Chavent
Ceremade, Université Paris-Dauphine
75775 Paris Cedex 16
France
and
Inria-Rocquencourt
BP 105, 78153 Le Chesnay Cedex
France
[email protected]
ISSN 1434-8322
ISBN 978-90-481-2784-9 e-ISBN 978-90-481-2785-6
DOI 10.1007/978-90-481-2785-6
Springer Dordrecht Heidelberg London New York
Springer
© Springer Science+Business Media B.V. 2009
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written
permission from the Publisher, with the exception of any material supplied specifically for the purpose
of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
holds. The various regularization techniques used to trim the inverse problem
can then be checked against their ability to produce the desirable Q-wellposed
problems.
The second objective of the book is to give a detailed presentation of im-
portant practical issues for the resolution of NLS problems: sensitivity func-
tions and adjoint state methods for the computations of derivatives, choice of
optimization parameters (calibration, sensitivity analysis, multiscale and/or
adaptive parameterization), organization of the inversion code, and choice
of the descent step for the minimization algorithm. Most of this material is
seldom presented in detail, because it is quite elementary from the mathe-
matical point of view, and has usually to be rediscovered by trial-and-error!
As one can see from these objectives, this book does not pretend to give
an exhaustive panorama of nonlinear inverse problems, but merely to present
the author’s view and experience on the subject. Alternative approaches,
when known, are mentioned and referenced, but not developed. The book is
organized in two parts, which can be read independently:
Part I (Chaps. 1–5) is devoted to the step-by-step resolution and analysis
of NLS inverse problems. It should be of interest to scientists of various
application fields interested in the practical resolution of inverse problems,
as well as to applied mathematicians interested also in their analysis. The
required background is a good knowledge of Hilbert spaces, and some notions
of functional analysis if one is interested in the infinite dimensional examples.
The elements of the geometrical theory of Part II, which are necessary for
the Q-wellposedness analysis, are presented without demonstration, but in
an as-intuitive-as-possible way, at the beginning of Chap. 4, so that it is not
necessary to read Part II, which is quite technical.
Part II (Chaps. 6–8) presents the geometric theory of quasi-convex and
strictly quasi-convex sets, which are the basis for the results of Chaps. 4 and 5.
These sets possess a neighborhood where the projection is well-behaved, and
can be recognized by their finite curvature and limited deflection. This part
should be of interest to those more interested in the theory of projection
on nonconvex sets. It requires familiarity with Hilbert spaces and functional
analysis. The material of Part II was scattered in various papers with different
notations. It is presented for the first time in this book in a progressive
and coherent approach, which benefits from substantial enhancements and
simplifications in the definition of strictly quasi-convex sets.
To facilitate a top-to-bottom approach of the subject, each chapter starts
with an overview of the concepts and results developed herein – at the price of
some repetition between the overview and the main corpus of the chapter.
Also, we have tried to make the index more user-friendly: all indexed words
or expressions are emphasized in the text (but not all emphasized words are
indexed!).
I express my thanks to my colleagues, and in particular to François
Clement, Karl Kunisch, and Hend Benameur for the stimulating discussions
we had over all these years, and for the pleasure I found interacting with
them.
Preface vii
2 Computing Derivatives 29
2.1 Setting the Scene 30
2.2 The Sensitivity Functions Approach 33
2.3 The Adjoint Approach 33
2.4 Implementation of the Adjoint Approach 38
2.5 Example 1: The Adjoint Knott–Zoeppritz Equations 41
3 Choosing a Parameterization 79
3.1 Calibration 80
3.1.1 On the Parameter Side 80
3.1.2 On the Data Side 83
3.1.3 Conclusion 84
3.2 How Many Parameters Can be Retrieved from the Data? 84
3.3 Simulation Versus Optimization Parameters 88
3.4 Parameterization by a Closed Form Formula 90
3.5 Decomposition on the Singular Basis 91
3.6 Multiscale Parameterization 93
3.6.1 Simulation Parameters for a Distributed Parameter 93
3.6.2 Optimization Parameters at Scale k 94
3.6.3 Scale-By-Scale Optimization 95
3.6.4 Examples of Multiscale Bases 105
3.6.5 Summary for Multiscale Parameterization 108
3.7 Adaptive Parameterization: Refinement Indicators 108
3.7.1 Definition of Refinement Indicators 109
3.7.2 Multiscale Refinement Indicators 116
3.7.3 Application to Image Segmentation 121
3.7.4 Coarsening Indicators 122
3.7.5 A Refinement/Coarsening Indicators Algorithm 124
Bibliography 345
Index 353
Part I
of all quantities that are input to the calculation, and state vector the vector
made of all quantities one has to compute to solve the state equations (here
at a given incidence angle θ).
We have supposed in the above formulas that the incidence angle θ is
smaller than the critical angle, so that the reflection coefficient R computed
by formula (1.2) is real, but the least squares formulation that follows can
be extended without difficulty to postcritical incidence angles with complex
reflection coefficients.
Remark
Other choices are possible for the data misfit: one could, for example, replace the Euclidean norm ‖v‖ = (∑_{i=1}^q v_i^2)^{1/2} on the data space by the r-norm ‖v‖ = (∑_{i=1}^q |v_i|^r)^{1/r} for some r > 1, as such a norm is known to be less sensitive to outliers in the data for 1 < r < 2. The results of Chaps. 2 and 3 on derivation and parameterization generalize easily to these norms. But the results on Q-wellposedness and regularization of Chaps. 4 and 5 hold true only for the Euclidean norm.
But it is much more general, and also allows one to handle infinite dimensional problems, as we shall see in Sects. 1.4–1.6. So we shall suppose throughout this book that the following minimum set of hypotheses holds:
E = Banach space, with norm ‖·‖_E,
C ⊂ E with C convex and closed,
F = Hilbert space, with norm ‖·‖_F,
z ∈ F,    (1.12)
ϕ : C → F is differentiable along segments of C,
and: ∃ α_M ≥ 0 s.t. ∀ x_0, x_1 ∈ C, ∀ t ∈ [0, 1],
    ‖D_t ϕ((1 − t)x_0 + t x_1)‖ ≤ α_M ‖x_1 − x_0‖.
These hypotheses are satisfied in most inverse problems – and in all those we
address in this book, including of course the inversion of the Knott–Zoeppritz
equations of Sect. 1.1 – but they are far from being sufficient to ensure good
properties of the NLS problem (1.10).
1.3.1 Wellposedness
The first question is wellposedness: does (1.10) have a unique solution x̂ that depends continuously on the data z? It is the first question, but it is also a difficult one because of the nonlinearity of ϕ. It is most likely to be answered negatively, as the word ill-posed is almost automatically associated with inverse problems. To see where the difficulties arise from, we can conceptually
split the resolution of (1.10) into two consecutive steps:
Projection step: given z ∈ F , find a projection X̂ of z on ϕ(C)
Preimage step: given X̂ ∈ ϕ(C), find one preimage x̂ of X̂ by ϕ
Difficulties can – and usually will – arise in both steps:
1. There can be more than one preimage x̂ of X̂ if ϕ is not injective
2. Even if ϕ is injective, its inverse ϕ−1 : X̂ → x̂ may not be continuous
over ϕ(C)
z = X̂, (1.13)
1.3.2 Optimizability
The next question that arises as soon as one considers solving (1.10) on a
computer is the possibility of using efficient local optimization techniques to find the global minimum of the objective function J. It is known that such algorithms converge to stationary points of J, and so it would be highly desirable to be able to recognize the case where J has no parasitic stationary points on C, that is, stationary points where J is strictly larger than its global minimum over C. It will be convenient to call unimodal the functions with this property, and optimizable the associated least squares problems. Convex
this property, and optimizable the associated least squares problem. Convex
functions, for example, are unimodal, and linear least squares problems are
optimizable.
Stationary points of J are closely related to stationary points of the dis-
tance to z function over ϕ(C), whose minimum gives the projection of z
on ϕ(C). Hence we introduce in Chap. 7 a generalization of convex sets to
strictly quasi-convex (s.q.c.) sets D ⊂ F , which possess a neighborhood ϑ on
which the distance to z function is unimodal over D [19].
Sufficient conditions for a set to be s.q.c. are developed in Chap. 8, in
particular, in the case of interest for NLS problems where D is the attainable
set ϕ(C).
As it will turn out (Theorem 7.2.10), s.q.c. sets are indeed quasi-convex
sets, and so requiring that the output set is s.q.c. will solve both the well-
posedness and optimizability problems at once.
The sufficient conditions are then applied in Sects. 4.8 and 4.9 to analyze
the Q-wellposedness of the estimation of the diffusion coefficient in one- and
two-dimensional elliptic equations, as described in Sects. 1.4 and 1.6, when an H^1 measurement of the solution is available.
1.3.4 Regularization
OLS-identifiability is a very strong property, so one cannot expect that
it will apply to the original NLS problem, when this one is known to be
ill-posed.
On the other hand, the only NLS problems that should reasonably be
solved on the computer at the very end are those where OLS-identifiability
holds, as this ensures the following:
– Any local optimization algorithm will converge to the global minimum
– The identified parameter is stable with respect to data perturbations
which are the only conditions under which one can trust the results of the
computation.
Hence the art of inverse problems consists in adding information of various
nature to the original problem, until a Q-wellposed NLS problem is achieved,
or equivalently OLS-identifiability holds: this is the regularization process.
The sufficient conditions for OLS-identifiability of Chap. 4 will then allow one to
check whether the regularized problem has these desirable properties, and
help to make decisions concerning the choice of the regularization method.
Adapted Regularization
If the added a-priori information is (at least partly) incompatible with the
one conveyed from the data by the model, the optimal regularized param-
eter has to move away from its true value to honor both pieces of information. So
one should ideally try to add information that tends to conflict as little as
possible with the one coming from the data. Such an adapted regularization
is recommended whenever it is possible. But it requires a refined analysis of
the forward map ϕ, which is not always possible, and has to be considered
case by case. We give in Sect. 5.4 an example of adapted regularization for the
estimation of the diffusion coefficient in a two-dimensional elliptic equation.
Regularization by Parameterization
This is one of the most commonly used approaches: instead of searching for
the parameter x, one searches for a parameter xopt related to x by
x = ψ(xopt ), xopt ∈ Copt with ψ(Copt ) ⊂ C, (1.16)
where ψ is the parameterization map, and Copt is the admissible parameter
set for the optimization variables xopt (Sect. 3.3). Satisfying the inclusion
ψ(Copt ) ⊂ C is not always easy, and it has to be kept in mind when choosing
the parameterization mapping ψ.
Parameterization is primarily performed to reduce the number of unknown parameters, but, when x is a function, it also has the effect of imposing regularity constraints, which usually have a stabilizing effect on the inverse problem. The regularized problem is now

x̂_opt minimizes J ∘ ψ(x_opt) = (1/2) ‖ϕ(ψ(x_opt)) − z‖_F^2 over C_opt.    (1.17)
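As an illustration of (1.17), here is a minimal sketch (Python/NumPy) of how the composite objective J ∘ ψ and its gradient ψ′(x_opt)^T ∇J(ψ(x_opt)) can be evaluated by the chain rule; the linear forward map, the parameterization matrix Psi and the data below are hypothetical placeholders chosen only for illustration.

    import numpy as np

    # Hypothetical forward map: phi(x) = A @ x (linear, for illustration only)
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0]])
    z = np.array([1.0, 2.0])                 # data

    def phi(x):                              # forward map C -> F
        return A @ x

    def grad_J(x):                           # gradient of J(x) = 0.5*||phi(x) - z||^2
        return A.T @ (phi(x) - z)

    # Parameterization: 3 simulation parameters generated by 2 optimization
    # parameters through a fixed (hypothetical) matrix Psi, i.e. psi(x_opt) = Psi @ x_opt.
    Psi = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])

    def J_and_grad_opt(x_opt):
        """J(psi(x_opt)) and its gradient psi'(x_opt)^T grad_J(psi(x_opt))."""
        x = Psi @ x_opt
        r = phi(x) - z
        return 0.5 * r @ r, Psi.T @ grad_J(x)

    print(J_and_grad_opt(np.zeros(2)))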
We address in Chap. 3 some practical aspects of parameterization:
– How to calibrate the parameters and the data
– How to use the singular value decomposition of the Jacobian D = ϕ′(x) to estimate the number of independent parameters that can be estimated for a given level of uncertainty on the data
– How to use multiscale approximations and/or refinement indicators in order to determine an adaptive parameterization, which makes the problem optimizable, explains the data up to the noise level, and does not lead to over-parameterization
– How to organize the inversion code to make experimentation with various parameterizations easier
– How to use in minimization algorithms a descent step adapted to the nonlinear least squares structure of the problem
Such smoothness constraints are often used to ensure that the regular-
ized problem (1.19) is a FC least squares problem. FC problems were
introduced in [28] under the less descriptive name of “weakly nonlinear”
problems, and are defined in Sect. 4.2 (Definition 4.2.1). They represent
a first necessary step towards Q-wellposedness.
For example, in the estimation of the diffusion coefficient a in a two-
dimensional elliptic equation (Sect. 1.6), the natural admissible set C
is defined by (1.66):
The first regularization step we shall take in Sect. 4.9 will be to replace
C by the set defined in (4.106), which we rewrite here as:
C = { a ∈ E | a_m ≤ a(ξ) ≤ a_M ∀ξ ∈ Ω,
      |a(ξ_1) − a(ξ_0)| ≤ b_M ‖ξ_1 − ξ_0‖ ∀ξ_0, ξ_1 ∈ Ω },    (1.21)
For example, the second regularization step we shall take in Sect. 4.9
for the same diffusion coefficient problem as earlier will be to reduce,
in the admissible set (1.21), the interval between the lower and upper
bounds on a until the condition

a_M − a_m ≤ (π/4) a_m    (1.22)
is satisfied. Proposition 4.9.3 will then ensure that the deflection condi-
tion is satisfied, and make the problem ready for LMT regularization.
x̂_ε minimizes J_ε(x) = J(x) + (ε/2) ‖x − x_0‖_E^2 over C.    (1.25)
Its properties have been studied extensively when ϕ is linear, and we refer, for example, to the monographs [8, 43, 59, 63, 38, 55, 12]. The main result for linear problems is that the regularized problem (1.25) is wellposed as soon as ε > 0, and that the solution x̂_{ε,δ} of (1.25) corresponding to noise corrupted data z_δ converges to the x_0-minimum norm solution of (1.10) – that is, the solution of (1.10) closest to x_0, provided that such a solution exists, and that ε goes to zero more slowly than the noise level δ = ‖z_δ − z‖. We give at the beginning of Chaps. 4 and 5 a short summary of properties of linear least-squares problems.
We shall see that all the nice properties of LMT regularization for the linear case extend to the class of FC/LD problems, where the attainable set has finite curvature, and the size of the admissible set has been chosen small enough for the deflection condition to hold (see Regularization by size reduction above). LMT regularization of FC/LD problems is studied in Sect. 5.1 and is applied in Sect. 5.2 to the source identification problem of Sect. 1.5.
But for general nonlinear problems, the situation is much less favorable. In particular, there is no reason for the regularized problem (1.25) to be Q-wellposed for small ε's! Hence when ε → 0, problem (1.25) may have more than one solution or stationary point: convergence results can still be obtained for sequences of adequately chosen solutions [37, 67], but optimizability by local minimization algorithms is lost. We give in Sect. 5.1.3 an estimation of the minimum amount of regularization to be added to restore Q-wellposedness.
State-Space Regularization
When the original problem (1.10) has no finite curvature, which is alas almost a generic situation for infinite dimensional parameters, it is difficult to find a classical regularization process that ensures Q-wellposedness of the regularized problem for small ε's. One approach is then to decompose (see Sect. 1.3.5) the forward map ϕ into the resolution of a state equation (1.33) followed by the application of an observation operator (1.34)

ϕ = M ∘ φ,    (1.26)

where φ : x ↦ y is the solution mapping of the state equation, and M : y ↦ v is the observation operator.
The state-space regularized version of problem (1.10) is then [26, 29]

x̂_ε minimizes J_ε(x) = J(x) + (ε/2) ‖φ(x) − ŷ_ε‖^2 over C ∩ B,    (1.27)

where B is a localization constraint, and ŷ_ε is the solution of the auxiliary unconstrained problem

ŷ_ε minimizes (1/2) ‖M(y) − z‖_F^2 + (ε/2) ‖y − y_0‖_Y^2 over Y,    (1.28)
where y0 is an a priori guess for the state vector.
For example, in the parameter estimation problems of Sects. 1.4 and 1.6,
one can think of x as being the diffusion coefficient a, of y as being the solu-
tion of the elliptic equation in the Sobolev space H 1 (Ω), and of v as being a
measure of y in L2 (Ω). State-space regularization would then consist in using
first LMT regularization to compute a smoothed version y ∈ H 1 (Ω) of the
data z ∈ L2 (Ω) by solving the auxiliary problem (1.28), and then to use this
smoothed data as extra information in the regularized NLS problem (1.27).
Smoothing data before inverting them has been a long-time favorite in
the engineering community, and so state-space regularization is after all not
so unnatural. It is studied in Sect. 5.3. The main result is demonstrated in
Sect. 5.3.1: problems (1.27) and (1.28) remain Q-wellposed when ε → 0, provided one can choose the state-space Y as follows:
– The problem of estimating x ∈ C from a measure of φ(x) in Y is Q-wellposed (this usually requires that the norm on Y is strong enough)
– The observation mapping M is linear and injective
This covers, for example, the case of the parameter estimation problems of
Sects. 1.4 and 1.6 when only L2 observations are available.
A partial result for the case of point and/or boundary measurements,
which do not satisfy the injectivity condition, is given in Sect. 5.3.2.
Common Features
All the regularized problems can be cast into the same form as the original
problem (1.10). It suffices to perform the following substitutions in (1.10):
regularization by parameterization:
    C ← C_opt,  ϕ(x) ← ϕ(ψ(x_opt)),  F ← F,  ‖·‖_F ← ‖·‖_F,  z ← z;    (1.29)

reduction of C:
    C ← the reduced admissible set, with ϕ(x), F, ‖·‖_F, and z unchanged;    (1.30)

LMT regularization:
    C ← C,  ϕ(x) ← (ϕ(x), x),  F ← F_ε = F × E,
    ‖·‖_F ← ‖·‖_{F_ε} = (‖·‖_F^2 + ε ‖·‖_E^2)^{1/2},  z ← (z, x_0);    (1.31)

state-space regularization:
    C ← C,  ϕ(x) ← (M(φ(x)), φ(x)),  F ← F_ε = F × Y,
    ‖·‖_F ← ‖·‖_{F_ε} = (‖·‖_F^2 + ε ‖·‖_Y^2)^{1/2},  z ← (z, ŷ_ε).    (1.32)
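The substitution (1.31) shows that LMT regularization can be implemented by simply augmenting the residual vector of the original problem; a minimal sketch (Python/SciPy), with a hypothetical forward map and an arbitrary value of ε chosen only for illustration:

    import numpy as np
    from scipy.optimize import least_squares

    # Hypothetical forward map (mildly nonlinear, for illustration only)
    def phi(x):
        return np.array([x[0] + x[1]**2, np.sin(x[0]) + x[1]])

    z = np.array([1.0, 0.5])        # data
    x0_prior = np.zeros(2)          # a priori guess x_0
    eps = 1e-2                      # regularization coefficient

    def augmented_residual(x):
        # residual of the substituted problem (1.31): output (phi(x), x),
        # data (z, x_0), weighted norm (||.||_F^2 + eps*||.||_E^2)^(1/2)
        return np.concatenate([phi(x) - z, np.sqrt(eps) * (x - x0_prior)])

    sol = least_squares(augmented_residual, x0_prior)
    print(sol.x)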
1.3.5 Derivation
The last difficulty we shall address is a very practical one, and appears as
soon as one wants to implement a local optimization algorithm to solve a
NLS problem of the form (1.10): one needs to provide the optimization code
with the gradient ∇J of the objective function, or the derivative or Jacobian
D = ϕ (x) of the forward map.
To analyze this difficulty, it is convenient to describe ϕ at a finer level by
introducing a state-space decomposition. So we shall suppose, when needed,
that v = ϕ(x) is evaluated in two steps:
⎧
⎨ given x ∈ C solve:
e(x, y) = 0 (1.33)
⎩
with respect to y in Y ,
followed by
equation:
−(a u_ξ)_ξ = ∑_{j∈J} g_j δ(ξ − ξ_j),    ξ ∈ Ω,    (1.37)
over the domain Ω = [0, 1], with a right-hand side made of Dirac sources:
⎧
⎨ ξj denotes the location of the jth source,
gj ∈ IR denotes the amplitude of the jth source, (1.38)
⎩
J is a finite set of source indexes.
We complement (1.37) with Dirichlet boundary conditions:
u(0) = 0, u(1) = 0. (1.39)
Equations (1.37) and (1.38) model, for example, the temperature u in a one-dimensional slab at thermal equilibrium, heated by point sources of amplitude g_j at locations ξ_j, and whose temperature is maintained equal to
zero at each end, in which case a is the thermal diffusion coefficient.
A physically and mathematically useful quantity is
q(ξ) = −a(ξ) uξ (ξ), ξ ∈ Ω, (1.40)
which represents the (heat) flux distribution over Ω, counted positively in
the direction of increasing ξ’s. Equations (1.37) and (1.38) are simple in the
sense that the flux q can be computed by a closed form formula
q(ξ) = H̄ − H(ξ),    (1.41)
where
H(ξ) = ∫_0^ξ ∑_{j∈J} g_j δ(ξ − ξ_j) dξ    (1.42)
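For illustration, the cumulative source function H of (1.42) is a staircase that jumps by g_j at each source location ξ_j, so the flux (1.41) can be evaluated directly; a minimal sketch (Python/NumPy), with hypothetical source data and with the constant H̄ simply assumed given:

    import numpy as np

    xi_sources = np.array([0.25, 0.6])   # source locations xi_j (hypothetical)
    g = np.array([2.0, -1.0])            # source amplitudes g_j (hypothetical)
    H_bar = 0.5                          # constant fixed by the boundary conditions (assumed known)

    def H(xi):
        # H(xi) = integral from 0 to xi of the sum of Dirac sources
        #       = sum of the amplitudes of the sources located before xi
        return np.sum(g[xi_sources < xi])

    def q(xi):
        # heat flux (1.41): piecewise constant, jumps by -g_j across each source
        return H_bar - H(xi)

    print([q(x) for x in (0.1, 0.4, 0.9)])   # -> [0.5, -1.5, -0.5]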
excludes, e.g., the case where ∂ΩD and ∂ΩN are the two halves of the same
circle). In the two-dimensional case, it is satisfied, for example, if ∂ΩN is the
external boundary of Ω, and ∂ΩD is the boundary of a hole in the domain,
as the boundary of a source or sink, for example.
The elliptic equation (1.54) is nonlinear, but with a nonlinearity in the
lowest order term only. It admits a unique solution u ∈ Y [56], where
Y = { w ∈ L^2(Ω) | ∂w/∂ξ_i ∈ L^2(Ω), i = 1 . . . m, w = 0 on ∂Ω_D }.    (1.58)
The source problem consists here in estimating the source terms f and/or
g of the elliptic problem (1.54) from a measurement z in the data space
F = L2 (Ω) of its solution u (the observation operator M is then simply the
canonical injection from the subspace Y of H 1 into L2 ). We shall denote by
C ⊂ E = L^2(Ω) × L^2(∂Ω_N), closed, convex, bounded    (1.59)
the set of admissible f and g, to be chosen later, and by
z ∈ F = L^2(Ω),    (1.60)
a (noisefree) measure of the solution u of (1.54).
Estimation of f and g in C from the measure z of u amounts to solving the nonlinear least-squares problem

f̂, ĝ minimizes (1/2) ‖ϕ(f, g) − z‖_{L^2(Ω)}^2 over C,    (1.61)

where
ϕ : (f, g) ∈ C ↦ u ∈ F = L^2(Ω) solution of (1.54)    (1.62)
is the forward map, which is obviously injective. Hence one is here in the
case of identifiable parameters.
Given a sequence z_n ∈ L^2(Ω), n = 1, 2, . . . , of noise corrupted measurements of z and another sequence ε_n > 0 of regularization coefficients such that
|z_n − z|_{L^2(Ω)} → 0,    ε_n → 0,
the LMT-regularization of (1.61) is

f̂_n, ĝ_n minimizes
(1/2) ‖ϕ(f, g) − z_n‖_{L^2(Ω)}^2 + (ε_n/2) ‖f − f_0‖_{L^2(Ω)}^2 + (ε_n/2) ‖g − g_0‖_{L^2(∂Ω_N)}^2    (1.63)
over C,
We show in Sect. 5.2 that (1.61) is a FC problem. This will imply, under adequate hypotheses on the size of C, that it is also a FC/LD problem. Hence its LMT-regularization (1.63) is Q-wellposed for large n, and f̂_n, ĝ_n will converge to the solution f̂, ĝ of (1.61).
In the context of fluid flow through porous media, the solution u of (1.64)
is the fluid pressure, ∂Ωj represents the boundary of the jth well, and Qj its
injection or production rate. In two-dimensional problems such as the one we are
considering here, a is the transmissivity, that is, the product of the perme-
ability of the porous medium with its thickness. It is not accessible to direct
measurements, and so it is usually estimated indirectly from the available
pressure measurements via the resolution of an inverse problem, which we
describe now.
The natural admissible parameter set is
C = { a : Ω → IR | 0 < a_m ≤ a(ξ) ≤ a_M a.e. on Ω },    (1.66)
where a_m and a_M are given lower and upper bounds on a. Then for any a ∈ C, (1.64) admits a unique solution u in the state-space:
Y = { w ∈ L^2(Ω) | ∂w/∂ξ_i ∈ L^2(Ω), i = 1, 2, w|_{∂Ω_D} = 0 }.    (1.67)
Because of the hypothesis (1.65), we can equip Y with the norm
where
IL^2(Ω) = L^2(Ω) × L^2(Ω),    ‖v‖_{IL^2(Ω)} = (|v_1|_{L^2(Ω)}^2 + |v_2|_{L^2(Ω)}^2)^{1/2},    (1.69)
M : w ∈ Y ↦ ∇w ∈ F.    (1.70)
Chapter 2
Computing Derivatives
We refer to Chap. 3 for the last point, but we do not discuss here the
other discretizations, and we simply suppose that they are made according
to the state of the art, so that the resulting discrete objective function is a
reasonable approximation to the continuous one. We also do not discuss the
convergence of the parameter estimated with the discrete model to the one
estimated using the infinite dimensional model, and we refer for this to the
book by Banks, Kunisch, and Ito [7].
So our starting point is the finite dimensional NLS problem, which is to
be solved on the computer: the unknown parameter x is a vector of IRn , the
state y a vector of IRP , and the output v = ϕ(x) a vector of IRq . We shall dis-
cuss different techniques, such as the sensitivity function and adjoint state
approaches, for the computation of the derivatives required by local opti-
mization methods, and give on various examples a step-by-step presentation
of their implementation.
The derivatives computed in this way are called discrete: they are the
exact derivatives of the discrete objective function. For infinite dimensional
problems, they are obtained by following the first discretize then differen-
tiate rule. When the discrete equations that describe ϕ are too complicated,
one can be tempted to break the rule, and to first calculate the derivatives
on the continuous model, which is usually simpler, at least formally, and
then only to discretize the resulting equations and formula to obtain the
derivatives. The derivatives computed in this way are called discretized, and
are only approximate derivatives of the discrete objective function. This is
discussed in Sect. 2.8 on an example.
The material of this chapter is not new, but it is seldom presented in a
detailed way, and we believe it can be useful to practitioners.
followed by
set ϕ(x) = v. (2.4)
Equation (2.4) corresponds to the observation operator (y, v) → v, which is
now simply a (linear) selection operator.
So we will always suppose in the sequel that the decomposition (2.1) and
(2.2) has been chosen in such a way that it corresponds to some computa-
tional reality, with the state equation being the “hard part” of the model,
where most of the computational effort rests, and the observation opera-
tor being the “soft part,” given by simple and explicit formulas. We shall
also suppose that the decomposition satisfies the minimum set of hypotheses (1.35).
Once the decomposition (2.1) and (2.2) of the forward map and the norm
·F on the data space have been chosen, one is finally faced with the nu-
merical resolution of the inverse problem:
x̂ minimizes J(x) = (1/2) ‖ϕ(x) − z‖_F^2 over C.    (2.5)
Among the possible methods of solution for problem (2.5), optimization algorithms hold a prominent place, although other approaches exist, such as, for example, the resolution of the associated optimality condition (see, e.g., [14, 52, 53]). We can pick from two large classes of optimization algorithms:
the global minimum of J under quite general conditions, and are very
user-friendly as the only input they require is a code that computes
J(x). The price is the large number of function evaluations required (easily over one hundred thousand iterations for as few as ten parameters), and so the use of these algorithms tends to be limited to problems where the product of the size of x by the computation time of J is not too large.
s_j = M′(y) ∂y/∂x_j,    j = 1 . . . n.    (2.8)
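In the discrete setting e(x, y) = 0, v = M(y), each sensitivity ∂y/∂x_j solves a linearized state equation with the same matrix ∂e/∂y. A minimal sketch (Python/NumPy) of this sensitivity-function computation, with small hypothetical dense matrices and assuming ∂e/∂y is invertible:

    import numpy as np

    def sensitivities(de_dy, de_dx, dM_dy):
        """Columns s_j = M'(y) * dy/dx_j of the Jacobian D = phi'(x).

        de_dy : matrix de/dy(x, y)   (p x p)
        de_dx : matrix de/dx(x, y)   (p x n)
        dM_dy : matrix M'(y)         (q x p)
        Differentiating e(x, y(x)) = 0 gives de/dy * dy/dx = -de/dx.
        """
        dy_dx = np.linalg.solve(de_dy, -de_dx)   # one linear solve per parameter
        return dM_dy @ dy_dx                     # q x n Jacobian of the forward map

    # tiny hypothetical example: p = 2 states, n = 2 parameters, q = 1 observation
    de_dy = np.array([[2.0, 0.0], [1.0, 3.0]])
    de_dx = np.array([[1.0, 0.0], [0.0, 1.0]])
    dM_dy = np.array([[1.0, 1.0]])
    print(sensitivities(de_dy, de_dx, dM_dy))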
We shall use the shorthand notation ∇G for this gradient. Depending on the
derivative one wants to compute, different choices are possible for G(x, v):
• If one chooses
G(x, v) = (1/2) ‖v − z‖_F^2    (independent of x!),    (2.10)
then
G(x, ϕ(x)) = J(x)    ∀x ∈ C,    (2.11)
so that
∇G = ∇J,
and the adjoint approach computes the gradient of J.
• If one chooses
G(x, v) = ⟨v, e_i⟩_F,    (2.12)
where e_i is the ith basis vector of IR^q, and E = IR^n and F = IR^q are equipped with the usual Euclidean scalar products, then G(x, ϕ(x)) = ⟨ϕ(x), e_i⟩_F, so that
∇G = ϕ′(x)^T e_i = D^T e_i = r_i^T,
where r_i is the ith row of D = ϕ′(x). In that case, the adjoint approach computes the Jacobian D of ϕ row by row.
where now E = IR^n and F = IR^q are equipped with scalar products ⟨·, ·⟩_E and ⟨·, ·⟩_F, then similarly G(x, ϕ(x)) = ⟨ϕ(x), g_v⟩_F, and
∇G = ϕ′(x)^T g_v = D^T g_v = g_x,
where gradient and transposition are relative to the chosen scalar products on E and F. Hence the adjoint approach will compute the result g_x of the action of the transposed Jacobian D^T on any vector g_v, without having to assemble the whole matrix D, transpose it, and perform the matrix × vector product D^T g_v.
Remark 2.3.1 Because of formula (2.9), the choice (2.13) for G with gv =
ϕ(x) − z leads to the computation of ∇J, as did the choice (2.10). But both
choices will produce the same final formula for ∇J.
Remark 2.3.2 The adjoint approach with the choice (2.13) for G is used
when it comes to change parameters in an optimization problem: the gradient
of J with respect to the optimization parameter vector xopt is given by
where x_sim is the simulation parameter vector, and ψ is the x_opt ↦ x_sim mapping (see Sect. 3.3). An example of such a calculation is given in Sect. 3.8.2.
⟨∇_v G(x, v), M′(y)δy⟩_F + ⟨(∂e/∂y)(x, y)δy, λ⟩_Z = 0,
⟨M′(y)^T ∇_v G(x, v), δy⟩_Y + ⟨δy, (∂e/∂y)(x, y)^T λ⟩_Y = 0,
M′(y)^T ∇_v G(x, v) + (∂e/∂y)(x, y)^T λ = 0.    (2.21)
∇G = ∇_x G(x, M(y)) + (∂e/∂x)(x, y)^T λ.    (2.22)
It is, however, not advisable to use formulas (2.21) and (2.22) in practice, as they require writing down the matrices (∂e/∂y)(x, y) and (∂e/∂x)(x, y), which can be very large. Moreover, matrix transposition requires some thinking when it is made with respect to weighted scalar products.
Despite their abstract appearance, the variational formulations (2.17) and (2.19) of the adjoint and gradient equations, which are based on the explicit Lagrangian function (2.16), are the most convenient to use in the applications.
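For a small dense problem, however, the matrix formulas (2.21) and (2.22) can be coded directly, which provides a convenient check of a variational implementation; a minimal sketch (Python/NumPy, Euclidean scalar products, hypothetical matrices chosen only for illustration):

    import numpy as np

    def adjoint_gradient(de_dy, de_dx, dM_dy, grad_v_G, grad_x_G):
        """Gradient of G(x, M(y(x))) by the adjoint approach, (2.21)-(2.22).

        de_dy, de_dx : matrices de/dy and de/dx at (x, y)
        dM_dy        : matrix M'(y)
        grad_v_G     : gradient of G with respect to v, at (x, M(y))
        grad_x_G     : partial gradient of G with respect to x, at (x, M(y))
        """
        # adjoint equation (2.21):  (de/dy)^T lambda = -M'(y)^T grad_v G
        lam = np.linalg.solve(de_dy.T, -dM_dy.T @ grad_v_G)
        # gradient equation (2.22): grad G = grad_x G + (de/dx)^T lambda
        return grad_x_G + de_dx.T @ lam

    # one adjoint solve gives the full gradient, whatever the number of parameters
    de_dy = np.array([[2.0, 0.0], [1.0, 3.0]])
    de_dx = np.array([[1.0, 0.0], [0.0, 1.0]])
    dM_dy = np.array([[1.0, 1.0]])
    grad_v_G = np.array([0.3])        # e.g. M(y) - z for the choice (2.10)
    grad_x_G = np.zeros(2)
    print(adjoint_gradient(de_dy, de_dx, dM_dy, grad_v_G, grad_x_G))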
The choice of G will depend on the derivative one wants to compute, see
(2.10), (2.12), and (2.13) for examples. It is also necessary to specify the scalar
products on IRn and IRq to make gradient and transposition well defined:
Step 2: Lagrangian
Combine the objective function G(x, v) chosen in step 0 and the decomposition e(x, y) = 0, v = M(y) of ϕ chosen in step 1 to build up the Lagrangian:
of freedom can be used to make the ⟨e(x, y), λ⟩_Z term in (2.25) mimic the corresponding term of the continuous Lagrangian. This will allow one to interpret the vector λ as a discretization of the continuous adjoint variable.
A useful check is the following:
• Write down explicitly what the parameter vector x, the state vector y,
the state-space Y , and its tangent space δY are in the problem under
consideration
From this point on, there are no more decisions to make: all calculations follow from the formula (2.25) for the Lagrangian and the scalar product ⟨·, ·⟩_E on the parameter space chosen in step 0.
This reorganization is the most delicate and tedious part. Once it is done,
the “computational form” of the adjoint equations for λ ∈ IRp are obtained
by equating to zero the coefficients of δyj in (2.26):
hj (x, yx , λ) = 0 ∀j = 1 . . . p. (2.27)
⟨∇G, δx⟩_E = (∂L/∂x)(x, y_x, λ_x) δx    ∀δx ∈ IR^n.    (2.28)
When IR^n is equipped with the usual Euclidean scalar product, the ith component of ∇G is simply the coefficient of δx_i in (∂L/∂x)(x, y_x, λ_x) δx.
Step 2: Lagrangian
With the objective function G of step 0 and the state-space decomposition
of step 1, the Lagrangian reads
L(x, y, λ) = R + λ_1 (e − e_S − e_ρ)
             + λ_2 (f − 1 + e_ρ^2)
             + λ_3 (S_1 − χ(1 + e_P))    (2.32)
(∂L/∂x)(x, y, λ) δx = −λ_1 (δe_S + δe_ρ)
                      + 2λ_2 e_ρ δe_ρ
                      − λ_3 (δχ (1 + e_P) + χ δe_P)    (2.38)
(∂R/∂e_ρ) δe_ρ + (∂R/∂e_P) δe_P + (∂R/∂e_S) δe_S + (∂R/∂χ) δχ = (∂L/∂x)(x, y_x, λ_x) δx    ∀δx ∈ IR^n.    (2.39)
Comparing (2.38) and (2.39), we see that ∂R/∂e_ρ is the coefficient of δe_ρ in (2.38), etc., which gives
∂R/∂e_ρ = −λ_1 + 2λ_2 e_ρ − λ_13    (2.40)
∂R/∂e_P = χ(λ_4 − λ_3)    (2.41)
∂R/∂e_S = −λ_1 − λ_5 T_1 + λ_6 T_2    (2.42)
∂R/∂χ = −λ_3(1 + e_P) − λ_4(1 − e_P)    (2.43)
Remark 2.5.1 We are here in the favorable situation where the forward map ϕ is the juxtaposition of q independent “component” maps ψ_1, . . . , ψ_q. So if we set to 1 the computational cost of ϕ, the cost of each ψ_i, i = 1, . . . , q, is 1/q. Then each row r_i of D = ϕ′(x) can be computed as earlier as ∇ψ_i^T at the cost of 1/q (one adjoint equation for ψ_i), so that the whole Jacobian D can be evaluated at an additional cost of 1, that is, at the same cost as the gradient ∇J in the general case.
where ψ is any function on Ω whose restriction to the interior K̊ of an element K of T_h is continuous on K̊, and has a limit on the edges and vertices of K. Hence the restrictions ψ|_K to K and ψ|_A to a boundary edge A are well defined at the vertices of K and the endpoints of A. This ensures that I_K(ψ|_K) and I_{∂Ω_N}(ψ) in the right-hand sides of the definition (2.53) make sense.
We shall use in the sequel the quadrature formula (2.53) for functions
ψ, which are either continuous on Ω or are piecewise constant on Th .
The sets of nodes ∂ΩN,h and ∂ΩD,h form a partition of the subset ∂Ωh of Ωh
made of nodes located on the boundary ∂Ω. All the terms in the variational
formulation (2.47) have now a finite dimensional counterpart, and so we can
define the finite dimensional variational formulation:
⎧
⎨ find uh ∈ Wh with uh|∂ΩD = ue,h such that
IΩ (ah ∇uh ∇wh )+IΩ (kh (uh )wh ) = IΩ (fh wh )+I∂ΩN (gh wh ) (2.58)
⎩
for all wh ∈ Wh with wh|∂ΩD = 0.
As the nodes of the quadrature formulas I_Ω and I_{∂Ω_N} coincide with the nodes M ∈ Ω_h of the degrees of freedom of u_h, it is a simple matter to deduce from the above equation, where u_h ∈ W_h is still a function, a system of nonlinear equations for the vector of degrees of freedom (u_M, M ∈ Ω_h):

u_M = u_{e,M}    ∀M ∈ ∂Ω_{D,h},
∑_{P∈Ω_h\∂Ω_{D,h}} A_{M,P} u_P + α_M k_h(u_M) = − ∑_{P∈∂Ω_{D,h}} A_{M,P} u_{e,P} + α_M f_M
    (+ ∂α_M g_M if M ∈ ∂Ω_{N,h}),    ∀M ∈ Ω_h\∂Ω_{D,h},    (2.59)
where
A_{M,P} = I_Ω(a_h ∇w_M · ∇w_P)    ∀M, P ∈ Ω_h\∂Ω_{D,h},    (2.60)
and where α_M and ∂α_M are geometric coefficients related to the triangulation T_h:
α_M = (1/3) ∑_{K∈T_h, K∋M} |K|,    ∂α_M = (1/2) ∑_{A⊂∂Ω_N, A∋M} |A|.    (2.61)
and we define the scalar product between λD,h and ue,h − uh |∂ΩD via a
quadrature formula I∂ΩD defined in the same way as I∂ΩN as in (2.53).
– λh associated with the variational formulation (2.67), which one can
simply take in the space of the test functions
λh ∈ {wh ∈ Wh | wh |∂ΩD = 0}. (2.70)
With the objective function Gh of (2.65) and the state equations (2.66)
and (2.67), the Lagrangian for the first option is then
L(a_h, k_h, f_h, g_h, u_{e,h}; u_h; λ_{D,h}, λ_h) =    (2.71)
    I_Ω(|Z_h − ∇u_h|^2) + I_Ω(|z_h − u_h|^2) + I_{∂Ω_N}(|z_{N,h} − u_h|^2)
    + I_Ω(a_h ∇u_h · ∇λ_h) + I_Ω(k_h(u_h) λ_h) − I_Ω(f_h λ_h) − I_{∂Ω_N}(g_h λ_h).
The introduction of the boundary condition in the state-space leads
hence to a Lagrangian function with fewer arguments and fewer terms,
so that the adjoint state determination will be slightly simpler, but the
price to pay is that this approach will not give the gradient with respect
to the boundary condition ue,h.
(∂L/∂u_h)(a_h, k_h, f_h, g_h, u_{e,h}; u_h; λ_{D,h}, λ_h) δu_h =    (2.73)
    −2 I_Ω((Z_h − ∇u_h) · ∇δu_h)
    −2 I_Ω((z_h − u_h) δu_h)
    −2 I_{∂Ω_N}((z_{N,h} − u_h) δu_h)
    − I_{∂Ω_D}(δu_h λ_{D,h})
    + I_Ω(a_h ∇δu_h · ∇λ_h) + I_Ω(k_h′(u_h) δu_h λ_h)
    = 0    ∀δu_h ∈ δY,
where
δY = W_h.
Equations (2.70) and (2.73) define uniquely the adjoint states λD,h and
λh . Choosing successively δuh ∈ {wh ∈ Wh | wh |∂ΩD = 0} and δuh =
wM ∀M ∈ ∂Ωh,D , where wM is the basis function of Wh associated
with node M, we obtain the following decoupled adjoint equations for
λh and λD,h :
find λ_h ∈ W_h with λ_h|_{∂Ω_D} = 0 such that
I_Ω(a_h ∇λ_h · ∇w_h) + I_Ω(k_h′(u_h) λ_h w_h) =
    2 I_Ω((Z_h − ∇u_h) · ∇w_h)
    + 2 I_Ω((z_h − u_h) w_h)
    + 2 I_{∂Ω_N}((z_{N,h} − u_h) w_h)
for all w_h ∈ W_h with w_h|_{∂Ω_D} = 0,    (2.74)
find λ_{D,h} ∈ {λ_{D,M} ∈ IR, M ∈ ∂Ω_{D,h}} such that
I_{∂Ω_D}(λ_{D,h} w_M) =
    + I_Ω(a_h ∇λ_h · ∇w_M) + I_Ω(k_h′(u_h) λ_h w_M)
    − 2 I_Ω((Z_h − ∇u_h) · ∇w_M)
    − 2 I_Ω((z_h − u_h) w_M)
    − 2 I_{∂Ω_N}((z_{N,h} − u_h) w_M)
for all basis functions w_M ∈ W_h, M ∈ ∂Ω_{D,h}.    (2.75)
Equation (2.74) is very similar to the variational formulation (2.58) for
uh , but with different right-hand sides: in the case where one does not
Hence the two options for the boundary condition define the same adjoint
state λh , but λD,h is defined only in option 1.
If we expand the term I∂ΩD (δue,h λD,h ) using the definition of I∂ΩD , and
pick the coefficient of δue,M in the resulting formula, we see that
∂J_h/∂u_{e,M} = ∂α_M λ_{D,M}    ∀M ∈ ∂Ω_{D,h}.    (2.78)
∂J_h/∂a_K = |K| (∇u_h · ∇λ_h)|_K.    (2.79)
In order to determine the gradient with respect to the nonlinearity k_h, let us denote by κ_1, . . . , κ_{n_k} the coefficients that define the u ↦ k_h(u) function. The differential of k_h is then
δk_h(u_h) = ∑_{j=1,...,n_k} (∂k_h/∂κ_j)(u_h) δκ_j.    (2.80)
We can now substitute into (2.77) the value of δk_h given by (2.80) and pick the coefficients of δκ_j, which gives the following expressions for the partial derivatives of J_h with respect to κ_1, . . . , κ_{n_k}:
∂J_h/∂κ_j = I_Ω((∂k_h/∂κ_j)(u_h) λ_h).    (2.81)
∂J_h/∂κ_j = I_Ω(u_h^j λ_h),    j = 0, . . . , n_k − 1.
∂J_h/∂κ_j = I_Ω(w_j(u_h) λ_h),
This completes the step 0 of derivation. One can then choose the state-space
decomposition corresponding to option 1 above with the vector state-space:
Y = δY = H^1(Ω),
M : u ↦ (∇u, u, u|_{∂Ω_N}),
defines the forward map ϕ chosen in (2.84) and completes the step 1 of
derivation. In step 2, one introduces first the two Lagrange multipliers:
– λ_D associated with the Dirichlet boundary condition (2.85). The choice of the function space for λ_D is a little technical: as λ_D is expected to define a linear functional on the dense subspace H^{1/2}(∂Ω_D) of L^2(∂Ω_D), where (2.85) holds, it is natural to require that
λ_D ∈ H^{−1/2}(∂Ω_D),
where H^{−1/2}(∂Ω_D) ⊃ L^2(∂Ω_D) is the dual space of H^{1/2}(∂Ω_D) ⊂ L^2(∂Ω_D). For any λ_D ∈ H^{−1/2}(∂Ω_D) and μ ∈ H^{1/2}(∂Ω_D), we denote by ⟨λ_D, μ⟩_{H^{−1/2},H^{1/2}} the value of the linear functional λ_D on the function μ. In the case where λ_D happens to be in the dense subset L^2(∂Ω_D) of H^{−1/2}(∂Ω_D), one has simply
⟨λ_D, μ⟩_{H^{−1/2},H^{1/2}} = ∫_{∂Ω_D} λ_D μ.    (2.87)
we obtain that
∫_Ω a∇λ·∇w + ∫_Ω k′(u) λ w = 2 ∫_Ω (Z − ∇u)·∇w
    + 2 ∫_Ω (z − u) w
    + ∫_{∂Ω_D} (a∇λ − 2(Z − ∇u))·ν w    (2.94)
    + ∫_{∂Ω_N} (a∇λ − 2(Z − ∇u))·ν w
for all w ∈ H^1(Ω),
This equation is to be compared with the weak formulation (2.89) and (2.90):
– It reduces to (2.89) when the test function w is chosen such that w = 0
on ∂ΩD . Hence, the classical solution λ satisfies (2.89)
– Because λD is by hypothesis a function, formula (2.87) holds, so that
(2.95) coincides with (2.90). This shows that the classical solution λD satisfies
(2.90)
We conclude the proof by checking that any regular weak solution is a
classical solution. So let λ, λD be a regular solution of (2.89) and (2.90). Let
us choose w in (2.89) in the space D(Ω) of test functions of distributions.
This space is made of infinitely differentiable functions, which vanish over
some neighborhood of ∂Ω. Hence the integral over ∂ΩN disappears in (2.89),
which now reads, in the sense of distributions,
the Green formula (2.93), which gives (2.94) as above. Subtracting (2.94)
from (2.89) for a w which vanishes over ∂ΩD gives
0 = 2 ∫_{∂Ω_N} (z_N − u) w − ∫_{∂Ω_N} (a∇λ − 2(Z − ∇u))·ν w    (2.97)
for all w ∈ H^1(Ω) with w|_{∂Ω_D} = 0.
Combining (2.98) with formula (2.87), which holds because of the smoothness
hypothesis made on λD , we obtain
∫_{∂Ω_D} λ_D w = ∫_{∂Ω_D} (a∇λ − 2(Z − ∇u))·ν w
for all w ∈ H^1(Ω).
Once again, because of hypothesis (1.57), when w spans H 1 (Ω), its trace on
∂ΩD spans the dense subspace H 1/2 (∂ΩD ) of L2 (∂ΩD ), which shows that λD
satisfies (2.92). This ends the proof of Proposition 2.7.1.
Now that the direct state u and the adjoint state λD , λ are known,
we simply have to differentiate the Lagrangian (2.88) with respect to x =
(a, k, f, g, ue) for fixed u, λD , λ to obtain the differential of the least squares
objective function J defined in (2.46)
Neumann condition g. When the adjoint state λ is regular enough so that its
level sets
C_v = {x ∈ IR^2 | u(x) = v},    u_min ≤ v ≤ u_max
are regular curves, the differential with respect to k can be written as
δJ = ∫_{u_min}^{u_max} δk(v) ( ∫_{C_v} λ ),
Remark 2.7.2 This example gives a first illustration of the difference be-
tween discrete and discretized gradient approaches mentioned in the intro-
duction of the chapter: they both require choosing a discretization of the direct state equation (2.44) and of the objective function (2.46), for example,
the ones described in Sects. 2.6.1 and 2.6.2. But once this is done, the dis-
crete adjoint equations (2.74) and (2.75) and the discrete derivative formulas
(2.77) follow unambiguously, whereas determination of the discretized adjoint
equations and gradient formula would require further to discretize the adjoint
equations (2.91) and (2.92) and the derivative formulas (2.99). There are
usually many different ways to do this, so that there is no guarantee that the
discretized adjoint equations will coincide with the discrete adjoint equations
(2.74) and (2.75), which are the only ones that lead to the exact gradient of
Jh . For the problem under consideration, it is reasonable to think that a sea-
soned numerical analyst would choose the discrete adjoint equation (2.74) as
an approximation to (2.91), but it is most unlikely that he would have cho-
sen the intricate equation (2.75) as an approximation to the simple equation
(2.92), in which case the discretized gradient will be only an approximation
to the exact gradient of the discrete objective function.
where t ↦ u(t) ∈ IR^m is the (infinite dimensional) state variable, and where the parameter vector a ∈ IR^n and the initial data u_0 ∈ IR^m are to be estimated.
Consider also that a measure z ∈ IR^m of the state u(T) at a given time T is available for this (observation operator: u ↦ u(T)). The least squares objective function is then
J(a, u_0) = (1/2) ‖z − u(T)‖_{IR^m}^2.    (2.101)
It is now an exercise to determine the continuous adjoint state and derivative formulas as we did for the elliptic problem in Sect. 2.7 (the Green formula is replaced by integration by parts). To compute the gradient with respect to the initial condition u_0, one decides first not to include the initial condition in an affine state-space, but rather to consider it as a state equation (this corresponds exactly to the Option 1 choice for the state space Y in Sect. 2.6). The starting point is hence the following Lagrangian:
L(a, u_0, u, λ, λ_0) = (1/2) ‖z − u(T)‖_{IR^m}^2
    + ∫_0^T (f(u(t), a) − du/dt) · λ    (2.102)
    + (u_0 − u(0)) · λ_0,
h^{k+1/2} = t^{k+1} − t^k.
Then we can decide, for example, to replace (2.100) by the discrete state equation
(u^{k+1} − u^k) / h^{k+1/2} = f(u^{k+θ}, a),    k = 1 . . . K,    u^0 = u_0,    (2.107)
for some θ ∈ [0, 1], where we have used the convenient notation
u_h = (u^k, k = 0 . . . K) ∈ IR^{(K+1)m}
for the solution of (2.107), which is supposed to exist. The vector u_h is the state and IR^{(K+1)m} the state-space of the system.
The observation operator associated with final time data is M : u_h ↦ u^K, and we can decide, for example, to approximate the objective function J by
J_h(a, u_0) = (1/2) ‖z − u^K‖_{IR^m}^2.
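A minimal sketch (Python/NumPy) of the forward θ-scheme (2.107) and of the discrete objective J_h, on a hypothetical scalar equation du/dt = −a u, with u^{k+θ} interpreted as the interpolate (1 − θ)u^k + θu^{k+1} and the implicit step solved by a few fixed-point iterations (all of these choices are made only for illustration):

    import numpy as np

    def f(u, a):                       # hypothetical right-hand side: du/dt = -a*u
        return -a * u

    def theta_scheme(a, u0, t, theta=0.5, picard_iters=20):
        """Solve (2.107): (u^{k+1}-u^k)/h^{k+1/2} = f(u^{k+theta}, a)."""
        u = [np.asarray(u0, dtype=float)]
        for k in range(len(t) - 1):
            h = t[k + 1] - t[k]
            u_next = u[k].copy()                     # fixed-point (Picard) iterations
            for _ in range(picard_iters):
                u_mid = (1.0 - theta) * u[k] + theta * u_next
                u_next = u[k] + h * f(u_mid, a)
            u.append(u_next)
        return u                                     # u_h = (u^0, ..., u^K)

    def J_h(a, u0, t, z, theta=0.5):
        uK = theta_scheme(a, u0, t, theta)[-1]       # final state u^K
        return 0.5 * np.sum((z - uK) ** 2)

    t = np.linspace(0.0, 1.0, 11)
    print(J_h(a=2.0, u0=[1.0], t=t, z=np.array([np.exp(-2.0)])))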
At this point, we have the choice between
– Either take advantage of the fact that we have already computed the
continuous adjoint, and go for the discretized adjoint approach
– Or use the continuous adjoint only as a guideline for the choice of the
scalar products, and go for the discrete adjoint approach
We investigate now these two possibilities.
λ_h = (λ^k, k = K . . . 0) ∈ IR^{(K+1)m}
It is then also natural to replace (2.104) for the multiplier λ0 associated with
the initial condition by
λ0,h = λ0 . (2.109)
– One has to discretize the formulas (2.105) and (2.106) for the gradients of J. Replacing the integral by the trapezoidal rule, we obtain
(∇_h)_a J_h = ∑_{k=1}^K h^{k−1/2} [ ((∂f/∂a)(u^{k−1}, a))^T λ^{k−1} + ((∂f/∂a)(u^k, a))^T λ^k ]    (2.110)
(∇_h)_{u_0} J_h = λ_{0,h} = λ^0.    (2.111)
which is of the desired form. Similarly, we see that δu^0 is missing in the C term, but present in E, so we define λ^{−θ} ∈ IR^m by
λ^{−θ} = λ_{0,h},    (2.115)
and rewrite the C + E terms as
C + E = − ∑_{k=0}^{K−1} δu^{k+1} · λ^{k+1−θ} − δu^0 · λ^{−θ}
        + θ ∑_{k=0}^{K} h^{k−1/2} (∂f/∂u)(u^{k−1+θ}, a) δu^k · λ^{k−θ}.
If we define h^k and θ^k for k = 0 . . . K by
h^k = t^{k+1−θ} − t^{k−θ} = (1 − θ) h^{k+1/2} + θ h^{k−1/2},
h^k θ^k = θ h^{k−1/2},    h^k (1 − θ^k) = (1 − θ) h^{k+1/2},
we see that
t^k = (1 − θ^k) t^{k−θ} + θ^k t^{k+1−θ},
and we can rewrite B as follows:
B = ∑_{k=0}^K h^k (1 − θ^k) ((∂f/∂u)(u^{k+θ}, a))^T λ^{k+1−θ} · δu^k
  + ∑_{k=0}^K h^k θ^k ((∂f/∂u)(u^{k−1+θ}, a))^T λ^{k−θ} · δu^k,
which is of the desired form.
The final discrete adjoint equations for the determination of λ_h = (λ^{1−θ}, . . . , λ^{K−θ}) defined in (2.112), λ^{K+1−θ} and λ^{−θ} defined in (2.114) and
but fixed time steps determined by the time stepping procedure for the cur-
rent parameter value. Whenever the objective function has to be evaluated
for a different value of the parameters, for example, during the line search,
the time steps are determined again by the time stepping procedure.
u^0 = u_0,    (2.121)
where a ∈ IR^n is a vector of parameters, and u_0 ∈ IR^m is the initial value. For the sake of simplicity, we shall suppose here that u_0 is known, and that a is the parameter vector to be estimated, so that the parameter space is E = IR^n.
But there is no difficulty in handling the case where u0 is unknown (see
Option 1 in Sect. 2.6).
When (2.120) is nonlinear with respect to uk , it has to be solved only ap-
proximately on the computer using an iterative scheme (a Newton algorithm,
for example). Such algorithms are governed by tests, and hence are not dif-
ferentiable. So it is practically impossible to include them in the definition
of the forward map, and, for the sake of discrete gradient computations, one
usually considers that (2.120) is the discrete equation, and that it is solved
exactly by the computer. This is acceptable if the equation is solved precisely enough.
As for the observation, let us consider, for example, the case where, at each “time” index k, a measurement z^k of
v^k = M^k(u^k)    (2.122)
is available, where
M^k : IR^m → IR^{m_k}.
Adjoint Approach
We follow the step-by-step approach of Sect. 2.4:
Step 0: The forward map ϕ is already defined in (2.123), and, as we want to
compute the gradient of J defined in (2.124), the objective function G(a, v) is
G(a, v) = (1/2) ∑_{k=1}^K ‖z^k − v^k‖_{IR^{m_k}}^2    (independent of a).    (2.127)
where
a ∈ IR^n,
u = (u^0, u^1, . . . , u^K) ∈ Y,
λ = (λ^{1/2}, . . . , λ^{K−1/2}) ∈ IR^p.
Step 3: Differentiation of the Lagrangian with respect to the state u gives
the variational form of the adjoint equation:
(∂L/∂u) δu = A + B + C = 0    ∀δu = (0, δu^1, . . . , δu^K) ∈ δY,
where
A = ∑_{k=1}^K (M^k(u^k) − z^k) · (M^k)′(u^k) δu^k,
B = ∑_{k=1}^K (∂E^{k−1/2}/∂u^k)(a, u^k, u^{k−1}) δu^k · λ^{k−1/2},
C = ∑_{k=1}^K (∂E^{k−1/2}/∂u^{k−1})(a, u^k, u^{k−1}) δu^{k−1} · λ^{k−1/2}.
Equating to zero successively the coefficients of δu1 . . . δuK gives the compu-
tational form of the adjoint equations (all partial derivatives of E k−1/2 are
evaluated at a, uk , uk−1, those of E k+1/2 are evaluated at a, uk+1, uk )
(∂E^{k−1/2}/∂u^k)^T λ^{k−1/2} + (∂E^{k+1/2}/∂u^k)^T λ^{k+1/2}    (2.130)
    + ((M^k)′(u^k))^T (M^k(u^k) − z^k) = 0    for k = K . . . 1,
which can be solved backwards starting from the final condition
λ^{K+1/2} = 0.    (2.131)
Step 4: We differentiate now the Lagrangian (2.129) with respect to the
parameter vector a (partial derivatives of E k−1/2 are evaluated at a, uk , uk−1):
δJ = (∂L/∂a) δa = ∑_{k=1}^K (∂E^{k−1/2}/∂a) δa · λ^{k−1/2},
and pick the coefficient of δa_j, which gives the gradient equations
∂J/∂a_j = ∑_{k=1}^K (∂E^{k−1/2}/∂a_j) · λ^{k−1/2},    j = 1 . . . n.
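The backward recursion (2.130) and (2.131) and the gradient formula above translate almost literally into code; a minimal sketch (Python/NumPy), in which the partial derivatives of E^{k−1/2} and of M^k are supplied as dense matrices — a hypothetical interface chosen only for illustration:

    import numpy as np

    def adjoint_gradient_marching(dE_du_k, dE_du_km1, dE_da, dM, residuals):
        """Gradient of J by the adjoint approach for the marching problem.

        For k = 1..K (Python list index k-1):
          dE_du_k[k-1]   : dE^{k-1/2}/du^k     (m x m)
          dE_du_km1[k-1] : dE^{k-1/2}/du^{k-1} (m x m)
          dE_da[k-1]     : dE^{k-1/2}/da       (m x n)
          dM[k-1]        : (M^k)'(u^k)         (m_k x m)
          residuals[k-1] : M^k(u^k) - z^k      (m_k,)
        """
        K = len(dE_du_k)
        n = dE_da[0].shape[1]
        grad = np.zeros(n)
        lam_next = np.zeros(dE_du_k[-1].shape[0])      # lambda^{K+1/2} = 0, (2.131)
        for k in range(K, 0, -1):                      # backward sweep, (2.130)
            rhs = -(dM[k - 1].T @ residuals[k - 1])
            if k < K:                                  # dE^{k+1/2}/du^k is step k+1's
                rhs -= dE_du_km1[k].T @ lam_next       # derivative w.r.t. its previous state
            lam = np.linalg.solve(dE_du_k[k - 1].T, rhs)
            grad += dE_da[k - 1].T @ lam               # accumulate dJ/da, one term per step
            lam_next = lam
        return grad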
The adjoint approach is the method of choice for the sensitivity analysis study, which is to be performed before the inversion itself to find out the number of parameters that can be retrieved from the data (Sect. 3.2 of Chap. 3): it allows one to compute the Jacobian D with respect to the (usually large) number of simulation parameters, before any choice is made concerning the parameterization and the optimization parameters. This is possible because in the adjoint approach the Jacobian is computed row by row, with a cost proportional to the number of observations, but independent of the number of parameters (of course the function G(a, v) to be used there is no longer (2.127), but rather (2.12)).
We conclude this example with some implementation remarks.
First, given a forward modeling code that solves the marching problem
(2.120) and (2.121), the matrix ∂E k−1/2 /∂uk is necessarily formed somewhere
in the code to perform the corresponding Newton iterations. But this ma-
trix is also the matrix of the linearized equation (2.125) in the sensitivity
approach. Hence the additional work to implement the sensitivity approach
consists in forming the matrices ∂E k−1/2 /∂uk−1 and ∂E k−1/2 /∂a, and in solv-
ing the same system as in Newton’s iterations, but with n different right-hand
sides. As the matrix of the system is already formed, the computational effort
for the resolution of each linearized equation (2.125) is much less than the
one required for one simulation. This approach has been used to implement
sensitivity equations in a complex nonlinear reservoir simulation code [6].
Second, comparison of the sensitivity equations (2.125) and the adjoint
equation (2.130) shows that the same matrices ∂E k−1/2 /∂uk , ∂E k−1/2 /∂uk−1 ,
and ∂E k−1/2 /∂a appear at both places. So when given a modeling code with
sensitivity equations capabilities, one can consider developing an adjoint code
by identifying these matrices in the code, and recombining them into the
desired adjoint equation. This approach has been used for the same reservoir
simulation code as above in [79].
Chapter 3
Choosing a Parameterization
should allow one to explain the data up to the noise level and should not lead to overparameterization (the dimension of the optimization vector should be smaller than the number of retrievable parameters determined above). We evaluate in this chapter four parameterizations against these objectives: closed form formula (Sect. 3.4), singular vector basis (Sect. 3.5), multiscale approximation (Sect. 3.6), and adaptive parameterization (Sect. 3.7).
– Finally, we discuss in Sect. 3.8 some implementation issues:
• How to organize the inversion code to allow an easy experimentation with
various choices of optimization parameters (Sect. 3.8.1)
• How to compute the gradient with respect to optimization parameters once
the gradient with respect to numerical parameters is known (Sect. 3.8.2)
• And, in Sect. 3.9, we describe the maximum projected curvature (MPC) de-
scent step, which is specially designed to enhance the performance of descent
algorithms used for the resolution of nonlinear least squares problems.
3.1 Calibration
The unknown parameters correspond often to different physical quantities:
for example, hydraulic conductivities, porosities, acoustic impedances, etc.
Similarly, the available observations can involve temperatures, pressures, con-
centrations, etc.
Before trying to recover (part of) the parameters from the data, it is
necessary to eliminate the poor conditioning that can be caused by the dif-
ferent orders of magnitude associated with the different physical quantities
in the parameter and the data vectors. So one first has to calibrate the finite
dimensional inverse problem (2.5) by using dimensionless parameters and
data.
x_i = (1/√n) X_i / X_{i,ref},    i = 1 . . . n,
where the coefficient √n ensures that
‖x‖_{IR^n} = ( ∑_{i=1}^n x_i^2 )^{1/2} = mean value of X/X_ref.
With the above calibrations, the Euclidean scalar product in IR^n corresponds, up to the multiplicative constant 1/(|Ω| X_ref^2), to the (possibly approximate) scalar product in L^2(Ω) for the physical parameters
⟨x, y⟩_{IR^n} = ∑_{i=1}^n x_i y_i = (1/(|Ω| X_ref^2)) × { ∫_Ω X_h(ξ) Y_h(ξ) dξ,  or  I_Ω(X_h Y_h) }.    (3.7)
z_j = (1/√q) Z_j / ΔZ_j,    j = 1 . . . q.
This amounts to using ΔZ_j as the unit to measure the discrepancy between the output of the model and the corresponding data (as in (1.8), for example). So if δZ ∈ IR^q is a vector of data perturbations such that |δZ_j| ≤ ΔZ_j for j = 1 . . . q, the vector δz of corresponding calibrated data perturbations satisfies
|δz_j| ≤ 1/√q for j = 1 . . . q, and hence ‖δz‖ ≤ Δz = 1,
3.1.3 Conclusion
For the rest of the chapter, the parameter x and data z that appear in the
NLS inverse problem (2.5) will be the calibrated parameter and data defined
earlier, with the parameter space E and data space F equipped with the
Euclidean norms
n
1/2 q
1/2
xE = x = x2i , vF = v = vj2 ,
i=1 j=1
Remark 3.1.2 The calibration could have been taken into account as well
by introducing weighted scalar products and norms on the parameter space
IR^n and the data space IR^q. But this approach is error prone from a practical point of view, as the transpose of a matrix is no longer simply obtained by exchanging rows and columns! For example, if ⟨·, ·⟩_Λ is the scalar product on IR^q associated with a symmetric positive definite matrix Λ, the transpose M^T_Λ, for this scalar product, of a q×q matrix M is Λ^{−1} M^T Λ, where M^T is the usual transpose. This can easily be overlooked in the numerical calculations.
This approach, however, cannot be avoided in the case where a full (i.e.,
not diagonal) covariance matrix of the data is available – but all the material
below can be easily adapted.
on the data, and on the singular value decomposition of the Jacobian ϕ′(x). There is a large amount of literature on this subject, which usually takes into account the statistical properties of the errors on the data, see for instance [74]. We give below an elementary approach based on uncertainty analysis, which is sufficient in the many cases where the statistical information on the data is lacking or scarce.
So one chooses a nominal parameter value x_nom ∈ C, and replaces the forward map ϕ by its linearization ϕ^lin_nom:
∀δx ∈ IR^n,    ϕ^lin_nom(δx) = ϕ(x_nom) + ϕ′(x_nom) δx.
μ_1 ≥ μ_2 ≥ · · · ≥ μ_r > 0,    μ_{r+1} = μ_{r+2} = · · · = μ_n = 0.    (3.10)
where δx_i and δz_i are the coefficients of δx and δz on the singular bases of the parameter and data spaces, with δx = ∑_{i=1}^n δx_i e_i.
μ_i |δx_i| ≤ Δz,    i = 1 . . . r,
Remark 3.2.1 It can happen that the computation of the Jacobian is not feasible when there are large numbers of parameters and data, so that n_r cannot be determined before the inversion is actually performed. However, the determination of the gradient ∇J(x) is always feasible by the adjoint state. It is advisable in this case to use an adaptive parameterization (Sect. 3.7 below), which can use the information given by ∇J(x) to take care automatically of the tradeoff between data fitting and overparameterization.
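A minimal sketch (Python/NumPy) of the analysis of this section: compute the singular value decomposition of the Jacobian of the calibrated problem at x_nom, and count the singular values that lie above the calibrated uncertainty level Δz (taking a unit tolerance on the calibrated parameter directions — an assumption made only for this illustration):

    import numpy as np

    def retrievable_parameters(jacobian, dz=1.0, tol=1.0):
        """Count parameter directions retrievable from the data at the nominal point.

        jacobian : Jacobian phi'(x_nom) of the *calibrated* problem (q x n)
        dz       : calibrated data uncertainty (= 1 with the calibration of Sect. 3.1)
        tol      : acceptable uncertainty on a calibrated parameter direction
        Returns (n_r, singular values, parameter-space singular vectors e_i).
        """
        U, mu, Vt = np.linalg.svd(jacobian, full_matrices=False)
        # uncertainty propagated to the i-th singular direction: |dx_i| <= dz / mu_i
        n_r = int(np.sum(mu > dz / tol))
        return n_r, mu, Vt.T

    # hypothetical Jacobian with fast-decaying singular values
    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 8)) @ np.diag([10.0, 5.0, 2.0, 1.5, 0.5, 0.1, 0.01, 0.001])
    n_r, mu, E = retrievable_parameters(D, dz=1.0)
    print(n_r, np.round(mu, 3))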
• The optimization parameters xopt are the ones that are input to the opti-
mization code, and hence which are actually estimated by minimization
of the objective function. The vector xopt is used to parameterize the
simulation parameter xsim (c.f. (1.16) of Sect. 1.3.4):
[Figure: singular values μ_i plotted against the index i, together with the uncertainty level Δz and the nominal data z_nom; in the example shown, n_r = 4 of the r = 8 nonzero singular values lie above this level.]
Remark 3.3.1 Even in the case where all parameters in xsim are retrievable,
the choice xopt = xsim does not necessarily produce the best conditioning of the
optimization problem, and one can be led to choose a parameter xopt distinct
from xsim , see adapted multiscale basis in Sect. 3.6.
the $n_r$ coefficients of $x_{sim}$ on the singular basis vectors associated with singular values that are above the noise level? With the notations of Sect. 3.2, this amounts to choosing for optimization parameters and parameterization map:
$n_{opt} = n_r \le n_{sim},$
$x_{opt,i} = x^{SVD}_i \stackrel{\rm def}{=} \langle x_{sim}, e_i\rangle_{\mathbb{R}^{n_{sim}}}, \quad i = 1 \ldots n_r,$   (3.17)
$x_{sim} = \psi(x_{opt}) = \sum_{i=1}^{n_r} x_{opt,i}\, e_i + \sum_{i=n_r+1}^{n_{sim}} (x^{SVD}_i)_{nom}\, e_i.$
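The parameterization map (3.17) can be sketched as follows (assumed helper names, not part of the book; the columns of E are taken to be the full orthonormal singular basis $e_1 \ldots e_{n_{sim}}$):

```python
# Sketch: the SVD parameterization psi of (3.17), keeping the n_r leading
# singular directions as optimization parameters and freezing the remaining
# components at their nominal values.
import numpy as np

def make_svd_parameterization(E, x_sim_nom, n_r):
    """E: (n_sim, n_sim) orthonormal matrix whose columns are the e_i."""
    x_svd_nom = E.T @ x_sim_nom                 # coefficients on the singular basis

    def to_opt(x_sim):                          # x_opt,i = <x_sim, e_i>, i = 1..n_r
        return (E.T @ x_sim)[:n_r]

    def psi(x_opt):                             # x_sim = sum x_opt,i e_i + frozen tail
        coeffs = x_svd_nom.copy()
        coeffs[:n_r] = x_opt
        return E @ coeffs

    return to_opt, psi
```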
There is no need, with this choice, to redo the analysis of Sect. 3.2 with the parameters $x_{opt}$, as by construction they are all above the noise level!
This parameterization is interesting for the analysis of the problem: the directions of the singular vectors $\epsilon_1 \ldots \epsilon_{n_r}$ indicate which part of the data space actually carries useful information on the parameter x. Similarly, the singular vectors $e_1 \ldots e_{n_r}$ indicate which combinations of the simulation parameters can be best estimated.
These nice properties are counterbalanced by the fact that the correspond-
ing physical interpretation is not always easy (as, e.g., in [36] for a large size
geophysical inverse problem). A nice exception is when the singular vectors
associated with the largest singular values happen to point in the direction of
axes of the simulation parameter space IRnsim . It is then possible to order the
simulation parameters according to decreasing singular values, so that the
nr first simulation parameters are retrievable, each with its own uncertainty
level, with the remaining ones having to be fixed (at their nominal values, for
example). This is the case for the inversion of the Knott–Zoeppritz equations of Sect. 1.1, see [50].
Another difficulty comes from the fact that the Jacobian – and hence its singular vectors – depends on the choice of the nominal parameter $x^{nom}_{sim}$ where it is evaluated. So the nice properties of the singular vectors can get lost when the current parameter changes during the course of the optimization, which makes them of little use for uncertainty analysis.
For all these reasons, the choice (3.17) is seldom used to define the optimization parameters – but performing an SVD at a nominal value is nevertheless useful to estimate nr and gain insight into the problem.
where $\chi_{K_i}$ denotes the characteristic function of the $i$th cell $K_i$ of the simulation mesh $T_h$.
2. For a continuous piecewise linear approximation, $x_i$ is given by (3.5) and $e_i$ by
$e_i = X_{ref}\left(\dfrac{|\Omega|}{\alpha_{M_i}}\right)^{1/2}\omega_{M_i}, \quad i = 1 \cdots n_{sim},$   (3.20)
$|e_i|^2_{h,L^2(\Omega)} = X_{ref}^2\,|\Omega| = 2\,|e_i|^2_{L^2(\Omega)},$
the optimization at scale k, until the data are explained up to the uncertainty
level
set $k = 0$ and $\hat{x}^{-1}_{opt} = x^{init}_{opt} \in C$,
find $\hat{x}^{k}_{opt} = \arg\min_{x_{opt} \in C^k} J(x_{opt})$, starting from $\hat{x}^{k-1}_{opt}$,   (3.22)
increment $k$ until the fit is down to the uncertainty level.
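The scale-by-scale loop (3.22) can be sketched as follows (a sketch only; `minimize_at_scale` and `prolong` are assumed placeholders for the optimizer at scale k and for the embedding of the scale-k parameters into the next, finer space):

```python
# Sketch of the scale-by-scale resolution (3.22): solve coarse scales first,
# warm-starting each finer scale, until the fit reaches the uncertainty level.
def multiscale_inversion(J, x_init, n_scales, noise_level,
                         minimize_at_scale, prolong):
    x_hat = x_init                               # \hat{x}^{-1}_opt
    for k in range(n_scales):
        x_hat = minimize_at_scale(J, x_hat, k)   # argmin over C^k, warm start
        if J(x_hat) <= noise_level:              # data explained up to noise level
            break
        x_hat = prolong(x_hat, k)                # initial guess at scale k+1
    return x_hat
```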
In short, “nicely nonlinear” problems are more sensitive but less nonlin-
ear at coarse scales, and less sensitive but more nonlinear at fine scales.
Hence the scale-by-scale resolution (3.22), which solves the coarse scale
problem first, can be expected to perform well.
An example of “nicely nonlinear” problem (see [57, 58]) is the estima-
tion of a one-dimensional diffusion parameter, defined in Sect. 1.4 and
analyzed for wellposedness in Sect. 4.8. One can also conjecture that
the estimation of the two-dimensional diffusion parameter (defined in
Sect. 1.6 and analyzed in Sects. 4.9 and 5.4) is “nicely nonlinear,” as
suggested in [30] and in Remark 4.9.5.
At the opposite end of the spectrum, one finds the problem of estimating the sound velocity from wavefield measurements: large scale perturbations of the sound velocity produce phase shifts (i.e., translations in time) in the observed wavefield, which correspond to a high curvature because of the high frequency content of the geophysical signals – a very difficult problem, which cannot benefit from the multiscale approach described here [24].
[Figure 3.4: Optimization paths in C (coordinates $x_1, x_2$) and their images in the data space (coordinates $z_1, z_2, z_3$) for the local (dashed line) and the multiscale (full line) resolution, showing $\varphi(C^0)$, $\varphi(C^1 = C)$, $\varphi(x^{init})$, $\varphi(\hat{x}^0_{opt})$, $\varphi(\hat{x}^1_{opt})$, and $\varphi(\hat{x}_{sim})$. The thin dashed lines correspond to one level line of the objective function.]
$e^k_i = X_{ref}\left(\dfrac{|\Omega|}{\alpha_{M^k_i}}\right)^{1/2}\omega_{M^k_i}, \quad i = 1 \cdots n^k,$
$|e^k_i|^2_{k,L^2(\Omega)} = X_{ref}^2\,|\Omega| = 2\,|e^k_i|^2_{L^2(\Omega)}, \quad i = 1 \cdots n^k,$   (3.24)
where now $\omega_{M^k_i}$ is the function of $E^k$ with value 1 at the $i$th
$\tilde{\epsilon}^k_j = \epsilon^k_j / s^k_r, \quad k = 1 \ldots K,$   (3.28)
where the coefficient $1/s^k_r$ tries to compensate for the loss of sensitivity of the forward model in the directions of finer details. Using this adapted multiscale basis tends to spherize the level lines of the objective function at scale k, and hence speeds up the resolution.
If one does not want to estimate $s^k_r$ numerically by (3.27), one can experiment with $s^k_r = (h_k/h_0)^\alpha$ for various $\alpha > 0$, where $h_k$ is the size of the cells of $T^k$.
out its way in practice in the much smaller space $W^k$, as the coefficients on $E^{k-1}$ are already almost at their optimal value.
Conversely, when the multiscale basis is associated with an oblique decomposition, allowing the parameter to have a nonzero component on a detail direction $\epsilon^k_j$ of $W^k$ does change its mean values at scale k − 1. Hence it will be necessary, during the course of the optimization at scale k, to change coordinates both in $W^k$ and in $E^{k-1}$ to maintain the correct mean values determined at scale k − 1.
Multiscale bases associated with orthogonal decompositions should hence be preferred, as they tend to ensure a faster resolution of the optimization problem at each scale k, by concentrating on the smaller problem of determining the details in $W^k$. But – with the exception of the Haar basis, which is limited to piecewise constant functions on rectangular meshes – they are difficult to construct, as the basis vectors $\epsilon^k_j$ of $W^k$ cannot in general be determined explicitly over the finite domain Ω. There exists a large amount of literature on wavelets [48], which are finite-support orthogonal multiscale bases – but they are defined over an infinite domain, and cannot be used directly on bounded domains. Another approach is to use a numerical orthogonalization procedure such as the Gram–Schmidt procedure, but this is computationally expensive, and the finite support property of the basis functions is lost.
This is the case, for example, of the Haar basis (Remark 3.6.4 below), or of the adapted basis:
$\epsilon^k_j = s^k_r\,\tilde{\epsilon}^k_j, \quad k = 1 \ldots K.$
When such a basis is used, the optimization algorithm works first in the subspace of the most sensitive coordinates, that is, those corresponding to coarse scales, thus helping to overcome the problem of stationary points. It is only when these coordinates have been adjusted that the algorithm feels the need to adjust the coordinates for the finer scales, which have been made much less sensitive. The graph of the objective function as a function of the iteration number then has a staircase shape, where going down one stair corresponds to the algorithm taking up the adjustment of the next finer scale.
[Figure: a cell L split into four rectangles A, B, C, D, and the sign patterns (±a, ±b, ±c, ±d) of the three detail basis functions $\epsilon^k_{L,lr}$, $\epsilon^k_{L,tb}$, $\epsilon^k_{L,sp}$.]
and
$|\epsilon^k_{L,lr}|_{L^2(L)} = |\epsilon^k_{L,tb}|_{L^2(L)} = |\epsilon^k_{L,sp}|_{L^2(L)} = m\,\dfrac{|L|}{\bar{\ell}^{1/2}},$
where $\bar{\ell}$ is equal to four times the harmonic mean of the areas of rectangles A, B, C, and D ($\bar{\ell} = |L|$ if the kth refinement is regular!):
$\dfrac{4}{\bar{\ell}} = \dfrac{1}{4}\left(\dfrac{1}{|A|} + \dfrac{1}{|B|} + \dfrac{1}{|C|} + \dfrac{1}{|D|}\right).$
The functions $\epsilon^k_{L,lr}$, $\epsilon^k_{L,tb}$, $\epsilon^k_{L,sp}$ are linearly independent and orthogonal to $E^{k-1}$. So the orthogonal supplementary space $W^k$ to $E^{k-1}$ in $E^k$ is
$W^k = \mathrm{span}\{\epsilon^k_{L,lr},\ \epsilon^k_{L,tb},\ \epsilon^k_{L,sp},\ L \in T^{k-1}\}.$
The basis functions $\epsilon^k_{L,lr}$, $\epsilon^k_{L,tb}$, $\epsilon^k_{L,sp}$ of $W^k$ associated to the same cell L are not orthogonal in general (except for a regular refinement, see Remark 3.6.4). But basis functions associated to different cells (at the same or different scales) are orthogonal. This basis of $W^k$ can be normalized independently of the scale by a proper choice of m over each cell L:
$m = X_{ref}\,\dfrac{\bar{\ell}^{1/2}\,|\Omega|^{1/2}}{|L|}$ implies (compare with (3.23)):
$|\epsilon^k_{L,lr}|_{L^2(L)} = |\epsilon^k_{L,tb}|_{L^2(L)} = |\epsilon^k_{L,sp}|_{L^2(L)} = X_{ref}\,|\Omega|^{1/2}.$
We consider now the case where the parameter space $E^k$ at scale k is made of continuous piecewise linear (on triangles) or bilinear (on rectangles) functions, over a mesh $T^k$ obtained by k refinements of a background mesh $T^0$ made of triangles and rectangles. It is supposed that the initial mesh and its subsequent refinements are done in such a way that all meshes $T^k$ are regular meshes in the sense of finite elements (the intersection of two cells of such a mesh is either empty, or an edge, or a vertex). Let
$V^k = \{M \ \text{such that M is a node of } T^k \text{ but not of } T^{k-1}\}.$
The space E and its subspace $\tilde{E}$ are equipped with the Euclidean scalar product in $\mathbb{R}^{n_{sim}}$, which approximates, up to a multiplicative constant, the $L^2$-scalar product when x is a function (see Sect. 3.1.1). The current solution and value of the problem are (we ignore the constraints...)
When the minimum value $\hat{J}$ is not satisfactory (e.g., if it is above the noise level in the case of least squares), the question arises of which degree of freedom to
$\lambda_{NL} \stackrel{\rm def}{=} \Delta J = \hat{J} - \tilde{J},$   (3.33)
where $\tilde{J}$ is the new value of the problem, computed with the full nonlinear model:
$\tilde{J} = J(\tilde{x} + \tilde{y}\,\epsilon), \qquad (\tilde{x}, \tilde{y}) = \arg\min_{x \in \tilde{E},\ y \in \mathbb{R}} J(x + y\,\epsilon).$   (3.34)
This is the most precise indicator, as it ensures by definition that the $\epsilon$ associated to the largest $\lambda_{NL}$ produces the strongest decrease of the objective function!
But it is also the most computationally intensive one, as its computation requires the solution of the full nonlinear optimization problem. Hence it is impossible to use $\lambda_{NL} = \Delta J$ as an indicator to choose among a large number of degrees of freedom.
2. Gauss–Newton indicators: In the case of nonlinear least squares problems, where $J(x) = \frac{1}{2}\,\|\varphi(x) - z\|^2$, one can define
$\lambda_{GN} \stackrel{\rm def}{=} \Delta J^{GN} = \hat{J} - \tilde{J}^{GN},$   (3.35)
where $\tilde{J}^{GN}$ is the new value of the problem, computed with the Gauss–Newton approximation to the forward map $\varphi$:
$\lambda_{NL} = \Delta J = J^*_{\tilde{y}} - J^*_0 = \dfrac{dJ^*_\tau}{d\tau}\Big|_{\tau=0}\,\tilde{y} + \ldots$   (3.39)
In the absence of information on $\tilde{y}$, one can take a chance and choose $\epsilon$ according to the modulus of $dJ^*_\tau/d\tau$ in (3.39). This choice is comforted by the remark that, because of (3.31), perturbing $\hat{x}$ by a given amount y in the direction $\epsilon$ will produce a large decrease, at first order, of the optimal objective function for those $\epsilon$'s that exhibit $dJ^*_\tau/d\tau$'s with large modulus. Hence one is led to the . . .
But $dx^*_\tau/d\tau \in \tilde{E}$ and $\langle\nabla J(\hat{x}), \delta x\rangle_E = 0\ \ \forall \delta x \in \tilde{E}$ (cf. the definition (3.30) of $\hat{x}$), so that the first term vanishes in the right-hand side of (3.42) and (3.41) is proved.
Hence the evaluation of λ for a tentative degree of freedom $\epsilon \in \mathcal{T}$ requires only the scalar product of $\epsilon$, in the simulation parameter space $\mathbb{R}^{n_{sim}}$, with the known vector $\nabla J(\hat{x})$! This makes it easy to test a very large number of tentative degrees of freedom before making up one's mind for the choice of a new one.
Remark 3.7.2 One could argue that, because of the potentially large
dimension nsim of x, the gradient ∇J(x̂) can be computationally unaf-
fordable. However, when the implementation of the inversion is based,
according to the recommendations of Sect. 3.8.1 below, on the adjoint
state approach of Sect. 2.3, the gradient ∇J(x̂) with respect to simu-
lation parameters is available as a byproduct of the computation of x̂
(search for ∇xsim J in the lower right corner of Fig. 3.8).
whose Lagrangian is
when E = IRnsim and F = IRq are equipped with the Euclidean scalar prod-
ucts. Because the principle of adaptive parameterization is to stop adding
Proposition 3.7.4 Let notation (3.43) and hypothesis (3.44) hold. Then the Gauss–Newton and first order indicators are related by
$\lambda_{GN} = \dfrac{\lambda^2}{2}\,\|\Phi\,\eta(\epsilon)\|^2_F,$   (3.50)
where $\eta(\epsilon)$ is defined by (3.48) and (3.49).
This confirms that the first order indicator carries one part of the information on the variation of $\lambda_{GN}$ with the tentative degree of freedom $\epsilon$.
It is possible to make explicit the calculations needed for the evaluation of the coefficient $\|\Phi\,\eta(\epsilon)\|^2_F$. We return for this to the optimization variables. The dimension of the current optimization space $\tilde{E}$ is $n_{opt}$, and let
$e_1 \ldots e_{n_{opt}} \in \tilde{E} \subset E = \mathbb{R}^{n_{sim}}$
be the current optimization basis, and $\Psi(\epsilon)$ be the $n_{sim} \times (n_{opt}+1)$ parameterization matrix (3.16) for the tentative optimization space associated to $\epsilon$.
This rewrites
that is,
$\Psi(\epsilon)^T\,\Phi^T\Phi\,\Psi(\epsilon)\,\eta_{opt} = (\underbrace{0 \cdots 0}_{n_{opt}\ \text{times}}\ \ 1)^T.$
This formula is derived in [11] starting from the Lagrange multipliers defi-
nition of first order indicators. It will be useful in Sect. 3.7.3 when applying
refinement indicators to segmentation of black and white images.
• And results for the case of vector valued parameters can be found in
[44, 11]
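Under the reconstruction of Proposition 3.7.4 used above (stated here as an assumption), the evaluation of the coefficient reduces to one small linear solve per candidate, which the following sketch illustrates:

```python
# Sketch: solve Psi(eps)^T Phi^T Phi Psi(eps) eta_opt = (0,...,0,1)^T and use
# the resulting coefficient to relate lambda_GN to the first order indicator.
import numpy as np

def gauss_newton_indicator(Phi, Psi, eps, lambda_eps):
    Psi_eps = np.column_stack([Psi, eps])            # n_sim x (n_opt + 1)
    M = Phi @ Psi_eps
    rhs = np.zeros(Psi_eps.shape[1]); rhs[-1] = 1.0
    eta_opt = np.linalg.solve(M.T @ M, rhs)
    coeff = float(np.dot(M @ eta_opt, M @ eta_opt))  # ||Phi Psi(eps) eta_opt||^2
    return 0.5 * lambda_eps**2 * coeff               # assumed form of (3.50)
```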
Let $T_h$ be the simulation mesh covering Ω, made of $n_{sim}$ cells K. We approximate the function $a : \Omega \to \mathbb{R}$ by a function $a_h$ which takes a constant value $a_K$ over each cell K of $T_h$, and choose as simulation space $E = \mathbb{R}^{n_{sim}}$, and as simulation parameter:
We consider here the case where the additional information that is added to regularize the problem is that a is constant over a number $n_{opt}$ of subdomains (much) smaller than the number $n_{sim}$ of elements K (this is “regularization by parameterization” as described in Sect. 1.3.4, see also [38]). We call such a partition T of $T_h$ into $n_{opt}$ subsets $T_j$ a zonation, each one made of cells K of $T_h$ (with the notations of Sect. 3.6.2, $T_h$ is a sub-mesh of T), and we associate to a function $a : \Omega \to \mathbb{R}$, which takes a constant value $a_j$ on each zone $T_j$, the vector $a_{opt} \in \mathbb{R}^{n_{opt}}$ of its values on each zone calibrated according to (3.1) – once again we do not adimensionalize, for simplicity:
$a_{opt} = (a_{opt,j},\ j = 1 \ldots n_{opt}) \in \mathbb{R}^{n_{opt}} \quad\text{with}\quad a_{opt,j} = \left(\dfrac{|T_j|}{|\Omega|}\right)^{1/2} a_j.$   (3.54)
time, by splitting one zone of the current zonation T k into two subzones,
thus producing a new zonation T k+1 . This ensures at each step that the new
degree of freedom is (locally) at a finer scale than the current ones, hence the
name “multiscale” given to these indicators. The zone to be split and the way
it is split are chosen according to refinement indicators. By construction of
the refinement indicators, this ensures that each new optimization problem is
likely to produce the best decrease of the objective function among all tested
degrees of freedom.
The scale-by-scale optimization is stopped, as described in Sect. 3.6.2,
at the first k for which the data are explained up to the noise level. But the parsimony with which the degrees of freedom are introduced ensures that the Jacobian of $\varphi \circ \Psi$ has full rank at all successive minimizers $\hat{a}^k_{opt}$, and hence overparameterization is avoided.
Let $T = T^k$ denote the current zonation, and Ψ and $\tilde{E}$ denote the corresponding current parameterization matrix and optimization space. We describe now the construction of $T^{k+1}$. Cutting one domain $T_j$ of T into two subdomains $T_{j,+}$ and $T_{j,-}$ amounts to adding to the basis $e^T_1 \ldots e^T_{n_{opt}}$ of $\tilde{E}$ defined in (3.56) the vector $\epsilon = (\epsilon_K,\ K \in T_h)$ defined by
$\epsilon = (\epsilon_K,\ K \in T_h) \in \mathbb{R}^{n_{sim}} \quad\text{with}\quad \epsilon_K = \begin{cases} 0 & \text{if } K \notin T_j,\\ +1 & \text{if } K \in T_{j,+},\\ -1 & \text{if } K \in T_{j,-}.\end{cases}$   (3.57)
where $\nabla_{x_{sim}} J$ is evaluated at the point $\Psi\,\hat{a}^T_{opt}$ of $\mathbb{R}^{n_{sim}}$. So we see that, once $\nabla_{x_{sim}} J$ has been made available as a by-product of the minimization over the current zonation, the first order indicator associated to any cut of any domain $T_j$ is obtained by a simple summation of the components of $\nabla_{x_{sim}} J$ over the new subdomains. This makes it possible to test a very large set $\mathcal{T}$ of tentative refinements of the current zonation $T = T^k$. Once the new zonation $T^{k+1}$ has been chosen, the optimization is performed with respect to the local (at the zonation level) variables $a_{opt,j}$, $j = 1 \ldots n^{k+1}_{opt}$, as defined in (3.54).
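The summation just described can be sketched as follows (index sets and names are assumptions made for illustration, not the book's code):

```python
# Sketch: first order indicator of a cut of zone T_j into (T_j_plus, T_j_minus),
# obtained by summing the components of grad_J over the two subzones, cf. (3.57).
import numpy as np

def cut_indicator(grad_J, cells_plus, cells_minus):
    mu_plus = grad_J[cells_plus].sum()
    mu_minus = grad_J[cells_minus].sum()
    return mu_plus - mu_minus            # <eps, grad_J> with eps = +1 / -1 / 0

def best_cut(grad_J, tentative_cuts):
    """tentative_cuts: list of (cells_plus, cells_minus) index arrays."""
    lambdas = [cut_indicator(grad_J, p, m) for p, m in tentative_cuts]
    k = int(np.argmax(np.abs(lambdas)))
    return k, lambdas[k]
```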
For example, if the simulation mesh $T_h$ is made of rectangles (see Fig. 3.6), one can choose, for the set $\mathcal{T}$ of tentative degrees of freedom, the collection of the cuts dividing each current zone $T_j$ by a vertical or horizontal line located on one edge of the mesh, and by a “checkerboard cut” centered at each node of the zone (see Remark 3.7.9 below). Using these cuts will lead to zones
Figure 3.6: Examples of vertical, horizontal, and checkerboard cuts for one zone $T_j$ of a mesh $T_h$ made of rectangles. The vector $\epsilon$ takes value +1 in cells with a plus sign, and −1 in cells with a minus sign.
This largest refinement indicator approach has been used in [23] for an oil field
modeling problem, and is applied in Sect. 3.7.3 below to the segmentation of
black and white images.
Notice that the vectors $\epsilon$ associated to these cuts by (3.57) and (3.60) satisfy the normalization (3.32) for $\mathcal{T}$.
Remark 3.7.5 The refinement indicators have been defined in the context
of unconstrained optimization. In this case, the optimum âTopt at the current
zonation T satisfies
In practical applications, however, constraints are taken into account for the
determination of âTopt , so that (3.61) does not necessarily hold for all zones
(unless all constraints are taken into account via regularization, see Sect. 3.8.1
below). It is hence recommended to compute the refinement indicators by
formula (3.58) rather than by (3.62).
Because of the linearity of the forward map, the nonlinear refinement indicator $\lambda_{NL}$ coincides here with the Gauss–Newton indicator $\lambda_{GN}$. The question that arises then naturally is: for such a simple inverse problem, does the largest first-order indicator λ also hint at the largest decrease of the objective function? To answer this question, we make explicit the relation (3.52) between $\lambda_{NL} = \lambda_{GN}$ and λ in the case of image segmentation. The matrix Φ is here the $n_{sim} \times n_{sim}$ identity matrix, and so we are left to determine $\Psi(\epsilon)^T\,\Psi(\epsilon)$.
The resolution of the minimization problem (2.5) for a given zonation is straightforward when $\varphi = Id$, as it amounts to assigning to each zone the mean gray level of the zone in the image! So there is no need here to calibrate the optimization parameters to maintain the conditioning of the optimization problem, and we replace the definition (3.54) of the optimization parameters simply by
Formula (3.56) for the definition of the basis $e^T_j \in \mathbb{R}^{n_{sim}}$, $j = 1 \ldots n_{opt}$, of $\tilde{E}$ becomes now
$e^T_{j,K} = \begin{cases} 0 & \text{if } K \notin T_j,\\ 1 & \text{if } K \in T_j.\end{cases}$   (3.65)
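As noted above, for $\varphi = Id$ the fit on a given zonation is simply the zone-wise mean of the image, which can be sketched as follows (a sketch only; the flat pixel/zone layout is an assumption):

```python
# Sketch: for phi = Id (image segmentation), fitting a zonation assigns to each
# zone the mean gray level of the image over that zone.
import numpy as np

def fit_zonation(image, zonation, n_zones):
    """image, zonation: flat arrays of length n_sim; returns the fitted image."""
    fitted = np.empty_like(image, dtype=float)
    for j in range(n_zones):
        mask = (zonation == j)
        fitted[mask] = image[mask].mean()    # mean gray level of zone j
    return fitted
```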
The parameterization matrix $\Psi(\epsilon)$ associated to the tentative splitting of the zone $T_j$ into two subzones $T_{j,+}$ and $T_{j,-}$ is given by (3.51), where $e^T_j$ and $\epsilon$ are given by (3.65) and (3.57). Hence, the $(n_{opt}+1) \times (n_{opt}+1)$ matrix
deciding to add this new degree of freedom, one can ask whether aggregating one subzone, say $T_{j,+}$, with one adjacent neighboring zone, say $T_\ell$, would not already be beneficial to the objective function, thus leaving the number of degrees of freedom unchanged.
Coarsening indicators that evaluate the interest of such an aggregation
have been defined in [10] via Lagrange multipliers. We give here a more direct
definition:
Definition 3.7.6 The coarsening indicator for the aggregation of the subzone $T_{j,+}$ with one adjacent zone $T_\ell$ is (see Fig. 3.7)
[Fig. 3.7: a zone $T_j$ split into subzones $T_{j,+}$ and $T_{j,-}$, with an adjacent zone $T_\ell$.]
It is given by
$\mu_+ = \sum_{K \in T_{j,+}} \left(\nabla_{x_{sim}} J\right)_K$   (3.68)
According to Definition 3.7.1, the coarsening indicator $\Delta J^*_{j+,\ell}$ gives the first order decrease of the objective function incurred by changing, on $T_{j,+}$, the parameter from its current value $a_{opt,j}$ to the value $a_{opt,\ell}$ in the adjacent zone $T_\ell$.
$\Delta J^*_{j+,\ell}$ can be positive or negative, but if, for some adjacent zone $T_\ell$, one finds that $\Delta J^*_{j+,\ell}/J^*$ is larger than some a-priori given percentage, one can consider aggregating the zones $T_{j,+}$ and $T_\ell$ (rather than splitting the zone $T_j$ into two). In this case, the new zonation $T^{k+1}$ will have the same number of zones as $T^k$, but will nevertheless produce a better fit to the data.
Remark 3.7.7 For the aggregation of $T_{j,-}$ one has to compute (see (3.68))
$\mu_- = \sum_{K \in T_{j,-}} \left(\nabla_{x_{sim}} J\right)_K.$
$\lambda = \mu_+ - \mu_-.$
$\mu_+ + \mu_- = \sum_{K \in T_j} \left(\nabla_{x_{sim}} J\right)_K = \dfrac{\partial J}{\partial x_{opt,j}}(\hat{a}^T_{opt}) \simeq 0,$
by computing the actual decrease of the objective function for a set of indicators with large absolute values before taking a decision. The situation is different for the coarsening indicators $\Delta J^*_{j\pm,\ell}$, which directly give an estimation of $\Delta J^*$ for a given aggregation: the algorithm will decide to aggregate or not on the sole basis of the coarsening indicators.
Of course, many variants of this algorithm are possible: for example, one could postpone the decision to aggregate until the actual decrease of $J^*$ has been computed for a family of potentially interesting aggregations, which would require the resolution of a few more minimization problems at each step. We describe now the algorithm.
As a preliminary step, one has to choose the family $\mathcal{T}$ of tentative refinements of the current zonation T, which will be explored by the algorithm to define the next zonation (e.g., the set of cuts of Fig. 3.6 for each zone $T_j$ of T).
Once this is done, the algorithm can go as follows:
1. Choose an initial zonation T .
2. Do until data are satisfactorily fitted:
3. Estimate a on the current zonation T by minimizing J with respect
to aTopt .
4. Compute the refinement indicators λ for all tentative refinements $\epsilon \in \mathcal{T}$:
For every zone $T_j$ of T and for every cut $\epsilon$ of $T_j$ do
compute the corresponding refinement indicator λ
Enddo
5. Compute $|\lambda|_{max}$, the largest absolute value of all computed refinement indicators. Select a subset $\mathcal{T}^{80\%}$ of cuts corresponding to refinement indicators which are larger than 80% of $|\lambda|_{max}$ (this percentage can be adjusted).
6. Minimize J successively over the zonations associated to the cuts of $\mathcal{T}^{80\%}$, and let $(\Delta J^*_{cuts})_{max}$ denote the best decrease obtained.
7. Compute the coarsening indicators for all selected cuts of $\mathcal{T}^{80\%}$:
For every cut of $\mathcal{T}^{80\%}$,
every subzone $T_{j,\pm}$,
every adjacent zone $T_\ell$ do
compute the coarsening indicator $\Delta J^*_{j\pm,\ell}$
Enddo
8. If $(\Delta J^*_{j\pm,\ell})_{max}$ is larger than 50% of $(\Delta J^*_{cuts})_{max}$ (this percentage can be adjusted)
then aggregate $T_{j,\pm}$ and $T_\ell$
else refine according to the best cut found at step 6.
Endif
9. Update the current zonation according to the retained cut or aggre-
gation.
Enddo
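A compact skeleton of steps 1–9 might look as follows (a sketch only; every helper is an assumed placeholder passed in by the caller, and is not part of the book):

```python
# Skeleton of the adaptive refinement/coarsening loop of steps 1-9.
def adaptive_zonation(fit, candidate_cuts, coarsenings, split, merge,
                      zonation, noise_level, keep=0.8, aggregate_ratio=0.5):
    while True:
        J_star, grad_J = fit(zonation)                        # step 3
        if J_star <= noise_level:
            return zonation
        cuts = candidate_cuts(zonation, grad_J)               # step 4: (cut, |lambda|)
        lam_max = max(lam for _, lam in cuts)
        selected = [c for c, lam in cuts if lam >= keep * lam_max]        # step 5
        decreases = [(c, J_star - fit(split(zonation, c))[0]) for c in selected]  # step 6
        best_cut, dJ_cut = max(decreases, key=lambda t: t[1])
        best_agg, dJ_agg = coarsenings(zonation, grad_J, selected)        # step 7
        if dJ_agg >= aggregate_ratio * dJ_cut:                # step 8
            zonation = merge(zonation, best_agg)              # step 9 (aggregation)
        else:
            zonation = split(zonation, best_cut)              # step 9 (refinement)
```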
Remark 3.7.8 The above procedure is by nature interactive: after completion of step 7, one can ask the algorithm to display the selected cuts and aggregations, and the user can mark those that seem of particular interest, or even figure out a new refinement pattern that incorporates his or her insight into the problem. The minimizations of step 8 can then be limited to the marked items.
Remark 3.7.9 If one wants the parameter to be constant on zones $T_j$ made of a single connected component (i.e., made of “one piece”), it is necessary to add, after step 5, a step such as:
If some cuts of $\mathcal{T}^{80\%}$ generate subdomains with more than one connected component then
compute the refinement indicators corresponding to the subcuts associated to each connected component (this will be the case each time a checkerboard cut is selected!).
Update the set $\mathcal{T}^{80\%}$ of selected cuts according to the 80% rule.
Endif
[Fig. 3.8: organization of the inversion code: the OPTIMIZER iterates from $x^{init}_{opt}$ to $\hat{x}_{opt}$; at each iterate $(x_{opt})_t$ the parameterization ψ produces $(x_{sim})_t$, and the forward/adjoint solver P returns J and $\nabla_{x_{sim}} J$.]
where
$c(x)^+ = \begin{cases} c(x) & \text{if } c(x) \ge 0,\\ 0 & \text{if } c(x) \le 0.\end{cases}$
Using a small penalization parameter η will produce a small violation of the constraints, but also a poorly conditioned optimization problem; in practice, experimentation is needed to choose η.
Introduction of additional a-priori information via LMT-regularization is
easily taken into account by addition of a regularizing functional Jreg (see
center part of Fig. 3.8). For example, when the information is that the pa-
rameter is “not too far” from some a-priori value x0 , Jreg has the form (see
Sect. 5.1 in Chap. 5, and (1.25) in Chap. 1)
$J_{reg} = \dfrac{\epsilon^2}{2}\,\|x - x_0\|^2_E.$
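The penalized and regularized objective can be sketched as follows (a sketch under stated assumptions: the exterior penalization form $(1/\eta)\sum c(x)^{+2}$ and the parameter names are illustrative choices, not the book's code):

```python
# Sketch: least squares objective with exterior penalization of constraints
# c(x) <= 0 and an LMT-type regularization term around the a-priori value x0.
import numpy as np

def objective(x, phi, z, constraints, eta, x0, eps_reg):
    J = 0.5 * np.sum((phi(x) - z) ** 2)
    # penalization: only the positive parts c(x)^+ of the constraints contribute
    J += (1.0 / eta) * sum(max(c(x), 0.0) ** 2 for c in constraints)
    # regularization "not too far from x0"
    J += 0.5 * eps_reg ** 2 * np.sum((x - x0) ** 2)
    return J
```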
Remark 3.8.1 It can happen that the regularizing functional depends on the parameter x not only explicitly, but also through the state vector $y_x$, as, for example, in the adapted regularization considered in Sect. 5.4.3, or in the state-space regularization approach of Chap. 5. For a functional $J_{reg}(x, y_x)$ of this kind, the gradient has to be computed together with that of J by the adjoint state technique: this leads to adding $\nabla_y J_{reg}(x, y_x)$ to the right-hand side of the adjoint equation, and a term $\nabla_x J_{reg}(x, y_x)$ to the formula that gives the gradient with respect to x.
According to Sect. 3.1.1, the simulation and optimization parameters are de-
fined, for continuous piecewise linear approximations, by
$a_{sim,M} = \left(\dfrac{\alpha_{sim,M}}{|\Omega|}\right)^{1/2} \dfrac{a_M}{a_{ref}}, \quad \forall M \in \partial T_h,$
$a_{opt,P} = \left(\dfrac{\alpha_{opt,P}}{|\Omega|}\right)^{1/2} \dfrac{a_P}{a_{ref}}, \quad \forall P \in \partial T_{opt},$
where $\alpha_{sim,M}$ and $\alpha_{opt,P}$ are defined, with obvious adaptation, by (2.61), and $a_M$ and $a_P$ are the values of the parameter a at nodes M and P. The simplest parameterization map $\psi : a_{opt} \mapsto a_{sim}$ is obtained by computing $a_M$ by interpolation between the values $a_P$ on the mesh $T_{opt}$:
$a_M = \sum_{P \in \partial T_{opt}(M)} \zeta_{M,P}\, a_P, \quad \forall M \in \partial T_h,$
where
$M = \sum_{P \in \partial T_{opt}(M)} \zeta_{M,P}\, P, \qquad 1 = \sum_{P \in \partial T_{opt}(M)} \zeta_{M,P}.$
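In practice this interpolation map is just a sparse matrix–vector product, as the following sketch illustrates (the weight matrix Z is an assumed data structure, not the book's notation):

```python
# Sketch: parameterization map psi interpolating coarse nodal values a_P onto
# the simulation nodes M, a_M = sum_P zeta_{M,P} a_P.
import numpy as np

def psi_interpolation(a_opt_nodes, Z):
    """Z[m, p] = zeta_{M_m, P_p}; each row sums to 1 (barycentric weights)."""
    assert np.allclose(Z.sum(axis=1), 1.0)
    return Z @ a_opt_nodes
```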
parameter: $(\theta_1 \cdots \theta_n)$,
state vector: $y = (u^k,\ x^k_j,\ k = 1 \cdots n-1,\ j = 1 \cdots k+1)$,
state equation: (3.75), (3.76), and (3.77),
observation operator: $M : y \mapsto x = x^{n-1} \in \mathbb{R}^n$,
which is of the form (2.26). Following (2.27), the adjoint state $\Lambda_\theta$ is the solution obtained by equating to zero the coefficients of $\delta x^{n-1}_j$, $\delta x^k_j$, and $\delta u^k$ in (3.78):
$\mu^{n-1}_j = g_{x,j} \quad \forall j = 1 \cdots n,$
$\mu^k_j = u^{k+1}\,\mu^{k+1}_j \quad \forall k = n-2 \cdots 1,\ \forall j = 1 \cdots k+1,$
$\lambda^k = \sum_{j=1}^{k} x^{k-1}_j\,\mu^k_j \quad \forall k = n-1 \cdots 1.$
It is the way the descent direction yk is chosen, which gives its name
to the optimization algorithm class: Steepest Descent, Quasi-Newton,
Gauss–Newton, Levenberg–Marquardt, etc.
$g_k(\alpha) = x_k + \alpha\, y_k, \quad \alpha \ge 0.$   (3.83)
The reason behind this choice is mostly its simplicity. But we shall
also consider other search curves, in particular, if one can afford to
compute the Jacobian F (xk ), the more intrinsic geodesic search curves
(Definition 3.9.1 below).
where 0 < ω ≤ 1/2 is a given number (the same through all iterations
of course),
• $\alpha_k$ is not too small:
$\langle g_k'(\alpha_k), \nabla J(g_k(\alpha_k))\rangle \ge \omega\,\langle y_k, \nabla J(x_k)\rangle$   (Wolfe condition),
or
$J(g_k(\alpha_k)) \ge J(x_k) + \omega\,\alpha_k\,\langle y_k, \nabla J(x_k)\rangle$   (Goldstein condition),
An example of an acceptable descent step is the Curry step $\bar\alpha$: one moves on the search curve $g_k$ in the parameter space until the first stationary point of $J = \frac{1}{2}\|F\|^2$ is encountered:
$\bar\alpha = \inf\Big\{\alpha \ge 0 \ \text{such that}\ \dfrac{d}{d\alpha}\|F(g_k(\alpha))\|^2 = 0\Big\}.$   (3.85)
However, the Curry step cannot be used in practice, as its determination would require too many function evaluations.
4. Once αk is accepted, the k + 1 iterate is defined by
xk+1 = gk (αk ). (3.86)
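A generic sketch of one such iteration along the straight search curve (3.83) is given below; the sufficient-decrease constant is an assumed Armijo-type choice (the book's exact form is not reproduced here), while the "not too small" test follows the Goldstein condition quoted above:

```python
# Sketch: one descent iteration along g_k(alpha) = x_k + alpha*y_k, accepting a
# step that decreases J and is not too small in the Goldstein sense.
import numpy as np

def descent_step(J, grad_J, x_k, y_k, alphas, omega=0.5, omega_armijo=1e-4):
    slope = float(np.dot(y_k, grad_J(x_k)))               # <y_k, grad J(x_k)> < 0
    for alpha in alphas:                                   # candidate steps
        x_new = x_k + alpha * y_k                          # straight search curve
        decrease_ok = J(x_new) <= J(x_k) + omega_armijo * alpha * slope
        not_too_small = J(x_new) >= J(x_k) + omega * alpha * slope
        if decrease_ok and not_too_small:
            return x_new, alpha
    return x_k, 0.0                                        # no acceptable step found
```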
which satisfy
V ∈ C 0 ([0, ᾱ], IRq ), A ∈ L∞ (]0, ᾱ[, IRq ).
Definition 3.9.1 Let F be derivable with $F'(x)$ injective for all $x \in \mathbb{R}^n$. We shall say that g is a geodesic search curve if it satisfies the differential equation
$F'(g)^T F'(g)\, g'' + F'(g)^T\, D^2_{g',g'} F(g) = 0 \quad \forall \alpha \ge 0.$   (3.92)
The acceleration A is hence orthogonal to the hyperplane $\{\delta y \in \mathbb{R}^q \mid \delta y = F'(g)\,\delta x\}$ tangent to the attainable set $F(\mathbb{R}^n)$ at P(g), which shows that P is a geodesic of $F(\mathbb{R}^n)$. The velocity V and acceleration A satisfy moreover
We shall consider in the sequel only the straight search curves (3.83), or the
geodesic search curves of Definition 3.9.1.
The Definition (3.85) of the Curry step ᾱ along P implies that
and Proposition 8.2.1 shows that the first and second derivatives v and a of
P with respect to the arc length ν are given, for almost every α ∈]0, ᾱ[, by
$\exists\, 1/R > 0 : \quad \|A(\alpha)\| \le \dfrac{1}{R}\,\|V(\alpha)\|^2 \quad \text{for almost every } \alpha \in\, ]0, \bar\alpha[,$   (3.97)
The idea behind the MPC step is then to use the lower bound R on the
radius of curvature to find a calculable “worst case” lower bound ᾱW to the
computationally unaffordable Curry step ᾱ.
We give first an intuitive presentation of the MPC step in the case of a two-dimensional data space $\mathbb{R}^q = \mathbb{R}^2$. Then P is a plane curve which, as one can see in Fig. 3.9, starts at $F(x_k)$ (arc length ν = 0) in the direction v = v(0). After that, the only thing known about P is, according to (3.98), that its radius of curvature is bounded from below by R.
[Fig. 3.9: the curve $P : \alpha \mapsto F \circ g(\alpha)$ in the data plane, starting at $F(x_k)$ with velocity v, together with the target 0, the linearized stationary arc length and residual $\bar\nu_L$, $\bar r_L$, the worst-case values $\bar\nu_W$, $\bar r_W$ at $P(\bar\alpha_W)$, the Curry values $\bar\nu$, $\bar r$ at the first stationary point, the new residual $F(x_{k+1})$, and the curvature lower bound R.]
An important property that can be seen on the figure is that the residual
rk+1 = F (xk+1 ) at the point obtained by the MPC step is better (smaller)
than the worst case stationary residual r̄W . This will ensure for the MPC
step a useful “guaranteed decrease property.”
We turn now to the rigorous presentation of the MPC step in the general case of a data space of any dimension. The curve P does not necessarily remain in the plane defined by P(0) and v(0), and the curvature that matters for the determination of the worst-case Curry step turns out to be the projected curvature $1/\rho_{proj}(\alpha)$ of P on the plane defined by the two vectors P(α) and v(α). But this plane is defined only when the two vectors do not point in the same direction, and so we define a set of exceptional points:
Definition 3.9.2 The projected radius of curvature along P is defined by
The proof of the proposition is given in Appendix 1. The next theorem proves
the intuitive properties of the MPC step seen on Fig. 3.9:
Theorem 3.9.4 Let $x_k$, y, g, $\bar\alpha$ be the current estimate, descent direction, search curve, and Curry step at iteration k of a descent optimization algorithm for the resolution of problem (3.79). Let r(α) denote the residual along P as defined in (3.88), and suppose that the curve $P : \alpha \mapsto F(g(\alpha))$ satisfies (3.87) and (3.97).
Then one can choose as descent step $\alpha_k$ the MPC step $\bar\alpha_W$
$\alpha_k = \bar\alpha_W \quad\text{defined by}\quad \nu(\bar\alpha_W) = \bar\nu_W,$   (3.106)
which satisfies
$0 < \bar\alpha_W \le \bar\alpha,$   (3.107)
$r_k \stackrel{\rm def}{=} r(0) > \bar r_W \ge r_{k+1} \stackrel{\rm def}{=} r(\bar\alpha_W) \ge \bar r \stackrel{\rm def}{=} r(\bar\alpha),$   (3.108)
where
• $\bar\nu_W$ and $\bar r_W$ are the worst case stationary arc length and residual:
$\bar\nu_W = R\,\tan^{-1}\dfrac{\bar\nu_L}{R + \bar r_L},$   (3.109)
$\bar r_W = \left((R + \bar r_L)^2 + \bar\nu_L^2\right)^{1/2} - R,$   (3.110)
• $\bar\nu_L$ and $\bar r_L$ are the linear case stationary arc length and residual:
$\bar\nu_L = -\left\langle F(x_k), \dfrac{V}{\|V\|}\right\rangle,$   (3.111)
$\bar r_L = \left(r_k^2 - \bar\nu_L^2\right)^{1/2},$   (3.112)
Proposition 3.9.5 Let P satisfy the hypothesis (3.87) and (3.97). Then the MPC step $\alpha_k$:
1. Ensures a guaranteed decrease of the objective function:
$J(x_k) - J(x_{k+1}) = \dfrac{1}{2}\,r_k^2 - \dfrac{1}{2}\,r_{k+1}^2 \ \ge\ \bar\nu_W\,\bar\nu_L > 0,$   (3.114)
This does not necessarily imply that the projected curvatures satisfy the same inequality, as it depends on the relative position of $F(x_k)$, the acceleration $a_{straight}(0)$ (orthogonal to $V_k$), and the acceleration $a_{geodesic}(0)$ (orthogonal to the tangent plane). But over a large number of problems and iterations, one can expect that
Hence using the geodesic search curve amounts to giving oneself a better chance of making a larger descent step!
for some M ≥ 0 and 1/R ≥ 0. Then for any current estimate xk ∈ D and
any descent direction yk , the curve Pk = F ◦ gk , where gk is the associated
straight or geodesic search curve, satisfies
Hence one can apply to the resolution of the nonlinear least squares problem
(3.79) the descent algorithm (3.81), (3.82), and (3.86) with the MPC step
(3.106) and (3.109) along the straight search curve (3.83) or the geodesic
search curve (3.92). The iterates xk and descent directions yk satisfy then
where θk ∈ [0, π/2] is the angle between −∇J(xk ) and the descent direction
yk at iteration k, and the algorithm converges for the following choices of the
descent directions:
• $M_k\, y_k + F_k'^{\,T} F_k = 0$, (Quasi-Newton)
provided the matrices $M_k$ are uniformly bounded and coercive,
• Determination of $g_k$
We shall use for $g_k$ a second degree polynomial curve in the parameter space:
$g_k(\alpha) = x_k + \alpha\, y_k + \dfrac{\alpha^2}{2}\, g_k'',$   (3.123)
where $g_k''$ is to be chosen.
1. Case of a straight search curve. This choice is the only one possible for Steepest Descent or Quasi-Newton algorithms, as the modeling routine returns only $\nabla J_k$, but not $F_k'$, so that no information is available on the tangent plane to the attainable set at $F(x_k)$. One can of course use for $g_k$ the linear function (3.83), which corresponds to $g_k'' = 0$. But one can also take advantage of Proposition 3.9.5, which shows that the MPC step satisfies the Armijo condition with ω = 1/2 when the arc length ν
The computational cost is negligible, as $D^2_{y_k,y_k} F(x_k)$ has in any case to be computed for the determination of the MPC step $\alpha_k$, and this choice increases the chances that the first guess of $\alpha_k$ determined below in (3.129) satisfies the Armijo condition.
2. Case of a geodesic search curve. This choice is possible only for the Gauss–Newton or Levenberg–Marquardt algorithms, and in general for all algorithms where the Jacobian $F'(x_k)$ is available. The numerical resolution of the differential equation (3.92) is feasible, but computationally intensive. So we shall use for $g_k$ the second degree polynomial (3.123), where $g_k''$ is chosen such that the associated curve $P_k$ has a second order contact with the geodesic at $F(x_k) = P_k(0)$:
The additional cost here is the resolution of the linear system (3.124), which turns out to be the same linear system as the Gauss–Newton equation for the descent direction $y_k$ in Proposition 3.9.8, but with a different right-hand side. By construction, this (approximated) geodesic search curve has the same advantages as the straight line search, plus the prospect of producing on average larger steps (Remark 3.9.6), and hence a faster decrease of J.
In both cases, the arc length along $P_k$ satisfies $d^2\nu/d\alpha^2(0) = 0$, so that the second order expansion of ν at 0 reduces to
• Determination of $\alpha_k$
There are two points in the determination of $\alpha_k$ by Theorem 3.9.4 where an exact calculation is unfeasible, and which require an approximation:
1. Determination of $1/R_k$. A precise determination of the upper bound $1/R_k$ to the (unknown) projected curvature along $P_k$ up to the (unknown) Curry step $\bar\alpha_k$ (condition (3.113)) would require the evaluation
Rk ← τ Rk ,
xk+1 = gk (αk ),
• Steepest descent algorithm: The MPC step (with straight line search
necessarily) performs better than the classical Armijo and Goldstein
stepping algorithms, but worse than the Wolfe stepping (see [13, 2] for
the definition of the stepping algorithms)
Because p and v are derivable almost everywhere on $[0, \bar\nu]$, we can suppose that $p'(\nu) = v(\nu)$ and $v'(\nu) = a(\nu)$ exist. Then dividing (3.137) by $\nu_n - \nu$ and passing to the limit gives, as p and v are continuous at ν,
$\bar\alpha < +\infty,$
which corresponds to the usual practical situation. But the proof remains true if $\bar\alpha = +\infty$, provided one defines $\bar r$ by
$\bar r = \lim_{\alpha \to \bar\alpha} r(\alpha),$
which satisfies
$0 \le t \le \bar t \stackrel{\rm def}{=} r_0^2 - \bar r^2.$
Differentiation of (3.138) gives
For a linear problem, p is a straight half line of the data space IRq , and so
r̄L is constant. We calculate in this section |dr̄L /dt|, which will provide the
adequate measure of nonlinearity for our purpose.
Differentiation of (3.141) with respect to ν gives, together with (3.139) and (3.105),
$\left|\dfrac{d\bar r_L^2}{dt}\right| = |\langle p, a\rangle| = \dfrac{\bar r_L}{\rho_{proj}} \quad \text{for a.e. } t \in [0, \bar t\,].$
The function $t \mapsto \bar r_L$ is derivable a.e. on $[0, \bar t\,] \setminus Z$ (as the square root of the strictly positive, a.e. derivable function $t \mapsto \bar r_L^2$), where (cf. (3.99) and (3.135))
$Z = \{t \in\, ]0, \bar t\,[ \ \text{such that}\ \bar r_L(t) = 0\}.$
Hence
$\left|\dfrac{d\bar r_L}{dt}\right| = \dfrac{1}{2\rho_{proj}} \le \dfrac{1}{2R} \quad \text{for a.e. } t \in [0, \bar t\,] \setminus Z.$   (3.142)
We prove now that this property remains true a.e. on Z. Let $t \in Z$ be a point where $d\bar r_L/dt$ exists. Then $\bar r_L(t) = 0$ and $\bar r_L(t') \ge 0$ for a.e. $t' \in [0, \bar t\,]$, which implies that $d\bar r_L/dt = 0$. Hence (3.142) necessarily holds true at any such point, where by definition $1/\rho_{proj} = 0 \le 1/R$, and we are left to prove that $\bar r_L$ is derivable a.e. on $[0, \bar t\,]$.
We define for that purpose a sequence of functions
$\eta_k = \max\{\bar r_L, 1/k\}, \quad k = 1, 2, \ldots,$
which converges pointwise to $\bar r_L$, and hence in the sense of distributions
$\eta_k \longrightarrow \bar r_L \ \text{in}\ \mathcal{D}'(]0, \bar t\,[) \ \text{when}\ k \longrightarrow +\infty.$   (3.143)
This sequence is bounded independently of k in the $L^\infty(0, \bar t\,)$ norm:
$\|\eta_k\|_\infty \le \max\{\|\bar r_L\|_\infty, 1\} \le \max\{\|p\|_\infty, 1\} \le \max\{\|p(0)\|, 1\}.$   (3.144)
The functions $\eta_k$ are derivable a.e. on $[0, \bar t\,]$, as the square root of the a.e. derivable functions $\max\{\bar r_L^2, 1/k^2\} \ge 1/k^2 > 0$. Hence,
$\left|\dfrac{d\eta_k}{dt}(t)\right| = \begin{cases} \left|\dfrac{d\bar r_L}{dt}(t)\right| = \dfrac{1}{2\rho_{proj}} \le \dfrac{1}{2R} & \text{for a.e. } t \text{ such that } \bar r_L(t) > \dfrac{1}{k},\\[6pt] 0 & \text{for a.e. } t \text{ such that } \bar r_L(t) \le \dfrac{1}{k},\end{cases}$
where we have used (3.142) to evaluate $d\bar r_L/dt$ when $\bar r_L > 1/k$. This implies
$\left\|\dfrac{d\eta_k}{dt}\right\|_\infty \le \dfrac{1}{2R},$
which, together with (3.144), shows that the sequence $\eta_k$, $k = 1, 2, \ldots$, is bounded in, say, $H^1(]0, \bar t\,[)$. Hence there exists a subsequence, still denoted by $\eta_k$, and $w \in H^1(]0, \bar t\,[)$ such that
$\eta_k \rightharpoonup w \ \text{weakly in}\ H^1(]0, \bar t\,[) \subset \mathcal{D}'(]0, \bar t\,[) \ \text{when}\ k \longrightarrow +\infty.$   (3.145)
Comparison of (3.143) and (3.145) implies that $\bar r_L = w \in H^1(]0, \bar t\,[)$, which proves the desired result: $\bar r_L$ is a.e. derivable over $[0, \bar t\,]$. Hence
$\left|\dfrac{d\bar r_L}{dt}\right| = \dfrac{1}{2\rho_{proj}} \le \dfrac{1}{2R} \quad \text{for a.e. } t \in [0, \bar t\,],$   (3.146)
and the following upper bound holds for the continuous function $\bar r_L$:
$\bar r_L(t) \le \bar r_L(0) + \dfrac{t}{2R} \stackrel{\rm def}{=} \bar r_{L,W}(t) \quad \text{for all } t \in [0, \bar t\,],$   (3.147)
where r̄L,W is the worst case stationary linearized residual. As we shall prove
in the Sect. 3.9.5, and as one can guess from Fig. 3.9, r̄L,W is actually the
stationary linearized residual along the arc of circle pW of radius R (thick
lower dashed line on the figure), which turns away from the target 0 in the
plane containing p(0) and v(0). But for the time being, one can simply take
(3.147) as the definition of r̄L,W .
where we have used the fact that $\langle v(0), p(0)\rangle < 0$ (v(0) is a descent direction), and the bound (3.147).
First, the right equality in (3.148) shows that t corresponds to a stationary point of $r^2 = \|p(t)\|^2$, where $\langle p(t), v(t)\rangle = 0$, if and only if $\mu(t) = r_0^2$. Hence the first stationary residual $\bar r^2$ and the corresponding parameter $\bar t$ are given by
$\mu_W(\bar t_W) = r_0^2, \qquad \bar r_W^2 = r_0^2 - \bar t_W = \bar r_{L,W}(\bar t_W)^2,$   (3.152)
[Fig. 3.10: the functions $\mu$ and $\mu_W$ of t, with the levels $\bar r_L(0)^2$ and $r_0^2$ and the abscissas $t_W$, $\bar t_W$, $t_0$, $\bar t$, $r_0^2$.]
Then combining (3.140) with (3.148), one can express $\bar\nu$ and define $\bar\nu_W$ by
$\bar\nu = \dfrac{1}{2}\int_0^{\bar t} \dfrac{dt}{(r_0^2 - \mu(t))^{1/2}}, \qquad \bar\nu_W = \dfrac{1}{2}\int_0^{\bar t_W} \dfrac{dt_W}{(r_0^2 - \mu_W(t_W))^{1/2}}.$   (3.154)
We are now close to our objective, which is to show that $\bar\nu \ge \bar\nu_W$. We remark first, using (3.151), that for a given t, the first integrand is larger than the second one – this goes in the right direction, but it does not allow us to conclude, as the two domains of integration are not the same! So we make a change of variable in order to obtain integrals defined over the same interval. Define first $t_0$ by (Fig. 3.10):
The range of μ over the $[t_0, \bar t\,]$ interval is then $[\bar r_L(0)^2, r_0^2]$, the same as the range of $\mu_W$ over $[0, \bar t_W]$. Hence, we can define a (nonmonotonic) change of variable $t \mapsto t_W$ from the $[t_0, \bar t\,]$ interval onto the $[0, \bar t_W]$ interval by
We can now use this change of variable in the integral that gives $\bar\nu_W$:
$\bar\nu_W = \dfrac{1}{2}\int_0^{\bar t_W} \dfrac{dt_W}{(r_0^2 - \mu_W(t_W))^{1/2}} = \dfrac{1}{2}\int_{t_0}^{\bar t} \dfrac{\mu'(t)\,dt}{\mu_W'(t_W)\,(r_0^2 - \mu(t))^{1/2}},$
where $\mu_W'(t_W) > 0$ and $\mu'(t)$ can be positive or negative. But differentiation
of (3.148) gives, using (3.155) and (3.146),
$|\mu'(t)| = \left|1 + 2\,\bar r_L(t)\,\dfrac{d\bar r_L}{dt}(t)\right| \le 1 + 2\,\bar r_L(t)\left|\dfrac{d\bar r_L}{dt}(t)\right| \le 1 + \dfrac{1}{R}\,\bar r_{L,W}(t_W) = \mu_W'(t_W).$
Hence $\bar\nu_W$ satisfies
$\bar\nu_W \le \dfrac{1}{2}\int_{t_0}^{\bar t} \dfrac{|\mu'(t)|\,dt}{\mu_W'(t_W)\,(r_0^2 - \mu(t))^{1/2}} \le \dfrac{1}{2}\int_{t_0}^{\bar t} \dfrac{dt}{(r_0^2 - \mu(t))^{1/2}} \le \dfrac{1}{2}\int_0^{\bar t} \dfrac{dt}{(r_0^2 - \mu(t))^{1/2}} = \bar\nu,$
and (3.107) is proved. We turn now to the proof of (3.108). Let $t(\bar\nu_W)$ be the squared residual decrease along p at the worst case stationary arc length $\bar\nu_W$. Formulas (3.154) for the arc length give
$\bar\nu_W = \dfrac{1}{2}\int_0^{t(\bar\nu_W)} \dfrac{dt}{(r_0^2 - \mu(t))^{1/2}} = \dfrac{1}{2}\int_0^{\bar t_W} \dfrac{dt}{(r_0^2 - \mu_W(t))^{1/2}}.$
The first integrand is smaller than the second, so $t(\bar\nu_W)$ has to be larger than $\bar t_W$, which gives, by definition of t and $\bar r_W$,
which is (3.108).
where u is defined by
$u(t) = \dfrac{R + \bar r_L + \dfrac{t}{2R}}{\left(\bar\nu_L^2 + (R + \bar r_L)^2\right)^{1/2}}.$
With this change of variable, formula (3.154) becomes
$\bar\nu_W = R\int_{u(0)}^{1} \dfrac{du}{(1 - u^2)^{1/2}} = R\left(\dfrac{\pi}{2} - \sin^{-1} u(0)\right) = R\,\tan^{-1}\dfrac{\bar\nu_L}{R + \bar r_L},$
which is (3.109). We calculate now the worst residual $\bar r_W$. The definitions (3.152) of $\bar t_W$, $\bar r_W$, and (3.147) of $\bar r_{L,W}$ show that
$\bar r_W = \bar r_{L,W}(\bar t_W) = \bar r_L + \dfrac{\bar t_W}{2R}.$
Hence the first equation of (3.152) rewrites
$\bar t_W + \bar r_W^2 = r_0^2,$
$2R(\bar r_W - \bar r_L) + \bar r_W^2 = r_0^2,$
which is (3.110).
The above expressions for ν̄W and r̄W are those of the first stationary arc
length and residual along a circle pW of radius R (thick lower dashed line in
Fig. 3.10), which turns away from the target 0 in any plane containing p(0)
and v(0) (there can be many if these two vectors are colinear). This proves
that the worst case is achieved by the circle(s) pW .
which proves the convergence of the series (3.121). Finally, the conditions
associated to each choice of descent direction ensure that cos2 θk ≥ cmin > 0.
Together with the convergence of the series (3.121), this implies that
$\|\nabla J(x_k)\|^2 \to 0,$
which is the definition of convergence of the algorithm. This ends the proof
of Proposition 3.9.8.
Chapter 4
We consider in this chapter the nonlinear least squares (NLS) problem (1.10),
which we recall here for convenience:
$\hat{x} \ \text{minimizes}\ J(x) = \dfrac{1}{2}\,\|\varphi(x) - z\|^2_F \ \text{over } C.$   (4.1)
As we have seen in Chap. 1, this inverse problem describes the identification
of the parameter x ∈ C from a measurement z of ϕ(x) in F . We suppose
that the minimum set of hypothesis (1.12) of Chap. 1 holds:
E = Banach space, with norm $\|\cdot\|_E$,
$C \subset E$ with C convex and closed,
F = Hilbert space, with norm $\|\cdot\|_F$, $z \in F$,
$\varphi : C \to F$ is differentiable along segments of C,
and: $\exists\, \alpha_M \ge 0$ s.t. $\forall x_0, x_1 \in C$, $\forall t \in [0, 1]$,
$\|D_t\, \varphi((1-t)x_0 + t x_1)\|_F \le \alpha_M\, \|x_1 - x_0\|_E,$   (4.2)
for some r > 0, which then represents the upper limit of the noise and
modeling error level for which the NLS problem is well posed.
We recall first in Sect. 4.1 the results for the linear case we would like to
generalize to nonlinear problems, and define in Sect. 4.2 the class of FC/LD
problems that retain some useful properties of the linear case. We introduce
in Sect. 4.3 the notions of linearized identifiability and stability. These ingre-
dients are used in Sect. 4.4 to state sufficient conditions for Q-wellposedness
of NLS problems, based on the results of Chaps. 7 and 8 on strictly quasi-
convex (s.q.c.) sets. The case of finite dimensional parameters is studied in
Sect. 4.5, where it is shown that Q-wellposedness holds locally as soon as
the linearized problem is identifiable. The remaining sections are devoted to
examples of Q-wellposed parameter estimation problems in elliptic equations.
Under the minimum set of hypothesis (4.2), P(t) has a derivative $V(t) = D_t\,\varphi((1-t)x_0 + t x_1)$, and the arc length of the curve P of the output set $\varphi(C)$ is
$L(P) = \int_0^1 \|V(t)\|_F\, dt.$   (4.12)
Definition 4.2.1 Let C and ϕ satisfy the minimum set of hypothesis (4.2).
Problem (4.1) is a finite curvature least squares problem (in short: a FC
problem) if the following conditions are satisfied:
there exists R > 0 such that, $\forall x_0, x_1 \in C$, the curve $P : t \mapsto \varphi((1-t)x_0 + t x_1)$ satisfies
$P \in W^{2,\infty}([0,1]; F)$ and $\|A(t)\|_F \le \dfrac{1}{R}\,\|V(t)\|^2_F$ for a.e. $t \in [0,1]$,   (4.13)
where $V(t) = P'(t)$, $A(t) = P''(t)$.
• Or $V(t) \ne 0\ \forall t \in [0,1]$, so that $L(P) > 0$ and the radius of curvature ρ(t) along the curve P satisfies
$\dfrac{1}{\rho(t)} \le \dfrac{\|A(t)\|_F}{\|V(t)\|^2_F} \quad \text{for a.e. } t \text{ in } [0,1],$   (4.14)
The lower bound R to the radius of curvature R(ϕ(C)) of the attainable set
is called a radius of curvature of the inverse problem (4.1) (Definition 4.2.3
below).
Notice that (4.13) is only a sufficient condition for the radius of curvature along P to be larger than R (compare (4.13) with the formula (8.66) for the curvature, and/or think of $\varphi : [0,1] \to \mathbb{R}^2$ defined by $\varphi(x) = (x^2, x^2)$).
• Fold over itself, which prevents property (i) of Proposition 4.1.1 from holding on any neighborhood of $\varphi(C)$ (think of the Greek letter α as an attainable set, or consider the simple case where $C = [0, \mathrm{diam}\, C] \subset \mathbb{R}$ and $\varphi(x) = (\cos x, \sin x) \in \mathbb{R}^2$, so that $\varphi(C)$ is an arc of circle of radius 1 and arc length diam C, when diam C > 2π).
• If Θ < 2π, the arc of circle can obviously not fold over itself!
Definition 4.2.2 Let C and ϕ satisfy the minimum set of hypothesis (4.2).
A FC problem (4.1) is a Limited Deflection least squares problem (in short,
a FC/LD problem) if it satisfies the Deflection Condition:
$\Theta \le \dfrac{\pi}{2}.$   (4.16)
The attainable set of an FC/LD problem is s.q.c. The upper bound Θ to the deflection of the curves P is called the deflection of the inverse problem (4.1).
Proof. The strict quasi-convexity property of the attainable set follows from
Theorem 8.1.6, which gives a sufficient condition for a set to be s.q.c.
We explain now how to estimate the deflection Θ of a FC problem. We
remark that when P is an arc of circle, Θ is simply equal to the length of the
arc divided by its radius, that is, the product of the size of the arc by its curva-
ture. For a curve P other than a circle, a similar formula holds at the infinites-
imal level: the deflection dθ between the endpoints of the arc corresponding
to parameters t and t + dt is bounded (not equal in general) by the length
of the arc V (t)F dt multiplied by its curvature 1/ρ(t) (Proposition 8.1.2):
$d\theta \le \dfrac{\|V(t)\|_F\, dt}{\rho(t)},$   (4.17)
(with equality if P is a plane curve!). Combining with (4.14) gives
$d\theta \le \dfrac{\|A(t)\|_F}{\|V(t)\|_F}\, dt \le \dfrac{\|V(t)\|_F}{R}\, dt.$   (4.18)
If we denote by $t_0$ and $t_1$ the values of t corresponding to the points of the curve P where the deflection is maximum, we obtain, for the maximum deflection Θ(P) along the curve P,
$\Theta(P) = \int_{t_0}^{t_1} d\theta \le \int_0^1 d\theta \le \int_0^1 \dfrac{\|A(t)\|_F}{\|V(t)\|_F}\, dt \le L(P)/R \le L/R,$   (4.19)
where $L(P) = \int_0^1 \|V(t)\|_F\, dt$ denotes the arc length of P and L an upper bound to L(P). This shows that any number Θ which satisfies
$\int_0^1 \dfrac{\|A(t)\|_F}{\|V(t)\|_F}\, dt \le \Theta \le L/R \quad \text{for all } x_0, x_1 \in C$   (4.20)
Definition 4.2.3 The geometric attributes of the FC problem (4.1) are the following:
1. Its radius of curvature R > 0, defined as a lower bound to the radius of curvature along all curves P of the form (4.11) with L(P) > 0. It is estimated by (4.13).
2. Its (arc length) size L ≥ 0, defined as an upper bound to the arc length L(P) of all curves P of the form (4.11). It is estimated by
$L(P) = \int_0^1 \|V(t)\|_F\, dt \le L \le \alpha_M\, \mathrm{diam}\, C \quad \forall x_0, x_1 \in C,$   (4.21)
or
$\|A(t)\|_F \le \dfrac{\pi}{2}\,\|V(t)\|_F \quad \text{for a.e. } t \in [0,1] \text{ and all } x_0, x_1 \in C,$   (4.25)
or
$L \le \dfrac{\pi}{2}\, R.$   (4.26)
Proof. It follows immediately from Proposition 4.2.4.
Condition (4.26) shows that the deflection condition can be enforced by
a size×curvature condition, that is, by requiring that the product of the size
L of ϕ(C) by its curvature 1/R is bounded, here by π/2 .
Though the deflection condition Θ ≤ π/2 can be satisfied for an un-
bounded set ϕ(C) (think, e.g., of the graph of a sine function in IR2 ),
Corollary 4.2.5 shows that Θ ≤ π/2 can be ensured by limiting the size of the admissible parameter set C ($\|A(t)\|$ is proportional to $\|x_1 - x_0\|^2$ and $\|V(t)\|$ is proportional to $\|x_1 - x_0\|$!). Hence the deflection condition will act
in practice as a localization constraint: this is an example of regularization
by size reduction mentioned in Sect. 1.3.4.
We can now state the nonlinear counterpart of Proposition 4.1.1 concern-
ing the properties of the projection on ϕ(C). The stability of the projection
will hold for the “arc length distance” δ(X0 , X1 ) on the attainable set ϕ(C),
defined by
$\forall X_0, X_1 \in \varphi(C), \quad \delta(X_0, X_1) = \sup_{x_0 \in \varphi^{-1}(X_0),\ x_1 \in \varphi^{-1}(X_1)} L(P),$   (4.27)
where L(P ) is the length, defined in (4.12), of the curve P associated to x0
and x1 by (4.11).
Remark 4.2.6 The quantity $\delta(X_0, X_1)$ is positive. But without further hypothesis, it can happen that $\delta(X_0, X_1) = 0$ and $X_0 \ne X_1$. However, for a FC/LD problem, as is the case in Proposition 4.2.7 below, $\delta(X_0, X_1) = 0$ implies $X_0 = X_1$ (Proposition 6.2.5), so that the first axiom of a distance is satisfied. The second axiom of a distance, $\delta(X_0, X_1) = \delta(X_1, X_0)$, is always satisfied, but we do not know whether the third axiom (triangle inequality) is satisfied; this is why we use the word distance in quotes.
Proposition 4.2.7 Let (4.1) be a FC/LD problem with curvature 1/R < ∞,
and let ϑ be the enlargement neighborhood of $\varphi(C)$ defined by
$\vartheta = \{z \in F \mid d(z, \varphi(C)) < R\}.$   (4.28)
Then the projection on the attainable set ϕ(C) has the following properties:
(i) Uniqueness: For any z ∈ ϑ, there exists at most one projection of
z on ϕ(C)
Proof. Equation (4.1) is a FC/LD problem, and so its attainable set ϕ(C)
is s.q.c. (Definition 4.2.2) and the proposition follows from the properties of
the projection on s.q.c. sets summarized in Theorem 7.2.11.
We investigate now the shape of the preimage sets, that is, the sets of
parameters x ∈ C that have the same image by ϕ. For a linear problem
ϕ(x) = B.x, the preimage of X ∈ B.C is the intersection of the closed affine
subspace {x ∈ E such that B.x = X} of E with the admissible parameter
set C, it is hence closed and convex. For FC/LD problems, the following
result holds:
Proof. Let $X \in \varphi(C)$ be given. The finite curvature hypothesis implies that C is closed and ϕ continuous (second and last properties of (4.2)), and hence that $\varphi^{-1}(X)$ is closed. Then the condition Θ ≤ π/2 on the deflection implies that $\varphi(C)$ is s.q.c. (Proposition 4.2.7). Let then $x_0, x_1$ be two preimages of X, and P the curve (4.11) image by ϕ of the $[x_0, x_1]$ segment of C. The function $t \mapsto \|X - P(t)\|^2$ is s.q.c. (use (7.4) with $D = \varphi(C)$ and z = X in the Definition 7.1.2 of s.q.c. sets), positive, and takes the value 0 for t = 0 and t = 1: this is possible only if $\|X - P(t)\|^2 = 0\ \forall t \in [0,1]$. Hence $x_t = (1-t)x_0 + t x_1$ belongs to $\varphi^{-1}(X)$ for all $t \in [0,1]$, which shows that $\varphi^{-1}(X)$ is convex.
Propositions 4.2.7 and 4.2.8 (and also 4.3.3 below) show that FC/LD
problems are a direct generalization of linear least squares problems.
where RG ≤ R is the global radius of curvature (Sect. 7.2), as soon as the ge-
ometric attributes 1/R, L , and Θ of the FC problem (4.1) (Definition 4.2.3)
satisfy the extended deflection condition
$R_G \stackrel{\rm def}{=} R\left(\sin\Theta + (L/R - \Theta)\cos\Theta\right) > 0.$   (4.33)
Figure 8.2 shows that the set of deflection Θ and size×curvature product L/R
that satisfy (4.33) is of the form
with Θmax given by (8.19). This shows that the range of authorized deflections
can be extended beyond π/2. Examples of simple sets that have a deflection
larger than π/2 but are nevertheless s.q.c. can be seen in Figs. 7.3 and 8.3.
Hence the use of conditions (4.33) or (4.34) makes it possible to enlarge the size of the admissible parameter set C for which $\varphi(C)$ is s.q.c. – so that Propositions
4.2.7 and 4.2.8 above and 4.3.3 below still hold – at the price of reducing the
size of the neighborhood ϑ on which the projection is well-behaved.
For example, in the case where only the worst deflection estimate Θ =
L/R (see (4.23)) is available, one sees that (4.33) is satisfied for any π/2 <
Θ < π, but on the smaller neighborhood (4.32) of size RG = R sin Θ.
For the sake of simplicity, we shall not attempt, in the rest of this chapter
and in Chap. 5, which deal with the regularization of inverse problems, to
produce the least constraining conditions on the size of C. So we shall not
use the extended deflection condition (4.34), but only the simple deflection
condition (4.16), which will ensure that the output set ϕ(C) is s.q.c. with a
neighborhood (4.28) of size R independent of the deflection Θ.
This can be seen easily: given x0 , x1 ∈ C and t ∈ [0, 1], subtraction of the
state equations written at x1 and x0 gives
b(x1 − x0 , y1 ) = 0 =⇒ x1 = x0 . (4.38)
b(x1 − x0 , yt ) = 0 =⇒ x1 = x0 ,
But for nonlinear problems, identifiability does not in general imply linearized identifiability (think, e.g., of $\varphi(x) = x^3$ on $C = [-1, +1]$, but this is not a FC problem), and conversely linearized identifiability does not imply identifiability. However, for FC/LD problems the following result holds:
Proposition 4.3.3 Let (4.1) be a FC/LD problem. Then
Definition 4.3.4 Let C, ϕ satisfy the minimum set of hypothesis (4.2). The
parameter x is linearly stable on C if
$\exists\, \alpha_m > 0 \ \text{such that}\ \forall x_0, x_1 \in C: \quad \alpha_m\, \|x_0 - x_1\|_E \le L(P) = \int_0^1 \|V(t)\|_F\, dt,$   (4.39)
where L(P) is the arc length of the curve P image by ϕ of the $[x_0, x_1]$ segment, as defined by (4.13), (4.11), and (4.21). A sufficient condition for (4.39) to hold is
$\exists\, \alpha_m > 0 \ \text{such that}\ \forall x_0, x_1 \in C,\ \forall t \in [0,1]: \quad \alpha_m\, \|x_0 - x_1\|_E \le \|V(t)\|_F.$   (4.40)
Linear stability is weaker than stability:
Proposition 4.3.5 Let C, ϕ satisfy the minimum set of hypothesis (4.2).
Then
stability =⇒ linear stability.
Proof. The stability condition (1.15) implies (4.39) with αm = 1/k, which
proves the implication.
The first and most natural way to do that consists in considering that B
in (4.9) is the forward map ϕ, and to replace (4.9) by the stability condition
(1.15) of Definition 1.3.1, which we recall here for convenience:
$\exists\, k \ge 0 \ \text{such that}\ \|x_0 - x_1\|_E \le k\, \|\varphi(x_0) - \varphi(x_1)\|_F \quad \forall x_0, x_1 \in C.$   (4.41)
Of course, combining (4.41) with the stability property (4.30) of the projection onto $\varphi(C)$ would ensure the desired OLS-identifiability property. But it would not take full advantage of (4.30), where the stability of the projection is achieved not only for the usual distance $\|\hat{X}_0 - \hat{X}_1\|_F$ of the projections of $z_0$ and $z_1$, but also for their larger “arc length distance” $\delta(\hat{X}_0, \hat{X}_1)$ measured along the curves $\hat{P}$. This distance is defined in terms of the directional derivative V(t) of ϕ (see (4.31)). This leads us to consider that B in (4.9) is rather the derivative of the forward map ϕ, and to replace (4.9) by the linear stability condition of Definition 4.3.4:
Theorem 4.4.1 Let (4.1) be a FC/LD problem with curvature 1/R < ∞. If the linear stability condition (4.39) is satisfied, the parameter x is OLS-identifiable, that is, the NLS problem (4.1) is Q-wellposed on a neighborhood
$\vartheta = \{z \in F \mid d(z, \varphi(C)) < R\},$   (4.42)
and the following local Lipschitz stability result holds for the inverse problem: for all $z_0, z_1 \in \vartheta$ close enough so that there exists d ≥ 0 satisfying
where
• $L(\hat{P})$ is the arc length of the curve $\hat{P} : t \mapsto \varphi((1-t)\hat{x}_0 + t\hat{x}_1)$ image by ϕ of the $[\hat{x}_0, \hat{x}_1]$ segment
Proof. The linear stability hypothesis implies of course the linear identifiability of x and, because the problem has finite curvature and satisfies the deflection condition, identifiability of x on C (Proposition 4.3.3). Hence ϕ is injective on C: any $X \in \varphi(C)$ has a unique preimage $x = \varphi^{-1}(X)$ in C.
Let then $X_j \in \varphi(C)$, j = 0, 1, be two points on the output set, and $x_j = \varphi^{-1}(X_j)$, j = 0, 1, be the corresponding (unique) preimages. Combining the definition (4.27) of δ(X, Y) with the linear stability condition (4.39) shows that
$\delta(X_0, X_1) = L(P) = \int_0^1 \|V(t)\|_F\, dt \ge \alpha_m\, \|x_0 - x_1\|_E.$   (4.45)
Theorem 4.5.1 Let the FD minimum set of hypothesis (4.46) hold. Then
ϕ is continuously differentiable over Cη , and the following properties hold:
1. If C is bounded, then
x is OLS-identifiable on C, or equivalently:
the NLS problem (4.1) is Q-wellposed on C.
which proves the linear stability of x over C. As for the finite curvature of the problem, the only thing in Definition 4.2.1 that does not follow immediately from the FD minimum set of hypothesis (4.46) is the inequality (4.13). The proof is similar: the best (smallest) 1/R that satisfies (4.13) is by definition
$1/R \stackrel{\rm def}{=} \sup_{x_0, x_1 \in C,\ x_0 \ne x_1,\ t \in [0,1]} \dfrac{\|A\|_F}{\|V\|^2_F}$   (4.56)
$\le \sup_{x \in C,\ h \in S} \dfrac{\|D^2_{h,h}\,\varphi(x)\|}{\|D_h\,\varphi(x)\|^2}$   (use (4.53) left and right)
1. Linear identifiability:
does V = 0 imply $x_1 = x_0$?
2-a. Closedness:
is the attainable set $\varphi(C)$ closed?
or alternatively,
3. Deflection condition:
does one have, for all $x_0, x_1 \in C$ and $t \in [0,1]$,
$\|A(t)\|_F \le \theta(t)\, \|V(t)\|_F \ \text{with}\ \Theta = \int_0^1 \theta(t)\, dt \le \pi/2?$   (4.58)
4. Finite curvature:
does there exist R > 0 such that,
for all $x_0, x_1 \in C$ and $t \in [0,1]$, $\ \|A(t)\|_F \le \dfrac{1}{R}\, \|V(t)\|^2_F?$   (4.59)
The theoretical and numerical tools available (or not available...) to an-
swer these questions are described in Sect. 4.7 below.
In the case where one is able to give a positive answer to question 2-b for some norm $\|\cdot\|_E$, the closedness question 2 is also answered positively, and one has
$\alpha_m\, \|\hat{x}_1 - \hat{x}_0\|_E \le \delta(x_0, x_1),$   (4.63)
so that the NLS problem (4.1) is still Q-wellposed on the same neighborhood (4.61), but for the stronger norm $\|\cdot\|_E$ on the parameter space for which one has been able to prove linear stability.
this does not mean that it is ill-posed, only that one cannot decide between
well- and ill-posedness. A reasonable thing to do is then to add information,
either by reducing the number of parameters (Chap. 3), or by using L.M.T.
regularization or strengthening the observation in the perspective of using
state-space regularization (Chap. 5), until the regularized problem satisfies
one set of sufficient conditions. Hence these sufficient conditions provide a
guideline for the design of a wellposed and optimizable regularized problem,
and it is only at that point that numerical attempts to minimize the least
squares objective function should be made, now with a reasonable hope of
producing meaningful results.
We discuss for the rest of this section the available tools for checking these
conditions under the minimum set of hypotheses (4.2): one can try to answer
questions 1–4 either rigorously, by proving the corresponding property or
giving a counterexample, or approximately, when a proof is out of reach,
by checking numerically the property – but this becomes computationally
intensive when there are more than a few parameters.
In any case, it is useful, whenever possible, to estimate even crudely
the linear stability constant αm , and the lower bound R to the radius of
curvature of the problem, which provide useful practical information: αm , R
appear in the stability estimate (4.62) and (4.63) of the inverse problem,
and R gives the size of the neighborhood of the attainable set on which the
inverse problem is stable, and hence provides an upper bound on the size of
the admissible measurement and model errors.
For problems of small size, one can combine formula (4.65), (4.70), and
(4.73) for the numerical determination of αm , Θ, and R with Theorem 4.5.1
and the stability property (4.44) to perform a nonlinear stability analysis,
see, for example, [80].
• One has to limit the computation of ϕ′(x_nom) and its SVD to a finite
set C_N of nominal parameters, and then cross fingers that things do not
change too much away from the chosen nominal value(s). The number
of points in C_N depends on the computational power one can afford; it
can be reduced to one for computationally intensive problems. . .
The estimate (4.64) is optimistic because one performs the SVD only
at a finite number of points of C, but it can also be pessimistic because
performing the SVD amounts to investigating the stability in all directions of IR^n, some of them pointing outside of C when the dimension
of C is strictly smaller than n.
• One can also go along the line of the next sections on the deflection and
finite curvature conditions: one discretizes both points and directions
of the admissible set C. Figure 4.2 shows two natural ways to do that:
– On the left, a very coarse coverage where only one (or a few) point(s)
and the directions of the coordinate axes are investigated
– On the right, a more comprehensive coverage, based on a discretization
of the boundary ∂C into a finite set ∂C_N (circles), where each [x_0, x_1]
interval is in turn discretized into N intervals by N + 1 points x_{k/N}, k =
0, 1, · · ·, N (black dots)
[Figure 4.2: left, a coarse coverage of C by one point and the coordinate directions; right, a comprehensive coverage by points x_0, x_1 of ∂C_N and the intermediate points x_{k/N} (e.g. x_{1/2}) of each segment.]
Then α_m is estimated by
\alpha_m = \min_{x_0,x_1\in\partial C_N,\ k=1\cdots N} \frac{\|V_{k-1/2}\|}{\|x_1 - x_0\|}\,, \qquad (4.65)
where b is bilinear and c linear, and by the full observation property we mean that
the observation operator M is linear and satisfies, for some κ_M ≥ κ_m > 0,
\kappa_m\,\|y\|_Y \le \|M\,y\|_F \le \kappa_M\,\|y\|_Y \quad \forall y \in Y,
so that
V(t) = M\,\frac{\partial y_t}{\partial t} \quad\text{and}\quad A(t) = M\,\frac{\partial^2 y_t}{\partial t^2},
where y_t is the solution of the state equation (4.67) for x = x_t = (1 − t)x_0 + t\,x_1.
Deriving (4.67) twice with respect to t gives the equations for ∂y_t/∂t and ∂²y_t/∂t²:
b\Bigl(x_t,\ \frac{\partial y_t}{\partial t}\Bigr) + c\Bigl(\frac{\partial y_t}{\partial t}\Bigr) + b(x_1 - x_0,\ y_t) = 0,
b\Bigl(x_t,\ \frac{\partial^2 y_t}{\partial t^2}\Bigr) + c\Bigl(\frac{\partial^2 y_t}{\partial t^2}\Bigr) + 2\,b\Bigl(x_1 - x_0,\ \frac{\partial y_t}{\partial t}\Bigr) = 0. \qquad (4.69)
The same arguments that ensure that the state equation (4.67) has a unique
solution depending continuously on c will, in general, imply that (4.69) has
a unique solution, with
\Bigl\|\frac{\partial^2 y_t}{\partial t^2}\Bigr\|_Y \le \kappa(C)\,\Bigl\|\frac{\partial y_t}{\partial t}\Bigr\|_Y,
where the constant κ(C) can be expressed in terms of the bounds imposed
on x in C. The deflection condition is then satisfied as soon as
\frac{\kappa_M}{\kappa_m}\,\kappa(C) \le \frac{\pi}{2}.
An example of this situation can be found in the deflection estimates of
Proposition 4.9.3 (respectively, 5.4.5) for the estimation of the diffusion co-
efficient in a two-dimensional elliptic equation with H 1 -observation (respec-
tively, H 1 -observation with adapted regularization).
As for the numerical evaluation of the deflection Θ, it can be performed
on a coverage of points and directions of C that takes into account the size
of C, as the one at the right of Fig. 4.2:
where θ(k, k′) is the deflection between the points x_{k/N} and x_{k′/N}:
\theta(k, k') = \cos^{-1}\frac{\langle V_k,\, V_{k'}\rangle}{\|V_k\|\,\|V_{k'}\|}.
Here V_k is the velocity at x_{k/N} along x_1 − x_0. It can be evaluated either analytically, or numerically by
V_k = (V_{k+1/2} + V_{k-1/2})/2, \qquad (4.71)
for example, where V_{k−1/2} is defined in (4.66).
Formula (4.70), which is based on the definition of Θ, is the more precise
one, but is computationally expensive, as it requires the comparison of θ(k, k′)
for all couples (k, k′). A formula with a simple summation in k only, which is
based on the majoration of Proposition 4.2.4 but requires an estimation of
the acceleration A, is given in the next section in (4.74).
Remark 4.7.2 The deflection condition Θ ≤ π/2 is only a sufficient condi-
tion for the more precise extended deflection condition RG > 0 on the global
radius of curvature, which characterizes s.q.c. sets (see Remark 4.2.10 and
Chap. 7).
So in a situation where one can afford to compute the deflection by (4.70),
one should rather compute, at a similar computational burden, the global
radius of curvature R_G using Proposition 7.3.1
R_G = \max_{x_0,x_1\in\partial C_N,\ k,k'=0,1\cdots N,\ k'\ne k} \rho^{ep}_G(k, k'),
where ρ^{ep}_G(k, k′) is the global radius of curvature at x_{k/N} seen from x_{k′/N}
(in general ρ^{ep}_G(k, k′) ≠ ρ^{ep}_G(k′, k)!), given by
\rho^{ep}_G(k, k') = \begin{cases} \max\{0, N\}/D & \text{if } \langle V, V'\rangle \ge 0,\\[2pt] \max\{0, N\} & \text{if } \langle V, V'\rangle \le 0,\end{cases} \qquad (4.72)
with (Proposition 7.2.6)
X = \varphi(x_{k/N}), \quad X' = \varphi(x_{k'/N}),
v = V/\|V\|, \quad v' = V'/\|V'\|,
N = \operatorname{sgn}(k' - k)\,\langle X' - X,\ v\rangle,
D = \bigl(1 - \langle v, v'\rangle^2\bigr)^{1/2}.
When RG > 0, all stability results of this chapter apply on a neighborhood of
size RG ≤ R (Remark 4.2.10).
ϕ(b) = −b qb . (4.77)
or by
V = \frac{1}{b}\Bigl[(b_1 - b_0)\,q_b - \int_0^1 (b_1 - b_0)\,q_b\Bigr]. \qquad (4.80)
Formula (4.82) will be used to estimate the deflection Θ, and (4.81) will serve
to estimate the curvature 1/R.
We check first that the minimum set of hypotheses (4.2) is satisfied: the
only part that needs a proof is the last property. Multiplication of (4.78) by
η_ξ and integration over Ω = [0, 1] shows that
|V|_{L^2} = |\eta_\xi|_{L^2} \le b_M\,\Bigl|\frac{b_1 - b_0}{b}\, q_b\Bigr|_{L^2},
and (4.2) holds with
\alpha_M = \frac{b_M}{b_m}\, q_M.
η(0) = 0, η(1) = 0.
The flux function qb is constant between consecutive source points. If one of
these constants is zero, parameters b0 and b1 that differ only on the corre-
sponding interval produce η = 0, that is, V = 0, and linear stability cannot
hold. It is hence necessary, if we want stability, to suppose that the flux
satisfies
0 < qm ≤ qb ≤ qM ∀b (4.85)
for some 0 < qm ≤ qM (notice that (4.85) will be automatically satisfied if,
e.g., a finite number of sources and sinks of the same amplitude are located
in an alternate way on the [0, 1] interval).
But now q_b^{-1} is finite, and so it can happen that d = (b_1 − b_0)/b becomes
proportional to q_b^{-1}. In this case, d\,q_b is a constant, and (4.78) and (4.79)
imply η = 0. Hence V = 0, and linear stability fails once again! So we have
to prevent d\,q_b from becoming close to a constant function. To quantify this, we
decompose L^2(Ω) into the direct sum of two orthogonal subspaces
L^2(\Omega) = I\!R \oplus L^2(\Omega)/I\!R,
where IR denotes the subspace of constant functions.
[Figure: the function d\,q_b, seen as a vector of L^2(Ω), making an angle γ with the subspace IR of constants.]
Then (4.84) shows that b^{-1}η_ξ = b^{-1}V and d\,q_b are equivalent, so that
Step 2: To keep the angle γ away from zero, we remark that q_b^{-1} is
constant between two source points, and is discontinuous at each active source
point, where g_i ≠ 0. So if we require that the admissible coefficients b (and
hence the vector d = (b_1 − b_0)/b!) are regular, say, for example, constant,
on some neighborhood of at least one active source, then d\,q_b, which is also
discontinuous at active source points, cannot be constant, and we can expect
that γ remains larger than some γ_m > 0.
So we define a smaller admissible parameter set
where
are intervals surrounding the sources ξj . Of course, the intervals are chosen
such that
where the sign is chosen such that ⟨e, v⟩ ≥ 0. With this convention, the angle
γ between the directions of v and e is given by the median theorem:
0 \le \cos\gamma = \langle e, v\rangle = 1 - \tfrac12\,|e - v|^2_{L^2}, \qquad \gamma \in [0, \pi/2]. \qquad (4.93)
To bound γ from below when b_0, b_1 ∈ D and t ∈ [0, 1], we have to search for a lower
bound to |e − v|²_{L²}:
|e - v|^2_{L^2} = \int_0^1 |e - v|^2 \;\ge\; \sum_{j\in J}\int_{I_j} |1 - v|^2
= \sum_{j\in J}\Bigl[\eta_j^-\bigl(1 - d_j\,q_b(\xi_j^-)\bigr)^2 + \eta_j^+\bigl(1 - d_j\,q_b(\xi_j^+)\bigr)^2\Bigr], \qquad (4.94)
where d_j = d_j/|d\,q_b|_{L^2} ∈ IR. We have used the fact that, on each interval I_j,
e is equal to 1, and v takes the constant values d_j\,q_b(\xi_j^-) left from ξ_j and d_j\,q_b(\xi_j^+)
right from ξ_j. Taking the infimum over d_j ∈ IR for each interval gives
|e - v|^2_{L^2} \;\ge\; \sum_{j\in J}\frac{\eta_j^-\,\eta_j^+}{\eta_j^-\,q_b(\xi_j^-)^2 + \eta_j^+\,q_b(\xi_j^+)^2}\;g_j^2 \qquad (4.95)
\;\ge\; \frac{1}{q_M^2}\sum_{j\in J}\frac{\eta_j^-\,\eta_j^+}{\eta_j^- + \eta_j^+}\;g_j^2. \qquad (4.96)
The sum term in the right-hand side of (4.98) depends only on the
strength of the sources (last factor), and on the disposition of the inter-
vals Ij surrounding the sources (first factor). It is strictly positive if at least
one active source is interior to the corresponding interval, that is,
\exists\, j \in J \ \text{such that}\ \eta_j^-\,\eta_j^+\,g_j \ne 0. \qquad (4.99)
Combining (4.89) (step 1) and (4.97) (step 2) gives then the "flux-weighted
relative stability estimate"
b_m \sin\gamma_m\,\Bigl|\frac{b_1 - b_0}{b}\, q_b\Bigr|_{L^2} \le |V|_{L^2}. \qquad (4.100)
H_{b,m} \le H_b \le H_{b,M},
A better estimation γ′_m of γ_m is then
\cos\gamma'_m = 1 - \tfrac12 \inf_{H_{b,m}\le h\le H_{b,M}} g(h) \;\le\; \cos\gamma_m,
This gives, using the stability estimate (4.100) and the property (4.87) of the
norm in L^2/IR,
|A|_{L^2} \le \frac{2}{b_m\,\sin\gamma_m}\Bigl(1 + \frac{|b|_{L^2/I\!R}}{b_m}\Bigr)\,|V|_{L^2}\,\bigl((b_M - b_m) + |c|_{L^2/I\!R}\bigr). \qquad (4.102)
Let us denote by j_{max} ∈ J the index of the source for which |I_j| = η_j^- + η_j^+
is maximum, and by c_{max} and d_{max} the constant values of c and d on I_{max}.
Using (4.87) again gives, as c − c_{max} and d − d_{max} vanish over I_{max},
With these hypotheses, the regularity theorem in [76], page 180, implies
that {|ua |W 2,p : a ∈ C} is bounded for any p > 2. Since W 2,p (Ω) is contin-
uously embedded into C^1(Ω) for every p > 2, there exist u_M and γ_M such
that
|u_a|_{L^\infty(\Omega)} \le u_M, \quad |\nabla u_a|_{L^\infty(\Omega)} \le \gamma_M \quad \text{for all } a \in C. \qquad (4.109)
The results in this section are derived under the assumption (1.65) that
the Dirichlet condition u = 0 holds on a nonvoid part ∂ΩD of the boundary
∂Ω, but all results can be extended to the case where meas (∂ΩD ) = 0,
see [30].
We have now to choose the Banach parameter space E for which one
hopes to prove the stability of the inverse problem. As we will end up in this
section by further reducing the problem to finite dimension, all norms will
be equivalent, and we shall choose simply for E the space that makes the
a ↦ u_a mapping naturally regular:
The admissible set C defined in (4.106) is clearly a closed and convex subset
of E – but with a void interior!
To any a ∈ C, we associate the solution ua of the state equation (1.64),
which we write here for convenience in its variational form. We incorporate
for this the boundary conditions of lower order in the state-space Y :
Y = \{v \in H^1(\Omega) : v|_{\partial\Omega_D} = 0,\ v|_{\partial\Omega_i} = v_i = \text{const},\ i = 1, \cdots, N\}, \qquad \|v\|_Y = |\nabla v|_{I\!L^2},
Integrating by parts the second term on the right-hand side implies
\int_\Omega h\,\nabla u\cdot\nabla v = \frac12\int_\Omega \frac{h^2}{a}\,|\nabla u|^2
- \frac12\int_\Omega \frac{u}{a}\Bigl(h^2\,\Delta u + \frac{2h^2}{a}\,\nabla a\cdot\nabla u\Bigr)
+ \frac12\sum_{i=1}^{N}\frac{h_i^2}{a_i^2}\, u_i\, Q_i,
Proposition 4.9.2 Let notations and hypothesis (1.65) and (4.105) through
(4.107) hold. Then a is linearly identifiable over C as soon as
and
|∂Ωi |, i = 1, · · · , N, are sufficiently small. (4.117)
where u_a is the solution of (4.111) for a = (1 − t)a_0 + t\,a_1. Lemma 4.9.1 implies
then
\frac12\int_\Omega \frac{h^2}{a}\,|\nabla u_a|^2 + \sum_{i=1}^{N}\frac{h_i^2}{a_i^2}\, u_i\, Q_i = 0. \qquad (4.118)
We argue first that the second term in (4.118) can be made positive using (4.117). Suppose that ∂Ω_i surrounds, for each i = 1, · · ·, N, a fixed
source/sink location x_i. If |∂Ω_i| → 0 for all i = 1, · · ·, N, the solution u_a converges towards the weak solution associated to a right-hand side with Dirac
source term \sum_{i=1}^{N} Q_i\,\delta(x - x_i), which is singular at x_i. Hence u_a|_{\partial\Omega_i} = u_{a,i} → +∞
if Q_i > 0 and u_a|_{\partial\Omega_i} → −∞ if Q_i < 0. Since C is compact in E by the Ascoli
theorem, and a ↦ u_{a,i} is continuous on E, we conclude that, when |∂Ω_i|
satisfies (4.117), the solution u_a satisfies
We argue now that ∇u_a cannot vanish on a set I of positive measure. Let
γ denote a curve in Ω connecting the inner boundaries ∂Ωi to ∂ΩD ∪∂ΩN , such
that Ω \ γ is simply connected and meas γ = 0. Then Iγ = (Ω \ γ) ∩ I satisfies
meas Iγ > 0. From [5], Theorem 2.1, and Remark, it follows that either ua is
constant on Ω \ γ, and hence on Ω, or u_a has only isolated critical points, that
is, points z satisfying ∇u_a(z) = 0. But u_a cannot equal a constant over Ω as
this violates the boundary conditions at the wells ∂Ω_i at which Q_i ≠ 0. On
the other hand, the number of isolated critical points in Iγ can be at most
countable, and hence meas Iγ = 0. Consequently, meas{x : ∇ua (x) = 0} = 0.
Hence h = 0 a.e. in Ω, and a is linearly identifiable over C, which ends
the proof.
We turn now to the deflection condition (4.16):
Proposition 4.9.3 Let notations and hypotheses (1.65) and (4.105) through
(4.107) hold. Then the deflection condition Θ ≤ π/2 is satisfied for problem
(4.113) as soon as
a_M - a_m \le \frac{\pi}{4}\, a_m. \qquad (4.119)
Proof: Taking v = ζ in (4.115) gives, using (4.107), the Cauchy–Schwarz
inequality and (4.112):
(one can, e.g., construct E using finite elements or splines). The result follows
then immediately from Theorem 4.5.1:
Theorem 4.9.4 Let notations and hypothesis (1.65) and (4.105) through
(4.110) hold, as well as (4.116), (4.117), and (4.121). Then
\alpha_m = \inf_{a_0,a_1\in C,\ a_0\ne a_1,\ t\in[0,1]} \frac{|\nabla u|_{I\!L^2(\Omega)}}{\|a_1 - a_0\|_{C^0(\Omega)}} > 0
The FD-minimum set of hypotheses (4.46) is then verified, the set C – and
hence C – is obviously bounded, and Proposition 4.9.2 applied to Cη instead
of C shows that a is linearly identifiable over Cη – and hence over C η . Then
points 1 and 2 of Theorem 4.9.4 follow immediately from points 1 and 2 of
Theorem 4.5.1, and point 3 follows from Proposition 4.9.3 and point 3 of
Theorem 4.5.1.
Regularization of Nonlinear
Least Squares Problems
back to the forties, and that of Marquardt [62] to the sixties. The approach
was popularized in the seventies by Tikhonov and the Russian school [75, 63].
In practice, the available data are always corrupted by noise. So we retain
the letter z to denote the noise free data, and suppose that a sequence zn of
noisy measurement of increasing quality is available:
\hat x_n \ \text{minimizes}\ J_n(x) = \frac12\,\|\varphi(x) - z_n\|_F^2 + \frac{\epsilon_n^2}{2}\,\|x - x_0\|_E^2 \ \text{over}\ C. \qquad (5.5)
• For the class of linear problems, where ϕ(x) = Ax with A ∈ L(E, F ),
the theory is rather well developed, see, for example, the monographs
by Baumeister, Groetsch, Louis, and Morozov [8, 43, 59, 63], and the
papers [65, 66, 27]. The main results are the following:
We give in Sect. 5.1.1 a direct hard analysis proof of these results, which
will serve as guideline for the nonlinear case.
• The results of the linear case are generalized completely in Sect. 5.1.2 to
the class of finite curvature/limited deflection (FC/LD) problems, pro-
vided that the true data z is close enough to the attainable set ϕ(C).
This class contains NLS problems that satisfy both the finite curva-
ture property of Definition 4.2.1 and the deflection condition (4.25) of
Corollary 4.2.5. We follow [28] for the proof, but use sharper estima-
tions that allow to obtain the convergence results even for unattainable
data and without the need for the a-priori guess x0 to be close to the
minimal norm solution x̂.
• For general nonlinear problems, however, where the parameter output mapping ϕ exhibits no interesting mathematical properties except
regularity, ε > 0 does not necessarily ensure wellposedness of the regularized problem (5.2) any more, in particular for small values of ε.
We give first in Sect. 5.1.3 an estimation of a minimum value ε_min > 0
of ε, which restores Q-wellposedness of (5.2) [18]. Then we study the
convergence of a sequence of (non-necessarily unique) minimizers x̂_n of
(5.5) when the data z is attainable and the a-priori guess x_0 is close
enough to a minimum-norm solution x̂ of (5.1) [27]. Convergence results
for unconstrained nonlinear problems are also available in [37, 67, 45].
T (K, x) and T (K, x)− are closed convex cones. The tangent cone to X sat-
isfies
Lemma 5.1.1
Proof. Let y ∈ T (X, x), and {xn } in X, {λn } with λn > 0 be sequences such
that y = lim λn (xn − x). It follows that y ∈ KerA and, since xn ∈ X ⊂ C,
we also have that y ∈ T (C, x).
Unluckily, the converse inclusion is not true in general: for example, if C
is a closed ball and X is reduced to one single point {x}, then T (X, x) = {0}
and Ker A ∩ T (C, x) = Ker A, which do not coincide as soon as Ker A is not
trivial. So we make the following definition:
Lemma 5.1.3 Let (5.1) admit a solution and x be identifiable over C (Def-
inition 1.3.1). Then the solution set X contains one single element x̂, and x̂
is qualified.
For the case of linear constraints, one has the following result:
Lemma 5.1.4 Let C be defined by a finite number NC of linear constraints:
C \overset{def}{=} \bigl\{x \in E : M_i\,x \le b_i,\ i \in I = \{1, \ldots, N_C\}\bigr\},
where Mi are bounded linear functionals on E and bi ∈ IR. Then all points x
of X are qualified.
Proof. Step 1: for any x̃ ∈ C, we prove that the cone
K = \{\lambda(x - \tilde x) : \lambda > 0,\ x \in C\}
coincides with T(C, x̃). By definition of T(C, x̃), one has T(C, x̃) = \bar K, and
hence it suffices to show that K is closed. Let yn ∈ K and y ∈ E be such that
yn → y. As yn ∈ K, there exist λn > 0 and xn ∈ C such that yn = λn (xn − x̃).
Let us denote by I(x̃) the set of active indices at x̃: I(x̃) = {i ∈ I : Mi x̃ =
bi }. Then we find Mi yn = λn (Mi xn − Mi x̃) = λn (Mi xn − bi ) ≤ 0 for all
i ∈ I(x̃) and n = 1, 2, . . . . Hence Mi y ≤ 0 for all i ∈ I(x̃). Next we choose
λ > 0 small enough so that Mi x̃ + λMi y ≤ bi for all i ∈ / I(x̃). It is simple to
check that x̃ + λy ∈ C and hence y ∈ K and K is closed.
Step 2: we prove that Ker A ∩ K ⊂ T (X, x̃), for any x̃ ∈ X. Let y ∈
Ker A ∩ K be given. Then y = λ(x − x̃), where λ > 0 and x ∈ C, and
A y = 0. Hence A x = A x̃, so that x ∈ X and y = λ(x − x̃) ∈ T (X, x̃).
The following property will be useful in the convergence study of the
regularized solutions xn to the x0 -minimum-norm solution x̂:
Lemma 5.1.5 Hypothesis and notations (5.6) through (5.11).
1. If the x_0-minimum-norm solution x̂ of (5.1) is qualified, then
x_0 - \hat x \in \overline{Rg\,A^*} = (\operatorname{Ker} A)^\perp,
Proof. The Euler condition applied to problem (5.11) shows that the x_0-minimum-norm solution x̂ satisfies
The qualification hypothesis for x̂ implies that T(X, x̂) is given by (5.14),
and using a property of polar cones,
x_0 - \hat x \in \overline{Rg\,A^*} + T(C, \hat x)^-.
Let η > 0 be given. We can first find x_1 ∈ T(C, x̂)^- and x_2 ∈ \overline{Rg\,A^*} such
that |x_0 − x̂ − x_1 − x_2| ≤ η/2. Then we can find x_3 ∈ Rg A^* such that
|x_3 − x_2| ≤ η/2. Hence we obtain
|x0 − x̂ − x1 − x3 | ≤ η,
The name given to condition (5.17) will be explained after Definition 5.1.14.
The regularity condition is equivalent to the existence of a Lagrange mul-
tiplier for the optimization problem (5.11), which can be rewritten as a con-
strained optimization problem:
\hat x \in C \ \text{minimizes}\ \frac12\,\|x - x_0\|_E^2 \ \text{over}\ C \ \text{under the constraint}\ A\,x = \hat z.
The Lagrangian for this problem is
L(x, w) = \frac12\,\|x - x_0\|_E^2 + \langle w,\ A\,x - \hat z\rangle_F,
2
and ŵ is a Lagrange multiplier at x̂ if
or equivalently, as L is convex in x,
\frac{\partial L}{\partial x}(\hat x, \hat w)(x - \hat x) \ge 0 \quad\text{for all } x \in C,
that is,
\langle \hat x - x_0,\ x - \hat x\rangle + \langle \hat w,\ A\,(x - \hat x)\rangle \ge 0 \quad\text{for all } x \in C. \qquad (5.18)
If we define μ ∈ E by
x0 − x̂ = A∗ ŵ + μ,
we can rewrite (5.18) as
Hence μ ∈ T (C, x̂)− , and x̂ satisfies the regularity condition (5.17), and the
equivalence is proved.
\epsilon_n\,\hat x_n \to 0. \qquad (5.19)
A\,\hat x_n \to \hat z. \qquad (5.20)
2. Let the data z satisfy (5.8), which means that (5.1) has solution(s).
Then if the x_0-minimum-norm solution x̂ satisfies
\hat x_n - \hat x \to 0, \qquad (5.22)
\|A\,\hat x_n - \hat z\|_F = O(\epsilon_n). \qquad (5.23)
Proof.
Part 1 Let η > 0 be given. Using (5.7), we can find x̃ ∈ C such that
To estimate A x̂_n − A x̃ and x̂_n − x̃, we rewrite the above inequality as
\|A(\hat x_n - \tilde x)\|_F^2 + \epsilon_n^2\,\|\hat x_n - \tilde x\|_E^2 \le \|A(\hat x_n - \tilde x)\|_F^2 + \epsilon_n^2\,\|\hat x_n - \tilde x\|_E^2 \qquad (5.28)
+ \|A\,\tilde x - z_n\|_F^2 + \epsilon_n^2\,\|\tilde x - x_0\|_E^2
- \|A\,\hat x_n - z_n\|_F^2 - \epsilon_n^2\,\|\hat x_n - x_0\|_E^2.
a^2 + b^2 \le 2a\alpha + 2b\beta.
Hence,
(a - \alpha)^2 + (b - \beta)^2 \le \alpha^2 + \beta^2 \le (\alpha + \beta)^2,
and finally
a \le 2\alpha + \beta, \qquad b \le \alpha + 2\beta. \qquad (5.32)
Hence we deduce from (5.31) that
\|A(\hat x_n - \tilde x)\|_F \le 2\Bigl(\delta_n + \frac{\eta}{4}\Bigr) + \epsilon_n\,\|x_0 - \tilde x\|_E, \qquad (5.33)
\epsilon_n\,\|\hat x_n - \tilde x\|_E \le \Bigl(\delta_n + \frac{\eta}{4}\Bigr) + 2\,\epsilon_n\,\|x_0 - \tilde x\|_E. \qquad (5.34)
From the first inequality we obtain, using (5.27),
\|A\,\hat x_n - \hat z\|_F \le 2\,\delta_n + \frac{3\eta}{4} + \epsilon_n\,\|x_0 - \tilde x\|_E,
so that \|A\hat x_n - \hat z\|_F \le \eta for n large enough, and (5.20) is proved. Then we
deduce from (5.34) that
\epsilon_n\,\|\hat x_n\|_E \le \epsilon_n\,\|\tilde x\|_E + \delta_n + \frac{\eta}{4} + 2\,\epsilon_n\,\|x_0 - \tilde x\|_E,
so that \epsilon_n\,\|\hat x_n\|_E \le \eta for n large enough, and (5.19) is proved.
Part 2 From now on we suppose that (5.1) has a solution, and hence an
x_0-minimum-norm solution x̂. Hence we can choose η = 0 and x̃ = x̂ in
(5.27), so that (5.30), (5.33), and (5.34) become
and
where (5.36) proves (5.23) as now δ_n/ε_n → 0. But (5.37) gives no information
on the convergence of x̂_n to x̂; it is necessary for that to use hypothesis (5.21)
on x̂: let again η > 0 be given, and w ∈ F, μ ∈ T(C, x̂)^- be such that
and, transposing A^* in the right-hand side and using the fact that \langle \hat x_n - \hat x,\ \mu\rangle \le 0,
Part 3 Now x̂ satisfies the regularity condition (5.24), and so we can choose
η = 0 in Part 2, and estimations (5.38), and (5.39) simplify to
• The attainable set ϕ(C) is not necessarily convex nor closed, but Propo-
sition 4.2.7 shows that the projection of z onto ϕ(C) is unique and
stable (when it exists) as soon as z belongs to the neighborhood
\vartheta = \{\, z \in F \mid d(z, \varphi(C)) < R \,\}. \qquad (5.44)
This property will allow to handle the case of nonattainable data, pro-
vided the error level is smaller than R.
Let
ẑ = projection of z on ϕ(C). (5.45)
The unregularized problem (5.1) admits solution(s) as soon as
X = \varphi^{-1}(\hat z) \cap C,
and hence
0 = \lim_{n\to\infty} \lambda_n\,\varphi'(\hat x)(x_n - \hat x) = \varphi'(\hat x)\,y.
Let now y ∈ Ker ϕ′(x̂) be given. One can always suppose y is small
enough so that y = x − x̂ for some x ∈ C_η. Then the velocity along the curve
P : t ↦ ϕ((1 − t)x̂ + t x) satisfies V(0) = P′(0) = ϕ′(x̂)\,y = 0, which again implies
x = x̂ using the linear identifiability property. Hence y = 0, which proves
that Ker ϕ′(x̂) = {0}, and the lemma is proved.
which expresses the fact that x̂ is a local minimum of the distance to x0 over
X = ϕ−1 (ẑ).
The name given to condition (5.51) comes from the distributed parameter
case: when E is an infinite-dimensional function space, Rg ϕ′(x̂)^* can be a
dense subspace of E made of regular functions, in which case (5.51) can be
satisfied only if x_0 − x̂, and hence necessarily the data z, are smooth. This is
illustrated by (5.129) in Sect. 5.2 on the LMT regularization of the nonlinear
2D source problem.
As in the linear case, we shall obtain convergence rates for the solutions
x̂n of regularized problems (5.5) when this condition is satisfied.
the unregularized problem (5.1) has generally no solution, but the regularized problems (5.5) are Q-wellposed for n large enough when ε_n → 0
and δ_n → 0, and
\epsilon_n\,\hat x_n \to 0 \qquad (5.53)
\varphi(\hat x_n) \to \hat z. \qquad (5.54)
2. Let in addition ϕ satisfy condition (5.47), and suppose that the data z
satisfy (5.46), which means that (5.1) has a convex nonempty solution
set, and a unique x_0-minimum-norm solution x̂. If this latter satisfies
\hat x_n \to \hat x, \qquad (5.56)
\|\varphi(\hat x_n) - \hat z\|_F = O(\epsilon_n). \qquad (5.57)
Proof.
Along the curves P_n : t ↦ ϕ_n((1 − t)x_0 + t x_1), the velocity and acceleration
But
which is true for n large enough as ε_n → 0, δ_n → 0, and d(z, ϕ(C)) < R because of (5.52). So the Q-wellposedness of the regularized problems is proved
for n large enough.
We turn now to the proof of (5.53) and (5.54). Let ε, d, and η be chosen
such that
\epsilon > 0, \qquad (5.64)
0 \le d(z, \varphi(C)) < d < R, \qquad (5.65)
0 < \eta \le d - d(z, \varphi(C)), \qquad (5.66)
\eta + 2\,\eta^{1/2}\,d^{1/2} \le \epsilon, \qquad (5.67)
\|\varphi(\hat x_n) - z_n\|_F^2 + \epsilon_n^2\,\|\hat x_n - x_0\|_E^2 \le \|\varphi(\tilde x) - z_n\|_F^2 + \epsilon_n^2\,\|\tilde x - x_0\|_E^2, \qquad (5.69)
[Figure: the data z, z̃, z_n, the projections ẑ and ϕ(x̃) of z and z̃ on ϕ(C), the point ϕ(x̂_n), and the arc length L̃_n between ϕ(x̃) and ϕ(x̂_n).]
and we proceed from here as in the linear case, replacing the estimation of
\|A(\hat x_n - \tilde x)\|_F by that of the arc length distance L̃_n of ϕ(x̃) to ϕ(x̂_n) in ϕ(C).
So we rewrite (5.69) as (compare to (5.28))
\Bigl(1 - \frac{d}{R}\Bigr)\tilde L_n^2 + \epsilon_n^2\,\|\hat x_n - \tilde x\|_E^2 \le \Bigl(1 - \frac{d}{R}\Bigr)\tilde L_n^2 + \epsilon_n^2\,\|\hat x_n - \tilde x\|_E^2 \qquad (5.70)
+ \|\varphi(\tilde x) - z_n\|_F^2 + \epsilon_n^2\,\|\tilde x - x_0\|_E^2
- \|\varphi(\hat x_n) - z_n\|_F^2 - \epsilon_n^2\,\|\hat x_n - x_0\|_E^2.
But the distance of z̃ to P̃_n has a local minimum at t = 0 by definition
of ϕ(x̃) = projection of z̃ on ϕ(C), and we can apply the obtuse angle
Lemma 6.2.9:
\bigl(1 - k(\tilde z, \tilde P_n)\bigr)\,\tilde L_n^2 \le \|\tilde z - \varphi(\hat x_n)\|^2 - \|\tilde z - \varphi(\tilde x)\|^2. \qquad (5.71)
But (5.68) and the triangular inequality give
\|\tilde z - \varphi(\hat x_n)\| \le \|\tilde z - z\| + \|z - z_n\| + \|z_n - \varphi(\hat x_n)\|
= \|\tilde z - z\| + \|z - z_n\| + \|z_n - \varphi(\tilde x)\| + \epsilon_n\,\|\tilde x - x_0\|
\le 2\,\|\tilde z - z\| + 2\,\|z - z_n\| + \|\tilde z - \varphi(\tilde x)\| + \epsilon_n\,\|\tilde x - x_0\|
= 2\,\|\tilde z - z\| + 2\,\|z - z_n\| + d(\tilde z, \varphi(C)) + \epsilon_n\,\|\tilde x - x_0\|
\le 3\,\|\tilde z - z\| + 2\,\|z - z_n\| + d(z, \varphi(C)) + \epsilon_n\,\|\tilde x - x_0\|
\le d(z, \varphi(C)) + 3\eta/4 + 2\,\delta_n + \epsilon_n\,\|\tilde x - x_0\|
\le d \quad\text{for } n \text{ large enough},
\|\tilde z - \varphi(\tilde x)\| = d(\tilde z, \varphi(C)) \le \|\tilde z - z\| + d(z, \varphi(C)) \le d(z, \varphi(C)) + \eta/4 \le d.
This implies, as in the proof of Theorem 7.2.10, that k(\tilde z, \tilde P_n) \le d/R, so that
(5.71) gives
\Bigl(1 - \frac{d}{R}\Bigr)\tilde L_n^2 \le \|\tilde z - \varphi(\hat x_n)\|^2 - \|\tilde z - \varphi(\tilde x)\|^2 \quad\text{for } n \text{ large enough}. \qquad (5.72)
Substitution of (5.72) in the right-hand side of (5.70) and addition and subtraction of \|\tilde z - z_n\|_F^2 gives
\Bigl(1 - \frac{d}{R}\Bigr)\tilde L_n^2 + \epsilon_n^2\,\|\hat x_n - \tilde x\|_E^2 \le \|\varphi(\hat x_n) - \tilde z\|_F^2 - \|\varphi(\tilde x) - \tilde z\|_F^2
+ \|\tilde z - z_n\|_F^2 - \|\tilde z - z_n\|_F^2
\|\varphi(y_p) - \hat z\|_F \le \frac{1}{p}.
The function d_p : t \in [0, 1] \mapsto \|\tilde z - \varphi((1 - t)\tilde x + t\,y_p)\|_F has a local minimum
at t = 0, as ϕ(x̃) is the projection of z̃ on ϕ(C), and satisfies moreover
d_p(0) = d(\tilde z, \varphi(C)) \le d(z, \varphi(C)) + \frac{\eta}{4} \le d,
d_p(1) = \|\tilde z - \varphi(y_p)\|_F \le \frac{1}{p} + d(z, \varphi(C)) + \frac{\eta}{4} \le d \quad\text{for } p \text{ large enough}.
So we can apply once again the obtuse angle Lemma 6.2.9:
\Bigl(1 - \frac{d}{R}\Bigr)\|\varphi(\tilde x) - \varphi(y_p)\|_F^2 + \|\tilde z - \varphi(\tilde x)\|_F^2 \le \|\tilde z - \varphi(y_p)\|_F^2. \qquad (5.76)
But
\|\tilde z - \varphi(\tilde x)\|_F = d(\tilde z, \varphi(C)) \ge d(z, \varphi(C)) - \frac{\eta}{4} \ge 0,
\|\tilde z - \varphi(y_p)\|_F \le \frac{1}{p} + d(z, \varphi(C)) + \frac{\eta}{4},
and
\Bigl(1 - \frac{d}{R}\Bigr)^{1/2}\hat L_n \le 2\Bigl(1 - \frac{d}{R}\Bigr)^{-1/2}\delta_n + \epsilon_n\,\|x_0 - \hat x\|_E, \qquad (5.79)
\epsilon_n\,\|\hat x_n - \hat x\|_E \le \Bigl(1 - \frac{d}{R}\Bigr)^{-1/2}\delta_n + 2\,\epsilon_n\,\|x_0 - \hat x\|_E, \qquad (5.80)
where (5.79) proves (5.57) as now δ_n/ε_n → 0. But (5.80) gives no information
on the convergence of x̂_n to x̂; it is necessary for that to use hypothesis (5.50)
on x̂: let again η > 0 be given, and w ∈ F, μ ∈ T(C, x̂)^- be such that
and, transposing ϕ′(x̂)^* in the right-hand side and using the fact that \langle \hat x_n - \hat x,\ \mu\rangle \le 0,
\Bigl(1 - \frac{d}{R}\Bigr)\hat L_n^2 + \epsilon_n^2\,\|\hat x_n - \hat x\|_E^2 \le 2\,\langle \varphi(\hat x_n) - \varphi(\hat x),\ z_n - z\rangle
+ 2\,\epsilon_n^2\,\langle \hat x_n - \hat x,\ x_0 - \hat x - (\varphi'(\hat x)^* w + \mu)\rangle
+ 2\,\epsilon_n^2\,\langle \varphi'(\hat x)(\hat x_n - \hat x),\ w\rangle,
so that
\Bigl(1 - \frac{d}{R}\Bigr)\hat L_n^2 + \epsilon_n^2\,\|\hat x_n - \hat x\|_E^2 \le 2\,\langle \varphi(\hat x_n) - \varphi(\hat x),\ z_n - z + \epsilon_n^2\,w\rangle \qquad (5.81)
+ 2\,\epsilon_n^2\,\langle \hat x_n - \hat x,\ x_0 - \hat x - (\varphi'(\hat x)^* w + \mu)\rangle
- 2\,\epsilon_n^2\,\Bigl\langle \int_0^1 \hat P_n''(t)(1 - t)\,dt,\ w\Bigr\rangle.
Part 3 Now x̂ satisfies the regularity condition (5.58), and so we can choose
η = 0 in Part 2, and estimations (5.82) and (5.83) simplify to
\Bigl(1 - \frac{d}{R}\Bigr)^{1/2}\hat L_n \le 2\Bigl(1 - \frac{d}{R}\Bigr)^{-1/2}\delta_n + \epsilon_n^2\Bigl(1 + \frac{\pi}{2}\Bigr)\|w\|_F,
\epsilon_n\,\|\hat x_n - \hat x\|_E \le \Bigl(1 - \frac{d}{R}\Bigr)^{-1/2}\delta_n + \epsilon_n^2\Bigl(1 + \frac{\pi}{2}\Bigr)\|w\|_F,
which prove (5.59) and (5.60).
The previous theorem reduces exactly to the Theorem 5.1.8 when the
forward map ϕ is linear.
Hypothesis (4.2) and (5.84) are verified as soon as the map ϕ to be inverted
is smooth, which is the case in most of the situations. Hence the results of
this section apply to all examples of Chap. 1 for which no stronger FC/LD
property can be proved:
and hence
\|A_\epsilon(t)\|_{F\times E} \le \frac{\beta}{\epsilon^2}\,\|V_\epsilon(t)\|_{F\times E}^2, \qquad \|A_\epsilon(t)\|_{F\times E} \le \frac{\beta}{\epsilon}\,\operatorname{diam} C\;\|V_\epsilon(t)\|_{F\times E}.
This shows that the regularized problem (5.2) is linearly stable (Definition
4.3.4), and has a curvature and a deflection bounded by (compare with (5.42)
and (5.43) for the case of FC/LD problems)
\frac{1}{R_\epsilon} = \frac{\beta}{\epsilon^2}, \qquad \Theta_\epsilon = \frac{\beta}{\epsilon}\,\operatorname{diam} C.
As expected, this upper bound to the curvature blows up to infinity when the
regularization parameter goes to zero. Application of Theorem 4.4.1 shows
that (5.2) is Q-wellposed in E × F on the neighborhood
\vartheta_\epsilon = \Bigl\{\, (z, x_0) \in F \times E \;\Big|\; d_{F\times E}\bigl((z, x_0),\ \varphi_\epsilon(C)\bigr) < R_\epsilon = \frac{\epsilon^2}{\beta} \,\Bigr\},
as soon as the deflection satisfies
\Theta_\epsilon = (\beta/\epsilon)\,\operatorname{diam} C \le \pi/2,
where d_\epsilon is defined by
d_\epsilon^2 + \epsilon^2\,\operatorname{rad}^2_{x_0} C = R_\epsilon^2. \qquad (5.86)
When the data z_0, z_1 ∈ ϑ_ε are close enough, one can choose d < d_ε such that
\|z_0 - z_1\|_F + \max_{j=0,1}\bigl( d(z_j, \varphi(C))^2 + \epsilon^2\operatorname{rad}^2_{x_0} C \bigr)^{1/2} \qquad (5.87)
\le \bigl( d^2 + \epsilon^2\operatorname{rad}^2_{x_0} C \bigr)^{1/2} < R_\epsilon,
Remark 5.1.17 Formulas (5.85) through (5.88) describe also the stability
property of the regularized problem (5.5) for a fixed n, both in the linear case
of Theorem 5.1.8 (take 1/R = Θ = 0, so that d = +∞) and in the case
of FC/LD problems of Theorem 5.1.15 (replace R_ε, Θ_ε by R_n, Θ_n defined in
(5.62)).
which eliminates the need of projecting z onto ϕ(C). The following additional
properties of ϕ will be needed:
Condition (5.90) ensures that the solution set X = ϕ^{-1}(z) of (5.1) is non-
void, and so we can define an x0 -minimum-norm solution x̂ of (5.1) as one
solution of the minimization problem (5.11) (there may be more than one
x0 -minimum-norm solution as X is not necessarily convex).
and a short calculation using hypothesis (5.91) shows that Lemma 5.1.10 still
holds:
T(X, \hat x) \subset \operatorname{Ker}\varphi'(\hat x) \cap T(C, \hat x),
and, when x̂ is qualified in the sense of Definition 5.1.11, that Lemma 5.1.13
also holds:
x_0 - \hat x \in \overline{Rg\,\varphi'(\hat x)^*} + T(C, \hat x)^-.
The LMT-regularized problems (5.5) are not in general Q-wellposed, but
they all have at least one solution because of hypothesis (5.90).
Theorem 5.1.18 Let hypothesis (4.2), (5.84), and (5.89) through (5.91)
hold, and let x̂ be an x_0-minimum-norm solution to (5.1) that satisfies
\|\varphi(\hat x_n) - z_n\|_F^2 + \epsilon_n^2\,\|\hat x_n - x_0\|_E^2 \le \|\varphi(\hat x) - z_n\|_F^2 + \epsilon_n^2\,\|\hat x - x_0\|_E^2.
Adding and subtracting \|\varphi(\hat x_n) - z\|_F^2 + \epsilon_n^2\,\|\hat x_n - \hat x\|_E^2 gives, as in the proof of
Theorems 5.1.8 or 5.1.15,
\|\varphi(\hat x_n) - z\|_F^2 + \epsilon_n^2\,(1 - \beta\|w\|_F)\,\|\hat x_n - \hat x\|_E^2 \le 2\,\|\varphi(\hat x_n) - z\|_F\,\bigl(\delta_n + \epsilon_n^2\,\|w\|\bigr), \qquad (5.97)
Remark 5.1.19 Using weak subsequential arguments, one can show that any
sequence {x̂_n} of solutions to (5.5) contains a subsequence which converges
strongly to an x_0-minimum-norm solution of (5.1), provided that ε_n → 0 and
δ_n/ε_n → 0.
the NLS problem (1.61) coincides with (5.1) and its regularized versions
(1.63) coincides with (5.5).
We show in this section that (1.61) is a finite curvature problem, and
estimate the size of C, which ensures a deflection smaller than π/2, making
thus (1.61) a FC/LD problem to which Theorem 5.1.15 can be applied. We
follow for the proof reference [28], where the result was proved for a slightly
more complex model involving a convection term.
Remark 5.2.1 When the nonlinearity is in the higher order term, as, for
example, in
the finite curvature property is lost. However, the least squares objective function still possesses minimizer(s) over a suitable admissible set, is differentiable, and the parameter a can be retrieved numerically over the range of u
(see [31]).
We begin with the FC/LD properties of (1.61). The variational formulation
of (5.98) is, with the definition (1.58) for Y ,
\begin{cases}
\text{find } u \in Y \text{ such that}\\
\langle \nabla u, \nabla w\rangle_{L^2(\Omega)} + \langle k(u), w\rangle_{L^2(\Omega)} = \langle f, w\rangle_{L^2(\Omega)} + \langle g, w\rangle_{L^2(\partial\Omega_N)} \qquad (5.100)\\
\text{for all } w \in Y.
\end{cases}
Lemma 5.2.2 (Hypothesis and notations of Sect. 1.5). Then (5.100) has a
unique solution u ∈ Y , which satisfies the a-priori estimate
\|u_1 - u_0\|_Y \le M\,\|(f_1, g_1) - (f_0, g_0)\|_E, \qquad M \text{ defined in (5.102)}. \qquad (5.103)
Proof. The left-hand side of (5.100) defines an operator A : Y → Y′:
∀v ∈ Y, A(v) ∈ Y′ is defined by
\langle A(v), w\rangle_{Y'Y} = \langle \nabla v, \nabla w\rangle_{L^2(\Omega)} + \langle k(v), w\rangle_{L^2(\Omega)} \quad \forall w \in Y.
Using the properties imposed on k, one can check that A maps bounded
sets to bounded sets, and that A is hemicontinuous (i.e., for all u, v, w ∈ Y the function
λ ↦ \langle A(u + \lambda v), w\rangle_{Y'Y} is continuous from IR to IR). For every v, w ∈ Y,
moreover,
\langle A(v) - A(w),\ v - w\rangle_{Y'Y} = |\nabla(v - w)|^2_{L^2(\Omega)} + \int_\Omega\int_0^1 k'\bigl((1 - t)w + t\,v\bigr)\,(v - w)^2 \;\ge\; \|v - w\|_Y^2,
hence A is strictly monotone, and the lemma is proved [56, page 171].
and hence
\|V(t) - V(\tau)\|_Y^2 \le \|k'(P(t)) - k'(P(\tau))\|_{L^3(\Omega)}\,\|V(\tau)\|_{L^3(\Omega)}\,\|V(t) - V(\tau)\|_{L^3(\Omega)}.
where we have used (5.106) to obtain the last inequality. Since t P (t) is
Lipschitz continuous, it follows that t A(t) is Lipschitz continuous as well.
Hence V is a.e. differentiable, and using (5.103),
which proves that P ∈ W 2,∞ ([0, 1], Y ). Derivation of (5.108) with respect to
t gives then
\langle \nabla A(t), \nabla w\rangle_{L^2(\Omega)} + \langle k'(P(t))\,A(t),\ w\rangle_{L^2(\Omega)} = -\langle k''(P(t))\,V(t)^2,\ w\rangle_{L^2(\Omega)} \quad\text{for all } w \in Y, \qquad (5.109)
Proof. From (5.106) and the Poincaré inequality (5.101), (5.110) follows with
αM = CP M = CP (CP2 + CN2 )1/2 .
To prove (5.111), we introduce, for fixed but arbitrary t ∈ [0, 1] and
(f_0, g_0), (f_1, g_1) ∈ C, an operator B : D(B) → L^2(Ω) defined by
D(B) = \Bigl\{\psi \in H^2(\Omega) : \psi = 0 \text{ on } \partial\Omega_D \text{ and } \frac{\partial\psi}{\partial\nu} = 0 \text{ on } \partial\Omega_N\Bigr\}, \qquad (5.115)
\forall \psi \in D(B), \quad B\psi = -\Delta\psi + k'(P(t))\,\psi \in L^2(\Omega), \qquad (5.116)
where D(B) is endowed with the norm of H 2 (Ω). Because of the assumption
made in (1.57) that ∂ΩD is both open and closed with respect to ∂Ω and the
C 1,1 -regularity of ∂Ω, the regularity results on elliptic equations imply that
B is an isomorphism from D(B) onto L2 (Ω):
\exists\, C_B > 0 \ \text{such that}\ \frac{1}{C_B}\,\|\psi\|_{H^2} \le \|B\psi\|_{L^2} \le C_B\,\|\psi\|_{H^2} \quad \forall \psi \in D(B).
The same regularity results imply that the solution A(t) of the variational
problem (5.109) satisfies A(t) ∈ H 2 (Ω), and hence is a strong solution of
(5.105), so that A(t) ∈ D(B), and
\langle B\,A(t),\ \psi\rangle_{L^2(\Omega)} = -\langle k''(P(t))\,V(t)^2,\ \psi\rangle_{L^2(\Omega)} \quad \forall \psi \in D(B). \qquad (5.117)
A(t) and ψ belong to D(B) ⊂ H^2(Ω), so one can integrate by parts twice in
the left-hand side of (5.117) using the Green formula, which gives
the injectivity of ϕ′(x) ensures that Ker ϕ′(x) = {0}, so that (5.49) is trivially
satisfied, and x is qualified.
We determine now ϕ′(x)^* to find its range. Let δv ∈ F = L^2(Ω) and
δx = (δf, δg) ∈ E = L^2(Ω) × L^2(∂Ω_N) be given, and define δu and δh by
Because of the boundary conditions included in the spaces Y and D(B), the
boundary terms vanish in (5.122), which becomes, using (5.119),
\langle \varphi'(x)^*\delta v,\ \delta x\rangle_E = \langle \delta f, \delta h\rangle_{L^2(\Omega)} + \langle \delta g, \delta h\rangle_{L^2(\partial\Omega_N)} = \langle (\delta h, \delta k),\ (\delta f, \delta g)\rangle_E, \qquad (5.123)
where we have set
\delta k = \tau_N\,\delta h. \qquad (5.124)
The last equation in (5.123) shows that
The nonlinear source estimation problem (1.61) has hence the following
properties before any regularization is applied:
Theorem 5.2.6 Let hypothesis and notations of Sect. 1.5 hold. Then
1. The parameter x is identifiable on C, and (1.61) is a FC problem, with
curvature and deflection bounded by 1/R and Θ given in (5.113).
2. If moreover the deflection Θ satisfies (5.114), then (1.61) is a FC/LD
problem, and Proposition 4.2.7 applies: when the data z is in the neigh-
borhood ϑ of the attainable set defined in (5.44), there exists at most one
solution x̂, when a solution exists the objective function J is unimodal
over C, and the projection in the data space onto the attainable set is
stable – but there is no stability estimate for x in the L2 (Ω) × L2 (∂ΩN )
parameter norm.
3. If moreover C is bounded, the FC/LD problem (1.61) has a (unique)
solution for all z ∈ ϑ.
Proof. Point 1 follows from Lemma 5.2.5 and Proposition 5.2.4, point 2 from
Proposition 4.2.7. The proof of point 3 goes as follows: let xk = (fk , gk ) be a
minimizing sequence of J, and uk = ϕ(xk ) the associated solutions of (5.100).
The sequence xk is bounded in E and hence uk is bounded in Y ⊂ H 1 (Ω),
which embeds compactly in L2 (Ω). Hence there exists x̂ ∈ C, û ∈ Y ⊂ H 1 (Ω)
and subsequences, still noted xk , uk such that
xk x̂ in E, uk x̂ in Y and almost everywhere on Ω,
where denotes weak convergence. It is then possible to pass to the limit
in the variational formulation (5.100) giving uk , which shows that û = ϕ(x̂).
Hence x̂ is a minimizer of J over C, which ends the proof.
We can now apply the LMT-regularization to the FC/LD nonlinear source
problem:
Theorem 5.2.7 Let hypothesis and notations of Sect. 1.5 as well as the de-
flection condition (5.114) hold, and suppose the data z belong to the neigh-
borhood ϑ defined in (5.44). Then
1. The regularized problems (1.63) are all Q-wellposed for n large enough,
and
\epsilon_n\,\hat f_n \to 0, \quad \epsilon_n\,\hat g_n \to 0, \quad u_n \to \hat z,
when ε_n → 0 and δ_n → 0, where ẑ = projection of z onto ϕ(C).
Proof. Theorem 5.2.6 shows that the unregularized problem (1.61) is a FC/LD
problem, so we can apply Theorem 5.1.15, which together with Lemma 5.2.5
gives the desired results.
We finally interpret the regularity condition (5.126) in two specific cases
depending on the location of (fˆ, ĝ) within C:
• If (f̂, ĝ) is in the interior of C, then T(C, x̂)^- = \{0\}, and (5.126) reduces
to
(f_0, g_0) - (\hat f, \hat g) \in \{(f, g) : f \in D(B),\ g = \tau_N f\}. \qquad (5.127)
• If C is a closed ball centered at the origin, and (f̂, ĝ) lies on the boundary of C, then T(C, x̂)^- = \{(f, g) : (f, g) = \lambda(\hat f, \hat g),\ \lambda > 0\}, and
(5.126) becomes
\exists\, \lambda > 0 : (f_0, g_0) - (1 + \lambda)(\hat f, \hat g) \in \{(f, g) : f \in D(B),\ g = \tau_N f\}. \qquad (5.128)
Remark 5.2.8 Problem (1.61) is also a FC/LD problem for the stronger observation space F = H^1(Ω). In this case, it suffices to require in (1.57) that
m ≤ 4, and the condition that ∂Ω_D is both open and closed is not needed. This
follows from Lemma 5.2.3 and its proof, (5.106), and the following estimate:
\|A(t)\|_Y \le \|k''\|_{L^\infty(I\!R)}\,|V(t)^2|_{L^2(\Omega)}
\le \|k''\|_{L^\infty(I\!R)}\,|V(t)|^2_{L^4(\Omega)}
\le \text{const}\,\|k''\|_{L^\infty(I\!R)}\,|V(t)|^2_Y,
where const is the embedding constant of Y into L^4(Ω) for m ≤ 4. The
proof of the existence of the Gâteaux derivative ϕ′(x) is the same, and the
characterization of its adjoint follows the same arguments, with now B ∈
L(Y, Y′) and D(B) = Y. Then
Rg\,\varphi'(x)^* = \bigl\{(f, g) \in E : f \in Y \text{ and } g = \tau_N f\bigr\},
and (5.129) is replaced by
\hat f \in H^1(\Omega), \quad \hat f = 0 \text{ on } \partial\Omega_D, \qquad \hat g = \tau_N \hat f \in H^{1/2}(\partial\Omega_N).
Properties of Data:
We suppose that a sequence z_n ∈ F, n = 1, 2, . . . of noise-corrupted measurements is available, which converges to the attainable noise-free data z = ẑ:
z_n \in F, \quad \|z_n - \hat z\|_F \le \delta_n, \quad \delta_n \to 0 \ \text{when}\ n \to \infty, \qquad (5.138)
and that an a-priori guess
y_0 \in Y \qquad (5.139)
of the true state ŷ has been chosen. This a-priori guess can be enhanced by
application of the linear LMT-regularization theory of Sect. 5.1.1 to the estimation of ŷ from z_n, n = 1, 2, . . . : let ε_n → 0 be a sequence of regularization
parameters, and define y_n by the auxiliary problem
y_n \in Y \ \text{minimizes}\ \frac{\epsilon_n^2}{2}\,\|y - y_0\|_Y^2 + \frac12\,\|M(y) - z_n\|_F^2 \ \text{over}\ Y. \qquad (5.140)
Suppose that ŷ satisfies the regularity condition (5.24) of Theorem 5.1.8 (iii):
\hat y - y_0 = M^* w \ \text{for some}\ w \in F, \qquad (5.141)
and that the regularization parameters ε_n^2 go to zero more slowly than the
error δ_n on the data:
\exists\, \lambda > 0 \ \text{such that}\ \forall n \in I\!N : \lambda\,\|w\|_F\,\epsilon_n^2 \ge \delta_n. \qquad (5.142)
The error estimate (5.41) gives then
\|y_n - \hat y\|_Y \le \epsilon_n\,(\lambda + 1)\,\|w\|_F, \qquad (5.143)
which leads us to use y_n as a-priori guess for the state-space in lieu of y_0 in
(5.134). So we replace in a first step (5.134) by
\hat x_n \ \text{minimizes}\ J_n(x) = \frac{\epsilon_n^2}{2}\,\|\phi(x) - y_n\|_Y^2 + \frac12\,\|M\phi(x) - z_n\|_F^2 \ \text{over}\ C. \qquad (5.144)
and rewrite it as
\hat x_n \ \text{minimizes}\ J_n(x) = \frac12\,\|\varphi_n(x) - Z_n\|_n^2 \ \text{over}\ C. \qquad (5.146)
A simple calculation shows that the quantities α_{m,n}, α_{M,n}, R_n, Θ_n associated
to C and ϕ_n by (5.135) are
\alpha_{m,n} = \alpha_m\,\epsilon_n > 0, \qquad (5.147)
\alpha_{M,n} = \alpha_M\,(\epsilon_n^2 + \|M\|^2)^{1/2}, \qquad (5.148)
R_n = R\,\frac{\epsilon_n^2}{(\epsilon_n^2 + \|M\|^2)^{1/2}} > 0, \quad R_n \to 0 \ \text{when}\ n \to \infty, \qquad (5.149)
\Theta_n = \Theta\,\frac{(\epsilon_n^2 + \|M\|^2)^{1/2}}{\epsilon_n} \to \infty \ \text{when}\ n \to \infty, \qquad (5.150)
which satisfy, for all n,
\frac{\epsilon_{n+1}^2}{\epsilon_n^2} \le \frac{R_{n+1}}{R_n} \le \frac{\epsilon_{n+1}}{\epsilon_n}. \qquad (5.151)
We see from (5.147)–(5.149) that (5.146) is a linearly stable FC-problem,
which is good, but (5.150) shows also that its deflection Θ_n will become
larger than π/2 for n large enough, which is bad! However, viewed as a function of
the regularization parameter ε, the deflection Θ(ε² + ‖M‖²)^{1/2}/ε decreases to
Θ < π/2 when ε → +∞, so one chooses ε_0 such that
(\epsilon_0^2 + \|M\|^2)^{1/2} \ge \bigl(1 - \Theta^2/(\pi/2)^2\bigr)^{-1/2}\,\|M\| \;\Longleftrightarrow\; \Theta_0 \le \pi/2, \qquad (5.152)
which ensures that ϕ_0(C) is s.q.c. in F_0, and guarantees at least that (5.146)
is Q-wellposed for n = 0!
Cn ⊃ Cn+1 , (5.158)
Cn is closed and convex in E, (5.159)
ϕn (Cn ) is closed and s.q.c. in Fn , (5.160)
where ϕn (Cn ) is equipped with the family of paths image of the segments of
Cn by ϕn .
\|\varphi_{n+1}(x) - \hat Z_n\|_n \le \frac{\epsilon_n}{\epsilon_{n+1}}\,\|\varphi_{n+1}(x) - \hat Z_{n+1}\|_{n+1} \le \chi_C\,\frac{\epsilon_n}{\epsilon_{n+1}}\,R_{n+1} \le \chi_C\,R_n,
We have used in the fourth line the fact that Cn+1 is included in the
convex Cn , and inequalities (5.151) and (5.156) in the last line. The
convexity property gives then
f(\nu) + \nu\,(L_{n+1} - \nu)\Bigl(1 - \frac{\chi_c}{\mu}\Bigr) \le \frac{L_{n+1} - \nu}{L_{n+1}}\,f(0) + \frac{\nu}{L_{n+1}}\,f(L_{n+1}) \qquad (5.168)
\le (\chi_C\,R_{n+1})^2,
\frac{L_{n+1}^2}{4}\Bigl(1 - \frac{\chi_c}{\mu}\Bigr) \le (\chi_C\,R_{n+1})^2, \qquad (5.169)
\Theta_{n+1} \le \frac{L_{n+1}}{R_{n+1}} \le \frac{2\,\chi_C}{(1 - \chi_C/\mu)^{1/2}} \le \frac{\pi}{2}, \qquad (5.170)
Main Result
We can now summarize the hypothesis we have made so far and state the
main result of this section:
where χZ is defined by
Proof. The two first points follow from Lemma 5.3.1, which shows that C_n is
convex, and that (5.173) is a FC/LD problem with an enlargement neighborhood of size R_n given by (5.149) and a linearly stable problem as one can see
in (5.147) – and hence a Q-wellposed problem according to Theorem 4.4.1.
It remains to check that the distance of the data Z_n = (y_n, z_n) to ϕ_n(C_n) is
strictly smaller than R_n. Inequalities (5.143) and (5.142) give
\|Z_n - \hat Z\|_n^2 = \epsilon_n^2\,\|y_n - \hat y\|_Y^2 + \|z_n - \hat z\|_F^2 \qquad (5.175)
\le \epsilon_n^4\,\bigl[(\lambda + 1)^2 + \lambda^2\bigr]\,\|w\|_F^2
\le \Bigl(\frac{R_n}{R}\Bigr)^2(\epsilon_n^2 + \|M\|^2)\,\bigl[(\lambda + 1)^2 + \lambda^2\bigr]\,\|w\|_F^2
\le \Bigl(\frac{R_n}{R}\Bigr)^2(\epsilon_0^2 + \|M\|^2)\,\bigl[(\lambda + 1)^2 + \lambda^2\bigr]\,\|w\|_F^2
\le \Bigl(\frac{R_n}{R}\Bigr)^2\,\|M\|^2\,\frac{(\lambda + 1)^2 + \lambda^2}{1 - \Theta^2/(\pi/2)^2}\,\|w\|_F^2
= (\chi_Z\,R_n)^2,
where we have used (5.149) and (5.152) in the third and fifth lines. Hence
d_n(Z_n, \varphi_n(C_n)) \le \chi_Z\,R_n < R_n, \qquad (5.176)
which ends the proof of the two first points.
\hat Z_n = \varphi_n(\hat x_n), \qquad P_n : t \mapsto \varphi_n((1 - t)\hat x_n + t\,\hat x), \qquad L_n = \text{arc length of } P_n \text{ in } F_n,
d_n(t) = \|Z_n - P_n(t)\|_n, \qquad d_n = \sup_{0\le t\le 1} d_n(t).
But
Hence Jn is positive definite over Cn , which proves the last point.
"true" state y^*! There are, however, often natural choices for y_0: for example,
if ẑ represents pointwise data in a finite-dimensional space F and Y is a
function space, then y_0 can be an interpolation in Y of the pointwise data. If
both Y and F are function spaces with Y strictly embedded in F and ẑ ∈ F
but ẑ ∉ Y, then y_0 would arise from ẑ by a smoothing process, for example,
by solving one auxiliary problem (5.140).
Hence the state-space regularized problems are here (compare with
(5.173))
\hat x_n \ \text{minimizes}\ J_n(x) = \frac{\epsilon_n^2}{2}\,\|\phi(x) - y_0\|_Y^2 + \frac12\,\|M\phi(x) - z_n\|_F^2 \ \text{over}\ C, \qquad (5.181)
where the regularization parameters ε_n are chosen such that
2. When k → ∞,
\phi(x_{n_k}) \rightharpoonup \phi(\hat x) \ \text{weakly in}\ Y,
3. If δ_n/ε_n → 0,
\phi(x_{n_k}) \to \phi(\hat x) \ \text{strongly in}\ Y,
and x̂ is a state-space y_0-minimum-norm solution:
\|\phi(\hat x) - y_0\|_Y \le \min\bigl\{\|\phi(x) - y_0\|_Y : x \in C \ \text{and}\ M\phi(x) = \hat z\bigr\}.
4. \hat x_{n_k} \to \hat x \ \text{strongly in}\ E.
\frac{\epsilon_n^2}{2}\,\|\phi(\hat x_n) - y_0\|_Y^2 + \frac12\,\|M\phi(\hat x_n) - z_n\|_F^2 \le \frac{\epsilon_n^2}{2}\,\|\phi(x^*) - y_0\|_Y^2 + \frac12\,\|\hat z - z_n\|_F^2 \qquad (5.183)
\le \frac{\epsilon_n^2}{2}\,\|\phi(x^*) - y_0\|_Y^2 + \frac{\delta_n^2}{2}.
Boundedness of δ_n/ε_n implies that φ(x_n) is bounded in Y. Since C is bounded
as well, there exists a weakly convergent subsequence of x_n, again denoted
by x_n, and (x̂, φ̂) ∈ E × Y such that (x_n, φ(x_n)) ⇀ (x̂, φ̂) weakly in E × Y.
Then (5.177) implies that x̂ ∈ C and φ̂ = φ(x̂). Since Mφ(x_n) → ẑ in F, it
follows also that Mφ(x̂) = ẑ. This proves point 1 and the first assertion of
point 2.
Since \|z_n - \hat z\|_F \le \delta_n and δ_n/ε_n is bounded, it follows that
which ends the proof of Part 2. Next we assume that δ_n/ε_n → 0. Choosing
x^* = x̂ in (5.183) gives
\|\phi(\hat x_n) - y_0\|_Y^2 \le \|\phi(\hat x) - y_0\|_Y^2 + \frac{\delta_n^2}{\epsilon_n^2},
and consequently
This implies that φ(x̂_n) → φ(x̂) strongly in Y. Then (5.183) gives, for an
arbitrary element x∗ satisfying M φ(x∗ ) = ẑ,
and Part 3 is proved. Finally Part 4 follows immediately from the hypothesis
(5.178).
Because of the much weaker hypothesis involved on C, φ, and M,
Theorem 5.3.3 is more widely applicable than Theorem 5.3.2: it applies of
This quantity satisfies the two first axioms of a distance, but we do not know
whether it satisfies the third one (triangular inequality). This is why we have
written the word distance between quotes.
where (·, ·) denotes the scalar product in IL2 (Ω), and to decompose accord-
ingly IL2 (Ω) into the sum of two orthogonal subspaces
IL2 (Ω) = G ⊕ G⊥ ,
where
G = IL2 (Ω)/∼ the quotient space
(5.190)
G⊥ the orthogonal complement,
or equivalently
P\bigl((a_1 - a_0)\nabla u_{a(t)}\bigr) = P\bigl(-a(t)\,\nabla\eta(t)\bigr),
and hence
\bigl|P\bigl((a_1 - a_0)\nabla u_{a(t)}\bigr)\bigr|_{I\!L^2(\Omega)} \le a_M\,|\nabla\eta(t)|_{I\!L^2(\Omega)}. \qquad (5.191)
Hence we see that |∇η(t)|IL2 (Ω) constrains only the norm of the component
in G of (a1 − a0 )∇ua(t) . There is no direct information on the norm of its
component in G⊥ , it is exactly this information that has to be supplied by
the regularization term.
A partial counter example to this property has been given in [30] for a simple
diffusion problem on the unit square with a diffusion coefficient a = 1, no
source or sink boundaries inside the domain, and flow lines parallel to the
x1 -axis. It was shown there that for any perturbation h orthogonal to the flow
line – and hence a function of x_2 only – and satisfying
h \in C^{0,1}([0, 1]), \quad h(0) = h(1) = 0, \quad \int_0^1 h = 0, \qquad (5.193)
\forall \psi = (\psi_1, \psi_2) \in H^1(\Omega) \times H^1(\Omega), \qquad \operatorname{rot}\psi = \frac{\partial\psi_2}{\partial x_1} - \frac{\partial\psi_1}{\partial x_2}.
Lemma 5.4.2 Let (1.65) and (4.105) through (4.107) and (5.189) and
(5.190) hold. Then
G = \{\nabla\varphi : \varphi \in Y\}, \qquad G^\perp = \{\vec{\mathrm{rot}}\,\psi : \psi \in W\}, \qquad (5.199)
P\,q = \nabla\varphi, \qquad P^\perp q = \vec{\mathrm{rot}}\,\psi,
|\vec{\mathrm{rot}}\,\psi|_{I\!L^2(\Omega)} = \sup_{s\in I\!L^2(\Omega),\ |s|_{I\!L^2(\Omega)}=1} (\vec{\mathrm{rot}}\,\psi,\ s)
= \sup_{v\in Y,\ w\in W,\ |\nabla v|^2_{I\!L^2(\Omega)}+|\vec{\mathrm{rot}}\,w|^2_{I\!L^2(\Omega)}=1} (\vec{\mathrm{rot}}\,\psi,\ \nabla v + \vec{\mathrm{rot}}\,w)
= \sup_{w\in W,\ |\nabla w|_{I\!L^2(\Omega)}=1} (\vec{\mathrm{rot}}\,\psi,\ \vec{\mathrm{rot}}\,w),
|\vec{\mathrm{rot}}\,\psi|_{I\!L^2(\Omega)} = \sup_{w\in W,\ |\nabla w|_{I\!L^2(\Omega)}=1} \int_\Omega (a_1 - a_0)\,\nabla u_{a(t)}\cdot \vec{\mathrm{rot}}\,w, \qquad (5.201)
where we have used the grad-rot decomposition ∇v + \vec{\mathrm{rot}}\,w of s, the orthogonality of G and G^⊥, the fact that |\vec{\mathrm{rot}}\,w| = |\nabla w|, and the Definition (5.200)
shows that
\bigl|P^\perp\bigl((a_1 - a_0)\nabla u_{a(t)}\bigr)\bigr|_{I\!L^2(\Omega)} \le C_W\,\bigl|\vec{\mathrm{rot}}(a_1 - a_0)\cdot\nabla u_{a(t)}\bigr|_{L^2(\Omega)}. \qquad (5.204)
\varphi_\epsilon : a \mapsto \bigl(\nabla u_a,\ \epsilon\,\vec{\mathrm{rot}}\,a\cdot\nabla u_a\bigr) \in I\!L^2(\Omega)\times L^2(\Omega), \qquad (5.205)
\hat a \ \text{minimizes}\ \frac12\,|\nabla u_a - z|^2_{I\!L^2} + \frac{\epsilon^2}{2}\,|\vec{\mathrm{rot}}\,a\cdot\nabla u_a|^2_{L^2} \ \text{over}\ C. \qquad (5.206)
Proposition 5.4.3 Let (1.65) and (4.105) through (4.107) and (5.188) hold.
Then the adapted-regularized forward map (5.205) satisfies the linear stability
estimate (compare with (4.39))
\forall a_0, a_1 \in C \ \text{one has}\quad \alpha_m(\epsilon)\,d_{grad}(a_0, a_1) \le \int_0^1 \|V(t)\|_{I\!L^2\times L^2}\,dt, \qquad (5.207)
with C_W equal to the Poincaré constant for the space W defined in (5.203).
We define now, for each t ∈ [0, 1], a linear mapping G_t from I\!L^2 × L^2 into
itself by
G_t(q, v) = \bigl(a_M\,q,\ C_W\,(v/\epsilon - \vec{\mathrm{rot}}\,a(t)\cdot q)\bigr), \quad\text{which satisfies}\quad
\alpha_m(\epsilon)\,\|G_t(q, v)\|_{I\!L^2\times L^2} \le \|(q, v)\|_{I\!L^2\times L^2}, \qquad (5.211)
so that
G_t(V(t)) = \bigl(a_M\,\nabla\eta(t),\ C_W\,\vec{\mathrm{rot}}(a_1 - a_0)\cdot\nabla u_{a(t)}\bigr). \qquad (5.212)
Combining then (5.210) with (5.212) and (5.211) gives
Proposition 5.4.5 Let (1.65) and (4.105) through (4.107) hold. The deflection condition Θ ≤ π/2 is satisfied for the adapted-regularized problem
(5.206) as soon as
\bigl[(a_M - a_m)^2 + \epsilon^2\,b_M^2\,(a_M + a_m)^2\bigr]^{1/2} \le \frac{\pi}{4}\,a_m, \qquad (5.215)
where a_m, a_M, and b_M are the constants that define the admissible parameter
set C in (4.106).
Proof. We use (5.214) to evaluate the norm of A_ε(t) to see whether it can
satisfy the deflection sufficient condition (4.25):
a_m^2\,\|A_\epsilon(t)\|_F^2 \le a_m^2\,|\nabla\zeta(t)|^2_{I\!L^2} + a_m^2\,\epsilon^2\bigl(\|\nabla a(t)\|_{I\!L^\infty}\,|\nabla\zeta(t)|_{I\!L^2} + 2\,\|\nabla(a_1 - a_0)\|_{I\!L^\infty}\,|\nabla\eta(t)|_{I\!L^2}\bigr)^2 \qquad (5.216)
\le \Bigl[4\,\|a_1 - a_0\|^2_{C^0} + \epsilon^2\bigl(2\,b_M\,\|a_1 - a_0\|_{C^0} + 2\,a_m\,\|\nabla(a_1 - a_0)\|_{I\!L^\infty}\bigr)^2\Bigr]\,|\nabla\eta(t)|^2_{I\!L^2},
But (5.209) shows that |\nabla\eta(t)|_{I\!L^2} \le \|V\|_F, so that (4.25) follows from (5.215)
and (5.217), which ends the proof.
This is as far as one can currently go for the problem (5.206) on the infinite
dimensional set C.
where
M_{a_0,a_1,t} = \frac{\|a_1 - a_0\|_{L^\infty}}{\bigl|(a_1 - a_0)\nabla u_{a(t)}\bigr|_{I\!L^2}}\,
\Bigl[1 + \epsilon^2\Bigl(b_M + a_m\,\frac{\|\nabla(a_1 - a_0)\|_{L^\infty}}{\|a_1 - a_0\|_{L^\infty}}\Bigr)^2\Bigr]^{1/2}.
When C is defined through a finite element approximation with mesh size
h > 0, the curvature 1/R blows up to infinity like 1/h^2.
Proof. From (5.216) and |\nabla\eta(t)|_{I\!L^2} \le \|V\|_F one obtains, with the notation
\|\cdot\|_{L^\infty} instead of \|\cdot\|_{C^0},
a_m\,\|A_\epsilon(t)\|_F \le \Bigl[4\,\|a_1 - a_0\|^2_{L^\infty} + \epsilon^2\bigl(2\,b_M\,\|a_1 - a_0\|_{L^\infty} + a_m\,\|\nabla(a_1 - a_0)\|_{I\!L^\infty}\bigr)^2\Bigr]^{1/2}\,\|V(t)\|_F.
Theorem 5.4.7 Let (1.65) and (4.105) through (4.107) hold, and suppose
that
\alpha_m(\epsilon)\,d_{grad}(\hat a_0, \hat a_1) \le \Bigl(1 - \frac{d}{R}\Bigr)^{-1}\,\|z_1 - z_0\|_{I\!L^2}, \qquad (5.224)
where α_m(ε) is given by (5.208) and d_{grad} by (5.188).
2. Optimizability:
∀z ∈ ϑ, the least squares problem (5.206) has no parasitic local mini-
mum over C .
Proof. Propositions 5.4.3, 5.4.5, and 5.4.6 ensure that the hypotheses of Theorem 4.4.1 are satisfied for the adapted regularized problem (5.206). Hence
existence, uniqueness, stability, and optimizability will hold for this problem
as soon as the distance of the regularized data z_ε = (z, 0) to the regularized
attainable set is strictly less than R_ε:
d_F\bigl((z, 0),\ \varphi_\epsilon(C)\bigr) < R_\epsilon. \qquad (5.225)
A Generalization of Convex
Sets
Quasi-Convex Sets
X \in D, \qquad \|X - z\| \le d(z, D) + \eta. \qquad (6.1)
then clearly any z ∈ E(D) has the property that any sequence that minimizes
the distance to z over D is a Cauchy sequence. This implies that, when D is
closed, all points of the Edelstein set E(D) have a unique projection on D.
The interesting result is (cf, e.g., Aubin 1979) that this set fills out almost
• In Sect. 6.2, we define the property, for a set D equipped with a family of
paths P – in short a set (D, P) – to be quasi-convex. Then we re-do, for
such a set, all classical proofs for the projection of a point on a convex
set, with the necessary adaptations. This leads to the conclusion that
the properties (i), (iii) and (iv) of Proposition 4.1.1 can be generalized
to neighborhoods of quasi-convex sets. However, parasitic stationary
points may still exist, and (ii) does not generalize to quasi-convex sets.
In (6.4), W^{2,\infty}([0, \ell]; F) is the space of functions from [0, \ell] into F whose
first two distributional derivatives happen to be L^\infty([0, \ell]; F) functions.
Condition (6.5) ensures that the path p stays in the set D. Then by definition
of the arc length along p, one has
\text{arc length from } p(0) \text{ to } p(\nu) = \int_0^\nu \|p'(\sigma)\|_F\,d\sigma, \qquad (6.7)
Conversely, if ν is defined as the arc length along p, then necessarily (6.6) will
hold (the proof will be given in Proposition 8.2.1 below). So the hypothesis
(6.6) simply means that we have chosen to parameterize our paths p by their
From now on, we shall always consider that the set D ⊂ F is equipped with a
family of paths P, which we shall write as (D, P). The notions of quasi-convex
and strictly quasi-convex sets will be developed for such couples (D, P), and
hence depend on the choice made for the family of paths P, which equips D.
We discuss now possible choices for the family of paths P:
[Figure: a point z, a path p in the set C, the point M = p(ν), the distance d_{z,p}(ν), the main normal to p at M, and the radius of curvature ρ(ν).]
Finally, given z and η > 0, we will consider the worst case for all paths p ∈ P
which connect two η-projections of z on D, that is, which are in the subset
\mathcal{P}(z, \eta) = \{\, p \in \mathcal{P} \mid \|p(j) - z\|_F \le d(z, D) + \eta,\ j = 0, L(p) \,\}. \qquad (6.29)
or equivalently
\forall z \in \vartheta,\ \forall\eta \ \text{such that}\ 0 < \eta < \eta_{max}(z),\ \text{one has:}\ d^2_{z,p} \ \text{is uniformly}\ \alpha\text{-convex over}\ \mathcal{P}(z, \eta). \qquad (6.34)
The equivalence between the conditions (6.33) and (6.34) results from
\frac{d^2}{d\nu^2}\bigl(d^2_{z,p}\bigr)(\nu) = 2\bigl(1 - k(z, p; \nu)\bigr) \qquad (6.35)
\ge 2\bigl(1 - k(z, p)\bigr), \qquad (6.36)
\ge 2\bigl(1 - k(z, \eta)\bigr), \quad \forall p \in \mathcal{P}(z, \eta). \qquad (6.37)
We shall see soon that, as its name suggests, the neighborhood ϑ is the
one on which the “projection on D” is well-behaved. But before getting into
this, we check first that Definition 6.2.1 ensures the existence of a largest
open regular neighborhood ϑ:
Proposition 6.2.2 Let (D, P) be quasi-convex. Then there exists a largest
open regular neighborhood ϑ of D, and a largest l.s.c. function ηmax : ϑ →
]0, +∞] satisfying the definition 6.2.1 of quasi-convex sets.
ϑ = ∪i∈I ϑi (6.38)
From the Definition (6.39) of ηmax (z), there exists i0 ∈ I such that
This proves, as η̃max,i0 (z) > 0, that z ∈ ϑi0 . But ϑi0 and ηmax,i0 satisfy (6.33)
by hypothesis, so that
k(z, η) < 1,
which proves that ϑ and ηmax satisfy also (6.33).
We give now two very simple examples:
Example 6.2.3 If D is convex and P made of the family of all segments of
D, then (D, P) is quasi-convex, with
ϑ=F
ηmax (z) = +∞ ∀z ∈ F.
L < 2πR,
and the largest associated q.c. regular neighborhood ϑ is shown in Fig. 6.2,
together with a graphical illustration of the way ηmax (z) is determined.
[Figure 6.2: the two cases L ≤ πR and πR ≤ L < 2πR. In the first case ϑ is the complement of a gray region, with ηmax(z) as shown; in the second case ϑ is the complement of a thick half-line, with ηmax(z) = min{η1, η2}.]
ν ↦ p(ν) is injective ∀p ∈ P,
or equivalently, in terms of the arc length distance,
δ(X, X) = 0 ∀X ∈ D.
Proof. If ν ↦ p(ν) were not injective, there would exist ν′, ν″ ∈ [0, L(p)], ν′ <
ν″, such that p(ν′) = p(ν″). Let us call X this point of D.
[Figure: the path p, the points p(0), p(L(p)/2) and p(L(p)) separated by arc lengths L(p)/2, and the distances d_0, d_{1/2}, d_1.]
Proof. Let z ∈ ϑ and ε > 0 be given. For η ∈]0, ηmax (z)], let X0 and X1 be
two η-projections of z on D. Then
either X0 = X1 , in which case one has using Proposition 6.2.5
which gives
d(z, D)^2 + (1 - k(z, \eta))\,\frac{L(p)^2}{4} \le (d(z, D) + \eta)^2 = d(z, D)^2 + 2\eta\,d(z, D) + \eta^2,
that is,
L(p)^2 \le 4\eta\,\frac{2\,d(z, D) + \eta}{1 - k(z, \eta)}. \qquad (6.45)
When η → 0+, k(z, η) → k(z, 0) < 1 and the right-hand side of (6.45)
goes to zero.
Hence there exists η(z, ε) ∈]0, ηmax (z)[ such that
which implies
Comparing (6.44) and (6.46) shows that (6.43) holds in all cases as soon as
η is taken equal to the η(z, ε) determined in the X_0 ≠ X_1 case, which ends
the proof of Corollary 6.2.7.
Notice that in Corollary 6.2.7, the estimation (6.43) on the proximity
of two η-projections X0 and X1 of z is obtained not only for X0 − X1 F
(which corresponds exactly to saying that ϑ ⊂ E(D)), but also for the arc
length distance δ(X0 , X1 ): this stronger result will be very important in the
applications we have in mind to nonlinear least-squares inversion where D =
ϕ(C), as in this case δ(X, Y ) can be made equivalent in a natural way to the
distance in the parameter set C, whereas X − Y cannot.
We prove now that some of the properties of the projection on convex
sets recalled in Proposition 4.1.1 generalize to quasi-convex sets. We begin
with the
Proposition 6.2.8 Let (D, P) be quasi-convex, and ϑ, ηmax be a pair of as-
sociated regular neighborhood and function. Then properties (i) and (iv) of
Proposition 4.1.1 generalize as follows:
(i) Uniqueness: for any z ∈ ϑ, there exists at most one projection X̂ of z
on D.
(ii) Existence: if z ∈ ϑ, any minimizing sequence X_n ∈ D of the "distance
to z" function over D is a Cauchy sequence for both \|X - Y\|_F and
δ(X, Y). Hence X_n converges in F to the (unique) projection X̂ of z
on the closure \bar D of D.
If D is closed, then X̂ ∈ D, and δ(X_n, X̂) → 0 when n → ∞.
Proof. We prove first (i). Let z ∈ ϑ be such that it admits two projections X̂_0
and X̂_1 on D. As X̂_0 and X̂_1 are η-projections of z for any η ∈ ]0, η_{max}(z)[,
we see from Corollary 6.2.7 that
\|\hat X_0 - \hat X_1\|_F \le \delta(\hat X_0, \hat X_1) \le \varepsilon \quad\text{for any } \varepsilon > 0,
which shows that X̂ is the (necessarily unique, as we have seen earlier) projection of z on D.
When ϕ(C) is closed, it remains to prove that X_n converges to X̂ also in the
stronger arc length distance δ(X, Y).
This results once again from Corollary 6.2.7:
X̂ is an η-projection for any η ∈ ]0, η_{max}(z)[, and X_n is an η(z, ε)-projection
for all n ≥ N(z, ε), as we have seen in (6.48), so one has
\forall n \ge N(z, \varepsilon), \quad \|X_n - \hat X\|_F \le \delta(X_n, \hat X) \le \varepsilon,
which proves that δ(X_n, X̂) → 0.
Notice that, if we choose for (D, P) a convex set D equipped with the
family P of its segments, then ϑ = F and \|X - Y\|_F = δ(X, Y), and
Proposition 6.2.8 reduces exactly to the corresponding results of Proposition 4.1.1!
We turn now to the generalization of the stability property (iii) of Proposition 4.1.1 to quasi-convex sets. We begin with two lemmas:
Lemma 6.2.9 (obtuse angle lemma). Let D be equipped with a collection of
paths P, and let z ∈ F and p ∈ P be given.
If
t \mapsto d_t \overset{def}{=} \|z - p(t\,L(p))\|_F \ \text{has a local minimum at } t = 0,
then
d_0^2 + (1 - k(z, p))\,L(p)^2 \le d_1^2, \qquad (6.49)
(where k(z, p) is not necessarily smaller than one).
Proof. Define, as in the proof of Lemma 6.2.6, for any ν ∈ [0, L(p)]:
f'(0) \ge 0.
becomes
f(L(p)) \ge f(0) + (1 - k(z, p))\,L(p)^2,
which is (6.49).
We have illustrated in Fig. 6.4 the geometric interpretation of this lemma:
formula (6.49) is the analogue, for the curvilinear triangle (z, p(0), p(L(p))), of
the property that, in a triangle, the sum of the squared lengths of the edges adjacent
to an obtuse angle is smaller than the squared length of the opposite edge.
Of course, this analogy holds only in situations where z and p are such that
k(z, p) < 1!
where
k = \frac{k(z_0, p) + k(z_1, p)}{2}. \qquad (6.51)
[Figure 6.5: the points z_0, z_1, their projections X_0 = p(0) and X_1 = p(L), the midpoint p(L/2) of the path p of length L = L(p), and the distances d_0, d_{1/2}, d_1, d'_0, d'_1.]
The second derivative of d_t² with respect to t is then (with the usual notation v = p′ and a = p″)
(d_t²)″ = 2‖z1 − z0 − ℓ v(tℓ)‖²_F − 2⟨(1 − t)z0 + t z1 − p(tℓ), ℓ² a(tℓ)⟩_F,
which can be rewritten as
(d_t²)″ = 2‖z1 − z0‖²_F − 4ℓ⟨z1 − z0, v(tℓ)⟩_F + 2ℓ²‖v(tℓ)‖²_F
          − 2(1 − t)ℓ²⟨z0 − p(tℓ), a(tℓ)⟩_F
          − 2tℓ²⟨z1 − p(tℓ), a(tℓ)⟩_F.
Using the Cauchy–Schwarz inequality, the fact that ‖v(tℓ)‖_F = 1, and the inequality
⟨zj − p(tℓ), a(tℓ)⟩_F = k(zj, p; tℓ) ≤ k(zj, p),   j = 0, 1,
one can minorate (d_t²)″ as follows:
(d_t²)″ ≥ 2ℓ² ( 1 − 2‖z0 − z1‖_F/ℓ + ‖z0 − z1‖²_F/ℓ² − (1 − t)k0 − t k1 ),
where
kj = k(zj, p),   j = 0, 1.
This implies the convexity of the function
t ↦ d_t² + t(1 − t)ℓ² ( 1 − (2/3)(k0 + k1) + (1/3)((1 − t)k1 + t k0) − 2‖z0 − z1‖_F/ℓ + ‖z0 − z1‖²_F/ℓ² )
over the [0, 1] interval. Hence
d²_{1/2} + (ℓ²/4) ( 1 − (1/2)(k0 + k1) − 2‖z0 − z1‖_F/ℓ + ‖z0 − z1‖²_F/ℓ² ) ≤ (1/2) d_0² + (1/2) d_1².   (6.52)
We now use the hypothesis that X0 is a projection of z0 on D. This implies that the “distance to z0” function has a local minimum on p at ν = 0. Hence the curvilinear triangle (z0, X0, p(ℓ/2)) has an “obtuse angle” at X0. Application of Lemma 6.2.9 gives then, with the notation of Fig. 6.5 (d′0 denoting the distance from z0 to p(ℓ/2)),
d0² + (1 − k0) ℓ²/4 ≤ d′0².
In a similar way, the fact that X1 is a projection of z1 on D implies that
d1² + (1 − k1) ℓ²/4 ≤ d′1²,
where d′1 denotes the distance from z1 to p(ℓ/2).
with ηj, j = 0, 1, defined by
ηj = d − d(zj, D).   (6.58)
Proof. We check first that the hypothesis (6.55) is satisfied as soon as ‖z0 − z1‖_F is small enough: if z0 ∈ ϑ and z1 → z0, then d(z1, D) → d(z0, D) because of the continuity of the norm and of the “distance to D” function, and
lim inf_{ε→0}  min_{‖z0 − z1‖_F ≤ ε}  min_{j=0,1} { d(zj, D) + ηmax(zj) } ≥ d(z0, D) + ηmax(z0) > d(z0, D)
because of the lower semicontinuity of the function f : z ↦ d(z, D) + ηmax(z), which implies that z ↦ min{f(z), f(z0)} is l.s.c. at z = z0. This ensures the existence of a d satisfying (6.55).
We check now that the functions η0, η1 defined by (6.58) satisfy the inequalities announced there, namely
0 < ηj < ηmax(zj),   j = 0, 1.   (6.59)
Indeed, (6.55) gives d(zj, D) < d < d(zj, D) + ηmax(zj), which proves (6.59) by subtracting d(zj, D) from all three terms and using the definition
ηj = d − d(zj, D),   j = 0, 1,
of ηj. The majoration in (6.59) implies, by definition of quasi-convex sets, that k(zj, ηj) < 1 for j = 0, 1, so that k defined by (6.57) satisfies also k < 1.
We check now that X0 is an η1 -projection of z1 on D:
where
k(p) = (k(z0 , p) + k(z1 , p))/2.
admits two distinct stationary points over D, then necessarily one of them
gives to dz a value larger than or equal to d(z, D) + ηmax (z).
Proof. Let z ∈ ϑ be given such that the “distance to z” function admits two
distinct stationary points at X0 , X1 ∈ D, with values strictly smaller than
d(z, D) + ηmax (z):
A quasi-convex function has one or more global minima, and no parasitic local minima. But it can have parasitic stationary points (in the zones where f is constant), as well as inflexion points. Sometimes, the name “strictly quasi-convex” is used for functions f ∈ C⁰([0, L]) which satisfy (7.3) with a strict inequality.
(6.38) and (6.39). Then ηmax satisfies obviously (7.5). We check now that ϑ and ηmax satisfy (7.4); let us choose z ∈ ϑ and p ∈ P such that
d(z, p) < d(z, D) + ηmax(z).
From the formula (6.39) defining ηmax, we see that there exists î ∈ I such that
d(z, p) < d(z, D) + ηmax,î(z).   (7.6)
As d(z, p) ≥ d(z, D), (7.6) implies that ηmax,î(z) > 0, which shows that necessarily z ∈ ϑî. Then the strict quasi-convexity of d²_{z,p} over p results from the fact that (7.4) holds, by hypothesis, for all ϑ = ϑi and ηmax = ηmax,i, i ∈ I, and hence in particular for î.
The definition chosen for s.q.c. sets makes it very easy to handle the
problem of parasitic stationary points:
Using the Definition 7.1.2 of s.q.c. sets, this implies that the “distance to z”
function d2z,p is s.q.c. along the path p, which contradicts the fact that this
function has a global minimum at ν = 0 (because X is a global minimum on
D) and a stationary point at ν = L(p) (because of the hypothesis we have
made that Y is a stationary point of d2z on D).
Of course, convex sets, which were already quasi-convex sets as we have
seen in Chap. 6 earlier, are also s.q.c. sets with a regular neighborhood ϑ = F
and a function ηmax (z) = +∞. Arcs of circle of radius R and length L,
however, are s.q.c. only if L < πR, with a largest regular neighborhood ϑ shown in Fig. 7.1. Comparison with Fig. 6.2 shows the diminution in size of the neighborhood ϑ associated to the same arc of circle of length L < πR when it is considered as s.q.c. instead of quasi-convex.
[Fig. 7.1: an arc of circle D of length L < πR; the largest regular neighborhood ϑ is the complement of the gray area, with ηmax(z) as indicated for a point z.]
As we see in Fig. 7.1, the largest regular neighborhood ϑ associated to the arc of circle D by the Definition 7.1.2 of s.q.c. sets catches exactly all points z of the plane admitting a unique projection on D with no parasitic stationary points on D of the distance to z. So the notion of s.q.c. provides a sharp description of the sets D to which the result of Proposition 4.1.1 can be generalized, when D is an arc of circle. In fact, as we shall see in the next section, this remains true for all D made of one path p.
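A small brute-force experiment (illustrative only, not part of the original text) makes this picture concrete: sampling the distance function along an arc of circle, one finds a single minimizer for a point z close to the arc, and two minimizers together with a parasitic interior stationary point for a point z on the far side of the center, i.e., outside the neighborhood of Fig. 7.1. The radius, arc length, and test points below are arbitrary choices.

import numpy as np

# Illustrative sketch: distance function along an arc of circle D of radius R = 1 and
# length L = 2.5 < pi*R.  z_near lies at distance 0.3 from the arc; z_far lies on the
# far side of the center.  Both points and the sampling are arbitrary choices.
R, L = 1.0, 2.5
nu = np.linspace(0.0, L, 5001)
arc = np.stack([R * np.cos(nu / R), R * np.sin(nu / R)], axis=1)

def local_minimizers(d):
    # indices of local minimizers of a sampled function, end points included
    idx = []
    if d[0] < d[1]:
        idx.append(0)
    idx += [i for i in range(1, len(d) - 1) if d[i] < d[i - 1] and d[i] < d[i + 1]]
    if d[-1] < d[-2]:
        idx.append(len(d) - 1)
    return idx

z_near = 1.3 * np.array([np.cos(1.0), np.sin(1.0)])     # unique projection expected
z_far = -0.4 * np.array([np.cos(1.25), np.sin(1.25)])   # beyond the center: two minimizers

for name, z in [("z_near", z_near), ("z_far", z_far)]:
    d = np.linalg.norm(arc - z, axis=1)
    print(name, "-> number of local minimizers of the distance to z:",
          len(local_minimizers(d)))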
The definition of s.q.c. set is quite technical. We have been able to use
it directly and to determine the associated largest regular neighborhood ϑ
in the above examples only because they were extremely simple (D was a
convex set or an arc of circle). Using the definition itself to recognize s.q.c.
sets in the applications to nonlinear least-squares problems (see Chap. 1)
seems quite impractical, if not unfeasible. So we develop in the next section a characterization of s.q.c. sets, which will be easier to use, and will provide the size RG of the largest open regular enlargement neighborhood
ϑ = {z ∈ F |d(z, D) < RG }
associated to the s.q.c. set (the reason for calling RG the size of the neigh-
borhood will become clear in the next section).
For the applications to nonlinear least-squares, the determination of such
a regular enlargement neighborhood is very important, as RG gives an upper
bound to the size of the noise level (measurement and model errors) on data
for which the least-squares problem remains Q-wellposed.
[Fig. 7.2: three configurations (a), (b), (c) of a point z relative to the affine normal cones N(ν) and N(ν′) at two points p(ν), p(ν′) of a path p: in case (a) both points are interior, in case (b) p(ν) = p(0), and in case (c) p(ν) = p(0) and p(ν′) = p(L).]
As we have seen in Fig. 7.2, the function d²_{z,p} has parasitic stationary points as soon as d(z, p) is equal to the global radius of curvature at the projection p(ν) of z on p, seen from some other point p(ν′) of p. It seems reasonable to make the conjecture that d²_{z,p} will have no parasitic stationary points as soon as d(z, p) is strictly smaller than the infimum of all global radii of curvature! This is confirmed by the
Then Definition 7.1.1 of s.q.c. functions implies that d²_{z,p} possesses at least, beside a global minimum at some ν0, a second stationary point at ν1 ≠ ν0.
Hence (7.2) with f replaced by d2z,p holds at both ν0 and ν1 . This can be
rewritten, using the Definition 7.2.1 of affine normal cones, as
z ∈ N(ν0 ) ∩ N(ν1 ),
which implies
But the left-hand side of (7.11) is ρG (ν0 , ν1 ) from the Definition 7.2.2 of the
global radius of curvature, and the right-hand side is d(z, p) by definition of
ν0 . Hence (7.11) becomes
and, using the Definition 7.2.3 of RG(D) and the properties of z and p,
and, from the definition (7.8) of RG(p), that there exists ν0, ν1 ∈ [0, L(p)], ν0 ≠ ν1, such that
ρG(ν0, ν1) ≤ RG(p) + ε/2.   (7.15)
Let z ∈ F be the projection on N(ν0) ∩ N(ν1) of p(ν0). By construction, d²_{z,p} has two distinct stationary points at ν0 and ν1, and so cannot be s.q.c.! On the other hand, the definition (7.7) of ρG shows that
Hence, given ε > 0, we have been able to find a z ∈ F and p ∈ P such that (7.17) holds and d²_{z,p} is not s.q.c., which proves that
inf{...} ≤ RG(D) + ε.
This holds for any ε > 0, which proves that inf{...} ≤ RG(D).
Proof. We prove first the sufficient condition. So let (7.18) hold, and define
ϑ and ηmax by (7.19) and (7.20). Then for any z, η, and p satisfying the
hypothesis of (7.4), one has, by definition of ηmax ,
which by Proposition 7.2.4 implies that d2z,p is necessarily s.q.c. So (7.4) holds,
and (7.9) holds trivially as d(z, D) + ηmax (z) = RG (D) > 0. Hence (D, P) is
s.q.c.
We prove now the necessary condition. So let (D, P) be s.q.c., and suppose
that RG (D) = 0. Using Proposition 7.2.4, we can find zn ∈ F , pn ∈ P for all
n = 1, 2, ... such that
d(zn, pn) = 1/n,   (7.21)
d²_{zn,pn} not s.q.c.   (7.22)
But (7.21) implies that d(zn , D) → 0, and, using (7.9), the existence of γ > 0
such that
d(zn , D) + ηmax (zn ) ≥ γ > 0 ∀n.
Hence, for n large enough, one has
d(zn, D) + ηmax(zn) > 2/n.   (7.23)
This implies, as d(zn, D) ≤ d(zn, pn) = 1/n, that
ηmax(zn) > 1/n.
So (7.23) can be rewritten as
d(zn, pn) = 1/n < d(zn, D) + ηn,
where ηn = ηmax(zn) − 1/n satisfies
0 < ηn < ηmax(zn).
As (D, P) is s.q.c., the last two inequalities imply that d²_{zn,pn} is s.q.c. for n
large enough, which contradicts (7.22); hence necessarily RG (D) > 0, and
Proposition 7.2.4 implies that ϑ defined in (7.19) is the largest regular en-
largement neighborhood associated with the s.q.c. set (D, P).
Theorem 7.2.5 is illustrated graphically in Fig. 7.3 in the case of a set D
made of one s.q.c. arc of circle p: one sees that the enlargement neighbor-
hood of size RG (p) is the largest regular enlargement included in the largest
possible regular neighborhood, which is recalled from Fig. 7.1.
[Fig. 7.3: arcs of circle p of radius R: RG(p) = R when the deflection θ satisfies 0 ≤ θ ≤ π/2, and RG(p) = R sin θ when π/2 ≤ θ < π.]
Now that we have defined, and characterized, s.q.c. sets that allow for the generalization of property (ii) of Proposition 4.1.1 (no parasitic stationary points for the projection), we would like to see whether properties (i), (iii), and (iv) (uniqueness, stability, and existence) can also be generalized to s.q.c. sets. According to the results of Chap. 6, this would be true if s.q.c. sets happened to be quasi-convex sets; to prove that this is indeed the case, we need to study further the analytical properties of the global radius of curvature ρG(ν, ν′) and its relation with the usual, local radius of curvature ρ(ν) = ‖a(ν)‖_F⁻¹ defined in Chap. 6.
The proof of this formula is elementary, with the basic ingredients being the
projection of a point on a hyperplane and the angle between two hyperplanes.
If the reader looks now at Fig. 7.2 and imagines what is going to happen when ν′ → ν, he will not be surprised by the
Hence
D = δμ (1 − δμ²/4)^{1/2},
and, as δμ → 0 when δν → 0,
δμ/|dν| = ‖v′ − v‖_F/|dν| = ‖ (1/dν) ∫₀^{dν} a(ν + τ) dτ ‖_F .
But the right-hand side of (7.32) tends to zero when dν → 0 each time ν is a Lebesgue point for the chosen realization of a (see, e.g., Theorem 8.8 of [Rudin 1987]). As almost every point of [0, L(p)] is a Lebesgue point, we have proven the left part of (7.29). To prove the right part, we remark that
0 ≤ dβ = |⟨X′ − X, v′ − v⟩_F| / D = |dν| |⟨v(ν + θ dν), v′ − v⟩_F| / D
for some 0 ≤ θ ≤ 1. The Cauchy–Schwarz inequality implies then
0 ≤ dβ ≤ |dν| δμ / D,
which, as D/δμ → 1 (see (7.31)), shows that
dβ → 0 when dν → 0.
and
R(p) ≥ RG(p) ≥ 0,   (7.36)
Proof. Inequality (7.35) holds because p ∈ W^{2,∞}([0, L(p)]; F) (cf. Definition 6.1.1), and inequalities (7.36) and (7.37) follow from Proposition 7.2.7 (ii).
We can now state the second main result of this chapter:
Proof. Let (D, P) be s.q.c. Then we know from Theorem 7.2.5 that
or, with the notation d_{z,p}(ν) = ‖z − p(ν)‖_F and ‖a(ν)‖_F = 1/ρ(ν),
We summarize in the next theorem the nice properties of s.q.c. sets with
respect to projection, which generalize those recalled for the convex case in
Proposition 4.1.1:
then for any path p going from X̂0 to X̂1,
‖X̂0 − X̂1‖_F ≤ L(p) ≤ (1 − d/R(p))⁻¹ ‖z0 − z1‖_F,   (7.47)
Proposition 7.2.12 Let (D, P) be s.q.c. Then, for any z ∈ ϑ and any p ∈ P
such that
d(z, p) < RG (D),
the function d²_{z,p} satisfies
(i) d²_{z,p} – and hence d_{z,p} – is s.q.c. over the whole path p,
(ii) for any η such that 0 < η < RG(D) − d(z, D), d²_{z,p} is α-convex between any two η-projections of z, where
α = 2(1 − (d(z, D) + η)/R(p)) ≥ 2(1 − (d(z, D) + η)/R(D)) > 0.
where ρ_G^{ep} (“ep” stands for “end points”) is given by
ρ_G^{ep}(ν, ν′) = N⁺/D   if ⟨v, v′⟩ ≥ 0,
ρ_G^{ep}(ν, ν′) = N⁺     if ⟨v, v′⟩ ≤ 0,   (7.49)
and
ρG(ν, ν′) ≥ ρ_G^{ep}(ν, ν′)   ∀ν ≠ ν′,
We treat separately the cases according to the signs of N and ⟨v, v′⟩:
Case 1: N = N(ν, ν′) ≤ 0.
The mapping τ ↦ N(τ, ν′) is continuous, negative at τ = ν, and positive when τ is smaller than and close enough to ν′. Hence, there exists ν̄ ∈ [ν, ν′[ such that N(ν̄, ν′) = 0. Hence
ρ_G^{ep}(ν, ν′) ≥ 0 = ρG(ν̄, ν′),
and (7.51) is satisfied.
Case 2: N(ν, ν′) > 0.
Subcase 2.1: ⟨v, v′⟩ ≥ 0.
Then the formulas for ρG and ρ_G^{ep} coincide, so that
ρ_G^{ep}(ν, ν′) = N/D = ρG(ν, ν′),
ρ_G^{ep}(ν, ν′) = N,   (7.52)
ρG(ν, ν′) = N = ρ_G^{ep}(ν, ν′),
As N(ν, ν′) > 0 and ⟨v, v′⟩ < 0, one has necessarily ν̄ < ν, and one checks easily that the function τ ↦ N(τ, ν′)² is strictly increasing over the [ν̄, ν] interval. Hence
N(ν̄, ν′)² < N(ν, ν′)² = N².   (7.55)
As the functions τ ↦ N(τ, ν′) and τ ↦ ⟨v(τ), v′⟩ are continuous, the abscissa ν̄ defined by (7.54) satisfies necessarily one of the two following conditions:
• Either N(ν̄, ν′) = 0 and ⟨v(ν̄), v′⟩ ≤ 0. Then
ρ_G^{ep}(ν, ν′) = N > 0 = ρG(ν̄, ν′),   (7.56)
This ends the proof of formula (7.48). Then the possibility of skipping the couples ν ≠ ν′ that satisfy (7.50) follows from (7.56) and (7.58), and from the remark that, in both cases,
ρG(ν̄, ν′) = ρ_G^{ep}(ν̄, ν′).   (7.59)
Notice that this result cannot be seen immediately when RG(p) is computed using ρG(ν, ν′) as in its definition: if p′ is the sub-path of p going from ν to ν′, the normal cones at ν and ν′ to p′ are necessarily larger than the normal cones to p, so that one has in general only
ρG(ν, ν′; p) ≥ ρG(ν, ν′; p′).
In the upper part of the figure, one sees that D is s.q.c., with the largest
regular enlargement neighborhood ϑ of size RG = R, as soon as it satisfies
the deflection condition
Θcircle ≤ π/2. (8.3)
But the lower part of the figure indicates that it is in fact enough that
Θcircle satisfies the extended deflection condition
Θcircle < π   (8.4)
for the set D to remain s.q.c., at the price of a possible reduction of the size
of the regular enlargement neighborhood ϑ to RG ≤ R given by
RG = R                 if 0 ≤ Θcircle ≤ π/2,
RG = R sin Θcircle     if π/2 < Θcircle < π.   (8.5)
Definition 8.0.3 (Size of paths and sets) Let (D, P) and p ∈ P be given.
The size of p is its arc length (Definition 6.1.2):
and that of D is
Formula (8.6) for L(p) is duplicated from Definition 6.1.1, which gives the
geometrical quantities associated to a path. Then (8.7) defines L(D) as the
size of D measured along its paths.
which satisfies
θ(ν, ν′) ∈ [0, π]   ∀ν, ν′ ∈ [0, L(p)],
θ(ν, ν) = 0   ∀ν ∈ [0, L(p)],   (8.9)
θ(ν, ν′) = θ(ν′, ν)   ∀ν, ν′ ∈ [0, L(p)].
[Figure 8.1: The deflection θ(ν, ν′) between two points ν and ν′ (ν′ > ν) of a path p: the angle between the velocities v(ν) and v(ν′).]
and satisfy
R ≥ 0, L > 0, Θ ≤ L/R, RG ≤ R. (8.14)
Then we search for a lower bound to RG(D) as a function of the lower/upper bounds R, L, and Θ. We prove in Theorem 8.1.5 that
∀p ∈ P,   RG(p) ≥ RG(D) ≥ RG(R, L, Θ),   (8.15)
where
RG(r, l, θ) = r                          if 0 ≤ θ ≤ π/2,
RG(r, l, θ) = r sin θ + (l − rθ) cos θ   if π/2 ≤ θ ≤ π.   (8.16)
This shows first, using Theorem 7.2.5, that a sufficient condition for a set
(D, P) with finite curvature 1/R to be s.q.c. with an enlargement neighbor-
hood ϑ of size RG = R > 0 is to satisfy the deflection condition
Θ ≤ π/2, (8.17)
We show in Fig. 8.2 the set of values of the deflection Θ and the size × curva-
ture product L/R that satisfy (8.18), and ensure that the set (D, P) is s.q.c.,
with a regular enlargement neighborhood ϑ of size 0 < RG(R, L, Θ) ≤ R.
A simple calculation shows that (8.18) is equivalent to
Θ ≤ Θmax := L/R   if 0 ≤ L/R < π,
Θ < Θmax, where Θmax − tan Θmax = L/R,   if π ≤ L/R.   (8.19)
This justifies the name given to condition (8.18): the upper limit to the deflection Θ is extended beyond π/2, up to a value Θmax which depends on the estimated size × curvature product L/R. This extended deflection condition reduces to the condition (8.4) obtained for an arc of circle as soon as the “worst” estimate Θ = L/R is used for the deflection.
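The lower bound (8.16) and the extended deflection condition (8.19) are easy to evaluate numerically. The following sketch (illustrative only, with (8.16) written in the parenthesized form reconstructed above) computes RG(r, l, θ) and solves Θmax − tan Θmax = L/R by bisection; the test values at the end are arbitrary.

import math

def R_G(r, l, theta):
    # lower bound (8.16) for the global radius of curvature (parenthesized form)
    if 0.0 <= theta <= math.pi / 2:
        return r
    if theta <= math.pi:
        return r * math.sin(theta) + (l - r * theta) * math.cos(theta)
    raise ValueError("theta must lie in [0, pi]")

def theta_max(l_over_r):
    # largest admissible deflection in the extended deflection condition (8.19)
    if l_over_r < math.pi:
        return l_over_r
    lo, hi = math.pi / 2 + 1e-12, math.pi - 1e-12   # solve Theta - tan(Theta) = L/R
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - math.tan(mid) < l_over_r:
            hi = mid            # root is closer to pi/2
        else:
            lo = mid
    return lo

# an arc of circle of radius R and deflection Theta has length L = R*Theta, and
# (8.16) then reduces to R*sin(Theta) for Theta > pi/2, as in (8.5)
R, Theta = 1.0, 2.0
print(R_G(R, R * Theta, Theta))      # about sin(2.0) = 0.909
print(theta_max(4.0))                # a value in ]pi/2, pi[, about 2.03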
[Fig. 8.2: regions RG > 0 and RG < 0 in the (deflection Θ, size × curvature L/R) plane, with Θ ranging over [0, π]; the boundary runs from (Θ = π/2, L/R → ∞) down to (Θ = π, L/R = π).]
Finally, in Sect. 8.2, we consider the case where the set D is the attainable
set ϕ(C) of a nonlinear least squares problem set over a convex set C of
admissible parameters. It is then natural to try to equip D with the set of
paths P made of the images by ϕ of the segments of C: to any x0 , x1 ∈ C,
one can always associate a curve P drawn on D = ϕ(C) by
P : t ∈ [0, 1] ↦ P(t) = ϕ((1 − t)x0 + t x1).   (8.20)
The first question is to know under which conditions the curve P – when
it is not reduced to a point – becomes, once reparameterized by its arc length
ν, a path p in the sense of Definition 6.1.1, that is, a W^{2,∞} function of ν. A necessary condition is of course to require that
P ∈ W^{2,∞}([0, 1]; F),   (8.21)
which allows us to define the velocity V(t) and acceleration A(t) along the curve by
V(t) = dP/dt (t),   A(t) = d²P/dt² (t).   (8.22)
(We reserve the lower case notation p, v, and a for the path, velocity, and acceleration with respect to arc length, as in Definition 6.1.2.)
But under the sole hypothesis (8.21), the reparametrization p of P with respect to arc length satisfies only p ∈ W^{1,∞}([0, ℓ]; F). So our first task will be to show in Proposition 8.2.2 that the additional condition
∃R ∈ ]0, +∞] such that ‖A(t)‖_F ≤ (1/R) ‖V(t)‖²_F   a.e. on ]0, 1[   (8.23)
are upper bounds to the arc length size and deflection of (ϕ(C), P). Hence R, L, Θ are the sought geometric attributes of (ϕ(C), P).
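To make the roles of V, A, and the attributes R, L, Θ concrete, here is a hedged finite-difference sketch (not from the book): for a single hypothetical forward map ϕ and two arbitrary parameters x0, x1, it samples the curve P(t) = ϕ((1 − t)x0 + t x1), estimates the largest R compatible with (8.23), the arc length L, and the deflection Θ as the largest angle between unit tangents.

import numpy as np

# Hedged sketch: finite-difference estimation of the attributes R, L, Theta of one curve
# P(t) = phi((1-t)x0 + t*x1).  The forward map phi, the parameters x0, x1, and the
# sampling are arbitrary illustrative choices.
def phi(x):                                  # hypothetical forward map, C in R^2 -> F = R^3
    return np.array([x[0], x[1], x[0] * x[1] + 0.3 * x[0] ** 2])

x0, x1 = np.array([0.0, 0.0]), np.array([1.0, 1.5])
t = np.linspace(0.0, 1.0, 1001)
h = t[1] - t[0]
P = np.array([phi((1 - s) * x0 + s * x1) for s in t])

V = np.gradient(P, h, axis=0)                # velocity dP/dt
A = np.gradient(V, h, axis=0)                # acceleration d2P/dt2
speed = np.linalg.norm(V, axis=1)
acc = np.linalg.norm(A, axis=1)

L = np.sum(np.linalg.norm(np.diff(P, axis=0), axis=1))      # arc length of the curve
R = np.min(speed ** 2 / np.maximum(acc, 1e-12))             # largest R with ||A|| <= ||V||^2 / R

u = V / speed[:, None]                                      # unit tangents
Theta = np.max(np.arccos(np.clip(u @ u.T, -1.0, 1.0)))      # largest deflection

print(f"L = {L:.3f},  R = {R:.3f},  Theta = {Theta:.3f},  L/R = {L / R:.3f}")
# these values can then be tested against the extended deflection condition of Sect. 8.1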
The sufficient condition (8.18) for strict quasi-convexity (Sect. 8.1) and
the estimations (8.24) through (8.28) of R, L, Θ (Sect. 8.2) are the basis for
all Q-wellposedness results of Chaps. 4 and 5. Examples of application of these
conditions can be found in Sect. 5.2, where it is proved that the attainable
set of the 2D elliptic nonlinear source estimation problem of Sect. 1.5 is s.q.c.,
and in Sects. 4.8 and 4.9, where the same result is proved for the 1D and 2D
parameter estimation problems of Sects. 1.4 and 1.6.
2. Moreover,
|∂θ/∂ν (ν, ν′)| ≤ ‖a(ν)‖_F = 1/ρ(ν)   for a.e. ν ∈ [0, L(p)],   (8.32)
so that ∂θ/∂ν(·, ν′) ∈ L∞([0, L(p)]).
3. The largest deflection Θ(p) along p satisfies
Θ(p) ≤ ‖a‖_{L¹(0,L(p);F)} = ∫₀^{L(p)} dν/ρ(ν) ≤ L(p)/R(p).   (8.33)
Proof: Let p ∈ P and ν′ ∈ [0, L(p)] be given, and choose 0 < θ < π/2. The function θ(·, ·) is continuous over [0, L(p)] × [0, L(p)], and hence uniformly continuous. So there exists Δν > 0 such that
|ν1 − ν2| ≤ Δν   (8.34)
implies
θ(ν1, ν2) ≤ θ.   (8.35)
A Taylor–MacLaurin development of cos t at t = 0 gives
cos t = 1 − (t²/2) cos αt   for some α ∈ [0, 1],
and, as the cosine is a decreasing function over [0, π],
cos t ≤ 1 − (t²/2) cos t   for 0 ≤ t ≤ π.
Hence,
t² cos t ≤ 2(1 − cos t)   for 0 ≤ t ≤ π.
Choosing t = θ(ν1, ν2), where ν1, ν2 satisfy (8.34), gives, as then cos t ≥ cos θ > 0 and cos θ(ν1, ν2) = ⟨v1, v2⟩,
θ(ν1, ν2)² ≤ 2(1 − ⟨v1, v2⟩)/cos θ.
This can be rewritten as
θ(ν1, ν2) ≤ ‖v1 − v2‖_F/(cos θ)^{1/2},   (8.36)
We prove first the absolute continuity of the ν ↦ θ(ν, ν′) function. Let ε > 0 be given, and (αi, βi), i = 1, 2, ..., N, be disjoint segments of the interval [0, L(p)] satisfying
βi − αi ≤ Δν,   i = 1, 2, ..., N.
Then we get from (8.36) that
Σ_{i=1}^N |θ(βi, ν′) − θ(αi, ν′)| ≤ (1/(cos θ)^{1/2}) Σ_{i=1}^N ‖v(βi) − v(αi)‖_F,
so that, as
‖v(βi) − v(αi)‖_F ≤ ∫_{αi}^{βi} ‖a(t)‖_F dt ≤ (βi − αi) ‖a‖_∞,
we obtain
Σ_{i=1}^N |θ(βi, ν′) − θ(αi, ν′)| ≤ (‖a‖_∞/(cos θ)^{1/2}) Σ_{i=1}^N (βi − αi),   (8.37)
which can be made smaller than ε by choosing the intervals (αi, βi) such that
Σ_{i=1}^N (βi − αi) ≤ min{ Δν, ε (cos θ)^{1/2}/‖a‖_∞ }.
This proves that the function θ(·, ν′) is absolutely continuous over the [0, L(p)] interval, which in turn implies that ∂θ/∂ν(·, ν′) is in L¹(0, L(p)), and that formula (8.31) holds.
We prove now that θ(·, ν′) has a bounded variation over [0, L(p)]. Let a subdivision 0 ≤ t0 < t1 < ... < tN ≤ L(p) be given. One can always add a finite number of points to obtain a new subdivision
0 ≤ t′0 < t′1 < ... < t′_{N′} ≤ L(p),
with N′ ≥ N, such that
|t′i − t′_{i−1}| ≤ Δν,   i = 1, 2, ..., N′,   (8.38)
independently of the positions and number of the points ti, which proves that the function θ(·, ν′) has a bounded total variation.
We prove now formula (8.32). Let ν ∈ [0, L(p)] be a Lebesgue point for both a ∈ L∞([0, L(p)]; F) and ∂θ/∂ν(·, ν′) ∈ L¹([0, L(p)]) (almost every point of [0, L(p)] has this property!), and dν ≠ 0 be such that ν + dν ∈ [0, L(p)] and |dν| ≤ Δν defined at the beginning of the proof. Then we get from (8.36) that
|θ(ν + dν, ν′) − θ(ν, ν′)| / |dν| ≤ ‖v(ν + dν) − v(ν)‖_F / (|dν| (cos θ)^{1/2}),
which, by definition of the Lebesgue points, converges when dν → 0 to
|∂θ/∂ν (ν, ν′)| ≤ ‖a(ν)‖_F / (cos θ)^{1/2}.
But θ can be chosen arbitrarily in the ]0, π/2[ interval, which proves (8.32)
as cos θ can be made arbitrarily close to one.
Finally, (8.33) follows immediately from (8.31) and (8.32), and
Proposition 8.1.2 is proved.
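A quick numerical check of estimate (8.33) can be made on any explicit plane path parameterized by arc length (illustrative sketch, not from the book): if the unit tangent makes the angle α(ν) with a fixed direction, then ‖a(ν)‖_F = |α′(ν)|, the deflection between two points is the angle between their tangents, and Θ(p) is indeed dominated by the integral of ‖a‖. The particular α below is an arbitrary choice.

import numpy as np

# Hedged numerical check of (8.33) on an arbitrary plane path parameterized by arc length:
# the unit tangent is v(nu) = (cos alpha(nu), sin alpha(nu)), so ||a(nu)|| = |alpha'(nu)|.
L = 3.0
nu = np.linspace(0.0, L, 2001)
alpha = 0.6 * np.sin(2.0 * nu)                       # direction angle of the unit tangent
v = np.stack([np.cos(alpha), np.sin(alpha)], axis=1)

cosang = np.clip(v @ v.T, -1.0, 1.0)
Theta = np.max(np.arccos(cosang))                    # largest deflection Theta(p)

alpha_prime = np.gradient(alpha, nu)
bound = np.trapz(np.abs(alpha_prime), nu)            # int_0^L ||a(nu)|| d nu

print(f"Theta(p) = {Theta:.4f}  <=  int ||a|| dnu = {bound:.4f}")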
We turn now to the estimation of a lower bound to the global radius of curvature RG(p) of a path p. Because of Proposition 7.3.1, this amounts to searching for a lower bound to ρ_G^{ep}(ν, ν′) independent of ν and ν′, where
ρ_G^{ep}(ν, ν′) = N⁺/D   if ⟨v, v′⟩ ≥ 0,
ρ_G^{ep}(ν, ν′) = N⁺     if ⟨v, v′⟩ ≤ 0,   (8.39)
N = sgn(ν′ − ν) ⟨X′ − X, v⟩,   (8.40)
D = (1 − ⟨v, v′⟩²)^{1/2}.   (8.41)
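For a discretized path, ρ_G^{ep} can be evaluated by brute force from (8.39)–(8.41), and its infimum over all pairs (ν, ν′) gives a computable lower bound for RG(p). The sketch below (illustrative only, with (8.40) taken in the form reconstructed above, N = sgn(ν′ − ν)⟨X′ − X, v⟩) uses an arc of circle as test path, for which the expected value is known from (8.5).

import numpy as np

# Hedged sketch: brute-force evaluation of rho_G^ep from (8.39)-(8.41) on a discretized
# path, and of its infimum over all pairs.  The test path (an arc of circle of radius 1
# and deflection 2.2) is an arbitrary choice; (8.5) predicts a value close to sin(2.2).
L = 2.2
nu = np.linspace(0.0, L, 401)
X = np.stack([np.cos(nu), np.sin(nu)], axis=1)       # p(nu), unit speed
v = np.stack([-np.sin(nu), np.cos(nu)], axis=1)      # v(nu) = p'(nu)

def rho_G_ep(i, j):
    N = np.sign(nu[j] - nu[i]) * np.dot(X[j] - X[i], v[i])       # (8.40)
    Nplus = max(N, 0.0)
    c = np.dot(v[i], v[j])
    if c >= 0.0:
        D = np.sqrt(max(1.0 - c * c, 1e-16))                     # (8.41)
        return Nplus / D
    return Nplus                                                 # second case of (8.39)

vals = [rho_G_ep(i, j) for i in range(len(nu)) for j in range(len(nu)) if i != j]
print("inf over pairs of rho_G^ep:", min(vals), " (sin(2.2) =", np.sin(2.2), ")")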
We give first a lower bound on the numerator N.
Lemma 8.1.3 Let (D, P) and p ∈ P be given. Then, for any ν, ν′ ∈ [0, L(p)] one has
N ≥ R̄ sin θ̄ + (|ν′ − ν| − R̄ θ̄) cos θ̄,   (8.42)
where
R̄ = ess inf { ρ(t), min(ν, ν′) < t < max(ν, ν′) },   (8.43)
θ̄ = ess sup { θ(t, ν′), min(ν, ν′) < t < max(ν, ν′) },   (8.44)
Hence, if we define
Remark 8.1.4: The lower bound (8.42) on N retains, from the shape of the deflection θ(·, ν′), only its maximum value. However, the inequality (8.47) shows that, among pieces of paths having the same size |ν′ − ν| and curvature 1/R̄, the ones whose deflection has a large total variation are more likely to have positive global radii of curvature, and hence to be s.q.c. (the function θ ↦ sin θ − θ cos θ is nondecreasing over the [0, π] interval, and so a large variation of θ(·, ν′) corresponds to a large variation of sin θ(·, ν′) − θ(·, ν′) cos θ(·, ν′), and hence is more likely to produce through (8.47) a positive lower bound to N).
But the total variation of the deflection is difficult to estimate in applications, and so we shall retain only the less sharp estimate (8.42) based on the maximum deflection.
Proof. The partial derivatives of RG are positive with respect to r, and negative with respect to l and θ over the domain of definition, which proves the announced monotonicity property.
Let now p ∈ P be given. We shall drop in the rest of the proof the argument p in R(p), RG(p), L(p), Θ(p), and write simply instead R, RG, L, Θ.
Let then ν, ν′ ∈ [0, L] be given. Formula (8.33) shows that
|ν′ − ν| − R̄ θ̄ ≥ 0,   (8.51)
N ≥ R̄ sin θ̄.   (8.54)
The sine function is increasing over [0, π/2], and so we obtain from (8.41) that
D = (1 − ⟨v, v′⟩²)^{1/2} = sin θ(ν, ν′) ≤ sin θ̄.   (8.55)
Then (8.53), (8.54), and (8.55) imply
ρ_G^{ep}(ν, ν′) ≥ R̄,
ρ_G^{ep}(ν, ν′) ≥ R̄ ≥ R.   (8.56)
• Or π/2 < θ̄ ≤ Θ, and then ⟨v, v′⟩ = cos θ(ν, ν′) can be either positive (if 0 ≤ θ(ν, ν′) ≤ π/2) or negative (if π/2 ≤ θ(ν, ν′) ≤ θ̄), and so the only information we get from (8.39) is
ρ_G^{ep}(ν, ν′) ≥ N⁺ ≥ N.
But
R̄ ≥ R > 0,   |ν′ − ν| ≤ L,   θ̄ ≤ Θ,
0 ≤ θ̄ ≤ |ν′ − ν|/R̄,   0 ≤ Θ ≤ L/R,
so the monotonicity property of RG shows again that
ρ_G^{ep}(ν, ν′) ≥ RG(R, L, Θ),
which implies (8.49). Then (8.50) follows immediately from the monotonicity properties of the function (r, l, θ) ↦ RG(r, l, θ).
Theorem 8.1.6 Let (D, P) be a set equipped with a family of paths, and
R, L, Θ be three first geometric attributes of (D, P) (Definition 8.0.5). Then
[Figure: a path p and a point z, with the lengths RΘ and R tan Θ indicated.]
which satisfies
0 ≤ ν ≤ ℓ := ν(1),
and the arc length of the curve P is
L(P) = ℓ = ν(1) = ∫₀¹ ‖V(t)‖_F dt.   (8.62)
Definition 6.1.1 requires that paths of ϕ(C) have a strictly positive length.
Hence we shall consider only in the sequel the curves P such that L(P ) > 0 .
By construction, t ↦ ν(t) is a nondecreasing mapping from [0, 1] onto [0, ℓ], which can be constant on some intervals of [0, 1]. The reparameterized path p : [0, ℓ] → D is hence unambiguously defined by
p(ν(t)) = P(t)   ∀t ∈ [0, 1].   (8.63)
It will be convenient to denote the velocity and acceleration along p, that is, the two first derivatives of p with respect to ν, when they exist, by the lower case letters
v(ν) = p′(ν),   a(ν) = v′(ν) = p″(ν)   ∀ν ∈ [0, ℓ].
Proposition 8.2.1 Let (8.59) and (8.61) hold, and x0, x1 ∈ C be such that the curve P defined by (8.60) has an arc length L(P) > 0. Then its reparameterization p by arc length defined in (8.63) satisfies
p ∈ W^{1,∞}([0, ℓ]; F),   (8.64)
‖v(ν)‖_F = 1   a.e. on [0, ℓ],   (8.65)
and P has, at all points t ∈ [0, 1] where V is derivable and V(t) ≠ 0, a finite curvature 1/ρ(t) – that is, a radius of curvature ρ(t) > 0 – given by
1/ρ(t) = ‖a(ν(t))‖_F = { ‖A‖²_F/‖V‖⁴_F − ⟨A, V⟩_F²/‖V‖⁶_F }^{1/2} ≤ ‖A‖_F/‖V‖²_F.   (8.66)
Proof. The reparameterized path p is in L∞([0, ℓ]; F) by construction, and hence defines a distribution on ]0, ℓ[ with values in F. We compute the derivative v of this distribution p. For any ϕ ∈ D(]0, ℓ[) (the space of C∞(]0, ℓ[) functions with compact support in ]0, ℓ[) one has
⟨v, ϕ⟩ = − ∫₀^ℓ p(ν) ϕ′(ν) dν
       = − ∫₀¹ p(ν(t)) ϕ′(ν(t)) ν′(t) dt
       = − ∫₀¹ P(t) (d/dt)[ϕ(ν(t))] dt
But ν(·) belongs to W^{1,∞}(]0, 1[) and so does ϕ(ν(·)), with moreover zero values at t = 0 and t = 1. So we can integrate by parts, as P ∈ W^{2,∞}(]0, 1[; F),
⟨v, ϕ⟩ = ∫₀¹ V(t) ϕ(ν(t)) dt.   (8.67)
To express ⟨v, ϕ⟩ as an integral with respect to ν, one has to replace in (8.67) dt by dν/‖V(t)‖_F, which is possible only if V(t) ≠ 0. So we define
I = { t ∈ ]0, 1[ | V(t) ≠ 0 }.
We define now
J = ∪_{i=1}^∞ Ji,   where Ji = ν(Ii),   i = 1, 2, ...,
the Ii, i = 1, 2, ..., being the pairwise disjoint open intervals whose union is I. The sets Ji, i = 1, 2, ..., are (also pairwise disjoint) open intervals of ]0, ℓ[. So we can associate to any ν ∈ J a number t(ν) ∈ ]0, 1[, which is the reciprocal of the t ↦ ν(t) function over the interval Ji containing ν. Hence we see that
⟨v, ϕ⟩ = Σ_{i=1}^∞ ∫_{Ji} ( V(t(ν)) / ‖V(t(ν))‖_F ) ϕ(ν) dν,
⟨v, ϕ⟩ = ∫_J ( V(t(ν)) / ‖V(t(ν))‖_F ) ϕ(ν) dν,   (8.68)
= ∫_J dν = meas J,
where the right-hand sides are evaluated at t = t(ν). Hence for any t ∈ I such that ‖V(t)‖_F > 0, one has ν(t) ∈ J, and (8.70) shows that
‖a(ν(t))‖_F ≤ ‖A(t)‖_F / ‖V(t)‖²_F < +∞,
which is (8.66).
So we see that the hypothesis that P ∈ W^{2,∞}([0, 1]; F) is not enough to ensure that its reparametrization p as a function of arc length is W^{2,∞}([0, ℓ]; F): in general, the derivative v of p can have discontinuities at points ν ∉ J!
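Numerically, the reparameterization (8.63) amounts to computing ν(t) by cumulating ‖V(t)‖_F dt and inverting it; the following hedged sketch (not from the book, with an arbitrary illustrative curve P for which V never vanishes) performs this and checks that the resulting velocity v = p′ has unit norm up to discretization error, in agreement with (8.65).

import numpy as np

# Hedged sketch: reparameterization of a curve P(t) by its arc length, as in (8.63).
# The curve below is an arbitrary illustrative choice with V(t) != 0 on [0, 1].
t = np.linspace(0.0, 1.0, 5001)
P = np.stack([t, np.sin(2.0 * t), t ** 2], axis=1)           # a curve in F = R^3

seg = np.linalg.norm(np.diff(P, axis=0), axis=1)
nu_of_t = np.concatenate([[0.0], np.cumsum(seg)])            # nu(t): arc length up to t
ell = nu_of_t[-1]                                            # total length L(P)

nu = np.linspace(0.0, ell, 5001)                             # uniform grid in arc length
p = np.stack([np.interp(nu, nu_of_t, P[:, k]) for k in range(3)], axis=1)   # p(nu) = P(t(nu))

v = np.gradient(p, nu, axis=0)                               # v(nu) = p'(nu)
print("max | ||v|| - 1 | =", np.max(np.abs(np.linalg.norm(v, axis=1) - 1.0)))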
Proposition 8.2.2 Let (8.59) and (8.61) hold, and x0 , x1 ∈ C be such that
the curve P associated by (8.60) satisfies
there exists RP > 0 such that  ‖A(t)‖_F ≤ (1/RP) ‖V(t)‖²_F   a.e. on ]0, 1[.   (8.71)
Then one of the two following properties holds:
• Either
V (t) = 0 ∀t ∈ [0, 1] ⇐⇒ L(P ) = 0, (8.72)
where L(P ) is the length of P defined in (8.62), and the curve P is
reduced to one point of ϕ(C),
• or
V(t) ≠ 0 ∀t ∈ [0, 1] ⟹ L(P) > 0,   (8.73)
and the reparameterization p of P by its arc length, defined by (8.63),
is a path in the sense of Definition 6.1.1. So we can define the radius of curvature of the curve P by R(P) := R(p) and its deflection by Θ(P) := Θ(p), which satisfy the following:
(i) Curvature estimate:
R(P ) ≥ RP > 0. (8.74)
With g denoting the function t ↦ ‖V(t)‖_F, one has
|dg/dt (t)| ≤ ‖A(t)‖_F ≤ (1/RP) g(t)²   for a.e. t ∈ I,
so that
|d/dt (1/g)(t)| ≤ 1/RP   ∀t ∈ I.   (8.76)
Hence,
1/g(t) ≤ 1/g(t0) + |t − t0|/RP ≤ 1/g(t0) + 1/RP   ∀t ∈ I,
so that
g(t) ≥ c := ( 1/g(t0) + 1/RP )⁻¹ > 0   ∀t ∈ I.
It follows that g(β) = ‖V(β)‖_F ≥ c > 0, which is a contradiction as we have seen that V(β) = 0. Hence I = ]0, 1[, and (8.72) and (8.73) are proved.
Let now (8.73) hold, and p be the reparameterization of P by arc length. As V(t) ≠ 0 ∀t ∈ [0, 1], formula (8.66) of Proposition 8.2.1 applies for all t ∈ [0, 1] where V is derivable, that is, almost everywhere on [0, 1]. Hence ‖a(ν(t))‖_F ≤ 1/RP for a.e. t ∈ [0, 1], so that p ∈ W^{2,∞}([0, L(P)]; F), and p is a path of curvature smaller than 1/RP, which proves (8.74).
Then Proposition 8.1.2 part (iii) gives the following estimate for the deflection of p:
Θ(p) ≤ ∫₀^{L(P)} ‖a(ν)‖_F dν.   (8.77)
Changing to the variable t ∈ [0, 1] in the integral gives
Θ(p) ≤ ∫₀¹ ‖a(ν(t))‖_F ‖V(t)‖_F dt,   (8.78)
Theorem 8.2.3 Let C and ϕ be given such that (8.59) and (8.61) hold.
(i) If there exists R > 0 such that
∀x0, x1 ∈ C,   ‖A(t)‖_F ≤ (1/R) ‖V(t)‖²_F   a.e. in [0, 1],   (8.80)
then the family of curves P defined in (8.79) is, once reparameterized by
the arc length using (8.63), a family of paths of ϕ(C) in the sense of
Definition 6.1.3, and the attainable set (ϕ(C), P) has a finite curvature:
[1] Aki, K., Richards, P.G., 1980, Quantitative seismology: Theory and
methods, W.H. Freeman, New York 6
[3] Al Khoury, Ph., Chavent, G., 2006, Global line search strategies for
nonlinear least squares problems based on curvature and projected cur-
vature, Inverse Probl. Sci. Eng. 14(5), 495–509 135
[6] Anterion, F., Eymard, R., Karcher, B., 1989, Use of parameter gradients
for reservoir history matching, In SPE Symposium on Reservoir Simu-
lation, Society of Petroleum Engineers, Houston, Texas, SPE 18433 78
[7] Banks, H.T., Kunisch, K., 1989, Estimation techniques for distributed
parameter systems, Birkhäuser, Boston 29
[10] Ben-Ameur, H., Chavent, G., Jaffré, J., 2002, Refinement and coarsening
indicators for adaptive parametrization: Application to the estimation
of the hydraulic transmissivities. Inverse Probl. 18, 775–794 113, 116,
120, 123
[11] Ben-Ameur, H., Clément, F., Chavent G., Weis P., 2008, The multi-
dimensional refinements indicators algorithm for optimal parameteriza-
tion, J. Inverse Ill-Posed Probl. 16(2), 107–126 116, 122
[12] Björck, A., 1990, Least squares methods, In Ciarlet, P.G., and Lions, J.L.,
eds, Handbook of Numerical Analysis, North-Holland, Amsterdam 17
[13] Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizbal, C.A., 2003,
Numerical optimization: Theoretical and practical aspects, Springer,
Universitext Series XIV, p 423 32, 127, 148
[14] Borzi, A., 2003, Multigrid methods for optimality systems, Habilitation Thesis, Institut für Mathematik, Karl-Franzens-Universität Graz, Austria 31
[15] Chardaire, C., Chavent, G., Jaffré, J., Liu, J., 1990, Multiscale representa-
tion for simultaneous estimation of relative permeabilities and capillary
pressure, Paper SPE 20501, In Proceedings of the 65th SPE Annual
Technical Conference and Exhibition, New Orleans, Louisiana, pp 303–
312 96
[18] Chavent, G., 1990, A new sufficient condition for the wellposedness of
nonlinear least-squares problems arising in identification and control,
In Bensoussan, A., and Lions, J.L., eds, Lecture Notes in Control and
Information Sciences 144, Springer, Berlin, pp 452–463 211, 232
[19] Chavent, G., 1991, New size×curvature conditions for strict quasi-
convexity of sets, SIAM J. Contr. Optim. 29(6), 1348–1372 12, 273
[21] Chavent, G., 2002, Adapted regularization for the estimation of the dif-
fusion coefficient in an elliptic equation, In Proceedings of Picof 02,
Carthage, Tunisie 259
[22] Chavent, G., 2004, Curvature steps and geodesic moves for nonlin-
ear least squares descent algorithms, Inverse Probl. Sci. Eng. 12(2),
173–191 135
[23] Chavent, G., Bissel, R., 1998, Indicator for the refinement of
parametrization. In Tanaka, M., and Dulikravich, G.S., eds, Inverse
Problems in Engineering Mechanics, Elsevier, Amsterdam, pp 309–314
113, 116, 120
[25] Chavent, G., Kunisch, K., 1993, A geometric theory for the inverse prob-
lem in a one-dimensional elliptic equation from an H 1 -observation, Appl.
Math. Optim. 27, 231–260 192, 201
[26] Chavent, G., Kunisch, K., 1993, Regularization in state space, M2AN
27, 535–564 18, 247, 258, 259
[28] Chavent, G., Kunisch, K., 1996, On weakly nonlinear inverse problems,
SIAM J. Appl. Math. 56(2), 542–572 16, 166, 211, 237, 273
[29] Chavent, G., Kunisch, K., 1998, State space regularization: Geometric
theory, Appl. Math. Opt. 37, 243–267 18, 247
[30] Chavent, G., Kunisch, K., 2002, The output least square identifiability
of the diffusion coefficient from an H 1 observation in a 2-D elliptic equa-
tion, ESAIM: Contr. Optim. Calculus Variations 8, 423 97, 200, 202,
259, 261, 263
[31] Chavent, G., Lemonnier, P., 1974, Identification de la non linéarité d’une
équation parabolique quasilinéaire, J. Appl. Math. Optim. 1(2), 121–162
237
[32] Chavent, G., Jaffré, J., Jégou, S., Liu, J., 1997, A symbolic code gener-
ator for parameter estimation. In Berz, M., Bischof, C., Corliss, G., and
Griewank, A., eds, Computational Differentiation, SIAM, 129–136 38
[33] Chavent, G., Jaffré, J., Jan-Jégou, S., 1999, Estimation of relative per-
meabilities in three-phase flow in porous media, Inverse Probl. 15, 33–39
116
[34] Chicone, C., Gerlach, J., 1987, A note on the identifiability of distributed
parameters in elliptic systems, SIAM J. Math. Anal. 18(5), 1378–1384
185
[35] Cominelli, A., Ferdinandi, F., De Montleau, P., Rossi, R., 2005, Using
gradients to refine parameterization in field-case history match projects,
In Proceedings of 2005 SPE Reservoir Simulation Symposium, paper
SPE 93599, Houston, Texas, January 31st–February 2nd 116
[36] Delprat-Jannaud, F., Lailly, P., 1992, What information on the earth
model do reflection travel times provide?, J. Geophys. Res. 97(B13),
19827–19844 92
[37] Engl, H.W., Kunisch, K., Neubauer, A., 1989, Convergence rates for
Tikhonov regularization of nonlinear ill-posed problems, Inverse Probl.
5, 523–540 18, 211
[38] Engl, H.W., Hanke, M., Neubauer, A., 1996, Regularization of inverse
problems, Kluwer, Dordrecht, p 321, (Mathematics and its applications,
375) ISBN 0-7923-4157-0 17, 117
[39] Girault, V., Raviart, P.A., 1979, Finite element methods for Navier-
Stokes equations, Springer, Berlin 263
[42] Grimstad, A.A., Mannseth T., Nævdal G., Urkedal H., 2003, Adaptive
multiscale permeability estimation, Comput. Geosci. 7, 1–25 111
[44] Hayek, M., Lehmann, F., Ackerer, Ph., 2007, Adaptive multiscale
parameterization for one-dimensional flow in unsaturated porous media,
Adv. Water Resour. (to appear) 116, 270, 343
[46] Isakov, V., 1998, Inverse problems for partial differential equations,
Springer, Berlin, p 284 (Applied mathematical sciences, 127) ISBN
0-387-98256-6 11, 185, 191
[47] Ito, K., Kunisch, K., 1994, On the injectivity and linearization of the
coefficient to solution mapping for elliptic boundary value problems,
J. Math. Anal. Appl. 188(3), 1040–1066 11, 185, 191
[48] Jaffard, S., Meyer, Y., Ryan, R.D., 2001, Wavelets (Tools for science
and technology), Society for Industrial and Applied Mathematics, p 256,
ISBN 0-89871-448-6 104
[50] Lavaud, B., Kabir, N., Chavent, G., 1999, Pushing AVO inversion be-
yond linearized approximation, J. Seismic Explor. 8, 279–302 6, 88, 92
[52] Le Dimet, F.-X., Shutyaev, V., 2001, On Newton method in data assim-
ilation, Russ. J. Numer. Anal. Math. Model. 15(5), 419–434 31
[53] Le Dimet, F.-X., Navon I.M., Daescu, D.N., 2002, Second order infor-
mation in data assimilation, Mon. Weather Rev. 130(3), 629–648 31
[54] Levenberg, K., 1944, A method for the solution of certain nonlinear
problems in least squares, Appl. Math. 11, 164–168 17, 209
[55] Lines, L.R., Treitel, S., Tutorial: A review of least-squares inversion and
its application to geophysical problems, Geophys. Prospect. 39, 159–181
17, 270, 343
[56] Lions, J.L., 1969, Quelques Méthodes de Résolution des Problèmes aux
limites Non Linéaires, Dunod, Paris 25, 238
[57] Liu, J., 1993, A multiresolution method for distributed parameter esti-
mation, SIAM J. Sci. Comput. 14, 389 96, 97, 104, 207
[58] Liu, J., 1994, A sensitivity analysis for least-squares ill-posed problems
using the haar basis, SIAM J. Numer. Anal. 31, 1486 96, 97, 104
[59] Louis, A.K., 1989, Inverse und Schlecht Gestellte Probleme, Teubner,
Stuttgart 17, 210
[61] Marchand, E., Clément, F., Roberts, J.E., Pépin, G., 2008, Deterministic sensitivity analysis for a model for flow in porous media, Adv. Water Resour. 31, 1025–1037, http://dx.doi.org/10.1016/j.advwatres.2008.04.004 38
[63] Morozov, V.A., 1984, Methods for solving incorrectly posed problems,
Springer, New York 17, 210
[64] Næval, T., Mannseth, T., Brusdal, K., Nordtvedt, J.E., 2000, Multiscale
estimation with spline wavelets, with application to two-phase porous
media flow, Inverse Probl. 16, 315–332 96
[65] Neubauer, A., 1987, Finite dimensional approximation of constrained
Tikhonov-regularized solutions of ill-posed linear operator equations,
Math. Comput. 48, 565–583 210, 215
[66] Neubauer, A., 1988, Tikhonov regularization of ill-posed linear operator
equations on closed convex sets, J. Approx. Theor. 53, 304–320 210
[67] Neubauer, A., 1989, Tikhonov regularization for nonlinear ill-posed
problems: Optimal convergence rate and finite dimensional approxima-
tion, Inverse Probl. 5, 541–558 18, 211, 215
[68] Nocedal, J., Wright, S.J., 1999, Numerical optimization, Springer Series
in Operation Research, New York 32, 127
[69] Økland Lien, M., 2005, Adaptive methods for permeability estimation
and smart well management, Dr. Scient. Thesis in Applied Mathematics,
Department of Mathematics, University of Bergen, Norway 96, 116
[70] Richter, G.R., 1981, An inverse problem for the steady state diffusion
equation, SIAM J. Math. 4, 210–221 11
[71] Sanchez Palencia, E., 1983, Homogenization method for the study of
composite media, Lecture Notes in Mathematics 985, 192–214, Springer,
Berlin 200
[72] Schaaf, T., Mezghani, M., Chavent, G., 2002, Direct conditioning of fine-
scale facies models to dynamic data by combining gradual deformation
and numerical upscaling techniques, In Proceedings of the 8th European
Conference on Mathematics of Oil Recovery (ECMOR VIII), Sept 3–6,
Freiberg, Germany
[73] Schaaf, T., Mezghani, M., Chavent, G., 2003, In Search of an opti-
mal parameterization: An innovative approach to reservoir data inte-
gration, paper SPE 84273, In Proceedings of the SPE Annual Technical
Conference and Exhibition, Denver
[74] Sen, A., Srivastava, M., 1990, Regression analysis: Theory, methods and
applications, Springer, Berlin 85
[76] Troianiello, G.M., 1987, Elliptic differential equations and obstacle prob-
lems, Plenum Press, New York 202
[77] van Laarhoven, P., Aarts, E., 1987, Simulated annealing, theory and
practice, Kluwer, Dordrecht 31
[78] Vogel, C., 2002, Computational methods for inverse problems, Frontiers
in Applied Mathematics series 23, SIAM 135, 270, 343
[79] Zhang, J., Dupuy, A., Bissel, R., 1996, Use of optimal control technique
for history matching, 2nd International Conference on Inverse Problems
in Engineering: Theory and Practice, June 9–14, Le Croisic, France,
Engineering Foundation ed. 78, 135
[80] Zhang, J., Jaffré, J., Chavent, G., 2009, Estimating nonlinearities in
multiphase flow in porous media, INRIA Report 6892 88, 185