PDE Discretization
J. Liesen · Z. Strakoš
The work of J. Liesen was supported by the Heisenberg Program of the Deutsche Forschungs-
gemeinschaft.
The work of Z. Strakoš was supported by the research project MSM0021620839 and partially
also by the GAČR grant 201/09/0917.
Institute of Mathematics, Technical University of Berlin, Straße des 17. Juni 136, 10623
Berlin, Germany, E-mail: [email protected]
Charles University in Prague, Faculty of Mathematics and Physics, Sokolovská 83, 18675
Prague, Czech Republic, E-mail: [email protected]
To introduce the setting and notation of this paper we very briefly describe the solution
process of a partial differential equation (PDE) boundary value problem, arising from
mathematical modeling, by the finite element method (FEM). Further details can be
found in any book on the numerical solution of PDEs; see, e.g., [8, 14, 15, 19].
In the first step of the solution process the given PDE or system of PDEs Lu = f
(plus appropriate boundary conditions) is transformed into its variational formulation:
find u ∈ V such that

a(u, v) = (f, v)  for all v ∈ V,  (1)

where V is an appropriate function space. In the second step the problem is discretized
by restricting (1) to a finite-dimensional subspace V_h = span{φ_1, …, φ_N} ⊂ V:
find u_h ∈ V_h such that

a(u_h, v_h) = (f, v_h)  for all v_h ∈ V_h.  (2)

Expressing u_h in the basis of V_h turns (2) into the system of linear algebraic equations

Ax = b,  A = [a(φ_j, φ_i)],  b = [(f, φ_i)],  (3)
in the sense that the solution vector x = [ξ_1, …, ξ_N]^T of (3) contains the coefficients
of the solution u_h of (2) with respect to the basis φ_1, …, φ_N, i.e.,

u_h = ∑_{j=1}^{N} ξ_j φ_j.  (4)
The solution process thus involves three distinct levels:

– the original mathematical model or its variational formulation (1), which typically
also requires understanding its origin;
– the discretized problem (2), where the approximate solution is restricted to some
finite-dimensional function subspace;
– the algebraic problem (3) that determines the coefficients for the approximate
solution with respect to the given basis of the finite-dimensional function subspace.
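To make the three levels (1)–(3) concrete, the following is a minimal one-dimensional sketch: the Poisson problem −u″ = f on (0, 1) with zero boundary values, discretized by linear "hat" basis functions on a uniform mesh. The function name and interface are ours, chosen for illustration; the model problem discussed in this paper is two-dimensional.

```python
import numpy as np

def fem_1d_poisson(f, N):
    """Assemble and solve the Galerkin system (3) for -u'' = f on (0, 1),
    u(0) = u(1) = 0, with N inner nodes and linear hat basis functions."""
    h = 1.0 / (N + 1)
    nodes = h * np.arange(1, N + 1)
    # stiffness matrix A = [a(phi_j, phi_i)] = (1/h) * tridiag(-1, 2, -1)
    A = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    # load vector b_i = integral of f * phi_i, approximated by h * f(x_i)
    # (this quadrature is exact for constant f)
    b = h * f(nodes)
    xi = np.linalg.solve(A, b)   # coefficient vector of (3)
    return nodes, xi             # u_h(x_j) = xi_j at the nodes, as in (4)

nodes, xi = fem_1d_poisson(lambda t: np.ones_like(t), 200)
```

For f ≡ 1 the exact solution is u(x) = x(1 − x)/2, and one-dimensional linear FEM with this load vector reproduces it exactly at the nodes, so the computed coefficients can be checked directly.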
1 Here we do not give any specifics of the choice of the appropriate function space and of the
concept of solution related to the choice of this space. While in the model problem presented
below the situation is simple and the energy norm is appropriate, in many practical cases
the choice of an appropriate function space represents a substantial difficulty. As an example,
the state-of-the-art theory of nonlinear partial differential equations focuses on solutions that
are local in time and exist under the assumption of sufficient smoothness; this does not
give strong guidance for the (physically) meaningful evaluation of the error. In order to obtain
such guidance, one must take into account the underlying (physical) principles. Models in
continuum thermodynamics as well as thermodynamics of multi-component materials may
serve as examples. Here the natural function spaces are determined in relation to the properties
of the entropy and the rate of the entropy production; see [16, 17, 24].
If we leave aside, for simplicity, the errors due to modeling and possible uncertainty in
the data, we are confronted in this solution process with three different types of errors:

– the discretization error u − u_h, where u solves (1) and u_h solves (2) (for simplicity
we assume that these solutions exist);
– the algebraic error x − x_n, where x solves (3) and x_n is a computed approximation
to x;
– the total error u − u_h^{(n)}, where u solves (1) and u_h^{(n)} = ∑_{j=1}^{N} ξ_j^{(n)} φ_j is determined
by the coefficient vector x_n = [ξ_1^{(n)}, …, ξ_N^{(n)}]^T.
These errors are related by the simple, yet fundamental equation

u − u_h^{(n)} = (u − u_h) + (u_h − u_h^{(n)}),
which means that the total error is the sum of the discretization error and the algebraic
error (after being transferred from the coordinate space RN to the function space Vh ).
A main point of the FEM is that in order to simplify the mathematical issues re-
lated to estimation of the discretization error and in order to obtain a sparse matrix
A, each basis function φ_j is nonzero only on a small subset of the domain Ω. This
fact is computationally crucial, because in mathematical modeling of real world phe-
nomena typically the matrices are very large. Thus, the FEM in general gives up the
global approximation property of individual basis functions; each FEM basis function
approximates the solution only locally. The global approximation is restored by solving
the linear algebraic system (3) and by forming the linear combination (4). (One can
also point out the requirement for investigating the approximation error in the regions
of interest, with convincing arguments presented, e.g., by Babuška and Strouboulis in
[8, p. 417 and Chapter 6] and by Bangerth and Rannacher in [9, Chapter 1].)
If one assumes that the linear algebraic system (3) is solved exactly, then the total
error reduces to the discretization error. In numerous publications on the numerical
analysis of partial differential equations, the exact solution x is indeed assumed to
be available. Our major point is that this assumption does not reflect the reality of
numerical computations. Moreover, aiming at the smallest possible algebraic error is
in conflict with the requirement of computational efficiency of numerical PDE solvers.
In practice only an approximation xn to the exact algebraic solution x is available.
The local character of the FEM basis functions on the one hand, and the global
character of the linear algebraic problem resulting from the discretization on the other
have the following fundamental consequence: The algebraic error x − x_n can have
strongly varying individual entries, which potentially lead to a large variation in the
sizes of the local components of the total error u − u_h^{(n)} on the individual elements,
irrespective of the local value of the solution u or the local value of the discretization
error u − u_h. These facts will be illustrated numerically in Section 3 below. In practice
they always should be taken into consideration when evaluating the total error, unless
they can be (rigorously) shown to be insignificant for the given problem.
The goal of the whole computation is to obtain an acceptable approximation to
the solution of the original problem. Here the acceptability refers to the mathematical
modeling level, which uses the given PDE (or system of PDEs) as a tool, and the error
is measured in the proper function space. For an instructive account of the related
issues (without considering the algebraic error) we refer to [7, 28]. Here we argue that
the algebraic part of the error must also be taken into account, which, in general,
brings into the numerical PDE error analysis a fundamental challenge that is very
difficult to address. The concept of the backward error interprets the computed
approximation x_n as the exact solution of a perturbed problem

(A + ΔA) x_n = b + Δb,  (5)

and answers the question of how close the perturbed problem (5), which is solved exactly
by x_n, is to the original problem Ax = b, which is solved approximately by x_n. As
shown by Rigal and Gaches [32] (also see [22, Theorem 7.1]), the normwise (relative)
backward error of x_n, defined by

β(x_n) = min{ ε : (A + ΔA) x_n = b + Δb, ‖ΔA‖ ≤ ε ‖A‖, ‖Δb‖ ≤ ε ‖b‖ },  (6)

satisfies

β(x_n) = ‖b − A x_n‖ / (‖b‖ + ‖A‖ ‖x_n‖) = ‖ΔA_min‖ / ‖A‖ = ‖Δb_min‖ / ‖b‖.  (7)
In other words, β(x_n) is equal to the norm of the smallest relative perturbations in A
and b such that x_n exactly solves the perturbed system. Here ‖·‖ denotes any vector and
the corresponding induced matrix norm. The componentwise variant can be found in
[29]; see also [22, Chapter 7].
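As a concrete illustration, both quantities can be computed in a few lines. The following sketch (the function names are ours) uses the 2-norm for the normwise backward error (7) and the convention 0/0 := 0 for the componentwise variant.

```python
import numpy as np

def normwise_backward_error(A, b, x):
    """Normwise relative backward error of x for Ax = b, cf. (7), in the 2-norm."""
    r = b - A @ x
    return np.linalg.norm(r) / (np.linalg.norm(b) + np.linalg.norm(A, 2) * np.linalg.norm(x))

def componentwise_backward_error(A, b, x):
    """Componentwise backward error max_i (|b - Ax|)_i / (|A||x| + |b|)_i,
    with the convention 0/0 := 0 (cf. [22, Theorem 7.3])."""
    r = np.abs(b - A @ x)
    d = np.abs(A) @ np.abs(x) + np.abs(b)
    q = np.zeros_like(r)
    nz = d > 0
    q[nz] = r[nz] / d[nz]
    q[~nz & (r > 0)] = np.inf   # inconsistent component: no finite relative perturbation suffices
    return q.max()
```

For the exact solution both quantities are zero; any perturbation of the computed vector makes them positive.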
Although the concept of backward error arose from investigations of numerical
instabilities (see, e.g., [30], [22, Chapter 7] that describe the role of Goldstine, von
Neumann, Turing and the epochal contribution of Wilkinson), it can be used irrespec-
tively of the source of the error (truncation and/or roundoff). The algebraic backward
error ingeniously separates the properties of the method (and even of the particular
individual computation) from the conditioning of the problem. Their combination
allows one to estimate the size of the algebraic error x − x_n measured in an appropriate
norm; see the essays [38, 6], [10, Section 3.2] and the monograph [22]. Arioli, Noulard
and Russo [5] introduced functional backward errors, extending the concept to function
spaces; see also [1, 3] and [26, Section 4.3].
Fig. 1 Left: MATLAB plot of the exact solution u of the Poisson model problem (8)–(9).
Right: MATLAB plot of the discretization error u − u_h (the vertical axis is scaled by 10^{−3}).
It should be emphasized that the plots show piecewise linear approximations of the actual
functions, which is, as explained in Remark 1, misleading for the discretization error.
At first sight the incorporation of the algebraic backward error concept into the
estimates of the total error measured in the function space seems to be just a technical
exercise. Due to the error of the model, the discretization error and the uncertainties
in the data, the system Ax = b represents a whole class of admissible systems. Each
system in this class corresponds (possibly in a stochastic sense) to the original real-world
problem. One can therefore argue that as long as the algebraic backward error
β(x_n) in (6)–(7) is small enough, the computed algebraic solution x_n is, with respect
to the subject of the mathematical modeling, as good as the solution x of Ax = b.
The meaning of "small enough" is sometimes intuitively interpreted as, say, an order of
magnitude below the size of the discretization error (all measured in norms which
physically correspond to each other). It is worth pointing out that the balance between
the discretization and the algebraic errors is typically evaluated globally (in norms).
The practical situation is, however, much more subtle. In particular, in order to
perform the computations efficiently, we need tight a posteriori estimates of the local
distribution of the total error which incorporate the algebraic error; a more detailed
argument and an example can be found in [23]. Whether and to what extent the
algebraic backward error can serve this purpose is yet to be determined. The experimental
results presented in the following section indicate the nontrivial problems which need
to be resolved.
3 Experimental results
We discretize the variational problem using the (conforming) Galerkin finite element
method (FEM) with linear basis functions on a regular triangular grid with the mesh
size h = 1/(m + 1), where m is the number of inner nodes in each direction. The
basis function φ_j, j = 1, 2, …, m², corresponding to the jth inner node has its support
composed of six triangular elements with the node j as the central point.
It is well known that the nodes can be ordered so that the discrete Laplacian matrix A
has the form

A = [a(φ_j, φ_i)] = tridiag(−I, T, −I) ∈ R^{m²×m²},  T = tridiag(−1, 4, −1) ∈ R^{m×m};

see, e.g., [15, Section 15.1]. The matrix A is symmetric and positive definite, with its
extreme eigenvalues given by

λ_min(A) = 8 sin²(πh/2),  λ_max(A) = 8 sin²(mπh/2);
see, e.g., [19, Chapter 4]. We assemble the right-hand side b using a two-dimensional
Gaussian quadrature formula that is exact for polynomials of degree at most three.
In our numerical experiment we use m = 50, and thus A is of size 2500 × 2500.
Similar numerical results can be obtained for any other choice of m. All computations
have been performed using MATLAB. The extreme eigenvalues of A and the resulting
condition number (with respect to the matrix 2-norm) are

λ_min(A) = 7.5867 × 10^{−3},  λ_max(A) = 7.9924,  κ(A) = 1.0535 × 10³.
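The matrix and the eigenvalue formulas are easy to reproduce. The following sketch (ours, in dense NumPy for simplicity; in practice one would use sparse storage) assembles A via Kronecker products and checks the smallest eigenvalue against its known eigenvector, sin(πx) sin(πy) sampled at the inner nodes.

```python
import numpy as np

# Assemble the discrete Laplacian tridiag(-I, T, -I) with T = tridiag(-1, 4, -1)
m = 50
h = 1.0 / (m + 1)
T = 4 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
S = -(np.eye(m, k=1) + np.eye(m, k=-1))            # coupling between rows of nodes
A = np.kron(np.eye(m), T) + np.kron(S, np.eye(m))  # size m^2 x m^2 = 2500 x 2500

# Closed-form extreme eigenvalues
lam_min = 8 * np.sin(np.pi * h / 2) ** 2           # ~ 7.5867e-3
lam_max = 8 * np.sin(m * np.pi * h / 2) ** 2       # ~ 7.9924

# Eigenvector belonging to lam_min: sin(pi x) sin(pi y) at the inner nodes
s = np.sin(np.pi * h * np.arange(1, m + 1))
v = np.outer(s, s).ravel()
```

The ratio lam_max / lam_min reproduces the condition number κ(A) ≈ 1.0535 × 10³ quoted above.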
The shape of the discretization error in the MATLAB plots seems very similar to the
shape of the solution; see Fig. 1. As explained in the following remark, the discretization
error is, however, much less smooth than shown in the right part of Fig. 1.
Remark 1 All figures shown in this paper have been generated by the MATLAB
trisurf command, which generates a triangular surface plot. The inputs of trisurf
are the coordinates of the nodes in the given triangular mesh and the respective values
of the plotted function at these nodes. In the plot the function values in the triangle
interiors are interpolated linearly from the values at the nodes, and hence the figures
do not show the actual function values inside the triangles. For the solution u the
difference is not significant. In the case of the discretization error u − u_h (see the right part
of Fig. 1), the plot is, however, misleading. The discretization error is not as smooth
as suggested by the plot, but contains "bubbles" inside the triangles, which can be
(depending on the size of the error) significant. The same holds for the total errors
shown in Figs. 2–6.
Now we apply the conjugate gradient method (CG) of Hestenes and Stiefel [21] to
the linear algebraic system Ax = b. We use x_0 = 0 and stop the iteration when the
normwise backward error drops below the level ν h^γ, i.e., when

‖b − A x_n‖ / (‖b‖ + ‖A‖ ‖x_n‖) < ν h^γ,  (12)

where ν > 0, γ > 0 are parameters and ‖·‖ denotes the 2-norm. If the size of
the backward error is small enough, then the algebraic approximate solution x_n exactly
solves an algebraic problem that is very close to Ax = b. Then one might expect that
the algebraic error does not have a noticeable impact on the total error (here we use
the normwise backward error; the componentwise variant, which is also reported in the
table below, would not lead to any significant change).
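The stopping rule can be sketched as follows. This is a plain textbook CG iteration of Hestenes and Stiefel, our sketch rather than the code used for the experiments, with the normwise backward error tested after every step; the level on the right-hand side of (12) is passed in precomputed as `tol`.

```python
import numpy as np

def cg_backward_stop(A, b, tol, normA, maxit=10000):
    """Plain CG for SPD A, started from x_0 = 0, stopped when the normwise
    backward error ||b - A x_n|| / (||b|| + ||A|| ||x_n||) drops below tol."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rho = r @ r
    nb = np.linalg.norm(b)
    for n in range(1, maxit + 1):
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        # normwise backward error of the current iterate, cf. (12)
        if np.linalg.norm(r) / (nb + normA * np.linalg.norm(x)) < tol:
            return x, n
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x, maxit

# quick 1-D analogue of the model problem: tridiag(-1, 2, -1), f = 1
m = 100
h = 1.0 / (m + 1)
A = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
b = h**2 * np.ones(m)
x_n, n_iters = cg_backward_stop(A, b, tol=h**3, normA=np.linalg.norm(A, 2))
```

At termination the iterate exactly solves a system whose relative perturbation is below the requested level, which can be checked by recomputing the backward error of the returned iterate.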
In order to examine this reasoning and, in particular, in order to examine quantitatively
an intuitive understanding of the term "small enough" in relation to the size of
the discretization error, we have used

γ = 3,

which may seem sufficient, with the choice ν = 1, to keep the algebraic error
insignificant in comparison to the discretization error. The other values of ν used in the
experiment are given in Table 1. With m = 50, the values γ = 3 with ν = 50 closely
resemble the situation γ = 2 with ν = 1, which corresponds to the size of the inaccuracies
in determining A and b being proportional to h² = 51^{−2}. With decreasing ν, the
algebraic error measured in the algebraic energy norm quickly drops very significantly
below the discretization error (11).
The componentwise backward error given in Table 1 is computed by the formula

ω(x_n) = max_i (|b − A x_n|)_i / (|A| |x_n| + |b|)_i,

where (y)_i denotes the ith entry of the vector y and |·| means that we take the
corresponding matrix or vector with the absolute values of its entries; see [22, Theorem 7.3].
The discretization error (11) and the values in the second and third column of Ta-
ble 1 satisfy (up to a small inaccuracy proportional to machine precision) the Galerkin
orthogonality relation

‖∇(u − u_h^{(n)})‖² = ‖∇(u − u_h)‖² + ‖∇(u_h − u_h^{(n)})‖² = ‖∇(u − u_h)‖² + ‖x − x_n‖²_A;
Fig. 2 ν = 50.0: algebraic error u_h − u_h^{(n)} (left) and total error u − u_h^{(n)} (right); the vertical
axes are scaled by 10^{−3}. While the algebraic error is piecewise linear, the total error is not
(the MATLAB plot does not show the small bubbles over individual elements; see Remark 1).
Fig. 3 ν = 1.0: algebraic error u_h − u_h^{(n)} (left) and total error u − u_h^{(n)} (right). The vertical
axes are scaled by 10^{−4}; see also Remark 1.
Fig. 4 ν = 0.5: algebraic error u_h − u_h^{(n)} (left) and total error u − u_h^{(n)} (right). The vertical
axes are scaled by 10^{−5} (left) and by 10^{−4} (right); see also Remark 1.
Fig. 5 ν = 0.1: algebraic error u_h − u_h^{(n)} (left) and total error u − u_h^{(n)} (right). The vertical
axes are scaled by 10^{−5} (left) and by 10^{−4} (right); see also Remark 1.
Fig. 6 ν = 0.02: algebraic error u_h − u_h^{(n)} (left) and total error u − u_h^{(n)} (right). The vertical
axes are scaled by 10^{−6} (left) and by 10^{−4} (right); see also Remark 1.
see [14, Theorem 1.3, p. 38]. Therefore, except for ν = 50, the total error measured in
the energy norm is dominated by the discretization error, with the globally measured
contribution of the algebraic error being orders of magnitude smaller.
When one considers the local distribution of the error, the whole picture dramatically
changes. Figs. 2–6 show the algebraic and total errors for our choice of parameters. For
ν = 50 the global discretization and algebraic errors measured in the energy norm are
of the same order. Both u_h and u_h^{(n)} are piecewise linear, and their gradients as well
as the gradient of the algebraic error in the function space, ∇(u_h − u_h^{(n)}), are piecewise
constant. In contrast, the gradient of the solution u and therefore also the
gradient of the discretization error ∇(u − u_h) are not piecewise constant. Since we use,
for simplicity, zero Dirichlet boundary conditions, an explicit formula is even available;
see, e.g., [14, Section 1.5, relation (1.61) and Problem 1.11]. This suggests that the
local distribution of the discretization and the algebraic errors can be very different,
which is indeed demonstrated by our experiment. Despite the comparable size of the
values

‖∇(u − u_h)‖²  and  ‖∇(u_h − u_h^{(n)})‖² = ‖x − x_n‖²_A

for γ = 3 and ν = 50, the shape of the total error is fully determined by its algebraic
part. With decreasing ν the algebraic error gets smaller and eventually becomes
insignificant. Still, it seems counterintuitive that this happens only after ‖x − x_n‖²_A
drops seven orders of magnitude below the squared energy norm of the discretization
error ‖∇(u − u_h)‖².
It also seems surprising that the algebraic error exhibits such a strongly oscillating
pattern. This can be explained in the following way. It is well known that the CG
method tends to approximate well the largest and smallest eigenvalues of the system
matrix. Assuming exact arithmetic, a close approximation of an eigenvalue means that
the corresponding spectral component of the error is diminished, and the method
continues in the subsequent iterations as if the given component were not present (leading
to superlinear convergence of CG); see, e.g., [35] and [27, Theorem 3.3]. In finite pre-
cision arithmetic this issue is, in general, more complicated, because due to rounding
errors (large) outlying eigenvalues are approximated by computed multiple copies and
the convergence of the CG method is delayed; for a survey see [27, Sections 4 and 5].
For the discretized Laplace operator and the relatively small number of iterations this
finite precision arithmetic phenomenon is, however, not significant, and the largest and
the smallest eigenvalues are approximated at a similar rate.
The approximation of the largest and the smallest eigenvalues means that in the
observed range of iterations in our experiment the smooth and the high frequency parts
of the error are gradually suppressed by the CG method, while the middle frequency
components prevail. Because of the very smooth solution (10), the effect of eliminating
the smooth components is dominating, and the algebraic error exhibits an increasingly
oscillating pattern as the iteration step n grows.
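This behavior can be observed directly on a small one-dimensional analogue (our illustration, not the experiment of this paper): expand the CG error x − x_n in the eigenvector basis of the discrete Laplacian and record which frequency components survive. The sketch also records the A-norm of the error, which CG is guaranteed to decrease monotonically.

```python
import numpy as np

# 1-D discrete Laplacian; eigh orders eigenpairs from low to high frequency
m = 100
A = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
lam, V = np.linalg.eigh(A)
b = V @ np.ones(m)                 # right-hand side exciting all frequencies equally
x_star = np.linalg.solve(A, b)

x = np.zeros(m); r = b.copy(); p = r.copy(); rho = r @ r
energy, spectral = [], []
for n in range(30):                # 30 CG steps
    Ap = A @ p
    alpha = rho / (p @ Ap)
    x = x + alpha * p
    r = r - alpha * Ap
    rho_new = r @ r
    p = r + (rho_new / rho) * p
    rho = rho_new
    e = x_star - x
    energy.append(np.sqrt(e @ (A @ e)))   # ||x - x_n||_A, monotonically decreasing
    spectral.append(np.abs(V.T @ e))      # magnitude of each frequency component
```

Plotting `spectral[n]` over the iterations shows how the frequency content of the algebraic error changes as n grows; the assertion below checks only the guaranteed monotone decrease of the A-norm of the error.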
One may suggest applying a postprocessing smoothing by performing a few additional
steps of the Jacobi, Gauss–Seidel, or SOR iteration. In our experiment, such
postprocessing smoothing is not efficient. While it smooths out some high frequencies
(which are here not significant), it does not change the moderate frequencies which
determine the oscillating pattern of the algebraic error.
4 Conclusions
Let us first stress that we do not advocate solving the Poisson problem on regular
domains by the CG method; here CG has no chance to compete with specialized
fast Poisson solvers. We are also well aware that a proper preconditioner
can suppress the reported oscillations of the algebraic error, and that a multigrid solver
can in our example naturally balance the local error on individual elements. These facts
do not diminish the message, which is purposely kept as simple as possible. Our goal is
to show on a simple model problem some important phenomena which should be taken
into account when solving large scale mathematical modeling problems in general,
where the easy remedies mentioned above might not be applicable. Summarizing, our
main message is twofold:
From both the numerical PDE and the numerical linear algebra sides it should be
admitted that matrix computations cannot be considered a separate (black box) part
of the numerical PDE solution process. Apart from relatively simple cases, black box
approaches may not work. Even worse, they are philosophically wrong. Even if direct
algebraic solvers are applicable, the resulting algebraic error might not be small and
it should be considered (or the opposite should be rigorously justified). The stopping
criteria in iterative algebraic solvers should be linked, in an optimal case, with fully
computable and locally efficient (on individual elements) a posteriori error bounds
that allow one to keep an appropriate balance between the discretization and the algebraic
parts of the error; see, e.g., the discussion in the book by Bangerth and Rannacher [9],
in the recent papers [31, 25, 4, 2, 12, 13, 11, 23, 34, 20], in the habilitation thesis [37], in
the Ph.D. thesis [26], and the references given there. Although this goal seems highly
ambitious and is very difficult to achieve, the near future will certainly bring
new exciting results in this direction.
Acknowledgments. We thank André Gaul and Petr Tichý for their help with the
numerical experiments, and Howard Elman, Tomáš Vejchodský and Martin Vohralík
for pointing out several inaccuracies in the original manuscript.
References
1. M. Arioli, A stopping criterion for the conjugate gradient algorithms in a finite element
method framework, Numer. Math., 97 (2004), pp. 1–24.
2. M. Arioli, E. H. Georgoulis, and D. Loghin, Convergence of inexact adaptive finite
element solvers for elliptic problems, Technical Report RAL-TR-2009-021, STFC RAL
(2009).