
Università degli Studi di Trento

Università degli Studi di Brescia


Università degli Studi di Padova
Università degli Studi di Trieste
Università degli Studi di Udine
Università degli Studi IUAV di Venezia

Arturo di Gioia

Fast Multipole accelerated Boundary Element techniques
for large-scale problems,
with applications to MEMS.

prof. Giorgio Novati


prof. Attilio Frangi

2005
UNIVERSITÀ DEGLI STUDI DI TRENTO
Doctoral programme in Modelling, Preservation and Control of Materials and Structures
XVII Cycle

Doctoral programme coordinator:
prof. Oreste Bursi

Final examination: 4 February 2005

Examination committee:
prof. Luigi Mongiovì, Università degli Studi di Trento
prof. Ján Sládek, Slovak Academy of Sciences
prof. Aurelio Muttoni, École Polytechnique Fédérale de Lausanne
Summary

This thesis deals with several implementations of Fast Multipole accelerated Boundary Element solvers for different problems, ranging from the electrostatic problem (Laplace equation) to evolutive problems (transient heat diffusion and the scalar wave equation) and to creeping, non-turbulent flow (Stokes equations).
Emphasis is placed on the advantages that suitably accelerated boundary integral implementations offer over the more classical domain-based methods for the large-scale external problems, with a high number of degrees of freedom, that typically arise in the analysis of Micro-Electro-Mechanical Systems (MEMS).
With reference to the Stokes flow problem, a thorough analysis is carried out of the ill-conditioning issues encountered when the formulations available in the literature are applied to three-dimensional problems with complex geometries. Finally, a new technique is described, based on an integral equation named the Mixed Velocity-Traction equation, developed by Frangi and Tausch [11]. This formulation does not suffer from the ill-conditioning issues of the classical formulation, and therefore allows complete MEMS devices to be analyzed in reasonable times on ordinary PCs.
... to Emanuela.
Acknowledgments

First of all, I would like to thank my advisors, for their helpful and patient
guidance. I would also like to thank the group of Solid and Structural Mechanics,
prof. Davide Bigoni, prof. Antonio Cazzani, prof. Marco Rovati, dr. Roberta
Springhetti, dr. Massimiliano Gei, together with prof. Walter Drugan and
with my colleagues and friends Michele Brun, Massimiliano Margonari, Andrea
Piccolroaz, Katia Bertoldi, Giulia Franceschini, Daniele Veber, Sergia Colli,
Lorenzo Magnarelli and Paola Bonetti, for the friendly and at the same time
stimulating environment they provided.
My thanks also go to all the people who indirectly contributed to this work by providing me with helpful tips: dr. Claudio Fontanari, dr. Julien Langou, prof. Keith Davey, prof. Marcus Grote and prof. Naoshi Nishimura.
My gratitude goes to my friends Claudio, Stefano, Rosanna, Andrea, Arianna, Alessandro, Francesca, Barbara, Francesco, Alessandro "the king", Mihaela and Gigi "the king of tarots", who survived my horribly cooked Saturday dinners, and to Gerolamo, Alessandro "twenty minutes", Mauro and Gianluca, who taught me that when all you have is a snowboard, every obstacle starts resembling a jump.
Finally, I would like to thank my family for their love and support, and Emanuela, for her endless love and patience.
Contents

1 Introduction
1.1 The Boundary Element Method
1.2 The MEMS devices

2 The Laplace equation
2.1 The electrostatic problem
2.2 The boundary element method for the Laplace equation
2.3 The electrostatic problem in MEMS analysis
2.4 Discretization of the problem
2.5 Solution of the system of equations
2.6 Preconditioning of BEM matrices

3 Introduction to the Fast Multipole Method
3.1 Introduction
3.2 The kernel expansion
3.3 The oct-tree construction
3.4 The construction of cell lists
3.5 The matrix-vector multiplication
3.6 Computational efficiency of the Fast Multipole Method
3.7 Implementation details
3.8 Numerical examples
3.8.1 Spherical conductors
3.8.2 Parallel-plate condenser with moving boundary
3.8.3 Comb finger resonator

4 A FMM accelerated Boundary Element technique for scalar evolutive problems in 3D
4.1 Introduction
4.2 The boundary-domain integral formulation for the diffusion problem
4.3 Time discretization of the boundary-domain integral equation for the diffusion problem
4.4 The boundary-domain integral formulation for the scalar wave equation
4.5 Time discretization of the boundary-domain integral equation for the wave propagation problem
4.6 Spatial discretization and Fast Multipole technique
4.7 Numerical examples
4.7.1 Simple wave propagation
4.7.2 Radial wave propagation
4.7.3 Transient conduction: heat sink

5 The Boundary Element Method for the exterior Stokes flow problem
5.1 Microflows and integral formulations
5.2 The Stokes flow in MEMS analysis
5.3 The single-layer formulation
5.4 The completed double-layer formulation
5.5 Discretization of the single-layer and double-layer operators
5.6 Algebraic properties of the single-layer operator
5.7 The solution of a singular system with the GMRES method
5.7.1 Issues in exact arithmetic
5.7.2 Other numerical issues
5.8 The ill-conditioning of the single-layer operator

6 The Mixed Velocity-Traction formulation for the external Stokes flow
6.1 Introduction
6.2 The Mixed Velocity-Traction formulation
6.3 The discretization of the MVT equation
6.3.1 The Galerkin discretization
6.3.2 The Collocation discretization
6.3.3 The Qualocation discretization
6.4 The Fast Multipole Method for the MVT equation
6.4.1 Single-layer velocity operator
6.4.2 Single-layer traction operator
6.5 Numerical examples
6.5.1 Translating sphere
6.5.2 Comb finger resonator
6.5.3 Torsional accelerometer
6.5.4 Parallel plate resonator
6.6 Conclusions

A Recursive evaluation of spherical functions

B The internal mesh for the DBEM formulation
B.1 Automatic generation of an internal mesh
B.2 Analytical integrations over parallelepiped-shaped volume elements
Chapter 1

Introduction

The present work deals with some recent advancements in Boundary Element
techniques for large-scale problems, with particular reference to Fast Multipole
accelerated formulations for the analysis of Micro-Electro-Mechanical structures
(MEMS). This thesis also deals with a mixed boundary-domain formulation for
evolutive problems.

1.1 The Boundary Element Method


The theory of Boundary Integral Equations, upon which the Boundary Element
Method is based, can be dated back to the second half of the nineteenth century.
The development of numerical solvers has been made possible only with the
advent of electronic computers.
The benefits of the Boundary Element Method with respect to other numerical techniques were clear from the very beginning. The reduced dimensionality of the model, the ease of dealing with external problems and moving boundaries, and the high accuracy for relatively coarse discretizations were major features which attracted many researchers in different fields. The major drawbacks of the method soon appeared as well: the inability to deal with nonlinear problems in a straightforward manner, the more difficult implementation with respect to Finite Difference and Finite Element Methods, and its inability to compete as a general-purpose method.
However, with the growth of computational resources and the consequent growth in the complexity of numerical problems, the lack of scalability of the Boundary Element Method quickly became a major issue, especially when dealing with three-dimensional problems with complex geometries. The paradox was clear: the Boundary Element Method was relegated to a niche position in the world of numerical methods, yet in that niche it could easily outperform its competitors in terms of accuracy and ease of use. Unfortunately, the exploitation of this potential advantage was seriously compromised by its lack of scalability.
During the last ten years many efforts have been directed towards the development of numerical techniques able to grant proper scalability to the Boundary Element Method, among which is the Fast Multipole Method (FMM-BEM), extensively used in the present work. These techniques make it possible to fully exploit the power of boundary formulations, allowing them to gain widespread acceptance as a specialized tool for the solution of several classes of problems.

1.2 The MEMS devices


Micro-electro-mechanical systems (MEMS) are microscale devices combining an
electronic and a mechanical part on a common silicon substrate. They consist
of several fixed or movable conductors, which can be used as sensors or actua-
tors. Microactuators are activated by an imposed electric field which produces
a mechanical response (e.g. micromirrors for switching light direction in fiber
optics technology), while microsensors react to mechanical actions through a
variation in the electric field (e.g. airbag sensors).
They work much like their macroscale counterparts (standard sensors and actuators), with the advantages of small size, low production costs, high efficiency, low latency and low power consumption. Moreover, the very small size enables new applications, such as in biomechanics, in portable devices technology, or in S.M.A.R.T. (Self Monitoring, Analysis and Reporting Technology) devices, i.e. devices able to monitor themselves in order to predict and/or prevent their failure or loss of functionality. It also brings a significant performance improvement to existing applications (e.g. ink-jet printer micronozzles, or microsensors placed on hard disk heads in order to increase their accuracy and therefore data density).
Despite these advantages, MEMS technology development is slowed down by high prototyping costs; efficient tools for the numerical simulation of these devices are therefore in high demand (Meckerle [21]), since they could significantly cut these costs. Standard Finite Element tools can efficiently solve many of the problems arising in MEMS design; however, they fail when dealing with large-scale 3D external electromagnetic and fluid flow problems. Due to the complex geometry of MEMS devices and the presence of movable parts, the use of domain-based techniques is severely limited by excessive computational requirements, since the problem size can quickly grow to tens or hundreds of millions of domain DOFs. Equivalent problems, when tackled by boundary integral equation approaches, can be discretized using a much smaller number of DOFs and are therefore solvable on average workstations by means of the FMM-BEM. Moreover, the considerably smaller (if not null) remeshing requirement upon any rigid or quasi-rigid boundary motion is another major advantage of the Boundary Element Method over the Finite Element Method for that class of problems.

Figure 1.1: Large Force Electrostatic MEMS Actuator Segment, Photo Courtesy
of MEMX inc. (www.memx.com)

Chapter 2

The Laplace equation

The Laplace equation is the governing equation of many physical problems, ranging from steady-state heat conduction to electrostatics. Due to its simplicity, it is the ideal setting to introduce the Fast Multipole accelerated Boundary Element Method. Moreover, with reference to the electrostatic problem, it demonstrates the advantages of boundary formulations for the solution of the large-scale external problems arising in MEMS analysis.

2.1 The electrostatic problem


The electrostatic problem in free space is governed by the first of the Maxwell equations

$$\nabla \cdot \mathbf{E}(\mathbf{x}) = \frac{\rho(\mathbf{x})}{\varepsilon_0} \tag{2.1}$$

where $\rho$ is the charge density and $\varepsilon_0$ is the permittivity.
Since $\mathbf{E}$ is curl-free, it can be expressed as the gradient of an electrostatic potential $V$

$$\mathbf{E}(\mathbf{x}) = -\nabla V(\mathbf{x}) \tag{2.2}$$

The electrostatic equation can be expressed in the form of a Poisson equation

$$\Delta V(\mathbf{x}) = -\frac{\rho}{\varepsilon_0} \tag{2.3}$$
which, if $\rho = 0$, reduces to the Laplace equation

$$\Delta V(\mathbf{x}) = 0 \tag{2.4}$$

With reference to Figure 2.1, a general internal boundary value problem can be defined by the Laplace equation (2.4) imposed on the closed domain $\Omega$, with boundary conditions

$$V(\mathbf{x}) = \bar{V}(\mathbf{x}) \tag{2.5}$$

over $\Gamma_V$ and

$$\nabla V(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x}) = E_n(\mathbf{x}) = \bar{E}_n(\mathbf{x}) \tag{2.6}$$

over $\Gamma_E$.

Figure 2.1: Reference model for the electrostatic internal problem

In a similar way, the complementary external problem with ground conditions at infinity ($V_\infty = 0$) can be defined with reference to the geometry described in Figure 2.2, the only modification being the direction of the normal vector to the boundary surface $\Gamma$.

Figure 2.2: Reference model for the electrostatic external problem
In MEMS analysis the electrostatic forces acting on the fixed and movable parts for an assigned set of voltages are fundamental design parameters. The model problem is that of $m$ perfect conductors immersed in a dielectric medium of dielectric constant $\varepsilon = \varepsilon_r \varepsilon_0$, with an assigned constant potential $V_\alpha$ over each closed surface $\Gamma_\alpha$, $\alpha = 1, \ldots, m$. The resulting external problem with Dirichlet boundary conditions is depicted in Figure 2.3. Denoting by $\mathbf{t}(\mathbf{x})$ the electrostatic traction acting at each boundary point

$$\mathbf{t}(\mathbf{x}) = \varepsilon\, \frac{E_n^2(\mathbf{x})}{2}\, \mathbf{n}(\mathbf{x}) \tag{2.7}$$

the total force

$$\mathbf{F}_\alpha = \int_{\Gamma_\alpha} \mathbf{t}(\mathbf{x})\, dS(\mathbf{x}) \tag{2.8}$$

and the torque with respect to the centroid $\mathbf{x}_{g,\alpha}$

$$\mathbf{M}_\alpha = \int_{\Gamma_\alpha} (\mathbf{x} - \mathbf{x}_{g,\alpha}) \times \mathbf{t}(\mathbf{x})\, dS(\mathbf{x}) \tag{2.9}$$

are sought.

Figure 2.3: Reference model for the electrostatic MEMS analysis
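
As an illustration of (2.7)-(2.9), the following minimal sketch evaluates the force and torque on one conductor once the normal flux is known. It assumes a flat triangular mesh with one constant value of $E_n$ per element and a one-point (centroid) quadrature; the function name `electrostatic_load`, its arguments and the array layout are hypothetical and not taken from the thesis.

```python
import numpy as np

def electrostatic_load(vertices, triangles, En, xg, eps=8.8541878128e-12):
    """Force and torque on one conductor, eqs. (2.7)-(2.9).

    vertices : (nv, 3) node coordinates
    triangles: (ne, 3) node indices, ordered so that normals point outwards
    En       : (ne,) piecewise-constant normal flux, one value per element
    xg       : (3,) reference point for the torque (the centroid x_g)
    eps      : permittivity of the surrounding medium (assumed default: vacuum)
    """
    p0, p1, p2 = (vertices[triangles[:, k]] for k in range(3))
    cross = np.cross(p1 - p0, p2 - p0)            # length = 2 * element area
    area = 0.5 * np.linalg.norm(cross, axis=1)
    normal = cross / (2.0 * area)[:, None]        # unit normal of each element
    centroid = (p0 + p1 + p2) / 3.0
    traction = (0.5 * eps * En**2)[:, None] * normal               # eq. (2.7)
    force = (traction * area[:, None]).sum(axis=0)                 # eq. (2.8)
    arm = centroid - xg
    torque = (np.cross(arm, traction) * area[:, None]).sum(axis=0) # eq. (2.9)
    return force, torque
```

For a closed surface with uniform $E_n$ the resultant force and torque should vanish, which gives a quick sanity check of the element orientation.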

2.2 The boundary element method for the Laplace equation

The derivation of the direct boundary integral equation for the three-dimensional Laplace problem is well documented throughout the Boundary Element Method literature (see e.g. Aliabadi [2], Bonnet [5], Balas, Sladek J and Sladek V [3]). By applying the second Green identity to equation (2.4) and by using the fundamental solution

$$G(\mathbf{x}, \mathbf{y}) = \frac{1}{4\pi}\frac{1}{r}, \qquad r = \|\mathbf{x} - \mathbf{y}\| \tag{2.10}$$

the resulting boundary integral equation is

$$c\,V(\mathbf{x}) = \int_\Gamma \left[ G(\mathbf{x}, \mathbf{y})\, E_n(\mathbf{y}) - K(\mathbf{x}, \mathbf{y})\, V(\mathbf{y}) \right] dS(\mathbf{y}) \tag{2.11}$$

where

$$K(\mathbf{x}, \mathbf{y}) = \nabla_y G(\mathbf{x}, \mathbf{y}) \cdot \mathbf{n}(\mathbf{y}) = -\frac{1}{4\pi}\frac{\mathbf{r} \cdot \mathbf{n}_y}{r^3}, \qquad \mathbf{r} = \mathbf{y} - \mathbf{x} \tag{2.12}$$

and

$$c = \begin{cases} 0 & \mathbf{x} \notin \Omega \\ 1 & \mathbf{x} \in \Omega \\ \tfrac{1}{2} & \mathbf{x} \in \Gamma \end{cases} \tag{2.13}$$

is the free-term coefficient resulting from the Cauchy Principal Value integral involving the strongly singular kernel $K$.
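
The values of the free term can be checked numerically. Setting $V = 1$ and $E_n = 0$ in (2.11) gives $c = -\int_\Gamma K\, dS$, so integrating the kernel over a closed surface should return 1 for an interior point and 0 for an exterior one. The sketch below does this for a unit sphere with a simple midpoint rule; the sphere, the quadrature and all function names are assumptions made for illustration only.

```python
import numpy as np

def G(x, y):
    """Fundamental solution 1 / (4 pi r), with r = |y - x| (eq. 2.10)."""
    return 1.0 / (4.0 * np.pi * np.linalg.norm(y - x, axis=-1))

def K(x, y, ny):
    """Normal derivative of G with respect to y (eq. 2.12), with r = y - x."""
    r = y - x
    dist = np.linalg.norm(r, axis=-1)
    return -np.einsum("...i,...i->...", r, ny) / (4.0 * np.pi * dist**3)

def sphere_quadrature(n_theta=200, n_phi=400):
    """Midpoint rule on the unit sphere: points, outward normals, weights."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi
    T, P = np.meshgrid(theta, phi, indexing="ij")
    y = np.stack([np.sin(T) * np.cos(P),
                  np.sin(T) * np.sin(P),
                  np.cos(T)], axis=-1).reshape(-1, 3)
    w = (np.sin(T) * (np.pi / n_theta) * (2.0 * np.pi / n_phi)).ravel()
    return y, y.copy(), w          # on the unit sphere the normal equals y

def free_term(x):
    """c = -integral of K over the sphere, for a point x not on the surface."""
    y, ny, w = sphere_quadrature()
    return -np.sum(K(x, y, ny) * w)
```

The case $\mathbf{x} \in \Gamma$ is deliberately left out: there the integral is singular and needs the Principal Value treatment mentioned in the text.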
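Equation (2.11) simplifies if pure Dirichlet or Neumann boundary conditions are applied.
In the case of Dirichlet boundary conditions

$$V(\mathbf{x})|_\Gamma = \bar{V}(\mathbf{x}) \tag{2.14}$$

the resulting equation is a Fredholm integral equation of the first kind

$$\int_\Gamma G(\mathbf{x}, \mathbf{y})\, E_n(\mathbf{y})\, dS(\mathbf{y}) = g(\mathbf{x}) \tag{2.15}$$

with r.h.s.

$$g(\mathbf{x}) = c\,\bar{V}(\mathbf{x}) + \int_\Gamma K(\mathbf{x}, \mathbf{y})\, \bar{V}(\mathbf{y})\, dS(\mathbf{y}) \tag{2.16}$$

while in the case of pure Neumann boundary conditions

$$E_n(\mathbf{x})|_\Gamma = \bar{E}_n(\mathbf{x}) \tag{2.17}$$

it is a Fredholm integral equation of the second kind

$$c\,V(\mathbf{x}) + \int_\Gamma K(\mathbf{x}, \mathbf{y})\, V(\mathbf{y})\, dS(\mathbf{y}) = g(\mathbf{x}) \tag{2.18}$$

with r.h.s.

$$g(\mathbf{x}) = \int_\Gamma G(\mathbf{x}, \mathbf{y})\, \bar{E}_n(\mathbf{y})\, dS(\mathbf{y}) \tag{2.19}$$

The more general mixed boundary conditions

$$\begin{aligned} V(\mathbf{x})|_{\Gamma_V} &= \bar{V}(\mathbf{x}) \\ E_n(\mathbf{x})|_{\Gamma_E} &= \bar{E}_n(\mathbf{x}) \end{aligned} \tag{2.20}$$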

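lead to an equation of the form

$$\int_{\Gamma_E} K(\mathbf{x}, \mathbf{y})\, V(\mathbf{y})\, dS(\mathbf{y}) - \int_{\Gamma_V} G(\mathbf{x}, \mathbf{y})\, E_n(\mathbf{y})\, dS(\mathbf{y}) = -c\,\bar{V}(\mathbf{x}) + g(\mathbf{x}) \tag{2.21}$$

if $\mathbf{x} \in \Gamma_V$, and of the form

$$\int_{\Gamma_E} K(\mathbf{x}, \mathbf{y})\, V(\mathbf{y})\, dS(\mathbf{y}) - \int_{\Gamma_V} G(\mathbf{x}, \mathbf{y})\, E_n(\mathbf{y})\, dS(\mathbf{y}) + c\,V(\mathbf{x}) = g(\mathbf{x}) \tag{2.22}$$

if $\mathbf{x} \in \Gamma_E$, where

$$g(\mathbf{x}) = \int_{\Gamma_E} G(\mathbf{x}, \mathbf{y})\, \bar{E}_n(\mathbf{y})\, dS(\mathbf{y}) - \int_{\Gamma_V} K(\mathbf{x}, \mathbf{y})\, \bar{V}(\mathbf{y})\, dS(\mathbf{y}) \tag{2.23}$$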
The existence and uniqueness of the solution of (2.15) and (2.21) are always guaranteed, while the operator in (2.18) is singular, and in order for the r.h.s. to belong to its range, the solvability condition

$$\int_\Gamma \bar{E}_n(\mathbf{x})\, dS(\mathbf{x}) = 0 \tag{2.24}$$

must be satisfied. In that case, the double-layer equation admits infinitely many solutions, which differ from one another by a constant potential.

2.3 The electrostatic problem in MEMS analysis

As explained in the previous sections, the evaluation of electrostatic forces in MEMS devices can be modeled by the Laplace problem over the external domain $\Omega$ with $m$ immersed perfect conductors of closed surfaces $\Gamma_\alpha$, $\alpha = 1, \ldots, m$, with assigned constant potentials

$$V(\mathbf{x})|_{\Gamma_\alpha} = \bar{V}_\alpha \tag{2.25}$$

Equation (2.15) must be solved in order to evaluate the normal electrostatic flux $E_n$ along the boundary. The right-hand side (2.16) can be further simplified due to the fact that, for constant potentials,

$$\begin{aligned} g(\mathbf{x})|_{\mathbf{x} \in \Gamma_\alpha} &= c\,\bar{V}_\alpha + \sum_\beta \int_{\Gamma_\beta} K(\mathbf{x}, \mathbf{y})\, \bar{V}_\beta\, dS(\mathbf{y}) = \\ &= c\,\bar{V}_\alpha + \sum_\beta \bar{V}_\beta \int_{\Gamma_\beta} K(\mathbf{x}, \mathbf{y})\, dS(\mathbf{y}) \end{aligned} \tag{2.26}$$

and

$$\left. \int_{\Gamma_\beta} K(\mathbf{x}, \mathbf{y})\, dS(\mathbf{y}) \right|_{\mathbf{x} \in \Gamma_\alpha} = \begin{cases} 1 - c & \beta = \alpha \\ 0 & \beta \neq \alpha \end{cases} \tag{2.27}$$

The resulting Fredholm integral equation of the first kind

$$\int_\Gamma G(\mathbf{x}, \mathbf{y})\, E_n(\mathbf{y})\, dS(\mathbf{y}) = \bar{V}(\mathbf{x}) \tag{2.28}$$

allows the electric flux to be evaluated directly as a function of the applied voltages.

2.4 Discretization of the problem


In order to solve (2.11), the boundary is discretized by means of plane triangular elements. The potential $V$ is approximated by a continuous, piecewise linear function over the boundary, while the normal flux $E_n$ is modelled as piecewise constant over the boundary elements. This is the simplest approximation which still preserves the continuity requirements of (2.11).
The starting boundary integral equation thus reduces to the form

$$c\,V(\mathbf{x}; V_1, \ldots, V_q) = \sum_{k=1}^{p} \int_{\Gamma_k} \left[ G(\mathbf{x}, \mathbf{y})\, E_n(\mathbf{y}; E_{n,1}, \ldots, E_{n,p}) - K(\mathbf{x}, \mathbf{y})\, V(\mathbf{y}; V_1, \ldots, V_q) \right] dS(\mathbf{y}) \tag{2.29}$$

where $p$ is the number of elements, $q$ is the number of nodes, and

$$V(\mathbf{x}; V_1, \ldots, V_q) = \sum_{i=1}^{q} N_i^{(V)}(\mathbf{x})\, V_i \tag{2.30}$$

and

$$E_n(\mathbf{x}; E_{n,1}, \ldots, E_{n,p}) = \sum_{i=1}^{p} N_i^{(E_n)}(\mathbf{x})\, E_{n,i} \tag{2.31}$$

are the approximations of $V$ and $E_n$ by means of the parameters $V_1, \ldots, V_q$ and $E_{n,1}, \ldots, E_{n,p}$, weighted respectively by the functions $N_1^{(V)}(\mathbf{x}), \ldots, N_q^{(V)}(\mathbf{x})$ and $N_1^{(E_n)}(\mathbf{x}), \ldots, N_p^{(E_n)}(\mathbf{x})$. Some of the parameters appearing in (2.29) are determined by a proper approximation of the assigned boundary conditions.
In order to determine the remaining $n$ unknown parameters, equation (2.29) is enforced at $n$ different boundary points. The resulting linear system of equations has the following structure

$$[G]\{E_n\} - [K]\{V\} = \{0\} \tag{2.32}$$

where

$$[G]_{ij} = \int_\Gamma G(\mathbf{x}_i, \mathbf{y})\, N_j^{(E_n)}(\mathbf{y})\, dS(\mathbf{y}) \tag{2.33}$$

and

$$[K]_{ij} = \int_\Gamma K(\mathbf{x}_i, \mathbf{y})\, N_j^{(V)}(\mathbf{y})\, dS(\mathbf{y}). \tag{2.34}$$

The structure of the system matrix depends upon the applied boundary conditions. The discretization of (2.15) leads to a linear system of the form

$$[G]\{E_n\} = \{b\} \tag{2.35}$$

while the discretization of (2.18) leads to

$$\left[ \frac{1}{2} + K \right] \{V\} = \{b\}. \tag{2.36}$$

The evaluation of the matrix coefficients (2.33) and (2.34) is a widely investigated topic, and various numerical and analytical techniques are available to deal with the singularity of the kernels involved. When the source point is far enough from the integration region, numerical quadrature rules can be safely employed. In the present work, the singular and quasi-singular integrals are evaluated by means of the analytical integrations described in Milroy et al. [26] which, while slower than semianalytical techniques or purely numerical techniques based on element subdivision and coordinate transformations, provide higher accuracy.
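
To make the structure of a first-kind system such as (2.35) concrete, here is a deliberately simplified collocation sketch for a single square plate held at unit potential. It departs from the thesis in several labelled ways: square panels instead of triangles, a piecewise-constant density `q` collocated at panel centres, unit permittivity, a one-point quadrature for every off-diagonal entry (including near neighbours), and a closed-form integral only for the singular diagonal entry. The function name `plate_capacitance` is hypothetical.

```python
import numpy as np

def plate_capacitance(n=10, side=1.0):
    """Capacitance of a square plate, normalised by 4*pi*eps*side.

    The plate is split into n x n square panels of side h, each carrying a
    constant density q_j; the equation sum_j G_ij q_j = 1 is enforced at
    the panel centres (collocation).
    """
    h = side / n
    c = (np.arange(n) + 0.5) * h
    X, Y = np.meshgrid(c, c, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel(), np.zeros(n * n)])

    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)            # placeholder, overwritten below
    Gm = h * h / (4.0 * np.pi * dist)      # one-point quadrature, G = 1/(4 pi r)

    # Singular self term: the integral of 1/(4 pi r) over a square of side h,
    # evaluated at its centre, equals h * ln(1 + sqrt(2)) / pi.
    np.fill_diagonal(Gm, h * np.log(1.0 + np.sqrt(2.0)) / np.pi)

    q = np.linalg.solve(Gm, np.ones(n * n))    # dense direct solve, V = 1
    total_charge = q.sum() * h * h
    return total_charge / (4.0 * np.pi * side)
```

The matrix `Gm` is full, so both storage and the direct solve grow quickly with the number of panels, which is the scalability issue discussed in the text.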
The collocation formulations described in this chapter are characterized by system matrices which are unsymmetric and fully populated. Advanced formulations such as the Symmetric Galerkin approach are also characterized by system matrices that, although symmetric, are still fully populated. The presence of dense matrices is both one of the most important features of BEM, since the far-field interaction between the unknowns leads to a remarkable accuracy in comparison with other numerical methods, and one of its worst weaknesses, due to the resulting lack of scalability in terms of both computational and memory requirements.

2.5 Solution of the system of equations


The majority of textbooks on the Boundary Element Method, even recent ones, deal in great detail with the formulation of the method for many physical problems, but in many cases they do not investigate with the same care the fundamental issue of the solution of the resulting system of equations. However, many references in the literature deal with the iterative solution of BEM problems (see e.g. Merkel et al. [25], Valente et al. [38]).
A first remark on this topic concerns the inefficiency of direct solvers. The sparse systems generated by the Finite Element Method can be efficiently solved by properly tuned LU-decomposition solvers, or by more advanced sparse direct solvers. The Boundary Element Method requires instead a full Gaussian elimination, which is very expensive even for relatively small systems. Moreover, the size of the required workspace (even for out-of-core solvers) and the size of the matrix itself prevent any reasonable scalability.
The adoption of an iterative solver is thus a necessary step to ensure optimal performance of Boundary Element Methods. Two solvers are available for unsymmetric, nondefinite matrices: the Generalized Minimal RESidual method (GMRES), and the Bi-Conjugate Gradient STABilized method (Bi-CGSTAB). The symmetric matrices of the SGBEM technique can sometimes also be solved with the ordinary Conjugate Gradient method.
The GMRES method (Saad et al. [34], Fraysse et al. [12]) provides better convergence than the CG-like methods, albeit at the expense of higher memory utilization; it has therefore been the method of choice in the present work. However, the characteristics which allow the adoption of the Fast Multipole Method, which will be introduced in the next chapter, are present in both the GMRES and the Bi-CGSTAB solvers.
The GMRES method searches for an approximate solution of the system

$$Ax = b \tag{2.37}$$

in the form

$$x_k = x_0 + z_k = x_0 + V_k y \tag{2.38}$$

where $x_0$ is a starting trial solution and $V_k$ is an orthonormal basis of the kth Krylov subspace

$$\begin{aligned} r_0 &= b - A x_0 \\ \mathcal{K}_k &\equiv \operatorname{span}\left\{ r_0, A r_0, A^2 r_0, \ldots, A^{k-1} r_0 \right\} \end{aligned} \tag{2.39}$$

and $y$ is evaluated by minimizing the residual norm

$$\| b - A (x_0 + z_k) \|_2 = \min_{z \in \mathcal{K}_k} \| r_0 - A z \|_2 \tag{2.40}$$

The Arnoldi method is employed to build the orthonormal basis V_k:

    r_0 = b − A x_0
    β = ‖r_0‖
    V_1 = r_0 / β
    do i = 1, k
        w_{i+1} = A V_i
        do j = 1, i
            h_{j,i} = <w_{i+1}, V_j>
            w_{i+1} = w_{i+1} − h_{j,i} V_j
        end
        h_{i+1,i} = ‖w_{i+1}‖
        V_{i+1} = w_{i+1} / h_{i+1,i}
    end

The first vector is the unit vector in the direction of the initial residual, and each subsequent vector is orthogonalized with respect to all the previous ones using the Gram-Schmidt procedure.
The Arnoldi basis for the Krylov space $\mathcal{K}_k$ satisfies the identity

$$A V_k = V_{k+1} \bar{H}_k \tag{2.41}$$

where $\bar{H}_k$ is a $(k + 1) \times k$ upper Hessenberg matrix.
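
The Arnoldi process and the identity (2.41) can be sketched in a few lines. The version below is a minimal illustration, assuming a real matrix accessed only through a user-supplied `matvec` callable and no breakdown (no zero subdiagonal entry); the function name and return layout are hypothetical.

```python
import numpy as np

def arnoldi(matvec, r0, k):
    """Build k Arnoldi steps starting from the residual r0.

    Returns V (n x (k+1), orthonormal columns), H ((k+1) x k, upper
    Hessenberg) and beta = ||r0||, such that A V[:, :k] = V H.
    """
    n = r0.size
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    beta = np.linalg.norm(r0)
    V[:, 0] = r0 / beta
    for i in range(k):
        w = matvec(V[:, i])
        # Orthogonalize against all the vectors built so far
        for j in range(i + 1):
            H[j, i] = np.dot(w, V[:, j])
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)
        V[:, i + 1] = w / H[i + 1, i]      # assumes H[i + 1, i] != 0
    return V, H, beta
```

A quick check on a random matrix is to verify that `V.T @ V` is the identity and that `A @ V[:, :k]` matches `V @ H` to rounding error.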


The residual at the kth iteration can thus be expressed as

$$\begin{aligned} r_k &= b - A x_k = b - A (x_0 + V_k y) = r_0 - A V_k y = \\ &= V_{k+1} \beta e_1 - V_{k+1} \bar{H}_k y = V_{k+1} \left( \beta e_1 - \bar{H}_k y \right) \end{aligned} \tag{2.42}$$

and, since $V_{k+1}$ is a matrix with orthonormal columns, the minimal residual norm is obtained as the solution of the $(k + 1) \times k$ linear least-squares problem

$$\min_{y \in \mathbb{R}^k} \left\| \beta e_1 - \bar{H}_k y \right\| \tag{2.43}$$

which can be solved using the QR factorization of the matrix $[\bar{H}_k, \beta e_1]$. The solution of (2.43) can be written in the form

$$y = R(1:k, 1:k)^{-1} R(1:k, k+1) \tag{2.44}$$

and the residual norm at the kth iteration is given by

$$\| r_k \| = | R(k+1, k+1) | \tag{2.45}$$

For a generic nonsingular system, the GMRES method converges in at most $n$ iterations, $n$ being the system size. In practice, the method is stopped when $\| r_k \|$ falls below a desired threshold.
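
Putting the pieces together, a minimal unrestarted GMRES can be sketched as follows. This is an illustration under stated assumptions rather than the solver used in the thesis: the small least-squares problem (2.43) is solved from scratch at every step with `numpy.linalg.lstsq` instead of an incrementally updated QR factorization, no preconditioner or restart is used, and the names `gmres`, `tol` and `max_iter` are hypothetical. The matrix enters only through the `matvec` callable.

```python
import numpy as np

def gmres(matvec, b, x0=None, tol=1e-10, max_iter=None):
    """Unrestarted GMRES; returns the approximate solution and ||r_k||."""
    n = b.size
    x0 = np.zeros(n) if x0 is None else x0
    max_iter = n if max_iter is None else max_iter
    r0 = b - matvec(x0)
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0, 0.0
    V = np.zeros((n, max_iter + 1))
    H = np.zeros((max_iter + 1, max_iter))
    V[:, 0] = r0 / beta
    for k in range(1, max_iter + 1):
        # One Arnoldi step: new column k-1 of the Hessenberg matrix
        w = matvec(V[:, k - 1])
        for j in range(k):
            H[j, k - 1] = np.dot(w, V[:, j])
            w = w - H[j, k - 1] * V[:, j]
        H[k, k - 1] = np.linalg.norm(w)
        # Least-squares problem (2.43): min || beta e1 - H_k y ||
        rhs = np.zeros(k + 1)
        rhs[0] = beta
        Hk = H[:k + 1, :k]
        y, *_ = np.linalg.lstsq(Hk, rhs, rcond=None)
        res = np.linalg.norm(rhs - Hk @ y)
        if res <= tol * beta or H[k, k - 1] == 0.0:
            break
        V[:, k] = w / H[k, k - 1]
    return x0 + V[:, :k] @ y, res
```

A typical call is `x, res = gmres(lambda v: A @ v, b)`; replacing the lambda with any routine that returns the product of the system matrix with a vector leaves the solver unchanged.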
It is worth recalling that, by using the minimal polynomial of a nonsingular $n \times n$ square matrix $A$

$$0 = q(A) = \alpha_0 I + \alpha_1 A + \ldots + \alpha_n A^n \tag{2.46}$$

its inverse can be written as

$$A^{-1} = -\frac{1}{\alpha_0} \sum_{j=0}^{n-1} \alpha_{j+1} A^j \tag{2.47}$$

and $x = A^{-1} b$ clearly belongs to the Krylov space $\mathcal{K}_n$.
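
The polynomial expression of the inverse can be verified on a small matrix. The sketch below is only a numerical illustration: it uses the characteristic polynomial returned by `numpy.poly` in place of the minimal polynomial (by the Cayley-Hamilton theorem it also annihilates $A$, and the two coincide for generic matrices), and it is not a practical way to invert a matrix. The function name is hypothetical.

```python
import numpy as np

def inverse_from_polynomial(A):
    """Inverse of A as a polynomial in A, in the spirit of (2.47).

    With q(A) = sum_j alpha[j] A^j = 0 and alpha[0] != 0, the inverse is
    -(1 / alpha[0]) * sum_{j=0}^{n-1} alpha[j + 1] A^j.
    """
    n = A.shape[0]
    alpha = np.poly(A)[::-1]       # alpha[j] multiplies A^j, alpha[n] = 1
    acc = np.zeros((n, n))
    power = np.eye(n)              # current power A^j
    for j in range(n):
        acc = acc + alpha[j + 1] * power
        power = power @ A
    return -acc / alpha[0]
```

Since the result is a combination of $I, A, \ldots, A^{n-1}$, applying it to $b$ only involves the vectors $b, Ab, \ldots, A^{n-1}b$ that span the Krylov space.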


The workspace required by the solver at the kth iteration is given by the size of the Arnoldi basis, which is of order O(k × n), and must be added to the memory used by the matrix A.
A notable property of the GMRES method is that it accesses the system matrix A only through matrix-vector multiplications during the construction of the Arnoldi basis. Therefore, unlike direct solvers, the GMRES solver does not necessarily require the matrix to be stored in memory.
Formally, each matrix coefficient could be recalculated at each iteration, during the matrix-vector multiplication step. This could lead to a complete removal of the memory bottleneck. However, even in this case scalability is an issue, since the calculation of a full matrix whose coefficients are given by (2.33) and (2.34) quickly becomes computationally prohibitive, let alone a complete recomputation at each step.
During the last ten years, many efforts have been directed towards the development of fast solvers, which act at the matrix-vector multiplication level and, by using some sort of approximation, provide a viable compromise between memory utilization and computational performance, in order to allow the solution of large-scale problems which would otherwise be out of reach.

2.6 Preconditioning of BEM matrices


The convergence rate of iterative solvers strongly depends upon the structure of the system matrix. A generic linear system

$$Ax = b \tag{2.48}$$

can be modified by left preconditioning

$$M_1^{-1} A x = M_1^{-1} b \tag{2.49}$$

or by right preconditioning

$$A M_2^{-1} y = b, \qquad x = M_2^{-1} y \tag{2.50}$$

provided that the system matrix of the preconditioned system has better properties than the original one. A trivial observation is that the best system matrix is the identity matrix. In that case, iterative solvers converge in just one iteration. Therefore, the best preconditioner is the inverse of the system matrix. Two preconditioners can be used in sequence, i.e. a right preconditioner is applied, and then the resulting matrix is left preconditioned.
The goal of any preconditioner is to approximate the inverse of the system matrix as closely as possible, while achieving a positive balance between the time spent during its setup and the time saved due to the improved convergence of the iterative solver.
The right choice of preconditioner strongly depends on the nature of the underlying system. The simplest preconditioner, which should always be used for BEM matrices, is the right Jacobi preconditioner

$$M_2^{-1} = \operatorname{diag}\left( \frac{1}{a_{11}}, \frac{1}{a_{22}}, \ldots, \frac{1}{a_{nn}} \right) \tag{2.51}$$

which has the effect of scaling the unknowns vector, and correspondingly the columns of the system matrix. This is especially useful for problems with mixed boundary conditions, where the unknowns vector contains quantities of a different nature (for example, potentials and fluxes) and the corresponding columns therefore contain coefficients which can differ by orders of magnitude.
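
The column-scaling effect can be seen on a small synthetic example. The sketch below applies the right Jacobi preconditioner to a dense matrix stored explicitly; the function name `right_jacobi` and the test matrix suggested afterwards are assumptions for illustration, and the diagonal is assumed to have no zero entries.

```python
import numpy as np

def right_jacobi(A):
    """Return A M2^{-1} and the diagonal of M2^{-1} = diag(1 / a_ii).

    Multiplying A on the right by a diagonal matrix rescales its columns,
    so that every diagonal entry of the preconditioned matrix equals 1.
    """
    d_inv = 1.0 / np.diag(A)          # assumes a_ii != 0 for all i
    return A * d_inv[None, :], d_inv

def solve_right_preconditioned(A, b):
    """Solve (A M2^{-1}) y = b, then recover x = M2^{-1} y."""
    A_scaled, d_inv = right_jacobi(A)
    y = np.linalg.solve(A_scaled, b)  # stand-in for an iterative solver
    return d_inv * y
```

For a matrix whose columns differ in scale by a factor of about $10^6$, as may happen when potentials and fluxes are mixed, comparing `np.linalg.cond(A)` with the condition number of the scaled matrix shows the improvement directly.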
The block Jacobi preconditioner is an extension of the diagonal precondi-
tioner, based on the inverses of block diagonal submatrices. The leaf precon-
ditioner which will be used in the last Chapter is a variant of this class of
preconditioners. Since it depends on the matrix partitioning introduced by the
Fast Multipole method, it will be described later in this work.
Classical preconditioners are often tuned for sparse matrices, and are there-
fore useless, or computationally inefficient, for BEM matrices.
According to some recent investigations (Chen [6], Giraud et al. [13]) the
SParse Approximate Inverse (SPAI) preconditioner (Benzi et al. [4], Grote et al.
[16]) appears to be a viable choice for the solution of dense, diagonally dominant
problems.
The SPAI preconditioner was developed for sparse matrices. It consists
in the construction of a sparse matrix $M^{-1}$ such that $\|M^{-1}A - I\|$ is
small in some norm (usually the Frobenius norm is adopted). For an assigned
sparsity pattern, the determination of the coefficients of the sparse
approximate inverse $M^{-1}$ reduces to the solution of n independent
least-squares problems whose size is at most equal to m, where n is the problem
size and m is the number of nonzero coefficients per row in the sparse matrix A.
The SPAI preconditioner can be adapted to a dense system by simply ex-
tracting the most significant values from the original matrix A, therefore giving
rise to a sparse approximation à which is used to calculate the preconditioner.
The efficiency and consequently the setup cost of the SPAI preconditioner de-
pend on the parameters which control both the extraction of the sparse approxi-
mation from the original matrix A, and the sparse inverse from the approximate
matrix à (see e.g. Margonari [24]).
The described preconditioners are based on the algebraic structure of the sys-
tem matrix. Other classes of preconditioners specifically exploit the analytical
properties of the starting integral equations (Steinbach et al. [35], Christiansen
et al. [8]).
Another possibility is given by the adoption of an iterative solver which
allows a variable preconditioner, such as the FGMRES (Saad [33]) or the
GMRESR (van der Vorst [39]). The advantage given by these solvers is that the
computation of an approximate inverse $M^{-1}$ can be avoided. Instead, the
vector z which should be obtained by the application of the preconditioner to a
known vector y

$$ z = M^{-1} y \qquad (2.52) $$

can be approximated by an inner iterative solution of the linear system

$$ M z = y \qquad (2.53) $$

where M is a convenient approximation of the original matrix A.
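A minimal sketch of such an inner solve is given below. It is an assumption made for illustration only: a few Jacobi sweeps on (2.53) stand in for the inner iterative solver, and the matrix M is a hypothetical strictly diagonally dominant example. The outer FGMRES or GMRESR iteration is not shown.

```python
import numpy as np

def inner_solve(M, y, n_inner):
    """Approximate z = M^{-1} y (eqn. 2.52) with n_inner Jacobi sweeps on M z = y."""
    d = np.diag(M)
    z = y / d                          # first sweep, starting from z = 0
    for _ in range(n_inner - 1):
        z = z + (y - M @ z) / d        # Jacobi correction of the current residual
    return z

rng = np.random.default_rng(1)
M = 5.0 * np.eye(8) + 0.2 * rng.random((8, 8))   # hypothetical approximation of A
y = rng.random(8)

# Residual of (2.53) after 1, 3 and 10 inner sweeps
res = [np.linalg.norm(y - M @ inner_solve(M, y, k)) for k in (1, 3, 10)]
print(res)
```

The number of inner sweeps controls how closely z approximates $M^{-1}y$, so it can change from one outer iteration to the next, which is why a solver accepting a variable preconditioner is needed.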


The optimal preconditioner depends on many factors, among which are the
nature of the equations and the problem size. One good property of integral
formulations is that, apart from the possible ill-conditioning stemming from
the properties of the underlying problem, such as in the case of incompressible
elasticity and Stokes flow, the fast-decaying kernels ensure that the resulting
matrices achieve constant, albeit slower, convergence rates even without any
preconditioner.

Chapter 3

Introduction to the Fast Multipole Method

The Fast Multipole Method (FMM) is a technique which approximates and
accelerates the calculations needed to perform a matrix-vector multiplication
within boundary element procedures. With reference to potential problems,
this chapter describes the key steps in the implementation of the Fast
Multipole Method. Several examples showing the performance of the FMM-BEM
technique for the solution of large-scale problems are given.

3.1 Introduction
The matrix-vector multiplication z = Aw in a generic BEM problem corresponds
to the evaluation of the left-hand side of eqn. (2.29) for a set of source
points $x_i$, $i = 1, \ldots, n$

$$ I_i = \int_{\Gamma_V} G(x_i, y)\, E_n(y; E_{n,1}, \ldots, E_{n,p})\, dS(y) - \int_{\Gamma_{E_n}} K(x_i, y)\, V(y; V_1, \ldots, V_q)\, dS(y) \qquad (3.1) $$

where the approximated functions V and $E_n$ depend on the nodal values
contained in the vector w. The domain of integration is given by the union of a
number of integration panels which is proportional to the problem size, so the
direct evaluation of (3.1) requires $O(n^2)$ operations.
Due to the rapid decrease of the kernels G and K for $r = \|x - y\| \to \infty$,
the domain of integration can be intuitively divided into two regions for every
source point $x_i$. The so-called near-field region is a small part of the
integration domain near the source point and provides the most important
numerical contribution to the total value of the integral $I_i$; such
contribution must be evaluated with the best possible accuracy through direct
(analytical or numerical) integrations. The complementary far-field region
provides a significantly smaller, although not negligible, contribution to
$I_i$. The goal of fast solvers, among which is the Fast Multipole Method, is to
efficiently approximate the computationally expensive far-field integrations,
once a precise definition of near and far-field is given, based on the distance
between the source points and the integration panels.
The Fast Multipole Method was first applied to an integral formulation
for the solution of the 2D Laplace equation in 1987 by Greengard and Rokhlin
[14]. Although it was only sporadically adopted (see e.g. Hackbusch et al. [17])
soon after its introduction, after ten years and further developments (Greengard
et al. [15]) the FMM gained popularity among the researchers working on
boundary integral equation methods as a highly efficient building block for the
development of scalable and efficient solvers.
In the following sections, the main ingredients of the FMM will be illus-
trated. First, the technique for an appropriate approximation of the integrand
functions will be introduced. Then, based on the requirements of the adopted
approximation, an algorithm for the definition of the near-field and far-field
parts of the integration region will be described. Finally, the procedure for the
approximation of the matrix-vector multiplication will be analysed in detail.

3.2 The kernel expansion

Since the computational complexity of the Boundary Element Method derives
from the presence of convolution integrals of the form

$$ I(x) = \int_{\Gamma} K(x - y)\, \phi(y)\, dS(y), \qquad (3.2) $$


the first step consists of performing a kernel expansion with respect to a pole
$y_0$ which separates the x-dependence from the y-dependence

$$ K(x, y) = \sum_n k_n^{(1)}(x - y_0)\, k_n^{(2)}(y - y_0). \qquad (3.3) $$

The choice for the functions $k_n^{(1)}$ and $k_n^{(2)}$ is not unique. In order
to expand the kernel as in eqn. (3.3), the Fast Multipole Method uses harmonic
expansions, which are solutions of the Laplace equation, while, for example, the
Panel Clustering method (Hackbusch et al. [17]) uses Taylor expansions.
Usually the expansion (3.3) holds if

$$ |x - y_0| > |y - y_0| \qquad (3.4) $$

The function $k_n^{(2)}$ satisfies a relationship of the form

$$ k_n^{(2)}(y - y_1) = \sum_m k_m^{(2)}(y - y_0)\, c^{R}_{n,m}(y_1, y_0) \qquad (3.5) $$

which allows a pole shift. For example, if eqn. (3.3) is obtained through a Taylor
expansion, then eqn. (3.5) reduces to the binomial expansion of a polynomial.
Moreover the function $k_n^{(1)}$ can be further expressed in the form

$$ k_n^{(1)}(x - y_0) = \sum_m k_m^{(3)}(x - x_0)\, c^{S}_{m,n}(x_0, y_0) \qquad (3.6) $$

where $c^{S}_{m,n}(x_0, y_0)$ are constants and $k_m^{(3)}$ are usually entire
functions. Just as for the expansion (3.3), (3.6) generally holds if

$$ |y_0 - x_0| > |x - x_0|. \qquad (3.7) $$

A common choice for the expansion (3.6) is given by

$$ k_m^{(3)} = k_m^{(2)}. \qquad (3.8) $$

The function $k_n^{(3)}$, just like $k_n^{(2)}$, also satisfies a pole shift
relationship of the form

$$ k_n^{(3)}(x - x_1) = \sum_m k_m^{(3)}(x - x_0)\, c^{R'}_{n,m}(x_1, x_0). \qquad (3.9) $$


If the kernel is expanded according to (3.3), the integral (3.2) reduces to


$$ I(x) = \int_{\Gamma} K(x, y)\, \phi(y)\, dS(y) = \sum_n k_n^{(1)}(x - y_0)\, M_n(y_0) \qquad (3.10) $$

where

$$ M_n(y_0) = \int_{\Gamma} k_n^{(2)}(y - y_0)\, \phi(y)\, dS(y) \qquad (3.11) $$

is the multipole moment centered at $y_0$. Throughout this thesis (3.11) will be
called Φ2M.
Using (3.6), equation (3.10) can be rewritten in the form

$$ I(x) = \int_{\Gamma} K(x, y)\, \phi(y)\, dS(y) = \sum_n L_n(x_0)\, k_n^{(3)}(x - x_0) \qquad (3.12) $$

where

$$ L_m(x_0) = \sum_n c^{S}_{m,n}(x_0, y_0)\, M_n(y_0) \qquad (3.13) $$

provides the local expansion coefficients as functions of the multipole moments
(M2L relationship). The value of the integral with respect to the source point
x can thus be obtained by means of local expansions using (3.12), which will
be called L2I.
Moreover, by using respectively (3.5) and (3.9), an M2M relationship

$$ M_n(y_1) = \sum_m M_m(y_0)\, c^{R}_{n,m}(y_1, y_0) \qquad (3.14) $$

and an L2L relationship

$$ L_n(x_1) = \sum_m L_m(x_0)\, c^{R'}_{n,m}(x_1, x_0) \qquad (3.15) $$

can be derived.
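The pole shift of the moments can be checked numerically on the simplest case mentioned above, a Taylor-type expansion with $k_n^{(2)}(y - y_0) = (y - y_0)^n$, for which the shift coefficients are binomial: $c^{R}_{n,m}(y_1, y_0) = \binom{n}{m}(y_0 - y_1)^{n-m}$ for $m \le n$. The sketch below is an illustration under assumptions, not the harmonic formulation used later: the points are one-dimensional, point strengths stand in for the panel integrals, and the function names are hypothetical.

```python
import numpy as np
from math import comb

def moments(y, phi, pole, p):
    """M_n(pole) = sum_j phi_j (y_j - pole)^n, n = 0..p-1 (discrete form of 3.11)."""
    return np.array([np.sum(phi * (y - pole) ** n) for n in range(p)])

def m2m(M0, y0, y1):
    """Shift moments from pole y0 to pole y1 as in (3.14), with binomial coefficients."""
    p = len(M0)
    return np.array([sum(comb(n, m) * (y0 - y1) ** (n - m) * M0[m]
                         for m in range(n + 1)) for n in range(p)])

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, 50)       # hypothetical field points
phi = rng.uniform(-1.0, 1.0, 50)    # hypothetical strengths

shifted = m2m(moments(y, phi, 0.2, 6), 0.2, 0.5)
direct = moments(y, phi, 0.5, 6)
print(np.max(np.abs(shifted - direct)))
```

In this polynomial case the shift is exact: the first p moments about the new pole only require the first p moments about the old one.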
With reference to Figure 3.1, suppose we have to evaluate the set of integrals

$$ I(x_i) = \sum_{j=1}^{m} \int_{\Gamma_j} K(x_i, y)\, \phi(y)\, dS(y), \qquad i = 1, \ldots, n \qquad (3.16) $$

over m field panels, with respect to n source points.

[Figure 3.1: Scheme of basic FMM application. Direct integrations link every source point $x_i$ to every field panel $\Gamma_j$; the Fast Multipole Method replaces them with the Φ2M, M2L and L2I operations through the poles $y_0$ and $x_0$.]

The integrations could be performed directly (for example by using a Gauss
quadrature scheme) leading to an O(n × m) computational complexity.
In order to apply the Fast Multipole Method, let us choose a point y0 near
the field region and another point x0 near the source region, such that they
satisfy (3.4) and (3.7). The integrations can then be evaluated in three steps:

- Using the Φ2M relationship (3.11), the multipole moments centered at $y_0$
  are evaluated (O(m) operations);

- The M2L relationship (3.13) is employed to evaluate the local expansions
  centered at $x_0$ as functions of the multipole moments (O(1) operations);

- The value of the integrals is recovered by using the L2I relationship (3.12)
  (O(n) operations).

The total computational cost of the operation is thus of order O(m + n). In
order to be numerically evaluated, the required multipole and local expansions
must be truncated at a certain order p. The quality of approximation is directly
related to the order of expansion.
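The three steps can be tried out on a toy kernel. The sketch below is only an illustration under stated assumptions: it uses the one-dimensional kernel $K(x, y) = 1/(x - y)$, for which $k_n^{(1)}(x - y_0) = (x - y_0)^{-(n+1)}$ and $k_n^{(2)}(y - y_0) = (y - y_0)^n$, instead of the harmonic expansions of the Laplace kernel, and point strengths `q` stand in for the panel integrals. The function name and the cluster positions are hypothetical.

```python
import numpy as np
from math import comb

def fmm_one_pair(x, y, q, x0, y0, p):
    """Approximate I(x_i) = sum_j q_j / (x_i - y_j) through the poles y0 and x0."""
    # Phi2M: multipole moments about y0, one pass over the field points
    M = np.array([np.sum(q * (y - y0) ** n) for n in range(p)])
    # M2L: local coefficients about x0, cost independent of the number of points
    d = x0 - y0
    L = np.array([sum((-1) ** k * comb(n + k, k) * M[n] / d ** (n + k + 1)
                      for n in range(p)) for k in range(p)])
    # L2I: evaluate the local expansion, one pass over the source points
    return sum(L[k] * (x - x0) ** k for k in range(p))

rng = np.random.default_rng(0)
y = rng.uniform(-0.5, 0.5, 400)     # field points clustered around y0 = 0
q = rng.uniform(0.0, 1.0, 400)      # point strengths standing in for panel densities
x = rng.uniform(4.5, 5.5, 300)      # source points clustered around x0 = 5

fast = fmm_one_pair(x, y, q, x0=5.0, y0=0.0, p=10)
direct = (q[None, :] / (x[:, None] - y[None, :])).sum(axis=1)   # O(n x m) reference
rel_err = np.max(np.abs(fast - direct) / np.abs(direct))
print(rel_err)
```

Raising the truncation order p reduces the error geometrically, at a rate set by the ratio between the cluster radii and the distance between the two poles, in line with conditions (3.4) and (3.7).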


3.3 The oct-tree construction


In order to correctly apply the Fast Multipole Method to a generic BEM problem,
for each source point it is necessary to rigorously determine the region of
the domain in which the expansions introduced in the previous section can be
adopted, which is called the far-field region. The remaining part of the domain
of integration is the near-field region.
This can be accomplished by grouping both source points and field panels
in clusters of entities. The relationship between sources and fields can then
be evaluated at cluster level. These clusters are determined by means of an
oct-tree structure. As a first step, a cubic region has to be singled out, which
contains both the integration domain and all the source points. Such region is
called root cell. This cell is then recursively subdivided in 8 cubic subcells of the
same size. The subdivision is stopped when the cells containing the field panels
have reached a desired size. An important parameter affecting the accuracy
and performance of the Fast Multipole approximation is given by the size of the
clusters. Bigger clusters obviously lead to a bigger near-field region.
The subdivision of space using the oct-tree structure is instrumental in choos-
ing an appropriate set of expansion poles, and in determining the integration
regions which satisfy the conditions (3.4) and (3.7) for a given cluster of source
points. The described technique is essentially the widely used adaptive oct-tree
generation algorithm described by Cheng et al. [7] and implemented by many
authors together with its 2D quad-tree variant (Mammoli et al. [22], Nishimura
et al. [28], Yoshida et al. [42], Nishimura [27]).
Before describing in detail the oct-tree construction algorithm, some defini-
tions must be introduced.
A cell is a cubic region of space containing source points and/or field panels.
The root cell is the cell containing all the other cells. The level of a cell indicates
the number of subdivisions required to obtain it starting from the root cell,
which is at level 0. The oct-tree depth is the maximum level of its cells. Every
cell of level l, except the root cell, derives from the subdivision of its parent cell,
of level l − 1. Each cell can have up to 8 children. A childless cell is called leaf
cell, otherwise it is called branch cell. A cell is a descendant of another cell if it
can be obtained by a recursive subdivision of the latter. The family of a cell is
the list of cells of which it is a descendant.
The construction of the oct-tree can be formalized in the following steps:

- Two lists of points are prepared. The first one contains a representative
  point for each integration panel (for example, the centroid of the panel).
  The second one contains all the source points. This differentiation could
  appear cumbersome at first sight, since usually both source and field points
  belong to the same region. However, a noteworthy situation in which the
  source points can be located in a region distinct from the integration
  domain occurs in the evaluation of internal quantities by means of integral
  equations (Yoshida et al. [42]);
- The root cell containing all the points belonging to the two lists is defined;
- A given cell of level l contains a certain number of source points, and
  another number of field points. If the number of field points is greater than
  a certain threshold M, then the cell is a branch cell, so it is subdivided
  into eight parts and the children cells are analysed. If it contains fewer
  than M field points, then it is a field leaf. Any field leaf containing at
  least one source point is also a source leaf. If the cell contains no field
  points, but it contains more than one source point, then it is a source-only
  cell. A source-only cell is a source leaf if its level is equal to the field
  depth of the oct-tree, otherwise it is a source branch and it must be
  recursively subdivided in order to obtain a set of source leaves at the
  oct-tree depth.
The described strategy results in a structure in which any source point be-
longs to a source leaf, and any field panel belongs to a field leaf (some leaves
can be at the same time source leaves and field leaves). Moreover the described
structure is adaptive, i.e. the local depth (level of the field cells in a certain sub-
region) of the oct-tree is proportional to the mesh density. A sample quad-tree
structure (the 2D counterpart of the oct-tree) with depth 3 is shown in Figure
3.2. The naming convention for the cells is level-row-column. For example, 3B4
is the name of the cell at row B and column 4 at level 3. Figure 3.3 shows an
oct-tree built on top of a surface mesh of the Stanford bunny (from the Stanford
University Computer Graphics Laboratory).
During the construction of the oct-tree the field depth is a required parameter
in order to properly obtain the source leaves. However, this value is known only
when the field part of the oct-tree is completely determined. From the practical
point of view, the construction algorithm must be split in two parts. A first
recursive scheme evaluates the field depth without considering any source point
(fake oct-tree), while successively the real oct-tree is generated.
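The field part of the construction can be sketched in a few lines. This is a simplified illustration under assumptions, not the implementation used in this work: only the field points are handled (which corresponds to the first pass that determines the field depth), the class and function names are hypothetical, and a maximum level is added as a guard against coincident points.

```python
import numpy as np

class Cell:
    def __init__(self, center, half, level, idx):
        self.center, self.half, self.level = center, half, level
        self.idx = idx         # indices of the field points inside the cell
        self.children = []     # stays empty for a leaf cell

def build(points, idx, center, half, level, max_pts, max_level=20):
    """Recursively subdivide a cubic cell while it holds more than max_pts points."""
    cell = Cell(center, half, level, idx)
    if len(idx) > max_pts and level < max_level:
        # Octant code of each point: bit k is set if coordinate k >= center[k]
        octant = ((points[idx] >= center) * np.array([1, 2, 4])).sum(axis=1)
        for o in range(8):
            sub = idx[octant == o]
            if len(sub) == 0:
                continue       # empty children are not stored
            sign = np.array([(o >> k) & 1 for k in range(3)]) * 2 - 1
            cell.children.append(build(points, sub, center + sign * half / 2,
                                       half / 2, level + 1, max_pts, max_level))
    return cell

def leaves(cell):
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]

rng = np.random.default_rng(0)
# Hypothetical panel centroids: a uniform cloud plus a dense cluster near the origin
pts = np.vstack([rng.random((300, 3)), 0.1 * rng.random((300, 3))])
lo, hi = pts.min(axis=0), pts.max(axis=0)
root = build(pts, np.arange(len(pts)), (lo + hi) / 2, 0.5 * np.max(hi - lo), 0, max_pts=20)
leaf_cells = leaves(root)
field_depth = max(leaf.level for leaf in leaf_cells)
print(len(leaf_cells), field_depth)
```

The dense cluster produces leaves at deeper levels than the uniform cloud, which is the adaptivity described above. The source-only cells would be generated in a second pass, once `field_depth` is known.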
Once the oct-tree is built, it must be set up to be useful for the Fast Multipole
algorithm. That is, for each field cell a reference point for the evaluation of
multipole moments must be chosen. Similarly, a reference point for the local
expansions must be provided for each source cell. Usually, the center of the
cell is chosen for this task. However, a better choice is given respectively by
the centroids of the field points and of the source points belonging to the cell.

[Figure 3.2: A sample oct-tree. The tree is shown level by level, from level 0 to level 3, with branch cells and leaf cells marked; rows are labelled with letters and columns with numbers (A-H and 1-8 at level 3), and the resulting leaves are shown last.]

[Figure 3.3: Oct-tree built on top of a mesh of the Stanford bunny. Panels: the Stanford bunny; the surface mesh; the oct-tree; the oct-tree leaves.]

In order to determine if a field cell lies in the far-field region of a source cell,
the conditions (3.4) and (3.7) must be checked in the worst case. To do so, it
is necessary to know the field radius of each field cell, which is the maximum
distance between any point of any integration panel belonging to the cell and
the reference point for the multipole moments. Similarly, the source radius of
each source cell is the maximum distance between any source point belonging
to the cell and the reference point for the local expansion.
The far-field condition between a source cell i and a field cell j is thus

$$ d > r_s^{(i)} + r_f^{(j)} \qquad (3.17) $$

where d is the distance between the reference point for the local expansion in
cell i and the reference point for multipole moments in cell j, $r_s^{(i)}$ is
the source radius in cell i and $r_f^{(j)}$ is the field radius in cell j.
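Condition (3.17) translates directly into code. The following sketch assumes flat triangular panels and takes the centroids as reference points; the function name and the test geometry are hypothetical.

```python
import numpy as np

def far_field(src_pts, panels):
    """Check condition (3.17) for one source cell and one field cell.

    src_pts: (n_s, 3) source points belonging to cell i
    panels:  (n_p, 3, 3) vertices of the flat triangular panels of cell j
    """
    x0 = src_pts.mean(axis=0)                  # centroid of the source points
    y0 = panels.mean(axis=1).mean(axis=0)      # centroid of the panel centroids
    r_s = np.linalg.norm(src_pts - x0, axis=1).max()
    # On a flat panel the point farthest from y0 is always one of the vertices
    r_f = np.linalg.norm(panels.reshape(-1, 3) - y0, axis=1).max()
    d = np.linalg.norm(x0 - y0)
    return bool(d > r_s + r_f)

panel = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
far_sources = np.array([[5.0, 0.0, 0.0], [5.2, 0.0, 0.0]])
near_sources = np.array([[0.5, 0.5, 0.0], [0.7, 0.5, 0.0]])
print(far_field(far_sources, panel), far_field(near_sources, panel))
```

Using the centroids rather than the cell centers generally gives smaller radii, so that more cell pairs pass the test.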

3.4 The construction of cell lists


In order to formally distinguish between the near-field part and the far-field
part of the integration domain with respect to a given source cell, Cheng et al.
[7] proposed the construction of four lists of cells, which are valid if the
source points and the field reference points coincide, and thus every source
cell is also a field cell and vice versa. Here a slight modification to those
lists is proposed, in order to properly deal with the more general oct-tree
structure described in the previous section.
To this purpose, some more definitions are needed. Two cells are adjacent,
i.e. geometrically contiguous, if they share at least one point (one cell is adjacent
to itself). Two cells are well separated if they are not adjacent. Two cells are
colleagues if they are adjacent and they are at the same level. A field cell is in
the far-field of a source cell if the far-field condition (3.17) is satisfied.
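For cubic cells these geometric definitions reduce to simple tests on centers and half-sizes. The sketch below is an illustration with hypothetical function names; the small relative tolerance used to decide whether two faces touch is an assumption.

```python
import numpy as np

def adjacent(c1, h1, c2, h2, rel_tol=1e-12):
    """True if two axis-aligned cubes (center c, half-size h) share at least one point."""
    gap = np.abs(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float))
    return bool(np.all(gap <= (h1 + h2) * (1.0 + rel_tol)))

def well_separated(c1, h1, c2, h2):
    return not adjacent(c1, h1, c2, h2)

def colleagues(c1, h1, level1, c2, h2, level2):
    return level1 == level2 and adjacent(c1, h1, c2, h2)

a = (0.5, 0.5, 0.5)
print(adjacent(a, 0.5, a, 0.5))                    # a cell is adjacent to itself
print(adjacent(a, 0.5, (1.5, 1.5, 1.5), 0.5))      # cubes touching at a single corner
print(well_separated(a, 0.5, (2.5, 0.5, 0.5), 0.5))
```

The same test works for cells of different size, which is needed for List 3 and List 4, where a leaf is compared with cells at other levels.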
Using the definitions just introduced, four lists are created.
• A field leaf j belongs to List 1 of a source leaf i if i and j are adjacent.
• A field cell j belongs to List 2 of a source cell i if i and j are well
  separated and one of the following conditions is satisfied:
  - i and j are at the same level, and the parent of j is a colleague of
    the parent of i and does not belong to the far-field of i (either it is
    adjacent to i or it is well separated from i but it does not satisfy the
    far-field condition);
  - j is a colleague of the parent of i, is well separated from i and is in the
    far-field of i.

  These conditions are mutually exclusive, and simply state that for a source
  cell at level l it is desirable to have in List 2, wherever possible, a field
  cell at level l − 1 instead of the set of its children at level l. This is a
  modification to the classical definition of List 2 which is useful in
  order to reduce the number of M2L operations required during the Fast
  Multipole multiplication.

• A leaf j belongs to List 3 of a source leaf i if it is a descendant of a
  colleague k of i, if it is well separated from i and if its parent is adjacent
  to i.

• A field leaf j belongs to List 4 of a source leaf i if i belongs to List 3 of j.

It is worth emphasizing some properties of the defined lists. List 1 and List
2 are typical of non adaptive oct-tree structures (all leaves are at the same level
since the algorithm is stopped at the depth of the bigger leaf), while List 3 and
List 4 can be non-empty only when the adaptive algorithm described in the
previous section is employed.
List 1, List 3 and List 4 are defined only on source leaves. Moreover, List 1
and List 4 contain only field leaves. List 2 is defined on both source leaves and
source branches.
In order to simplify the description of the Fast Multipole algorithm it is
convenient to define two lists which are functions of the four lists just introduced.
The interaction list of a source cell, be it a branch or a leaf cell, is given by the
union of List 2 and the list of field cells in List 1, List 3 and List 4 which satisfy
the far-field condition. The interaction list reduces to List 2 alone for source
branch cells. The near-field list of a source leaf cell is given by the list of field
cells in List 1, List 3 and List 4 which do not satisfy the far-field condition.
The definition of these lists allows, for each source leaf cell, to define a com-
plete covering of the integration domain by means of field cells. This covering
formally defines a near-field part, given by the near-field list of the source cell
itself, and a far-field part, given by the union of the interaction lists of the cell
and of all the branch cells belonging to its family, as shown in Figure 3.4 for cell
3G3 of the sample oct-tree of Figure 3.2.

29
Introduction to the Fast Multipole Method

Interaction list 2D2

Interaction list 3G3


Near field list 3G3

Source cell 3G3 (family: 2D2-1B1-root)

Figure 3.4: Near-field and far-field identification

3.5 The matrix-vector multiplication


In order to perform a Fast Multipole accelerated matrix-vector multiplication
it is necessary to introduce the specific formulation for the Laplace problem of
the equations introduced in section 3.2.
Equation (3.1), which corresponds to the ith component of the matrix-vector
multiplication, can be rewritten in the form

$$ I_i = \int_{\Gamma} \left[ G(x_i, y)\, E_n(y; E_{n,1}, \ldots, E_{n,p}) - K(x_i, y)\, V(y; V_1, \ldots, V_q) \right] dS(y) \qquad (3.18) $$

if the value of V on $\Gamma_V$ and the value of $E_n$ on $\Gamma_{E_n}$ are set to zero.
The approximation in the evaluation of eqn. (3.1) in the far-field region
requires an expansion of the kernel G in the form

$$ G(x, y) = \frac{1}{4\pi r} = \frac{1}{4\pi} \sum_{n=0}^{\infty} \sum_{m=-n}^{n} S_{n,m}(\overrightarrow{Ox})\, R_{n,m}(\overrightarrow{Oy}), \qquad |\overrightarrow{Oy}| < |\overrightarrow{Ox}| \qquad (3.19) $$

where $R_{n,m}$ and $S_{n,m}$ are defined in terms of the associated Legendre
functions $P_n^m$ in spherical coordinates

$$ R_{n,m}(\overrightarrow{Ox}) = \frac{1}{(n+m)!}\, P_n^m(\cos\theta)\, e^{im\phi}\, r^n, \qquad S_{n,m}(\overrightarrow{Ox}) = (n-m)!\, P_n^m(\cos\theta)\, e^{im\phi}\, \frac{1}{r^{n+1}} \qquad (3.20) $$

and can be evaluated using a recursive relation.
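Similarly, the kernel K can be written in the form

$$ K(x, y) = \frac{\partial G(x, y)}{\partial y_i}\, n_i(y) = \frac{1}{4\pi} \sum_{n=0}^{\infty} \sum_{m=-n}^{n} S_{n,m}(\overrightarrow{Ox})\, n_i(y)\, \frac{\partial}{\partial y_i} R_{n,m}(\overrightarrow{Oy}), \qquad |\overrightarrow{Oy}| < |\overrightarrow{Ox}|. \qquad (3.21) $$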
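The integral over the surface $\Gamma_k$ belonging to the kth field leaf can then
be approximated by eqn. (3.10), which reduces to

$$ I_i^{(\Gamma_k)} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} S_{n,m}(\overrightarrow{Ox_i})\, M_{n,m}(O) \qquad (3.22) $$

where the Φ2M formula (3.11) is given by

$$ M_{n,m}(O) = \int_{\Gamma_k} \left[ R_{n,m}(\overrightarrow{Oy})\, E_n(y) - n_i(y)\, \frac{\partial}{\partial y_i} R_{n,m}(\overrightarrow{Oy})\, V(y) \right] dS(y). \qquad (3.23) $$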
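The local expansion (3.12) is rewritten in the form

$$ I_i^{(\Gamma_k)} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} R_{n,m}(\overrightarrow{x_0 x})\, L_{n,m}(x_0) \qquad (3.24) $$

where the M2L formula (3.13) is

$$ L_{n,m}(x_0) = \sum_{n'=0}^{\infty} \sum_{m'=-n'}^{n'} (-1)^{n}\, S_{n'+n,\,m'+m}(\overrightarrow{Ox_0})\, M_{n',m'}(O). \qquad (3.25) $$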

The M2M formula (3.14) is finally specialized to

$$ M_{n',m'}(O') = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} R_{n,m}(\overrightarrow{O'O})\, M_{n'-n,\,m'-m}(O) \qquad (3.26) $$

while the L2L formula (3.15) is given by

$$ L_{n,m}(x_1) = \sum_{n'=n}^{\infty} \sum_{m'=-n'}^{n'} R_{n'-n,\,m'-m}(\overrightarrow{x_0 x_1})\, L_{n',m'}(x_0). \qquad (3.27) $$

At this point, a Fast Multipole accelerated matrix-vector multiplication is
performed in five steps:

1. The multipole moments centered in the centroid of fields are evaluated in
   each field leaf, by collecting the contributions of the associated
   integration panels using the Φ2M formula (3.23);

2. Starting from the field leaves, the multipole moments in a generic field
   branch are evaluated by collecting the contributions of its children through
   the M2M formula (3.26), where the reference point shift is from the
   centroid of fields in the child to the centroid of fields in the considered
   cell. This step is repeated up to the cells at level 1;

3. Starting from level 2, the local expansion corresponding to each source
   cell (be it a branch or a leaf cell) is evaluated by summing two kinds of
   contributions. The first part derives from the parent cell, and is shifted
   from the centroid of sources of the parent to the centroid of sources of
   the considered cell using the L2L formula (3.27). The second part is the
   contribution to the local expansion of the integration domain belonging to
   the interaction list. This is evaluated by applying the M2L formula (3.25)
   from the centroid of fields of any of the cells in the list to the centroid
   of sources of the considered cell. Since the interaction list of the root
   cell and of cells of level 1 is empty, the L2L contribution for cells at
   level 2 is null. Moreover, the fact that the field cells belonging to the
   interaction list of source cells of level 2 are at least of level 1 explains
   why the former step did not involve the root cell.

4. The L2I formula (3.24) is applied to each source point, using the local
   expansion of the corresponding source leaf. This step makes it possible to
   evaluate the far-field part of each integral.

5. The near-field part of each integral is evaluated using a direct integration
   strategy over the panels belonging to the near-field list of the source cell
   corresponding to the considered source point.


The first two steps (multipole moment collection in field leaves and multipole
moment evaluation in field branches) are referred to as the upward pass, since
the field part of the oct-tree is spanned from leaves to root. The last three
steps (evaluation of local expansions, evaluation of the far-field integral and
evaluation of the near-field integral) are instead called the downward pass,
since the source part of the oct-tree is spanned from root to leaves.
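The five steps can be followed end to end on a deliberately simplified model. The sketch below is a toy under stated assumptions, not the 3D implementation described in this chapter: it works in one dimension with the kernel $1/(x - y)$ and Taylor-type expansions, uses a uniform binary tree on [0, 1) with the cell centers as poles, replaces panels with point strengths, and takes the classical same-level interaction list (children of the parent's neighbours which are not adjacent), so the M2M step stops at level 2. All names are hypothetical.

```python
import numpy as np
from math import comb

def toy_fmm(x, y, q, depth=4, p=20):
    """Toy 1D FMM for I(x_i) = sum_j q_j / (x_i - y_j), all points in [0, 1)."""
    center = lambda l, k: (k + 0.5) / 2 ** l
    leaf = lambda pts: np.minimum((pts * 2 ** depth).astype(int), 2 ** depth - 1)
    ys, xs = leaf(y), leaf(x)
    M = [np.zeros((2 ** l, p)) for l in range(depth + 1)]
    L = [np.zeros((2 ** l, p)) for l in range(depth + 1)]

    # Step 1 (Phi2M): moments of each field leaf about its center
    for k in range(2 ** depth):
        s = y[ys == k] - center(depth, k)
        M[depth][k] = [np.sum(q[ys == k] * s ** n) for n in range(p)]

    # Step 2 (M2M): shift the moments of the children to the parent center
    for l in range(depth - 1, 1, -1):
        for k in range(2 ** l):
            for c in (2 * k, 2 * k + 1):
                s = center(l + 1, c) - center(l, k)
                for n in range(p):
                    M[l][k, n] += sum(comb(n, m) * s ** (n - m) * M[l + 1][c, m]
                                      for m in range(n + 1))

    # Step 3 (L2L + M2L): local expansions from level 2 down to the leaves
    for l in range(2, depth + 1):
        for k in range(2 ** l):
            if l > 2:
                s = center(l, k) - center(l - 1, k // 2)
                for m in range(p):
                    L[l][k, m] = sum(comb(n, m) * s ** (n - m) * L[l - 1][k // 2, n]
                                     for n in range(m, p))
            first, last = max(2 * (k // 2) - 2, 0), min(2 * (k // 2) + 3, 2 ** l - 1)
            for j in range(first, last + 1):
                if abs(j - k) > 1:   # well separated cell of the interaction list
                    d = center(l, k) - center(l, j)
                    for m in range(p):
                        L[l][k, m] += sum((-1) ** m * comb(n + m, m) * M[l][j, n]
                                          / d ** (n + m + 1) for n in range(p))

    # Steps 4-5 (L2I + direct near-field sum over the adjacent leaves)
    out = np.empty(len(x))
    for i in range(len(x)):
        k = xs[i]
        t = x[i] - center(depth, k)
        far = sum(L[depth][k, m] * t ** m for m in range(p))
        near = np.abs(ys - k) <= 1
        out[i] = far + np.sum(q[near] / (x[i] - y[near]))
    return out

rng = np.random.default_rng(0)
y, q, x = rng.random(200), rng.random(200), rng.random(150)
fast = toy_fmm(x, y, q)
direct = (q[None, :] / (x[:, None] - y[None, :])).sum(axis=1)
err = np.max(np.abs(fast - direct)) / np.max(np.abs(direct))
print(err)
```

Steps 1 and 2 form the upward pass and steps 3 to 5 the downward pass. In this polynomial setting the M2M and L2L shifts are exact, so the only approximation comes from truncating the M2L conversion at order p.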

3.6 Computational efficiency of the Fast Multipole Method

In previous sections, it has been pointed out that the computational complexity
of classical Boundary Element implementations is of order $O(n^2)$, where n is
the problem size, if an iterative solver is employed. From the practical point
of view, the performance is strongly dependent on the implementation. If the
problem is very small, the system matrix can be completely stored in main
memory, and the matrix-vector multiplication time is negligible compared with
the matrix setup time. For bigger problems, it could still be convenient to
store the system matrix on a larger storage device, such as a hard disk, and
read it back at each matrix-vector multiplication. This option is limited both
by the speed of the device (bandwidth bottleneck) and by the matrix size (size
bottleneck), which is also of order $O(n^2)$. When one of these bottlenecks is
reached, the only option left is to re-evaluate a part of the system matrix at
each multiplication, without storing it. This solution quickly becomes
computationally unbearable.
The computational efficiency of the Fast Multipole Method can be evaluated
for a fixed order of expansion p starting from the consideration that, for a
fixed number m of field panels per leaf, the number of oct-tree leaves $n_l$ is
of order $O(n/m)$ and the oct-tree depth is of order $O(\log(n/m))$. Also, the
total number of oct-tree cells is of order $O((n/m) \log(n/m))$.
This leads to a computational cost of order O(n) for the multipole moments
collection by the Φ2M operation, and for the evaluation of the integrals from
the local expansions by the L2I operation. Moreover, for each cell the cost of
the collection of the local expansion coefficients from the interaction list is
independent of the size of the problem, but the number of interaction lists
spanned by the FMM algorithm for each source leaf depends on the depth of the
oct-tree, which grows logarithmically. Therefore the computational complexity
of the far-field matrix-vector multiplication is of order $O(n \log^{\alpha} n)$,
where α is a positive number.
Due to the fact that the near-field list of each source leaf has a finite size
which does not increase with the mesh size, the near-field matrix size is of
order O(n × m). For this reason the computational cost of the near-field
matrix-vector multiplication is linearly dependent on the problem size.
For these reasons the total computational complexity of the adaptive Fast
Multipole technique described above is of order $O(n \log^{\alpha} n)$, which
from the practical point of view is an almost linear behaviour.

3.7 Implementation details


Given the complex nature of the Fast Multipole algorithms, the optimal
implementation depends on the hardware specifications of the target machine.
Basically, both the CPU power and the memory/storage subsystem performance
should be carefully considered during the implementation phase in order to
obtain an optimal algorithm. This especially concerns the near-field part of
the matrix-vector multiplication: the memory required for the storage of the
oct-tree with the associated multipole moments and local expansion coefficients
is negligible compared to the space required for the near-field matrix, so the
far-field multiplication is almost totally CPU dependent.
Depending on the relative performance of the hardware components, a
different strategy can be adopted. It is always convenient to store as many
coefficients as possible in main memory, in order to avoid the recalculation of
lengthy direct integrations. The choice of whether to use a secondary storage
device to save the coefficients which cannot be stored in main memory is driven
by the relative performance of the device and of the CPU in terms of
coefficient throughput. Nowadays, a quite inexpensive dedicated RAID 0
array (two or more hard drives which work in parallel) can easily outperform
a multiprocessor configuration in terms of matrix coefficient throughput. Due
to the fact that the near-field matrix size is linearly dependent on the problem
size, the storage device space is rarely a bottleneck. Moreover, since during
I/O operations the CPU is almost idle, a clever multithreaded implementation of
the algorithm could exploit the spare CPU cycles to perform the far-field
multiplication at the same time.
As a side note, the Collocation Boundary Element Method usually requires
the evaluation of the free-term c at non-smooth points, which depends on the
solid angle enclosed by the boundary at the collocation point. In classical BEM
implementations, the explicit evaluation of the solid angle is avoided by
exploiting the fact that constant potentials belong to the null-space of the
double-layer operator, therefore the diagonal coefficients of the double-layer
matrix (which contain the free-terms) can be obtained from the remaining
coefficients. This is obviously not possible when the Fast Multipole Method is
employed, therefore the required solid angles must be explicitly evaluated (see
e.g. Mantic [23]).

3.8 Numerical examples


3.8.1 Spherical conductors
The first benchmark problem makes it possible to evaluate the scalability of
the Fast Multipole Method for increasing mesh sizes. The normal flux generated
by a group of spherical perfect conductors charged at different electrostatic
potentials is evaluated by means of the FMM-BEM technique. Each spherical
surface is meshed with approximately 5900 triangular elements. The problem is
solved for configurations of 2, 4, 8, 16, 32 and 64 spheres. The number of field
elements for each oct-tree cell is set to m = 40, which leads to an average of
little more than 200 near-field coefficients per equation, and the maximum order
of multipole approximation is p = 6, which provides a good engineering accuracy.
The tolerance for the residual in the GMRES solver is set to $\eta = 10^{-4}$.
The total time required for the solution of a FMM-BEM problem can be
divided in three parts: the construction of the oct-tree and of the auxiliary
data structures, the evaluation of the precalculated near-field coefficients
(which can be avoided at the expense of a complete recalculation at each
matrix-vector multiplication), and the time necessary to perform the FMM
matrix-vector multiplications required by the iterative solver. Each of these
steps can be more or less computationally expensive, and thus can acquire more
or less relative importance, depending on the problem size and on the required
Fast Multipole parameters. In the present benchmark the complete near-field
matrix is precalculated and stored as a sparse matrix, with a memory occupation
of 12 bytes per coefficient.
Table 3.1 shows the time Tmv required for a matrix-vector multiplication as a
function of the matrix size, together with the relative matrix-vector multiplica-
tion and solution times with respect to those of the 2 conductors configuration.
The Fast Multipole Method scales linearly at each matrix-vector multiplication.
However, the total solution time depends on the total number of iterations,
which increases with the mesh size for a fixed tolerance of the GMRES solver.
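The scaling statement can be checked directly against the figures of Table 3.1. The following sketch fits a power law (time proportional to DOF raised to some exponent) by least squares in log-log space; the helper `loglog_slope` is a name introduced here for illustration, and the data are simply copied from the table.

```python
import math

# Data copied from Table 3.1.
dof = [11768, 23552, 47098, 94206, 188430, 376838]
t_mv = [1.67, 3.36, 6.92, 14.00, 28.10, 58.31]   # seconds per matrix-vector product
n_mv = [7, 9, 11, 14, 17, 20]                    # matrix-vector products per solve


def loglog_slope(x, y):
    """Least-squares slope of log(y) versus log(x), i.e. the power-law exponent."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


# Exponent for a single product, and for product time multiplied by iteration count.
slope_mv = loglog_slope(dof, t_mv)
slope_total = loglog_slope(dof, [t * n for t, n in zip(t_mv, n_mv)])

print(f"exponent, single matrix-vector product: {slope_mv:.2f}")
print(f"exponent, product time x iterations:   {slope_total:.2f}")
```

An exponent close to one for the single product is consistent with linear scaling, while the larger exponent of the second fit reflects the growth of the iteration count with the mesh size.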
Table 3.2 shows the oct-tree size together with the setup times required for
both the oct-tree construction (Tot ) and the near-field matrix (Tnf ).
The dimensions of the near-field matrix are well under the RAM size of the


Spheres   DOF       Tmv (s)   Rel Tmv   nmv   Rel Tmv · nmv
2         11768     1.67      1.00      7     1.00
4         23552     3.36      2.01      9     2.58
8         47098     6.92      4.14      11    6.51
16        94206     14.00     8.38      14    16.76
32        188430    28.10     16.57     17    40.24
64        376838    58.31     34.91     20    99.74

Table 3.1: Benchmark problem: matrix-vector multiplication time

Spheres   Oct-tree cells   Near-field matrix size (MB)   Tot (s)    Tnf (s)
2         1045             47.7                          1.47       5.98
4         2095             107.7                         5.75       15.21
8         4187             209.5                         26.93      45.65
16        8265             398.2                         136.53     189.12
32        16340            981.9                         531.20     713.21
64        33098            1791.1                        2133.01    2633.93

Table 3.2: Benchmark problem: problem setup times

workstation used to solve the problem, so the near-field matrix-vector
multiplication is almost instantaneous. The oct-tree setup time is dominated
by the time required to build the near-field and interaction lists. If a
generic algorithm which spans the whole oct-tree is employed to build the lists
for each cell, this time grows with the square of the number of cells and can
quickly become a serious bottleneck. This suggests that an efficient implementation
of the list construction algorithm should exploit the oct-tree structure to limit
the search to a neighborhood of the considered cell.
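One common way to restrict the search is sketched below. It is an illustration under simplifying assumptions, not the implementation used in this work: the oct-tree is taken to be uniform, each cell of a level is identified by a hypothetical integer triple (i, j, k), and the non-empty cells of a level are kept in a set so that membership tests take constant time. The interaction list of a cell is then obtained from the children of the neighbours of its parent, instead of scanning the whole level.

```python
from itertools import product


def neighbours(cell, occupied):
    """Non-empty cells of the same level touching `cell` (the cell itself included)."""
    i, j, k = cell
    return [
        (i + di, j + dj, k + dk)
        for di, dj, dk in product((-1, 0, 1), repeat=3)
        if (i + di, j + dj, k + dk) in occupied
    ]


def interaction_list(cell, occupied, occupied_parent_level):
    """Children of the parent's neighbours that are not adjacent to `cell`."""
    parent = tuple(c // 2 for c in cell)
    near = set(neighbours(cell, occupied))
    far = []
    for parent_neighbour in neighbours(parent, occupied_parent_level):
        for octant in product((0, 1), repeat=3):
            child = tuple(2 * p + o for p, o in zip(parent_neighbour, octant))
            if child in occupied and child not in near:
                far.append(child)
    return far


# Usage on a fully populated tree: level 3 has 8 x 8 x 8 cells, level 2 has 4 x 4 x 4.
level3 = set(product(range(8), repeat=3))
level2 = set(product(range(4), repeat=3))

n_near = len(neighbours((3, 3, 3), level3))
n_far = len(interaction_list((3, 3, 3), level3, level2))
print(n_near, n_far)
```

Each cell inspects at most 27 parent neighbours and their 8 children, so the cost of building all the lists grows linearly with the number of cells rather than quadratically.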

3.8.2 Parallel-plate condenser with moving boundary


Three squared plates charged at different potentials are surrounded by a grounded
enclosure, as shown in Figure 3.5. The total electrostatic force acting on the
central plate is evaluated for different positions of the plate itself. The results
obtained with several Boundary Element meshes are compared with Finite Element
solutions obtained with Femlab v3.1, using 10-node tetrahedral elements.

Figure 3.5: Three-plates problem geometry

The advantage of the Boundary Element method during the modelling stage
is clear, since the problems solved differ only by a rigid translation of the
surface mesh corresponding to the central plate. Moreover, the presence of sharp
edges requires a rather fine boundary discretization to properly approximate
the normal flux, which is singular. The corresponding Finite Element meshes
can quickly become huge, with the consequent resource requirements.
The geometry of the problem is summarized in Table 3.3, together with some
details of the FEM and BEM meshes solved.
A fundamental parameter in the comparison between the FEM and the BEM
discretizations is the number of boundary elements obtained by restricting
the FEM mesh to the boundary. As a matter of fact, the finest of the FEM
meshes used defines fewer than 20000 triangles on the boundary, although with
a higher polynomial degree than the corresponding piecewise constant elements
used with the BE approach.
The x and z components of the total force acting on the central plate are
plotted as functions of the plate’s offset in Figure 3.6. The first graph clearly
shows that the correct evaluation of the x component of the force requires a very
detailed discretization over the short edge of the central plate. The z component
is less demanding in terms of DOFs, although even the finest FEM mesh solved
leads to a visible error. This benchmark confirms the better accuracy delivered
by the BEM over the FEM for comparable surface meshes.
More carefully designed FEM meshes could lead to better results for the
same number of DOFs, but on the other hand they would require a significant
amount of modelling work, especially for complex geometries like the ones of

[Plot: two panels, "Total force, x" and "Total force, z", showing the force on the central plate against the central plate offset (from -1 to 0) for the meshes BEM 1-3 and FEM 1-4.]

Figure 3.6: Three plates example; total force on central plate

Geometry
  Plate size                 1 × 0.1 × 0.1
  Gap between plates         0.1
  External enclosure size    4 × 2 × 2
Meshes, DOF
  BEM 1                      1802
  BEM 2                      25756
  BEM 3                      244467
  FEM 1                      22731
  FEM 2                      89441
  FEM 3                      287266
  FEM 4                      914308

Table 3.3: Three plates problem; model details

MEMS. Moreover, the required mesh sizes could quickly grow to the order of
hundreds of millions of unknowns. The corresponding FMM-BEM meshes would
be significantly smaller and well within the reach of a typical workstation.

3.8.3 Comb finger resonator


A lateral comb-drive resonator similar to the one shown in Figure 3.7, whose
dimensions are listed in Table 3.4, is solved for different positions of the movable
rotor. As the image indicates, the allowed direction of motion is parallel
to the finger axis, and the device is designed to exert a constant electrostatic
force on the rotor along its direction of motion, regardless of its position. This
example analyses the effect of a lateral deviation (along the y axis) of the rotor.
The permittivity is assumed to be $\varepsilon = \varepsilon_0 = 8.85418781 \cdot 10^{-12}\ \mathrm{C^2\,N^{-1}\,m^{-2}}$.
First, a single-finger problem is solved. The assigned potential is $V_s = 1$ V
on the stator, while the rotor (the single finger) is grounded. The four meshes
shown in Figure 3.8 have been solved, and the x and y components of the
total force acting on the rotor are reported in Figure 3.9 as functions of the y
displacement of the rotor from the central position.
The results clearly show that the three coarser meshes are not able to prop-
erly approximate the total force in x direction, which depends on the electric
field at the finger tip, while the y component is surprisingly well approximated
even using the coarsest mesh. In order to check that the fourth mesh actually


Dimensions (µm)
  Finger gap        2.88
  Finger length     39.96
  Finger overlap    19.44
  Tether length     151
  Tether width      1.1
  Thickness         1.96
  Substrate gap     2

Table 3.4: Resonator geometry (Wang [40])

converged, the comparison with a reference mesh which has been further refined
at the finger tip, for a total of 313064 DOFs, is reported in the first graph.
The complete structure, characterized by 15 fingers for each side of the rotor,
is then solved. One of the two stators has an assigned potential Vs1 = 1, while
the rotor and the second stator are grounded. The induced electrostatic field is
thus concentrated along the fingers of the charged stator and the corresponding
fingers of the rotor, therefore the meshes are refined only in that region. Five
meshes have been used, for a total of 9602, 29802, 97366, 376748 and 795532
degrees of freedom, respectively.
The x and y components of the total force acting on the rotor are reported
in Figure 3.10 as functions of the y displacement of the rotor from the central
position, together with the force acting on 15 separate fingers.
This example confirms the need for very refined meshes in order to properly
approximate the singularities of the electrostatic field.


Figure 3.7: Lateral comb-drive resonator (Wang [40])

Figure 3.8: Meshes for the single finger problem


[Plot: two panels, "Total force - x component" (Mesh 1 to Mesh 4 and the reference mesh) and "Total force - y component" (Mesh 1 to Mesh 4), showing the force against the rotor displacement (from -0.3 to 0.3).]

Figure 3.9: Total force on the single finger configuration


[Plot: two panels, "Total force - x component" and "Total force - y component", showing the force against the rotor displacement (from -0.3 to 0.3) for Mesh 1, Mesh 3 and Mesh 5, together with the results for 15 separate fingers.]

Figure 3.10: Total force on the full comb finger resonator

Chapter 4

A FMM accelerated Boundary Element technique for scalar evolutive problems in 3D

This chapter presents a Boundary Element procedure for the solution of evolutive
problems based on the use of the time-independent kernels of the Laplace
equation. The resulting time-dependent mixed boundary-domain integral equations
are solved using standard finite difference time-marching schemes and a
Fast Multipole accelerated solver.

4.1 Introduction
The classical Boundary Element Method, both in the collocation and the varia-
tional versions, has essentially failed to live up to the initial high expectations,
especially in dynamics, for various reasons some of which are briefly summarized
in what follows. (i) The BEM produces fully populated matrices which make
the application of direct solvers unrealistic for large scale problems; moreover
the generation of such matrices is numerically costly and hence iterative solvers
do not bring much benefit to the method as is. (ii) The BEM applied to dy-
namic analyses with the appropriate time-dependent kernels yields a series of
still unresolved issues related to instabilities (see e.g. Frangi et al. [10] and
Peirce et al. [29]) and lengthy convolutions.
This has motivated numerous investigations on boundary-domain methods
(BDM) employing static kernels. In particular, a number of techniques, such
as the dual-reciprocity method (DRM) and the multiple-reciprocity method
(MRM) have been developed aiming at eliminating the need to compute costly
domain integrals by transforming them to equivalent boundary integrals (see
e.g. the review in Ingber et al. [18]).
With respect to point (i), the adoption of Fast Multipole accelerated iterative
solvers seems to be changing the situation drastically. Indeed, they reduce
the operation count per iteration to approximately $O(n)$ (to be compared with
the $O(n^2)$ of the classical approach), where $n$ is the problem size. The FMM can
be considered as an efficient tool for evaluating the contributions to the integral
equations relevant to regions far away from the collocation point, the near-field
contributions being evaluated by means of classical BEM tools.
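To give a feeling for the quadratic cost of the classical approach, the short sketch below estimates the memory that a fully populated matrix would need for the largest spherical-conductor benchmark of Chapter 3. The 8 bytes per coefficient (double precision) is an assumption made here; the problem size and the near-field matrix size are the values reported in Tables 3.1 and 3.2.

```python
def dense_matrix_bytes(n, bytes_per_coefficient=8):
    """Storage of a fully populated n x n matrix (8 bytes assumes double precision)."""
    return bytes_per_coefficient * n * n


n_dof = 376838            # 64-sphere benchmark, Table 3.1
near_field_mb = 1791.1    # stored near-field matrix, Table 3.2

dense_gb = dense_matrix_bytes(n_dof) / 1e9
print(f"fully populated matrix: about {dense_gb:.0f} GB")
print(f"stored near-field part: about {near_field_mb / 1e3:.1f} GB")
```

The gap of roughly three orders of magnitude is what makes the assembled classical matrix impractical at this problem size.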
Moreover, a recent contribution (Ingber et al. [18]) comparing different
approaches for the evaluation of domain integrals, has pointed out the superior
performance of truly BDM approaches over DRM and MRM, both in terms of
accuracy and computing time.
In what follows both the scalar parabolic equation and the scalar wave equa-
tion are treated by applying the BDM-FMM to evaluate directly and efficiently
the relevant integrals.

4.2 The boundary-domain integral formulation for the diffusion problem
The three-dimensional time-dependent heat diffusion problem is governed by
the scalar parabolic equation

$$ \kappa \Delta u(\mathbf{x},t) - \dot{u}(\mathbf{x},t) = 0 \qquad \forall (\mathbf{x},t) \in \Omega \times [0,T] \tag{4.1} $$

where the thermal diffusivity

$$ \kappa = \frac{k}{\rho C_p} \tag{4.2} $$

is a function of the thermal conductivity $k$, the density $\rho$ and the specific heat $C_p$.
The initial condition

$$ u(\mathbf{x},0)\big|_{\Omega} = u_0(\mathbf{x}) \tag{4.3} $$

and the boundary conditions

$$ u(\mathbf{x},t)\big|_{\Gamma_u} = \bar{u}(t), \qquad p(\mathbf{x},t)\big|_{\Gamma_p} = \frac{\partial u}{\partial n}(\mathbf{x},t)\Big|_{\Gamma_p} = \bar{p}(t) \qquad \forall t > 0 \tag{4.4} $$

complete the problem definition.


In order to develop a mixed boundary-domain integral formulation, the second
Green identity is applied to (4.1), using the fundamental solution of the
Laplace equation

$$ G(\mathbf{x},\mathbf{y}) = \frac{1}{4\pi}\frac{1}{r}, \qquad r = \|\mathbf{x}-\mathbf{y}\| \tag{4.5} $$

and its normal derivative

$$ K(\mathbf{x},\mathbf{y}) = \nabla_y G(\mathbf{x},\mathbf{y}) \cdot \mathbf{n}(\mathbf{y}) = -\frac{1}{4\pi}\frac{\mathbf{r}\cdot\mathbf{n}_y}{r^3} \tag{4.6} $$

The resulting integral equation

$$ -\frac{1}{\kappa}\int_{\Omega} G(\mathbf{x},\mathbf{y})\,\dot{u}(\mathbf{y},t)\,dV(\mathbf{y}) - \int_{\Gamma} K(\mathbf{x},\mathbf{y})\,u(\mathbf{y},t)\,dS(\mathbf{y}) + \int_{\Gamma} G(\mathbf{x},\mathbf{y})\,p(\mathbf{y},t)\,dS(\mathbf{y}) = c\,u(\mathbf{x},t) \tag{4.7} $$

is still a differential equation in the time domain, and it is characterized by the
presence of both surface and volume integrals. The Dual Reciprocity approach aims
at transforming the domain integral by approximating the internal unknown
variable $\dot{u}$. On the contrary, in the present work the domain integral is directly
evaluated through an automatically generated internal discretization.
The unknown quantities are both the unassigned boundary fields and the
value of the first time derivative $\dot{u}$ of the primary unknown in the interior of the
domain $\Omega$. Due to the presence of the free term, the value of $u$ at internal
points is also present, but the latter quantity can be written in terms of $\dot{u}$ and
of the initial condition using the integral relation

$$ u(\mathbf{x},t) = \int_0^t \dot{u}(\mathbf{x},\tau)\,d\tau + u_0(\mathbf{x}). \tag{4.8} $$

4.3 Time discretization of the boundary-domain integral equation for the diffusion problem
The discretization of (4.7) in the time domain is performed by defining a sequence
$t_0, t_1, \dots, t_N$ of time knots.


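In order to distinguish between domain and boundary quantities, which are
modeled as independent fields, specific subscripts (respectively $\Omega$ and $\Gamma$) are
introduced.
Then, the equation is solved at time $t = t_0 = 0$ in order to obtain the internal
quantity $\dot{u}_{\Omega}(\mathbf{x},0)$ and the unassigned boundary fields $u_{\Gamma}(\mathbf{x},0)|_{\Gamma_p}$ and $p_{\Gamma}(\mathbf{x},0)|_{\Gamma_u}$
as functions of the assigned boundary conditions $\bar{u}_{\Gamma}(\mathbf{x},0)|_{\Gamma_u}$ and $\bar{p}_{\Gamma}(\mathbf{x},0)|_{\Gamma_p}$ and
of the initial condition $u_{\Omega}(\mathbf{x},0)$.
Splitting the boundary integrals of (4.7) into their $\Gamma_u$ and $\Gamma_p$ parts and moving the assigned quantities to the right-hand side, the resulting integral equation for step 0 reads

$$ -\frac{1}{\kappa}\int_{\Omega} G\,\dot{u}_{\Omega,0}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,0}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,0}\,dS = c\,\bar{u}_{\Gamma,0} + \int_{\Gamma_u} K\,\bar{u}_{\Gamma,0}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,0}\,dS \tag{4.9} $$

for $\mathbf{x} \in \Gamma_u$,

$$ -\frac{1}{\kappa}\int_{\Omega} G\,\dot{u}_{\Omega,0}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,0}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,0}\,dS - c\,u_{\Gamma,0} = \int_{\Gamma_u} K\,\bar{u}_{\Gamma,0}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,0}\,dS \tag{4.10} $$

for $\mathbf{x} \in \Gamma_p$, and

$$ -\frac{1}{\kappa}\int_{\Omega} G\,\dot{u}_{\Omega,0}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,0}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,0}\,dS = c\,\bar{u}_{\Omega,0} + \int_{\Gamma_u} K\,\bar{u}_{\Gamma,0}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,0}\,dS \tag{4.11} $$

for $\mathbf{x} \in \Omega \setminus \Gamma$.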
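Once the initial quantities are known, it is possible to start the time-marching
scheme. For a given time knot $t_{k+1}$, eqn. (4.7), with the boundary integrals split into their $\Gamma_u$ and $\Gamma_p$ parts and the assigned quantities moved to the right-hand side, reads

$$ -\frac{1}{\kappa}\int_{\Omega} G\,\dot{u}_{\Omega,k+1}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,k+1}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,k+1}\,dS = c\,\bar{u}_{\Gamma,k+1} + \int_{\Gamma_u} K\,\bar{u}_{\Gamma,k+1}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,k+1}\,dS \tag{4.12} $$

for $\mathbf{x} \in \Gamma_u$,

$$ -\frac{1}{\kappa}\int_{\Omega} G\,\dot{u}_{\Omega,k+1}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,k+1}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,k+1}\,dS - c\,u_{\Gamma,k+1} = \int_{\Gamma_u} K\,\bar{u}_{\Gamma,k+1}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,k+1}\,dS \tag{4.13} $$

for $\mathbf{x} \in \Gamma_p$, and

$$ -\frac{1}{\kappa}\int_{\Omega} G\,\dot{u}_{\Omega,k+1}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,k+1}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,k+1}\,dS - c\,u_{\Omega,k+1} = \int_{\Gamma_u} K\,\bar{u}_{\Gamma,k+1}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,k+1}\,dS \tag{4.14} $$

for $\mathbf{x} \in \Omega \setminus \Gamma$.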
Since the quantities at step $k$ are known when step $k+1$ is solved, (4.8)
can be rewritten in the form

$$ u(\mathbf{x},t_{k+1}) = \int_{t_k}^{t_{k+1}} \dot{u}(\mathbf{x},\tau)\,d\tau + u_k(\mathbf{x}) \tag{4.15} $$

which can be discretized with the time difference scheme

$$ u_{k+1} = \Delta t\,\left[\alpha\,\dot{u}_{k+1} + (1-\alpha)\,\dot{u}_k\right] + u_k \tag{4.16} $$

Different time-marching schemes can be adopted by using different values of
the coefficient $\alpha$: Forward Euler ($\alpha = 0$), Backward Euler ($\alpha = 1$), or Central
Difference ($\alpha = 1/2$).
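The role of the coefficient in the time difference scheme can be illustrated on a scalar model problem. The sketch below is not the boundary-domain solver: it replaces the integral equation with the hypothetical relation du/dt = -lam * u (an assumption made only for this example), keeps the rate as the quantity solved for at each step, as in the formulation above, and then updates u with the weighted formula u_new = u + dt * (alpha * rate_new + (1 - alpha) * rate_old).

```python
import math


def theta_march(lam, u0, dt, n_steps, alpha):
    """March du/dt = -lam * u with the weighted time difference scheme."""
    u = u0
    rate = -lam * u0  # rate at the initial time, the analogue of the step-0 solve
    for _ in range(n_steps):
        # Solve rate_new = -lam * u_new, with u_new given by the weighted update.
        rate_new = -lam * (u + dt * (1.0 - alpha) * rate) / (1.0 + alpha * dt * lam)
        u = u + dt * (alpha * rate_new + (1.0 - alpha) * rate)
        rate = rate_new
    return u


exact = math.exp(-1.0)
u_forward = theta_march(1.0, 1.0, 0.01, 100, alpha=0.0)
u_backward = theta_march(1.0, 1.0, 0.01, 100, alpha=1.0)
u_central = theta_march(1.0, 1.0, 0.01, 100, alpha=0.5)

for name, value in (("forward", u_forward), ("backward", u_backward), ("central", u_central)):
    print(f"{name:8s} error = {abs(value - exact):.2e}")
```

On this model problem the choice alpha = 1/2 gives a visibly smaller error than the two Euler variants for the same step, which is the usual motivation for preferring it when stability allows.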

4.4 The boundary-domain integral formulation for the scalar wave equation
The transient scalar wave equation can be employed to describe a variety of
physical problems, ranging from acoustics to magnetodynamics, and often serves
as a prototype of the class of hyperbolic differential equations. In the present work
we focus on the equation written in the form

$$ \Delta u(\mathbf{x},t) - \frac{1}{c_s^2}\,\ddot{u}(\mathbf{x},t) = 0 \qquad \forall (\mathbf{x},t) \in \Omega \times [0,T] \tag{4.17} $$

where $c_s$ is the wave speed.
The problem formulation is completed by the initial conditions

$$ u(\mathbf{x},0)\big|_{\Omega} = u_0(\mathbf{x}), \qquad \dot{u}(\mathbf{x},0)\big|_{\Omega} = \dot{u}_0(\mathbf{x}) \tag{4.18} $$

and the boundary conditions

$$ u(\mathbf{x},t)\big|_{\Gamma_u} = \bar{u}(t), \qquad p(\mathbf{x},t)\big|_{\Gamma_p} = \frac{\partial u}{\partial n}(\mathbf{x},t)\Big|_{\Gamma_p} = \bar{p}(t) \qquad \forall t > 0 \tag{4.19} $$

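Paralleling the steps taken in the parabolic case, the second Green identity
with the static fundamental solution is applied to eqn. (4.17), leading to the
integral equation

$$ -\frac{1}{c_s^2}\int_{\Omega} G(\mathbf{x},\mathbf{y})\,\ddot{u}(\mathbf{y},t)\,dV(\mathbf{y}) - \int_{\Gamma} K(\mathbf{x},\mathbf{y})\,u(\mathbf{y},t)\,dS(\mathbf{y}) + \int_{\Gamma} G(\mathbf{x},\mathbf{y})\,p(\mathbf{y},t)\,dS(\mathbf{y}) = c\,u(\mathbf{x},t). \tag{4.20} $$

Even in this case the free term must be expressed in terms of $\ddot{u}$ and of the
initial conditions using the relations

$$ \begin{aligned} \dot{u}(\mathbf{x},t) &= \int_0^t \ddot{u}(\mathbf{x},\tau)\,d\tau + \dot{u}_0(\mathbf{x}) \\ u(\mathbf{x},t) &= \int_0^t \dot{u}(\mathbf{x},\tau)\,d\tau + u_0(\mathbf{x}) = \int_0^t\!\!\int_0^{\tau} \ddot{u}(\mathbf{x},\tilde{\tau})\,d\tilde{\tau}\,d\tau + t\,\dot{u}_0(\mathbf{x}) + u_0(\mathbf{x}) \end{aligned} \tag{4.21} $$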

4.5 Time discretization of the boundary-domain integral equation for the wave propagation problem
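The general algorithm for the solution of eqn. (4.20) is equivalent to the
procedure defined for the parabolic equation, with a first set of equations used to find
the unassigned quantities at the starting time, and then another set of equations
embedding a suitable time-marching scheme.
With the boundary integrals of (4.20) split into their $\Gamma_u$ and $\Gamma_p$ parts and the assigned quantities moved to the right-hand side, the equation at step 0 reads

$$ -\frac{1}{c_s^2}\int_{\Omega} G\,\ddot{u}_{\Omega,0}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,0}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,0}\,dS = c\,\bar{u}_{\Gamma,0} + \int_{\Gamma_u} K\,\bar{u}_{\Gamma,0}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,0}\,dS \tag{4.22} $$

for $\mathbf{x} \in \Gamma_u$,

$$ -\frac{1}{c_s^2}\int_{\Omega} G\,\ddot{u}_{\Omega,0}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,0}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,0}\,dS - c\,u_{\Gamma,0} = \int_{\Gamma_u} K\,\bar{u}_{\Gamma,0}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,0}\,dS \tag{4.23} $$

for $\mathbf{x} \in \Gamma_p$, and

$$ -\frac{1}{c_s^2}\int_{\Omega} G\,\ddot{u}_{\Omega,0}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,0}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,0}\,dS = c\,\bar{u}_{\Omega,0} + \int_{\Gamma_u} K\,\bar{u}_{\Gamma,0}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,0}\,dS \tag{4.24} $$

for $\mathbf{x} \in \Omega \setminus \Gamma$.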
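At time knot $t_{k+1}$, eqn. (4.20), with the boundary integrals split into their $\Gamma_u$ and $\Gamma_p$ parts and the assigned quantities moved to the right-hand side, reads

$$ -\frac{1}{c_s^2}\int_{\Omega} G\,\ddot{u}_{\Omega,k+1}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,k+1}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,k+1}\,dS = c\,\bar{u}_{\Gamma,k+1} + \int_{\Gamma_u} K\,\bar{u}_{\Gamma,k+1}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,k+1}\,dS \tag{4.25} $$

for $\mathbf{x} \in \Gamma_u$,

$$ -\frac{1}{c_s^2}\int_{\Omega} G\,\ddot{u}_{\Omega,k+1}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,k+1}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,k+1}\,dS - c\,u_{\Gamma,k+1} = \int_{\Gamma_u} K\,\bar{u}_{\Gamma,k+1}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,k+1}\,dS \tag{4.26} $$

for $\mathbf{x} \in \Gamma_p$, and

$$ -\frac{1}{c_s^2}\int_{\Omega} G\,\ddot{u}_{\Omega,k+1}\,dV - \int_{\Gamma_p} K\,u_{\Gamma,k+1}\,dS + \int_{\Gamma_u} G\,p_{\Gamma,k+1}\,dS - c\,u_{\Omega,k+1} = \int_{\Gamma_u} K\,\bar{u}_{\Gamma,k+1}\,dS - \int_{\Gamma_p} G\,\bar{p}_{\Gamma,k+1}\,dS \tag{4.27} $$

for $\mathbf{x} \in \Omega \setminus \Gamma$.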
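As in the case of the parabolic equation, a discretization of eqn. (4.21),
rewritten in the form

$$ \begin{aligned} \dot{u}(\mathbf{x},t_{k+1}) &= \int_{t_k}^{t_{k+1}} \ddot{u}(\mathbf{x},\tau)\,d\tau + \dot{u}_k(\mathbf{x}) \\ u(\mathbf{x},t_{k+1}) &= \int_{t_k}^{t_{k+1}} \dot{u}(\mathbf{x},\tau)\,d\tau + u_k(\mathbf{x}) = \int_{t_k}^{t_{k+1}}\!\!\int_{t_k}^{\tau} \ddot{u}(\mathbf{x},\tilde{\tau})\,d\tilde{\tau}\,d\tau + (t_{k+1}-t_k)\,\dot{u}_k(\mathbf{x}) + u_k(\mathbf{x}) \end{aligned} \tag{4.28} $$

is required.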
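A widely used discretization technique for (4.28) is the Newmark family of
time-marching schemes

$$ \begin{aligned} \dot{u}_{k+1} &= \dot{u}_k + \Delta t\,\left[(1-\gamma)\,\ddot{u}_k + \gamma\,\ddot{u}_{k+1}\right] \\ u_{k+1} &= u_k + \Delta t\,\dot{u}_k + \frac{\Delta t^2}{2}\,\left[(1-2\beta)\,\ddot{u}_k + 2\beta\,\ddot{u}_{k+1}\right] \end{aligned} \tag{4.29} $$

which specializes into several methods for different values of the coefficients $\beta$ and
$\gamma$: Average Acceleration ($\beta = 1/4$ and $\gamma = 1/2$), Linear Acceleration ($\beta = 1/6$
and $\gamma = 1/2$), Fox-Goodwin ($\beta = 1/12$ and $\gamma = 1/2$), and Central Difference
($\beta = 0$ and $\gamma = 1/2$).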
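A compact way to see the Newmark update at work is a single-degree-of-freedom oscillator. The sketch below is an illustration only: the relation acceleration = -omega^2 * u stands in for the integral equation (an assumption of this example), and the function name `newmark_oscillator` is hypothetical. The displacement update is split into a predictor, built from known quantities, and a correction proportional to the new acceleration.

```python
import math


def newmark_oscillator(omega, u0, v0, dt, n_steps, beta=0.25, gamma=0.5):
    """Newmark time marching for u'' = -omega^2 u (defaults: average acceleration)."""
    u, v = u0, v0
    a = -omega ** 2 * u
    for _ in range(n_steps):
        # Part of the displacement update that depends on known quantities only.
        u_pred = u + dt * v + 0.5 * dt ** 2 * (1.0 - 2.0 * beta) * a
        # Enforce a_new = -omega^2 * (u_pred + beta * dt^2 * a_new).
        a_new = -omega ** 2 * u_pred / (1.0 + beta * dt ** 2 * omega ** 2)
        u = u_pred + beta * dt ** 2 * a_new
        v = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        a = a_new
    return u, v


omega = 2.0 * math.pi                      # period equal to one time unit
u_end, v_end = newmark_oscillator(omega, 1.0, 0.0, dt=0.01, n_steps=100)
energy_start = 0.5 * omega ** 2            # 0.5 * v0^2 + 0.5 * omega^2 * u0^2
energy_end = 0.5 * v_end ** 2 + 0.5 * omega ** 2 * u_end ** 2
print(u_end, energy_end / energy_start)
```

With the average acceleration coefficients the discrete energy of this linear oscillator is preserved up to round-off, so the only visible error after one period is a small phase shift.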
A slightly modified time-marching scheme can be derived from the so-called
α-HHT (Hilber-Hughes-Taylor) method for Finite Difference and Finite Element
systems, developed with the aim of improving stability without degrading the
order of accuracy through the introduction of an artificial damping.
In order to simplify the notation, eqn. (4.20) is rewritten in the form

$$ -[G_{\Omega}]\,\ddot{u} - \left([K_{\Gamma_u}] + [K_{\Gamma_p}]\right) u + \left([G_{\Gamma_u}] + [G_{\Gamma_p}]\right) p = c\,u \tag{4.30} $$

where

$$ \begin{aligned} [G_{\Omega}]\,f &= \frac{1}{c_s^2}\int_{\Omega} G(\mathbf{x},\mathbf{y})\,f(\mathbf{y})\,dV(\mathbf{y}) \\ [G_{\Gamma_u}]\,f &= \int_{\Gamma_u} G(\mathbf{x},\mathbf{y})\,f(\mathbf{y})\,dS(\mathbf{y}) \\ [G_{\Gamma_p}]\,f &= \int_{\Gamma_p} G(\mathbf{x},\mathbf{y})\,f(\mathbf{y})\,dS(\mathbf{y}) \\ [K_{\Gamma_u}]\,f &= \int_{\Gamma_u} K(\mathbf{x},\mathbf{y})\,f(\mathbf{y})\,dS(\mathbf{y}) \\ [K_{\Gamma_p}]\,f &= \int_{\Gamma_p} K(\mathbf{x},\mathbf{y})\,f(\mathbf{y})\,dS(\mathbf{y}) \end{aligned} \tag{4.31} $$

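The proposed α-HHT like technique consists in weighting the terms of (4.30) that
do not involve $\ddot{u}$ by $(1+\alpha)$ at step $k+1$ and by $-\alpha$ at step $k$, so that
eqns. (4.25)-(4.27) are modified in the form

$$ \begin{aligned} -[G_{\Omega}]\,\ddot{u}_{\Omega,k+1} &+ (1+\alpha)\left( -[K_{\Gamma_p}]\,u_{\Gamma,k+1} + [G_{\Gamma_u}]\,p_{\Gamma,k+1} \right) = \\ &= (1+\alpha)\left( c\,\bar{u}_{\Gamma,k+1} + [K_{\Gamma_u}]\,\bar{u}_{\Gamma,k+1} - [G_{\Gamma_p}]\,\bar{p}_{\Gamma,k+1} \right) \\ &\quad - \alpha\left( c\,u_{\Gamma,k} + \left([K_{\Gamma_u}] + [K_{\Gamma_p}]\right) u_{\Gamma,k} - \left([G_{\Gamma_u}] + [G_{\Gamma_p}]\right) p_{\Gamma,k} \right) \end{aligned} \tag{4.32} $$

for $\mathbf{x} \in \Gamma_u$,

$$ \begin{aligned} -[G_{\Omega}]\,\ddot{u}_{\Omega,k+1} &+ (1+\alpha)\left( -[K_{\Gamma_p}]\,u_{\Gamma,k+1} + [G_{\Gamma_u}]\,p_{\Gamma,k+1} - c\,u_{\Gamma,k+1} \right) = \\ &= (1+\alpha)\left( [K_{\Gamma_u}]\,\bar{u}_{\Gamma,k+1} - [G_{\Gamma_p}]\,\bar{p}_{\Gamma,k+1} \right) \\ &\quad - \alpha\left( c\,u_{\Gamma,k} + \left([K_{\Gamma_u}] + [K_{\Gamma_p}]\right) u_{\Gamma,k} - \left([G_{\Gamma_u}] + [G_{\Gamma_p}]\right) p_{\Gamma,k} \right) \end{aligned} \tag{4.33} $$

for $\mathbf{x} \in \Gamma_p$, and

$$ \begin{aligned} -[G_{\Omega}]\,\ddot{u}_{\Omega,k+1} &+ (1+\alpha)\left( -[K_{\Gamma_p}]\,u_{\Gamma,k+1} + [G_{\Gamma_u}]\,p_{\Gamma,k+1} - c\,u_{\Omega,k+1} \right) = \\ &= (1+\alpha)\left( [K_{\Gamma_u}]\,\bar{u}_{\Gamma,k+1} - [G_{\Gamma_p}]\,\bar{p}_{\Gamma,k+1} \right) \\ &\quad - \alpha\left( c\,u_{\Omega,k} + \left([K_{\Gamma_u}] + [K_{\Gamma_p}]\right) u_{\Gamma,k} - \left([G_{\Gamma_u}] + [G_{\Gamma_p}]\right) p_{\Gamma,k} \right) \end{aligned} \tag{4.34} $$

for $\mathbf{x} \in \Omega \setminus \Gamma$, where $\alpha \in \left[-\tfrac{1}{3}, 0\right]$ and the Newmark time-marching scheme (4.29)
with coefficients $\gamma = (1 - 2\alpha)/2$ and $\beta = (1 - \alpha)^2/4$ is used.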
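As a quick check of the coefficient formulas, the value $\alpha = -0.05$ adopted in the numerical examples of this chapter gives the following Newmark coefficients, while $\alpha = 0$ recovers the Average Acceleration method:

```latex
\alpha = -0.05:\qquad
\gamma = \frac{1 - 2\alpha}{2} = \frac{1.1}{2} = 0.55,
\qquad
\beta = \frac{(1-\alpha)^2}{4} = \frac{1.1025}{4} = 0.275625

\alpha = 0:\qquad
\gamma = \frac{1}{2},
\qquad
\beta = \frac{1}{4}
```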


4.6 Spatial discretization and Fast Multipole technique
The spatial discretization of (4.7) and of (4.20) is similar. The surface Γ is
discretized by means of plane triangular panels. In order to approximate the
domain integrals appearing in both the equations, a discretization of the domain
Ω is required. One of the motivations of the present work is to check the quality
of the results obtained by means of an automatically generated structured mesh
made of cubic elements. The details of the internal mesh generation are given
in Appendix B.
As in the static case, the potential u is approximated by a continuous, piece-
wise linear function over the boundary, while the normal flux p is modelled as
piecewise constant over the boundary elements. The internal variable, which
is u̇ for the parabolic equation and ü for the scalar wave equation, is approxi-
mated by a piecewise constant function over the internal cubic elements. The
unknowns are thus all the unassigned boundary values (m potentials uΓ , n fluxes
pΓ ) and k internal unknowns (u̇Ω,i or üΩ,i ), for a total of n + m + k unknowns.
A corresponding number of collocation points is required in order to enforce
the equation to solve (respectively (4.7) or (4.20) for parabolic and hyperbolic
problems). The chosen points are:

- the $m$ nodes on $\Gamma_p$, where the value of the potential $u_{\Gamma}$ is unknown;

- the centers of the $n$ elements on $\Gamma_u$, where the value of the flux $p_{\Gamma}$ is unknown;

- the centers of the $k$ internal elements in $\Omega$, where the value of the internal
field ($\dot{u}_{\Omega}$ or $\ddot{u}_{\Omega}$) is unknown.

The matrix coefficients involve surface integrals of the type

$$ G_{\Gamma,ij} = \int_{\Gamma} G(\mathbf{x}_i,\mathbf{y})\,N_j^{(p_{\Gamma})}(\mathbf{y})\,dS(\mathbf{y}) \tag{4.35} $$

and

$$ K_{\Gamma,ij} = \int_{\Gamma} K(\mathbf{x}_i,\mathbf{y})\,N_j^{(u_{\Gamma})}(\mathbf{y})\,dS(\mathbf{y}) \tag{4.36} $$

which are evaluated by means of Gauss quadrature (nonsingular integrations)
and by the analytical integration formulae described in Milroy et al. [26]
(singular and quasi-singular integrations). They also involve domain integrals
of the type

$$ G_{\Omega,ij} = \int_{\Omega} G(\mathbf{x}_i,\mathbf{y})\,N_j^{f_{\Omega}}(\mathbf{y})\,dV(\mathbf{y}), \qquad f = \begin{cases} \dot{u}_{\Omega} & \text{parabolic equation} \\ \ddot{u}_{\Omega} & \text{scalar wave equation} \end{cases} \tag{4.37} $$

A procedure for the analytical evaluation of singular and quasi-singular
integrals of type (4.37) with constant shape functions ($N_j^{f_{\Omega}} = 1$) over a regular
parallelepiped is given in Appendix A. The nonsingular integrations are treated
by means of a standard Gauss quadrature scheme.
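A nonsingular domain coefficient of this kind can be sketched as follows. The example assumes a cubic element, a constant shape function equal to one and a 2 x 2 x 2 Gauss-Legendre rule; the rule order actually used in this work is not stated here, and the function names are introduced only for illustration. For a collocation point far from the cell the result is compared with the value obtained by lumping the whole cell volume at its centre.

```python
import math
from itertools import product


def laplace_kernel(x, y):
    """Fundamental solution of the Laplace equation, 1 / (4 pi r)."""
    return 1.0 / (4.0 * math.pi * math.dist(x, y))


def gauss_cube_integral(x, centre, side):
    """2 x 2 x 2 Gauss-Legendre estimate of the integral of G(x, y) over a cube."""
    offset = 0.5 * side / math.sqrt(3.0)   # nodes at +-1/sqrt(3) on the reference interval
    weight = (0.5 * side) ** 3             # unit Gauss weights times the Jacobian
    total = 0.0
    for sx, sy, sz in product((-1.0, 1.0), repeat=3):
        y = (centre[0] + sx * offset, centre[1] + sy * offset, centre[2] + sz * offset)
        total += weight * laplace_kernel(x, y)
    return total


x_far = (5.0, 0.0, 0.0)
coefficient = gauss_cube_integral(x_far, (0.0, 0.0, 0.0), 1.0)
lumped = 1.0 ** 3 * laplace_kernel(x_far, (0.0, 0.0, 0.0))
print(coefficient, lumped)
```

The two values agree closely at this distance, which is the same observation that justifies replacing far-field integrations with multipole expansions; close to the cell the quadrature degrades and the analytical formulae become necessary.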
The Fast Multipole Method is applied for the evaluation of both the surface
integrals and the volume integrals appearing in (4.7) and (4.20). Both integra-
tions are characterized by the presence of source points which do not belong to
the integration region, and the generalized implementation of the Fast Multipole
Method, as described in Chapter 3, must be adopted.
As far as the FMM algorithm is concerned, the only difference between surface
integrations and volume integrations is in the definition of the Φ2M formula,
which in the former case involves a surface integral of the type

$$ M^{(\Gamma)}_{n,m}(O) = \int_{\Gamma_k} \left[ R_{n,m}\!\left(\overrightarrow{Oy}\right) p(\mathbf{y}) - \frac{\partial R_{n,m}}{\partial n}\!\left(\overrightarrow{Oy}\right) u(\mathbf{y}) \right] dS(\mathbf{y}) \tag{4.38} $$

over a generic kth boundary element, while in the latter case it involves the volume
integral

$$ M^{(\Omega)}_{n,m}(O) = \int_{\Omega_k} R_{n,m}\!\left(\overrightarrow{Oy}\right) f(\mathbf{y})\,dV(\mathbf{y}), \qquad f = \begin{cases} \dot{u}_{\Omega} & \text{parabolic equation} \\ \ddot{u}_{\Omega} & \text{scalar wave equation} \end{cases} \tag{4.39} $$

over a generic kth internal element. As a consequence the total Fast Multipole
integration strategy must be specialized for the two kinds of integrals, and
requires the construction of separate oct-trees. Except for this modification,
the matrix-vector multiplication algorithm is analogous to the one described for
the potential problem.


4.7 Numerical examples


4.7.1 Simple wave propagation
Let us consider a straight rod with square cross-section and length $l_1 = 1$. A
potential $u = 0$ is assigned at one of its ends, denoted as end A, while at the
other end B a step loading $p = H(t - \tilde{t})$ is assigned. The short edges of the
rod both have length $l_2 = 0.1$. The external structured triangular mesh has
characteristic size $\Delta x_B = 0.025$, with 4 subdivisions along the short edges and
40 subdivisions along the longitudinal direction.
Keeping the boundary discretization fixed, the problem has been solved
adopting different internal meshes, and different time-step lengths for each given
mesh. Moreover, two configurations of the internal mesh have been tested. In
the first case the internal mesh is aligned with the rod and perfectly matches
the interior domain. In the second case the mesh is oblique with respect to the
rod, so that the internal domain is only approximately represented.
Figure 4.1 shows the coarsest and the finest meshes employed, for both the aligned
and the oblique configurations. The figure shows that, for a sufficiently
refined mesh, the error in the representation of the real domain becomes
negligible even in the case of the oblique mesh.
The adopted $\Delta t$ is expressed as a function of a nondimensional coefficient
$k$, which measures the number of internal elements crossed by the wave during
a time-step. The corresponding time-step length is thus $\Delta t = k\,c_s^{-1}\,\Delta x$, where
$\Delta x$ is the side length of the internal elements. For $k = 1$ the time-step is the
time needed by the wave to cross one internal element. The α-HHT like method
has been employed, with a fictitious damping coefficient $\alpha = -0.05$.
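The relation between the coefficient and the time-step length amounts to a one-line computation, sketched here for the finest internal mesh. A unit wave speed is assumed in this example; it is inferred from the step lengths quoted for this benchmark rather than stated explicitly, and the function name is hypothetical.

```python
import math


def time_step(k, dx, wave_speed=1.0):
    """Time-step length for a wave crossing k internal elements of side dx per step."""
    return k * dx / wave_speed


dx_fine = 0.0125
steps = {k: time_step(k, dx_fine) for k in (0.1, 0.5, 1.0, 2.0)}
for k, dt in steps.items():
    print(f"k = {k:3.1f}  ->  dt = {dt:.5f}")
```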
This problem, despite its simplicity, is a good benchmark for the evaluation
of the performance of numerical solvers. The flux step loading is a critical
condition, which exposes the weaknesses of any numerical technique. The results
of this benchmark are useful to evaluate the accuracy of both the internal spatial
discretization and the time discretization. Figures 4.2-4.4 show several results
obtained for both the aligned and the oblique mesh. Each graph represents the
time history of both the flux at end A and the potential at end B.
The graphs in Figure 4.2 show the solution obtained with the finest mesh,
with different time-step coefficients ranging from $k = 0.1$ ($\Delta t = 0.00125$) to
$k = 2.0$ ($\Delta t = 0.025$). The potential history is well approximated, regardless of
the adopted $\Delta t$, for both the aligned and the oblique mesh. The flux history
exhibits an optimal accuracy for $k = 1.0$, showing increasing oscillations after
the jumps for larger time-steps, and before the jumps for smaller time-steps.

[Figure: four panels showing the internal meshes: aligned rod, coarse mesh; oblique rod, coarse mesh; aligned rod, fine mesh; oblique rod, fine mesh.]

Figure 4.1: Simple wave propagation: internal meshes

[Plot: two panels, both titled "Potential and flux history, dx=0.0125", showing potential and flux against time (from 0 to 9) for dt/dx = 0.1, 0.5, 1.0 and 2.0.]

Figure 4.2: Simple wave propagation: fine internal mesh, different time-step
lengths

[Plot: two panels, both titled "Potential and flux history, dt/dx=1.0", showing potential and flux against time (from 0 to 9) for dx = 0.1, 0.05, 0.025 and 0.0125.]

Figure 4.3: Simple wave propagation: coarser meshes, fixed time-step coefficient

[Plot: two panels, both titled "Potential and flux history, dt=0.0125", showing potential and flux against time (from 0 to 9) for dx = 0.1, 0.05, 0.025 and 0.0125.]

Figure 4.4: Simple wave propagation: coarser meshes, fixed time-step length


The graphs in Figure 4.3 show the solution obtained with coarser meshes. All
the solutions are obtained with a fixed time-step coefficient k = 1, corresponding
to time-step lengths ranging from ∆t = 0.0125 for the finest mesh to ∆t = 0.1
for the coarsest one. While the results obtained with the aligned mesh are
satisfactory, those relative to the oblique mesh are acceptable only with the
finest mesh. In fact, one notes that the wave reflections do not happen at
the correct times, as a consequence of the excessively coarse discretization of
the internal domain; moreover, the flux history experiences stronger oscillations
than in the case of the aligned mesh. Despite these drawbacks, the maximum
value of the potential and that of the flux (purged from the oscillations) are
correctly approximated even with the coarser meshes.
Finally, the graphs in Figure 4.4 are relevant to a fixed time-step ∆t =
0.00125 with different spatial discretizations. The results obtained with the
oblique mesh confirm that the shift in reflection times is due to the lack of
accuracy in the internal discretization, more than to the adoption of excessively
large time-steps.

4.7.2 Radial wave propagation

A disc slice of unit thickness is subject to a triangular perturbation in the
potential along the inner circular edge, and has an assigned fixed potential at
the opposite edge. A zero flux is applied to the remaining surfaces. The internal
radius of the disc is $r_1 = 2$, the external one is $r_2 = 10$. A time window of $T = 81$
time units has been studied.
Several meshes and time-step values, listed in Table 4.1, have been tested.
A picture of the most refined internal mesh is reported in Figure 4.5. As for the
time discretization, the α-HHT like time-marching scheme has been adopted,
with $\alpha = -0.05$. The time-step coefficient has been set to $\Delta t/\Delta x = 1.0$.
The reference Finite Element solution has been obtained using Femlab, with
a mesh of 10-node tetrahedral elements, for a total of 44924 DOFs. The Femlab
time-dependent solver uses an implicit variable-order variable-stepsize backward
differentiation formula.
Figure 4.6 shows the time-history of the potential at the center of the bottom
surface during the first 20 seconds of simulation, showing a good agreement
between the BEM and FEM results.

A FMM accelerated Boundary Element technique for scalar evolutive
problems in 3D

         DOF (internal)   DOF (boundary)   ∆t         Time steps
BEM 1    606              381              0.5        162
BEM 2    4816             381              0.25       324
BEM 3    9430             5157             0.2        405
BEM 4    9430             42221            0.2        405
FEM      44924            -                variable   2495

Table 4.1: Radial wave propagation, discretization details

Figure 4.5: Radial wave propagation, finest internal mesh

[Plot: radial wave propagation, potential versus time (0 to 20); curves: Femlab; Coarse external, dx=0.5; Coarse external, dx=0.25; Fine external, dx=0.2]

Figure 4.6: Radial wave propagation, time-history of the potential

Fast Multipole accelerated Boundary Element techniques for large-scale
problems, with applications to MEMS

4.7.3 Transient conduction: heat sink


The heat dissipator shown in Figure 4.7 is loaded by a constant heat flux at its
bottom surface. The remaining surfaces are subject to convection-like Robin
boundary conditions (p = −k(u − u0 )). The time-history of the temperature at
the center of the base of the heat sink is sought.

Figure 4.7: Heat sink, geometry

Three Boundary Element meshes, together with two Finite Element meshes,
have been studied. Details on the solved discretizations are given in Table 4.2.

         DOF (internal)   DOF (boundary)
BEM 1    6640             1909
BEM 2    6640             52392
BEM 3    53120            52392
FEM 1    13375            -
FEM 2    98890            -

Table 4.2: Heat sink, mesh details


[Plot: temperature at the bottom surface versus time (0 to 0.4); curves: BEM 1, BEM 2, BEM 3, FEM 1, FEM 2]

Figure 4.8: Heat sink: time-history of temperature

Figure 4.8 shows the time history of temperature at the central point of the
bottom plate. The BEM results are in good agreement with the ones obtained
with FEM, with the exception of a slight underestimation of the maximum value
of the temperature obtained with the coarsest BEM mesh.

Chapter 5

The Boundary Element Method for the exterior Stokes flow problem

Two existing integral-equation formulations for the external Stokes problem
are tested and compared, in order to assess their accuracy and their
weaknesses. The issues related to the iterative solution of singular and
ill-conditioned BEM problems are then analysed, and the need for the Mixed
Velocity-Traction equation approach is justified.

5.1 Microflows and integral formulations


One of the open issues in MEMS analysis is the correct evaluation of the drag
forces exerted on movable parts by the surrounding fluid. According to the
experimental literature (see e.g. Karniadakis et al. [20]), due to the microscale
of MEMS devices, the resulting flows are characterized by low Reynolds
numbers. When dealing with microscale flows, another important
parameter is the Knudsen number Kn, defined as the ratio between
the mean free path of the gas molecules and the characteristic spatial dimension
of the flow. The Knudsen number determines whether the continuum model
is applicable, or whether a molecular dynamics model should be used instead. As
depicted in Figure 5.1, many MEMS applications at atmospheric pressure are
characterized by low Knudsen number flows, belonging to the continuum flow
and slip flow regimes, which can be efficiently simulated by the Stokes model
(creeping flow).
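
As a rough illustration of how the Knudsen number is used to select a flow model, the sketch below computes Kn and maps it to a regime. The thresholds (0.001, 0.1, 10) are values commonly quoted in the microflow literature and are an assumption here, since the text does not state them; the function names and the sample values (a mean free path of about 68 nm for air at atmospheric pressure, a 2 micron gap) are likewise only illustrative.

```python
def knudsen_number(mean_free_path, characteristic_length):
    """Kn = mean free path / characteristic length (same units for both)."""
    return mean_free_path / characteristic_length


def flow_regime(kn):
    """Classify the flow using commonly quoted (assumed) Kn thresholds."""
    if kn < 1e-3:
        return "continuum"
    if kn < 1e-1:
        return "slip"
    if kn < 10.0:
        return "transition"
    return "free molecular"


# Illustrative values only: air at atmospheric pressure in a 2 micron gap.
kn = knudsen_number(68e-9, 2e-6)
regime = flow_regime(kn)
```

With these assumed values the gap falls in the slip regime, i.e. within the range where the text considers the Stokes model applicable.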

Figure 5.1: Fluid flow regimes for several MEMS applications (Karniadakis et
al. [20])

The low Reynolds number allows one to neglect the nonlinear inertial terms (and
hence any turbulence), giving rise to a linear problem, which can be solved by
means of integral equations. While the classical Boundary Element Method cannot
guarantee the scalability required by the large-scale problems arising in MEMS
analysis, an accelerated solver employing a fast matrix-vector multiplication
technique such as the Fast Multipole Method or the Wavelet Compression technique
(see e.g. Wang [40], Tausch [36]) makes it possible to solve, at a reasonable
cost, problems which are out of reach for the Finite Element Method or the
Finite Volume Method.

5.2 The Stokes flow in MEMS analysis


An incompressible, Newtonian fluid is characterized by the constitutive law
\[
\sigma_{ij} = -P\,\delta_{ij} + \mu\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right) \tag{5.1}
\]
where u is the fluid velocity, µ is the dynamic viscosity of the fluid and P is
the hydrostatic pressure, which is not dependent on the rate of deformation, due
to the incompressibility of the fluid.
The steady-state Stokes equations, which can be recovered from the
Navier-Stokes equations by neglecting the time-dependent terms and the nonlinear
convective terms, can be written in the form
\[
\begin{aligned}
\mu\,\Delta u - \nabla P &= 0 \\
\nabla\cdot u &= 0
\end{aligned} \tag{5.2}
\]
where the first equation is the 3D equilibrium equation, while the second one is
the continuity equation, which reduces to an incompressibility constraint.
With reference to Figure 5.2, a set of m rigid bodies Ωα with surface Γα
are immersed in a viscous fluid, which in turn can be enclosed by an external
envelope of surface ΓE .


Figure 5.2: Stokes flow problem in MEMS analysis

The rigid motion velocity at each surface Γα is assigned
\[
u|_{\Gamma_\alpha} = U^{\alpha} + \Omega^{\alpha}\times X = g^{(\alpha)}(x) \tag{5.3}
\]
while the total force F^α and torque M^{x0,α} exerted on the rigid bodies Ωα are
the sought quantities.

A free flow must also satisfy the radiation condition at infinity
\[
\lim_{x\to\infty} u(x) = O\left(|x|^{-1}\right) \tag{5.4}
\]
while confined flows must match the velocity of the enclosing boundary
\[
u|_{\Gamma_E} = V^{E} + \Omega^{E}\times X = g^{(E)}(x) \tag{5.5}
\]
We denote by G the set of right hand sides obtained by any possible
combination of rigid body motions over the closed surfaces Γα.
This problem is known as the resistance problem. Two different integral
formulations are available in the literature to solve the resistance problem,
and will be described in the next sections.
Moreover, the possible flows affecting MEMS devices can be roughly divided
in two categories, as illustrated in Figure 5.3. Pressure driven flows are gener-
ated when the relative distance between the immersed parts varies during the
motion (e.g. in parallel plate resonators), and the total force is thus mainly
determined by the pressure difference between the opposite sides of the bodies.
Shear driven flows are instead characterized by a relative slide between the im-
mersed parts (e.g. in comb drive resonators and torsional accelerometers), and
the total force is hence determined by the shear surface tractions.

5.3 The single-layer formulation


The single-layer formulation for the Dirichlet problem is similar to the
formulation described for the electrostatic problem (see e.g. Wrobel [41]). The
velocity boundary integral equation for internal points
\[
u_i(x) = \int_\Gamma G_{ij}(x,y)\,t_j(y)\,dS(y) - \int_\Gamma K_{ij}(x,y)\,u_j(y)\,dS(y) \tag{5.6}
\]
is derived by applying the second Green identity to the equilibrium equation
(5.2-1). The equation expresses the velocity at any internal point x as a function
of the boundary distributions of velocity u (y) and traction t (y), by means of
appropriate fundamental solutions known as the Stokeslet
\[
G_{ij}(x,y) = \frac{1}{8\pi\mu r}\left(\delta_{ij} + r_{,i}\,r_{,j}\right) \tag{5.7}
\]
and the Stresslet
\[
K_{ij}(x,y) = \frac{3}{4\pi}\,\frac{r_{,i}\,r_{,j}\,r_{,k}\,n^{y}_{k}}{r^{2}} \tag{5.8}
\]
where r = ‖x − y‖ is the distance between x and y, r,i = ∂r/∂xi, and n^y is the
unit normal at the point y.

Figure 5.3: Classification of flows in MEMS devices (pressure driven flow and
shear driven flow)
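
The two kernels (5.7) and (5.8) are simple to evaluate pointwise. The following is a minimal sketch, assuming r is the Euclidean distance between x and y and that the normal in (5.8) is taken at y; the function names are illustrative and are not taken from the thesis code.

```python
import numpy as np


def stokeslet(x, y, mu):
    """G_ij of (5.7): (delta_ij + r_,i r_,j) / (8 pi mu r)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    r = np.linalg.norm(d)
    e = d / r  # r_,i = (x_i - y_i) / r
    return (np.eye(3) + np.outer(e, e)) / (8.0 * np.pi * mu * r)


def stresslet(x, y, normal_y):
    """K_ij of (5.8): 3/(4 pi) * r_,i r_,j r_,k n_k(y) / r^2."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    r = np.linalg.norm(d)
    e = d / r
    e_dot_n = float(e @ np.asarray(normal_y, dtype=float))
    return 3.0 / (4.0 * np.pi) * np.outer(e, e) * e_dot_n / r**2
```

Swapping x and y leaves the Stokeslet unchanged and flips the sign of the Stresslet (for a fixed normal), which is a quick sanity check when implementing the kernels.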


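The corresponding external equation is
\[
0 = \int_\Gamma G_{ij}(x,y)\,t_j(y)\,dS(y) - \int_\Gamma K_{ij}(x,y)\,u_j(y)\,dS(y) \tag{5.9}
\]
By taking the limit to a smooth boundary point and inserting the Dirichlet
boundary conditions of rigid body motion described in the previous section we
obtain
\[
\int_\Gamma G_{ij}(x,y)\,t_j(y)\,dS(y) = \frac{1}{2}\,u^{\alpha}_i(x) + \sum_\alpha \mathrm{p.v.}\!\!\int_{\Gamma_\alpha} K_{ij}(x,y)\,u^{(\alpha)}_j(y)\,dS(y) \tag{5.10}
\]
for the internal problem, and
\[
\int_\Gamma G_{ij}(x,y)\,t_j(y)\,dS(y) = -\frac{1}{2}\,u^{\alpha}_i(x) + \sum_\alpha \mathrm{p.v.}\!\!\int_{\Gamma_\alpha} K_{ij}(x,y)\,u^{(\alpha)}_j(y)\,dS(y) \tag{5.11}
\]
for the corresponding external problem, where p.v. denotes an integral taken in
the Cauchy Principal Value sense.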


When a set of rigid body motions is applied to the surfaces composing the
boundary, the following simplification holds
\[
x\in\Gamma_\beta \;\Rightarrow\; \mathrm{p.v.}\!\!\int_{\Gamma_\alpha} K_{ij}(x,y)\,g^{(\alpha)}_j\,dS(y) =
\begin{cases}
-\frac{1}{2}\,g^{(\alpha)}_i & \alpha=\beta \\
0 & \alpha\neq\beta
\end{cases} \tag{5.12}
\]

thus the boundary integral equation for the external flow reduces to a Fredholm
Boundary Integral Equation of the first kind
\[
\int_\Gamma G_{ij}(x,y)\,t_j(y)\,dS(y) = -g_i(x) \tag{5.13}
\]
where the unknown potential t (x) is the surface traction exerted by the fluid
flow over the rigid surfaces Γα.
Let us define the two integral operators
\[
\begin{aligned}
[V f]_i(x) &= \int_\Gamma G_{ij}(x,y)\,f_j(y)\,dS(y) \\
[K f]_i(x) &= \int_\Gamma K_{ij}(x,y)\,f_j(y)\,dS(y).
\end{aligned} \tag{5.14}
\]

The velocity equation can then be rewritten as
\[
[V t](x) - \left[\left(K + \tfrac{1}{2}\right)u\right](x) = 0 \tag{5.15}
\]
for the internal problem, and as
\[
[V t](x) - \left[\left(K - \tfrac{1}{2}\right)u\right](x) = 0 \tag{5.16}
\]
for the complementary external problem.


The single-layer operator for the external Dirichlet problem (5.13) can thus
be written in compact form

[V t] (x) = −u. (5.17)

The corresponding traction equation for both the internal and the external
problem can be recovered by a proper combination of x-derivatives of (5.6),
which leads to
\[
t_i(x) = \int_\Gamma S_{ij}(x,y)\,t_j(y)\,dS(y) - \int_\Gamma D_{ij}(x,y)\,u_j(y)\,dS(y) \tag{5.18}
\]
where
\[
S_{ij}(x,y) = -\frac{3}{4\pi}\,\frac{r_{,i}\,r_{,j}\,r_{,k}\,n^{x}_{k}}{r^{2}} \tag{5.19}
\]
and Dij is a hypersingular term (see e.g. Aliabadi [2]) which satisfies
\[
\int_\Gamma D_{ij}(x,y)\,g_j(y)\,dS(y) = 0 \qquad \forall g\in G. \tag{5.20}
\]

The traction equation thus specializes at the boundary in
\[
\int_\Gamma S_{ij}(x,y)\,t_j(y)\,dS(y) - \frac{1}{2}\,t_i(x) = 0 \tag{5.21}
\]
for the internal problem, and in
\[
\int_\Gamma S_{ij}(x,y)\,t_j(y)\,dS(y) + \frac{1}{2}\,t_i(x) = 0 \tag{5.22}
\]
for the external problem, which can be rewritten respectively as
\[
\left[\left(S - \tfrac{1}{2}\right)t\right] = 0 \tag{5.23}
\]
and
\[
\left[\left(S + \tfrac{1}{2}\right)t\right] = 0 \tag{5.24}
\]
where the single-layer traction operator
\[
[S f]_i(x) = \int_\Gamma S_{ij}(x,y)\,f_j(y)\,dS(y) \tag{5.25}
\]
is defined.
Moreover, since
\[
[S f]_i(x) = \int_\Gamma S_{ij}(x,y)\,f_j(y)\,dS(y) = -\int_\Gamma K_{ji}(y,x)\,f_j(y)\,dS(y) = -[K^{*} f]_i(x) \tag{5.26}
\]
then (5.23) and (5.24) can be rewritten in the form
\[
\left[\left(K^{*} + \tfrac{1}{2}\right)t\right] = 0 \tag{5.27}
\]
and
\[
\left[\left(K^{*} - \tfrac{1}{2}\right)t\right] = 0 \tag{5.28}
\]
respectively.
The internal single-layer traction operator is thus the adjoint of the internal
double-layer operator, and similarly the external single-layer traction operator
is the adjoint of the external double-layer operator.
The corresponding variational version of the single-layer velocity operator is
obtained by multiplying it by a test function t̃ (x) and integrating it over the
boundary
\[
\langle \tilde t, [V t]\rangle = \int_\Gamma\int_\Gamma \tilde t_i(x)\,G_{ij}(x,y)\,t_j(y)\,dS(y)\,dS(x)
= -\int_\Gamma \tilde t_i(x)\,u_i(x)\,dS(x) = -\langle \tilde t, g\rangle \tag{5.29}
\]
Due to the properties of the Stokeslet, the single-layer velocity operator is
singular, i.e. it is characterized by a non-trivial null-space spanned by the
vectors
\[
t^{\alpha}(x) =
\begin{cases}
n(x)/\mathrm{meas}(\Gamma_\alpha) & x\in\Gamma_\alpha \\
0 & \text{elsewhere}
\end{cases} \tag{5.30}
\]
This means that an arbitrary set of hydrostatic pressures can be added to a
solution of (5.13) or (5.29), generating another valid solution. The correct value
for the hydrostatic pressure can be calculated by solving a pressure integral
equation (Abosleiman et al. [1], Wang [40]).
However, since a constant hydrostatic pressure over the boundary is a self-
equilibrated load, any of the solutions of (5.13) is a suitable solution for the
evaluation of the total force and torque, provided that the iterative solver can
actually recover a numerically acceptable solution.
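
The statement that a constant pressure over a closed surface is a self-equilibrated load can be checked numerically on any closed triangulation. The sketch below is only an illustration under stated assumptions: a unit tetrahedron with outward-oriented faces, a traction t = −p n, and hypothetical names (`pressure_resultants`, `pole`) that do not come from the thesis.

```python
import numpy as np

# Closed surface: unit tetrahedron, faces listed with outward orientation.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]


def pressure_resultants(vertices, faces, pressure, pole):
    """Total force and torque (about `pole`) of the traction t = -p n."""
    force = np.zeros(3)
    torque = np.zeros(3)
    for ia, ib, ic in faces:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        area_normal = 0.5 * np.cross(b - a, c - a)  # area times unit normal
        centroid = (a + b + c) / 3.0
        panel_force = -pressure * area_normal
        force += panel_force
        # Exact for a constant traction on a flat panel.
        torque += np.cross(centroid - pole, panel_force)
    return force, torque


force, torque = pressure_resultants(vertices, faces, 3.5, np.array([0.2, -1.0, 0.7]))
```

Both resultants vanish up to round-off, whatever the pressure value and the pole, which is why the null-space component of the traction does not affect the computed drag.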

5.4 The completed double-layer formulation


According to an alternative approach (see e.g. Pozrikidis [32] or Power et al.
[31]), the velocity at any internal point of the flow region can be expressed as a
function of a double-layer density φ, through the equation
\[
u_i(x_0) = -\int_\Gamma K_{ij}(x,x_0)\,\phi_j(x)\,dS(x) \tag{5.31}
\]
where
\[
K_{ij}(x,x_0) = \frac{3}{4\pi\mu}\,\frac{(x-x_0)_i\,(x-x_0)_j\,(x-x_0)_k\,n_k(x)}{r^{5}}, \qquad r = \|x-x_0\| \tag{5.32}
\]
is the Stresslet defined in the previous section.


When the point x0 is taken to the boundary from the external domain, the
strongly singular nature of the stresslet kernel gives rise to a free term. The
resulting boundary integral equation at a smooth boundary point is
\[
u_i(x_0) = \frac{1}{2}\,\phi_i - \mathrm{p.v.}\!\!\int_\Gamma K_{ij}(x,x_0)\,\phi_j(x)\,dS(x), \qquad x_0\in\Gamma \tag{5.33}
\]
where the integral exists in the Cauchy Principal Value sense. Equation (5.31)
cannot describe an arbitrary solution of (5.2) since it can be shown that (5.33)
has a null-space given by the set G of rigid body motions
\[
\begin{aligned}
& N\left(K - \tfrac{1}{2}\right) = G = \mathrm{span}\left\{\psi^{1,(\alpha)},\ldots,\psi^{6,(\alpha)}\right\} \\
& \psi^{k,(\alpha)}_j = \delta_{jk}, \qquad \psi^{k+3,(\alpha)}_j = \epsilon_{jkl}\,(x-x_\alpha)_l, \qquad k=1,2,3
\end{aligned} \tag{5.34}
\]
where xα is an arbitrary point internal to the surface Γα.


Moreover, the generic flow described by (5.31) decays at infinity with order
O(1/r²), contradicting the radiation condition for unbounded flows.
Power and Miranda [30] proposed a completed formulation, which describes
the velocity field in the form
\[
u_i(x_0) = -\int_\Gamma K_{ij}(x,x_0)\,\phi_j(x)\,dS(x)
+ \sum_\alpha u^{j}_i(x_0,x_\alpha)\,F^{(\alpha)}_j
+ \sum_\alpha r^{j}_i(x_0,x_\alpha)\,M^{(\alpha)}_j \tag{5.35}
\]
where
\[
r^{j}_i(x_0,x_\alpha) = \frac{1}{8\pi\mu}\,\frac{\epsilon_{ijk}\,(x_0-x_\alpha)_k}{r^{5}} \tag{5.36}
\]
is called Rotlet, and the internal singularities complete the range of the
double-layer operator, therefore providing an equation yielding a unique solution
which satisfies the Stokes equations and the prescribed boundary conditions.
The strength of the internal singularities can be arbitrarily expressed as
linearly dependent upon the double-layer density:
\[
\begin{aligned}
F^{(\alpha)}_k &= \int_{\Gamma_\alpha} \phi_j(x)\,\psi^{k,(\alpha)}_j(x)\,dS \\
M^{(\alpha)}_k &= \int_{\Gamma_\alpha} \phi_j(x)\,\psi^{k+3,(\alpha)}_j(x)\,dS
\end{aligned} \tag{5.37}
\]

The completed double-layer equation can be solved in terms of the unknown
double-layer density. Due to the properties of the Stresslet, Stokeslet and Rotlet,
the total force acting on the generic body α is equal to F (α) ; analogously, the
total torque with respect to the point xα is equal to M (α) .

5.5 Discretization of the single-layer and double-layer operators

The discretization of (5.13) and (5.29) is performed by means of a piecewise
constant approximation of the unknown traction t over triangular elements, in
analogy to the discretization used for the normal flux for the potential problem.
Several discretizations over triangular elements have been tested for the
double-layer operator, in order to check the relative accuracy with respect to
the single-layer operator for comparable problem sizes. The four approximations
adopted are:

- Piecewise constant discretization (1 unknown per element);

- Continuous linear discretization (1 unknown per node);

- Piecewise linear discretization (k unknowns per node, where k is the
  number of concurrent elements);

- Partially piecewise linear discretization (1 unknown per node where the
  surface is sufficiently smooth, k unknowns per node on edges or vertices).

The total drag forces exerted on a sphere and on a cube subject to a pure
translation in an unbounded domain are evaluated.


Figure 5.4 shows the error in the evaluation of the drag force using different
meshes and formulations. The reference value is represented by the analytical
one in the case of the spherical body, and by the value obtained with a 150000
DOF mesh using the single-layer formulation for the cubic body.
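
For the translating sphere, the analytical reference is presumably the classical Stokes drag F = 6πµRU (the text does not write the formula out, so this is an assumption). A minimal sketch of how the relative error plotted in such a benchmark could be computed, with illustrative function names:

```python
import math


def stokes_drag(mu, radius, speed):
    """Classical Stokes drag on a sphere translating in an unbounded fluid."""
    return 6.0 * math.pi * mu * radius * speed


def relative_error(computed, reference):
    """Relative deviation of a computed drag from the reference value."""
    return abs(computed - reference) / abs(reference)


reference = stokes_drag(mu=1.0, radius=1.0, speed=1.0)
# A hypothetical BEM result that overestimates the drag by one percent.
error = relative_error(1.01 * reference, reference)
```

For the cube no closed-form value is available, which is why the text falls back on a very fine single-layer solution as the reference.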

[Plots: error versus DOF (100 to 10000) for the sphere (left) and the cube (right); curves: SLPC, DLPC, DLCL, DLPL, DLPPL]

Figure 5.4: Error in the evaluation of the total drag force on a translating sphere
and cube ; SL - Single-Layer; DL - Double-Layer; PC - Piecewise Constant; CL
- Continuous Linear; PL - Piecewise Linear; PPL - Partially Piecewise Linear

These benchmarks show that the single-layer formulation achieves a better
accuracy for a given problem size in the presence of sharp edges; other tests
carried out on slender bodies (typical of MEMS geometries) have also confirmed
the better performance of the single-layer formulation. For this reason, this
thesis focuses on this formulation.

5.6 Algebraic properties of the single-layer operator

Both the Collocational and the Symmetric Galerkin single-layer formulations
(5.13) and (5.29), as already remarked, are singular operators, since they possess
a non-trivial null-space given by (5.30). A close look at their algebraic properties
is therefore required to understand the various issues related to the numerical
solution of the derived discretized BEM formulations.
Let us denote by E the ∞-dimensional space of functions containing the
solution of the two operators, while Eh is the n-dimensional subspace of E given
by the piecewise constant interpolation functions defined over a triangulation T
with characteristic size h of the panels.
The quadratic form
\[
\langle \tilde t, [V t]\rangle = \int_\Gamma\int_\Gamma \tilde t_i(x)\,G_{ij}(x,y)\,t_j(y)\,dS(y)\,dS(x) \tag{5.38}
\]
appearing in (5.29) is symmetric, and the corresponding discretized version is
still symmetric if the test functions t̃i (x) are properly chosen from the space Eh .
The resulting discretized linear system is then
\[
\left[V^{SG}\right]\{t\} = -\left\{g^{SG}\right\} \tag{5.39}
\]
where
\[
\left[V^{SG}\right]_{hk} = \int_{\Gamma_h}\int_{\Gamma_k} G_{ij}(x,y)\,dS(y)\,dS(x) \tag{5.40}
\]
and
\[
\left\{g^{SG}\right\}_h = \int_{\Gamma_h} g(x)\,dS(x) \tag{5.41}
\]
Apart from the possible loss of precision due to the adopted integration
strategy, the matrix V SG is still symmetric.
Things radically change if the Collocational formulation (5.13) is considered.
Despite the fact that the fundamental solution is symmetric
\[
G_{ij}(x,y) = G_{ji}(y,x) = G_{ji}(x,y) \tag{5.42}
\]
and consequently the integral operator
\[
f_i(x) = \int_\Gamma G_{ij}(x,y)\,t_j(y)\,dS(y) \tag{5.43}
\]
is symmetric with respect to the inner product
\[
\langle f(x), g(x)\rangle = \int_\Gamma f_i(x)\,g_i(x)\,dS(x) \tag{5.44}
\]
the linear system stemming from the discretization of (5.13)
\[
\left[V^{C}\right]\{t\} = \{u\} \tag{5.45}
\]
is generally not symmetric. The coefficients of the unsymmetric matrix V C are
grouped in 3×3 blocks of the form
\[
\left[V^{C}\right]_{hk} = \int_{\Gamma_k} G_{ij}\left(x^{(h)},y\right)dS(y) \tag{5.46}
\]
where x(h) is the hth collocation point.
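
The loss of symmetry in collocation can be seen on a toy example. The sketch below is not the Stokes discretization: it assumes a scalar 1/(4πr) kernel as a stand-in for the Stokeslet, three flat panels of unequal area represented by their centroids, one-point quadrature, and zeroed diagonal entries (the singular self-terms are skipped). All names are illustrative.

```python
import numpy as np

# Hypothetical panel data: centroids and (unequal) areas.
centroids = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 2.0, 0.0]])
areas = np.array([1.0, 0.5, 2.0])


def kernel(x, y):
    """Symmetric scalar kernel used as a stand-in for the Stokeslet."""
    return 1.0 / (4.0 * np.pi * np.linalg.norm(x - y))


n = len(areas)
G = np.zeros((n, n))
for h in range(n):
    for k in range(n):
        if h != k:  # singular self-terms are skipped in this sketch
            G[h, k] = kernel(centroids[h], centroids[k])

# Collocation, as in (5.46): integrate over panel k only.
V_coll = G * areas[np.newaxis, :]
# Galerkin, as in (5.40): integrate over both panels h and k.
V_gal = areas[:, np.newaxis] * G * areas[np.newaxis, :]
```

The kernel matrix G is symmetric, but weighting only the columns by the panel areas breaks the symmetry as soon as the panels differ in size, whereas the double integration of the Galerkin form preserves it.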

5.7 The solution of a singular system with the GMRES method

5.7.1 Issues in exact arithmetic

The first issue to be addressed when trying to solve a singular system with a
Krylov based method is to determine whether a solution can be reached. Assuming
that the right hand side b of the equation Ax = b belongs to the range of A, the
singular system has infinitely many solutions, since a generic vector belonging
to the null-space of A can be added to a solution x to obtain another solution.
Even if the system has infinitely many solutions, it could be that none of them
belongs to the Krylov space Kn (A, b).
It can be shown (Ipsen et al. [19]) that a square linear system Ax = b has a
Krylov solution if and only if b ∈ R(A^i), where i is the index of the zero
eigenvalue of A. The index of an eigenvalue is the dimension of the largest Jordan
block associated with that eigenvalue. If the null-space dimension (geometric
multiplicity) is equal to the algebraic multiplicity of the zero eigenvalue then
i = 1, and the condition for the existence of a Krylov solution reduces to b ∈ R (A).
This is the case of the external Stokes problem. Since the normal vectors to
the surfaces of the immersed bodies are mutually orthogonal, a Krylov solution
exists.
According to Figure 5.5 the Krylov solution is the only one belonging both to
the affine space parallel to the null-space N (A) and to the range R(A).
For a symmetric matrix the range is orthogonal to the null-space, so symmetric
matrices lead to a Krylov solution which is null-space free. The theoretical
advantage of Boundary Element formulations leading to symmetric matrices will be
clear in the next section, and derives from the explicit knowledge of the vector
space orthogonal to the range of the operator.
The Krylov space solution of a singular system can be written in terms of
the Drazin inverse of the system matrix A (Drazin [9]), which is defined as the
unique matrix A^D satisfying the properties
\[
A^{D} A A^{D} = A^{D}, \qquad A^{D} A = A A^{D}, \qquad A^{i+1} A^{D} = A^{i} \tag{5.47}
\]
If A is nonsingular then A^D = A^{-1}.

Figure 5.5: The Krylov solution of a singular system


Due to the fact that the range of a linear operator is the orthogonal
complement of the null-space of its adjoint operator, i.e.
\[
R(A) = N(A^{*})^{\perp} \tag{5.48}
\]
the Drazin inverse solution of a linear system Ax = b is orthogonal to N (AT ).
If A is a symmetric matrix, the Drazin inverse solution is the null-space free
solution, since A = AT and
\[
R(A) = N(A)^{\perp}. \tag{5.49}
\]

The construction of the Drazin inverse of a singular matrix is straightforward
by means of the Jordan decomposition of the matrix itself. The Jordan
decomposition of A leads to its representation in the form
\[
A = X \begin{bmatrix} C & 0 \\ 0 & N \end{bmatrix} X^{-1} \tag{5.50}
\]
where C is a nonsingular matrix and N is a nilpotent matrix. In particular,
N = 0 if i(λ = 0) = 1. The Drazin inverse of a matrix in this form is
\[
A^{D} = X \begin{bmatrix} C^{-1} & 0 \\ 0 & 0 \end{bmatrix} X^{-1} \tag{5.51}
\]
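
For a diagonalisable matrix (index 1, so that N = 0 in (5.50)) the construction (5.51) amounts to inverting only the nonzero eigenvalues. The sketch below illustrates this on a small nonsymmetric example; the matrix X, the eigenvalues and the function name are arbitrary choices made for the illustration, not data from the thesis.

```python
import numpy as np


def drazin_index_one(X, eigenvalues, tol=1e-12):
    """Return A = X diag(lam) X^-1 and its Drazin inverse, as in (5.50)-(5.51).

    Valid only in the diagonalisable case, i.e. index of the zero eigenvalue <= 1.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    inv_lam = np.zeros_like(lam)  # zero eigenvalues are left at zero
    nonzero = np.abs(lam) > tol
    inv_lam[nonzero] = 1.0 / lam[nonzero]
    X_inv = np.linalg.inv(X)
    A = X @ np.diag(lam) @ X_inv
    A_drazin = X @ np.diag(inv_lam) @ X_inv
    return A, A_drazin


# Arbitrary invertible, non-orthogonal eigenvector matrix (an assumption).
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 2.0]])
A, A_drazin = drazin_index_one(X, [3.0, 2.0, 1.0, 0.0])
```

The three defining properties (5.47) can then be verified directly, with i = 1. Since this A is not symmetric, its Drazin inverse differs from the Moore-Penrose pseudoinverse.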


It can be shown that x = A^D b is a solution of Ax = b if b ∈ R(A^i), and
also that the Drazin solution is the unique Krylov solution.
Therefore, in exact arithmetic, the GMRES method converges to the Drazin
solution of the n × n singular system
\[
Ax = b, \qquad m(\lambda=0) = \alpha, \qquad \dim(N(A)) = \alpha, \qquad b\in R(A) \tag{5.52}
\]
in at most n − α iterations.
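
This property can be illustrated without a full GMRES implementation. The sketch below minimises the residual over the Krylov space K_k(A, b) with a dense least-squares solve, which is the minimisation GMRES performs when started from x0 = 0, although it is not the Arnoldi-based algorithm itself. The 4 × 4 test matrix (one zero eigenvalue, index 1) and all names are assumptions made for the example.

```python
import numpy as np

# Arbitrary diagonalisable singular matrix: eigenvalues 3, 2, 1, 0.
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 2.0]])
X_inv = np.linalg.inv(X)
A = X @ np.diag([3.0, 2.0, 1.0, 0.0]) @ X_inv
A_drazin = X @ np.diag([1.0 / 3.0, 0.5, 1.0, 0.0]) @ X_inv

# A consistent right hand side, i.e. b in R(A).
b = A @ np.array([1.0, 0.0, 0.0, 0.0])


def krylov_min_residual(A, b, k):
    """Minimise ||b - A x|| over x in span{b, A b, ..., A^(k-1) b}."""
    basis = np.empty((len(b), k))
    v = b.copy()
    for j in range(k):
        basis[:, j] = v
        v = A @ v
    y, *_ = np.linalg.lstsq(A @ basis, b, rcond=None)
    return basis @ y


n, alpha = 4, 1
x = krylov_min_residual(A, b, n - alpha)
```

With a Krylov space of dimension n − α = 3 the residual vanishes and the minimiser coincides with A^D b, in agreement with (5.52). The monomial basis used here is only acceptable for such a tiny example; a practical solver would rely on the Arnoldi basis.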
Another issue which arises when solving the discretized version of eqns.
(5.45) and (5.39) is the fact that the discrete right hand side of
\[
[V]\{t\} = -\{g\} \tag{5.53}
\]
may not belong to R([V ]), and the deviation from the correct range is often
much larger than the machine precision, especially for coarse meshes.
The effect of this inaccuracy is that the discretized singular system has no
solution, and the GMRES method cannot converge to any vector {t}, since a
positive lower limit for the norm of the residual is defined by the out-of-range
component of the right hand side. Moreover, the Krylov space
\[
\begin{aligned}
r_0 &= u - V t_0 \\
K_k &\equiv \mathrm{span}\left\{r_0,\, V r_0,\, V^{2} r_0,\, \ldots,\, V^{k-1} r_0\right\}
\end{aligned} \tag{5.54}
\]
is malformed: since g ∉ R(V ), the Krylov space Kk is not contained in R(V )
either.
As a consequence of these issues, the GMRES method keeps enlarging the
Krylov space up to its completion, which happens at a generic kth step. At
this point, the (k + 1)th Krylov vector is linearly dependent upon the former
ones, the resulting Hessenberg matrix is singular and the associated least squares
problem cannot be solved, determining the failure of the iterative solver.
In order to recover a solution, the right hand side should be projected onto
R(V ), leading to the modified linear system
\[
V t = -\tilde g \tag{5.55}
\]
where
\[
\tilde g = \left[I - \sum_\alpha t^{*(\alpha)}\otimes t^{*(\alpha)}\right] g \tag{5.56}
\]
and the vectors t∗(α) form an orthonormal basis for the null-space N (V T ). This
ensures that a solution can actually be reached.
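
The projection (5.56) can be sketched in a few lines. In this illustration the orthonormal basis of N(V^T) is extracted numerically from an SVD, which is an assumption made for convenience (in the Stokes problem the basis would rather be built from the known null-space vectors); the test matrix and the helper names are hypothetical.

```python
import numpy as np


def left_null_basis(V, tol=1e-10):
    """Orthonormal basis of N(V^T): left singular vectors with zero singular value."""
    U, s, _ = np.linalg.svd(V)
    return U[:, s < tol * s[0]]


def project_onto_range(V, g):
    """Apply (5.56): remove from g its components along N(V^T)."""
    T = left_null_basis(V)
    return g - T @ (T.T @ g)


# Arbitrary singular, nonsymmetric test matrix (eigenvalues 3, 2, 1, 0).
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 2.0]])
V = X @ np.diag([3.0, 2.0, 1.0, 0.0]) @ np.linalg.inv(X)

g = np.array([1.0, 2.0, 3.0, 5.0])  # not in R(V)
g_tilde = project_onto_range(V, g)

# Smallest achievable residual before and after the projection.
x_before, *_ = np.linalg.lstsq(V, g, rcond=None)
x_after, *_ = np.linalg.lstsq(V, g_tilde, rcond=None)
residual_before = np.linalg.norm(V @ x_before - g)
residual_after = np.linalg.norm(V @ x_after - g_tilde)
```

Before the projection the residual cannot drop below the norm of the out-of-range component of g; after it, the system is consistent and the residual can be driven to zero.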


5.7.2 Other numerical issues


Up to this point, the main problems present in exact arithmetic have been
addressed. However, the numerical solution of a singular system is
subject to other sources of error, which are consequences of the finite
precision representation of floating point numbers in computers and of the
adoption of approximations for the evaluation of the various quantities involved
in the discretized problem.
First of all, the numerical integration schemes used in the evaluation of the
coefficients lead to a nonsingular, though very ill-conditioned, system matrix.
As a consequence iterative methods may converge to a solution even if the
right hand side does not belong to the theoretical range of the matrix. In this
latter case, however, the Krylov solution shows a null-space component which
often overwhelms the orthogonal component, with severe loss of accuracy.
When dealing with finite precision arithmetic, another source of error is
directly related to the Arnoldi process for the construction of the Krylov
basis. To begin with, the generic vector wk+1 = Avk is not perfectly orthogonal
to the null-space N (AT ). Moreover, the Gram-Schmidt orthogonalization which
is performed on each vector can be the source of a further loss of
orthogonality between vk+1 and the null-space of the adjoint matrix. A subsequent
reorthogonalization with respect to that space should then be performed after the
Gram-Schmidt orthogonalization of wk+1 with respect to v1 , . . . , vk .
The Arnoldi algorithm should then be modified as
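r0 = b − Ax0
β = ‖r0‖
V1 = r0 /β
do i = 1, k
   wi+1 = AVi
   ! Gram-Schmidt orthogonalization against V1 , . . . , Vi
   do j = 1, i
      hj,i = < wi+1 , Vj >
      wi+1 = wi+1 − hj,i Vj
   end
   ! reorthogonalization against the null-space of the adjoint matrix
   do α = 1, m
      dα,i = < wi+1 , n∗,(α) >
      wi+1 = wi+1 − dα,i n∗,(α)
   end
   hi+1,i = ‖wi+1‖
   Vi+1 = wi+1 /hi+1,i
end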
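
A runnable counterpart of the modified Arnoldi process is sketched below, under some assumptions: modified Gram-Schmidt against all previous basis vectors, an orthonormal basis of N(A^T) supplied by the caller as the columns of `N_star`, and no breakdown check. The function and variable names, as well as the small test matrix, are illustrative only.

```python
import numpy as np


def arnoldi_with_nullspace_cleanup(A, r0, N_star, k):
    """Arnoldi process with an extra orthogonalization against N(A^T).

    Returns V (n x (k+1)), H ((k+1) x k), D (m x k) and beta such that
    A V[:, :k] = V H + N_star D.
    """
    n, m = len(r0), N_star.shape[1]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    D = np.zeros((m, k))
    beta = np.linalg.norm(r0)
    V[:, 0] = r0 / beta
    for i in range(k):
        w = A @ V[:, i]
        for j in range(i + 1):  # Gram-Schmidt against V_1 .. V_i
            H[j, i] = w @ V[:, j]
            w = w - H[j, i] * V[:, j]
        for a in range(m):  # cleanup against the left null-space
            D[a, i] = w @ N_star[:, a]
            w = w - D[a, i] * N_star[:, a]
        H[i + 1, i] = np.linalg.norm(w)
        V[:, i + 1] = w / H[i + 1, i]  # no breakdown check in this sketch
    return V, H, D, beta


# Arbitrary singular test matrix (eigenvalues 3, 2, 1, 0) and its left null vector.
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 2.0]])
A = X @ np.diag([3.0, 2.0, 1.0, 0.0]) @ np.linalg.inv(X)
N_star = np.array([[-1.0], [1.0], [-1.0], [1.0]]) / 2.0
b = A @ np.array([1.0, 0.0, 0.0, 0.0])

k = 2
V, H, D, beta = arnoldi_with_nullspace_cleanup(A, b, N_star, k)
```

On this well-conditioned toy problem the entries of D are at round-off level; on the ill-conditioned single-layer matrices discussed in the text they are expected to be much larger.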


In this case, eqn. (2.41) is modified to
\[
A V_k = V_{k+1}\bar H_k + N D \tag{5.57}
\]
where D is an m × k matrix in which the jth column contains the projection of
the vector wj+1 along the null-space N (AT ), and N is an n × m matrix whose
column α is given by t∗(α) . In exact arithmetic D is a null matrix.
Consequently the relation between the residual rk and the Krylov solution
becomes
\[
\begin{aligned}
r_k &= b - A x_k = b - A\left(x_0 + V_k y\right) = r_0 - A V_k y \\
    &= V_{k+1}\beta e_1 - \left(V_{k+1}\bar H_k + N D\right) y
\end{aligned} \tag{5.58}
\]
leading to the n × k least squares problem
\[
\min_{y\in\mathbb{R}^{k}} \left\| V_{k+1}\beta e_1 - \left(V_{k+1}\bar H_k + N D\right) y \right\| \tag{5.59}
\]
which cannot be further simplified. The resulting algorithm is therefore
practically useless, due to its computational requirements.
The only way to solve this issue is to drop the last term in (5.57), with the
implicit acceptance of a numerical error whose value is of the order of the norm
of the matrix D.
A better way to completely remove any convergence issue due to the
singularity of A is to modify the operator itself in order to obtain a nonsingular
system whose solution is the Drazin solution of the original one. Recalling that
the Drazin solution is the only solution of
\[
Ax = b, \qquad b\in R(A) \tag{5.60}
\]
which satisfies the constraint
\[
x \perp N(A^{T}) \;\Rightarrow\; \left\langle x, t^{*,(\alpha)}\right\rangle = 0 \tag{5.61}
\]
it is easy to show that the augmented operator
\[
\tilde A = A + \beta\sum_\alpha t^{*(\alpha)}\otimes t^{*(\alpha)} \tag{5.62}
\]
where the augmentation term is a projector over the null-space of the adjoint
operator, provides the Drazin inverse solution of the singular operator A.
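
The effect of the augmentation (5.62) can be checked on a small dense example. The sketch assumes a diagonalisable test matrix with a single zero eigenvalue, a known unit vector spanning N(A^T) and an arbitrary weight β = 1; none of these choices comes from the thesis.

```python
import numpy as np

# Arbitrary singular test matrix: eigenvalues 3, 2, 1, 0 (index 1).
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 2.0]])
X_inv = np.linalg.inv(X)
A = X @ np.diag([3.0, 2.0, 1.0, 0.0]) @ X_inv
A_drazin = X @ np.diag([1.0 / 3.0, 0.5, 1.0, 0.0]) @ X_inv

# Unit vector spanning N(A^T) for this particular matrix.
t_star = np.array([-1.0, 1.0, -1.0, 1.0]) / 2.0

b = A @ np.array([1.0, 0.0, 0.0, 0.0])  # consistent right hand side

beta = 1.0  # assumed augmentation weight
A_aug = A + beta * np.outer(t_star, t_star)
x = np.linalg.solve(A_aug, b)
```

The augmented matrix is nonsingular, and its solution satisfies both the original system and the constraint (5.61), so it coincides with A^D b.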

This is a consequence of the fact that
\[
R\left(\sum_\alpha t^{*(\alpha)}\otimes t^{*(\alpha)}\right) = N(A^{*}) = R(A)^{\perp} \tag{5.63}
\]
thus the solution of the system
\[
\tilde A x = \left(A + \beta\sum_\alpha t^{*(\alpha)}\otimes t^{*(\alpha)}\right) x = b, \qquad b\in R(A) \tag{5.64}
\]
is equivalent to the unique solution shared by the two decoupled singular systems
\[
\begin{aligned}
A x &= b \\
\left(\sum_\alpha t^{*(\alpha)}\otimes t^{*(\alpha)}\right) x &= 0.
\end{aligned} \tag{5.65}
\]

As a matter of fact, the desingularization of the single-layer operator can be
performed using different augmentation operators, as the following reasoning
shows.
Let V be the discretized velocity operator (5.39) or (5.45), and g the
corresponding right hand side belonging to the space G of rigid body motions.
Let
\[
G^{-1} \equiv \{\, t \mid V t \in G \,\} \tag{5.66}
\]
be the space containing all the solutions of the single-layer equation with r.h.s.
belonging to G. Since V is singular,
\[
\dim G^{-1} = \dim(G) + \dim(N(V)). \tag{5.67}
\]
Let us define the subspace
\[
\tilde G^{-1} \subset G^{-1}, \qquad \dim \tilde G^{-1} = \dim(G) \tag{5.68}
\]
with the property
\[
\forall t\in\tilde G^{-1},\ \forall t^{\alpha}\in N(V): \quad t \neq t^{\alpha} \tag{5.69}
\]
which simply reduces to
\[
\tilde G^{-1}\cap N(V) \equiv \{0\}. \tag{5.70}
\]

Thus the subspace G̃ −1 has the same dimension as G, and its intersection with
the null-space contains only the zero element.
The subspace G̃ −1 is not unique. For example
\[
\tilde G^{-\perp} = G^{-1}\cap N(V)^{\perp} \tag{5.71}
\]
is the space containing all the null-space free solutions of the single-layer
operator.
Similarly
\[
\tilde G^{-D} = G^{-1}\cap R(V) \tag{5.72}
\]
is the space containing all the Drazin inverse solutions of the single-layer
operator.
If V is symmetric, then the two latter spaces coincide, since R (V ) ≡ N (V )⊥ .
The solution of V t = −g, with t ∈ G̃ −1 , exists and is unique. Now let B
be a second operator with the properties
\[
N(V+B) \equiv \{0\} \tag{5.73}
\]
and
\[
\exists\,\tilde G^{-1} \mid \tilde G^{-1}\subseteq N(B). \tag{5.74}
\]
Then, let t̂ be the unique solution of the single-layer operator belonging to
G̃ −1 . Clearly
\[
(V+B)\,\hat t = V\hat t = -g \tag{5.75}
\]
since t̂ ∈ N (B) by property (5.74), therefore t̂ is a solution of the modified
system
\[
(V+B)\,t = -g. \tag{5.76}
\]
Moreover, by property (5.73), t̂ is the unique solution of (5.76).
Therefore, if the augmentation of the single-layer operator is performed by
means of a projector onto its own null-space, instead of onto the null-space of
its adjoint operator,
\[
\left[\tilde V\right]\{t\} = \left[V + \beta\sum_\alpha t^{(\alpha)}\otimes t^{(\alpha)}\right]\{t\} = -\{g\}, \qquad g\in G \tag{5.77}
\]
with the vectors t(α) spanning N (V ), the desingularized system still provides a
valid solution of the single-layer operator, different from the corresponding
Drazin inverse solution.
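
The difference between the two augmentations can be observed on a small nonsymmetric example. The sketch below assumes a diagonalisable test matrix with one zero eigenvalue, a unit vector `t` spanning its own null-space N(V) and an arbitrary weight β = 1; the names and the matrix are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Arbitrary singular, nonsymmetric test matrix: eigenvalues 3, 2, 1, 0.
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 2.0]])
X_inv = np.linalg.inv(X)
V = X @ np.diag([3.0, 2.0, 1.0, 0.0]) @ X_inv
V_drazin = X @ np.diag([1.0 / 3.0, 0.5, 1.0, 0.0]) @ X_inv

# Unit vector spanning N(V): the eigenvector of the zero eigenvalue.
t = X[:, 3] / np.linalg.norm(X[:, 3])

b = V @ np.array([1.0, 0.0, 0.0, 0.0])  # consistent right hand side

beta = 1.0  # assumed augmentation weight
x_null_free = np.linalg.solve(V + beta * np.outer(t, t), b)
x_drazin = V_drazin @ b
```

Both vectors solve the original singular system, but the first one is orthogonal to N(V), i.e. it is the null-space free (minimum norm) solution, while the second one is orthogonal to N(V^T). For a symmetric matrix the two would coincide.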


5.8 The ill-conditioning of the single-layer operator

Despite the conclusions of the previous section, the ill-conditioning of the single-
layer boundary operator for the external Stokes problem leads to severe con-
vergence problems even if a desingularization technique is adopted. This ill-
conditioning is largely dependent upon the geometry of the boundary. For this
reason, the solution of the resistance problem over immersed particles of com-
pact form usually leads to negligible convergence issues, which can be easily
corrected by a simple augmentation. The examples provided in this section
aim at analysing the limits of a pure single-layer formulation for the analysis of
MEMS-like geometries.
As a first benchmark, the problem of the tetrahedral body shown in Figure
5.6 translating with unit speed is solved. The boundary is meshed with 516
triangular elements, and four discretizations are used. The CBEM solution
is obtained by means of the collocational formulation (5.13), while the three
SGBEM solutions are obtained by (5.29), with a different quadrature scheme
for the x-integration using respectively 1, 7 and 52 Gauss-Hammer points.
The graph in Figure 5.7 shows the convergence history of all the analysed
formulations, with and without the adoption of a desingularization approach.
The singular system shows a tendency to stall at a certain threshold, which is
lower for the SGBEM technique. However, when the system is desingularized,
the stagnation zone disappears.
The stagnation zones are due to the fact that in finite precision arithmetic the
matrix, though being severely ill-conditioned, is never rank-deficient. Therefore
the Krylov vectors resulting from matrix-vector multiplications are not perfectly
orthogonal to the null-space of the transpose matrix, and the successive
Gram-Schmidt orthonormalization can lead to Krylov vectors approximately lying in
the null-space N (V T ). The correlation between the convergence history and the
magnitude of the component along N (V T ) of the kth Krylov vector wk is shown
in Figure 5.8 for the CBEM approach. It is noticeable that the stagnation occurs
when the component of the vector wk along the null-space reaches a maximum.
In the second example, the benchmark problem shown in Figure 5.9 is solved.
The immersed body is moved in the direction of the axis of its fingers.
The presence of thin parts is typical of MEMS geometries, and the resulting
matrix is severely ill-conditioned, even if the correct augmentation technique is
employed. As a matter of fact, the possibility of ill-conditioning of the matrices
arising from the discretization of single-layer operators is a well known issue.


Figure 5.6: Translating simplex; problem geometry

However, the numerical issues observed here are typical of the velocity operator
of the Stokes flow problem, since the single-layer operators derived from both
the Laplace and the compressible elasticity fundamental solution lead to optimal
convergence over the same geometry.
The convergence history plotted in Figure 5.10 for both the CBEM approach
and the SGBEM approach with 52 Gauss-Hammer points shows that the stan-
dard desingularization technique alone is not able to remove all the stagnation
zones. As a matter of fact, the presence of slender members, which numerically
behave like almost independent bodies, leads to a severe ill-conditioning of the
resulting desingularized matrix.
This kind of behavior suggests the presence of a numerical null-space spanned
by the left singular vectors associated with the smallest singular values. The
effect of the ill-conditioning on the convergence rate of the GMRES method is
almost independent of the formulation adopted.
Numerical experiments show that pressure driven flows are more sensitive
to the conditioning of the system matrix than shear driven flows. A simple
observation which corroborates this evidence is that the main issues related

Figure 5.7: Translating simplex; convergence history (error versus iterations; curves CBEM, SGBEM_1, SGBEM_7, SGBEM_52 and the corresponding "- NS" curves)

Figure 5.8: Translating simplex; stagnation in convergence due to the loss of orthogonality between wk and the null-space (CBEM; convergence history and <w_k,n*>/||w_k|| versus k)

Figure 5.9: Fingered structure

to ill-conditioning depend on the incompressibility; therefore boundary conditions
which do not induce large pressure variations are expected to lead to
faster convergence of the resulting algebraic system of equations, despite the
ill-conditioning of the matrix.
In order to check whether this is the case, the matrix augmentation and right hand
side projection can be performed with respect to a larger number of eigenvectors
than those belonging to the theoretical null-space. The effects on the
convergence history for the CBEM matrix are plotted in the first graph of Figure
5.11 as functions of the number of eigenvectors used in the desingularization.
Let us denote by $c^{-1} = \sigma_{\min}/\sigma_{\max}$ the inverse of the condition number,
where $\sigma_{\min}$ and $\sigma_{\max}$ are the extremal singular values. A singular matrix has
one or more null singular values, therefore $c^{-1} = 0$. For a perfectly conditioned
matrix $\sigma_{\min} = \sigma_{\max}$ holds, therefore $c^{-1} = 1$.
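The definition above can be evaluated directly from a singular value decomposition. The following is a minimal sketch using NumPy; the function name `inverse_condition_number` and the two small test matrices are illustrative assumptions, not part of the thesis code.

```python
import numpy as np

def inverse_condition_number(matrix):
    """Return c^{-1} = sigma_min / sigma_max computed from the singular values."""
    sigma = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    return sigma[-1] / sigma[0]   # singular values are returned in descending order

identity = np.eye(3)                                   # perfectly conditioned
rank_deficient = np.array([[1.0, 2.0], [2.0, 4.0]])    # second row = 2 x first row

c_inv_identity = inverse_condition_number(identity)
c_inv_singular = inverse_condition_number(rank_deficient)
```

In floating point arithmetic the second value is of the order of machine precision at most rather than necessarily an exact zero, which mirrors the remark that the discretized matrix is never exactly rank-deficient.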
The effect of the numerical null-space removal on the inverse of the condition
number of the analysed matrix is plotted in the second graph of Figure 5.11,
while the third graph shows the total force acting on the body. The results refer
to both the CBEM and the SGBEM52 formulation.
We can observe that the value of the total force is almost independent of the
dimension of the supposed numerical null-space in the desingularization phase,
which means that the solution of the system is essentially determined by the
singular vectors associated with the largest singular values.
The ill-conditioning due to the geometry is a critical issue for the analysis of

Figure 5.10: Convergence history of the fingered structure (error versus iterations; curves CBEM, SGBEM_52, CBEM - AS, SGBEM_52 - AS)

MEMS-like structures. Unfortunately, the vectors belonging to the numerical
null-space of large-scale matrices cannot be calculated in a computationally
efficient way.
The ill-conditioning problems just described call for a more advanced
technique, such as the Mixed Velocity-Traction Equation approach proposed in
the next chapter, which preserves the solution of the single-layer approach while
achieving superior conditioning.

Figure 5.11: Convergence history with numerical null-space removal (panels: convergence history for CBEM, error versus iterations, for the singular system and for 1, 2, 3, 4, 5, 10, 15, 20 and 30 removed eigenvectors (EV); condition number, 1/c versus number of eigenvectors, CBEM and SGBEM_52; total force versus number of eigenvectors, CBEM and SGBEM_52)

Chapter 6

The Mixed Velocity-Traction formulation for the external Stokes flow

In this chapter, a new boundary integral formulation for the solution of external
Stokes flows with Dirichlet boundary conditions is proposed. The Mixed
Velocity-Traction (MVT) formulation yields an optimal rate of convergence,
thereby correcting the ill-conditioning issues related to the thin body
effect. The integral formulation is presented together with a suitable
discretization technique, which retains at the discretized level the fundamental
properties of the original integral operator. Several examples show the performance
and reliability of the proposed formulation for the evaluation of drag forces on
complex MEMS devices.

6.1 Introduction
As seen in the previous chapter, the AS approach, while formally retrieving
a non-singular formulation, usually does not address the performance issues
related to the thin body effect. The Mixed Velocity-Traction (MVT) equation
approach is a formulation first introduced by Frangi et al. [11] which yields
a nonsingular, well-conditioned equation that still delivers a valid
solution for the original single-layer formulation.
A brief survey of the background information required to properly
introduce the MVT formulation is given in this section.
The single-layer external velocity equation with assigned rigid-body conditions
is given by

$$[V t](x) = -g, \qquad g \in \mathcal{G} \tag{6.1}$$

where $V$ is the singular velocity operator, with null-space $N(V) = \mathrm{span}(t_\alpha)$ and
range $R(V) = N(V)^{\perp}$.
A general solution of (6.1) is thus of the form

$$t = t^{\perp} + c_\alpha t_\alpha \tag{6.2}$$

where $c_\alpha$ are arbitrary coefficients and

$$\begin{aligned}
[V t^{\perp}](x) &= -g(x) \\
\langle t^{\perp}, t_\alpha \rangle &= 0.
\end{aligned} \tag{6.3}$$
The corresponding external traction equation is

$$\left[\left(\tfrac{1}{2} - K^{*}\right) t\right](x) = 0 \tag{6.4}$$

and is satisfied by the null-space free component $t^{\perp}$ of the solution of the velocity
equation.
Since (6.4) is a homogeneous equation, any of its non-trivial solutions lies in
its null-space, which can thus be defined as

$$N\left(\tfrac{1}{2} - K^{*}\right) = \mathcal{G}^{-\perp} \tag{6.5}$$

where

$$\mathcal{G}^{-\perp} \equiv \left\{ t \;\middle|\; V t = -g,\ \langle t^{\perp}, t_\alpha \rangle = 0 \right\} \tag{6.6}$$

is the space spanned by the null-space free inverse solutions of the velocity
equation, when the r.h.s. is given by members of $\mathcal{G}$.
Moreover, the null-space of the external double-layer operator is

$$N\left(\tfrac{1}{2} - K\right) = \mathcal{G} \tag{6.7}$$

thus the range of the external traction operator is given by

$$R\left(\tfrac{1}{2} - K^{*}\right) = \mathcal{G}^{\perp}. \tag{6.8}$$
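
The step from (6.7) to (6.8) relies on the standard relation between the range of an operator and the null-space of its adjoint. A brief sketch, under the assumptions (not stated explicitly in the text) that $\tfrac{1}{2} - K^{*}$ has closed range and that $\tfrac{1}{2} - K$ is its adjoint with respect to the $L^2(\Gamma)$ pairing:

$$R\left(\tfrac{1}{2} - K^{*}\right) = N\left(\left(\tfrac{1}{2} - K^{*}\right)^{*}\right)^{\perp} = N\left(\tfrac{1}{2} - K\right)^{\perp} = \mathcal{G}^{\perp}.$$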

6.2 The Mixed Velocity-Traction formulation


The Mixed Velocity-Traction formulation is obtained by imposing a linear
combination of the velocity equation and of its corresponding traction equation, in
the form

$$[V t](x) + \frac{\gamma}{\mu}\left[\left(\tfrac{1}{2} - K^{*}\right) t\right](x) = -g(x) \tag{6.9}$$

where $\gamma$ is a length which can be used to tune the conditioning of the resulting
equation. From the practical point of view $\gamma = \tilde{\gamma}\sqrt{A}$, where $\tilde{\gamma}$ is a
nondimensional parameter and $A$ is a characteristic size of the adopted mesh.
At this point, it is necessary to show that the solution $t$ of (6.9) for a given
r.h.s. $g \in \mathcal{G}$ is unique, and that it is also a solution of the corresponding velocity
equation (6.1).
The uniqueness of the solution of (6.9) is verified if the corresponding
homogeneous equation

$$[V t](x) + \frac{\gamma}{\mu}\left[\left(\tfrac{1}{2} - K^{*}\right) t\right](x) = 0 \tag{6.10}$$

admits only the trivial solution.


Let $\hat{t}$ be a generic solution of (6.10). Let us define the velocity field

$$\hat{u}(x) := [V \hat{t}](x) \tag{6.11}$$

and the traction field

$$\hat{f}(x) := \left[\left(\tfrac{1}{2} - K^{*}\right) \hat{t}\right](x). \tag{6.12}$$

In every interior domain $\Omega_\alpha$, the velocity field $\hat{u}$ satisfies by definition the
Stokes equations. Moreover, due to (6.10), it satisfies over $\Gamma_\alpha$ the boundary
condition

$$\hat{u}(x) + \frac{\gamma}{\mu}\hat{f}(x) = 0. \tag{6.13}$$

An application of Green's formula for the Stokes flow leads to the following
identity (the equivalent of the Betti theorem for incompressible elasticity)

$$\frac{\mu}{2}\int_{\Omega_\alpha}\left(\frac{\partial \hat{u}_i}{\partial x_j}(x) + \frac{\partial \hat{u}_j}{\partial x_i}(x)\right)^{2} dV(x) = \int_{\Gamma_\alpha} \hat{u}(x)\cdot\hat{f}(x)\, dS(x) \tag{6.14}$$

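The left hand side is nonnegative while, by virtue of (6.13), the right hand side
can be rewritten in the form

$$\int_{\Gamma_\alpha} \hat{u}(x)\cdot\hat{f}(x)\, dS(x) = -\frac{\mu}{\gamma}\int_{\Gamma_\alpha} \|\hat{u}(x)\|^{2}\, dS(x) \tag{6.15}$$

which is nonpositive if $\gamma > 0$. Therefore

$$\hat{u} = 0 \tag{6.16}$$

must hold, and a generic solution of (6.10) must consequently belong to the
null-space $N(V)$. However, if $\hat{t} \in N(V)$, then (6.10) reduces to

$$\left[\left(\tfrac{1}{2} - K^{*}\right) \hat{t}\right](x) = 0 \tag{6.17}$$

which implies $\hat{t} = 0$, since no non-trivial element of $N(V)$ belongs to $N(1/2 - K^{*})$.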


The MVT equation thus admits a unique solution. In order to show that this
solution is also a solution of the single-layer equation when rigid body velocity
boundary conditions are applied, let $t^{\perp}$ be the null-space free solution of

$$[V t](x) = -g(x), \qquad g \in \mathcal{G}. \tag{6.18}$$

This solution belongs to the subspace $\mathcal{G}^{-\perp}$, which is the null-space of the
traction equation. Therefore

$$[V t^{\perp}](x) + \frac{\gamma}{\mu}\left[\left(\tfrac{1}{2} - K^{*}\right) t^{\perp}\right](x) = [V t^{\perp}](x) = -g(x) \tag{6.19}$$

which proves that the unique solution of the MVT equation with r.h.s. belonging
to $\mathcal{G}$ is a solution of the single-layer equation.
It is important to stress that the behaviour of the MVT solution at the
discretized level heavily depends on the ability to preserve the correct null-space
of the traction operator in the actual implementation. The three sources of
error are (i) the discretization technique employed, (ii) the approximation of
the integrals involved (Gauss quadrature, Fast Multipole Method), and (iii) the
finite precision of floating point arithmetic. While the latter two
sources of error are essentially unavoidable, a correct choice of the discretization
scheme is required to avoid a total loss of accuracy of the MVT solution.

In the discretized MVT formulation, let us call $A$ the single-layer velocity
operator and $B$ the single-layer traction operator, which are singular matrices.
Let $c^{-1}$ be the inverse of the condition number, as defined in the previous
chapter.

While the theoretical MVT solution does not depend on the value of the
weighting coefficient $\gamma$, the conditioning of the MVT matrix heavily depends
upon it. Indeed, for $\gamma = 0$, the single-layer formulation is recovered, and
$c_0^{-1} = 0$. If $\gamma \to \infty$, then again $c_\gamma^{-1} \to 0$, since
$\sigma_{\max}^{A+\gamma B} \to \gamma\,\sigma_{\max}^{B} \to \infty$ and
$\sigma_{\min}^{A+\gamma B} \to \sigma^{*} \le \sigma_{\max}^{A}$. Therefore, there exists an
optimal value $\hat{\gamma}$ such that $c_{\hat{\gamma}}^{-1} = \max_\gamma c_\gamma^{-1}$.
The goal of a successful MVT implementation is to ensure that the solution of
the mixed equation for $\gamma = \hat{\gamma}$ is sufficiently close to the solution of the velocity
equation, despite the loss of accuracy introduced by the discretization process.
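
The dependence of the conditioning on the weighting coefficient can be illustrated on a toy problem. The sketch below is not a BEM discretization: as a loudly stated assumption, the singular matrices A and B are replaced by two 3 x 3 orthogonal projectors whose null-spaces do not coincide, and the names `inverse_condition_number`, `gammas` and `gamma_hat` are hypothetical.

```python
import numpy as np

def inverse_condition_number(matrix):
    """Return sigma_min / sigma_max of the given matrix."""
    sigma = np.linalg.svd(matrix, compute_uv=False)
    return sigma[-1] / sigma[0]

# Hypothetical stand-ins for the singular matrices A and B: orthogonal projectors
# whose one-dimensional null-spaces (spanned by a and b) do not coincide.
a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
A = np.eye(3) - np.outer(a, a)
B = np.eye(3) - np.outer(b, b)

gammas = 10.0 ** np.arange(-6, 7)            # coarse grid, 1e-6 ... 1e6
c_inv = np.array([inverse_condition_number(A + g * B) for g in gammas])
gamma_hat = gammas[np.argmax(c_inv)]         # best value on this grid only
```

On this toy pair the inverse condition number vanishes at both ends of the grid and peaks at an intermediate value, which is the qualitative behaviour described above; the location of the optimum for an actual MVT matrix has to be determined from the discretized operators themselves.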

6.3 The discretization of the MVT equation

The right choice of the discretization technique is crucial for a successful
implementation of the MVT formulation. In order to unify the various
discretization algorithms which will be described, let us introduce the variational
version of equation (6.9).

Let $\tilde{t}$ be an arbitrary test function belonging to the same space $E$ as the
unknown function $t$. Then, the weak formulation of (6.9) is

$$\langle \tilde{t}, [V t] \rangle + \frac{\gamma}{\mu}\left\langle \tilde{t}, \left[\tfrac{1}{2} - K^{*}\right] t \right\rangle = -\langle \tilde{t}, g \rangle \qquad \forall \tilde{t} \in E \tag{6.20}$$

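where

$$\begin{aligned}
\langle \tilde{t}, [V t] \rangle &= \int_\Gamma \int_\Gamma \tilde{t}_i(x)\, G_{ij}(x,y)\, t_j(y)\, dS(y)\, dS(x) \\
\left\langle \tilde{t}, \left[\tfrac{1}{2} - K^{*}\right] t \right\rangle &= \frac{1}{2}\int_\Gamma \tilde{t}_i(x)\, t_i(x)\, dS(x) + \int_\Gamma \int_\Gamma \tilde{t}_i(x)\, K^{*}_{ij}(x,y)\, t_j(y)\, dS(y)\, dS(x) \\
\langle \tilde{t}, g \rangle &= \int_\Gamma \tilde{t}_i(x)\, g_i(x)\, dS(x).
\end{aligned} \tag{6.21}$$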

A triangulation of the surface $\Gamma$ is performed by means of a set of plane
triangular elements $\tau \in T$, and the unknown function is approximated by a
parametric field

$$t(x; t_1, \dots, t_n) = \sum_{i=1}^{n} N_i^{(t)}(x)\, t_i \tag{6.22}$$

belonging to the n-dimensional vector space $E_n \subset E$. In particular, we refer to
a piecewise constant approximation for the unknown traction field, therefore

$$N_i^{(t)}(x) = \begin{cases} 1 & x \in \tau_i \\ 0 & x \notin \tau_i. \end{cases} \tag{6.23}$$

The test functions $\tilde{t}$ must belong to the same vector space $E_n$, therefore n
linearly independent equations can be derived from (6.20).

6.3.1 The Galerkin discretization


The Galerkin discretization consists in imposing the n equations defined above
without any further simplification. The final system of equations takes the form

$$\left[A^{G} + \frac{\gamma}{\mu} B^{G}\right]\{t\} = -\{g^{G}\} \tag{6.24}$$

where the (i, j)th element of the 3 × 3 submatrices corresponding to the hth source
element and the kth field element is

$$\begin{aligned}
\left[A^{G}_{hk}\right]_{ij} &= \int_\Gamma \int_\Gamma N_h(x)\, N_k(y)\, G_{ij}(x,y)\, dS(y)\, dS(x) \\
\left[B^{G}_{hk}\right]_{ij} &= \int_\Gamma \int_\Gamma N_h(x)\, N_k(y)\, K^{*}_{ij}(x,y)\, dS(y)\, dS(x) - \frac{1}{2}\delta_{ij}\int_\Gamma N_h(x)\, N_k(x)\, dS(x)
\end{aligned} \tag{6.25}$$

and the right hand side is

$$\left\{g^{G}_{h}\right\}_i = \int_\Gamma N_h(x)\, g_i(x)\, dS(x) \in \mathcal{G}_n^{G} \tag{6.26}$$

which, for a piecewise constant approximation, specialize respectively to

$$\begin{aligned}
\left[A^{G}_{hk}\right]_{ij} &= \int_{\tau_h} \int_{\tau_k} G_{ij}(x,y)\, dS(y)\, dS(x) \\
\left[B^{G}_{hk}\right]_{ij} &= \int_{\tau_h} \int_{\tau_k} K^{*}_{ij}(x,y)\, dS(y)\, dS(x) - \frac{1}{2}\delta_{hk}\delta_{ij}\int_{\tau_h} dS(x)
\end{aligned} \tag{6.27}$$

and

$$\left\{g^{G}_{h}\right\}_i = \int_{\tau_h} g_i(x)\, dS(x) \in \mathcal{G}_n^{G}. \tag{6.28}$$

The null-space of the traction matrix arising in the Galerkin approach is
the same as that of the original formulation, the only difference being the
restriction to the subspace $E_n$, i.e.

$$N\left(B^{G}\right) = \left(\mathcal{G}_n^{G}\right)^{-1} \tag{6.29}$$

therefore this formulation still provides a correct MVT solution which is
independent of the value of $\gamma$.

6.3.2 The Collocation discretization


The Collocation approach can be recovered by evaluating the x-integral with a
midpoint quadrature rule

$$\int_\Gamma f(x)\, N_k(x)\, dS(x) \approx f(x_k)\, w_k \tag{6.30}$$

where $x_k$ is the centroid of the support of $N_k$, and

$$w_k = \int_\Gamma N_k(x)\, dS(x). \tag{6.31}$$

For the piecewise constant approximation $x_k$ is the centroid of the kth
triangular element.
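
The midpoint rule (6.30)-(6.31) on a flat triangle can be sketched in a few lines. The helper names `centroid_and_area` and `midpoint_rule`, and the sample triangle and integrand, are illustrative assumptions rather than parts of the thesis code.

```python
import math

def centroid_and_area(p0, p1, p2):
    """Centroid x_k and area w_k of a flat triangle with 3D vertices p0, p1, p2."""
    centroid = tuple((a + b + c) / 3.0 for a, b, c in zip(p0, p1, p2))
    u = [b - a for a, b in zip(p0, p1)]
    v = [c - a for a, c in zip(p0, p2)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    area = 0.5 * math.sqrt(sum(c * c for c in cross))
    return centroid, area

def midpoint_rule(f, p0, p1, p2):
    """Approximate the integral of f over the triangle by f(x_k) * w_k."""
    x_k, w_k = centroid_and_area(p0, p1, p2)
    return f(x_k) * w_k

# Unit right triangle in the plane z = 0; the rule is exact for linear integrands.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
approx = midpoint_rule(lambda x: 1.0 + 2.0 * x[0] + 3.0 * x[1], *tri)
```

For the singular kernels of the velocity and traction operators the integrand is far from linear near the collocation point, which is why only one of the two integrations is treated in this way.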
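The resulting system of equations is

$$[W]\left[A^{C} + \frac{\gamma}{\mu} B^{C}\right]\{t\} = -[W]\{g^{C}\} \tag{6.32}$$

where

$$\begin{aligned}
\left[A^{C}_{hk}\right]_{ij} &= \int_\Gamma N_k(y)\, G_{ij}(x_h, y)\, dS(y) \\
\left[B^{C}_{hk}\right]_{ij} &= \int_\Gamma N_k(y)\, K^{*}_{ij}(x_h, y)\, dS(y) - \frac{1}{2}\delta_{hk}\delta_{ij} \\
\left\{g^{C}_{h}\right\}_i &= g_i(x_h) \in \mathcal{G}_n^{C}
\end{aligned} \tag{6.33}$$

and the matrix $W$ is given by

$$[W] = \mathrm{diag}(w_1, \dots, w_n). \tag{6.34}$$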
The most critical issue with the Collocation approach is that the matrix
corresponding to the discretized traction operator is not the transpose of the
collocational discretized double-layer operator. From the theoretical point of
view, the existence of a null-space for the Collocational traction equation
cannot be proved. Numerical experiments confirm that the resulting matrix is not
singular, and the solution of the corresponding MVT equation shows a heavy
dependence on the value of γ.

6.3.3 The Qualocation discretization


The Qualocation approach consists in approximating the y-integrations, as well
as the x-integrations in the self-term and in the right hand side, with a midpoint
quadrature rule (Tausch et al. [37]).
In this case, the resulting system of equations becomes

$$\left[A^{Q} + \frac{\gamma}{\mu} B^{Q}\right][W]\{t\} = -[W]\{g^{Q}\} \tag{6.35}$$

where

$$\begin{aligned}
\left[A^{Q}_{hk}\right]_{ij} &= \int_\Gamma N_h(x)\, G_{ij}(x, y_k)\, dS(x) \\
\left[B^{Q}_{hk}\right]_{ij} &= \int_\Gamma N_h(x)\, K^{*}_{ij}(x, y_k)\, dS(x) - \frac{1}{2}\delta_{hk}\delta_{ij} \\
\left\{g^{Q}_{h}\right\}_i &= g_i(x_h) \in \mathcal{G}_n^{Q}
\end{aligned} \tag{6.36}$$

and the matrix $W$ is still

$$[W] = \mathrm{diag}(w_1, \dots, w_n). \tag{6.37}$$
The matrix $B^{Q}$ obtained with the Qualocation discretization of the traction
operator is the transpose of the matrix arising from the Collocation
discretization of the double-layer operator. Indeed

$$\begin{aligned}
\left[K^{*Q}_{hk}\right]_{ij} &= \int_\Gamma N_h(x)\, K^{*}_{ij}(x, y_k)\, dS(x) \\
&= \int_\Gamma N_h(y)\, K_{ji}(x_k, y)\, dS(y) \\
&= \left[K^{C}_{kh}\right]_{ji} = \left[\left(K^{C}\right)^{T}_{hk}\right]_{ij}
\end{aligned} \tag{6.38}$$

which leads to

$$\left(K^{*} - \frac{1}{2}\right)^{Q} = \left(\left(K - \frac{1}{2}\right)^{C}\right)^{T}. \tag{6.39}$$

Due to the fact that any rigid body motion lies in the null-space of the
Collocational double-layer operator, its adjoint operator possesses a null-space
of the same dimension. Therefore an appropriate subspace $\tilde{\mathcal{G}}^{-1}$ can be defined
and property (5.74) is fulfilled. The consequence is that the solution obtained
with the Qualocation approach is theoretically independent of the value of γ.
In practice, it shows a much weaker dependence than the corresponding
Collocational solution, which makes it possible to obtain a well-conditioned MVT
matrix with very little impact on accuracy.
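
The algebraic fact used here, namely that a square matrix and its transpose have null-spaces of the same dimension, can be checked numerically. In the sketch below the matrix `D` is a random stand-in (an assumption, not a discretized double-layer operator) constructed so that a prescribed vector `g`, playing the role of a rigid-body mode, lies in its null-space; `nullspace_dim` is a hypothetical helper.

```python
import numpy as np

def nullspace_dim(matrix, rel_tol=1e-10):
    """Number of singular values below rel_tol times the largest one."""
    sigma = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(sigma <= rel_tol * sigma[0]))

rng = np.random.default_rng(0)
n = 6
g = np.ones(n)                            # hypothetical stand-in for a rigid-body mode
D = rng.standard_normal((n, n))
D -= np.outer(D @ g, g) / (g @ g)         # enforce D g = 0 by construction

dim_right = nullspace_dim(D)              # null-space of D
dim_left = nullspace_dim(D.T)             # null-space of the transpose
u, _, _ = np.linalg.svd(D)
left_null = u[:, -1]                      # D^T left_null ~ 0; in general not parallel to g
```

The null vector of the transpose is generally different from `g`, which is consistent with the need to define a separate subspace for the discretized traction operator.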

6.4 The Fast Multipole Method for the MVT equation

The FMM approach for the MVT equation requires the evaluation of the far-field
contributions of both the single-layer velocity operator and the single-layer
traction operator. The actual formulation depends on the adopted discretization.
The formulae are introduced with reference to the Galerkin discretization, with
double integrations. The corresponding Collocational and Qualocational
versions are then obtained by approximating either integration with a 1-point
quadrature rule.

6.4.1 Single-layer velocity operator


The evaluation of surface tractions by means of the single-layer velocity operator
involves the evaluation of the weakly singular integral appearing in (5.29), where
the kernel is the Stokeslet (5.7).
The required expressions can be obtained starting from the Fast Multipole
formulation for compressible linear elasticity (see e.g. Yoshida et al. [42],
Nishimura [27]), by taking the limit for the Poisson ratio $\nu \to 1/2$.
In particular, the Stokeslet (5.7) can be rewritten in the form

$$G_{ij}(x,y) = \frac{1}{8\pi\mu}\left[P_{ij}(x)\,\frac{1}{r} + Q_i(x)\,\frac{(\overrightarrow{Oy})_j}{r}\right] \tag{6.40}$$

where

$$P_{ij} = \delta_{ij} - (\overrightarrow{Ox})_j\,\frac{\partial}{\partial x_i}, \qquad Q_i = \frac{\partial}{\partial x_i}. \tag{6.41}$$
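
As a consistency check, the decomposition (6.40)-(6.41) can be expanded explicitly. The sketch below assumes that $r = |x - y|$ and that (5.7) is the standard free-space Stokeslet; it uses $\partial(1/r)/\partial x_i = -(x_i - y_i)/r^{3}$ and the fact that $(\overrightarrow{Oy})_j$ does not depend on $x$:

$$P_{ij}\,\frac{1}{r} + Q_i\,\frac{(\overrightarrow{Oy})_j}{r} = \frac{\delta_{ij}}{r} - \left[(\overrightarrow{Ox})_j - (\overrightarrow{Oy})_j\right]\frac{\partial}{\partial x_i}\frac{1}{r} = \frac{\delta_{ij}}{r} + \frac{(x_i - y_i)(x_j - y_j)}{r^{3}}$$

so that $G_{ij} = \frac{1}{8\pi\mu}\left[\delta_{ij}/r + (x_i - y_i)(x_j - y_j)/r^{3}\right]$, as expected.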

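Using the expansion

$$\frac{1}{r} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} S_{n,m}(\overrightarrow{Ox})\, R_{n,m}(\overrightarrow{Oy}), \qquad |\overrightarrow{Oy}| < |\overrightarrow{Ox}| \tag{6.42}$$

the ith component of the double integral over the source element $\Gamma_\alpha$ and the
field element $\Gamma_\beta$

$$I_i^{(\Gamma_\alpha,\Gamma_\beta)} = \int_{\Gamma_\alpha}\int_{\Gamma_\beta} G_{ij}(x,y)\, \phi_j(y)\, dS(y)\, dS(x) \tag{6.43}$$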

can be rewritten in the form

$$\begin{aligned}
I_i^{(\Gamma_\alpha,\Gamma_\beta)} ={}& \frac{1}{8\pi\mu}\int_{\Gamma_\alpha}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} P_{ij}(x)\, S_{n,m}(\overrightarrow{Ox})\, M^{V}_{j,n,m}(O)\, dS(x) \\
&+ \frac{1}{8\pi\mu}\int_{\Gamma_\alpha}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} Q_i(x)\, S_{n,m}(\overrightarrow{Ox})\, M^{S}_{n,m}(O)\, dS(x)
\end{aligned} \tag{6.44}$$

where $M^{S}$ and $M^{V}$ are respectively scalar and vector multipole moments defined
by the Φ2M formulae

$$\begin{aligned}
M^{V}_{j,n,m}(O) &= \int_{\Gamma_\beta} R_{n,m}(\overrightarrow{Oy})\, \phi_j(y)\, dS(y) \\
M^{S}_{n,m}(O) &= \int_{\Gamma_\beta} R_{n,m}(\overrightarrow{Oy})\, (\overrightarrow{Oy})_j\, \phi_j(y)\, dS(y).
\end{aligned} \tag{6.45}$$

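The corresponding M2M formulae are

$$\begin{aligned}
M^{V}_{j,n,m}(O') &= \sum_{n'=0}^{n}\sum_{m'=-n'}^{n'} R_{n',m'}(\overrightarrow{O'O})\, M^{V}_{j,n-n',m-m'}(O) \\
M^{S}_{n,m}(O') &= \sum_{n'=0}^{n}\sum_{m'=-n'}^{n'} R_{n',m'}(\overrightarrow{O'O}) \left[ M^{S}_{n-n',m-m'}(O) - (\overrightarrow{O'O})_j\, M^{V}_{j,n-n',m-m'}(O) \right].
\end{aligned} \tag{6.46}$$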

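The integral (6.44) can also be expressed as a function of local expansion
coefficients by means of the L2I formula

$$\begin{aligned}
I_i^{(\Gamma_\alpha,\Gamma_\beta)} ={}& \frac{1}{8\pi\mu}\int_{\Gamma_\alpha}\sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} F^{R}_{ij,n',m'}(\overrightarrow{x_0 x})\, L^{V}_{j,n',m'}(x_0)\, dS(x) \\
&+ \frac{1}{8\pi\mu}\int_{\Gamma_\alpha}\sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} G^{R}_{i,n',m'}(\overrightarrow{x_0 x})\, L^{S}_{n',m'}(x_0)\, dS(x)
\end{aligned} \tag{6.47}$$

where

$$\begin{aligned}
F^{R}_{ij,n,m}(\overrightarrow{x_0 x}) &= \delta_{ij}\, R_{n,m}(\overrightarrow{x_0 x}) - (\overrightarrow{x_0 x})_j\, \frac{\partial}{\partial x_i} R_{n,m}(\overrightarrow{x_0 x}) \\
G^{R}_{i,n,m}(\overrightarrow{x_0 x}) &= \frac{\partial}{\partial x_i} R_{n,m}(\overrightarrow{x_0 x}).
\end{aligned} \tag{6.48}$$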

The scalar and vector local expansion coefficients $L^{S}$ and $L^{V}$ are defined by
the M2L formulae

$$\begin{aligned}
L^{V}_{j,n,m}(x_0) &= \sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} (-1)^{n}\, S_{n+n',m+m'}(\overrightarrow{Ox_0})\, M^{V}_{j,n',m'}(O) \\
L^{S}_{n,m}(x_0) &= \sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} (-1)^{n}\, S_{n+n',m+m'}(\overrightarrow{Ox_0}) \left[ M^{S}_{n',m'}(O) - (\overrightarrow{Ox_0})_j\, M^{V}_{j,n',m'}(O) \right].
\end{aligned} \tag{6.49}$$

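Finally, the L2L formulae are

$$\begin{aligned}
L^{V}_{j,n,m}(x_1) &= \sum_{n'=n}^{\infty}\sum_{m'=-n'}^{n'} R_{n'-n,m'-m}(\overrightarrow{x_0 x_1})\, L^{V}_{j,n',m'}(x_0) \\
L^{S}_{n,m}(x_1) &= \sum_{n'=n}^{\infty}\sum_{m'=-n'}^{n'} R_{n'-n,m'-m}(\overrightarrow{x_0 x_1}) \left[ L^{S}_{n',m'}(x_0) - (\overrightarrow{x_0 x_1})_j\, L^{V}_{j,n',m'}(x_0) \right].
\end{aligned} \tag{6.50}$$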

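The only modification required by the Collocational formulation is in the
definition of the L2I equation, which becomes

$$\begin{aligned}
I_i^{(\Gamma_\alpha,\Gamma_\beta)} ={}& \frac{w_\alpha}{8\pi\mu}\sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} F^{R}_{ij,n',m'}(\overrightarrow{x_0 x_\alpha})\, L^{V}_{j,n',m'}(x_0) \\
&+ \frac{w_\alpha}{8\pi\mu}\sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} G^{R}_{i,n',m'}(\overrightarrow{x_0 x_\alpha})\, L^{S}_{n',m'}(x_0)
\end{aligned} \tag{6.51}$$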

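where $w_\alpha$ is the area of the source region $\Gamma_\alpha$.
Similarly, the Qualocation formulation requires the modification of the Φ2M
formulae

$$\begin{aligned}
M^{V}_{j,n,m}(O) &= w_\beta\, R_{n,m}(\overrightarrow{Oy_\beta})\, \phi_j(y_\beta) \\
M^{S}_{n,m}(O) &= w_\beta\, R_{n,m}(\overrightarrow{Oy_\beta})\, (\overrightarrow{Oy_\beta})_j\, \phi_j(y_\beta)
\end{aligned} \tag{6.52}$$

where $w_\beta$ is the area of the field region $\Gamma_\beta$.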

The Fast Multipole implementation for the velocity operator thus involves
four scalar arrays, instead of the single one needed by the scalar integral
equation associated with the Laplace problem. Apart from this difference, the
approximation of the matrix-vector multiplication is carried out with an identical
algorithm.
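
The convergence of the expansion of 1/r underlying the multipole formulae can be checked numerically. The sketch below does not use the solid harmonics $R_{n,m}$ and $S_{n,m}$ themselves: it relies on the classical fact that, by the addition theorem of spherical harmonics, the sum over m at fixed degree n collapses to a Legendre polynomial in the cosine of the angle between the two position vectors. Function names and sample points are hypothetical.

```python
import math

def legendre(n_max, t):
    """Return [P_0(t), ..., P_n_max(t)] using the three-term (Bonnet) recurrence."""
    values = [1.0, t]
    for n in range(1, n_max):
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]

def truncated_inverse_distance(x, y, order):
    """Approximate 1/|x - y| for |y| < |x| with the series truncated at degree `order`."""
    norm_x = math.sqrt(sum(c * c for c in x))
    norm_y = math.sqrt(sum(c * c for c in y))
    cos_theta = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    p = legendre(order, cos_theta)
    return sum(p[n] * norm_y ** n / norm_x ** (n + 1) for n in range(order + 1))

x = (2.0, 1.0, 0.5)      # evaluation point, far from the pole O (the origin)
y = (0.3, -0.2, 0.4)     # source point, close to the pole
exact = 1.0 / math.dist(x, y)
errors = [abs(truncated_inverse_distance(x, y, p) - exact) for p in (2, 5, 10)]
```

The truncation error decays geometrically with the ratio of the two distances from the pole, which is what allows the infinite sums in the formulae above to be truncated at a moderate order in practice.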

6.4.2 Single-layer traction operator

The evaluation of the far-field part of the traction integral involves a proper
definition of multipole moments and local expansions based on the decomposition
of the Stresslet kernel, which is formally identical to the single-layer traction
kernel for incompressible elasticity. The FMM formulation for the hypersingular
equation for compressible elasticity, which involves the traction kernels, is
derived in Yoshida et al. [42]. The incompressible formulation cannot be obtained
from the compressible one by simply letting ν → 1/2, due to the presence of the
pressure term in the Stresslet kernel, which thus requires an ad hoc derivation
of the corresponding FMM algorithm.

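The Stresslet kernel $K^{*}_{ij}$ appearing in the traction operator can be rewritten
in the form

$$\begin{aligned}
K^{*}_{ij}(x,y) &= \left[ \frac{\delta_{ik}}{8\pi}\frac{\partial}{\partial x_j}\frac{2}{r} + \mu\frac{\partial}{\partial x_k} G_{ji} + \mu\frac{\partial}{\partial x_i} G_{jk} \right] n_k(x) \\
&= \left[ -\frac{\delta_{ik}}{8\pi}\frac{\partial}{\partial y_j}\frac{2}{r} + \mu\frac{\partial}{\partial x_k} G_{ji} + \mu\frac{\partial}{\partial x_i} G_{jk} \right] n_k(x)
\end{aligned} \tag{6.53}$$

which, using (6.40) and (6.42), can be rewritten as

$$K^{*}_{ij}(x,y) = \frac{1}{8\pi}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} K^{*,n,m}_{ij}(\overrightarrow{Ox}, \overrightarrow{Oy}) \tag{6.54}$$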

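where

$$\begin{aligned}
K^{*,n,m}_{ij}(\overrightarrow{Ox}, \overrightarrow{Oy}) ={}& \delta_{ij}\, n_k(x)\, \frac{\partial}{\partial x_k} S_{n,m}(\overrightarrow{Ox})\, R_{n,m}(\overrightarrow{Oy}) \\
&+ (\overrightarrow{Ox})_i\, n_k(x)\, \frac{\partial}{\partial x_k} S_{n,m}(\overrightarrow{Ox})\, \frac{\partial}{\partial y_j} R_{n,m}(\overrightarrow{Oy}) \\
&- n_k(x)\, \frac{\partial}{\partial x_k} S_{n,m}(\overrightarrow{Ox})\, (\overrightarrow{Oy})_i\, \frac{\partial}{\partial y_j} R_{n,m}(\overrightarrow{Oy}) \\
&+ n_j(x)\, \frac{\partial}{\partial x_i} S_{n,m}(\overrightarrow{Ox})\, R_{n,m}(\overrightarrow{Oy}) \\
&+ n_k(x)\, (\overrightarrow{Ox})_k\, \frac{\partial}{\partial x_i} S_{n,m}(\overrightarrow{Ox})\, \frac{\partial}{\partial y_j} R_{n,m}(\overrightarrow{Oy}) \\
&- n_k(x)\, \frac{\partial}{\partial x_i} S_{n,m}(\overrightarrow{Ox})\, (\overrightarrow{Oy})_k\, \frac{\partial}{\partial y_j} R_{n,m}(\overrightarrow{Oy}).
\end{aligned} \tag{6.55}$$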

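The ith component of the double integral over the source element $\Gamma_\alpha$ and
the field element $\Gamma_\beta$

$$I_i^{(\Gamma_\alpha,\Gamma_\beta)} = \int_{\Gamma_\alpha}\int_{\Gamma_\beta} K^{*}_{ij}(x,y)\, \phi_j(y)\, dS(y)\, dS(x) \tag{6.56}$$

can thus be rewritten in the form

$$\begin{aligned}
I_i^{(\Gamma_\alpha,\Gamma_\beta)} ={}& \frac{1}{8\pi}\int_{\Gamma_\alpha}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} A^{S}_{ij,n,m}(\overrightarrow{Ox})\, M^{V}_{j,n,m}(O)\, dS(x) \\
&+ \frac{1}{8\pi}\int_{\Gamma_\alpha}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} B^{S}_{i,n,m}(\overrightarrow{Ox})\, n_j(x)\, M^{V}_{j,n,m}(O)\, dS(x) \\
&+ \frac{1}{8\pi}\int_{\Gamma_\alpha}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} C^{S}_{i,n,m}(\overrightarrow{Ox})\, M^{S}_{n,m}(O)\, dS(x)
\end{aligned} \tag{6.57}$$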

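where

$$\begin{aligned}
A^{S}_{ij,n,m}(\overrightarrow{Ox}) &= \delta_{ij}\, n_k(x)\, \frac{\partial}{\partial x_k} S_{n,m}(\overrightarrow{Ox}) \\
B^{S}_{i,n,m}(\overrightarrow{Ox}) &= \frac{\partial}{\partial x_i} S_{n,m}(\overrightarrow{Ox}) \\
C^{S}_{i,n,m}(\overrightarrow{Ox}) &= \left[ (\overrightarrow{Ox})_i\, n_k(x)\, \frac{\partial}{\partial x_k} + (\overrightarrow{Ox})_k\, n_k(x)\, \frac{\partial}{\partial x_i} \right] S_{n,m}(\overrightarrow{Ox})
\end{aligned} \tag{6.58}$$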

and the multipole moments are given by the Φ2M formulae

$$\begin{aligned}
M^{V}_{j,n,m} &= \int_{\Gamma_\beta} \left[ R_{n,m}(\overrightarrow{Oy})\, \phi_j(y) - (\overrightarrow{Oy})_j\, \frac{\partial}{\partial y_k} R_{n,m}(\overrightarrow{Oy})\, \phi_k(y) \right] dS(y) \\
M^{S}_{n,m} &= \int_{\Gamma_\beta} \frac{\partial}{\partial y_k} R_{n,m}(\overrightarrow{Oy})\, \phi_k(y)\, dS(y).
\end{aligned} \tag{6.59}$$

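Finally, the integral (6.57) can also be expressed as a function of local
expansion coefficients by means of the L2I formula

$$\begin{aligned}
I_i^{(\Gamma_\alpha,\Gamma_\beta)} ={}& \frac{1}{8\pi}\int_{\Gamma_\alpha}\sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} A^{R}_{ij,n',m'}(\overrightarrow{x_0 x})\, L^{V}_{j,n',m'}(x_0)\, dS(x) \\
&+ \frac{1}{8\pi}\int_{\Gamma_\alpha}\sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} B^{R}_{i,n',m'}(\overrightarrow{x_0 x})\, n_j(x)\, L^{V}_{j,n',m'}(x_0)\, dS(x) \\
&+ \frac{1}{8\pi}\int_{\Gamma_\alpha}\sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'} C^{R}_{i,n',m'}(\overrightarrow{x_0 x})\, L^{S}_{n',m'}(x_0)\, dS(x)
\end{aligned} \tag{6.60}$$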

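where

$$\begin{aligned}
A^{R}_{ij,n',m'}(\overrightarrow{x_0 x}) &= \delta_{ij}\, n_k(x)\, \frac{\partial}{\partial x_k} R_{n',m'}(\overrightarrow{x_0 x}) \\
B^{R}_{i,n',m'}(\overrightarrow{x_0 x}) &= \frac{\partial}{\partial x_i} R_{n',m'}(\overrightarrow{x_0 x}) \\
C^{R}_{i,n',m'}(\overrightarrow{x_0 x}) &= \left[ (\overrightarrow{x_0 x})_i\, n_k(x)\, \frac{\partial}{\partial x_k} + (\overrightarrow{x_0 x})_k\, n_k(x)\, \frac{\partial}{\partial x_i} \right] R_{n',m'}(\overrightarrow{x_0 x})
\end{aligned} \tag{6.61}$$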

The local expansion coefficients are defined as functions of the multipole
moments by the M2L formulae (6.49). Analogously, the M2M and L2L
expressions are given by (6.46) and (6.50), respectively.
The Qualocation and Collocation formulations require the evaluation of the
integrals appearing in (6.59) and (6.60), respectively, with a midpoint Gauss
quadrature rule.
The scalar multipole moment coefficients for the traction operator are the
same as those for the velocity operator, therefore they can be shared by the
FMM algorithm for the MVT equation. On the other hand, the vector coefficients
are different and undergo different transformations, so they must be kept
separate. Moreover, since the scalar local expansion coefficients also depend on the
vector moments, they need to be kept separate as well.

6.5 Numerical examples


The examples provided in this chapter are tailored to the analysis of MEMS
devices with the MVT approach. With the exception of the first example,
which uses a direct solver in order to explicitly obtain the solution and the
condition number of the system of equations, the GMRES solver coupled with a
left Leaf Preconditioner is employed. The Leaf Preconditioner is a block Jacobi
preconditioner which uses the submatrices associated with the leaves of the
oct-tree structure as diagonal blocks. The explicit inversion of the submatrices is
fast enough to be fully compensated by the reduction in GMRES iterations.
With the exception of the first example, a weighting coefficient γ̃ = 1 has been
adopted for the MVT equation.
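
A block Jacobi preconditioner of the kind just described can be sketched as follows. This is a dense toy version under stated assumptions, not the thesis implementation: the function name `build_leaf_preconditioner`, the 4 x 4 matrix and the partition of the unknowns into two "leaves" are all hypothetical, and in an FMM code only the near-field diagonal blocks would be available explicitly.

```python
import numpy as np

def build_leaf_preconditioner(A, leaves):
    """Invert the diagonal block of A for each leaf and return a function applying M^{-1}."""
    inverses = [(np.asarray(idx), np.linalg.inv(A[np.ix_(idx, idx)])) for idx in leaves]

    def apply(residual):
        z = np.zeros_like(residual, dtype=float)
        for idx, inv_block in inverses:
            z[idx] = inv_block @ residual[idx]   # each block acts on its own unknowns only
        return z

    return apply

# Hypothetical 4-DOF system split into two leaves of two unknowns each.
A = np.array([[4.0, 1.0, 0.1, 0.0],
              [1.0, 3.0, 0.0, 0.1],
              [0.1, 0.0, 5.0, 2.0],
              [0.0, 0.1, 2.0, 6.0]])
leaves = [[0, 1], [2, 3]]
apply_M_inv = build_leaf_preconditioner(A, leaves)

# Left-preconditioned matrix M^{-1} A, assembled column by column for inspection.
preconditioned = np.column_stack([apply_M_inv(A[:, j]) for j in range(4)])
```

The block inverses are computed once and reused at every GMRES iteration, so the set-up cost is amortized over the whole solve.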
The total forces and torques acting on the rigid bodies are evaluated at the
end of the analysis. According to the numerical tests carried out, the relative
error of the computed forces at a given iteration is of the same order of magnitude
as the GMRES relative residual, therefore results that are valid from the
engineering point of view can be obtained with tolerances of the order of $\eta = 10^{-2}$.
The following examples have been solved on a 2.6 GHz Pentium 4 personal
computer with 1 GB of RAM.

6.5.1 Translating sphere


Let us consider a sphere of radius R translating with assigned velocity V in a
fluid of viscosity µ. The analytical solution of the Stokes problem provides the
total drag force

$$F = 6\pi\mu R V \tag{6.62}$$

and is considered a standard benchmark for the validation of Stokes flow solvers.
The MVT solution, obtained with different values of the weighting coefficient γ̃,
is compared with the solution of the corresponding single-layer equation. The
problem is solved with three different meshes, of 144, 624 and 2400 DOFs
respectively. The results obtained with both the Collocation and the
Qualocation approaches are listed in the following table.


log γ̃          -2      -1       0       1       2       3       4
Collocation   18.74   18.73   18.64   17.84   12.47    3.18    0.36
Qualocation   18.74   18.74   18.74   18.74   18.74   18.74   18.73

The Qualocation approach proves its superior algebraic qualities. Even with
a piecewise constant discretization, the null-space of the approximated traction
operator is preserved, and the results are independent of the value of γ̃. On the
contrary, the Collocation discretization leads to a non-singular traction operator,
and the corresponding MVT solution tends to zero as γ̃ increases.
The plot of the inverse of the condition number in Figure 6.1 shows that
optimal conditioning is reached for γ̃ ≃ 1, well within the range
in which the Qualocation solution is unaffected by γ̃.
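
The tabulated forces can be compared with the analytical drag in a few lines. The sketch below rests on an assumption that is not stated in the text, namely that the table refers to µ = R = V = 1, so that the reference value is 6π; the function and variable names are hypothetical.

```python
import math

def stokes_drag(mu, radius, velocity):
    """Analytical drag on a translating sphere in Stokes flow, F = 6 pi mu R V."""
    return 6.0 * math.pi * mu * radius * velocity

# Assumption: the tabulated forces refer to mu = R = V = 1 (not stated in the text).
analytical = stokes_drag(1.0, 1.0, 1.0)

log_gamma = [-2, -1, 0, 1, 2, 3, 4]
collocation = [18.74, 18.73, 18.64, 17.84, 12.47, 3.18, 0.36]
qualocation = [18.74, 18.74, 18.74, 18.74, 18.74, 18.74, 18.73]

def relative_errors(values):
    """Relative deviation of each tabulated force from the analytical drag."""
    return [abs(v - analytical) / analytical for v in values]

err_collocation = relative_errors(collocation)
err_qualocation = relative_errors(qualocation)
```

Under this assumption the Qualocation values stay within one percent of the analytical drag over the whole range of γ̃, whereas the Collocation values drift away from it as γ̃ grows.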

Figure 6.1: Translating sphere, inverse of the condition number

A similar test is performed on a more realistic MEMS-like geometry, made
up of a fixed stator and a moving fingered rotor, solved with six different meshes
ranging from 954 DOFs to 7578 DOFs. Figure 6.2 shows the finest mesh.
In this case too, the solution obtained for different values of γ̃ is reported
in the table below, while the inverse of the condition number is plotted in Figure
6.3. Once again the optimal condition number is obtained when γ̃ ≃ 1. In this
case the Collocation approach provides a highly deteriorated solution for that
value of γ̃, while the Qualocation approach still delivers an acceptable solution
from the engineering point of view.

Figure 6.2: Fingered structure

Figure 6.3: Fingered structure, inverse of the condition number

log γ̃           -3       -2       -1        0        1        2        3
Collocation   923.84   922.93   915.51   870.26   676.96   374.46    23.84
Qualocation   924.16   924.42   925.63   921.23   906.99   891.26   836.76

These examples prove the superior quality of the Qualocation formulation
coupled with the MVT approach for the solution of the external incompressible
Stokes flow.

6.5.2 Comb finger resonator


The comb-finger resonator studied in Section 3.8.3 is considered. In this case
the structure consists of the three bodies already defined for the electrostatic
problem, together with a fixed substrate. The rotor is free to translate in a
direction parallel to the fingers.
Three different meshes have been employed, with 7196, 25344 and 68578
elements, for a total of 21588, 76032 and 205734 degrees of freedom, respectively.
Two of the three meshes are depicted in Figure 6.4.
In Figure 6.5, the convergence history of the MVT formulation for the three
meshes is plotted, together with the nondimensional viscous drag force F/(µLU)
obtained with the intermediate mesh, where U = 1 is the unit velocity applied
to the rotor and L = 100 µm is a characteristic dimension of the device. The
force rapidly converges, as anticipated at the beginning of the section.
Each matrix-vector product of the GMRES procedure takes 13 s, 31 s and 110 s
for the three meshes, respectively.
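
The reported timings allow a rough estimate of how the cost of one matrix-vector product grows with the problem size. The sketch below fits a power law through the coarsest and finest meshes; it uses only the figures quoted above, and the helper name `scaling_exponent` is hypothetical.

```python
import math

dofs = [21588, 76032, 205734]        # degrees of freedom of the three meshes
seconds = [13.0, 31.0, 110.0]        # reported time per matrix-vector product

def scaling_exponent(n_small, t_small, n_large, t_large):
    """Exponent p of the power law t ~ n**p through two (n, t) samples."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)

p_overall = scaling_exponent(dofs[0], seconds[0], dofs[-1], seconds[-1])
```

The resulting exponent is slightly below one, i.e. the cost grows roughly linearly with the number of unknowns; with only three samples this is an indication rather than a measured complexity.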

6.5.3 Torsional accelerometer


A somewhat similar but more realistic example is the torsional accelerometer
of Figure 6.6. Several fixed comb-finger capacitors are attached to a circular
substrate, while an inner ring connects all the comb-finger capacitors of the
rotor, which is free to rotate around the vertical symmetry axis and is otherwise
constrained.
Figure 6.4: Meshes of the comb finger resonator

Figure 6.5: Comb finger resonator, convergence history and total force

The full-scale analysis of the structure has been performed with three meshes
having 39780, 103740 and 379208 elements, corresponding to 119340, 311220 and
1137624 unknowns, respectively. A detail of the most refined mesh is depicted
in Figure 6.6. Also in this case the structure is analysed in the configuration
shown, with a unit angular velocity assigned about the vertical symmetry axis.
Convergence graphs and an example of torque history are collected in Figure
6.7, where C/(µL³ω) is the nondimensional torque about the symmetry axis
(L = 100 µm). Each matrix-vector product of the GMRES procedure takes
95 s, 280 s and 910 s for the three meshes, respectively.

6.5.4 Parallel plate resonator


As pointed out in the previous chapter, the analysis of parallel plate MEMS
is the most challenging for a large scale iterative analysis. Convergence issues
inherent in classical approaches have essentially motivated the development of
the MVT formulation.
First, one unit of a typical parallel plate MEMS is analysed with an increasing
level of refinement. The geometry is depicted in Figure 6.8 together with an
example of the meshes adopted and of the level of refinement reached (10484,
34910 and 372004 elements, respectively).


Figure 6.6: Geometry and finest mesh of the torsional accelerometer


Figure 6.7: Torsional accelerometer, convergence history and total torque

In this case the structure on the left and the substrate are fixed and the
rotor on the right moves in the direction orthogonal to the parallel plates with
unit velocity. The holes in the rotor structure are required by the production
process. Results are collected in Figure 6.9, where F/(µLU) is again the
nondimensional force with L = 100 µm. Each matrix-vector product of the GMRES
procedure takes approximately 18s, 75s and 548s for the three meshes, respectively.

The presence of several slender fingers has been identified as a major source
of ill-conditioning for the standard formulation. In order to verify the MVT
performance in this tough case, the structures in Figure 6.10 have been analysed.
Essentially, they consist of a collection of basic units without holes.
Results concerning the convergence history are collected in Figure 6.11 for four
different cases with 1, 5, 10 and 15 units and a variable level of refinement. The
MVT technique easily handles this kind of analysis as well.

The change in slope of the convergence plot after 200 iterations is due to the
restart parameter of the GMRES solver, which has indeed been set to 200.


Figure 6.8: Parallel plate unit: geometry, mesh1 and detail of mesh3

6.6 Conclusions
The Fast Multipole accelerated Boundary Element Method proves to be a highly
efficient technique for the evaluation of the drag forces exerted on MEMS devices.
Moreover, although limited to the range of low Reynolds and low Knudsen
numbers, it is highly competitive in comparison with the Finite Element and
Finite Volume methods. The performance of boundary integral formulations
based on the single-layer velocity operator is, however, undermined
by the intrinsic ill-conditioning of the resulting algebraic systems of equations,
which leads to stagnation in convergence. The Mixed Velocity-Traction equation
approach overcomes this issue, ensuring proper conditioning without
affecting the quality of the solution.


Figure 6.9: Parallel plate unit: convergence history for GMRES solver and
global force on the rotor in the direction of movement


Figure 6.10: Simplified parallel plate resonator: single unit MEMS and 15 unit
MEMS


Figure 6.11: Simplified parallel plate resonator: convergence history for the
GMRES solver

Appendix A

Recursive evaluation of spherical functions

The evaluation of the kernel expansions involved in the Fast Multipole algorithm
for both scalar and vector problems requires the determination of the numerical
values of the functions $R_{n,m}$ and $S_{n,m}$, together with their derivatives. The
expressions given in (3.20) can be computed using simple recursive relations which
work in Cartesian coordinates (Nishimura et al. [28]). Using these relations, the
required quantities can be evaluated as functions of the position $x = \{x_1, x_2, x_3\}$
with norm $r = \|x\|$.
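Evaluation of $R_{N,M}(x)$ (with the convention $R_{N,M} = 0$ for $|M| > N$):

- $R_{0,0}(x) = 1$

- For $N = 0, 1, \ldots, P-1$

$$R_{N+1,N+1}(x) = \frac{x_1 + i x_2}{2(N+1)}\, R_{N,N}(x) \tag{A.1}$$

- For $M = 0, 1, \ldots, P$, $N = M, M+1, \ldots, P-1$

$$R_{N+1,M}(x) = \frac{(2N+1)\, x_3\, R_{N,M}(x) - r^2 R_{N-1,M}(x)}{(N+M+1)(N+1-M)} \tag{A.2}$$

- For $N = 1, 2, \ldots, P$, $M = 1, \ldots, N$

$$R_{N,-M}(x) = (-1)^M\, \overline{R_{N,M}(x)} \tag{A.3}$$

where the overbar denotes the complex conjugate.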

Evaluation of $S_{N,M}(x)$:

- $S_{0,0}(x) = \dfrac{1}{r}$

- For $N = 0, 1, \ldots, P-1$

$$S_{N+1,N+1}(x) = \frac{(2N+1)(x_1 + i x_2)}{r^2}\, S_{N,N}(x) \tag{A.4}$$

- For $M = 0, 1, \ldots, P$, $N = M, M+1, \ldots, P-1$

$$S_{N+1,M}(x) = \frac{(2N+1)\, x_3\, S_{N,M}(x) - (N+M)(N-M)\, S_{N-1,M}(x)}{r^2} \tag{A.5}$$

- For $N = 1, 2, \ldots, P$, $M = 1, \ldots, N$

$$S_{N,-M}(x) = (-1)^M\, \overline{S_{N,M}(x)} \tag{A.6}$$

where the overbar denotes the complex conjugate.

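The derivatives can then be evaluated with the formulae

$$
\begin{aligned}
\frac{\partial}{\partial x_1} R_{N,M}(x) &= \frac{1}{2}\left(R_{N-1,M-1}(x) - R_{N-1,M+1}(x)\right) \\
\frac{\partial}{\partial x_2} R_{N,M}(x) &= \frac{i}{2}\left(R_{N-1,M-1}(x) + R_{N-1,M+1}(x)\right) \\
\frac{\partial}{\partial x_3} R_{N,M}(x) &= R_{N-1,M}(x) \\
\frac{\partial}{\partial x_1} S_{N,M}(x) &= \frac{1}{2}\left(S_{N+1,M-1}(x) - S_{N+1,M+1}(x)\right) \\
\frac{\partial}{\partial x_2} S_{N,M}(x) &= \frac{i}{2}\left(S_{N+1,M-1}(x) + S_{N+1,M+1}(x)\right) \\
\frac{\partial}{\partial x_3} S_{N,M}(x) &= -S_{N+1,M}(x) .
\end{aligned}
\tag{A.7}
$$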
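The recursions of this appendix translate almost line by line into code. The following Python sketch is only an illustration under stated assumptions: the function names (`solid_harmonics`, `grad_R`, `grad_S`) and the dictionary storage keyed by (N, M) are hypothetical choices, the diagonal recursion is started from N = 0, entries with |M| > N are treated as zero, and negative orders are obtained through the complex conjugate.

```python
def solid_harmonics(x, P):
    """Return dicts R[(n, m)] and S[(n, m)] for 0 <= n <= P and |m| <= n."""
    x1, x2, x3 = x
    r2 = x1 * x1 + x2 * x2 + x3 * x3
    xi = complex(x1, x2)
    R = {(0, 0): complex(1.0)}
    S = {(0, 0): complex(1.0 / r2 ** 0.5)}
    # Diagonal terms, (A.1) and (A.4), started from N = 0.
    for n in range(P):
        R[(n + 1, n + 1)] = xi / (2 * (n + 1)) * R[(n, n)]
        S[(n + 1, n + 1)] = (2 * n + 1) * xi / r2 * S[(n, n)]
    # Remaining non-negative orders, (A.2) and (A.5); missing entries count as zero.
    for m in range(P + 1):
        for n in range(m, P):
            R[(n + 1, m)] = ((2 * n + 1) * x3 * R[(n, m)]
                             - r2 * R.get((n - 1, m), 0.0)) / ((n + m + 1) * (n + 1 - m))
            S[(n + 1, m)] = ((2 * n + 1) * x3 * S[(n, m)]
                             - (n + m) * (n - m) * S.get((n - 1, m), 0.0)) / r2
    # Negative orders, (A.3) and (A.6).
    for n in range(1, P + 1):
        for m in range(1, n + 1):
            sign = -1.0 if m % 2 else 1.0
            R[(n, -m)] = sign * R[(n, m)].conjugate()
            S[(n, -m)] = sign * S[(n, m)].conjugate()
    return R, S


def grad_R(R, n, m):
    """Gradient of R[(n, m)] from (A.7); uses entries of degree n - 1."""
    lo = R.get((n - 1, m - 1), 0.0)
    hi = R.get((n - 1, m + 1), 0.0)
    return (0.5 * (lo - hi), 0.5j * (lo + hi), R.get((n - 1, m), 0.0))


def grad_S(S, n, m):
    """Gradient of S[(n, m)] from (A.7); S must be available up to degree n + 1."""
    lo = S[(n + 1, m - 1)]
    hi = S[(n + 1, m + 1)]
    return (0.5 * (lo - hi), 0.5j * (lo + hi), -S[(n + 1, m)])


R, S = solid_harmonics((0.3, -0.4, 1.2), 4)
print(R[(1, 1)], S[(0, 0)], grad_S(S, 0, 0))
```

Since the gradient of S of degree N needs entries of degree N + 1, the expansion has to be generated one degree beyond the highest one whose derivatives are required.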

Appendix B

The internal mesh for the DBEM formulation

B.1 Automatic generation of an internal mesh


A simple algorithm is described for the automatic generation of an internal
structured mesh enclosed by a closed surface defined by means of plane
triangular elements.
The boundary surface is enclosed in a regular parallelepiped, which is subdivided
into a regular grid of cubic cells of a given fixed size, as shown in Figure B.1
with reference to a corresponding 2D situation. Two procedures can be applied
to check whether a given cell actually lies in the domain.
The first one consists of a loop over the boundary elements for each candidate
cell. The distance d_i between the center of the candidate cell and the center of
the i-th element is evaluated, searching for the element which minimizes it. The
sign of the dot product between the distance vector (taken from the cell center
to the element center) and the outward normal to the boundary at the selected
element determines the status of the cell: if the sign is positive, the cell
belongs to the internal domain. This procedure is quite fast; however, it may
fail, especially in the presence of boundary meshes with rapid element size
transitions, so the automatically generated mesh should be inspected.
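The first procedure can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the helper names, the tuple-based vectors, the convention that each triangle (Q1, Q2, Q3) is ordered so that (Q2 − Q1) × (Q3 − Q1) points outward, and the unit tetrahedron used as a test surface are all assumptions introduced here.

```python
import math


def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def centroid(tri):
    return tuple(sum(v[k] for v in tri) / 3.0 for k in range(3))


def inside_by_nearest_element(point, triangles):
    """Sign test on the element whose center is closest to the point."""
    def squared_distance(tri):
        w = sub(centroid(tri), point)
        return dot(w, w)

    nearest = min(triangles, key=squared_distance)
    # Outward normal (not normalised), assuming outward-oriented vertex order.
    normal = cross(sub(nearest[1], nearest[0]), sub(nearest[2], nearest[0]))
    return dot(sub(centroid(nearest), point), normal) > 0.0


def internal_cells(triangles, size):
    """Centers of the cubic cells of the bounding-box grid flagged as internal."""
    verts = [v for tri in triangles for v in tri]
    lo = [min(v[k] for v in verts) for k in range(3)]
    hi = [max(v[k] for v in verts) for k in range(3)]
    counts = [max(1, math.ceil((hi[k] - lo[k]) / size)) for k in range(3)]
    cells = []
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                center = (lo[0] + (i + 0.5) * size,
                          lo[1] + (j + 0.5) * size,
                          lo[2] + (k + 0.5) * size)
                if inside_by_nearest_element(center, triangles):
                    cells.append(center)
    return cells


# Unit tetrahedron with outward-oriented faces (illustrative test surface).
O, A, B, C = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
TETRA = [(O, B, A), (O, A, C), (O, C, B), (A, B, C)]
print(inside_by_nearest_element((0.1, 0.1, 0.1), TETRA))
print(len(internal_cells(TETRA, 0.25)))
```

Because only the nearest element center is inspected, a coarse or strongly graded surface mesh can lead to misclassified cells, which is the reason for the inspection recommended above.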
A second, fail-proof albeit slower, technique is the following. For each cell, a
half-line with random direction d is generated starting from the candidate point.
If the line does not cross the boundary (line l1 in Figure B.1), then the point lies
outside the domain and the corresponding element does not exist. If the line
crosses the boundary one or more times (line l2 in Figure B.1), then the nearest
intersection to the candidate point is checked. With n the outward normal to
the boundary there, the point is in the domain if d · n > 0; otherwise it is outside.


Figure B.1: Internal mesh generation

The intersection of the line with the boundary must be checked at element
level. With reference to Figure B.1, let P be a point in space, and Q1 , Q2 and
Q3 the vertices of a triangular element; q = Q2 − Q1 and r = Q3 − Q1 are the
vectors running along two edges of the element, and v = Q1 − P is the vector
which points to the first vertex from the candidate point. The intersection
between the half-line and the plane containing the element is written in α, η1
and η2 coordinates, where η1 and η2 are the master element natural coordinates
and α is a coordinate running along $\hat{d} = d/\|d\|$.
The position of the intersection point is

$$
\begin{aligned}
\alpha &= \frac{v_3 (q_1 r_2 - q_2 r_1) - v_2 (q_1 r_3 - q_3 r_1) + v_1 (q_2 r_3 - q_3 r_2)}{D} \\
\eta_1 &= \frac{d_3 (r_1 v_2 - r_2 v_1) - d_2 (r_1 v_3 - r_3 v_1) + d_1 (r_2 v_3 - r_3 v_2)}{D} \\
\eta_2 &= \frac{-d_3 (q_1 v_2 - q_2 v_1) + d_2 (q_1 v_3 - q_3 v_1) - d_1 (q_2 v_3 - q_3 v_2)}{D} \\
D &= d_3 (q_1 r_2 - q_2 r_1) - d_2 (q_1 r_3 - q_3 r_1) + d_1 (q_2 r_3 - q_3 r_2)
\end{aligned}
\tag{B.1}
$$

If D = 0 then the half-line and the plane containing the element are parallel.


If D ≠ 0, the line intersects the element when 0 ≤ η1 ≤ 1 and 0 ≤ η2 ≤ 1 − η1.
Among the elements intersected by the line, the one with the lowest positive value
of α is selected. The sign of d · n then determines whether the candidate point is
inside or outside the domain.
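The half-line test can be sketched as follows. The code is an illustration under stated assumptions rather than the original implementation: the function names are hypothetical, triangles are assumed to be ordered so that q × r is the outward normal, and the component expressions of (B.1) are written in the equivalent triple-product form D = d · (q × r), α = v · (q × r)/D, η1 = d · (r × v)/D, η2 = −d · (q × v)/D.

```python
import random


def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(point, d, tri):
    """Return alpha > 0 if the half-line point + alpha * d hits tri, else None."""
    q = sub(tri[1], tri[0])
    r = sub(tri[2], tri[0])
    v = sub(tri[0], point)
    D = dot(d, cross(q, r))
    if D == 0.0:
        return None  # half-line parallel to the plane of the element
    alpha = dot(v, cross(q, r)) / D
    eta1 = dot(d, cross(r, v)) / D
    eta2 = -dot(d, cross(q, v)) / D
    if alpha > 0.0 and 0.0 <= eta1 <= 1.0 and 0.0 <= eta2 <= 1.0 - eta1:
        return alpha
    return None


_RNG = random.Random(0)  # fixed seed, only to make the sketch reproducible


def inside_by_ray(point, triangles, d=None):
    """Classify a point from the nearest intersection along a half-line."""
    if d is None:
        d = tuple(_RNG.gauss(0.0, 1.0) for _ in range(3))
    # d is not normalised: alpha is then measured in units of |d|, which
    # affects neither the ordering of the hits nor the sign of d . n.
    best = None
    for tri in triangles:
        alpha = ray_triangle(point, d, tri)
        if alpha is not None and (best is None or alpha < best[0]):
            best = (alpha, tri)
    if best is None:
        return False  # no crossing: the point is outside
    tri = best[1]
    normal = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]))
    return dot(d, normal) > 0.0


# Unit tetrahedron with outward-oriented faces (illustrative test surface).
O, A, B, C = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
TETRA = [(O, B, A), (O, A, C), (O, C, B), (A, B, C)]
print(inside_by_ray((0.1, 0.1, 0.1), TETRA, d=(0.0, 0.0, 1.0)))
print(inside_by_ray((0.3, 0.3, -0.2), TETRA, d=(0.0, 0.0, 1.0)))
```

A half-line passing exactly through an edge or a vertex is a degenerate case that this sketch does not treat specially; with a random direction it is not expected to occur in practice.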

B.2 Analytical integrations over parallelepiped-shaped volume elements

The volume integrals appearing in the boundary-domain formulations in chapter
4 are of the kind

$$\int_{\Omega} \frac{1}{\|x - y\|}\, dV(y) \tag{B.2}$$

and must be evaluated on cubic cells, for an arbitrary position in the three-
dimensional space of the source point x.
If the singularity lies at the origin of the reference system, the analytical
solution over a regular parallelepiped with edge lengths a, b and c, as shown in
Figure B.2, reduces to

$$
\begin{aligned}
\int_0^a \int_0^b \int_0^c \frac{1}{\sqrt{y_1^2 + y_2^2 + y_3^2}}\, dy_3\, dy_2\, dy_1
= {} & -\frac{1}{2} c^2 \tan^{-1}\frac{ab}{cd} - \frac{1}{2} b^2 \tan^{-1}\frac{ac}{bd} - \frac{1}{2} a^2 \tan^{-1}\frac{bc}{ad} \\
& - \frac{1}{2} ac \log\left(a^2 + c^2\right) - \frac{1}{2} bc \log\left(b^2 + c^2\right) - \frac{1}{2} ab \log\left(a^2 + b^2\right) \\
& + bc \log(a + d) + ac \log(b + d) + ab \log(c + d)
\end{aligned}
\tag{B.3}
$$

where

$$d = \sqrt{a^2 + b^2 + c^2} \tag{B.4}$$

which is valid for positive and negative values of a, b and c.
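Formula (B.3) is straightforward to code and to check numerically. The sketch below is illustrative only: the function name `corner_integral` is hypothetical, and the early return for a vanishing edge length is an assumption added here, since the integral is then zero while some of the fractions in (B.3) become undefined.

```python
import math


def corner_integral(a, b, c):
    """Integral of 1/|y| over [0,a] x [0,b] x [0,c], following (B.3) and (B.4)."""
    if a == 0.0 or b == 0.0 or c == 0.0:
        return 0.0  # degenerate box: zero volume
    d = math.sqrt(a * a + b * b + c * c)
    return (-0.5 * c * c * math.atan(a * b / (c * d))
            - 0.5 * b * b * math.atan(a * c / (b * d))
            - 0.5 * a * a * math.atan(b * c / (a * d))
            - 0.5 * a * c * math.log(a * a + c * c)
            - 0.5 * b * c * math.log(b * b + c * c)
            - 0.5 * a * b * math.log(a * a + b * b)
            + b * c * math.log(a + d)
            + a * c * math.log(b + d)
            + a * b * math.log(c + d))


unit_cube = corner_integral(1.0, 1.0, 1.0)
print(unit_cube)
# Changing the sign of one edge length changes the sign of the result.
print(corner_integral(-1.0, 1.0, 1.0) + unit_cube)
```

For the unit cube the expression reduces to 3 log((1 + √3)/√2) − π/4, which gives a quick sanity check of an implementation. The sign change under a → −a is what allows the same expression to be used with negative arguments.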


Figure B.2: Volume integration with source point on parallelepiped vertex

Figure B.3: Volume integration with source point in a generic position in space

When the source point lies at an arbitrary position in space, the analytical
solution of the integral

$$\int_{a_1}^{a_2} \int_{b_1}^{b_2} \int_{c_1}^{c_2} \frac{1}{\sqrt{y_1^2 + y_2^2 + y_3^2}}\, dy_3\, dy_2\, dy_1 \tag{B.5}$$

can be obtained as a linear combination of singular integrals in the form (B.3),
since

$$I = I_1 - I_2 - I_3 - I_4 + 2I_5 + 2I_6 + 2I_7 - 3I_8 \tag{B.6}$$

where $I_1, \ldots, I_8$ are the singular integrals over the domains shown in Figure B.3.
The proposed analytical solution loses accuracy if the distance between the
singularity and the integration domain is large, due to the numerical errors
typical of the subtraction of large, nearly equal quantities in (B.6). In analogy
with the analytical solutions in Milroy et al. [26] for the surface integrals, it is
convenient to set a reference distance, beyond which numerical integration can
be employed.
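The combination of corner integrals and the switch to numerical integration can be sketched as below. Several points are assumptions made for this illustration: instead of the specific decomposition (B.6), whose domains are defined in Figure B.3, the code uses the standard signed sum over the eight corners of the box, which relies on (B.3) changing sign with each of its arguments; the midpoint rule, the parameter names `ref_distance` and `n`, and the function names are hypothetical.

```python
import math
from itertools import product


def corner_integral(a, b, c):
    """Integral of 1/|y| over [0,a] x [0,b] x [0,c], following (B.3)."""
    if a == 0.0 or b == 0.0 or c == 0.0:
        return 0.0
    d = math.sqrt(a * a + b * b + c * c)
    return (-0.5 * c * c * math.atan(a * b / (c * d))
            - 0.5 * b * b * math.atan(a * c / (b * d))
            - 0.5 * a * a * math.atan(b * c / (a * d))
            - 0.5 * a * c * math.log(a * a + c * c)
            - 0.5 * b * c * math.log(b * b + c * c)
            - 0.5 * a * b * math.log(a * a + b * b)
            + b * c * math.log(a + d) + a * c * math.log(b + d)
            + a * b * math.log(c + d))


def midpoint_rule(a, b, c, n):
    """Plain midpoint quadrature of 1/|y| on the box; a, b, c are (lower, upper)."""
    h = [(hi - lo) / n for lo, hi in (a, b, c)]
    total = 0.0
    for i, j, k in product(range(n), repeat=3):
        y1 = a[0] + (i + 0.5) * h[0]
        y2 = b[0] + (j + 0.5) * h[1]
        y3 = c[0] + (k + 0.5) * h[2]
        total += 1.0 / math.sqrt(y1 * y1 + y2 * y2 + y3 * y3)
    return total * h[0] * h[1] * h[2]


def box_integral(a, b, c, ref_distance=None, n=8):
    """Integral (B.5) with the source point at the origin of the local frame."""
    if ref_distance is not None:
        center = [0.5 * (lo + hi) for lo, hi in (a, b, c)]
        if math.sqrt(sum(x * x for x in center)) > ref_distance:
            return midpoint_rule(a, b, c, n)  # far cell: avoid cancellation
    total = 0.0
    for i, j, k in product((0, 1), repeat=3):
        # +1 for an odd number of upper limits, -1 otherwise.
        sign = 1.0 if (i + j + k) % 2 == 1 else -1.0
        total += sign * corner_integral(a[i], b[j], c[k])
    return total


# Source point at the center of a unit cube.
print(box_integral((-0.5, 0.5), (-0.5, 0.5), (-0.5, 0.5)))
```

With the source point at the center of a unit cube, the eight contributions are equal and the result is twice the corner value of the unit cube. The threshold `ref_distance` plays the role of the reference distance mentioned above; a suitable value depends on the cell size and on the working precision.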

Bibliography

[1] Abousleiman Y and Cheng AH-D. Boundary element solution for steady
and unsteady Stokes flow. Comput. Methods Appl. Mech. Engrg., 117:1–13,
1994.

[2] Aliabadi MH. The Boundary Element Method, Applications in Solids and
Structures, volume 2. Wiley, 2002.

[3] Balas J, Sladek J, and Sladek V. Stress Analysis by Boundary Element
Methods, volume 23 of Studies in Applied Mechanics. Elsevier, 1989.

[4] Benzi M and Tuma M. A sparse approximate inverse preconditioner for
nonsymmetric linear systems. SIAM J. Sci. Comput., 19(3):968–994, May
1998.

[5] Bonnet M. Boundary Integral Equation Methods for Solids and Fluids.
John Wiley & Sons, 1995.

[6] Chen KE. An analysis of Sparse Approximate Inverse preconditioners for
boundary integral equations. SIAM J. Matrix Anal. Appl., 22(4):1058–
1078, 2001.

[7] Cheng H, Greengard L, and Rokhlin V. A fast multipole algorithm in three
dimensions. Journal of Computational Physics, 155, 1999.

[8] Christiansen SH and Nédélec JC. Des préconditionneurs pour la résolution
numérique des équations intégrales de frontière de l’électromagnétisme. C.
R. Acad. Sci. Paris, 331(1):733–738, 2000.

[9] Drazin MP. Pseudoinverses in associative rings and semigroups. Amer.
Math. Monthly, 65:506–514, 1958.
[10] Frangi A and Novati G. On the numerical stability of time-domain
elastodynamic analyses by BEM. Comput. Methods Appl. Mech. Engrg., 173:403–
417, 1999.

[11] Frangi A and Tausch J. A qualocation enhanced approach for the Dirichlet
problem of exterior Stokes flow. Engng. Analysis with Boundary Elem.,
2004. Submitted for publication.

[12] Frayssé L, Giraud L, Gratton S, and Langou J. A set of GMRES routines
for real and complex arithmetics on high performance computers. Technical
Report TR/PA/03/3, CERFACS, 2003. Public domain software available
on www.cerfacs.fr/algor/Softs.

[13] Giraud L, Langou J, and Sylvand G. On the parallel solution of large
industrial wave propagation problems. Technical Report TR/PA/04/52, CERFACS,
2004.

[14] Greengard L and Rokhlin V. A fast algorithm for particle simulations. J.
of Computat. Phys., 73:325–348, 1987.

[15] Greengard L and Rokhlin V. A New Version of the Fast Multipole Method
for the Laplace Equation in Three Dimensions. Acta Numerica, 6:229–269, 1997.

[16] Grote MJ and Huckle T. Parallel preconditioning with sparse approximate
inverses. SIAM J. Sci. Comput., 18(3):838–853, May 1997.

[17] Hackbusch W and Nowak ZP. On the fast matrix multiplication in the
boundary element method by panel clustering. Numer. Math., 54:463–491,
1989.

[18] Ingber MS, Mammoli AA, and Brown MJ. A comparison of domain integral
evaluation techniques for boundary element methods. International Journal
for Numerical Methods in Engineering, 52:417–432, 2001.

[19] Ipsen ICF and Meyer CD. The idea behind Krylov methods. Amer. Math.
Monthly, 105(10):889–899, 1998.

[20] Karniadakis GE and Beskok A. Micro flows, fundamentals and simulation.
Springer, 2002.

[21] Mackerle J. Sensors and actuators: finite element and boundary element
analyses and simulation. A bibliography (1997-1998). Finite Elements in
Analysis and Design, 33:209–220, 1999.


[22] Mammoli AA and Ingber MS. Stokes flow around cylinders in a bounded
two-dimensional domain using a multipole-accelerated boundary element
method. Int. J. Numer. Meth. Engng., 44:897–917, 1999.

[23] Mantič V. A new formula for the C-matrix in the Somigliana identity.
Journal of Elasticity, 33:191–201, 1993.

[24] Margonari M. Boundary Element Techniques for Three Dimensional
Problems in Elastostatics. PhD thesis, University of Trento, 2004.

[25] Merkel M, Bulgakov V, Bialecki R, and Kuhn G. Iterative solution of
large-scale 3D-BEM industrial problems. Engineering Analysis with Boundary
Elements, 22:183–197, 1998.

[26] Milroy J, Hinduja S, and Davey K. The elastostatic three-dimensional
boundary element method: analytical integration for linear isoparametric
triangular elements. Appl. Math. Modelling, 21:763–782, Dec 1997.

[27] Nishimura N. Fast multipole accelerated boundary integral equation
methods. Appl. Mech. Rev., 55(4):299–324, 2002.

[28] Nishimura N, Yoshida K, and Kobayashi S. A fast multipole boundary
integral equation method for crack problems in 3D. Engineering Analysis
with Boundary Elements, 23:97–105, 1999.

[29] Peirce A and Siebrits E. Stability analysis and design of time-stepping
schemes for general elastodynamic boundary element models. International
Journal for Numerical Methods in Engineering, 40:319–342, 1997.

[30] Power H and Miranda G. Second kind integral equation formulation of
Stokes flows past a particle of arbitrary shape. SIAM J. Appl. Math.,
47(4):689–698, 1987.

[31] Power H and Wrobel L. Boundary integral methods in fluid mechanics.
Computational Mechanics Publications, Southampton, 1995.

[32] Pozrikidis C. Boundary integral and singularity methods for linearized
viscous flow. Cambridge University Press, Cambridge, 1992.

[33] Saad Y. A flexible inner-outer preconditioned GMRES algorithm. SIAM
J. Sci. Comput., 14:461–469, 1993.


[34] Saad Y and Schultz MH. GMRES: A generalized minimal residual algorithm
for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput.,
7:856–869, 1986.

[35] Steinbach O and Wendland WL. The construction of some efficient
preconditioners in the Boundary Element Method. Advances in Computational
Mathematics, 9:191–216, 1998.

[36] Tausch J. Sparse BEM for potential theory and Stokes flow using variable
order wavelets. Computational Mechanics, 32:312–318, 2003.

[37] Tausch J and White J. Second Kind Integral Formulations of the
Capacitance Problem. Adv. Comput. Math., 9:217–232, 1998.

[38] Valente FP and Pina HLG. Iterative solvers for BEM algebraic systems
of equations. Engineering Analysis with Boundary Elements, 22:117–124,
1998.

[39] van der Vorst HA and Vuik C. GMRESR: a family of nested GMRES
methods. Num. Lin. Alg. Appl., 1:369–386, 1994.

[40] Wang X. FastStokes: a fast 3-D fluid simulation program for
Micro-Electro-Mechanical Systems. PhD thesis, Massachusetts Institute of Technology,
2002.

[41] Wrobel LC. The Boundary Element Method, volume 1, Applications in
Thermo-Fluids and Acoustics. John Wiley & Sons, 2002.

[42] Yoshida K, Nishimura N, and Kobayashi S. Application of Fast Multipole
Galerkin boundary integral equation method to elastostatic crack problems
in 3D. International Journal for Numerical Methods in Engineering,
50:525–547, 2001.
