Parallel Multigrid Finite Volume Computation of Three-Dimensional Thermal Convection - 98-1188
P. Wang* and R. D. Ferraro
Abstract
A parallel implementation of the finite volume method for three-dimensional, time-dependent, thermal convective flows is presented. The algebraic equations resulting from the finite volume discretization are solved by a parallel multigrid method. The code has been implemented on the Intel Paragon, the Cray T3D, and the IBM SP2 by using domain decomposition techniques and the MPI communication software. The code can use 1D, 2D, or 3D partitions as required by different geometries, and is easily ported to other parallel systems. Numerical solutions for air (Prandtl number Pr = 0.733) with various Rayleigh numbers up to 10^7 are discussed.
*corresponding author
Keywords: thermal convection, parallel computation, finite volume, multigrid
Nomenclature
d width of cavity
h height of cavity
l length of cavity
Nu Nusselt number
p non-dimensional pressure
p′ pressure correction
R Rayleigh number
T non-dimensional temperature
x, y, z non-dimensional coordinates
Greek symbols
α under-relaxation factor
β coefficient of thermal expansion
κ thermal diffusivity
ν kinematic viscosity
σ Prandtl number
1. INTRODUCTION
Natural convection driven by imposed horizontal density gradients finds many applications in
engineering: reactor cooling systems, crystal growth procedures, and solar-energy collectors. The
most numerically studied form of this problem is the case of a rectangular cavity with differentially
heated sidewalls. The two dimensional version of this problem has received considerable attention
[1, 2, 3, 4, 5, 6], but for the three dimensional case, very few results have been obtained, mainly
due to the limits in computing power. The significant computational resources of modern, mas-
sively parallel supercomputers promise to make such studies feasible. In order to determine such
flow structures and heat transfer, numerical simulations using the Navier-Stokes equations and the energy equation are required.
In the early simulation methods, vorticity and stream function were usually the calculated variables, but the use of primitive variables has attracted many computational researchers for three-dimensional fluid flows, with methods such as MAC [7], the Projection Method [8], and others [9]. The SIMPLE algorithm of
Patankar and Spalding [10] not only provided a remarkably successful implicit method, but has dominated for a decade the field of numerical simulations of incompressible flows [11]. This algorithm, based on the finite volume formulation, lends itself to easy physical interpretation, and it ensures that if conservation is satisfied for each control volume, it is also satisfied for the entire calculation domain.
Modelling large-scale three-dimensional time dependent fluid flows poses many challenges. Current
numerical methods perform well on fast, single-processor, vector computers; however, the expense of
performing such computation is extreme in terms of needed memory and processing time. With the
rapid development of parallel computation, modelling large scale three dimensional, time dependent
flows has become realistic by using a large number of processors. Currently, many large-scale scientific simulations have been carried out on various parallel systems such as the Intel Paragon, the Cray T3D, and the IBM SP2.
In our present study, the SIMPLE scheme is chosen as the algorithm for the thermal convective
flow problems on parallel systems. This is not only because the scheme is used widely in the convective heat transfer community, but also because it is easy to implement on a parallel system. Since the derived equations depend only on their local control volume information, domain decomposition techniques can be easily applied. Here we present a numerical study of thermal convection in a
large Rayleigh number range using parallel systems. Section 2 describes the mathematical formu-
lation of the three dimensional, time dependent, thermal cavity flows. The numerical approach for
those flows is given in Section 3. The detailed parallel implementation and the code performance
are described in Section 4. The numerical solutions for air (Prandtl number Pr = 0.733) with various Rayleigh numbers up to 10^7 are discussed in Section 5. The summary of the present work is given
in Section 6.
2. MATHEMATICAL FORMULATION
The flow domain is a rectangular cavity 0 < x < l, 0 < y < d, and 0 < z < h. The appropriate governing equations, under the Boussinesq approximation, can be written in non-dimensional form as

$$\frac{\partial u}{\partial x}+\frac{\partial v}{\partial y}+\frac{\partial w}{\partial z}=0, \qquad (1)$$

$$\sigma^{-1}\left(\frac{\partial u}{\partial t}+u\frac{\partial u}{\partial x}+v\frac{\partial u}{\partial y}+w\frac{\partial u}{\partial z}\right)=-\frac{\partial p}{\partial x}+\nabla^{2}u, \qquad (2)$$

$$\sigma^{-1}\left(\frac{\partial v}{\partial t}+u\frac{\partial v}{\partial x}+v\frac{\partial v}{\partial y}+w\frac{\partial v}{\partial z}\right)=-\frac{\partial p}{\partial y}+\nabla^{2}v, \qquad (3)$$

$$\sigma^{-1}\left(\frac{\partial w}{\partial t}+u\frac{\partial w}{\partial x}+v\frac{\partial w}{\partial y}+w\frac{\partial w}{\partial z}\right)=-\frac{\partial p}{\partial z}+\nabla^{2}w+RT, \qquad (4)$$

$$\frac{\partial T}{\partial t}+u\frac{\partial T}{\partial x}+v\frac{\partial T}{\partial y}+w\frac{\partial T}{\partial z}=\nabla^{2}T. \qquad (5)$$
The dimensionless variables in these equations are the fluid velocities u, v, w, the temperature T, and the pressure p, where σ = ν/κ is the Prandtl number and R = gβΔTh³/(κν) is the Rayleigh number. Here ν is the kinematic viscosity, κ is the thermal diffusivity, β is the coefficient of thermal expansion, and g is the acceleration due to gravity. The rigid end walls at x = 0, l are maintained at constant temperatures T_0 and T_0 + ΔT respectively, while the other boundaries are assumed to be insulating. So the boundary conditions on the rigid walls of the cavity are
$$T = 0 \quad \text{on } x = 0, \qquad\qquad T = 1 \quad \text{on } x = l,$$

$$\frac{\partial T}{\partial z} = 0 \quad \text{on } z = 0, h, \qquad\qquad \frac{\partial T}{\partial y} = 0 \quad \text{on } y = 0, d.$$
In general the motion is controlled by the parameters σ, R, and the flow domain.
3. NUMERICAL APPROACH
An efficient and practical numerical approach for three dimensional, time-dependent, thermal con-
vective flow problems is studied. This implementation is based on the widely used finite volume
method (SIMPLE [12]) with an efficient and fast elliptic multigrid scheme for predicting incom-
pressible fluid flows. A normal staggered grid configuration is used, and the conservation equations are integrated over each control volume. Here, velocities are stored at the six surfaces of the control volume, marked by (u_e, u_w, v_n, v_s, w_t, w_b), and the temperature and pressure are stored at the center of the control volume (p_i, T_i). Since the
solution of the pressure equation derived from the SIMPLE scheme can represent as much as 80%
of the total cost for solving the fluid flow problem [11], it is therefore a high priority to solve for
p in an efficient manner. Here, a multigrid scheme is applied to the discretized equations, which
acts as a convergence accelerator and reduces the cpu time significantly for the whole computation.
Local, flow-oriented, upwind interpolation functions have been used in the scheme to prevent the appearance of unphysical oscillations in the solutions.
A brief summary of the SIMPLE method is given below, and details that are omitted may
be found in the original reference [12]. For a guessed pressure field p*, the imperfect velocity field (u*, v*, w*) based on this pressure will result from the solution of the following discretization equations:

$$a_e u^{*}_{e} = \sum a_{unb}\,u^{*}_{nb} + b_u + A_e\,(p^{*}_{P} - p^{*}_{E}), \qquad (11)$$

$$a_n v^{*}_{n} = \sum a_{vnb}\,v^{*}_{nb} + b_v + A_n\,(p^{*}_{P} - p^{*}_{N}), \qquad (12)$$

$$a_t w^{*}_{t} = \sum a_{wnb}\,w^{*}_{nb} + b_w + A_t\,(p^{*}_{P} - p^{*}_{T}), \qquad (13)$$
where the summation is over the appropriate neighbor points. A_e, A_n, A_t are the areas of the faces of the i-th control volume at e, n, t respectively; a_e, a_n, a_t, a_{unb}, a_{vnb}, a_{wnb}, and b_u, b_v, b_w are the coefficients
of the finite-volume equations. The correct velocity fields are computed by the velocity-correction formulas

$$u_e = u^{*}_{e} + d_e\,(p'_{P} - p'_{E}), \qquad (14)$$

$$v_n = v^{*}_{n} + d_n\,(p'_{P} - p'_{N}), \qquad (15)$$

$$w_t = w^{*}_{t} + d_t\,(p'_{P} - p'_{T}), \qquad (16)$$

where

$$d_e = \frac{A_e}{a_e}, \qquad (17) \qquad d_n = \frac{A_n}{a_n}, \qquad (18) \qquad d_t = \frac{A_t}{a_t}, \qquad (19)$$

and the pressure is corrected by

$$p = p^{*} + \alpha p'. \qquad (20)$$

Here α is an under-relaxation factor for the pressure; in the present case a value of 0.5 is used. p' will
result from the solution of the following discretization equation:

$$a_p\,p'_{i} = \sum a_{nb}\,p'_{nb} + b_p, \qquad a_p = a_e + a_w + a_n + a_s + a_t + a_b,$$

which is derived from the continuity equation (1) and the velocity-correction equations (14), (15), and (16).
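To make the correction step concrete, the following sketch (in C, the language of the present code) applies the velocity corrections (14)-(16) and the pressure update (20) on a single subdomain. The array names, the IDX indexing macro, and the handling of the last faces are illustrative assumptions, not the actual data structures of the code.

    #define IDX(i,j,k) ((i) + (nx)*((j) + (ny)*(k)))

    /* Sketch of the SIMPLE correction step on one subdomain:
       u_e = u_e* + d_e (p'_P - p'_E), and similarly for v_n and w_t (eqs. 14-16),
       followed by the under-relaxed pressure update p = p* + alpha p' (eq. 20). */
    void correct_fields(int nx, int ny, int nz, double alpha,
                        double *u, double *v, double *w, double *p,
                        const double *pc,   /* pressure correction p' */
                        const double *de, const double *dn, const double *dt)
    {
        for (int k = 0; k < nz; k++)
            for (int j = 0; j < ny; j++)
                for (int i = 0; i < nx; i++) {
                    int c = IDX(i, j, k);
                    if (i < nx - 1) u[c] += de[c] * (pc[c] - pc[IDX(i + 1, j, k)]);
                    if (j < ny - 1) v[c] += dn[c] * (pc[c] - pc[IDX(i, j + 1, k)]);
                    if (k < nz - 1) w[c] += dt[c] * (pc[c] - pc[IDX(i, j, k + 1)]);
                    p[c] += alpha * pc[c];   /* p = p* + alpha p' */
                }
    }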
The temperature will be solved by the following discretization equation:

$$a_T\,T_{i} = \sum a_{Tnb}\,T_{nb} + b_T, \qquad (34)$$

where a_p, a_T, a_{pnb}, a_{Tnb}, and b_p, b_T are the coefficients of the pressure and temperature equations respectively. Here five discretized equations for u*, v*, w*, p', T need to be solved at each time level. Typically, this solution step dominates the costs associated with the finite volume implementation, with the pressure equation having the highest cost. Therefore, the choice of solution methodology is perhaps one of the most important aspects of the implementation.
Direct solution methods, such as Gaussian elimination or LU decomposition, have been commonly
used in solving linear algebraic equations, and there have been several efforts toward developing
parallel implementation of such solvers [13, 14, 15]. However, there are some inherent disadvantages
of employing these techniques for the parallel solution of large problems. The three dimensional,
time dependent thermal convective flow problems typically result in structured matrices character-
ized by extremely large bandwidths. The direct solution of such matrices requires a great amount
of memory for storage. Since five algebraic equations will be solved, storing the associated matrices would require a prohibitive amount of memory.
Here iterative solution techniques are chosen for the above algebraic equations. Since only local
coefficients of finite-volume equations are required for an iterative scheme, the total memory is far
less than that of direct methods. More importantly, using domain decomposition techniques, iterative methods are readily implemented on a parallel system with a message passing library. But iterative methods may face convergence problems. If the problem size is very large, the number of iterations needed to satisfactorily solve an equation could be significant, or sometimes, the iterative method might not converge at all. So choosing an efficient and fast converging iterative scheme is essential.
Here a multigrid scheme is applied in solving the above equations. The main idea is to use the
solution on a coarse grid to revise the required solution on a fine grid, since an error of wavelength λ is most easily eliminated on a mesh of size Δh with λ ≈ Δh. Thus a hierarchy of grids of different
mesh sizes is used to solve the fine grid problem. It has been proven theoretically and practically
that the multigrid method has a better rate of convergence than do other iterative methods. Here
we do not give a full theoretical analysis of the algorithm, which is described in detail elsewhere
[16, 171. In the present computation, a V-Cycle scheme with a flexible number of grid levels is
implemented with Successive Over-Relaxation as the smoother. Injection and linear interpolation
are used as restriction operators and interpolation operators respectively. A detailed description of
a parallel multigrid method is given in the next section; implementations of sequential multigrid methods can be found in the literature [16, 17, 18].
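Since Successive Over-Relaxation is used as the smoother throughout, a sketch of one SOR sweep for the seven-point finite volume equation a_p x_P = Σ a_nb x_nb + b is given below (in C, with illustrative array names and an assumed IDX indexing macro); it is only an indication of the point-wise update, not the code used here.

    /* One SOR sweep over the interior of a local subdomain for the seven-point
       equation a_p x_P = a_e x_E + a_w x_W + a_n x_N + a_s x_S + a_t x_T + a_b x_B + b.
       omega is the over-relaxation factor; halo values on partition boundaries
       are assumed to have been exchanged beforehand. */
    #define IDX(i,j,k) ((i) + (nx)*((j) + (ny)*(k)))

    void sor_sweep(int nx, int ny, int nz, double omega, double *x,
                   const double *ae, const double *aw, const double *an,
                   const double *as, const double *at, const double *ab,
                   const double *ap, const double *b)
    {
        for (int k = 1; k < nz - 1; k++)
            for (int j = 1; j < ny - 1; j++)
                for (int i = 1; i < nx - 1; i++) {
                    int c = IDX(i, j, k);
                    double xgs = (ae[c] * x[IDX(i + 1, j, k)] + aw[c] * x[IDX(i - 1, j, k)]
                                + an[c] * x[IDX(i, j + 1, k)] + as[c] * x[IDX(i, j - 1, k)]
                                + at[c] * x[IDX(i, j, k + 1)] + ab[c] * x[IDX(i, j, k - 1)]
                                + b[c]) / ap[c];
                    x[c] += omega * (xgs - x[c]);   /* over-relaxed Gauss-Seidel update */
                }
    }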
4. PARALLEL IMPLEMENTATION
In order to achieve load balance and to exploit parallelism as much as possible, a general and
portable parallel structure (Figure 2) based on domain decomposition techniques is designed for the three dimensional flow domain. It has 1D, 2D, and 3D partition features which can be chosen according to the problem geometry. These partition options make it possible to achieve load balance while minimizing communication. For example, if the geometry is a square cavity, the 3D partitioner can be used, while if the geometry is a shallow cavity with a large aspect ratio, the 1D partitioner in the x direction can be applied. Here, MPI is used for communication, which is a widely supported message-passing standard. Because of the portability of this software, the code with MPI can be executed on any parallel system which supports the MPI library. Because of the complexity of the implementation, the code is written in C so that flexible data structures can be used.
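A minimal sketch of how such a 1D, 2D, or 3D partition can be set up with standard MPI calls is shown below; the routine name and the way the result is returned are illustrative, and the actual code carries considerably more information in its data structures.

    #include <mpi.h>

    /* Build a 1D, 2D, or 3D Cartesian partition of the available processors.
       ndims (1, 2, or 3) is chosen according to the geometry of the cavity.  */
    MPI_Comm create_partition(int ndims, int dims[3], int coords[3])
    {
        int nprocs, rank, periods[3] = {0, 0, 0};
        MPI_Comm cart;

        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        for (int d = 0; d < ndims; d++) dims[d] = 0;   /* let MPI factorize nprocs */
        MPI_Dims_create(nprocs, ndims, dims);
        MPI_Cart_create(MPI_COMM_WORLD, ndims, dims, periods, 1, &cart);

        MPI_Comm_rank(cart, &rank);
        MPI_Cart_coords(cart, rank, ndims, coords);    /* position of this subdomain */
        return cart;
    }

For a shallow cavity, for example, ndims = 1 would correspond to partitioning in the x direction only.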
The whole parallel computation is carried out by executing the following sequence on each subdomain, with a communication procedure added at each iteration step (a sketch of this boundary exchange is given after the list):
1. Set up the grid and the partition, and distribute the subdomains among the processors.

2. Guess the flow field at the initial time step, including the velocity u, v, w, the temperature T, and the pressure p.
3. Advance the flow to the next time step: update the coefficient b .
4. Solve the temperature equation (34) using the multigrid method and exchange data information on partition boundaries.

5. Solve the velocity equations (11), (12), and (13) by the multigrid method and exchange data information on partition boundaries.

6. Solve the pressure-correction equation by the multigrid method and
exchange data information on partition boundaries.
7. Correct the velocity and the pressure fields by using (14), (15), (16),
and (20).
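The communication procedure referred to in this sequence exchanges the data on the partition boundaries with the neighbouring subdomains. A minimal sketch for one coordinate direction is given below, assuming the boundary faces have already been packed into contiguous buffers; the names and buffer layout are illustrative only.

    #include <mpi.h>

    /* Exchange one face of boundary data with the two neighbours in direction dir
       of the Cartesian communicator; n is the number of values in one face.      */
    void exchange_face(MPI_Comm cart, int dir, int n,
                       double *send_lo, double *send_hi,
                       double *recv_lo, double *recv_hi)
    {
        int nbr_lo, nbr_hi;
        MPI_Cart_shift(cart, dir, 1, &nbr_lo, &nbr_hi);   /* MPI_PROC_NULL at outer walls */

        MPI_Sendrecv(send_hi, n, MPI_DOUBLE, nbr_hi, 0,
                     recv_lo, n, MPI_DOUBLE, nbr_lo, 0, cart, MPI_STATUS_IGNORE);
        MPI_Sendrecv(send_lo, n, MPI_DOUBLE, nbr_lo, 1,
                     recv_hi, n, MPI_DOUBLE, nbr_hi, 1, cart, MPI_STATUS_IGNORE);
    }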
The parallel multigrid solver for the pressure equation is summarized as follows. For the pressure equation a_p p'_i = Σ a_nb p'_nb + b_p, the discrete form is rewritten as A^h P^h = b^h at mesh level h. Here p' and b_p are represented by P and b, and A is the local coefficient matrix of the finite volume pressure equation. The parallel V-cycle scheme P^h ← MP^h(P^h, b^h) for N grid levels is outlined as:
1. Do k = 1, N - 1
   Relax n1 times on A^h P^h = b^h with a given initial guess P_0^h, and after each relaxation exchange data on the partition boundaries. Then form the residual and restrict it to the next coarser grid: b^{2h} ← I_h^{2h}(b^h - A^h P^h), with P^{2h} = 0 as the initial guess there.
   Enddo

2. Solve the coarsest grid problem A^h P^h = b^h.

3. Do k = N - 1, 1
   Correct P^h ← P^h + I_{2h}^h P^{2h}.
   Relax n2 times on A^h P^h = b^h with the initial guess P_0^h, and after each relaxation exchange data on the partition boundaries.
   Enddo
Here I_h^{2h} and I_{2h}^h are the restriction and interpolation operators respectively, and in the present study, injection and linear interpolation are used. A similar multigrid scheme is used for the velocity and temperature equations.
The algorithm telescopes down to the coarsest grid, which can be a single interior grid, and then
works its way back to the finest grid. Currently, the parallel V-cycle scheme with a flexible number of grid levels is implemented, which can be adjusted according to the grid size used. Successive Over-Relaxation was chosen as the smoother, and two iterations were performed at each grid
level.
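The structure of this parallel V-cycle can be summarized in code as the recursive sketch below. The Grid type and the helper routines (smoother, transfer operators, boundary exchange, coarsest-grid solve) are assumed abstractions that stand in for the corresponding parts of the actual implementation.

    typedef struct Grid Grid;                           /* local subdomain data at one level */

    void sor_relax(Grid *g);                            /* one SOR sweep (assumed)            */
    void exchange_halos(Grid *g);                       /* boundary exchange with neighbours  */
    void restrict_residual(Grid *fine, Grid *coarse);   /* injection: b2h <- I(b - A P), P2h = 0 */
    void interpolate_and_add(Grid *coarse, Grid *fine); /* linear interpolation: P <- P + I P2h  */
    void solve_coarsest(Grid *g);                       /* relax the coarsest-grid problem    */

    /* One parallel V-cycle for A^h P^h = b^h with nlevels grid levels;
       n1 and n2 are the numbers of pre- and post-smoothing sweeps.     */
    void v_cycle(Grid *levels[], int k, int nlevels, int n1, int n2)
    {
        if (k == nlevels - 1) {                         /* coarsest grid reached */
            solve_coarsest(levels[k]);
            return;
        }
        for (int it = 0; it < n1; it++) {               /* pre-smoothing */
            sor_relax(levels[k]);
            exchange_halos(levels[k]);
        }
        restrict_residual(levels[k], levels[k + 1]);
        v_cycle(levels, k + 1, nlevels, n1, n2);
        interpolate_and_add(levels[k + 1], levels[k]);
        for (int it = 0; it < n2; it++) {               /* post-smoothing */
            sor_relax(levels[k]);
            exchange_halos(levels[k]);
        }
    }

With n1 = n2 = 2 this reproduces the two smoothing iterations per grid level used in the present computation.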
For low Rayleigh numbers, the initial conditions throughout the flow domain can be set to T = x/l and u = v = w = p = 0. For higher Rayleigh numbers, the initial conditions can be generated from a steady flow at a lower Rayleigh number. The application is implemented on the Intel Paragon, the Cray T3D, and the IBM SP2, but it can be easily ported to other distributed memory systems.
The speedup measurements of the parallel code are performed on the Intel Paragon and the IBM SP2. A moderate grid size of 64 x 64 x 64 with a test model of R = 10^5, σ = 0.733 in a square box for fixed time steps is used as our test problem. Figure 3 shows the speedup from 1 processor to 128 processors on the Paragon (+) and the SP2 (*). The line is nearly linear on the Paragon, but on the SP2 the curve bends slightly when the number of processors reaches 64. This is because the computation speed of the SP2 is about five times faster than that of the Paragon for the present problem, but the communication speed on the SP2 is not. So for the same grid size, the ratio of
computation to communication changes more rapidly on the SP2.
5. NUMERICAL RESULTS

Various numerical tests have been carried out on the 3D code. The results show that the numerical scheme is robust and efficient, and the general parallel structure allows us to use different partitions to suit various physical domains. Here numerical results for the velocity and temperature fields with R = 14660, 10^5, 10^6, 10^7 and σ = 0.733 in 0 ≤ x, y, z ≤ 1 are presented. The grid sizes 64 x 64 x 64 and 128 x 128 x 128 with 2D and 3D partitions are used for the computation. The com-
putation is stopped when the following conditions are satisfied, corresponding to the achievement of
a steady-state solution:
$$\max\left|T^{k+1}-T^{k}\right| \le \epsilon_1, \qquad \max\left|u^{k+1}-u^{k}\right| \le \epsilon_2,$$

where k is the time level index and ε_1 and ε_2 are usually taken to be 10^-6.
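Because each processor holds only its own subdomain, the maxima in these conditions have to be formed globally. A small sketch of such a check, assuming local arrays holding the fields at two successive time levels, is shown below; the function and variable names are illustrative.

    #include <math.h>
    #include <mpi.h>

    /* Return 1 when the globally largest changes of T and u between two time
       levels are below the tolerances eps1 and eps2.                          */
    int steady_state_reached(MPI_Comm comm, int nlocal,
                             const double *T_new, const double *T_old,
                             const double *u_new, const double *u_old,
                             double eps1, double eps2)
    {
        double local_max[2] = {0.0, 0.0}, global_max[2];

        for (int i = 0; i < nlocal; i++) {
            double dT = fabs(T_new[i] - T_old[i]);
            double du = fabs(u_new[i] - u_old[i]);
            if (dT > local_max[0]) local_max[0] = dT;
            if (du > local_max[1]) local_max[1] = du;
        }
        MPI_Allreduce(local_max, global_max, 2, MPI_DOUBLE, MPI_MAX, comm);
        return (global_max[0] <= eps1) && (global_max[1] <= eps2);
    }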
The velocities on the whole flow domain are displayed in Figure 4, which give complete pictures
of the three-dimensional flow with various Rayleigh numbers. In Figure 5 (a), the velocities and the temperatures for R = 14660 are illustrated on y = 0.5. The flow patterns on the x - z plane are similar to the two dimensional problems [4, 5] with low Rayleigh numbers. It is easy to see that the flow rises from the hot side, travels horizontally, and sinks on the cold side. For R = 10^5 in
Figure 5 (b), the velocity field is slightly tilted, and it no longer consists of a single main circulation. Two rolls are formed in the x - z plane velocity field, and thermal boundary layers are observed on the two sidewalls. When R = 10^6 in Figure 5 (c), the centers of the rolls move further toward the two sidewalls, and the thermal boundary layers become thinner. In the middle of the flow domain, besides the two rolls near the sidewalls, a weak roll in the center is visible. Once the Rayleigh number reaches R = 10^7 in Figure 5 (d), the centers of the two rolls move to the lower left corner and the upper right corner respectively. A stronger roll dominates the middle region. The core temperature field remains nearly linear, but the structures near the sidewalls become more complicated. Compared with R = 10^6, much stronger temperature gradients cover nearly the whole sidewall, and vary more rapidly near the lower left and the upper right corners. On the x - z plane,
the flow fields have similar flow patterns as those of the two dimensional cases, but the 3D solutions
give complete pictures of the flows which are more realistic and interesting.
The heat transfer along the cold wall and the center (x = 0.5) plane is characterized by the local Nusselt number. The local minimum, the local maximum, and the overall Nusselt number Nu on the cold wall are shown in Table 1, and indicate how the outward heat transfer is significantly enhanced as R in-
creases. For Rayleigh number 10^7, Figure 6 shows the local Nusselt numbers on the cold wall and on the middle plane (x = 0.5), which give quite different local heat transfer properties. On the cold wall, the local Nusselt numbers increase sharply as z → 1.0 because of the thin thermal boundary layers set up near the walls, and the local Nusselt numbers vary little in the y direction except near the south wall (y = 0.0) and the north wall (y = 1.0). On the midplane, the local Nusselt numbers change significantly in the z direction, and have some negative values in the lower half zone (z < 0.5). The range of the local minimum and maximum Nusselt numbers is much larger than that on the cold wall. Since the total Nusselt number Nu is commonly used as an indicator of the approach to a steady state, the overall Nusselt number should be the same on each plane once a steady state is achieved. So the Nusselt number Nu_c on the cold wall and the Nusselt number Nu_m on the midplane can be used to provide a check on the accuracy of the numerical solution. Table 2 lists the Nusselt numbers for each Rayleigh number to test the accuracy of the numerical results. The small discrepancies confirm the accuracy of the computed solutions.
6. CONCLUSIONS
We have successfully implemented the finite volume method with an efficient and fast multigrid solver on parallel distributed memory systems. The parallel code is numerically robust, computationally efficient, and portable to
parallel architectures which support MPI for communications. The present parallel code has a very
flexible partition structure which can be used for any rectangular geometry by applying a 1D, 2D, or 3D partitioning to achieve load balance. This feature allows us to study various thermal cavity flows
with different geometries. In addition to the Prandtl number and the Rayleigh number, the geometry of the cavity can also be varied easily. The general parallel structure allows us to simulate large scale problems using a larger number of processors. We have carried out
some high Rayleigh number flow simulations which demonstrate the capabilities of this parallel code.
New three dimensional numerical results are obtained for various Rayleigh numbers ranging from 14,660 to 10^7. They give a complete description of the three dimensional flow, with the flow field gradually changing from a single flow circulation to a multiple roll structure as the Rayleigh number increases. Very thin flow boundary layers are formed on the two sidewalls. The overall Nusselt numbers on the cold wall with various Rayleigh numbers are also calculated, and show a strong dependence on the Rayleigh number. In spite of the difficulties associated with large Rayleigh number simulation, our results illustrated here clearly demonstrate the great potential for applying this approach to solving much higher Rayleigh number flows in realistic, three-dimensional geometries using parallel systems with large grid sizes. Much higher Rayleigh number computations and more detailed studies of such flows will be carried out in future work.
Acknowledgments
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology,
and was sponsored by the National Research Council and the National Aeronautics and Space Ad-
ministration while one of the authors (P. Wang) held an NRC-JPL Research Associateship.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States
Government, the National Research Council, or the Jet Propulsion Laboratory, California Institute
of Technology.
This research was performed in part using the CSCC parallel computer system operated by Caltech
on behalf of the Concurrent Supercomputing Consortium. Access to this facility was provided by
NASA. The Cray Supercomputer used in this investigation was provided by funding from the NASA
Offices of Mission to Planet Earth, Aeronautics, and Space Science. Access to the Intel Paragon at
the Jet Propulsion Laboratory, California Institute of Technology, was provided by the NASA High Performance Computing and Communications Program.
References
[1] D.E. Cormack, L.G. Leal, and J.H. Seinfeld. Natural convection in a shallow cavity with differentially heated end walls. Part 2. Numerical solutions. J. Fluid Mech., 65:231-246, 1974.
[2] G. de Vahl Davis and I.P. Jones. Natural convection in a square cavity: a comparison exercise. Int. J. Numer. Methods Fluids, 3:227-248, 1983.
[3] P.G. Daniels, P.A. Blythe, and P.G. Simpkins. Onset of multicellular convection in a shallow laterally heated cavity.
[4] N.G. Wright. Multigrid solutions of elliptic fluid flow problems. Ph.D. Dissertation, University of Leeds, 1988.
[5] P. Wang. Thermal Convection in Slender Laterally-Heated Cavities. Ph.D. Dissertation, City University, London.
[6] P. Wang and P.G. Daniels. Numerical solutions for the flow near the end of a shallow laterally heated cavity.
[7] F.H. Harlow and J. E. Welch. Numerical calculations of time-dependent viscous incompressible
flow of fluid with free surface. Phys. Fluids, 8(12):2182-2189, 1965.
[8] A.J. Chorin. Numerical solution of the Navier-Stokes equations. Math. Comp., 22:745-762, 1968.
[9] P.J. Roache. Computational Fluid Dynamics. Hermosa, Albuquerque, N.M., 1976.
[10] S.V. Patankar and D.B. Spalding. A calculation procedure for heat, mass and momentum
transfer in three-dimensional parabolic flows. Int. J. Heat Mass Transfer, 15:1787, 1972.
[11] J.P. Van Doormaal and G.D. Raithby. Enhancements of the SIMPLE method for predicting incompressible fluid flows. Numer. Heat Transfer, 7:147-163, 1984.
[12] S.V. Patankar. Numerical Heat Transfer and Fluid Flow. Hemisphere, New York, 1980.
[15] H.Q. Ding and R.D. Ferraro. A general purpose sparse matrix parallel solvers package. In Proceedings of the 9th International Parallel Processing Symposium, 1995.
[16] A. Brandt. Multi-level adaptive solutions to boundary-value problems. Math. Comp., 31:333-390, 1977.
[17] S.F. McCormick. Multilevel Adaptive Methods for Partial Differential Equations. Frontiers in Applied Mathematics. SIAM, Philadelphia, 1989.
[18] W.L. Briggs. A Multigrid Tutorial. Society for Industrial and Applied Mathematics, Philadel-
phia, 1987.
R    Nu    Nu_min    Nu_max

Table 1: The overall Nusselt number Nu, the local minimum Nusselt number, and the local maximum Nusselt number on the cold wall for various Rayleigh numbers.
R        Nu_c       Nu_m       Discrepancy
10^5     4.615      4.667      -0.052
10^6     10.575     10.414      0.161
10^7     21.175     21.149      0.026

Table 2: The overall Nusselt number on the cold wall (Nu_c) and on the middle plane (x = 0.5) (Nu_m).
List of Figures
2. 1D, 2D, and 3D partitions on a flow domain for parallel systems which can be applied according
to different geometries.
3. Speedup of the parallel 3D code on the IBM SP2 (*) and Intel Paragon (+).
4. Velocity on the whole domain for Rayleigh numbers 14460 (a), 10^5 (b), 10^6 (c), 10^7 (d).

5. Velocity (left), temperature (right) on y = 0.5 for Rayleigh numbers 14460 (a), 10^5 (b), 10^6 (c), 10^7 (d).
6. The local Nusselt numbers on the cold wall (a), and on the middle y - z plane (b).
Fig. 1

Fig. 2

Fig. 3 (horizontal axis: Number of processors)

Fig. 4

Fig. 5

Fig. 6