Data-Parallel Line Relaxation Method for the Navier–Stokes Equations
The Gauss–Seidel line relaxation method is modified for the simulation of viscous flows on massively parallel computers. The resulting data-parallel line relaxation method is shown to have good convergence properties for a series of test cases. The new method requires significantly more memory than the previously developed data-parallel relaxation methods, but it reaches a steady-state solution in much less time for all cases tested to date. In addition, the data-parallel line relaxation method shows good convergence properties even on the high-cell-aspect-ratio grids required to simulate high-Reynolds-number flows. The new method is implemented using message passing on the Cray T3E, and the parallel performance of the method on this machine is discussed. The data-parallel line relaxation method combines the fast convergence of the Gauss–Seidel line relaxation method with a high parallel efficiency and thus shows promise for large-scale simulation of viscous flows.
Introduction

The numerical simulation of large complex flow fields is a computationally intensive task. In addition, the large disparity between the different length scales encountered in high-Reynolds-number simulations can result in a stiff equation set that usually requires an implicit method to converge to a steady-state solution in a reasonable time. The cost associated with solving this equation set makes the use of a massively parallel supercomputer attractive because these machines have a very large peak performance. However, most implicit methods are inefficient when implemented on a parallel computer. A true implicit method requires the inversion of a large sparse matrix, which involves a great deal of interprocessor communication. This results in poor computational performance. The traditional solution to this problem has been to choose an effective serial algorithm and implement it in parallel using some sort of domain decomposition. This approach has been used with some success,1 but many of the most effective serial algorithms contain inherent data dependencies and cannot be implemented effectively in parallel without modification.

Another approach for structured meshes is to design a new implicit algorithm that would take advantage of the structured communication pattern. Such an algorithm would be inherently well suited to a data-parallel environment without explicit domain decomposition. The algorithm would be efficient and easy to implement in either data-parallel or message-passing mode and would be portable to a wide variety of parallel architectures. For example, Candler et al.2 and Wright et al.3 have shown that it is possible to make some modifications to the Yoon and Jameson lower-upper symmetric Gauss–Seidel (LU-SGS) method4 that make it almost perfectly data parallel. The resulting data-parallel lower-upper relaxation (DP-LUR) method replaces the diagonal Gauss–Seidel sweeps of the LU-SGS method with a series of pointwise relaxation steps. The DP-LUR method was shown to be attractive for the solution of a variety of viscous problems. However, the method shows a significant degradation of the convergence rate for high-Reynolds-number flows because of the high-cell-aspect-ratio grids needed to resolve the thin boundary layer. Modifications to the DP-LUR method that help to alleviate this problem were presented in Ref. 3, but it remains an issue for high-Reynolds-number flow simulations.

To fully address this convergence degradation on high-cell-aspect-ratio grids, it is necessary to solve the implicit equation in a more closely coupled manner. This is the approach taken in the Gauss–Seidel line relaxation (GSLR) method of MacCormack,5 which breaks a two-dimensional problem into a series of block tridiagonal matrix solutions in the body-normal direction. This method would be inefficient on a massively parallel machine because the Gauss–Seidel sweeps would require frequent and irregular interprocessor communication. However, it is possible to modify this method using an approach similar to that previously applied to the LU-SGS method. By replacing the Gauss–Seidel sweeps with a series of line relaxation steps, the algorithm can be parallelized effectively. In addition, the potential for a solution bias resulting from the use of the Gauss–Seidel sweeps, which can cause problems in three-dimensional simulations using the GSLR,6 is removed. This paper discusses the modifications required to create this data-parallel version of the GSLR algorithm. The resulting data-parallel line relaxation (DPLR) method is then compared with the DP-LUR method and the original GSLR method on a variety of viscous problems. Finally, implementation and performance issues on two different parallel computers, the Cray T3E and the Thinking Machines CM-5, are discussed.

Numerical Method

The fully implicit form of the two-dimensional Navier–Stokes equations in curvilinear coordinates is

\frac{U^{n+1} - U^n}{\Delta t} + \frac{\partial \tilde F^{n+1}}{\partial \xi} + \frac{\partial \tilde G^{n+1}}{\partial \eta} = 0
where U is the vector of conserved quantities and F̃ and G̃ are the flux vectors in the ξ (body-tangential) and η (body-normal) directions. The flux vectors can be split into convective and viscous parts:

\tilde F = F + F_v, \qquad \tilde G = G + G_v

If we focus on the inviscid problem for now, we can linearize the flux vector using

F^{n+1} \simeq F^n + \left( \frac{\partial F}{\partial U} \right)^{n} (U^{n+1} - U^n) = F^n + A^n \, \delta U^n

G^{n+1} \simeq G^n + B^n \, \delta U^n
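This linearization can be checked numerically. The sketch below is an illustrative aid, not part of the original implementation: it forms the Jacobian A = ∂F/∂U of the one-dimensional Euler flux by central finite differences (the states, perturbation, and function names are hypothetical) and confirms that F(U + δU) agrees with F(U) + A δU to second order in the perturbation.

```python
import numpy as np

def euler_flux(U, gamma=1.4):
    """1-D Euler flux F(U) for the conserved state U = [rho, rho*u, E]."""
    rho, mom, E = U
    u = mom / rho
    p = (gamma - 1.0) * (E - 0.5 * rho * u**2)
    return np.array([mom, mom * u + p, (E + p) * u])

def flux_jacobian_fd(U, eps=1e-7):
    """Central finite-difference approximation of A = dF/dU."""
    n = U.size
    A = np.zeros((n, n))
    for j in range(n):
        dU = np.zeros(n)
        dU[j] = eps * max(1.0, abs(U[j]))
        A[:, j] = (euler_flux(U + dU) - euler_flux(U - dU)) / (2.0 * dU[j])
    return A

U = np.array([1.0, 0.5, 2.5])            # illustrative state: rho, rho*u, E
dU = 1e-3 * np.array([0.3, -0.2, 0.4])   # small state perturbation
A = flux_jacobian_fd(U)

exact = euler_flux(U + dU)
linear = euler_flux(U) + A @ dU          # F^n + A^n dU^n
print("linearization error:", np.linalg.norm(exact - linear))  # O(|dU|^2)
```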
We then split the fluxes according to the sign of the eigenvalues of the Jacobians

F = A^+ U + A^- U = F^+ + F^-

to obtain the standard upwind finite volume representation

\delta U^n_{i,j} + \frac{\Delta t}{V_{i,j}} \Big( A^+_{i+1/2,j} S_{i+1/2,j}\,\delta U_{i,j} - A^+_{i-1/2,j} S_{i-1/2,j}\,\delta U_{i-1,j}
  - A^-_{i-1/2,j} S_{i-1/2,j}\,\delta U_{i,j} + A^-_{i+1/2,j} S_{i+1/2,j}\,\delta U_{i+1,j}
  + B^+_{i,j+1/2} S_{i,j+1/2}\,\delta U_{i,j} - B^+_{i,j-1/2} S_{i,j-1/2}\,\delta U_{i,j-1}
  - B^-_{i,j-1/2} S_{i,j-1/2}\,\delta U_{i,j} + B^-_{i,j+1/2} S_{i,j+1/2}\,\delta U_{i,j+1} \Big) = \Delta t\, R^n_{i,j}   (1)

where R^n_{i,j} is the change in the solution due to the fluxes at time level n, S is the surface area of the cell face indicated by its indices, and V_{i,j} is the cell volume.

For the solution of the Navier–Stokes equations the appropriate implicit viscous Jacobians must be included in Eq. (1). Following Refs. 2 and 3, the DP-LUR method replaces the Gauss–Seidel sweeps of the LU-SGS method with a series of pointwise relaxation steps using the following scheme. First, the right-hand side R_{i,j} is divided by the diagonal operator to obtain

\delta U^{(0)}_{i,j} = \left( I + \lambda^n_A I + \lambda^n_B I \right)^{-1}_{i,j} \Delta t\, R^n_{i,j}

Then a series of k_max relaxation steps are made using, for k = 1, k_max,

\delta U^{(k)}_{i,j} = \left( I + \lambda^n_A I + \lambda^n_B I \right)^{-1}_{i,j} \Big[ \Delta t\, R^n_{i,j}
  + \frac{\Delta t}{V_{i,j}} \big( A^{+n}_{i-1/2,j} S_{i-1/2,j}\,\delta U^{(k-1)}_{i-1,j} - A^{-n}_{i+1/2,j} S_{i+1/2,j}\,\delta U^{(k-1)}_{i+1,j}
  + B^{+n}_{i,j-1/2} S_{i,j-1/2}\,\delta U^{(k-1)}_{i,j-1} - B^{-n}_{i,j+1/2} S_{i,j+1/2}\,\delta U^{(k-1)}_{i,j+1} \big) \Big]

then

\delta U^n_{i,j} = \delta U^{(k_{max})}_{i,j}   (2)

where λ_A = ρ_A Δt S/V, with ρ_A the spectral radius of the Jacobian A. For the solution of viscous flows, Eq. (2) can be modified to include the contribution of the appropriate implicit viscous Jacobians by using a spectral radius approximation.3 With this approach, all data that are required for each relaxation step have already been computed during the previous step. Therefore, the entire relaxation step may be performed simultaneously in parallel without data dependencies. In addition, because the same pointwise calculation is performed on each computational cell, load balancing will be ensured as long as the data are evenly distributed across the available processors. Thus, the DP-LUR algorithm is almost perfectly data parallel, and aside from the required nearest-neighbor communication it maps naturally onto a massively parallel machine.
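A minimal sketch of the relaxation structure of Eq. (2) follows. It is written for a scalar model problem so that the split Jacobians reduce to constants; the actual method operates on block matrices, and the function and variable names here are illustrative only. Note that every cell is updated from values of the previous step, so each sweep carries no data dependencies.

```python
import numpy as np

def dp_lur_sweep(R, lamA, lamB, Ap, Am, Bp, Bm, dt, dt_over_V, kmax=4):
    """Scalar-model sketch of the DP-LUR relaxation, Eq. (2).

    R          : (ni, nj) right-hand side at time level n
    lamA, lamB : spectral radii scaled by dt*S/V (diagonal operator)
    Ap, Am     : i-direction split "Jacobians" times face area (scalars here)
    Bp, Bm     : j-direction split "Jacobians" times face area
    """
    diag = 1.0 + lamA + lamB           # scalar stand-in for (I + lam_A I + lam_B I)
    dU = dt * R / diag                 # initial step: divide the RHS by the diagonal
    for _ in range(kmax):
        nbr = np.zeros_like(dU)        # off-diagonal terms, lagged by one sweep
        nbr[1:, :]  += Ap * dU[:-1, :]  # + A+ dU_{i-1,j}
        nbr[:-1, :] -= Am * dU[1:, :]   # - A- dU_{i+1,j}
        nbr[:, 1:]  += Bp * dU[:, :-1]  # + B+ dU_{i,j-1}
        nbr[:, :-1] -= Bm * dU[:, 1:]   # - B- dU_{i,j+1}
        dU = (dt * R + dt_over_V * nbr) / diag   # whole sweep is data parallel
    return dU
```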
To derive the DPLR method, the i-direction off-diagonal terms of Eq. (1) are kept on the right-hand side, while the body-normal (j-direction) implicit terms are retained on the left, giving the block tridiagonal form

\hat B_{i,j}\, \delta U_{i,j+1} + \hat A_{i,j}\, \delta U_{i,j} - \hat C_{i,j}\, \delta U_{i,j-1} = -\hat D_{i,j}\, \delta U_{i+1,j} + \hat E_{i,j}\, \delta U_{i-1,j} + \Delta t\, R^n_{i,j}   (3)

where, for example,

\hat C_{i,j} = \frac{\Delta t}{V_{i,j}} \tilde B^+_{i,j-1/2} S_{i,j-1/2}

and the tilde denotes a Jacobian that includes the corresponding implicit viscous contribution.

In this way it is possible to solve Eq. (3) for all j points at each i location as a series of fully coupled block tridiagonal systems aligned in the body-normal direction. This was the approach used by MacCormack in the GSLR method.5 In this method the problem is solved via a series of alternating forward and backward sweeps through the flow field in the i direction, using the latest available data for the terms on the right-hand side. This method works well for two-dimensional flows on a serial or vector machine, but it is not straightforward to extend the method to three-dimensional flows. The obvious approach to the three-dimensional case is to continue to set up the problem as a series of block tridiagonal systems normal to the body and to sweep in both the axial and circumferential directions. However, this approach can lead to a nonphysical bias in the converged solution due to the biasing that is inherent in the Gauss–Seidel sweeping process.6 In addition, the data dependencies in the Gauss–Seidel sweeps would make the algorithm inefficient on a parallel machine. However, it is possible to eliminate both of these difficulties if we modify the GSLR method by replacing the Gauss–Seidel sweeps with a series of line relaxation steps, using a procedure similar to that outlined in the preceding section for the DP-LUR method. The resulting DPLR method is then described by the following scheme. First, the implicit terms on the right-hand side of Eq. (3) are neglected, and the resulting block tridiagonal system is factored and solved for δU^(0):

\hat B_{i,j}\, \delta U^{(0)}_{i,j+1} + \hat A_{i,j}\, \delta U^{(0)}_{i,j} - \hat C_{i,j}\, \delta U^{(0)}_{i,j-1} = \Delta t\, R^n_{i,j}

Then a series of k_max relaxation steps are made using, for k = 1, k_max,

\hat B_{i,j}\, \delta U^{(k)}_{i,j+1} + \hat A_{i,j}\, \delta U^{(k)}_{i,j} - \hat C_{i,j}\, \delta U^{(k)}_{i,j-1} = -\hat D_{i,j}\, \delta U^{(k-1)}_{i+1,j} + \hat E_{i,j}\, \delta U^{(k-1)}_{i-1,j} + \Delta t\, R^n_{i,j}

then

\delta U^n_{i,j} = \delta U^{(k_{max})}_{i,j}   (4)

Boundary conditions are implemented by folding the contribution of the boundary cells into the appropriate matrix in a manner identical to that used for GSLR.5 The DPLR method requires a single lower-upper (LU) factorization and k_max + 1 backsubstitutions per iteration. By using this approach, all of the data required for each relaxation step have already been computed during the previous step. Therefore, as long as the data are distributed on the processors in such a way that the body-normal direction is entirely local, the entire relaxation step can be performed simultaneously in parallel with no data dependencies. In addition, because the i-direction off-diagonal terms are all equally lagged by one relaxation step, the biasing problem no longer exists, and implementation of a three-dimensional version of the algorithm becomes straightforward. The DPLR approach will use significantly more memory than the DP-LUR methods because five Jacobian matrices (seven for three-dimensional flows) must now be computed and stored at each grid point, as compared with one for the full matrix DP-LUR method and none for the diagonal method. For perfect gas flows the DPLR method uses about twice as much memory as the DP-LUR method for the two-dimensional implementation and four times as much for the three-dimensional version. However, the DPLR method uses no more memory than the original GSLR method and should converge much faster than either DP-LUR approach due to the more exact formulation of the implicit operator.
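The following is a minimal scalar-model sketch of one DPLR iteration, Eqs. (3) and (4): each i line is LU-factored once with the Thomas algorithm and then backsubstituted k_max + 1 times, with the i-direction couplings lagged by one relaxation step. The block Jacobians of the real method are replaced by scalar coefficients, boundary closures are simply truncated, and all names are illustrative.

```python
import numpy as np

def factor_tridiag(a, b, c):
    """Thomas-style LU factorization of a tridiagonal line.
    In the notation of Eq. (3): a = sub-diagonal (-C_hat), b = diagonal
    (A_hat), c = super-diagonal (B_hat)."""
    n = b.size
    l = np.zeros(n)
    d = b.astype(float)
    for j in range(1, n):
        l[j] = a[j] / d[j - 1]
        d[j] -= l[j] * c[j - 1]
    return l, d

def solve_tridiag(l, d, c, rhs):
    """One backsubstitution with the stored factorization."""
    n = d.size
    y = rhs.astype(float)
    for j in range(1, n):                 # forward sweep
        y[j] -= l[j] * y[j - 1]
    x = np.empty(n)
    x[-1] = y[-1] / d[-1]
    for j in range(n - 2, -1, -1):        # backward sweep
        x[j] = (y[j] - c[j] * x[j + 1]) / d[j]
    return x

def dplr_iteration(R, a, b, c, D, E, dt, kmax=4):
    """Scalar-model sketch of one DPLR iteration.
    Each i line carries a tridiagonal system in j; the i-direction
    couplings D and E are lagged by one step, so all lines can be
    solved simultaneously in parallel."""
    ni, nj = R.shape
    factors = [factor_tridiag(a[i], b[i], c[i]) for i in range(ni)]  # one LU per line
    dU = np.empty_like(R, dtype=float)
    for i in range(ni):                   # k = 0: neglect the i-direction coupling
        l, d = factors[i]
        dU[i] = solve_tridiag(l, d, c[i], dt * R[i])
    for _ in range(kmax):                 # kmax relaxation steps, equally lagged
        old = dU.copy()
        for i in range(ni):
            rhs = dt * R[i].copy()
            if i + 1 < ni:
                rhs -= D[i] * old[i + 1]  # - D_hat dU_{i+1,j}
            if i > 0:
                rhs += E[i] * old[i - 1]  # + E_hat dU_{i-1,j}
            l, d = factors[i]
            dU[i] = solve_tridiag(l, d, c[i], rhs)
    # total cost: one factorization and kmax + 1 backsubstitutions
    return dU
```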
Results

The perfect gas implementation of the DPLR method has been tested on two- and three-dimensional geometries with an emphasis on evaluating the convergence properties and parallel performance of the new method. The primary test case for this paper is the Mach 15 perfect gas flow over a cylinder-wedge blunt body, with Reynolds numbers Re based on the freestream conditions and body length varying from 3 × 10^4 to 3 × 10^8. A sample 128 × 128 grid for this problem is shown in Fig. 1. The three-dimensional computations were performed on multiple planes of the same two-dimensional grids, which makes it easy to directly compare the convergence properties of the two- and three-dimensional implementations of the method.

Fig. 1 Sample 128 × 128 cylinder-wedge grid. Every fourth grid point is shown.

The boundary-layer resolution is measured with the wall variable

y^+ = \rho y u^* / \mu

where u* is the friction velocity, given in terms of the wall stress τ_w by u^* = \sqrt{\tau_w / \rho}. For a well-resolved boundary layer the mesh spacing is typically chosen so that y⁺ ≤ 1 for the first cell above the body surface. The grid for each case is then exponentially stretched from the wall to the outer boundary.
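As a worked example of this grid-spacing criterion, the sketch below computes the first-cell height implied by y⁺ = 1 and an exponentially stretched distribution of wall-normal spacings. The wall values and stretching ratio are made-up illustrative numbers, not the paper's grid generator or conditions.

```python
import numpy as np

def wall_spacing(rho_w, tau_w, mu_w, y_plus=1.0):
    """First-cell height from the y+ criterion: y = y+ * mu / (rho * u*)."""
    u_star = np.sqrt(tau_w / rho_w)          # friction velocity u* = sqrt(tau_w/rho)
    return y_plus * mu_w / (rho_w * u_star)

def stretched_spacings(dy_wall, ratio, n_cells):
    """Exponentially stretched wall-normal cell heights dy_j = dy_wall * ratio**j."""
    return dy_wall * ratio ** np.arange(n_cells)

# Illustrative (hypothetical) wall values:
dy1 = wall_spacing(rho_w=0.05, tau_w=20.0, mu_w=1.8e-5)
dy = stretched_spacings(dy1, ratio=1.15, n_cells=128)
print(f"first cell: {dy1:.3e} m, outer boundary reached at {dy.sum():.3e} m")
```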
Although the results presented here are based on modified first-order Steger–Warming flux vector splitting for the numerical flux evaluation,9 note that the derivation of the implicit algorithm is general and can be used with many flux evaluation methods. In fact, Liou and Van Leer10 have shown that the use of Steger–Warming splitting can be effective and robust for the implicit part of the problem, even when the fluxes are evaluated by a different method.

The DPLR method is in all instances more sensitive to the size of the implicit time step than the DP-LUR method, as expected, because the DPLR method involves a more exact representation of the problem, and therefore the size of the time step will have more physical meaning. In all cases presented here, the implicit time step was chosen to correspond to a Courant–Friedrichs–Lewy (CFL) number of 1 in the first iteration and was rapidly increased to its maximum stable value. The size of the maximum stable time step for each case was governed primarily by the freestream conditions, with little or no limitation due to the mesh spacing. Therefore the maximum CFL number varied considerably from case to case.
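A sketch of such a time-step schedule is given below. The geometric ramp and its parameters are assumptions for illustration, since the paper does not specify the ramp law, only the CFL = 1 starting point and a rapid increase to the maximum stable value.

```python
def cfl_schedule(it, cfl_max, cfl0=1.0, ramp_iters=50):
    """Ramp the CFL number geometrically from cfl0 to cfl_max over the
    first ramp_iters iterations (illustrative schedule, hypothetical)."""
    if it >= ramp_iters:
        return cfl_max
    growth = (cfl_max / cfl0) ** (1.0 / ramp_iters)
    return cfl0 * growth ** it

# The local time step then follows from the CFL number, e.g.
# dt_{i,j} = CFL * V_{i,j} / (spectral radius of the flux Jacobians * S).
print([round(cfl_schedule(it, cfl_max=1000.0), 1) for it in (0, 10, 25, 50)])
```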
Figure 2a shows the effect of the number of relaxation steps (k_max) on the convergence rate of the method on the two-dimensional cylinder-wedge geometry at a Reynolds number of 3 × 10^4 and y⁺ = 1. In Fig. 2a the explicit solution was obtained using a standard first-order Euler method with the maximum stable time step (CFL = 0.1). The line marked k_max = 0 corresponds to performing just the initial block tridiagonal solution along each i line, with no implicit coupling in the i direction. The k_max = 0 approach is similar to that proposed by Wang and Widhopf11 and later implemented in parallel by Taylor and Wang.12 Although the k_max = 0 approach offers a significant improvement over the explicit method, we see from Fig. 2a that the convergence rate of the method improves when the effect of the i-direction coupling terms is included (k_max > 0). The DPLR method shows a dependence on k_max similar to that of the DP-LUR methods, with the convergence rate steadily improving as k_max increases, up to k_max = 6. Figure 2b shows the effect of the number of relaxation steps on the cost of the method, evaluated as total CPU time on an eight-processor Cray T3E-900. Because the cost of evaluating the Jacobian matrices and setting up the LU factored block tridiagonal system is much greater than the cost of performing additional backsubstitutions, we see that increasing k_max also improves the cost effectiveness of the method, up to k_max = 4. As shown in Fig. 2b, values of k_max larger than 4 give little improvement in convergence and are not cost effective. Therefore, all of the results presented in this paper were run at k_max = 4.
Fig. 4 Convergence histories for the DPLR method on a high-Mach-number and low-supersonic-Mach-number flow: two-dimensional cylinder-wedge blunt body at Re = 3 × 10^4; 128 × 128 grid with y⁺ = 1 for each case.

Fig. 5 a) Convergence histories and b) CPU times on an eight-processor Cray T3E-900 for the DPLR method as compared with the two DP-LUR methods: two-dimensional cylinder-wedge blunt body at M∞ = 15 and Re = 3 × 10^4; 128 × 128 grid with y⁺ = 1.

Fig. 6 Convergence histories for the DPLR method showing influence of Reynolds number: two-dimensional cylinder-wedge blunt body at M∞ = 15; 128 × 128 grid with y⁺ = 1 for each case.

Fig. 7 CPU times on an eight-processor Cray T3E-900 required to achieve 10 orders of density error norm convergence for the DPLR and DP-LUR methods as a function of the Reynolds number: two-dimensional cylinder-wedge blunt body at M∞ = 15; 128 × 128 grid with y⁺ = 1 for each case.

From Fig. 5a we see that the DPLR method converges in far fewer iterations than the other methods, which require 2300 iterations for the full matrix method and 6200 iterations for the diagonal DP-LUR method. However, convergence in fewer iterations does not necessarily imply that the method will be more cost effective on a massively parallel machine. It is also necessary to know how much each iteration of the method will cost. Therefore, to examine the effectiveness of the new algorithm, Fig. 5b compares the cost of the methods, plotted as CPU time on an eight-processor Cray T3E-900. From Fig. 5b we see that the DPLR method also has a high parallel efficiency and is by far the most cost effective of the three methods. For this problem the line relaxation method reaches a 10-order-of-magnitude reduction in the density error norm in just 35 s, compared with 235 s for the full matrix method and 359 s for the diagonal method. This shows that the DPLR method can potentially be a powerful tool in the simulation of viscous flows.

Figure 6 shows the convergence histories of the DPLR method for three viscous flows with Reynolds numbers ranging from 3 × 10^4 to 3 × 10^8. The 128 × 128 grid for each case was chosen so that y⁺ = 1 for the first cell above the body, ensuring that the boundary layers for all cases are equally well resolved. Because the boundary-layer thickness decreases with increasing Reynolds number, the maximum cell aspect ratio (CAR) of the grid must increase as well to meet the y⁺ = 1 requirement. For the test cases in Fig. 6, the maximum CAR ranges from 35 for Re = 3 × 10^4 to about 125,000 for Re = 3 × 10^8. We can see that, although each of the cases behaves differently during the early phases of the flow evolution, all converge with the same terminal slope after the bow shock reaches its final location. In addition, there is essentially no increase in the number of iterations required to reach steady state as the Reynolds number (and therefore the CAR) is increased.

Figure 7 compares the convergence properties of the DPLR method with the DP-LUR methods on several viscous flows with Reynolds numbers ranging from 3 × 10^4 to 3 × 10^8. Once again, the 128 × 128 grid for each case was chosen so that y⁺ = 1 for the first cell above the body. As the Reynolds number is increased by four orders of magnitude, the amount of computer time required to achieve a 10-order-of-magnitude reduction in the density error norm remains constant for the DPLR method, whereas the time increases by a factor of 6 for the full matrix method and 7 for the diagonal method. This shows that the more exact implicit operator used in the DPLR method eliminates the convergence degradation on high-cell-aspect-ratio grids.

Figure 8 compares the viscous and inviscid implementations of the DPLR method for two of the cases in Fig. 6. The grid for each case was chosen to satisfy the y⁺ = 1 condition for the viscous solution. The inviscid solutions were then obtained on the same computational grids. We see that the convergence rates for the viscous and inviscid versions of the method are almost identical. This is in direct contrast to the DP-LUR method, which always requires more iterations to converge the viscous solution. This again shows the benefit of moving the body-normal terms back to the left-hand side of the equation and coupling them directly to the diagonal.

Parallel Performance

The DPLR algorithm is inherently data parallel by design and requires no asynchronous communication or computation. This makes the code readily portable to a variety of parallel architectures with only minor modifications because it is relatively easy to modify a data-parallel code to run effectively on a message-passing machine, whereas the reverse is not necessarily possible.
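A sketch of the data distribution this design implies is shown below: the grid is split into contiguous blocks in the i direction only, so that every body-normal (j) line, and hence every block tridiagonal solve, stays on a single processor. The helper name is hypothetical.

```python
def i_partition(ni, nprocs, rank):
    """Contiguous block of i indices owned by `rank`, splitting only the
    i direction so every body-normal line stays on-processor."""
    base, rem = divmod(ni, nprocs)
    start = rank * base + min(rank, rem)
    size = base + (1 if rank < rem else 0)
    return start, start + size   # half-open range [start, stop)

# A 128 x 128 grid on 8 processors: each rank owns 16 complete i lines,
# so the line solves along j never require interprocessor communication.
print([i_partition(128, 8, r) for r in range(8)])
```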
The DPLR method was implemented and tested on two different massively parallel architectures. First, a message-passing version using the Message-Passing Interface (MPI) standard for interprocessor communication was implemented on the 272-processor Cray T3E-900 located at Network Computing Services, Inc. (formerly the Minnesota Supercomputer Center). Each processor of this machine has 128 Mbytes of memory and a theoretical peak performance of 900 Mflops. In addition, a data-parallel version was implemented on the 896-processor Thinking Machines CM-5 located at the University of Minnesota Army High Performance Computing Research Center. Each processor of this machine has 32 Mbytes of memory and four vector units, yielding a theoretical peak performance of 128 Mflops per processor.

Fig. 8 Convergence histories for the viscous and inviscid implementations of the DPLR method: two-dimensional cylinder-wedge blunt body at M∞ = 15; 128 × 128 grid with y⁺ = 1 for each case.

Fig. 9 Parallel efficiency for the two-dimensional viscous DPLR method on the T3E-900: 512 × 512 computational grid used.

The data-parallel implementation of the new DPLR method should retain the perfect scalability and high parallel efficiency that characterized the DP-LUR method2 because the algorithm design is very similar. However, it is difficult to show this on the CM-5 because it is a vector parallel machine, and thus large vector lengths are required to ensure good performance. In addition, it has been shown that only the number of points in the dimensions that are spread across all of the nodes should be used when evaluating the vector length.2 To solve the block tridiagonal system required by the DPLR method without interprocessor communication, it is necessary that all j points corresponding to a particular i location be entirely on processor, whereas with the DP-LUR method both the i and j directions can be spread across the available processors. Therefore, when the 128 × 128 test case presented in this paper is run on a 64-processor CM-5 with four vector units per processor, the vector length will be 64 for the DP-LUR method but will actually be less than 1 for the DPLR method. This means that the DPLR method is not using the vector hardware on the processors. Therefore we would expect the two-dimensional DPLR method to be very slow on the CM-5 due to the small vector length, even though it may have a high parallel efficiency. This is indeed the case for the implementation tested. For the 128 × 128 test case presented here, the DPLR method runs at only 1.39 Mflops per processor, as compared with 13.2 for the diagonal DP-LUR method and 20.7 for the full matrix method. This would not be a problem on a machine without vector hardware. Thus, although the DPLR method should be efficient on many data-parallel architectures, it is difficult to directly calculate the parallel efficiency of the method on the CM-5.

The performance of the message-passing implementation of the method on the T3E is easier to evaluate because there is no vector hardware on this machine. In this implementation, the data are distributed across the processors by breaking the problem in the i direction. Communication latency is masked by using nonblocking sends and receives in MPI. The parallel speedup curve for the two-dimensional DPLR algorithm on the T3E-900 is presented in Fig. 9. We see that the method has almost perfect speedup, up to the maximum number of processors tested. In fact, on 32 processors the speedup is 31.3, which corresponds to a parallel efficiency of 0.98.

The single-node performance of the two-dimensional implementation of the DPLR method on the T3E-900 is about 75 Mflops per node, which is only 8.3% of the peak theoretical performance of the machine. This number seems quite low, but it is comparable to other published results. The NAS 2 parallel benchmark results offer the best comparison because these codes have been written to simulate actual computational fluid dynamics applications, and the individual machine vendors have not been allowed to perform assembler-level optimizations to the source code. Unfortunately, NAS 2 benchmark results have not yet been published for the T3E. However, results from the T3D for the block tridiagonal (BT) benchmark show a performance of about 10 Mflops per processor.13 In addition, results for the NAS 1 benchmarks, which have been published for both the T3D and T3E-600 (600 Mflops peak performance), show that the sustained speed on the T3E-600 is typically about 3.3 times that on the T3D.14 This would result in a performance of about 33 Mflops per processor on the T3E-600 for the BT benchmark or about 50 Mflops per processor on the T3E-900, assuming perfect scalability. Therefore, the 75 Mflops per processor obtained for the DPLR method seems reasonable. However, it is possible that further optimizations can be made to the source code that would increase the performance.
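A minimal sketch of the nonblocking neighbor exchange described above is given below, written with the modern mpi4py binding purely for illustration (the original T3E code predates it, and all names here are hypothetical): each rank posts sends of its boundary i lines and receives from its neighbors, allowing interior work to proceed before waiting on the requests.

```python
import numpy as np
from mpi4py import MPI

def exchange_i_halo(dU_local, comm):
    """Nonblocking exchange of the lagged i-direction neighbor lines,
    sketching how communication latency can be masked with Isend/Irecv."""
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = rank - 1, rank + 1
    reqs = []
    recv_lo = np.empty_like(dU_local[0])    # dU line arriving from the i-1 neighbor
    recv_hi = np.empty_like(dU_local[-1])   # dU line arriving from the i+1 neighbor
    if left >= 0:
        reqs.append(comm.Isend(np.ascontiguousarray(dU_local[0]), dest=left))
        reqs.append(comm.Irecv(recv_lo, source=left))
    if right < size:
        reqs.append(comm.Isend(np.ascontiguousarray(dU_local[-1]), dest=right))
        reqs.append(comm.Irecv(recv_hi, source=right))
    # ... interior relaxation work for the next step can proceed here ...
    MPI.Request.Waitall(reqs)               # halo data needed before boundary lines
    return recv_lo, recv_hi
```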
Conclusions

The GSLR method has been modified to make it amenable to the solution of the Navier–Stokes equations on massively parallel supercomputers. The resulting DPLR method replaces the Gauss–Seidel sweeps of the original with a series of line relaxation steps. In this manner all of the data dependencies in the original method are removed, and each relaxation step becomes almost perfectly data parallel. Because of its design, the new method can be easily implemented in either the data-parallel or message-passing programming styles. The method also retains the good convergence properties of the original GSLR method, and in fact with four relaxation steps it converges in fewer iterations for the test cases considered. In addition, the relaxation steps eliminate the solution bias problem exhibited by the three-dimensional GSLR method, and thus the DPLR method can easily be extended to the solution of three-dimensional flows.

The DPLR method uses considerably more memory than either of the previously developed DP-LUR methods but demonstrates a dramatic improvement in cost effectiveness, reaching steady state in about 15% of the time on a Cray T3E. In addition, both DP-LUR methods showed a degradation of the convergence rate when high-Reynolds-number flows were simulated. However, the DPLR method is more strongly coupled in the body-normal direction and thus has good convergence properties at all Reynolds numbers.

The new method has been implemented using message passing on the Cray T3E-900 and shows nearly perfect speedup, with a parallel efficiency of 98% even when 32 processors are used. The single-node performance of the method is about 75 Mflops per processor, which is only 8.3% of the peak theoretical performance of the machine. However, this value is comparable to data obtained for the NAS 2 parallel benchmarks and does not detract from the high parallel efficiency of the method.

In short, the high parallel efficiency and good convergence characteristics of the DPLR method make it attractive for the solution of very large compressible flow problems.

Acknowledgments

The authors were supported by the NASA Langley Research Center under Contract NAG-1-1498 and the Army Research Office under Grant DAAH04-93-G-0089. This work was also supported in part by the U.S. Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory Cooperative Agreement DAAH04-95-2-0003/Contract DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Computer time on the Cray T3E was provided by the Minnesota Supercomputer Institute.

References

1 Simon, H. D. (ed.), Parallel Computational Fluid Dynamics: Implementations and Results, MIT Press, Cambridge, MA, 1992.

2 Candler, G. V., Wright, M. J., and McDonald, J. D., "Data-Parallel Lower-Upper Relaxation Method for Reacting Flows," AIAA Journal, Vol. 32, No. 12, 1994, pp. 2380-2386.

3 Wright, M. J., Candler, G. V., and Prampolini, M., "Data-Parallel Lower-Upper Relaxation Method for the Navier-Stokes Equations," AIAA Journal, Vol. 34, No. 7, 1996, pp. 1371-1377.

4 Yoon, S., and Jameson, A., "A Lower-Upper Symmetric Gauss-Seidel Method for the Euler and Navier-Stokes Equations," AIAA Journal, Vol. 26, No. 9, 1988, pp. 1025, 1026.

5 MacCormack, R. W., "Current Status of the Numerical Solutions of the Navier-Stokes Equations," AIAA Paper 85-0032, Jan. 1985.

6 MacCormack, R. W., "Solution of the Navier-Stokes Equations in Three Dimensions," AIAA Paper 90-1520, June 1990.

7 Tysinger, T., and Caughey, D., "Implicit Multigrid Algorithm for the Navier-Stokes Equations," AIAA Paper 91-0242, Jan. 1991.

8 Yoon, S., and Kwak, D., "Multigrid Convergence of an Implicit Symmetric Relaxation Scheme," AIAA Journal, Vol. 32, No. 5, 1994, pp. 950-955.

9 MacCormack, R. W., and Candler, G. V., "The Solution of the Navier-Stokes Equations Using Gauss-Seidel Line Relaxation," Computers and Fluids, Vol. 17, No. 1, 1989, pp. 135-150.

10 Liou, M. S., and Van Leer, B., "Choice of Implicit and Explicit Operators for the Upwind Differencing Method," AIAA Paper 88-0624, Jan. 1988.

11 Wang, J. C., and Widhopf, G. F., "An Efficient Finite Volume TVD Scheme for Steady State Solutions of the 3-D Compressible Euler/Navier-Stokes Equations," AIAA Paper 90-1523, June 1990.

12 Taylor, S., and Wang, J. C., "Launch Vehicle Simulations Using a Concurrent Implicit Navier-Stokes Solver," AIAA Paper 95-0223, Jan. 1995.

13 Saphir, W., Woo, A., and Yarrow, M., "The NAS Parallel Benchmarks 2.1 Results," Numerical Aerospace Simulation Facility, NAS TR NAS-96-010, NASA Ames Research Center, Moffett Field, CA, Aug. 1996.

14 Saini, S., and Bailey, D. H., "NAS Parallel Benchmark (Version 1.0) Results 11-96," Numerical Aerospace Simulation Facility, NAS TR NAS-96-018, NASA Ames Research Center, Moffett Field, CA, Nov. 1996.

D. S. McRae
Associate Editor