Efficient parallel solution of PDEs

M. Garbey†, E. Heikkola*, R. A. E. Mäkinen*, T. Rossi*, J. Toivanen*, Yu. V. Vassilevski‡

† Centre pour le Developpement du Calcul Scientifique Parallele, Université Claude Bernard – Lyon 1

* Department of Mathematical Information Technology, University of Jyväskylä

‡ Institute of Numerical Mathematics, Russian Academy of Sciences

1 Introduction

The modeling of physical systems often leads to partial differential equations (PDEs). Usually, the equations or the domain in which they are posed are so complicated that an analytic solution cannot be found, and the equations must be solved numerically. To do this, the PDEs are first discretized using, for example, the finite element method (FEM) or the finite difference method (FDM). These methods often lead to large systems of linear or nonlinear equations whose solution can be very time- and memory-consuming even on modern computers. Therefore, efficient parallel numerical solution methods for PDEs are needed to keep the cost of simulations reasonable.

In this report, we briefly describe the main ideas of fictitious domain, domain decomposition, and fast direct methods. Then, we present their applications to acoustic scattering and fluid flow problems, which are quite different in nature. This is a continuation of our earlier research efforts reported in [4].

The basic idea of fictitious domain methods is to extend the operator and the domain into a larger, simple-shaped domain, in which it is easier to construct efficient preconditioners for iterative methods. For example, the simpler domain can be a parallelepiped when the preconditioner is based on fast direct solvers. In algebraic fictitious domain methods, the problem is extended on the algebraic (matrix) level in such a way that the solution of the original problem is obtained directly as a restriction of the solution of the extended problem, without any additional constraints.

In domain decomposition methods, the computational domain is split into several smaller subdomains, which can be either overlapping or nonoverlapping, depending on the type of method. One reason to use such decompositions is that they lead to a natural way to parallelize the solution algorithms. These methods are very actively studied; see, for example, the web pages of the domain decomposition conference series at http://www.ddm.org.

Fast direct methods usually mean efficient solvers based on the FFT and cyclic reduction, which require at most O(N log2 N) operations to solve a system of linear equations of size N; see [6], for example. Linear systems suitable for such solvers arise from constant-coefficient second-order PDEs posed on rectangular domains when tensor-product meshes are used in the discretization. The PSCR algorithm is probably the most flexible of the fast direct solvers, and it can be parallelized in a rather natural way. Its parallel implementation for problems in two-dimensional domains is described in [7].
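To illustrate why constant-coefficient problems on rectangles with tensor-product meshes admit a fast direct solution, here is a minimal Python sketch of an FFT-based solver for the five-point discretization of the Poisson equation on the unit square with homogeneous Dirichlet conditions. It is not the PSCR algorithm of [7]. The model problem and all function names (`fft`, `dst1`, `dst2d`, `poisson_fast`, `apply_laplacian`) are assumptions made for illustration, and the simple radix-2 FFT requires n + 1 to be a power of two.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dst1(x):
    """Type-I sine transform X_k = sum_j x_j sin(pi j k / (n + 1)),
    computed from one FFT of the odd extension of x."""
    n = len(x)
    y = [0.0] + list(x) + [0.0] + [-v for v in reversed(x)]
    Y = fft(y)
    return [-Y[k].imag / 2 for k in range(1, n + 1)]


def dst2d(a):
    """Apply dst1 along both directions of a square array (list of lists)."""
    rows = [dst1(r) for r in a]
    cols = [dst1(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def poisson_fast(f, h):
    """Solve the five-point system (T x I + I x T) u = f on an n-by-n grid."""
    n = len(f)
    # Eigenvalues of the 1D operator tridiag(-1, 2, -1) / h^2.
    lam = [(2 - 2 * math.cos(math.pi * (k + 1) / (n + 1))) / h**2
           for k in range(n)]
    fh = dst2d(f)
    uh = [[fh[i][j] / (lam[i] + lam[j]) for j in range(n)] for i in range(n)]
    u = dst2d(uh)
    scale = (2 / (n + 1)) ** 2  # dst1 applied twice equals (n + 1) / 2 times identity
    return [[scale * v for v in row] for row in u]


def apply_laplacian(u, h):
    """Apply the five-point operator with zero boundary values (for checking)."""
    n = len(u)
    get = lambda i, j: u[i][j] if 0 <= i < n and 0 <= j < n else 0.0
    return [[(4 * u[i][j] - get(i - 1, j) - get(i + 1, j)
              - get(i, j - 1) - get(i, j + 1)) / h**2
             for j in range(n)] for i in range(n)]
```

With n = 7 (so that 2(n + 1) = 16), calling `poisson_fast(f, 1 / 8)` and then `apply_laplacian` on the result should reproduce `f` up to rounding. The diagonalization by sine transforms is what breaks down for variable coefficients or non-rectangular domains, which is why such solvers are used here as preconditioners rather than as stand-alone methods.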

2 Time-harmonic acoustic scattering

The Helmholtz equation is a basic mathematical model for the propagation of time-harmonic (fixed frequency with respect to time) and small-amplitude pressure waves in compressible fluids. [...]

a known time-harmonic wave which is incident on a bounded three-dimensional object. Here, the scattered wave satisfies the three-dimensional Helmholtz equation with boundary conditions that depend on the acoustical properties of the object. For the numerical solution, this problem is often approximated by truncating the original unbounded exterior of the object with a simple-shaped boundary (spherical or rectangular) and by imposing an absorbing boundary condition.

Numerical solution methods for scattering problems have attracted active research interest, because efficient methods would facilitate the simulation of important physical phenomena in many fields such as underwater acoustics, medicine, and room acoustics. We are especially interested in developing efficient solvers for real-life high-frequency scattering problems in which the wavelength is small compared to the dimensions of the computational domain. To obtain reasonable discretization accuracy, a typical rule of thumb is to have at least ten nodes per wavelength throughout the computational domain. Therefore, scattering problems discretized by FEM often lead to very large-scale linear systems for which the computational cost or the memory usage becomes the bottleneck. An efficient numerical solution requires special techniques to reduce the memory consumption and the computational cost of standard approaches such as the finite element or the boundary element methods.

For the solution of the acoustic scattering problems, we have applied the algebraic fictitious domain method. It is known to lead to an efficient solution algorithm for elliptic mesh equations arising from the finite element method, and it has also been applied successfully to scattering problems [3, 5]. We use the finite element discretization on locally perturbed orthogonal meshes, called locally fitted meshes.
By using such a discretization together with the fictitious domain method, we are able to perform the iterative solution in a subspace whose dimension $N_s$ is orders of magnitude smaller than the dimension $N$ of the total system. This benefit is significant in view of memory consumption, especially in the three-dimensional case [3, 5].

In [5], we have introduced an efficient parallelization of the fictitious domain solver for the three-dimensional Helmholtz equation. Our implementation is based on MPI communication and, thus, it can be used on a distributed-memory parallel computer. The main stages of the parallel algorithm are the solution of linear systems with the fictitious domain preconditioner, for which we apply the so-called partial solution algorithm, and matrix-vector multiplications, which can be parallelized by a suitable decomposition of the data among the processors.

We performed numerical experiments on the Cray T3E and SGI Origin 2000 parallel computers at CSC. According to the results, the scalability of the algorithm is good, and we are able to solve very large-scale scattering problems whose total dimension is over $10^9$, while the number of nodes on the surface of the scatterer is over $10^6$. These are, to our knowledge, the largest discrete finite element scattering problems reported as solved in the literature. Numerical results for the two hemispheres shown in Figure 2 are given in Table 1, in which the columns are: the inverse of the wavelength $1/\lambda$, the size of the total system $N$, the size of the subspace for iterations $N_s$, the number of iterations iter, and the time in minutes $T$ using 32 processors on the SGI Origin 2000. A cross section of the solution when $1/\lambda = 8$ is plotted in Figure 3.
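The ten-nodes-per-wavelength rule of thumb mentioned above explains how quickly the system size grows with frequency. The following Python sketch estimates the number of grid nodes for a box-shaped computational domain measured in wavelengths. The function name `estimate_unknowns`, the box shape, and the example sizes are hypothetical and are not taken from the experiments reported here.

```python
import math


def estimate_unknowns(extent_in_wavelengths, nodes_per_wavelength=10):
    """Rough node count for a box whose side lengths are given in wavelengths.

    Assumes a uniform tensor-product grid; each direction gets
    ceil(extent * nodes_per_wavelength) intervals, hence one more node.
    """
    total = 1
    for extent in extent_in_wavelengths:
        total *= math.ceil(extent * nodes_per_wavelength) + 1
    return total


if __name__ == "__main__":
    # Hypothetical cubes of 10, 50 and 100 wavelengths per side.
    for side in (10, 50, 100):
        print(side, estimate_unknowns((side, side, side)))
```

Under these assumptions, a cube of 100 wavelengths per side already needs about $10^9$ nodes, and doubling the frequency multiplies the count by roughly eight.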

1/λ    N        Ns       iter    T (min)
2      7.0e6    8.1e4    71      4
4      2.4e7    3.2e5    125     31
8      1.2e8    1.3e6    155     133
16     6.4e8    5.1e6    233     1379

Table 1: Numerical results with the two hemispheres.
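The memory benefit of iterating in the small subspace can be read off Table 1 with a little arithmetic. The Python sketch below computes the ratio $N / N_s$ and the storage needed for a single vector of each size. The figure of 16 bytes per entry is an assumption (one complex number in double precision), and the names `TABLE_1`, `vector_megabytes` and `summarize` are hypothetical.

```python
# (1/lambda, N, Ns) copied from Table 1.
TABLE_1 = [
    (2, 7.0e6, 8.1e4),
    (4, 2.4e7, 3.2e5),
    (8, 1.2e8, 1.3e6),
    (16, 6.4e8, 5.1e6),
]

BYTES_PER_ENTRY = 16  # assumption: complex double precision


def vector_megabytes(n):
    """Storage for one vector with n entries, in megabytes (10^6 bytes)."""
    return n * BYTES_PER_ENTRY / 1e6


def summarize(table):
    """Return (1/lambda, N/Ns, MB per full vector, MB per subspace vector)."""
    return [(inv_wl, n / ns, vector_megabytes(n), vector_megabytes(ns))
            for inv_wl, n, ns in table]


if __name__ == "__main__":
    for inv_wl, ratio, full_mb, sub_mb in summarize(TABLE_1):
        print(f"1/lambda={inv_wl:2d}  N/Ns={ratio:6.1f}  "
              f"full={full_mb:9.1f} MB  subspace={sub_mb:6.1f} MB")
```

For the largest case in the table the ratio is about 125, so under the stated assumption each vector kept by the iterative method shrinks from roughly 10 GB to roughly 80 MB.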


We consider the solution of the unsteady incompressible Navier-Stokes equations, which describe time-dependent viscous fluid flows. We assume that the Reynolds number ranges from moderate to high and, thus, convection is the dominant phenomenon in such flows. For the discretization in a three-dimensional domain, we have used FDM on staggered grids, that is, the well-known MAC scheme with a local adaptation to the boundaries [2]. For the convection, stable combinations of central and upwind finite differences are used.

In time-dependent problems in which the equations describe many different kinds of phenomena, it can be advantageous to decouple some of them using operator splitting techniques. Here, the Navier-Stokes equations are decoupled into a transport step and a projection onto divergence-free vector fields. With this splitting, one time step for the Navier-Stokes equations leads to the solution of three linear convection-reaction-diffusion problems for the velocity components and a Poisson problem for the pressure [2]. Thus, we obtain simple fundamental subproblems which can be solved separately using special techniques.

The three decoupled convection-reaction-diffusion problems for the velocity components are solved using a multiplicative Schwarz method [1, 2], which is an overlapping domain decomposition method. The domain is decomposed in such a way that the fast decay of the solution in the upwind and crosswind directions [1] can be exploited to make the iteration converge very quickly. For our flow problem, the decomposition into six subdomains is shown in Figure 1. In Schwarz methods, it is only necessary to exchange information between the processors holding neighboring subdomains. The subdomain problems are solved using LU decompositions. For a more detailed description, see [2].
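The multiplicative Schwarz iteration described above can be sketched on a much simpler model than the one used in [1, 2]. The Python example below treats a one-dimensional convection-reaction-diffusion problem $-\varepsilon u'' + b u' + c u = f$ with zero boundary values, upwind differences for the convection term, and two overlapping subdomains, each solved directly by tridiagonal LU elimination. The one-dimensional setting, the constant coefficients, and all names (`thomas`, `upwind_stencil`, `schwarz_sweep`) are assumptions made for illustration only.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a constant-coefficient tridiagonal system by LU elimination
    without pivoting (safe here because the matrix is diagonally dominant)."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def upwind_stencil(eps, b, c, h):
    """Stencil (lower, diag, upper) for -eps u'' + b u' + c u with b > 0."""
    return (-eps / h**2 - b / h, 2 * eps / h**2 + b / h + c, -eps / h**2)


def schwarz_sweep(u, f, stencil, subdomains):
    """One multiplicative Schwarz sweep: solve on each subdomain in turn,
    taking boundary data from the most recent iterate."""
    lower, diag, upper = stencil
    n = len(u)
    for lo, hi in subdomains:          # subdomain covers indices lo .. hi - 1
        rhs = f[lo:hi]                 # slicing copies the data
        if lo > 0:
            rhs[0] -= lower * u[lo - 1]
        if hi < n:
            rhs[-1] -= upper * u[hi]
        u[lo:hi] = thomas(lower, diag, upper, rhs)
    return u


if __name__ == "__main__":
    n = 40
    stencil = upwind_stencil(eps=0.01, b=1.0, c=1.0, h=1.0 / (n + 1))
    f = [1.0] * n
    exact = thomas(*stencil, f)
    u = [0.0] * n
    for sweep in range(1, 4):
        schwarz_sweep(u, f, stencil, [(0, 25), (16, 40)])
        print(sweep, max(abs(a - e) for a, e in zip(u, exact)))
```

In this convection-dominated setting the error carried upstream across the overlap decays by a large factor per grid node, so very few sweeps are needed, which is the effect the decomposition in [1] is designed to exploit. Only neighboring subdomains exchange values (`u[lo - 1]` and `u[hi]`), mirroring the communication pattern noted above.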

Figure 1: The decomposition of a $0.41 \times 0.41 \times 2.5$ parallelepiped with a cylindrical hole into six subdomains (figure labels: inflow plane, outflow plane).

The Poisson problems are solved using an algebraic fictitious domain method. The domain is extended to the whole parallelepiped, and the original stiffness matrix is extended by zero blocks. The preconditioner is based on a parallel implementation of the PSCR algorithm for three-dimensional problems. Since the boundary conditions for the pressure are of natural type, this strategy leads to fast convergence of the iterations. The coefficient matrix is not symmetric, because the grid is locally adapted to the boundaries and a special FDM is used in the discretization. We have used the preconditioned generalized conjugate residual method as the iterative method, together with a special initial guess for the solution; see [2]. The Poisson solver requires global communication between processors due to the diffusive nature of the problem.

As a numerical example, we have performed 800 time steps for the flow problem on an $80 \times 60 \times 64$ grid on the Cray T3E at CSC. The convection-reaction-diffusion and Poisson solvers both required 5 iterations on average to solve the linear problem arising at each time step. The solvers needed slightly less than 8 and 4 seconds using 24 and 16 processors, respectively. The different numbers of processors are due to restrictions in our implementation. According to several test runs, both solvers scale well on the Cray T3E.
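For readers unfamiliar with the generalized conjugate residual (GCR) method named above, the following Python sketch shows a right-preconditioned variant on a small dense nonsymmetric system. It is a generic textbook form, not the implementation of [2]: a simple diagonal (Jacobi) preconditioner stands in for the fictitious domain preconditioner based on PSCR, the special initial guess is replaced by zero, and all names (`matvec`, `dot`, `gcr`, `jacobi_preconditioner`) are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def jacobi_preconditioner(A):
    """Stand-in preconditioner: divide by the diagonal of A."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def gcr(A, b, precond, tol=1e-12, max_iter=50):
    """Right-preconditioned GCR starting from x = 0.

    Each step minimizes the residual norm over the search directions so far.
    Assumes the symmetric part of A is positive definite, so that the
    normalization below cannot break down before convergence.
    Returns (x, number of iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    ps, qs = [], []                    # directions p_j and q_j = A p_j, ||q_j|| = 1
    bnorm = math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, k
        p = precond(r)
        q = matvec(A, p)
        for pj, qj in zip(ps, qs):     # orthogonalize q against earlier q_j
            beta = dot(q, qj)
            q = [qi - beta * v for qi, v in zip(q, qj)]
            p = [pi - beta * v for pi, v in zip(p, pj)]
        qnorm = math.sqrt(dot(q, q))
        p = [v / qnorm for v in p]
        q = [v / qnorm for v in q]
        alpha = dot(r, q)
        x = [xi + alpha * v for xi, v in zip(x, p)]
        r = [ri - alpha * v for ri, v in zip(r, q)]
        ps.append(p)
        qs.append(q)
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [-1.0, 4.0, 1.0], [0.0, -1.0, 4.0]]
    b = [1.0, 2.0, 3.0]
    x, iterations = gcr(A, b, jacobi_preconditioner(A))
    print(iterations, x)
```

The inner products in `dot` are the part that requires global communication in a parallel setting, in addition to whatever communication the preconditioner itself needs. Because all earlier directions are stored, the method is practical mainly when, as reported above, only a handful of iterations are required.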

In our project, we have needed large numbers of processors and large amounts of memory for short test runs. We were kindly provided with these resources by CSC. The purpose has been to demonstrate the capability to solve very large-scale scientific problems and the scalability of our new algorithms. Thus, our requirements differ from those of most other users of CSC's computing resources, but we hope that our needs will also be taken into account in the future. For our acoustic scattering problems, the bottleneck on the Cray T3E was the amount of memory per processor, while on the SGI Origin 2000 it was the wall clock time limit in the queues. In our experiments, the communication capacity of the Cray T3E turned out to be high.

References

[1] M. Garbey, Yu. A. Kuznetsov and Yu. V. Vassilevski, A Parallel Schwarz Method for a Convection-diffusion Problem, SIAM Journal on Scientific Computing, 22, no. 3, 891–916, 2000.

[2] M. Garbey and Yu. V. Vassilevski, A Parallel Solver for Unsteady Incompressible 3D Navier-Stokes Equations, Centre pour le Developpement du Calcul Scientifique Parallele, Université Claude Bernard – Lyon 1, Report 1/2000, Parallel Computing (to appear).

[3] E. Heikkola, Yu. A. Kuznetsov and K. Lipnikov, Fictitious Domain Methods for the Numerical Solution of Three-dimensional Acoustic Scattering Problems, Journal of Computational Acoustics, 7, no. 3, 161–183, 1999.

[4] E. Heikkola, R. A. E. Mäkinen, T. Rossi and J. Toivanen, Parallel Algorithms for Very Large Structured and Unstructured Systems of Linear Algebraic Equations with Applications in Acoustics, CSC Report on Scientific Computing 1997–1998, J. Haataja (ed.), Center for Scientific Computing, Finland, 154–159, 1999.

[5] E. Heikkola, T. Rossi and J. Toivanen, A Parallel Fictitious Domain Method for the Three-dimensional Helmholtz Equation, Department of Mathematical Information Technology, University of Jyväskylä, Report B9/2000, SIAM Journal on Scientific Computing (submitted).

[6] T. Rossi and J. Toivanen, Nonstandard Cyclic Reduction Method, its Variants and Stability, SIAM Journal on Matrix Analysis and Applications, 20, no. 3, 628–645, 1999.

[7] T. Rossi and J. Toivanen, Parallel Fast Direct Solver for Block Tridiagonal Systems with Separable Matrices of Arbitrary Dimension, SIAM Journal on Scientific Computing, 20, no. 5, 1778–1796, 1999.


Figure 2: The scattering object is composed of two hemispheres of radius one. The wall thickness is 0.2, and the distance between the centers is two.

Figure 3: The real part of the total wave along the $x_2 x_3$-plane when $1/\lambda = 8$. The incident plane wave moves from southwest to northeast along the plane. The largest discrete Helmholtz problems we have solved on the SGI Origin 2000 have up to $1.3 \times 10^9$ degrees of freedom, and the diameter of the scattering object is 200 wavelengths.
