An Infrastructure for Parallel Adaptive Mesh-Refinement Techniques (DRAFT)

Manish Parashar and James C. Browne
Department of Computer Sciences, University of Texas at Austin
{parashar, [email protected]}

Contents

1 Introduction
2 Problem Description
  2.1 Binary Black-Hole Grand Challenge
  2.2 Adaptive Finite-Difference Methods
    2.2.1 Adaptive Grid Hierarchy
    2.2.2 AFD Integration Algorithm
3 Programming Abstractions for Adaptive Finite-Difference Methods
  3.1 GRID HIERARCHY Class
    3.1.1 Views of the GRID HIERARCHY Class
  3.2 GRID FUNCTION Class
  3.3 DOMAIN Class
  3.4 An AFD Application Program
4 Parallelization of the AFD Methods: Issues & Requirements
  4.1 Data-Structure Requirements
  4.2 Communication Requirements
    4.2.1 Inter-grid Communication
    4.2.2 Intra-grid Communication
    4.2.3 Random Communication
  4.3 Decomposition Issues
  4.4 Decomposition of the Dynamic AFD Grid Hierarchy
5 Data-Management Support for PAFD Methods
  5.1 Space Filling Curves
  5.2 SDDG Representation
  5.3 DAGH Representation
  5.4 Data-Structure Storage
    5.4.1 Adaptive Grid Structure Storage
    5.4.2 Data Storage
  5.5 PAFD Implementation based on SDDG/DAGH
6 Conclusions


A Decomposition Schemes for the Dynamic AFD Grid Hierarchy
  A.1 Independent Grid Distribution
  A.2 Combined Grid Distribution
  A.3 Independent Level Distribution
B Berger-Oliger AFD Algorithm


1 Introduction

This paper presents an infrastructure for the implementation of parallel adaptive (multigrid) finite difference codes that use adaptive mesh-refinement techniques for the solution of partial differential equations. The abstraction provided by this infrastructure is a dynamic hierarchical grid where operations on the grid are independent of the distribution of the grid across the processors of a parallel execution environment, and the computational operations are independent of the number of levels in the grid hierarchy. The motivation for the development of this infrastructure was the selection, by the Binary Black Hole NSF Grand Challenge project, of the Berger-Oliger adaptive multigrid method [1] as an efficient and effective solution of Einstein's equations for the interaction of binary black holes.

Dynamically adaptive methods for the solution of differential equations, which employ locally optimal approximations, have been shown to yield highly advantageous cost/accuracy ratios when compared to methods based upon static uniform approximations. Parallel versions of these methods offer the potential for accurate solution of physically realistic models of important physical systems. Sequential implementations of adaptive algorithms in conventional programming system abstractions have proven to be both complex and difficult to validate; parallel implementations are then an order of magnitude more complex. It is commonly the case that 80% of the code volume of a parallel adaptive code written in conventional programming systems is concerned with procedurally realizing dynamic distributed data structures on top of static data structures such as Fortran arrays. Furthermore, this code has little connection with the physics or engineering problem being solved (although it is an important discipline in computer science). Clearly there is advantage in providing an infrastructure of programming abstractions upon which parallel adaptive methods can be simply and directly implemented, provided that the operations provided execute efficiently.

Development of the infrastructure begins with the derivation of the programming abstractions in which computations on dynamic hierarchical grid structures are directly implementable. The appropriate programming abstractions are a hierarchy of scalable distributed dynamic grids and a set of operations upon this grid hierarchy. The operations include creation, partitioning, computations on the grid such as stencil operations, communication among grid partitions at a single level, and communication among grids at different levels of the hierarchy. The next step is to design an implementation for parallel execution environments which preserves efficient execution while providing transparency to the distribution of the grids across the processors of the parallel execution environment.


One crucial factor in obtaining an efficient implementation of the operations on dynamic hierarchical grids is to preserve logical locality under expansion and contraction. This will enable efficient computational access to the grids. One section of the paper briefly describes an implementation of dynamic array storage based on extendible hashing [9] which guarantees preservation of locality under expansion and contraction. Another crucial factor is to preserve logical locality in the partitions when the grids are mapped and partitioned across processors. If this is done then the total communication and synchronization overhead will be minimized. A major section of this paper defines and describes the algorithms, based on space-filling curves [5], which are used to map the inherently multi-dimensional hierarchy of grids onto a linear structure upon which locality-preserving mappings and partitionings can be efficiently applied. The methods applied in this paper are a derivation, for parallel adaptive multigrid finite difference methods, of algorithms developed for parallel adaptive many-body codes [2] and parallel hp-adaptive finite element codes [3].

This paper is organized as follows: Section 2 introduces the Binary Black Hole Grand Challenge project and describes the components and requirements of a class of adaptive finite-difference methods applicable to the problem. Section 3 then identifies a set of high-level programming abstractions that can be used to express these methods. Section 4 discusses the parallelization of these adaptive methods and outlines related issues and requirements. Section 5 proposes fundamental data-structures that can be used to build the identified programming abstractions, and discusses the design, representation, and implementation of these dynamic distributed data-structures. Section 6 presents some concluding remarks.

2 Problem Description

2.1 Binary Black-Hole Grand Challenge

The Binary Black-Hole Grand Challenge undertakes the description of the coalescence of astrophysical black holes, and the emitted gravitational radiation, by computationally solving Einstein's equations of gravity. The black-hole equations are mixed in nature, consisting of: non-linear hyperbolic evolution equations for the true dynamic degrees of freedom; non-linear elliptic constraint equations for initial data; and gauge conditions determining the coordinates, which can be of any type. Numerical treatments of Einstein's field equations of general relativity are typically formulated using a 3 (space) + 1 (time) decomposition of spacetime. In this approach one sets up a computational grid on a bounded region in space and evolves initial data specified on a spacelike hypersurface. Current experience in tackling these equations suggests the use of finite-difference approximations with adaptive mesh refinement, such as the Berger-Oliger adaptive finite-difference scheme. Such adaptive finite-difference methods are briefly introduced below.


2.2 Adaptive Finite-Difference Methods

Finite-difference methods discretize the continuum domain by overlaying a grid on it. The discretized domain is then defined by the grid points of the overlaid grid, and algebraic equations for the unknowns are obtained via difference analogues of the various differential equations. The structure and granularity of the discretizing grid used by these techniques is defined by the characteristics of the solution and must provide sufficient resolution to adequately represent the smallest-scale features of this solution. In many problems, however, the solution features which are of interest and require high resolution are localized. As the computational work in these schemes is directly proportional to the number of grid points (i.e. the resolution of the grid), the use of a uniform grid with the finest desired resolution is highly inefficient and can lead to a prohibitive discretization, both in terms of computation and storage requirements.

Dynamic, Adaptive Finite-Difference (AFD) techniques that use adaptive mesh refinement (AMR) provide a means for maintaining computational tractability without sacrificing accuracy. The idea is to concentrate additional computational effort (higher resolution) on the areas of interest in the solution. AFD methods start with a coarse base grid with the minimum acceptable resolution that covers the entire computational domain. As the solution progresses, regions in the domain requiring additional resolution are identified (tagged) and finer grids are overlaid on the tagged regions of the coarse grid. Refinement proceeds recursively, so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions. The resulting structure is an adaptive grid hierarchy.

AFD formulations are defined by two key components: (1) the nature of the adaptations performed and the structure of the resulting adaptive grid hierarchy, and (2) the integration algorithm operating on the grid hierarchy, which defines the order of integration of different levels in the hierarchy, the order and frequency of adaptation, and the communication between the different levels. The Berger-Oliger AMR scheme [1] for time-dependent hyperbolic PDEs has been selected as the AFD formulation to be used to solve the BBH grand challenge. The adaptive grid hierarchy and integration algorithm components of AFD schemes are introduced below using this formulation.

2.2.1 Adaptive Grid Hierarchy

The adaptive grid hierarchy is a set of dynamically overlaid grids generated by recursively refining a base grid in response to some feature in the transient solution. It can be represented as a family of grids, {G_n^l}, where the superscript l, 0 ≤ l ≤ L, represents the level of refinement (0 being the coarsest (base) grid and L the finest grid) and the subscript n indexes component grids at the same level. A complementary view of the grid hierarchy is a directed acyclic graph (DAG) where each node of the graph represents a component grid. Levels of this DAG correspond to the levels of refinement in the adaptive grid hierarchy, and nodes at the same level of the DAG correspond to component grids at a particular level of refinement. The adaptive grid hierarchy is a dynamic structure, both in the number of levels of refinement and in the number of component grids at any particular level of refinement. Component grids in this hierarchy are treated as separate entities in the sense that different difference equations can be solved relatively independently at each level and on each component grid.


Figure 1: Adaptive Grid Hierarchy - 2D (Berger-Oliger AMR Scheme)

The Berger-Oliger AFD formulation requires the resolution (or grid spacing) of component grids at any level l of the grid hierarchy to be an integral multiple of the grid spacing of component grids at the next level (l + 1), i.e. h_l = k h_{l+1} where k is some integer. Further, component grids at any level l of the grid hierarchy must be locally uniform with space and time resolutions, h_l and Δt_l, such that the ratio Δt_l / h_l is typically < 1. Finally, a nesting of component grids is maintained along the grid hierarchy such that each component grid at level l + 1 is contained within a component grid at level l, i.e. all the grid points of any component grid at level l + 1 must lie in the convex hull of a component grid at level l. Consequently, the DAG corresponding to a Berger-Oliger grid hierarchy is a tree. The two views of the Berger-Oliger adaptive grid hierarchy are illustrated in Figure 1. Other formulations define more general grid hierarchies wherein component grids can be rotated with respect to the base grid, and can have different mesh spacings in different coordinate directions.

2.2.2 AFD Integration Algorithm

The AFD integration algorithm defines the order in which different levels of the grid hierarchy are integrated, the interactions between overlaying component grids at different levels, and the criterion and method for grid refinement.


Correspondingly, it is composed of three key components: (1) Time Integration, (2) Error Estimation & Regridding, and (3) Inter-Grid Operations. These components are described below.

Time Integration: Time integration is performed on each component grid using a specified finite-difference operator. Each component grid may have its own operator and can be integrated independently (except for the determination of the boundary values). The order of integration along the grid hierarchy is defined recursively such that, before advancing component grids at a particular level of refinement in time, all component grids at higher levels of refinement must be integrated to the current time of the component grids at that level. That is, before stepping component grids at level l, i.e. {G_i^l}, from time T to T + Δt_l, all component grids at levels > l must be integrated to time T.

Error Estimation & Regridding: The error estimation and regridding component of the integration algorithm performs the following three steps: (1) flagging regions needing refinement based on error estimation, (2) clustering flagged points, and (3) refined grid generation. The result may be the creation of a new level of refinement or of component grids at existing levels, and/or the deletion of existing component grids. Grid generation in step 3 is performed so as to maintain proper nesting along the grid hierarchy.

Inter-Grid Operations: Inter-grid operations are used to communicate solution values along the adaptive grid hierarchy. In the case of the Berger-Oliger AFD scheme, the following inter-grid operations are defined:

- Initialization of refined component grids: Refined component grid initialization may be performed using the interior values of an intersecting component grid at the same level, if one exists, or by prolongated values from the underlying coarser component grid.

- Coarse grid update: Underlying coarse component grids are updated using the values on a nested finer component grid each time the two are integrated to the same time. This update or restriction inter-grid operation may be performed by direct injection or using a defined averaging/interpolation scheme.

- Averaging: Averaging is necessary when two component grids at the same level of refinement overlap, and is used to update the coarse component grids underlying the overlap region.

A recursive formulation of the Berger-Oliger AFD integration algorithm is included in Appendix B.
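To make the recursive structure of this algorithm concrete, the following is a minimal C++ sketch of Berger-Oliger-style recursive time stepping. It is only an illustration and is not the formulation of Appendix B; the GridHierarchy type, its methods, and the fixed refinement factor are hypothetical placeholders.

// Minimal sketch of Berger-Oliger-style recursive integration (illustrative only).
// GridHierarchy, advance(), regrid_needed(), regrid() and restrict_to_parent()
// are hypothetical placeholders, not interfaces defined in this paper.
#include <vector>

struct Grid { /* component grid data would live here */ };

struct GridHierarchy {
    std::vector<std::vector<Grid> > levels;  // levels[l] = component grids at level l
    int refine_factor = 2;                   // space-time refinement factor k

    void advance(int l, double dt) { /* step every component grid at level l by dt */ }
    bool regrid_needed(int l) const { return false; }  // error-estimation stub
    void regrid(int l) { /* flag, cluster, create/delete finer component grids */ }
    void restrict_to_parent(int l) { /* update level l-1 from level l */ }

    // Advance level l by dt; finer levels take refine_factor smaller steps so
    // that they catch up to level l before it is advanced again.
    void integrate(int l, double dt) {
        advance(l, dt);
        if (l + 1 < static_cast<int>(levels.size())) {
            for (int k = 0; k < refine_factor; ++k)
                integrate(l + 1, dt / refine_factor);
            restrict_to_parent(l + 1);       // inject/average the fine solution into level l
        }
        if (regrid_needed(l)) regrid(l);
    }
};

int main() {
    GridHierarchy h;
    h.levels.resize(1);                      // start with the base grid only
    for (int step = 0; step < 10; ++step)
        h.integrate(0, 0.01);                // advance the whole hierarchy
}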

3 Programming Abstractions for Adaptive Finite-Difference Methods

Although AFD methods yield a very advantageous cost/accuracy ratio, their implementation, even on sequential systems, is non-trivial and requires considerable effort. A significant portion of this effort is expended in the creation and maintenance of the adaptive grid hierarchy. Migration to a parallel/distributed environment can cause this complexity, as well as the required programming skills and effort, to increase by an order of magnitude.


This complexity associated with parallel AFD implementations can be alleviated by defining appropriate high-level programming abstractions that complement the problem. The objective is to provide the application developer with a set of primitives that are intuitive for the problem, while hiding implementation details and system-specific issues. The abstractions themselves can be independently and efficiently implemented on target systems. High-level programming abstractions thus enable application developers to concentrate on the problem rather than on the implementation complexities.

In the case of AFD methods, the basic underlying abstraction is a dynamic, adaptive grid hierarchy. The application starts by first defining the structure of this grid hierarchy. Data values and problem variables that have to be computed are then associated with elements of the hierarchy. The solution can now be formulated as a combination of the grid hierarchy, grid functions, and a sequence of operations on the two.

Correspondingly, in this section, we define three programming abstractions (or object classes) for AFD applications. The GRID HIERARCHY class abstracts the dynamic, adaptive grid hierarchy defined by the AFD method and provides a high-level interface for creating, maintaining and operating on this hierarchy. Operations on the GRID HIERARCHY class include adding, deleting and clustering refinements, as well as accessing component grids at a particular level. The second programming abstraction is the GRID FUNCTION class. The GRID FUNCTION class associates application data elements and variables with the GRID HIERARCHY and enables components of the PDE to be defined on the computational domain. The final programming abstraction is the DOMAIN class. DOMAIN abstracts the problem being solved as a combination of the GRID HIERARCHY, GRID FUNCTIONs defined on the GRID HIERARCHY, and an integration scheme operating on the GRID FUNCTIONs. Attributes and operations defined for the three abstractions are presented below.

3.1 GRID HIERARCHY Class

GRID HIERARCHY Attributes: The GRID HIERARCHY class is specified by four key attribute sets:

- Geometry: Geometry attributes specify the space-time geometry of the computational domain in terms of its extent, connectivity and resolution.

- Boundary: The boundary attributes specify the different boundaries on the grid. These boundaries may be specified as a mask on the geometry or as a characteristic function.

- Coordinate: The coordinate attributes assign coordinate values to elements of the component grids.

- Refinement: Refinement attributes abstract the hierarchical structure of the GRID HIERARCHY and define refinement parameters.


GRID HIERARCHY Operators: Operators/methods defined on a GRID HIERARCHY include:

- Alias: The alias operator allows a region of a component grid to be aliased and then addressed and operated on using the alias. The region may be specified using grid coordinates.

For the following methods, the region parameter is optional and can be specified either using coordinates or using an alias. The default region is the entire grid.

- Load/Store [Region]: The load operator initializes the variables (grid functions) associated with a region of the grid from a specified data file. Similarly, the store operator dumps the variables associated with the region to a file.

- Refine/Coarsen [Region]: The refine operator uses the refinement parameters to create a refined component grid over the specified region. The associated attributes are accordingly updated. The coarsen operator deletes an existing refined component grid over the specified region.

- Query Operators: Query operators are defined to enable the user to query for different grid parameters.

- Indexing Operators: Indexing operators provide a Fortran 90-like indexing interface to enable an instance of GRID HIERARCHY to be indexed in the space-time dimensions and along the grid hierarchy.

- View [Region]: The view operator represents a possible interface to visualization software so as to enable graphical viewing of specified regions of the grid hierarchy.

3.1.1 Views of the GRID HIERARCHY Class

The GRID HIERARCHY class provides the user with two views of the AFD grid structure, defined as follows:

Hierarchical View: This view treats the grid structure as a dynamic DAG of simple space-time grids wherein each grid can be individually indexed and operated upon. This view enables the definition of finite-difference operators on particular grids and of inter-grid operators between grids.

Composite View: In the composite view, the entire grid hierarchy is treated as a single composite grid with refinement information stored locally. This view enables operations to be defined recursively on the entire grid hierarchy.
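As an indication of how these attributes and operators might be grouped in code, the sketch below outlines one possible C++ interface. The names, argument lists and the 3-D Region type are hypothetical; this is not the class definition provided by the infrastructure.

// Hypothetical sketch of a GRID HIERARCHY interface; names and signatures are
// illustrative only and do not reproduce the infrastructure's actual API.
#include <array>
#include <string>
#include <utility>
#include <vector>

struct Region {                               // a rectangular region in grid coordinates
    std::array<double, 3> lower{}, upper{};
};

class GridHierarchy {
public:
    // Geometry, boundary, coordinate and refinement attributes fixed at creation.
    GridHierarchy(const Region& domain, std::array<int, 3> base_resolution,
                  int max_levels, int refine_factor)
        : domain_(domain), base_resolution_(base_resolution),
          max_levels_(max_levels), refine_factor_(refine_factor) {}

    // Alias: name a region so it can later be addressed via the alias.
    void alias(const std::string& name, const Region& r) { aliases_.push_back({name, r}); }

    // Load/Store [Region]: would initialize or dump grid functions over a region (stubbed here).
    void load(const std::string& file, const Region& r) {}
    void store(const std::string& file, const Region& r) const {}

    // Refine/Coarsen [Region]: would add or delete a refined component grid (stubbed here).
    void refine(const Region& r) {}
    void coarsen(const Region& r) {}

    // Query operators.
    int num_levels() const { return num_levels_; }

    // View [Region]: hook for visualization software (stubbed here).
    void view(const Region& r) const {}

private:
    Region domain_;
    std::array<int, 3> base_resolution_;
    int max_levels_, refine_factor_, num_levels_ = 1;
    std::vector<std::pair<std::string, Region>> aliases_;
};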


3.2 GRID FUNCTION Class

The GRID FUNCTION class defines components of the PDE (fields) on the discretized domain defined by the GRID HIERARCHY. The key components of this class are introduced below.

GRID FUNCTION Attributes: The key attributes required to specify a GRID FUNCTION are as follows:

Specifications: The specification attributes for a GRID FUNCTION include:

- Name: The name attribute assigns a symbolic identifier to the GRID FUNCTION which can then be used to specify operations on it. The name also serves as a Fortran 90-like array identifier once the GRID FUNCTION is assigned to a GRID HIERARCHY.

- Storage Type: This attribute defines the storage to be associated with each element of the GRID FUNCTION. It may be a predefined type such as an integer or real, or a user-defined type (derived type).

- Dimension: This attribute defines the dimension of each field element in the case of multi-dimensional (i.e. tensor) fields.

Boundary Condition: The boundary condition attributes define the type of boundary condition to be used for the GRID FUNCTION and the associated parameters required to specify the chosen condition.

Initialization: The initialization attributes specify the initialization scheme for the boundary/interior of the GRID FUNCTION. Initialization may be performed from a data file, using an initialization function, or using the output from a previous computation.

Integration Function (Interior/Boundary): This attribute defines the finite-difference integration operator for the GRID FUNCTION on the interior and boundary regions of the GRID HIERARCHY. The integration function attributes are used to derive interior and boundary space-time stencils for the GRID FUNCTION. These stencils define inter-processor communication requirements during parallelization, as is discussed in the following section (Section 4).

Inter-Grid Operators: The following inter-grid operators are specified for each GRID FUNCTION:

- Prolongation Function: The prolongation function defines the interpolation scheme used to initialize a newly created refined grid during regridding. It is used to derive a space-time prolongation stencil.


- Restriction Function: The restriction function defines the averaging scheme used to update the parent grid approximation during integration. It is used to derive a space-time restriction stencil.

Note that GRID FUNCTION attributes (other than the specifications) can be selected from a pre-defined attribute library.

GRID FUNCTION Operations: Methods and operations defined on an instance of GRID FUNCTION include:

- Initialize: This method uses the initialization attributes of the GRID FUNCTION to define its initial value.

- Setup Boundary: This method uses the boundary condition attributes to set up the boundary for the GRID FUNCTION.

- Integrate [Interior/Boundary]: The integrate method updates a GRID FUNCTION using the associated interior/boundary integration function attributes. The integrate operator is applied at a particular level of the grid hierarchy.

- Prolong/Restrict: The prolong/restrict operators use the associated functions to perform prolongations and restrictions across grid levels.

- View/Store: The view operator enables visualization of the GRID FUNCTION at a particular time and grid level. The store operator writes the value of the GRID FUNCTION to a file for post-processing.
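One way to picture how these attributes and operations combine is as a specification object carrying user-supplied operator callbacks, as in the hypothetical sketch below; the Patch type and the member names are placeholders, not part of the infrastructure.

// Hypothetical sketch: a GRID FUNCTION expressed as a specification object whose
// operators are supplied as callbacks. Illustrative only.
#include <functional>
#include <string>

struct Patch {};   // stand-in for the local portion of a component grid

struct GridFunctionSpec {
    std::string name;                        // symbolic identifier / F90-like array name
    int components = 1;                      // tensor dimension of each field element

    // Interior/boundary update operators; their stencils determine the intra-grid
    // communication needed once the grid is distributed.
    std::function<void(Patch&, double)> integrate_interior;
    std::function<void(Patch&, double)> integrate_boundary;

    // Inter-grid operators used during regridding and restriction.
    std::function<void(const Patch&, Patch&)> prolong;      // coarse -> fine
    std::function<void(const Patch&, Patch&)> restrict_op;  // fine -> coarse
};

int main() {
    GridFunctionSpec phi;
    phi.name = "phi";
    phi.integrate_interior = [](Patch&, double) { /* apply the interior stencil */ };
    phi.integrate_boundary = [](Patch&, double) { /* apply the boundary condition */ };
    phi.prolong     = [](const Patch&, Patch&) { /* interpolate coarse to fine */ };
    phi.restrict_op = [](const Patch&, Patch&) { /* average/inject fine to coarse */ };
}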

3.3 DOMAIN Class

The DOMAIN class abstracts the problem as a combination of a GRID HIERARCHY, GRID FUNCTIONs defined on the GRID HIERARCHY, and an integration scheme defined on the grid functions and the grid hierarchy.

Components of a DOMAIN: A DOMAIN is defined by the following components:

- GRID HIERARCHY

- GRID FUNCTION(s)

- Update operators defined on GRID FUNCTION(s) and having the form: Update(GRID FUNCTION(s)) ⇒ GRID FUNCTION(s)


- Integration sequence on the set of GRID FUNCTION(s)

- Integration ordering along the grid hierarchy, such as the Berger-Oliger integration scheme defined in Section 2

- Local error estimation and regridding criteria

- Residuals to be computed

- Miscellaneous parameters such as grid buffer size and regrid frequency

Operations on a DOMAIN: Methods defined on the DOMAIN class operate recursively on the set of GRID FUNCTIONs and the entire grid hierarchy using the appropriate attributes of the individual GRID FUNCTIONs. These methods/operators include:

- Initialize the domain: Initialize all grid functions at all levels of the grid hierarchy using their respective Initialize operators.

- Setup Boundaries on the domain: Set boundaries for each grid function in the domain using their respective Setup Boundary operators.

- Evolve the domain: Integrate each of the grid functions defined on the grid hierarchy using their interior/boundary integrate operators and the specified integration sequence and ordering.

- Estimate Errors over the domain: Use the specified error expression to evaluate the error value on each grid.

- Regrid: Use the specified regridding criterion to flag regions requiring regridding and perform the regridding.

- Compute Residuals: Compute the specified residuals.

- View/Store: Enable visualization/dumping of a specified grid function at a particular grid level.

3.4 An AFD Application Program

An AFD application program developed using the programming abstractions defined in this section has the following structure:


1. Define GRID HIERARCHY
2. Define GRID FUNCTION(s)
3. Define DOMAIN
4. Define Auxiliary Parameters
5. Initialize DOMAIN
6. SetBoundaryCondition DOMAIN
7. Repeat nstep:
   (a) Evolve DOMAIN
   (b) ComputeResiduals DOMAIN
   (c) Conditional View/Store GRID FUNCTION(s)
8. Output
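A driver following this structure could look like the sketch below; the classes and method names are hypothetical stand-ins for the abstractions defined above.

// Hypothetical driver mirroring the application structure above; all classes and
// methods are illustrative placeholders for the abstractions of this section.
#include <cstdio>

struct GridHierarchy { /* geometry, boundary, coordinate, refinement attributes */ };
struct GridFunction  { /* field specification and operators */ };

struct Domain {
    void initialize() {}                     // apply the Initialize operators of all grid functions
    void set_boundary_conditions() {}        // apply the Setup Boundary operators
    void evolve() {}                         // one integration step over the whole hierarchy
    double compute_residuals() { return 0.0; }
    void store(const char* tag) { std::printf("store %s\n", tag); }
};

int main() {
    GridHierarchy hierarchy;                 // 1. define GRID HIERARCHY
    GridFunction  phi;                       // 2. define GRID FUNCTION(s)
    Domain domain;                           // 3. define DOMAIN
    const int nstep = 100;                   // 4. auxiliary parameters
    const int output_every = 10;

    domain.initialize();                     // 5. Initialize DOMAIN
    domain.set_boundary_conditions();        // 6. SetBoundaryCondition DOMAIN

    for (int n = 0; n < nstep; ++n) {        // 7. repeat nstep
        domain.evolve();                     //    (a) Evolve DOMAIN
        domain.compute_residuals();          //    (b) ComputeResiduals DOMAIN
        if (n % output_every == 0)           //    (c) conditional View/Store
            domain.store("phi");
    }
    return 0;                                // 8. output
}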

4 Parallelization of the AFD Methods: Issues & Requirements

The primary source of parallelism in AFD methods is data-parallelism, which can be exploited by decomposing the discretized computational domain (or computational grid) across the processing elements and concurrently operating on the local portions of this domain. AFD methods also exhibit some task-parallelism in the form of updates to independent groups of grid functions and integration across levels of the grid hierarchy. Consequently, the parallelization of AFD methods consists of appropriately partitioning the computational grid across the available computing nodes. Issues and requirements that have to be addressed during such a parallelization are discussed below.

4.1 Data-Structure Requirements

The fundamental data-structure defined by the adaptive finite-difference methods presented in Section 2 is a dynamic hierarchy of successively and selectively refined grids. This hierarchy is dynamic in the number of levels present, in the number of component grids at a level, and in the extent of these component grids and their location with respect to the base grid. Consequently, the data-structure required by parallel adaptive finite-difference (PAFD) methods is a distributed dynamic DAG of grids. Other requirements dictated by the Berger-Oliger AFD scheme that need to be considered during parallelization are: (1) the nesting requirements along the grid hierarchy and (2) the clustering of sibling grids at each level during regridding.


4.2 Communication Requirements

Communication requirements for the PAFD methods discussed in Section 2 can be classified as follows:

4.2.1 Inter-grid Communication

Inter-grid communications are communications between component grids at different levels of the grid hierarchy, and are defined by the AFD integration algorithm. These communications are of two types:

Prolongations: Prolongations are defined from a coarse grid to a nested fine grid and are used to initialize the fine grid using a specified interpolation scheme. Prolongations are performed whenever a new refined grid is created during regridding.

Restrictions: Restrictions are defined from a fine grid to its parent coarse grid and are used to update the parent's approximation every time the child grid is integrated to the same time level as its parent. Restrictions are based on a specified interpolation or averaging function.

Inter-grid communications typically require gather/scatter type operations defined by a specified interpolation or averaging stencil. These communications are irregular, defined only at run-time, and can be very expensive. Further, they can lead to serious communication bottlenecks for certain decomposition schemes. For example, consider a grid component at the finest level of refinement. The region in the base grid corresponding to this fine grid component will have only a fraction of its grid points. Thus, a decomposition scheme based strictly on work load (which is proportional to the number of grid points) will distribute the fine grid component across a much larger number of processors than the corresponding base grid region. Inter-grid communication in this case will require a larger number of processors trying to communicate simultaneously with a much smaller number of processors, resulting in a communication bottleneck.

4.2.2 Intra-grid Communication

Parallelization of the AFD grid hierarchy requires the distribution of component grids across multiple processors. This decomposition results in intra-grid communication requirements to update the boundary elements of the local portions of a distributed grid during integration. Intra-grid communications typically consist of near-neighbor exchanges based on the space-time stencil defined by the difference operator used. These communications are regular and well defined, and can be scheduled so as to be overlapped with computations on the interior region of the local portion of distributed grids.


4.2.3 Random Communication

Random communications occur during the clustering of component grids, while maintaining grid nesting, or during grid redistribution for load balancing. The number and nature of these communications depend on the decomposition strategy used.

4.3 Decomposition Issues

The decomposition strategy defines how the dynamic grid hierarchy underlying AFD methods is decomposed and distributed across processing elements. Decompositions are defined so as to optimize metrics such as execution time, and costs in terms of resources and overheads. Key issues that need to be considered during decomposition are discussed below.

Parallelism: The primary source of parallelism in AFD schemes is via concurrent operations on elements of component grids at the same level of the grid hierarchy. Consequently, the grid hierarchy must be decomposed so as to fully exploit this parallelism.

Communication Overheads: The selected decomposition scheme must minimize the overheads associated with each of the three required communication types. Decomposition issues related to these communications are as follows:

- As mentioned earlier, intra-grid communications are regular and can be overlapped with computations on the interior regions of the local partition. Consequently, the decomposition should create partitions with a sufficiently large interior-to-boundary ratio so as to enable this overlapping.

- Inter-grid gather/scatter communications require irregular broadcasts/multicasts and merges. These communications can be expensive and can lead to serious bottlenecks, especially in schemes which decompose grids individually. Consequently, inter-grid communications must be minimized.

- Random communication is required during the regridding step, which involves the creation and clustering of new, refined component grids, the deletion of existing component grids, and the maintenance of grid nesting. The domain decomposition scheme must support these operations with minimum overhead.

Balanced Load Distribution: Due to grid refinements, different levels of the AFD grid hierarchy have different computational loads. In the case of the Berger-Oliger AFD scheme for time-dependent PDEs, space-time refinement results in refined grids which not only have a larger number of grid elements but are also updated more frequently (i.e. take smaller time steps). The coarser grids are generally more extensive and hence their computational load cannot be ignored. Consequently, decomposition schemes must result in a balanced load distribution across processors so as to minimize processor idle time and therefore overall execution time.


Further, the dynamic nature of the AFD grid hierarchy may make it necessary to redistribute the hierarchy at run-time to maintain a balanced load. The initial decomposition strategy, the internal representation of the data-structure, and the load-balancing strategy must enable such a redistribution to be performed incrementally and efficiently.

Figure 2: Composite distribution of the grid hierarchy

4.4 Decomposition of the Dynamic AFD Grid Hierarchy

As outlined in this section, an appropriate distribution of the dynamic AFD grid hierarchy is critical to its efficient parallelization. In this section we present a composite decomposition scheme that addresses the issues and requirements outlined above. Other possible decompositions are discussed in Appendix A.

The composite distribution scheme is illustrated in Figure 2 for a 1-dimensional Berger-Oliger grid hierarchy. The objective of this decomposition is to alleviate the cost of potentially expensive inter-grid communications. This is achieved by decomposing the hierarchy in such a way that these communications become local to each processor. Parallelism across component grids at each level is fully exploited by this scheme. The composite decomposition scheme requires redistribution when component grids are created or destroyed during regridding. This redistribution, however, can be performed incrementally and will typically require shifting data either left or right to neighboring processors.

Although the composite distribution can efficiently support PAFD methods, generating and maintaining this distribution using conventional data-structure representations results in large amounts of communication and data movement, which in turn offsets its advantages. In the following section we present a representation for AFD grid hierarchies that allows composite decompositions to be efficiently generated and maintained.


Figure 3: Space Filling Curves - Examples (Morton order and Peano-Hilbert order)

5 Data-Management Support for PAFD Methods

The basic data-structures underlying parallel/distributed adaptive finite-difference methods are:

- A Scalable Distributed Dynamic Grid (SDDG), which is a distributed and dynamic array, and is used to implement a single component grid in the adaptive grid hierarchy.

- A Distributed Adaptive Grid Hierarchy (DAGH), which is defined as a dynamic collection of SDDGs and implements the entire adaptive grid hierarchy.

It is clear from the discussion in the previous section that an efficient and effective parallel/distributed AFD implementation must use a parallelization, decomposition, and distribution scheme that complements the problem itself. Correspondingly, the implementation of the identified data-structures must complement the dynamic and hierarchical nature of the adaptive grid hierarchy. In this section we present a representation, and storage/access mechanisms, for the SDDG and DAGH data-structures that meet these requirements. The fundamental requirement in realizing such a class of dynamic, distributed data-structures is the generation of an extendable, global index-space. A class of dimension-changing mappings called space-filling curves can be used to provide such an index-space; these are introduced below.

5.1 Space Filling Curves

Space-filling curves are a class of locality-preserving mappings from d-dimensional space to 1-dimensional space, i.e. N^d → N^1, such that each point in N^d is mapped to a unique point or index in N^1.


The mapping can thus be thought of as laying out a string within the d-dimensional space so that it completely fills the space. The 1-dimensional mapping generated by the space-filling curve serves as an ordered indexing into the multi-dimensional space. Mapping functions used to generate the space-filling index corresponding to a point in multi-dimensional space typically consist of interleaving operations and logical manipulations of the coordinates of the point, and are computationally inexpensive. Two such mappings, the Morton order and the Peano-Hilbert order, are shown in Figure 3. A more extensive coverage of space-filling curves and their properties can be found in [5, 6]. Two properties of space-filling curves, viz. digital causality and self-similarity, make them particularly suited as a means of generating the required adaptive and hierarchical index-space.

Digital Causality: Digital causality implies that points that are close together in the d-dimensional space will be mapped to points that are close together in the 1-dimensional space, i.e. locality is preserved by the mapping.

Figure 5: SDDG Representation
{0 1 4 5 2 3 6 7 8 9 12 13 10 11 14 15} (Morton)
{0 1 5 4 8 12 13 9 10 14 15 11 7 6 2 3} (Peano-Hilbert)

Self-Similarity: The self-similarity property implies that, as a d-dimensional region is refined into smaller sub-regions, the refined sub-regions can be recursively filled by curves that have the same structure as the curve used to fill the original (unrefined) region, but possibly different orientations. Figure 4 illustrates this property for a 2-dimensional region with refinements by factors of 2 and 3.

Figure 4: Space Filling Curves - Self-Similarity Property

In addition to providing a linearly ordered index-space that is hierarchical and extendable, each index generated by the space-filling mapping has information about the original multi-dimensional space embedded in it. As a result, given a key, it is possible to obtain its position in the original multi-dimensional space. Space-filling mappings can be used to develop an appropriate representation for the two data-structures, as described below.
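For concreteness, the sketch below computes a 2-D Morton (Z-order) index by the kind of bit interleaving described above; this is a standard construction, not code from the infrastructure. For a 4x4 grid it reproduces the Morton listing shown in Figure 5.

// Sketch of a 2-D Morton (Z-order) index computed by bit interleaving.
#include <cstdint>
#include <cstdio>

// Interleave the low 16 bits of x and y: bit i of x goes to bit 2i of the key and
// bit i of y goes to bit 2i+1, so nearby cells tend to receive nearby keys.
std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
    std::uint32_t key = 0;
    for (int i = 0; i < 16; ++i) {
        key |= ((static_cast<std::uint32_t>(x) >> i) & 1u) << (2 * i);
        key |= ((static_cast<std::uint32_t>(y) >> i) & 1u) << (2 * i + 1);
    }
    return key;
}

int main() {
    // Print the Morton key of each cell of a 4x4 grid, row by row.
    for (std::uint16_t y = 0; y < 4; ++y) {
        for (std::uint16_t x = 0; x < 4; ++x)
            std::printf("%2u ", morton2d(x, y));
        std::printf("\n");
    }
}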

5.2 SDDG Representation

A multi-dimensional SDDG is represented as a one-dimensional ordered list of SDDG blocks, obtained by first blocking the SDDG to obtain the required granularity, and then ordering the SDDG blocks based on the selected space-filling curve. The granularity of SDDG blocks is system dependent and must attempt to optimize the computation-communication ratio for each block. Each block in the list is assigned a cost corresponding to its computational load. Figure 5 illustrates this representation for a 2-dimensional SDDG. Decomposition of the SDDG across processing elements now consists of appropriately partitioning the SDDG block list so as to balance the total cost at each processor. Since the space-filling curve mapping preserves spatial locality, the resulting distribution is comparable to traditional block distributions in terms of communication overheads.
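Balancing the cost over the ordered block list can be done by cutting the list into contiguous pieces, as in this illustrative sketch; the greedy strategy and the Block fields are assumptions, not the infrastructure's actual partitioner.

// Illustrative sketch: cut a space-filling-curve-ordered block list into contiguous
// pieces of roughly equal total cost, one piece per processor.
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

struct Block { unsigned key; double cost; };   // SFC key and computational load

// Returns, for each processor, the [begin, end) index range of blocks it owns.
std::vector<std::pair<std::size_t, std::size_t>>
partition(const std::vector<Block>& blocks, int nproc) {
    double total = 0;
    for (const Block& b : blocks) total += b.cost;
    std::vector<std::pair<std::size_t, std::size_t>> owned;
    std::size_t begin = 0;
    double acc = 0;
    for (int p = 0; p < nproc; ++p) {
        double target = total * (p + 1) / nproc;   // cumulative cost this cut should reach
        std::size_t end = begin;
        while (end < blocks.size() && (acc < target || p == nproc - 1)) {
            acc += blocks[end].cost;
            ++end;
        }
        owned.push_back({begin, end});
        begin = end;
    }
    return owned;
}

int main() {
    std::vector<Block> blocks;
    for (unsigned k = 0; k < 16; ++k) blocks.push_back({k, 1.0});   // uniform costs
    auto owned = partition(blocks, 4);
    for (std::size_t p = 0; p < owned.size(); ++p)
        std::printf("P%zu: blocks [%zu, %zu)\n", p, owned[p].first, owned[p].second);
}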

Figure 6: Composite representation - Example 1
{0 1 4 {0 1 4 5} 2 3 {2 3 6 7} 7 8 {8 9 12 13} 12 13 {10 11 14 15} 11 14 15} (Morton)
{0 1 {0 1 5 4} 4 8 12 13 {8 12 13 9 10 14 11 15} 14 15 11 7 {7 6 2 3} 2 3} (Peano-Hilbert)

Figure 7: Composite representation - Example 2
{{0 1 2 3} 1 4 {0 1 4 {0 1 4 5}} 2 3 {2 3 {2 3 6 7} 7} 7 8 {8 {8 9 12 13} 12 13} 12 13 {{10 11 14 15} 11 14 15} 11 14 15} (Morton)
{{0 1 3 2} 1 {0 1 {0 1 5 4} 4} 4 8 12 13 {8 12 13 {8 12 13 9 10 14 15 11} 14 15 11} 14 15 11 7 {7 {7 6 2 3} 2 3} 2 3} (Peano-Hilbert)

5.3 DAGH Representation

DAGHs can be represented in two ways, corresponding to the two views of the adaptive grid hierarchy. The first representation corresponds to the hierarchical or DAG view and consists of a set of SDDG lists, one for each level. This view enables component grids at each level of the grid hierarchy to be addressed and operated on individually.

The second DAGH representation corresponds to the composite view of the adaptive grid hierarchy. This representation starts with a simple SDDG list for the base grid and appropriately incorporates newly created SDDGs within this list as the base grid gets refined.


The resulting structure is a composite list of the entire adaptive grid hierarchy. Incorporation of refined component grids into the base SDDG list is achieved by exploiting the self-similarity property of space-filling orderings. For each refined region, the SDDG sub-list corresponding to the refined region is replaced by the child's SDDG list. The costs associated with the blocks of the new list are updated to reflect the combined computational loads of the parent and child. The resulting structure is an ordered list of DAGH blocks, where each DAGH block represents a block of the entire grid hierarchy and may contain more than one grid level. Figures 6 & 7 illustrate the composite representation for a two-dimensional case.

Different decompositions of the adaptive grid hierarchy (discussed in Section 4) can be generated by using one of the two DAGH representations and partitioning the ordered lists based on the associated costs. In particular, the desired composite decomposition can now be easily generated by partitioning the composite DAGH list to balance the cost assigned to each processor. The resulting decompositions are shown in Figures 8-10.

Figure 8: Composite distribution - Example 1 (the composite lists of Figure 6 partitioned across processors P0-P3)

Figure 9: Composite distribution - Example 2A (the Morton composite list of Figure 7 partitioned across processors P0-P3)

Figure 10: Composite distribution - Example 2B (the Peano-Hilbert composite list of Figure 7 partitioned across processors P0-P3)

5.4 Data-Structure Storage

Data-structure storage can be divided into two components: (1) storage of the adaptive grid structure, and (2) storage of the associated data. The overall storage scheme is shown in Figure 11. The two components are described below.

Figure 11: Storage Scheme (a list of composite DAGH blocks, lists of DAGH blocks per level, and the SDDA; each block entry carries a key, level count, and work estimate)


5.4.1 Adaptive Grid Structure Storage

The structure of the adaptive grid hierarchy is stored as ordered lists using the representation presented above. Both views of the grid hierarchy (hierarchical and composite) are maintained: the hierarchical view to enable each level of the hierarchy to be addressed and operated on individually, and the composite view to enable an appropriate composite distribution and redistribution of the grid hierarchy. The implementation of the two views, however, consists of a single abstract data object with appropriate interfaces defined to enable it to be operated on either as a single composite DAGH list or as sets of SDDG lists per level.

The basic addressable unit in our storage scheme is a single block in the SDDG or DAGH list. Each DAGH block represents a block of the entire adaptive grid hierarchy and can contain multiple levels of component SDDGs; an SDDG block is a special case of a DAGH block with a single level. Each DAGH block is assigned an index into the space-filling index-space which identifies its location in the entire grid structure.


It also maintains information about its extent (granularity), the number of refinement levels it contains, and a cost measure corresponding to its computational load. By exploiting the geometry information encoded in its space-filling index, each DAGH block can compute the keys associated with its neighboring blocks along each dimension.

Operations defined on a DAGH block include refinement/coarsening and decomposition/composition. The refine operator adds a new refined component SDDG to the DAGH block and updates its level count accordingly, while the coarsen operator deletes the finest level contained in the DAGH block. Decomposition of a DAGH block consists of dividing it up into smaller DAGH blocks. The compose operation combines DAGH blocks into a single DAGH block with a greater extent. The operations on DAGH blocks are illustrated in Figure 12.

Figure 12: DAGH Block Operations (refine/coarsen, decompose/compose)
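These block-level operations can be pictured with the following sketch; the fields, the cost bookkeeping and the key arithmetic are illustrative assumptions only.

// Illustrative sketch of a DAGH block record and its refine/coarsen and
// decompose/compose operations; fields and cost bookkeeping are assumptions.
#include <cstddef>
#include <vector>

struct DAGHBlock {
    unsigned key;        // space-filling index locating the block in the hierarchy
    int extent;          // granularity: number of base-grid cells spanned
    int levels;          // number of refinement levels contained in this block
    double cost;         // computational load of the block

    void refine()  { ++levels; cost *= 2.0; }                      // add a finer component SDDG (cost model is a placeholder)
    void coarsen() { if (levels > 1) { --levels; cost /= 2.0; } }  // delete the finest contained level
};

// Decompose a block into n smaller blocks covering the same extent.
std::vector<DAGHBlock> decompose(const DAGHBlock& b, int n) {
    std::vector<DAGHBlock> parts;
    for (int i = 0; i < n; ++i)
        parts.push_back({b.key + static_cast<unsigned>(i) * static_cast<unsigned>(b.extent / n),
                         b.extent / n, b.levels, b.cost / n});
    return parts;
}

// Compose contiguous blocks back into a single block with a greater extent.
DAGHBlock compose(const std::vector<DAGHBlock>& parts) {
    DAGHBlock whole = parts.front();
    for (std::size_t i = 1; i < parts.size(); ++i) {
        whole.extent += parts[i].extent;
        whole.cost   += parts[i].cost;
        if (parts[i].levels > whole.levels) whole.levels = parts[i].levels;
    }
    return whole;
}

int main() {
    DAGHBlock b{0, 16, 1, 16.0};
    b.refine();                                        // the block now carries two levels
    std::vector<DAGHBlock> parts = decompose(b, 4);    // split into four smaller blocks
    DAGHBlock merged = compose(parts);                 // and merge them back
    (void)merged;
}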

5.4.2 Data Storage

The data storage component of the proposed storage scheme serves as a direct-access repository for computational data that needs to be communicated between neighboring DAGH blocks. This storage is implemented as a "Scalable Distributed Dynamic Array" (SDDA) that uses extendible hashing techniques [9, 10] to provide a dynamically extendable, globally indexed storage. The SDDA is a hierarchical structure and is capable of dynamically expanding and contracting as required. Entries in the SDDA correspond to DAGH blocks, and the array is indexed using the associated DAGH block keys.

The SDDA data storage provides a means for efficient intra-grid communication between DAGH blocks. To communicate data to neighboring DAGH blocks, the data is copied to the appropriate locations in the SDDA.


This information is then asynchronously shipped to the appropriate processor. Similarly, data needed from remote DAGH blocks is received on-the-fly and inserted into the appropriate location in the SDDA. Storage associated with the SDDA is maintained in ready-to-ship buckets; this alleviates the overheads associated with packing and unpacking. An incoming bucket is directly inserted into its location in the SDDA. Similarly, when the data associated with a DAGH block entry is ready to ship, the associated bucket is shipped as it is.
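The role of the SDDA in this exchange can be illustrated with the sketch below. For brevity it keys buckets with a standard ordered map rather than the extendible hashing of [9], and the bucket contents and interface are assumptions.

// Illustrative sketch of a globally indexed store of ready-to-ship buckets keyed by
// DAGH block keys. std::map stands in for the extendible-hashing scheme of [9].
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Key = std::uint64_t;                   // space-filling index of a DAGH block

struct Bucket {                              // ready-to-ship storage for one block entry
    std::vector<double> data;
};

class SDDA {
public:
    Bucket& operator[](Key k) { return table_[k]; }      // creates the entry if absent

    // Insert a bucket received from another processor directly into its slot.
    void insert_incoming(Key k, Bucket b) { table_[k] = std::move(b); }

    // Remove and return the bucket for a block whose data is ready to ship, as-is.
    Bucket take_outgoing(Key k) {
        Bucket b = std::move(table_[k]);
        table_.erase(k);
        return b;
    }

private:
    std::map<Key, Bucket> table_;            // grows and shrinks with the grid hierarchy
};

int main() {
    SDDA store;
    store[42].data.assign(8, 1.0);            // stage ghost data destined for a neighbour
    Bucket out = store.take_outgoing(42);     // this bucket would be shipped asynchronously
    store.insert_incoming(7, std::move(out)); // a received bucket is inserted directly
}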

5.5 PAFD Implementation based on SDDG/DAGH

The different components of a PAFD application can now be developed using the data-structures and storage scheme defined above. In this section we discuss the implementation of some of these components.

Base Grid Creation: The base grid is created using the specified bounding-box information. Each processor recursively decomposes this bounding box into DAGH blocks with granularity greater than an architecture-specific minimum. A sufficient number of blocks are generated so that they can be uniformly distributed among the available processors. The blocks are ordered and linked to generate a DAGH. This list is then uniformly partitioned among the set of processors. The associated data storage is created on each processor as a single-level SDDA.

Refinement, Regridding & Redistribution: The error estimation and regridding stage of the PAFD method is performed in three steps. During the first step, each processor uses the specified error estimation technique to flag regions needing refinement in the DAGH blocks assigned to it. Locally flagged regions are then clustered by each processor and the local portion of the composite list is updated accordingly. The decompose/compose and refine operators defined on a DAGH block are used to generate the refined DAGH list. Similarly, refinements that are no longer required are deleted by coarsening the associated DAGH blocks. In the second step, a global concatenation operation combines the local portions of the updated DAGH block list so that each processor now contains the structure of the entire grid hierarchy. A second round of clustering is now performed on the locally clustered refined regions and the required buffer regions are added. The final refined DAGH block list is created and the SDDA structure is updated accordingly. Redistribution consists of partitioning the updated list so as to balance the load. The final step consists of initializing the data elements (GRID FUNCTIONs) associated with the new grid hierarchy. This is performed using the SDDA data storage.

Time Integration and Intra-Grid Communication: The time integration component of the integration algorithm is performed in two steps. In the first step, each processor updates the portions of each of its DAGH blocks that need to be communicated to neighboring blocks and copies this updated data to the appropriate locations in the SDDA. This data is asynchronously shipped out in the second step, while the processor updates the remaining portions of its DAGH blocks.

Inter-Grid Prolongations & Restrictions: Inter-grid prolongations and restrictions are local to each DAGH block and hence can be performed without any communication.

6 Conclusions

In this report we have presented an infrastructure for supporting parallel/distributed adaptive mesh-refinement techniques. The infrastructure is part of the computational toolkit being designed to support the Binary Black-Hole grand challenge effort. It defines a set of high-level programming abstractions that can be used to express adaptive finite-difference algorithms, and provides the data-management support needed to implement these abstractions in a parallel/distributed environment. The proposed distributed, dynamic data structures are based on the definition of an extendable, global index-space generated by a class of space-filling mappings. A scalable distributed dynamic array is built on this index-space and serves as the means for efficient data communication.


References

[1] Marsha J. Berger and Joseph Oliger, "Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations", Journal of Computational Physics, pp. 484-512, 1984.

[2] M. S. Warren and J. K. Salmon, "A Parallel Hashed Oct-Tree N-Body Algorithm", Proceedings of Supercomputing '93, Nov. 1993.

[3] Carter Edwards, "Extendable Hashing Data Management for Globally Addressable Transparently Distributed Data Objects", Technical report, Department of Computer Sciences, University of Texas at Austin, Aug. 1994.

[4] High Performance Fortran Forum, High Performance Fortran Language Specifications, Version 1.0, Jan. 1993. Also available as Technical Report CRPC-TR92225 from the Center for Research on Parallel Computing, Rice University, Houston, TX 77251-1892.

[5] Hanan Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley Publishing Company, 1989.

[6] Theodore Bially, A Class of Dimension Changing Mappings and its Application to Bandwidth Compression, PhD thesis, Polytechnic Institute of Brooklyn, 1967.

[7] Edward A. Patrick, Douglas R. Anderson, and F. K. Bechtel, "Mapping Multidimensional Space to One Dimension for Computer Output Display", IEEE Transactions on Computers, C-17(10):949-953, Oct. 1968.

[8] Carter Edwards, "Data Structures for Parallel Distributed Adaptive hp Finite Element Method (FEM) Applications", Technical report, Department of Computer Sciences, University of Texas at Austin, Oct. 1994.

[9] R. Fagin, "Extendible Hashing - A Fast Access Mechanism for Dynamic Files", ACM TODS, 4:315-344, 1979.

[10] H. F. Korth and A. Silberschatz, Database System Concepts, McGraw-Hill, New York, 2nd edition, 1991.


Figure 13: 1-D grid hierarchy

Figure 14: Independent grid distribution of the grid hierarchy

A Decomposition Schemes for the Dynamic AFD Grid Hierarchy

This section discusses different decompositions of the AFD grid hierarchy. The one-dimensional grid hierarchy shown in Figure 13 is used to illustrate the different schemes.

A.1 Independent Grid Distribution

The independent grid distribution scheme, shown in Figure 14, distributes the component grids at different levels independently across the processors. This distribution leads to balanced loads, and no redistribution is required when grids are created or deleted. However, the scheme can be very inefficient with regard to inter-grid communication. In the adaptive grid hierarchy, a fine grid corresponds to (and updates) a small region of the underlying coarse grid. If both the fine and coarse grids are distributed over the entire set of processors, all the processors holding the fine grid will communicate with the small set of processors corresponding to the associated coarse-grid region, thereby causing a serious bottleneck. For example, in Figure 14, a restriction from grid G22 to grid G11 requires all the processors to communicate with processor P3.


Figure 15: Combined grid distribution of the grid hierarchy

Another problem with this distribution is that parallelism across multiple components at a single grid level is not exploited. For example, in Figure 14, grids G11, G12 and G13 are distributed across the same set of processors and have to be integrated sequentially. Finally, clustering operations can be expensive for such a distribution.
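A back-of-the-envelope sketch (hypothetical function, 1-D cells, nprocs processors) of why the bottleneck arises: under the independent grid distribution the owner of a cell depends only on its offset within its own grid, so every processor owns part of a fine grid even when the coarse region it restricts to belongs to a single processor.

#include <cstddef>

// Owner of a cell under the independent grid distribution: each component
// grid is dealt out over all processors on its own, irrespective of where
// it sits in the hierarchy.
int owner_independent(std::size_t cell, std::size_t grid_size, int nprocs) {
    return static_cast<int>((cell * static_cast<std::size_t>(nprocs)) / grid_size);
}

With a fine grid of 64 cells on 4 processors, for example, cells 0-15 map to P0, cells 16-31 to P1, and so on, even though the coarse cells they restrict to may all lie on P3.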

A.2 Combined Grid Distribution

The combined grid distribution, shown in Figure 15, distributes the total work load in the grid hierarchy by first forming a simple linear structure by abutting the component grids and then decomposing this structure into partitions of equal load. The combined decomposition scheme also suffers from the bottleneck described for the independent grid distribution, but to a lesser extent. For example, in Figure 15, G21 and G22 update G11, requiring P2 and P3 to communicate with P1 for every restriction. Further, regridding operations involving the creation or deletion of a grid are extremely expensive, as they require an almost complete redistribution of the grid hierarchy. A more serious problem with this distribution, however, is that it does not exploit the main source of parallelism (across grids). For example, when G01 is being updated, processors P2 and P3 are idle and P1 has only a small amount of work. Similarly, when updating grids at level 1 (G11, G12 and G13) processors P0 and P3 are idle, and when updating grids at level 2 (G21, G22 and G23) processors P0 and P1 are idle.


Figure 16: Independent level distribution of the grid hierarchy

A.3 Independent Level Distribution

In the independent level distribution scheme (see Figure 16), each level of the AFD grid hierarchy is distributed individually: the combined load of all component grids at a level is partitioned among the processors. This scheme overcomes some of the drawbacks of the independent grid distribution, since parallelism within a level of the hierarchy is exploited. Although the inter-grid communication bottleneck is reduced for this distribution, the required gather/scatter communications can be expensive. Further, the creation or deletion of component grids at any level requires a redistribution of the entire level.
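A small sketch (hypothetical function and cost model) of how such a level-wise partition might be computed: the component grids of one level are abutted into a single work list, and each grid's position in the cumulative load determines the first processor that owns part of it.

#include <cstddef>
#include <numeric>
#include <vector>

// For each component grid at one level (load[g] = its work estimate),
// return the first processor that owns part of it when the level's
// combined load is split evenly over nprocs processors.
std::vector<int> first_owner_per_grid(const std::vector<double>& load, int nprocs) {
    const double total = std::accumulate(load.begin(), load.end(), 0.0);
    std::vector<int> first(load.size());
    double prefix = 0.0;
    for (std::size_t g = 0; g < load.size(); ++g) {
        first[g] = static_cast<int>((prefix / total) * nprocs);
        prefix += load[g];
    }
    return first;
}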

B Berger-Oliger AFD Algorithm

The Berger-Oliger AFD integration algorithm is defined recursively in [1] as follows:

Algorithm 1 Integrate(level)

    repeat (Refine_Ratio)^level times
        if (Regrid_Time) then
            do Regridding
        end if
        Step dt_level on all grids at level
        if (level+1 exists) then
            Integrate(level+1)
            Update(level, level+1)
        end if
    end repeat
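A compact C++ rendering of this recursion is sketched below; the Hierarchy struct and its callbacks are hypothetical stand-ins for the toolkit's abstractions, and the repeat count mirrors Algorithm 1 as written above.

#include <functional>

// Hypothetical hooks standing in for the operations named in Algorithm 1.
struct Hierarchy {
    int finest_level;
    int refine_ratio;
    std::function<bool(int)> regrid_time;        // time to regrid this level?
    std::function<void(int)> regrid;             // rebuild grids at and below this level
    std::function<void(int)> step;               // advance all grids at level by dt_level
    std::function<void(int, int)> update;        // restrict/correct coarse from fine
};

void integrate(Hierarchy& h, int level) {
    int repeats = 1;
    for (int l = 0; l < level; ++l) repeats *= h.refine_ratio;   // (Refine_Ratio)^level
    for (int r = 0; r < repeats; ++r) {
        if (h.regrid_time(level)) h.regrid(level);
        h.step(level);                                  // one dt_level step at this level
        if (level + 1 <= h.finest_level) {
            integrate(h, level + 1);
            h.update(level, level + 1);
        }
    }
}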
