GMRES FOR SEQUENTIALLY MULTIPLE NEARBY SYSTEMS

K. GURU PRASAD*, D. E. KEYES†, AND J. H. KANE‡

Abstract. An application of the Generalized Minimal Residual (GMRES) algorithm to the solution of sequentially multiple nearby systems of equations through the reuse of Krylov subspaces is presented. The main focus is on the case when only the right-hand side vector changes. However, the case in which both the matrix and the right-hand side change is also addressed. Applications of these formulations include nonlinear problems, Design Sensitivity Analysis, multiple load cases, and transient analyses. Examples are drawn from systems arising from the discretization of Boundary Integral Equations. Though there is no reason to expect that Krylov subspaces may effectively be reused for unrelated right-hand sides (except in the trivial full-dimensional limit), recent experience suggests that reuse has broad practical applicability.

Key words. GMRES, Boundary Integral Equations, Krylov methods, dense nonsymmetric systems, design sensitivity analysis, reanalysis, multiple right-hand sides

AMS(MOS) subject classifications. 65F10, 65N22, 65N38

* Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. The work of this author was supported by NSF grant DDM 90-20733.

† Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA. The work of this author was supported by NSF grant EET 89-57475 and by the National Aeronautics and Space Administration under NASA contract No. NAS1-19480, while in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681.

‡ Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, NY 13699, USA. The work of this author was supported by NSF grant DDM 90-19852.

1. Introduction. Direct methods for solving dense linear systems of n equations easily amortize the $O(n^3)$ operation cost of the first system when multiple right-hand sides are to be solved with an unchanging coefficient matrix. Multiple right-hand sides with the same matrix arise in engineering applications such as multiple loading conditions, certain time-evolving problems, and certain nonlinear problems. In addition, there are cases in which both the coefficient matrix and the right-hand side change to form a nearby system. Examples include general nonlinear problems and Design Sensitivity Analysis (DSA). In these cases also, the original factorization of a direct solver can often effectively be reused through iterative reanalysis techniques in which terms involving matrix perturbations are moved to the right-hand side.

Dense systems arising in Boundary Integral Equations (BIEs) can typically be solved in $O(n^2)$ operations by Krylov iterative methods (or faster still in some problems through fast multipole methods; see, e.g., [12]), but the constant is large compared to that for repeat solves with direct methods. Thus, until recently, iterative methods have not enjoyed much favor for sequentially multiple right-hand sides.

When multiple right-hand sides are available a priori, block iterative methods such as block-Lanczos or block-conjugate gradient [13, 17] or the block-Stiefel method [17, 18] can be used to reduce the average cost per right-hand side. However, when each right-hand side depends upon the solution of the previous one, the block methods are not directly applicable.

A two-stage approach to solve for a second system when the Lanczos vectors of the first have been saved was proposed in [14]. The first stage is a simple Galerkin projection of the solution onto the previous Krylov subspace. The second stage is a continuation of the Krylov process in which the new vectors associated with the second right-hand side are orthogonalized not only with respect to the original Krylov subspace, but also against a basis vector derived from the second right-hand side. These stages are called "Lanczos-Galerkin projection" and "modified Lanczos" in [16], where they are analyzed, together with a "restarted Lanczos-Galerkin" method. The analysis, though not the technique, is restricted to symmetric positive definite systems. In related and similarly restricted work [22], an efficient method is presented for solving $f(A)x = b$ using the Krylov subspace information obtained from the solution of $Ax = b$, for suitable functions f. One application would be implicit transient problems in which only the time step contribution to the diagonal changes, while the coefficients of A are otherwise unchanged.

Two recent papers have presented the application of Krylov iterative methods to sequentially multiple right-hand sides arising in practical large-scale applications. Farhat et al. [4] apply the preconditioned conjugate gradient method, along with projection, for multiple load static analyses and implicit linear dynamic analyses using a substructured finite element formulation. Fischer [5] solves a transient incompressible Navier-Stokes application with an implicit pressure equation amenable to conjugate gradients, using projection onto a space of previous solutions to produce an initial iterate. The full linear systems in these applications are large, sparse, and symmetric positive definite. Conjugate gradients is therefore natural for a first right-hand side. However, for repeat solutions with the same matrix, accumulating a basis as a by-product of early solutions to reduce the number of iterations in later solutions is storage intensive. In [4], this obstacle is avoided by applying the iterative solution process only to a small Schur complement system for degrees of freedom on interfaces between subdomains, with interior degrees of freedom condensed out.
Thus, through substructuring or domain decomposition, the advantages of Krylov iterative methods can be extended to the context of multiple right-hand sides. In [5], the extra storage required for the pressure basis vectors is tolerable in comparison to that occupied by other work areas.

The aim of this paper is to investigate the possibility of using iterative solvers like GMRES for solving dense nonsymmetric systems of equations where the right-hand side and possibly the coefficient matrix change. The density of the original system is a crucial feature, since it implies that storage of a complete Krylov basis, if necessary, at worst doubles the memory requirements of the original system, instead of increasing it by an order of magnitude. Nonsymmetry is also important, since it implies that there is no short recurrence formula for new basis vectors, as there is in conjugate gradients. Thus, all basis vectors are saved for the current system, whether or not there will be a subsequent system over which to amortize them. (We write in the context of nonrestarted GMRES. Other Krylov methods designed to avoid the storage costs of nonrestarted GMRES are not as automatically adapted to sequentially multiple right-hand sides.) Although the practicality of our approach may be tied to the dense nonsymmetric characteristics of BIEs, the formulation itself is general.

Section 2 briefly reviews GMRES, which is adapted to multiple right-hand sides in Section 3. Section 4 briefly reviews the collocation boundary element formulation of BIEs, and its extensions to nonlinear analysis and DSA. Section 5 contains numerical illustrations of the effective use of GMRES in BIEs, including examples in which the coefficient matrix is unchanging and an example in which nearby matrices, along with nearby right-hand sides, are generated. Though Section 3.1 below is a straightforward extension of [14] to the nonsymmetric case, we believe that the extension, as well as its successful application in BIEs, makes its first appearance herein.

2. Generalized Minimal Residual Method. As its name implies, the GMRES method finds an approximate solution of a linear system of equations that is optimal among all possible elements in a Krylov subspace in the sense of residual minimization. Given the n-dimensional nonsingular system

\[ Ax = b, \tag{1} \]

and the initial residual

\[ r_0 = b - A x_0, \]

the m-dimensional Krylov subspace, denoted $K_m$, is the space spanned by the vectors

\[ r_0,\ A r_0,\ A^2 r_0,\ \ldots,\ A^{m-1} r_0. \tag{2} \]

The Krylov basis is adapted to the operator A and the right-hand side b by construction, and good solutions can often be found in small subspaces ($m \ll n$). The Krylov basis is typically unsuitable for direct use, but other orthonormal or $A^T A$-conjugate bases spanning the same subspace can be constructed at moderate cost. In addition, system (1) can be transformed by pre- or post-multiplication by an n-dimensional preconditioning matrix. GMRES is detailed in its original unpreconditioned form in [20], and its effective use in solving discretized BIEs has been noted in [7, 23]. (The survey [3] describes a variety of dense applications to which Krylov methods have been applied.)

After m iterations, GMRES minimizes the residual $r_m$ by solving a least squares problem for the coefficients representing x affinely as $x_0$ plus a linear combination of orthonormal basis vectors $\{v_1, v_2, \ldots, v_m\}$ of $K_m$, whose collection into an $n \times m$ matrix we denote $V_m$. This Arnoldi basis is formed via a Gram-Schmidt process:

    v_1 <- r_0 / ||r_0||
    For k = 1, 2, ... {
        w <- A v_k
        For i = 1, 2, ..., k {
            h_{i,k} <- v_i^T w;  w <- w - h_{i,k} v_i
        }
        h_{k+1,k} <- ||w||;  v_{k+1} <- w / h_{k+1,k}
    }
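The Gram-Schmidt loop above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `arnoldi`, the random test matrix, and the sizes are assumptions chosen for the example, and no breakdown handling is included. The last lines check the relation $A V_k = V_{k+1} \bar{H}_k$ and the orthonormality of the basis numerically.

```python
import numpy as np


def arnoldi(A, r0, k):
    """Run k Arnoldi steps (modified Gram-Schmidt) from the seed r0.

    Returns V with k+1 orthonormal columns and the (k+1) x k matrix Hbar
    such that A @ V[:, :k] equals V @ Hbar up to rounding.
    Assumes no breakdown, i.e. h_{j+1,j} != 0 for all k steps.
    """
    n = r0.shape[0]
    V = np.zeros((n, k + 1))
    Hbar = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            Hbar[i, j] = V[:, i] @ w
            w = w - Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / Hbar[j + 1, j]
    return V, Hbar


# Hypothetical test problem: a dense, nonsymmetric, well-conditioned matrix.
rng = np.random.default_rng(0)
n, k = 40, 8
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
r0 = rng.standard_normal(n)

V, Hbar = arnoldi(A, r0, k)
relation_error = np.linalg.norm(A @ V[:, :k] - V @ Hbar)
orthogonality_error = np.linalg.norm(V.T @ V - np.eye(k + 1))
print(relation_error, orthogonality_error)
```

Both printed quantities should be at the level of rounding error for a well-conditioned matrix such as this one; for harder problems, reorthogonalization may be needed.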

After k steps of the Arnoldi process, the scalars $h_{ij}$ form a $(k+1) \times k$ matrix $\bar{H}_k$, which is square upper Hessenberg extended by a row that is nonzero only in the lower right entry, and we have $A V_k = V_{k+1} \bar{H}_k$. The overall GMRES process is symbolized in matrix terms as follows:

\[
\begin{aligned}
x &= x_0 + z, \qquad z = V_k y, \\
y &= \arg\min \| b - A(x_0 + V_k y) \| \\
  &= \arg\min \| r_0 - A V_k y \| \\
  &= \arg\min \| r_0 - V_{k+1} \bar{H}_k y \| \\
  &= \arg\min \| V_{k+1} (g - \bar{H}_k y) \| \\
  &= \arg\min \| g - \bar{H}_k y \|.
\end{aligned}
\]

Here, g is $\|r_0\|$ times the first column of the identity matrix of size $k+1$, and $V_{k+1}$ comes outside the Euclidean norm since its columns are orthonormal. The overdetermined problem remaining for y is solved by a QR factorization of $\bar{H}_k$, followed by a backsolve with the upper triangular part of R. We make the convention (for compatibility with [20]) that $Q_k \bar{H}_k = R_k$, where $Q_k$ is a $(k+1) \times (k+1)$ matrix and $R_k$ has the same dimensions as $\bar{H}_k$. Then

\[ y = \lfloor R_k \rfloor^{-1} \lfloor Q_k g \rfloor, \tag{3} \]

where the $\lfloor \cdot \rfloor$ notation refers to deletion of the last row of its argument. The efficiency of GMRES, vis-a-vis the other Petrov-Galerkin Krylov methods to which it is mathematically equivalent [19], is due, in part, to the step-by-step in-place accumulation of the QR factorization of $\bar{H}_k$, which allows dynamic monitoring of the norm of the residual without construction of intermediate solutions. The magnitude of the last element of the iteratively updated g is $\|r_k\|$. In practice, the stopping value, m, of k is determined by inspecting this terminal element. The QR transformation of a Hessenberg matrix can be accomplished with k Givens rotations, which are stored in factored form, $Q_k = G_k G_{k-1} \cdots G_2 G_1$, where $G_j$ is a plane rotation in the $(j, j+1) \times (j, j+1)$ block that zeros out the $(j+1, j)$ element of $\bar{H}_j$. The original reference [20] should be consulted for details and [20, 6] for convergence theory; a pseudo-code description follows the "$N_R = 1$" branches of Figure 1 below.

3. GMRES for Sequentially Multiple Right-Hand Sides. In this section, we present a formulation of GMRES for multiple right-hand sides in which the right-hand sides need not be known a priori. It consists of two basic steps: use of the Krylov subspace out of which the most recent solution has been drawn to improve an initial iterate for the solution of the next right-hand side, and further improvement by expansion of that subspace through a continuation of the Arnoldi process. Thus, it is required to solve

\[ A\tilde{x} = \tilde{b}, \]

where $\tilde{b}$ is "near" b in some sense.
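As a point of reference for the extension developed in this section, the single-system procedure of Section 2 (Arnoldi steps, in-place Givens updates of the Hessenberg columns, and residual monitoring through the last entry of g) can be sketched as below. The function name `gmres_givens`, the relative stopping rule, and the test problem are assumptions made for this illustration; the final back solve uses `np.linalg.solve` on the triangular block for brevity, and a nonsingular A is assumed.

```python
import numpy as np


def gmres_givens(A, b, x0, rel_tol=1e-10, max_iter=None):
    """Nonrestarted GMRES; returns the solution and the residual-norm history."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0.copy(), [0.0]
    V = np.zeros((n, max_iter + 1))
    R = np.zeros((max_iter + 1, max_iter))  # columns of Hbar, rotated in place into R
    cs = np.zeros(max_iter)
    sn = np.zeros(max_iter)
    g = np.zeros(max_iter + 1)
    g[0] = beta
    V[:, 0] = r0 / beta
    history = [beta]
    k = 0
    for j in range(max_iter):
        # Arnoldi step: new column of Hbar and next basis vector.
        w = A @ V[:, j]
        for i in range(j + 1):
            R[i, j] = V[:, i] @ w
            w = w - R[i, j] * V[:, i]
        R[j + 1, j] = np.linalg.norm(w)
        if R[j + 1, j] > 0.0:
            V[:, j + 1] = w / R[j + 1, j]
        # Apply the stored rotations to the new column.
        for i in range(j):
            top = cs[i] * R[i, j] + sn[i] * R[i + 1, j]
            R[i + 1, j] = -sn[i] * R[i, j] + cs[i] * R[i + 1, j]
            R[i, j] = top
        # New rotation zeroes the subdiagonal entry; apply it to g as well.
        d = np.hypot(R[j, j], R[j + 1, j])
        cs[j], sn[j] = R[j, j] / d, R[j + 1, j] / d
        R[j, j], R[j + 1, j] = d, 0.0
        g[j + 1] = -sn[j] * g[j]
        g[j] = cs[j] * g[j]
        k = j + 1
        history.append(abs(g[k]))  # magnitude of the last entry of g is ||r_k||
        if abs(g[k]) <= rel_tol * beta:
            break
    y = np.linalg.solve(R[:k, :k], g[:k])  # back solve with the square part of R
    return x0 + V[:, :k] @ y, history


# Hypothetical dense nonsymmetric test problem.
rng = np.random.default_rng(1)
n = 40
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x, history = gmres_givens(A, b, np.zeros(n))
true_residual = np.linalg.norm(b - A @ x)
print(len(history) - 1, history[-1], true_residual)
```

The residual norm reported by the recurrence (`history[-1]`) and the explicitly computed `true_residual` should agree closely, which is what makes monitoring without forming intermediate solutions possible.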

3.1. Projection for an initial approximation. A natural means of seeking to improve an initial iterate $x_0'$ for $\tilde{x}$ is to do a Galerkin projection onto the Krylov subspace generated for the first system. The new initial iterate is given by

\[ \tilde{x}_0 = x_0' + V_m \tilde{y}, \tag{4} \]

where

\[ \tilde{y} = \lfloor R_m \rfloor^{-1} \lfloor Q_m V_m^T r_0' \rfloor \tag{5} \]

and

\[ r_0' = \tilde{b} - A x_0'. \tag{6} \]

Since $V_m$ is based on the previous right-hand side system, it can be anticipated that the solution $\tilde{x}_0$ so obtained will not be accurate enough unless $r_0' = \tilde{b} - A x_0'$ is nearly contained in the previous subspace $K_m$, i.e., unless $(I - V_m V_m^T) r_0' \approx 0$. We comment in Section 6 about the likelihood of this occurring in practice.
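A minimal sketch of the projection step follows, under stated assumptions. The coefficients are obtained from a small least-squares problem with the stored $\bar{H}_m$, solved with `np.linalg.lstsq` rather than with the stored Givens rotations, and all $m+1$ stored basis vectors are used on the right-hand side; this minimum-residual variant is an assumption of the sketch, chosen because it cannot increase the residual, and may differ in detail from (5). The names `arnoldi` and `project_initial_iterate` and the test data are hypothetical.

```python
import numpy as np


def arnoldi(A, r0, m):
    """m Arnoldi steps: returns V (n x (m+1)) and Hbar ((m+1) x m)."""
    n = r0.shape[0]
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            Hbar[i, j] = V[:, i] @ w
            w = w - Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / Hbar[j + 1, j]
    return V, Hbar


def project_initial_iterate(A, V, Hbar, b_new, x0_new):
    """Improve x0_new for A x = b_new using the basis stored from an earlier solve."""
    m = Hbar.shape[1]
    r0_new = b_new - A @ x0_new
    rhs = V.T @ r0_new  # components of the new residual on the stored basis
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)  # small (m+1) x m problem
    return x0_new + V[:, :m] @ y


rng = np.random.default_rng(2)
n, m = 60, 12
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
V, Hbar = arnoldi(A, b, m)  # basis from the first system (zero initial iterate)

b_near = b + 1e-3 * rng.standard_normal(n)  # a "nearby" right-hand side
x0 = np.zeros(n)
x_proj = project_initial_iterate(A, V, Hbar, b_near, x0)
res_before = np.linalg.norm(b_near - A @ x0)
res_after = np.linalg.norm(b_near - A @ x_proj)
print(res_before, res_after)
```

Note that the projection costs one matrix-vector product plus inner products with the stored vectors, and that the part of the new residual lying outside the stored subspace is left untouched, which is why a continuation of the Arnoldi process is generally still needed.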

3.2. Improving the solution. If the projected solution does not satisfy a residual-based convergence criterion, the Arnoldi process is continued further with $k = m+1, m+2, \ldots$ in the usual way with the new seed $v_{m+1} = \tilde{r}_0 / \|\tilde{r}_0\|$, except that the new Krylov basis vectors $v_{m+1}, v_{m+2}, \ldots$ are successively orthogonalized with respect to all of the previous vectors in $V_m$. The vector playing the role of g in (3) now comes from the new right-hand side. That is to say, if we assume the number of iterations to be m for the first right-hand side and p for the second, the overall GMRES process can be symbolized in matrix terms as follows:

\[
\begin{aligned}
\tilde{x} &= \tilde{x}_0 + \tilde{z}, \qquad \tilde{z} = V_{m+p}\, \tilde{y}, \\
\tilde{y} &= \arg\min \| \tilde{b} - A(\tilde{x}_0 + V_{m+p}\, \tilde{y}) \| \\
  &= \arg\min \| \tilde{r}_0 - A V_{m+p}\, \tilde{y} \| \\
  &= \arg\min \| \tilde{r}_0 - V_{m+p+1} \bar{H}_{m+p}\, \tilde{y} \| \\
  &= \arg\min \| V_{m+p+1} (g - \bar{H}_{m+p}\, \tilde{y}) \| \\
  &= \arg\min \| g - \bar{H}_{m+p}\, \tilde{y} \|.
\end{aligned}
\]

As with the first right-hand side, the solution of the last system is carried out by an iteration-by-iteration QR factorization of $\bar{H}$. We take the components of $\tilde{r}_0$ on each of the vectors $(v_1, \ldots, v_m)$ of the previous Krylov space to get the first m components of g, then apply the previous m Givens rotations to them to overwrite them with $Q_m g$, and continue with further iterations of GMRES.

Analyzing the structure of the matrix $\bar{H}$, it can be shown to remain upper Hessenberg except for extra terms accumulating in the mth column. Since $\bar{H}_k$ is defined as

\[ \bar{H}_k = V_{k+1}^T A V_k, \tag{7} \]

an expanded view of $\bar{H}_k$ is

\[
\bar{H} =
\begin{bmatrix}
h_{11} & \cdots & h_{1,m-1} & h_{1m} & v_1^T A w_1 & v_1^T A w_2 & \cdots \\
h_{21} & \cdots & h_{2,m-1} & h_{2m} & v_2^T A w_1 & v_2^T A w_2 & \cdots \\
\vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \\
0 & & h_{m,m-1} & h_{mm} & v_m^T A w_1 & v_m^T A w_2 & \cdots \\
w_1^T A v_1 & \cdots & w_1^T A v_{m-1} & w_1^T A v_m & w_1^T A w_1 & w_1^T A w_2 & \cdots \\
w_2^T A v_1 & \cdots & w_2^T A v_{m-1} & w_2^T A v_m & w_2^T A w_1 & w_2^T A w_2 & \cdots \\
\vdots & & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix},
\tag{8}
\]

where, for clarity, the set of vectors associated with the second right-hand side, namely $v_{m+1}, v_{m+2}, \ldots$, are denoted $w_1, w_2, \ldots$. The terms below the diagonal in the mth column are generally nonzero. However, the rest of the terms in the lower left block are zero in exact arithmetic. To see this, recall that

\[ \tilde{v}_j = A v_{j-1} - \sum_{i=1}^{j-1} h_{i,j-1}\, v_i, \tag{9} \]


    N <- 0
    For Nside = 1, 2, 3, ..., N_R
        Input right-hand side b and initial iterate x_0
        r_0 <- b - A x_0
        If ||r_0|| <= tol(eps, ||x_0||, ||b||) Then x <- x_0; Break to Next Nside Endif
        N <- N + 1
        If N = 1 Then
            beta <- ||r_0||; v_1 <- r_0 / beta; g <- beta e_1; ks <- 1
        Endif
        If N > 1 Then
            For i = 1, 2, ..., m
                g_i <- r_0^T v_i; r_0 <- r_0 - g_i v_i
            Endfor
            Apply rotations G_i, i = 1, m to g
            y_m = ⌊R_m⌋^{-1} ⌊Q_m g⌋
            x_0 <- x_0 + V_m y_m; r_0 <- b - A x_0
            beta <- ||r_0||; v_{m+1} <- r_0 / beta
            For i = 1, 2, ..., m
                g_i <- r_0^T v_i
            Endfor
            g_{m+1} <- beta; g_{m+2} <- 0
            Apply rotations G_i, i = 1, m to g
            ks <- m + 1
        Endif
        For k = ks, ks + 1, ... until convergence
            w <- A v_k; w_N <- w
            For i = 1, 2, ..., k
                h_{i,k} <- v_i^T w; w <- w - h_{i,k} v_i
            Endfor
            h_{k+1,k} <- ||w||; v_{k+1} <- w / h_{k+1,k}
            km <- k - 1; If N > 1, km <- m
            Apply rotations G_i, i = 1, km to kth column of H_k
            If N > 1 Then
                h_{k,p_l} <- v_k^T w_l, l = 1, N - 1
                Apply rotations ((F_{l,i}, l = 1, N - 1) and G_i), i = m + 1, k - 1 to kth column of H_k
                Apply rotation F_{N,k} to p_l-th column of H_k (to make h_{k,p_l} = 0), and to g; l = 1, N - 1    (**)
            Endif
            Apply rotation G_k to H_k (to make h_{k+1,k} = 0) and to g    (*)
            If |g_{k+1}| <= tol(eps, ||x_0||, ||b||) Then m <- k; p_N <- k; Break to Exit Endif
        Endfor
        Exit:
        y_m <- arg min ||g - Hbar_m y||
        x_m <- x_0 + V_m y_m
        Next Nside
    Endfor

Fig. 1. GMRES algorithm for $N_R$ right-hand sides (Givens rotation matrices $G_i$ and $F_{N,i}$ are defined where first used in lines (*) and (**)).


where $\tilde{v}_j$ denotes $v_j$ before normalization, and $j = 2, 3, \ldots, m$. Premultiplying both sides of this equation by $w_l^T$, $l = 1, 2, \ldots$, and noting that $w_l$ is orthogonal to vectors $v_1$ through $v_m$ by construction, we get $w_l^T A v_{j-1} = 0$. This argument breaks down for $j > m$, since $v_{m+1}$ ($\equiv w_1$) comes from the normalization of the projection-reduced residual of the second right-hand side, not from the Krylov process. Strictly speaking, the span of the basis expanded beyond $v_m$ is no longer in general a Krylov subspace; rather, it is the sum of the original Krylov subspace and the projection of a second Krylov subspace onto the orthogonal complement of $V_m$.

The terms in the "spike" below the diagonal in column m do not involve any computations beyond dot products since, in the expression $w_i^T A v_m$, $A v_m$ is computed already and available as part of the solution process for the previous right-hand side when the mth column of the upper Hessenberg matrix was computed. After $(m+p)$ iterations of GMRES the matrix $\bar{H}$ will not be perfect upper Hessenberg but will have additional nonzero terms in its mth column below its mth row. The spike is eliminated by an extra Givens rotation applied at each iteration from $(m+1)$ onwards. The rotation to reduce the newly introduced spike term comes in addition to the rotation required at each iteration of standard GMRES to eliminate the newly introduced subdiagonal. Convergence in residual is checked as a by-product of the iterations, and after convergence all of the $(m+p)$ Krylov vectors are used to form the solution for the second right-hand side.

This approach can readily be extended to a third and to subsequent right-hand sides. For the Nth right-hand side, there will be spike terms below all the columns $p_l$, $l = 1, \ldots, N-1$, where $p_l$ is the cumulative size of the Krylov space for right-hand side l. This structure of $\bar{H}$ is illustrated below for 4 right-hand side systems. Here, the labels 1, 2, 3, 4 represent, respectively, potentially nonzero terms of the $\bar{H}$ formed during the solution process for the first, the second, the third, and the fourth right-hand sides.

\[
\bar{H} =
\left[
\begin{array}{cccccccccccccccc}
1 & \cdots & 1 & 1 & 2 & \cdots & 2 & 2 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
1 & \cdots & 1 & 1 & 2 & \cdots & 2 & 2 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
  & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
  &   & 1 & 1 & 2 & \cdots & 2 & 2 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
  &   &   & 2 & 2 & \cdots & 2 & 2 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
  &   &   & 2 & 2 & \cdots & 2 & 2 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
  &   &   & \vdots &   & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
  &   &   & 2 &   &   & 2 & 2 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
  &   &   & 3 &   &   &   & 3 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
  &   &   & 3 &   &   &   & 3 & 3 & \cdots & 3 & 3 & 4 & \cdots & 4 & 4 \\
  &   &   & \vdots &   &   &   & \vdots &   & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
  &   &   & 3 &   &   &   & 3 &   &   & 3 & 3 & 4 & \cdots & 4 & 4 \\
  &   &   & 4 &   &   &   & 4 &   &   &   & 4 & 4 & \cdots & 4 & 4 \\
  &   &   & 4 &   &   &   & 4 &   &   &   & 4 & 4 & \cdots & 4 & 4 \\
  &   &   & \vdots &   &   &   & \vdots &   &   &   & \vdots &   & \ddots & \vdots & \vdots \\
  &   &   & 4 &   &   &   & 4 &   &   &   & 4 &   &   & 4 & 4
\end{array}
\right].
\tag{10}
\]

Figure 1 contains a pseudo-code description of the procedure. In this figure, $G_i$ and $F_{N,i}$ are separate sets of rotations, the former for the subdiagonal terms, the latter for the spike terms.
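The structure described in this section can be checked numerically. The sketch below builds m Arnoldi vectors for a first right-hand side, continues with p vectors seeded by a second right-hand side, and inspects the projected matrix $V^T A V$. It assumes, for illustration only, that the new seed is the second right-hand side with its components along $V_m$ removed; the helper name `extend_basis`, the sizes, and the random data are hypothetical.

```python
import numpy as np


def extend_basis(A, V, start, stop):
    """Arnoldi continuation: fill columns start+1, ..., stop-1 of V in place,
    orthogonalizing each new vector against all earlier columns."""
    for j in range(start, stop - 1):
        w = A @ V[:, j]
        for i in range(j + 1):
            w = w - (V[:, i] @ w) * V[:, i]
        V[:, j + 1] = w / np.linalg.norm(w)


rng = np.random.default_rng(3)
n, m, p = 50, 6, 5
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b1 = rng.standard_normal(n)
b2 = b1 + 0.1 * rng.standard_normal(n)

V = np.zeros((n, m + p))
V[:, 0] = b1 / np.linalg.norm(b1)
extend_basis(A, V, 0, m)            # v_1, ..., v_m from the first right-hand side

seed = b2.copy()                    # assumed seed: b2 with its V_m components removed
for i in range(m):
    seed = seed - (V[:, i] @ seed) * V[:, i]
V[:, m] = seed / np.linalg.norm(seed)
extend_basis(A, V, m, m + p)        # w_1, ..., w_p, stored as v_{m+1}, ...

H = V.T @ A @ V
lower_left = H[m:, : m - 1]         # rows of w_l, columns 1..m-1: should vanish
spike = H[m:, m - 1]                # column m below row m: generally nonzero
print(np.abs(lower_left).max(), np.abs(spike).max())
```

The first printed value should be at rounding level while the second is not, matching the claim that only the mth column acquires entries below the subdiagonal.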

3.3. Complexity Considerations. If m GMRES iterations are required to solve the first right-hand side, the total floating point operation cost for m dense matvecs and k inner products at the kth iteration, $k = 1, 2, \ldots, m$, is approximately $mn^2 + m^2 n/2$. If the proposed approach requires p additional iterations for the second right-hand side, the cost for the second system is approximately $(p+1)n^2 + mnp + p^2 n/2 + mn$, since there are $p+1$ matvecs and the $(m+s)$th iteration requires $(m+s)$ inner products. (The remaining $mn + n^2$ is the cost of the initial projection process.) In a "break-even" analysis, it is easily seen that, for $n \gg m$, the present approach for the second right-hand side will be worth using as long as p is approximately less than or equal to $m/2$.

For several nearby right-hand sides, such as those occurring in nonlinear analysis, it is reasonable to hope that after a small number of outer iterations, GMRES will require only one or a few iterations in each of the outer loops. With this assumption, GMRES extended as above to multiple right-hand sides asymptotically costs approximately $2 N_R n^2$, where $N_R$ is the number of right-hand sides, whereas the direct LU and subsequent backsolves cost a total of approximately $n^3/3 + N_R n^2$. For $N_R = n$ right-hand side systems, both would cost $O(n^3)$ operations. Thus, the computational complexity of solving subsequent right-hand sides using the present GMRES formulation has the same asymptotic order as a direct LU factorization and backsolves when a large number of nearby right-hand systems are to be solved.

The approach shown here requires storage of all of the Krylov vectors starting from the first right-hand side. It is also required to save the matrix-vector product A times the last Krylov vector for each of the right-hand sides. This can be a significant memory overhead, up to a full orthogonal basis.
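The operation-count estimates of this section can be tabulated directly. The sketch below only evaluates the formulas quoted in the text; the function names are hypothetical. Equating $2 N_R n^2$ with $n^3/3 + N_R n^2$ gives a crossover at $N_R = n/3$, which follows from the two estimates rather than being a separate result.

```python
def first_solve_cost(n, m):
    """Approximate flops for m GMRES iterations on the first right-hand side."""
    return m * n**2 + m**2 * n / 2


def reuse_solve_cost(n, m, p):
    """Approximate flops for a second right-hand side: projection plus p further iterations."""
    return (p + 1) * n**2 + m * n * p + p**2 * n / 2 + m * n


def gmres_sequence_cost(n, n_rhs):
    """Asymptotic estimate 2 * N_R * n^2 for many nearby right-hand sides."""
    return 2 * n_rhs * n**2


def direct_sequence_cost(n, n_rhs):
    """LU factorization plus one pair of triangular solves per right-hand side."""
    return n**3 / 3 + n_rhs * n**2


n = 1032                 # problem size reported in Section 5.1.1
crossover = n / 3        # solving 2*N*n^2 = n^3/3 + N*n^2 for N gives N = n/3
for n_rhs in (10, 150, 344, 1032):
    ratio = gmres_sequence_cost(n, n_rhs) / direct_sequence_cost(n, n_rhs)
    print(n_rhs, round(ratio, 3))
```

A ratio below one means the asymptotic GMRES estimate is the smaller of the two; these are operation counts only and ignore the constant factors and memory traffic that affect measured execution times.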
Offsetting this storage cost, as more right-hand sides are handled with the same coefficient matrix, the growing Krylov subspace satisfies more approximate solutions through projection alone, resulting in smaller operation cost. The estimates in this section assume that large Krylov spaces can be accumulated without the need to reorthogonalize.

4. Boundary Element Analysis. The formulation presented above is general. The examples in Section 5 illustrating its practical use have been drawn from boundary element discretizations. A brief summary of the origins of $Ax = b$ for the examples of Section 5 is given here. The BIE approach analytically transforms a d-dimensional PDE into a $(d-1)$-dimensional boundary integral equation. The BIE is then solved numerically by expansion into a finite-dimensional subspace and collocation. Thermal conduction applications in a finite body without internal energy generation, for example, satisfy Laplace's equation $\nabla^2 T = 0$ for temperature $T(x)$. Letting $q(x)$ denote the normal heat flux at the surface of the body, $T^*(x, \xi)$ the infinite domain fundamental solution for temperature corresponding to a unit source at $\xi$, and $q^*(x, \xi)$ the associated normal heat flux evaluated at the body surface, we can write, through Green's second identity,

\[ \int_S q^* T = \int_S T^* q, \]

where S is the surface of the 3-D object under consideration. The fundamental solutions are

\[ q^* = \frac{1}{4\pi R^3}\,(x_i - \xi_i)\, N_i \quad \text{and} \quad T^* = \frac{1}{4\pi R}, \]

where $R = \|x_i - \xi_i\|$. The boundary of the object is discretized into boundary elements joined continuously at nodes and edges and the integration is performed numerically over all boundary patches. Within each element, the unknown response is assumed to vary polynomially in terms of nodal values. In the singular collocation method, the above integral equation is written by placing the source point of the fundamental solution at each of the boundary nodes, thus getting a system of n equations in n unknowns,

\[ F T = G q. \tag{11} \]

Elasticity problems are approached conceptually in the same manner except that the unknowns are multicomponent traction and displacement vector fields. The governing differential equations in this case (in the absence of body forces) are the stress equations of equilibrium

\[ \sigma_{ij,i} = 0, \]

where

\[
\sigma =
\begin{bmatrix}
\sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\
\sigma_{xy} & \sigma_{yy} & \sigma_{yz} \\
\sigma_{xz} & \sigma_{yz} & \sigma_{zz}
\end{bmatrix}
\]

is the 3-D stress tensor with six independent components. Using the stress-strain law, the strain-displacement relations, and the Cauchy stress-traction transformation relation, the stress equilibrium equations can be expressed in terms of displacements and tractions experienced by the object. We then consider the fundamental (Kelvin) solutions of displacements ($u_i^*$) and tractions ($t_i^*$) to the problem of an infinite elastic medium subjected to a point load, and use the reciprocal theorem, leading to

\[ \int_S t_i^* u_i = \int_S u_i^* t_i. \]

Upon discretizing the stressed object, we place the source point of the fundamental solution at each of the boundary nodes and evaluate the fundamental solution in each of the coordinate directions. Finally, we can again write the discretized BIE in the customary notation

\[ F u = G t, \tag{12} \]

where F and G are the boundary element coefficient matrices, and u and t are the displacement and traction vectors.

In (11), either the temperature or the flux is known at each boundary point. In (12), for each coordinate direction, either the displacement or the traction is known at each point. Specified boundary conditions are enforced via column exchanges and negations in the systems (11) or (12) to produce the ultimate linear system

\[ Ax = b, \]

where x contains the unknowns of both kinds (displacements and tractions in the case of elasticity). The coefficient matrix from a singular collocation approach is generally dense and nonsymmetric. In the language of integral equations, it is generally neither purely "first kind" nor purely "second kind". The hypersingular Galerkin approach, growing in popularity, leads to dense symmetric systems amenable to analysis by conjugate gradients.

4.1. Nonlinear Analysis. As an important class of applications for multiple nearby systems, we consider nonlinear problems, which are inevitably solved by iteration over a set of linear problems. For example, in nonlinear thermal BEA, one has to solve a system of equations (11) and, after the column exchanges are made so that all the unknowns are on one side of the equations to be solved, some elements of the right-hand side are functions of the unknowns. This happens, for instance, in a problem where thermal conductivity is a function of temperature or in which certain locations have convection or radiation boundary conditions. In such cases a suitable guess is made for these unknowns, the right-hand side vector is formed, and the result is iterated until nonlinear convergence. The thesis [24] shows two approaches, one with only the right-hand side changing and the other with both the right-hand side and the coefficient matrix changing in each iteration. In the former case, the LU factorization of A is done once and reused in each of the iterations, and in the latter case, the LU factorization of A is computed in each iteration. In the former case, there are many inexpensive nonlinear iterations, and in the latter case, there are a few expensive iterations. As shown below, Krylov iterative methods can effectively compete with direct methods. In the former case, the formulation of this paper for multiple right-hand sides may be utilized directly (as in Section 5.2.2 below), whereas in the latter, a preconditioner formed in the first iteration may be utilized in subsequent iterations.

4.2. Design Sensitivity Analysis. The goal of DSA is to compute the rates of change of an object's response with respect to changes in the parameters (design variables) that control its shape. The direct approach to DSA is an implicit differentiation of the governing Boundary Integral Equation with respect to the design variable. If u and t represent the generic unknown response quantities and specified boundary conditions, respectively, then differentiation of (12) with respect to the design variable L gives

\[ F u_{,L} = G_{,L}\, t - F_{,L}\, u, \tag{13} \]

where $u_{,L}$ represents sensitivity of response with respect to a geometric parameter controlling the shape of the object. If we are changing only the shape of the object and not the boundary conditions, $t_{,L}$ is identically zero and we are interested in $u_{,L}$. As seen from these equations and discussed in [10, 11], the implicit differentiation approach does not require the factorization of a perturbed system matrix but only the computation of a new right-hand side vector and a back solve using the available factorization of the original coefficient matrix. However, this approach requires computation of derivatives of boundary element coefficient matrices. For sophisticated element technologies [1, 2], this task requires the computation of rates of change of element geometry at integration points, which in turn requires the computation of sensitivities of a substantial set of information associated with the surface geometry of the model. The differentiated BIE is also computationally more complex.

A second, semianalytical approach [21] can be used to determine the matrix sensitivities of (13), where a finite difference is performed between the matrix coefficients of the original and perturbed models and then forward and back substitution is performed. However, the semianalytical approach is observed to be sensitive to the finite difference step size. Implicit differentiation, whether performed using analytically formulated matrix derivatives or using a finite difference approach, produces a right-hand side vector that is not necessarily "nearby" to the original right-hand side for the baseline response. Thus, neither the semianalytical nor the implicit approach is of direct interest in this paper.

A third technique, presented in [9, 15], is more suitable. The technique involves performing two analyses, one for a "baseline" model and the other for a slightly perturbed "nearby" model, the latter being computationally economical, and then performing a finite difference between the two responses. The term reanalysis refers to performing an analysis of a perturbed model. There are efficient reanalysis techniques for both a direct first analysis [8] and an iterative first analysis [15]. When the DSA problem is posed as an analysis and a reanalysis combined with a finite difference calculation, the two linear systems are nearby and an iterative solver such as GMRES can be utilized to advantage. These nearby systems have the characteristic that both A and b change in the system $Ax = b$.

4.3. DSA using GMRES. By continuity, if the shape of the domain in a Boundary Element Analysis (BEA) is perturbed slightly, the response of the baseline model should be a good initial guess for the solution vector of a subsequent analysis. Similarly, the preconditioner and its factors computed during the baseline analysis may be reusable. This is especially effective when the cost of formation and factorization of the preconditioner is significant, as in the case of dense block-diagonal preconditioners as employed for multi-zone BEA models. The exploitation of the two economization features, namely the reuse of the baseline converged solution vector and the baseline preconditioner, has been shown to be useful in [15]. On the other hand, this paper exploits by-products of the Krylov process more fundamentally to accelerate the convergence of the reanalysis process. Assuming that the set of equations given by (1) has been solved using GMRES, it is now required to solve

\[ \tilde{A}\tilde{x} = \tilde{b}, \tag{14} \]

where $\tilde{A}$ and $\tilde{b}$ are "near" A and b, respectively. The information obtained in the solution of (1) can be exploited by solving the modified problem as a projection on the Krylov subspace spanned by V corresponding to the original problem. Specifically, if $\tilde{A} = A + \Delta A$, then we can write

\[ H \tilde{y} = p - V^T \Delta A\, V \tilde{y}, \tag{15} \]

where

\[ p = V^T \tilde{r}_0, \tag{16} \]

and

\[ \tilde{r}_0 = \tilde{b} - A\tilde{x}_0, \tag{17} \]

$\tilde{x}_0$ being an initial guess to the modified problem. The solution of the modified problem is given by

\[ \tilde{x} = \tilde{x}_0 + V \tilde{y}. \tag{18} \]

By using the QR factorization of H formed during the solution of the baseline model, (15) can be solved by a sequence of Givens rotations and a back solve as

\[ R\, \tilde{y}^{(k+1)} = Q\left(p - V^T \Delta A\, V \tilde{y}^{(k)}\right). \tag{19} \]

If $\Delta A = 0$, i.e., only b changes, this equation can be solved in one outer iteration, and this case was treated in detail in the previous sections. It is important to realize that this approach gives an acceptably accurate solution to the perturbed model only when the solution of the perturbed model almost lies in the Krylov subspace of the unperturbed model. Otherwise it should be thought of as an approach for determining a good initial iterate for the perturbed model, and GMRES must be restarted. After the baseline analysis and the reanalysis are performed in this way, design sensitivities can be computed using a finite difference method. It is straightforward to modify the above approach when a preconditioner is used, assuming that the preconditioner of the baseline model is used for the perturbed model also. It can be seen that the projection process shown above takes about $n^2 + 2nk$ operations, where n is the number of equations and k is the number of iterations taken by GMRES for the convergence of the baseline set of equations.

5. Numerical Examples. We present some examples representing the thermal and elastic phenomena discussed in Section 4. In each example described below, a relative residual convergence tolerance of $10^{-7}$ has been used as the GMRES termination condition. The execution times are measured on a Sparcstation SLC under optimized f77 in double precision. The discretizations are all performed by a singular collocation boundary element technique, and a right diagonal preconditioner is employed in all cases except as otherwise stated.
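Before turning to the examples, the projected reanalysis iteration (19) of Section 4.3 can be sketched as follows. This is a simplified illustration under several assumptions: the projected operator is taken as the square matrix $V^T A V$ and each outer step is solved with `np.linalg.solve` instead of the stored Givens rotations and back solve; a zero initial guess is used so that the initial residual is simply the new right-hand side; and the names `krylov_basis` and `projected_reanalysis`, the perturbation size, and the test data are hypothetical.

```python
import numpy as np


def krylov_basis(A, r0, m):
    """Orthonormal basis V (n x m) of the Krylov subspace K_m(A, r0) via Arnoldi."""
    n = r0.shape[0]
    V = np.zeros((n, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m - 1):
        w = A @ V[:, j]
        for i in range(j + 1):
            w = w - (V[:, i] @ w) * V[:, i]
        V[:, j + 1] = w / np.linalg.norm(w)
    return V


def projected_reanalysis(A, dA, V, b_new, max_outer=50, tol=1e-12):
    """Iterate H y_{k+1} = p - V^T dA V y_k, starting from y = 0 and x0 = 0."""
    H = V.T @ (A @ V)   # projected baseline operator (square here, an assumption)
    p = V.T @ b_new     # with x0 = 0 the initial residual is b_new itself
    y = np.zeros(V.shape[1])
    for outer in range(1, max_outer + 1):
        correction = V.T @ (dA @ (V @ y))  # dA enters only through products
        y_next = np.linalg.solve(H, p - correction)
        converged = np.linalg.norm(y_next - y) <= tol * max(1.0, np.linalg.norm(y_next))
        y = y_next
        if converged:
            break
    return V @ y, y, outer


rng = np.random.default_rng(4)
n, m = 40, 10
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
dA = 0.05 * rng.standard_normal((n, n)) / np.sqrt(n)   # small matrix perturbation
b_new = b + 0.01 * rng.standard_normal(n)

V = krylov_basis(A, b, m)
x_tilde, y, outer = projected_reanalysis(A, dA, V, b_new)
y_direct = np.linalg.solve(V.T @ ((A + dA) @ V), V.T @ b_new)
print(outer, np.linalg.norm(y - y_direct))
```

When the perturbation is small relative to the projected operator, the outer iteration contracts quickly toward the solution of the projected perturbed system; the first outer step, taken with $y = 0$, coincides with the case in which only the right-hand side changes. As the text notes, the result is only as good as the stored subspace, so it may serve as an initial iterate rather than a final solution.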

5.1. Multiple Right-Hand Sides. 5.1.1. Rectangular Bar - Elastoplasticity Problem. A nonlinear example in elastoplasticity was solved using the approach in Section 3. A rectangular bar of aspect ratio 8 : 3 : 3 was modeled using 114 quadrilateral quadratic 8-noded boundary elements giving 344 nodes, at each of which three degrees of freedom are defined. The bar was held on rollers along one face and loaded uniformly along the other face in one of the directions; see Figure 2 (from [24]). The loading was such as to cause complete plastic strain hardening in the entire model. See [24], where this problem is solved by a load increment and iteration method, for details. Each of the load increments requires solving iteratively with only the right-hand side changing in each iteration. The results for a single load increment are presented here. This dense problem of size 1032 was run for the first load increment through 150 nonlinear iterations, which brought the nonlinear residual norm down by over two orders of magnitude, from 58.5 to 0.365. The third of these 150 right-hand sides was sufficiently different from the first two that it was expedient to solve this right-hand side afresh, that is to say, to throw away the accumulated Krylov space and restart. The iteration history of GMRES was

$18,\ 1,\ 31,\ \underbrace{1,\ 1,\ \ldots,\ 1,\ 1}_{147\ \text{times}}$

and the execution times are shown in Table 1. Altogether 49 fresh iterations, 148 restarted iterations, and 148 accompanying projections were required. This means that 345 matrix accesses, each of $O(n^2)$ complexity, were performed. The maximum relative componentwise error of the final solution (which consists of both displacements and derivative tractions) with respect to the direct solution was less than 0.005. Greater accuracy would be wasted, since at this stage a new nonlinear load step is computed, and several such load steps may be required in practice. The convergence and complexity results of this section are displayed in Figure 3. The figure compares a complexity of $n^3/3 + kn^2$ for the direct method and an asymptotic complexity of $2kn^2$ for GMRES, where $k$ is the total number of right-hand

Fig. 2. Rectangular Bar - Elastoplasticity Problem

[Figure 3 plots: (a) Number of Iterations versus Right-hand side number; (b) Execution Time, seconds, versus Right-hand side number]

Fig. 3. (a) Iteration History of GMRES and (b) Comparison of Execution times with Direct and GMRES methods, for the Elastoplasticity Problem

Table 1

Iterations and Execution Time (seconds) for the Elastoplasticity Problem

Method   Outer Iter.   Inner Iter.   Fact. Time   Soln. Time   Total Time
Direct   150           -             1062.8       466.4        1529.2
GMRES    150           345           0.0          967.1        967.1


Fig. 4. Pressurized Cylinder - DSA

side systems to be solved, most of which require only a projection and an additional iteration. The direct method complexity begins at a large offset and is afterwards strictly linear with respect to the number of systems, whereas GMRES begins with a much smaller offset and displays asymptotically linear behavior with slightly more than twice the slope. For this problem, the two approaches seem to intersect at approximately $n/4$ right-hand sides. It is important to note, however, that after $n$ systems, in the presence of exact arithmetic, the Krylov space spans the solution space, and a simple projection will be sufficient. After this point, the two curves will run parallel to each other. As noted above, the Krylov space was flushed for the third right-hand side in the sequence of 150. If a fresh start is not made for the third right-hand side, and the algorithm of Figure 1 is used for NR = 150 instead of NR = 2 followed by NR = 148, the performance is inferior, requiring a total of 445 iterations, each of $O(n^2)$ complexity, with a total time of 1341.8 s. There is also a loss in accuracy of the final solution, in spite of the apparently good convergence in the residual norm estimator $g_k$. This loss of accuracy correlates with loss of orthogonality of the Krylov vectors, a phenomenon that awaits further investigation.
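The location of the crossover between the two cost curves can be related to the cost models $n^3/3 + kn^2$ and $2kn^2$ quoted for Figure 3. The following is a back-of-the-envelope sketch rather than a result from the experiments: $\sigma$ is a hypothetical symbol, introduced only here, for the ratio of the GMRES slope to the direct-method slope, $k^\ast$ denotes the break-even number of right-hand sides, and lower-order terms are ignored.

```latex
\begin{align*}
\frac{n^3}{3} + k n^2 = \sigma\, k n^2
  \quad &\Longrightarrow \quad
  k^\ast = \frac{n}{3(\sigma - 1)}, \\
\sigma = 2 \;:\quad k^\ast = \frac{n}{3},
  \qquad &\qquad
\sigma = \tfrac{7}{3} \;:\quad k^\ast = \frac{n}{4}.
\end{align*}
```

Under these assumptions, a GMRES slope exactly twice that of the direct backsolves would place the crossover at $n/3$, and a slope somewhat more than twice moves it earlier, toward the roughly $n/4$ reported for this problem.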

5.2. DSA Examples. 5.2.1. Pressurized Cylinder. This example demonstrates the application of GMRES to solve a nearby problem via a projection and restart process and to compute the sensitivities of unknown displacements and tractions with respect to a geometrical design variable through a finite difference scheme. In this example, $\Delta A$ is nonzero. The physical problem is a hollow circular cylinder subjected to internal pressure. Due to symmetry, only one quarter of this cylinder of 1:3 inner to outer radius ratio was modeled, using 6-noded quadratic triangular boundary elements. Faces x = 0 and y = 0 of the cylinder quadrant are under rollers, the inner circular face is subjected to a uniform pressure, and the outer circular face is free of traction. Figure 4 (from [11]) shows the single-zone boundary element model consisting of 462 degrees of freedom, and the geometric sensitivities required to build the nearby model. A two-zone discretization was also constructed from the same physical model by creating an artificial concentric interface at the junction of the first and second rings of elements. For the single-zone model, the design variable was the inner radius, whereas for the two-zone model it was the outer radius. The results for this example are shown in Table 2. In each case, the perturbation is 0.00025 of the inner radius.

Table 2

GMRES Iteration Counts for Pressurized Cylinder - DSA

Model    Preconditioner         Baseline   Reanalysis
1-Zone   Diagonal               26         10
2-Zone   Diagonal               48         13
2-Zone   Zonal-Block Diagonal   16         8

5.2.2. Sphere with Convection Boundary Conditions. As a third example, we consider a conducting sphere, one octant being modeled due to symmetry, subjected to a convective boundary condition across its outer radius. The convection coefficient is given by $h(T) = 5 + 0.025\,T + 0.00005\,T^2$, where T is supplied in dimensional form between the 276 6-noded quadratic triangular boundary elements with 554 nodes. The ambient exterior temperature was 200 and the inner radius was maintained at 800. The model is shown in Figure 5 (from [24]). With both $A$ and $b$ changing, the number of nonlinear iterations was 8.
In this case, using a direct solver, $A$ was factored in every iteration, and using GMRES, each outer nonlinear iteration after the second one made use of an initial guess equal to the converged solution of the previous iteration. The direct method took a total of 383.2 s for factorization and 4.6 s for all backsolves. GMRES without the use of a projected initial guess required 19, 20, 20, 20, 20, 20, 20, 20 iterations, respectively, in the eight outer nonlinear iterations and a total time of 63.5 s, a savings of a factor of 6. GMRES with the use of a projected initial guess and restarts required 19, 21, 22, 18, 14, 10, 6, 3 iterations in the eight respective nonlinear iterations and a total time of 46.2 s, an additional savings factor of about 1.4.

6. Practical Limitations. The approach of Section 3 can be defined for any sequential set of right-hand sides even if it is not economical. In practice, it has been observed that convergence may be slow (slower even than making a cold start with no reuse of the previous Krylov space) if the right-hand sides are very different from each other. Our experience, and that of [4, 5] in the symmetric positive definite case, also indicates that reuse of Krylov subspaces is not particularly effective when a subsequent right-hand side is "unrelated" to the previous one, in the sense that a large component of its initial residual lies outside of the original Krylov space. For instance, a Krylov space generated to solve an elasticity problem with a compressive load is of

Fig. 5. Sphere - Convection boundary conditions

little value in reducing the solution cost of the same problem subject to a shear load. It is often the case, in contexts in which two or more very disparate load responses are required, that they will be available a priori, in the form of a catalog of modes. In this case, a generalization of block Lanczos would be more appropriate than the present generalization of modified Lanczos. A single block iteration may still be superior to independent serial iterations in this instance, due (at least) to amortization of computational and data access overheads, and (at best) to convergence synergism [16]. However, it is argued in [25] that when a design process is proceeding efficiently, modes sequentially generated by the outer design loop will, in fact, be substantially independent. We would like to be able to decide a priori when it is required to throw away the accumulated Krylov vectors and do a cold start. If we attempt to reuse an irrelevant Krylov space, and stop based on the by-product residual, there may be large pointwise errors in the solution. This may be due to accumulated round-off errors and the observed loss of orthogonality in the large Krylov spaces. It may be possible to develop a criterion for discarding the previous Krylov space based on the norms of the residuals of the new system before and after the initial projection process.

7. Conclusions. A formulation of the Generalized Minimal Residual method for the solution of sequentially multiple nonsymmetric linear systems has been presented, and demonstrated in examples involving dense matrices to be more effective than the same multiple systems solved independently with GMRES. The formulation extends earlier work on the Lanczos method by reusing Krylov subspaces. In the limit of large numbers of right-hand sides it is asymptotically no less effective than backsolves with direct factorizations.
An approach is also presented for economical evaluation of design sensitivities using finite differences, with GMRES as the reanalysis engine. It is illustrated, for perhaps the first time, that Krylov solvers can be competitive with direct methods in the design sensitivity analysis domain of boundary element methods. Before the present formulation can be routinely recommended, further theoretical

or heuristic research on practical measures of the "distance" between multiple systems of equations is required. However, in light of recent experience, Krylov methods should be reexamined for many design or transient applications for which they have previously been assumed noncompetitive.

Acknowledgment. The authors are grateful to Dr. Hua Wang for the use of her boundary element code for the elastoplasticity example.

REFERENCES

[1] M. S. Casale and J. E. Bobrow, The Analysis of Solids Without Mesh Generation using Trimmed Patch Boundary Elements, Engineering with Computers, 5 (1989), pp. 249-257.
[2] M. S. Casale and J. E. Bobrow, Efficient Analysis of Complex Solids using Adaptive Trimmed Patch Boundary Elements, in Proceedings of the ISBEM90, Springer Verlag, NY, 1991.
[3] A. Edelman, Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence, Int. J. Supercomput. Applics., 7 (1993), pp. 113-128.
[4] C. Farhat, L. Crivelli, and F. X. Roux, Extending Substructure Based Iterative Solvers to Multiple Load and Repeated Analyses, Technical Report, Center for Space Structures and Controls, (1993).
[5] P. F. Fischer, Projection Techniques for Iterative Solution of Ax = b with Successive Right-hand Sides, ICASE TR 93-90, also SIAM J. Sci. Computing, (1994, submitted).
[6] R. W. Freund, G. H. Golub, and N. M. Nachtigal, Iterative Solution of Linear Systems, Acta Numerica, (1991), pp. 57-100.
[7] J. H. Kane, D. E. Keyes, and K. G. Prasad, Iterative Equation Solution Techniques in Boundary Element Analysis, Int. J. of Numer. Meths. Eng., 31 (1991), pp. 1511-1536.
[8] J. H. Kane, B. L. K. Kumar, and R. H. Gallagher, Boundary Element Iterative Reanalysis for Continuum Structures, ASCE J. Eng. Mech., 116 (1990), pp. 2293-2309.
[9] J. H. Kane and K. G. Prasad, Boundary Formulations for Sensitivity Analysis Without Matrix Derivatives, AIAA Journal, 31 (1993), pp. 1731-1734.
[10] J. H. Kane and S. Saigal, Design Sensitivity of Solids using BEM, ASCE J. Eng. Mech., 114 (1988), pp. 1703-1722.
[11] J. H. Kane, G. Zhao, H. Wang, and K. G. Prasad, Boundary Formulations for Three Dimensional Continuum Structural Shape Sensitivity Analysis, ASME J. Appl. Mech., 59 (1992), pp. 827-834.
[12] F. T. Korsmeyer, D. K. P. Yue, K. Nabors, and J. White, Multipole-Accelerated Preconditioned Iterative Methods for Three-Dimensional Potential Problems, in Proceedings of the Fifteenth Boundary Element International Conference, Worcester Polytechnic Institute, 1993, Elsevier, London.
[13] D. O'Leary, The Block Conjugate Gradient Algorithm and Related Methods, Lin. Alg. Applics., 29 (1980), pp. 243-322.
[14] B. N. Parlett, A New Look at the Lanczos Algorithm for Solving Symmetric Systems of Linear Equations, Lin. Alg. Applics., 29 (1980), pp. 323-346.
[15] K. G. Prasad and J. H. Kane, Shape Reanalysis and Sensitivities Utilizing Preconditioned Iterative Boundary Solvers, Structural Optimization, 4 (1992), pp. 224-235.
[16] Y. Saad, On the Lanczos Method for Solving Symmetric Linear Systems with Several Right Hand Sides, Math. Comp., 178 (1987), pp. 651-662.
[17] Y. Saad and A. Sameh, A Parallel Block Stiefel Method for Solving Positive Definite Systems, in Proceedings of Elliptic Problem Solver Conference, Los Alamos Scientific Laboratory, 1981, Academic Press, NY, pp. 405-412.
[18] Y. Saad and A. Sameh, Iterative Methods for the Solution of Elliptic Differential Equations on Multiprocessors, in Proceedings of CONPAR 81 Conference, Wolfgang Handler, ed., 1981, Springer Verlag, NY, pp. 395-411.
[19] Y. Saad and M. H. Schultz, Conjugate Gradient-like Algorithms for Solving Nonsymmetric Linear Systems, Math. Comp., 44 (1985), pp. 417-424.
[20] Y. Saad and M. H. Schultz, GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Computing, 7 (1986), pp. 856-869.
[21] S. Saigal, J. H. Kane, and R. Aithal, Semi-analytical Structural Sensitivity Formulation using Boundary Elements, AIAA Journal, 27 (1989), pp. 1615-1621.
[22] H. A. Van der Vorst, An Iterative Solution Method for Solving f(A)x = b using Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix A, J. Comp. Appl. Math., 18 (1987), pp. 249-263.
[23] S. A. Vavasis, Preconditioning for Boundary Integral Equations, SIAM J. Matrix Applics., 13 (1992), pp. 905-925.
[24] H. Wang, Boundary Formulations for Nonlinear Design Sensitivity Analysis, Ph.D. Thesis, Mechanical and Aeronautical Department, Clarkson University, NY, (1992).
[25] D. P. Young, W. P. Huffman, M. B. Bieterman, R. G. Melvin, F. T. Johnson, C. L. Hilmes, and A. R. Dusto, Issues in Design Optimization Methodology, Boeing Computer Services Tech. Rpt., BCSTECH-94-007 (1994).