
Fast Iterative Solution of Integral Equations With Method of Moments and Matrix Decomposition Algorithm – Singular Value Decomposition Juan M. Rius, Member, IEEE, Josep Parrón, Alexander Heldring, José M. Tamayo, and Eduard Ubeda

Abstract—The multilevel matrix decomposition algorithm (MLMDA) was originally developed by Michielssen and Boag for 2-D TMz scattering problems and later implemented in 3-D by Rius et al. The 3-D MLMDA was particularly efficient and accurate for piece-wise planar objects such as printed antennas. However, for arbitrary 3-D problems it was not as efficient as the multilevel fast multipole algorithm (MLFMA), and the matrix compression error was too large for practical applications. This paper introduces some improvements in the 3-D MLMDA, namely a new placement of the equivalent functions and an SVD postcompression step. The first is crucial to obtain a matrix compression error that converges to zero as the compressed matrix size increases. As a result, the new MDA-SVD algorithm is comparable with the MLFMA and the adaptive cross approximation (ACA) in terms of computation time and memory requirements. Remarkably, in high-accuracy computations the MDA-SVD approach obtains a matrix compression error one order of magnitude smaller than ACA or MLFMA in less computation time. Like the ACA, the MDA-SVD algorithm can be implemented on top of an existing MoM code with most of the commonly used Green's functions, but the MDA-SVD is much more efficient in the analysis of planar or piece-wise planar objects, like printed antennas.

Index Terms—Fast integral equation methods, method of moments (MoM), multilayer Green's function, printed antennas.

I. INTRODUCTION

IN RECENT years, a wide range of fast methods [1] have been developed to accelerate the iterative solution of the electromagnetic integral equations [2] discretized by the method of moments (MoM) [3]. Most of them are based on multilevel subdomain decomposition and require a computational cost of order $O(N \log N)$ or $O(N \log^2 N)$. One of these methods is the multilevel fast multipole algorithm (MLFMA) [4]. The MLFMA has been widely used in recent years to solve very large electromagnetic problems [5], [6] due to its excellent computational efficiency.

Manuscript received June 29, 2007; revised November 19, 2007. Published August 6, 2008 (projected). This work was supported in part by the Spanish "Comisión Interministerial de Ciencia y Tecnología" (CICYT) through the "Ramón y Cajal" Programme, by projects TEC2006-13248-C04-02/TCM, TEC2006-13248-C04-01/TCM, and TEC2007-66698-C04-01/TCM, and by the "Ministerio de Educación y Ciencia" through the FPU fellowship program. J. M. Rius, A. Heldring, J. M. Tamayo, and E. Ubeda are with the Antenna Lab, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona 08034, Spain (e-mail: [email protected]). J. Parrón is with the Antenna and Microwave Systems Group, Department of Telecommunication and System Engineering, Universitat Autònoma de Barcelona, Barcelona 08007, Spain (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TAP.2008.926762

Another fast method is the multilevel matrix decomposition algorithm (MLMDA) [7]. When the MLMDA was published in 1996, we found it simple, easy to implement and with the key advantage that it can be programmed on top of any MoM discretization with subdomain basis functions much smaller than the wavelength and with most of the commonly used Green's functions. For that reason, we applied the MLMDA to 3-D problems and found that it was particularly efficient and accurate for piece-wise planar objects [8]. The application of MLMDA to the analysis of printed antennas [9] led to excellent results: for example, a 16 × 16 microstrip patch array is analyzed—without using symmetries—in less than 5 min with an error of 0.1% in the induced current and 0.06% in the input impedance compared to conventional MoM. However, when [8] was published: (i) MLMDA was much less efficient for general 3-D problems than MLFMA, needing more computation time and storage memory, and (ii) the MLMDA matrix compression error was unacceptable for practical applications and, even worse, it did not converge to zero as the compressed matrix size increased.

The adaptive cross approximation (ACA) method was developed in 2000 by Bebendorf [10] and has since been widely used to solve large magnetostatic problems [11]. Recently, the ACA has also been applied to antenna radiation and RCS computation [12], [13]. In contrast with the MLFMA, the ACA algorithm is purely algebraic and, therefore, does not depend on the problem Green's function.

The aim of this paper is to present a new and improved formulation of the MLMDA for general 3-D problems in order to fix the efficiency and accuracy issues in [8]. The resulting algorithm, MDA-SVD, is comparable with the MLFMA and the ACA in terms of computation time and memory requirements.

II. FUNDAMENTALS OF FAST MULTILEVEL SOLVERS

Although we limit our attention here to the electric field integral equation (EFIE) in the frequency domain, the procedure can also be applied to the MFIE. The EFIE discretized by MoM may be expressed in matrix form as [2], [3]

$[Z]\,[J] = [E] \qquad (1)$

where $[J]$ contains the coefficients of the induced current expanded in Rao, Wilton, and Glisson (RWG) basis functions [14], $[Z]$ is the impedance matrix and $[E]$ is the discretization of the incident field.


Iterative methods—like GMRES [15]—are used to solve (1) for the induced current coefficients when the number of unknowns $N$ is very large because, for a single excitation vector, they require a much smaller computational effort than direct methods. In each iteration, the main computational burden to obtain the $k$th estimate of the induced current $[J^{(k)}]$ is the matrix-vector product $[Z]\,[J^{(k)}]$. Using direct matrix-vector multiplication, the operation count and the memory requirements for each iteration are proportional to $N^2$, which becomes prohibitive if $N$ is large.

A. Domain Decomposition

The huge operation count of the direct matrix-vector multiplication can be reduced by dividing the object into many nonoverlapping subdomains. If a pair of subdomains is considered, denoted as source and observation subdomains, the field due to the RWG basis functions at the source subdomain tested by the RWG weighting functions at the observation subdomain can be computed as

$[E_o] = [Z_{os}]\,[J_s] \qquad (2)$

where $s$ and $o$ stand for the indices of the RWG basis functions [14] in, respectively, the source and observation subdomains and $[Z_{os}]$ is a submatrix of the impedance matrix. $N_s$ and $N_o$ denote the number of original RWG functions in the source and observation boxes, respectively. If all possible pairs of subdomains are considered, the matrix-vector product $[Z]\,[J]$ in (1) may be obtained as the addition of submatrix operations of the form (2).

B. Degrees of Freedom of the Electromagnetic Field

When the source and observation subdomains are not touching each other, the field $[E_o]$ due to the source $[J_s]$ tested at the observation subdomain has a number of degrees of freedom [1], [16] much smaller than the number of elements in the vector $[E_o]$. Therefore, $[E_o]$ can be computed in much fewer than $N_o N_s$ operations. The smaller the subdomains are, or the further apart they are, the larger is the saving in the computation of $[E_o]$.

According to [7], the number of degrees of freedom of the field is the rank of the submatrix $[Z_{os}]$ in (2), and it should agree with the theory in [16]. The rank of the $[Z_{os}]$ submatrix is equal to its number of nonzero singular values. Due to numerical integration errors in the computation of $[Z_{os}]$, we consider negligible all the singular values below a given threshold.

Fig. 1 shows the (sorted) singular values of $[Z_{os}]$ corresponding to the interaction between two cubes of side length 1λ at a distance of 2λ between centers (solid line), with an average mesh size of λ/7. The largest singular value has been normalized to one. It can be noticed that the submatrix rank is much smaller than its order: in this case, only 108 singular values are larger than the threshold, while the order of the submatrix is 1152.

It is important to point out that the elements of the impedance matrix must be computed with enough accuracy. Fig. 1 also shows the singular values of the same submatrix when the source and field integrals in $[Z_{os}]$ have been computed with different quadrature rules.

Fig. 1. Normalized singular values of an impedance submatrix $[Z_{os}]$ representing the interaction between two cubes of side length 1λ at a distance of 2λ between centers. The RWG mesh size is λ/7. The set of curves shows the results corresponding to different quadrature rules in the computation of the $[Z_{os}]$ source and field integrals.

With a low-accuracy 1-point numerical integration rule (dash-dot line), 161 singular values are larger than the threshold. This number is reduced to 120 with a 4-point rule and finally to 108 with a 9- or 19-point quadrature. If for the same objects we use a finer mesh, so that the submatrix order is now equal to 4608, the number of singular values larger than the threshold is 93 for the 4-point quadrature rule and 123 for the lower-accuracy 1-point rule. We will show later in Section III-B that the quadrature rule order plays an important role in the accuracy of the MDA-SVD compressed matrix approximation.

The singular value decomposition (SVD) of $[Z_{os}]$ yields

$[Z_{os}] = [U]\,[\Sigma]\,[V]^H \qquad (3)$

where $[U]$ and $[V]$ are orthonormal matrices and $[\Sigma]$ is a diagonal matrix whose elements are the nonnegligible singular values of $[Z_{os}]$. When $[Z_{os}]$ is a rank-deficient matrix of rank $r$, with $r < \min(N_o, N_s)$, in (3) we can discard the columns of $[U]$ and $[V]$ corresponding to negligible singular values. The result is a compressed matrix representation in which the storage requirement for $[U]$, $[\Sigma]$ and $[V]$ is $r\,(N_o + N_s + 1)$ instead of the $N_o N_s$ elements of $[Z_{os}]$.
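As a purely illustrative sketch of this compression idea (not the authors' implementation, which was written in MATLAB with C routines), the following Python/NumPy snippet builds a dense interaction block between two well-separated point clusters using a hypothetical scalar kernel, truncates its SVD at a relative threshold, and reports the retained rank, the storage ratio $r(N_o+N_s+1)/(N_oN_s)$ and the error of the compressed matrix-vector product. The kernel, cluster geometry and threshold are assumptions chosen only for the example.

```python
import numpy as np

def interaction_block(obs, src, k=2 * np.pi):
    """Dense block of a hypothetical scalar kernel exp(-jkR)/R between two
    point clusters (a stand-in for an MoM impedance submatrix [Z_os])."""
    R = np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * R) / R

def truncated_svd(Z, tol=1e-4):
    """Keep only the singular values above tol times the largest one."""
    U, s, Vh = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r], s[:r], Vh[:r, :]

rng = np.random.default_rng(0)
src = rng.uniform(0.0, 1.0, (600, 3))                # source cluster, 1x1x1 box
obs = rng.uniform(0.0, 1.0, (600, 3)) + [2.0, 0, 0]  # observation cluster, 2 box sizes away

Z = interaction_block(obs, src)
U, s, Vh = truncated_svd(Z)

x = rng.standard_normal(Z.shape[1])
y_full = Z @ x                         # direct product, N_o * N_s operations
y_comp = U @ (s * (Vh @ x))            # compressed product, r * (N_o + N_s + 1) operations

print("rank kept:", s.size, "of", min(Z.shape))
print("storage ratio:", s.size * (Z.shape[0] + Z.shape[1] + 1) / Z.size)
print("matvec relative error:", np.linalg.norm(y_comp - y_full) / np.linalg.norm(y_full))
```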

C. Multilevel Subdivision of the Object

It is thus convenient to subdivide the object into a set of boxes, as large as possible, such that most of them are not touching one another. In Fig. 2 the box enclosing the object is subdivided into smaller boxes at multiple levels, in the form of an octal tree. The largest boxes not touching each other are at level 2, while the smallest boxes are at the finest level. Hence, we will introduce a recursive procedure that begins at level 2 and stops at the finest level. For a couple of nonempty source and observation boxes belonging to the same subdivision level, two cases are possible:


Fig. 2. Multilevel subdivision of the box enclosing the object.

• The boxes are touching one another or are the same (diagonal case): they are then subdivided into boxes at the next level, unless we have already reached the finest level. In that case, the interaction between touching boxes is obtained by direct submatrix-vector multiplication (2), which requires the computation of the corresponding impedance matrix terms $[Z_{os}]$.
• The boxes are not touching each other: the product (2) is then computed very efficiently by a matrix compression algorithm such as MLFMA, MLMDA, ACA, or MDA-SVD (a schematic traversal implementing these two cases is sketched below).
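The two cases above map directly onto a recursive driver over the octal tree. The following sketch is only a schematic illustration in Python (the class, function names and stopping rule are assumptions, not the paper's MATLAB/C implementation): touching boxes recurse into their children until the finest level, where the direct product (2) is used, and non-touching boxes call whatever compressed far-field operator (MDA, MDA-SVD, MLFMA, ACA, ...) is attached to the pair.

```python
import numpy as np

class Box:
    """Node of the octal tree: 'dofs' holds the indices of the RWG unknowns
    contained in the box, 'children' its nonempty sub-boxes at the next level."""
    def __init__(self, center, size, dofs, children=()):
        self.center = np.asarray(center, dtype=float)
        self.size = float(size)            # side length of the box
        self.dofs = np.asarray(dofs, dtype=int)
        self.children = list(children)

def touching(a, b):
    """Same-level boxes touch if they share a face, an edge or a corner."""
    return bool(np.all(np.abs(a.center - b.center) <= a.size * (1 + 1e-9)))

def apply_pair(obs, src, x, y, direct_block, far_field_matvec):
    """Accumulate into y the contribution of the (observation, source) box pair
    to the matrix-vector product [Z][J]."""
    if not touching(obs, src):
        # non-touching pair: compressed far-field representation
        y[obs.dofs] += far_field_matvec(obs, src, x[src.dofs])
    elif obs.children and src.children:
        # touching pair above the finest level: recurse into the children
        for o in obs.children:
            for s in src.children:
                apply_pair(o, s, x, y, direct_block, far_field_matvec)
    else:
        # touching pair at the finest level: direct submatrix-vector product (2)
        y[obs.dofs] += direct_block(obs.dofs, src.dofs) @ x[src.dofs]
```

The recursion is launched over all pairs of level-2 boxes; `direct_block` returns the MoM submatrix for the given index sets, and `far_field_matvec` stands for any of the compressed operators described in the following sections.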

III. MDA

A. The Algorithm

According to [7] and [8], the interaction between two boxes at the same level that are not touching each other can be computed efficiently in the following way. Equivalent RWG sources are defined for each box, as illustrated in Fig. 3. In general they must be located on the boundary of the box but, if all the original RWG basis functions in the box are contained in a plane, the equivalent sources may be located on the boundary of the rectangle resulting from the intersection between the box and the plane, as shown in Fig. 4. It must be noticed that in Fig. 3 the equivalent sources are grouped in clusters of three RWG: two of them are parallel to the box boundary, while the third is perpendicular. Perpendicular currents must be included so that the equivalent sources have the necessary degrees of freedom to radiate all possible field polarizations. This agrees with the equivalence theorem, in which magnetic currents tangent to the boundary radiate the perpendicular field component.

In the old MLMDA 3-D formulation [8], the perpendicular equivalent sources were missing. As a result, the matrix compression error did not converge to zero as the number of equivalent sources (and, therefore, the size of the compressed matrices) increased. In general 3-D cases the required number of equivalent sources was too large—resulting in very long computation times—and the compression error was unacceptable for practical applications. These issues have been fixed with the new formulation presented here.

Fig. 3. Equivalent RWG sources defined for a box. They are evenly distributed with respect to the vertices, edges and faces of the box, and clustered in groups of three RWG that have the current flowing in three independent directions.

Fig. 4. If all the original RWG basis functions in the box are contained in a plane, the equivalent RWG sources are needed only at the vertices and edges of the rectangle resulting from the intersection between the box and the plane.

With regard to the distribution of the equivalent sources along the box surface, the particular choice depicted in Fig. 3 has no obvious advantages over other possible choices, provided that the equivalent sources are evenly distributed.

Let us apply (2) to the original RWG basis and testing functions at the source and observation boxes, denoted by $s$ and $o$ in our notation. Similarly, let $s'$ and $o'$ refer to the indices of the equivalent RWG functions at the source and observation boxes, respectively. There are $N_s$ and $N_o$ original RWG functions in the source and observation boxes, respectively, while the number of equivalent RWG functions is $N_e$ in both of them.


In the computation of (2), the floating point operation count of the direct submatrix-vector multiplication is proportional to $N_o N_s$. Alternatively, one can find the current coefficients $[J_{s'}]$ at the equivalent RWG in the source box that produce the same field as the original sources $[J_s]$ at the equivalent RWG of the observation box (indices $o'$)

$[Z_{o's'}]\,[J_{s'}] = [Z_{o's}]\,[J_s] \;\;\Rightarrow\;\; [J_{s'}] = [Z_{o's'}]^{-1}\,[Z_{o's}]\,[J_s]. \qquad (4)$

At the original RWG in the observation box (indices $o$), the equivalent sources $[J_{s'}]$ produce the same field as the original sources

$[E_o] \approx [Z_{os'}]\,[J_{s'}] = [Z_{os'}]\,[Z_{o's'}]^{-1}\,[Z_{o's}]\,[J_s]. \qquad (5)$

The operation count is now proportional to

$N_e\,(N_s + N_e + N_o). \qquad (6)$

If the number of equivalent RWG is small, $N_e \ll N_s$ and $N_e \ll N_o$, this alternative approach is much faster than the direct submatrix-vector product (2).

Fig. 5. Relative error in the MDA-SVD computation of the electric field for two cubes of side length 1λ at a distance of 2λ between centers. The RWG mesh size is λ/7. The set of curves shows the results corresponding to different quadrature rules in the computation of the $[Z_{os}]$ source and field integrals. The SVD threshold used here is introduced in Section III-E.
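The following numerical sketch illustrates (4)–(6) with a hypothetical scalar point-source kernel and with equivalent sources placed on spheres enclosing the boxes; it is not the RWG/EFIE implementation of the paper, and the cluster sizes, kernel, equivalent-source count and regularization threshold are assumptions chosen only to show the mechanics: the far-field block is applied through three much smaller impedance blocks, and a regularized pseudo-inverse stands in for the inversion in (4).

```python
import numpy as np

def G(obs, src, k=2 * np.pi):
    """Hypothetical scalar kernel exp(-jkR)/R between two point sets,
    standing in for the MoM impedance submatrices."""
    R = np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * R) / R

def sphere_points(n, center, radius):
    """Equivalent-source locations on a sphere enclosing a box (Fibonacci grid)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1 - 2 * i / n)
    azim = np.pi * (1 + 5**0.5) * i
    p = np.stack([np.sin(polar) * np.cos(azim),
                  np.sin(polar) * np.sin(azim),
                  np.cos(polar)], axis=1)
    return np.asarray(center) + radius * p

rng = np.random.default_rng(1)
src = rng.uniform(-0.5, 0.5, (1200, 3))                  # original sources, 1x1x1 box
obs = rng.uniform(-0.5, 0.5, (1200, 3)) + [3.0, 0, 0]    # original observers, 3 boxes away
src_eq = sphere_points(200, [0.0, 0, 0], 0.95)           # equivalent sources (s')
obs_eq = sphere_points(200, [3.0, 0, 0], 0.95)           # equivalent testing points (o')

# the three small blocks appearing in (4)-(5)
Z_os_p = G(obs, src_eq)                     # [Z_os'],   N_o x N_e
Z_op_sp = G(obs_eq, src_eq)                 # [Z_o's'],  N_e x N_e
Z_op_s = G(obs_eq, src)                     # [Z_o's],   N_e x N_s
Zinv = np.linalg.pinv(Z_op_sp, rcond=1e-6)  # regularized inverse used in (4)

x = rng.standard_normal(src.shape[0])
y_direct = G(obs, src) @ x                  # direct product (2)
y_mda = Z_os_p @ (Zinv @ (Z_op_s @ x))      # equivalent-source product (4)-(5)

Ne, No, Ns = src_eq.shape[0], obs.shape[0], src.shape[0]
print("relative error:", np.linalg.norm(y_mda - y_direct) / np.linalg.norm(y_direct))
print("op-count ratio, (6) vs N_o*N_s:", Ne * (Ns + Ne + No) / (No * Ns))
```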

TABLE I. CPU TIME, STORAGE MEMORY AND RELATIVE ERROR FOR ANALYZING A 1.65λ RADIUS SPHERE DISCRETIZED INTO N = 49,000 RWG BASIS FUNCTIONS. THE "TIME" COLUMN SHOWS THE TIME REQUIRED BY MDA-SVD, MLFMA AND ACA TO CREATE THE MULTILEVEL TREE, PREPARE THE COMPRESSED MATRIX REPRESENTATION AND SOLVE THE SYSTEM OF EQUATIONS. THE "TOTAL TIME" COLUMN SHOWS THE TOTAL SOLUTION TIME, INCLUDING THE TIME TO SET UP THE PRECONDITIONER MATRIX AND COMPUTE THE INCOMPLETE LU (ILU) FACTORIZATION OF THE PRECONDITIONER [19] (5 MIN 31 S)

B. Number of Equivalent Sources

The relative error in the far-field interactions computed with MDA (5) obviously depends on the number of equivalent sources $N_e$. Fig. 5 shows the relative error in the MDA-SVD computation of the electric field for two cubes of side length 1λ at a distance of 2λ between centers. The mesh is rather coarse in order to test the performance of MDA far from the low-frequency or quasi-static regime. It can be observed that, as the number of equivalent sources increases, the error converges to a residual. This residual error is smaller for a higher integration accuracy in the computation of $[Z_{os}]$ and, for the highest-order quadrature rule, it seems to converge to zero with an increasing number of equivalent sources.

The most significant contribution to the computation of $[Z]\,[J]$ in (1) comes from small source and observation boxes that are touching one another or are the same box (near-field boxes). This contribution is computed by direct submatrix-vector products (2) (see Section II-C). Consequently, the contribution of the MDA algorithm to the overall computation result is small. We can take advantage of this fact and use a smaller number of equivalent sources in MDA, which increases the error of this algorithm but does not significantly degrade the error of the overall computation.

The minimum number of equivalent sources needed to obtain an accurate computation with (4) and (5) has been theoretically studied in [16]. In our numerical experience, a relative error in the induced current of the order of, respectively, 1% or 0.1% results from the estimate (7) if the equivalent RWG are distributed on the boundary of a box (Fig. 3), or from (8) if they are distributed on the boundary of a rectangle (Fig. 4). In (7) and (8), the parameters involved are the wavenumber, the sizes of the source and observation boxes and the distance between the centers of the boxes. Clearly, the algorithm is more efficient and accurate in the latter case (Fig. 4). In Fig. 5, the value of $N_e$ corresponding to (7) and the 1-point integration rule is marked with a solid star, while the value corresponding to the high-accuracy computation in Table IV—using a 4-point rule—is marked with a solid circle.

It must be remarked that the number of equivalent functions obtained through (7) and (8) is significantly larger than the theoretical number of degrees of freedom in [16]. Although the values of $N_e$ given by (7) and (8) already lead to an efficient MDA implementation, the size of the compressed matrices can be further reduced by adding an SVD postcompression step (see Section III-E).


TABLE II. CPU TIME, STORAGE MEMORY AND RELATIVE ERROR FOR ANALYZING A 15.9λ SQUARE PLATE DISCRETIZED INTO N = 96,840 RWG BASIS FUNCTIONS. THE ILU OF THE PRECONDITIONER NEEDED 2 MIN 26 S

TABLE III. CPU TIME, STORAGE MEMORY AND RELATIVE ERROR FOR SOLVING THE X-BAND HORN PROBLEM WITH N = 68,904 BASIS FUNCTIONS. FOUR-FOLD SYMMETRIES HAVE BEEN USED IN THE TOP ROWS, AND NO SYMMETRIES IN THE BOTTOM ONES. THE ILU OF THE PRECONDITIONER NEEDED 1 MIN 41 S USING SYMMETRIES AND 6 MIN 8 S WITHOUT SYMMETRIES

Fig. 6. Total time, including equivalent sources setup, to compute the interaction between two cubes of side length L, located at a distance of 2L between centers. The RWG mesh size is λ/10.

TABLE IV. CPU TIME, STORAGE MEMORY AND RELATIVE ERROR FOR SOLVING THE SAME X-BAND HORN PROBLEM AS IN TABLE III, NOW WITH HIGH-ACCURACY SIMULATION PARAMETERS

C. MLMDA

In the analysis of electrically large objects, boxes at the coarser levels that are not touching each other are electrically large, which from (7) results in a large number of equivalent sources and, therefore, in a large operation count for computing the interaction between two large boxes. For very large boxes, this cost can be reduced by a new multilevel decomposition of the source and observation boxes that ensures a minimum operation count by maintaining the corresponding product constant. This algorithm, which is called multilevel MDA (MLMDA) and resembles an FFT, has been described in detail in [7].

In 2-D problems, the time required to set up the equivalent sources is negligible compared to the MLMDA computation time. However, in 3-D the number of equivalent sources is larger than in 2-D and the equivalent sources are represented by 3-D basis functions (e.g., RWG) instead of point sources. This makes the computational effort needed to set up the equivalent sources in 3-D much larger than in 2-D. Given a pair of far-field boxes, each containing many basis functions and separated by a distance proportional to their size, the MDA operation count is much larger than the number of equivalent sources, while the MLMDA operation count grows only slightly faster than its number of equivalent sources, whose proportionality constant is one order of magnitude larger than in MDA. Therefore, the cost of setting up the equivalent sources in 3-D is negligible for the MDA algorithm but is not for MLMDA. This issue is addressed in the next section.

Fig. 6 shows the total time, including the equivalent-source setup, to compute the interaction between two cubes separated by a distance between centers equal to twice the cube side length. The number of basis functions in the field and source boxes at the crossover points at which MLMDA becomes more efficient than MDA-SVD or MDA is, approximately, 15 000 for MDA-SVD and 18 000 for MDA.

D. Optimum Implementation

When the size of the far-field boxes is not very large, the computational overhead of setting up the MLMDA equivalent 3-D RWG sources is significant and makes MLMDA slower than the straightforward MDA (see Fig. 6). Also, for small far-field boxes, or boxes containing only a few basis functions, the MDA algorithm may be slower than direct matrix-vector multiplication. In order to optimize the computation time, for each pair of far-field boxes at the same level we estimate the operation count of the three algorithms and use the fastest one (a schematic of this selection is sketched below). In practice, for the problem sizes of up to a few hundred thousand unknowns that we have solved here, the direct submatrix-vector product is the fastest for small boxes and MDA is the fastest for medium and large boxes, while MLMDA is always slower. According to Fig. 6, MLMDA would be faster than MDA-SVD for boxes having more than, approximately, 15 000 basis functions. Since the larger boxes are located at level-1 and there are 64 level-1 boxes, the total number of basis functions in the object would be of the order of 1 million.

We have also found heuristically that the optimum size of the finest-level boxes depends on the geometry of the problem. This optimum size only affects the computational speed and memory requirements, but not the error of the MDA computation.
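The per-pair selection described above can be written as a few lines of bookkeeping. The sketch below is only a schematic: the MDA count follows (6), the direct count is $N_o N_s$, and the MLMDA term uses an $O(n\log^2 n)$-type model with an illustrative constant; none of these constants are the values used by the authors.

```python
import math

def pick_far_field_algorithm(n_obs, n_src, n_eq, c_mlmda=10.0):
    """Return the cheapest way to apply one far-field block, using rough
    per-product operation counts: direct ~ No*Ns, MDA ~ Ne*(Ns+Ne+No) as in (6),
    and an illustrative O(n log^2 n) model for MLMDA."""
    n = max(n_obs, n_src)
    costs = {
        "direct": n_obs * n_src,
        "MDA": n_eq * (n_src + n_eq + n_obs),
        "MLMDA": c_mlmda * n * max(1.0, math.log2(n)) ** 2,
    }
    return min(costs, key=costs.get), costs

# a small pair favours the direct product, a large pair favours MDA
print(pick_far_field_algorithm(80, 80, 120)[0])      # -> direct
print(pick_far_field_algorithm(5000, 5000, 400)[0])  # -> MDA
```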


E. SVD Postcompression

As shown in Section III-B, the number of equivalent sources $N_e$ necessary to achieve a small error in the MDA is larger than the rank $r$ of the $[Z_{os}]$ submatrix. For that reason, instead of the MDA [(4) and (5)]

$[Z_{os}] \approx [Z_{os'}]\,[Z_{o's'}]^{-1}\,[Z_{o's}] \qquad (9)$

it would be desirable to use the singular value decomposition (3) (SVD) of $[Z_{os}]$

$[Z_{os}] = [U]\,[\Sigma]\,[V]^H \qquad (10)$

where $[U]$ and $[V]$ are orthonormal matrices and $[\Sigma]$ is a diagonal matrix whose elements are the nonnegligible singular values of $[Z_{os}]$. Since the number of significant singular values (the order $r$ of $[\Sigma]$) is smaller than the number of equivalent sources (the order $N_e$ of $[Z_{o's'}]$), the storage requirement for the SVD matrices is smaller than for the MDA ones

$r\,(N_o + N_s + 1) < N_e\,(N_o + N_s + N_e). \qquad (11)$

On the other hand, the cost of computing the SVD of $[Z_{os}]$ is much larger than the cost of the MDA and makes the straightforward SVD approach (10) unfeasible.

Therefore, the new approach proposed here is to postcompress with the SVD the already small matrices obtained after MDA. The result needs storage requirements as small as with the straightforward SVD (10) of $[Z_{os}]$, but with only a small computational overhead above the simple MDA (9). SVD postcompression has previously been applied to the fast multipole method (FMM) for nonoscillatory kernels [17].

After MDA (9), the SVD of the small matrix $[Z_{o's'}] = [U_1]\,[\Sigma_1]\,[V_1]^H$ gives

$[Z_{os}] \approx \big([Z_{os'}][V_1]\big)\,[\Sigma_1]^{-1}\,\big([U_1]^H[Z_{o's}]\big) \qquad (12)$

where, as in (10), the columns and rows corresponding to negligible singular values are discarded. However, in (12) the matrices that left- and right-multiply $[\Sigma_1]^{-1}$, respectively $[Z_{os'}][V_1]$ and $[U_1]^H[Z_{o's}]$, are not orthonormal, as they were in (10). As a consequence, the order of $[\Sigma_1]$ (number of significant singular values) is larger than the order of $[\Sigma]$ and we have not reached the maximum compression level yet.

The solution is to apply a QR decomposition to the matrices $[Z_{os'}][V_1]$ and $\big([U_1]^H[Z_{o's}]\big)^H$

$[Z_{os'}][V_1] = [Q_1][R_1] \qquad (13)$

$\big([U_1]^H[Z_{o's}]\big)^H = [Q_2][R_2] \qquad (14)$

where $[Q_1]$, $[Q_2]$ are orthonormal and $[R_1]$, $[R_2]$ are upper triangular matrices. After the QR decomposition, (12) becomes

$[Z_{os}] \approx [Q_1]\,[R_1][\Sigma_1]^{-1}[R_2]^H\,[Q_2]^H. \qquad (15)$

To obtain a decomposition of $[Z_{os}]$ with a diagonal matrix in the middle and orthonormal matrices at the extremes, we need a new SVD of the inner matrices

$[R_1]\,[\Sigma_1]^{-1}\,[R_2]^H = [U_2]\,[\Sigma_2]\,[V_2]^H \qquad (16)$

that gives the final result

$[Z_{os}] \approx \big([Q_1][U_2]\big)\,[\Sigma_2]\,\big([Q_2][V_2]\big)^H. \qquad (17)$

Now $[Q_1][U_2]$ and $[Q_2][V_2]$ are orthonormal and $[\Sigma_2]$ is diagonal. Since the SVD is unique, except for row or column permutations, we have obtained in a very efficient way the SVD (10) of the large $[Z_{os}]$ with

$[U] = [Q_1][U_2] \qquad (18)$

$[\Sigma] = [\Sigma_2] \qquad (19)$

$[V] = [Q_2][V_2]. \qquad (20)$
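As a purely illustrative sketch of the sequence (12)–(20) (following the reconstruction above, with NumPy names that are assumptions and not the authors' code), the function below turns the three small MDA blocks into an outer-orthonormal $[U]\,\mathrm{diag}(s)\,[V]^H$ representation: a truncated SVD of the middle block, QR factorizations of the two outer products, and a second small SVD.

```python
import numpy as np

def mda_svd_postcompress(Z_osp, Z_opsp, Z_ops, tol=1e-4):
    """Compress the MDA representation (9), built from the three small blocks
    Z_osp (N_o x N_e), Z_opsp (N_e x N_e) and Z_ops (N_e x N_s), into U, s, Vh
    with orthonormal U columns and Vh rows, following (12)-(20)."""
    # (12): truncated SVD of the small middle block; dropping its negligible
    # singular values also regularizes the inverse appearing in (9)
    U1, s1, V1h = np.linalg.svd(Z_opsp)
    r1 = int(np.sum(s1 > tol * s1[0]))
    left = Z_osp @ V1h[:r1, :].conj().T          # [Z_os'][V1], not orthonormal
    right = U1[:, :r1].conj().T @ Z_ops          # [U1]^H [Z_o's], not orthonormal
    # (13)-(14): QR decompositions of the two outer factors
    Q1, R1 = np.linalg.qr(left)
    Q2, R2 = np.linalg.qr(right.conj().T)
    # (15)-(16): SVD of the small inner matrix [R1][S1]^-1[R2]^H
    U2, s2, V2h = np.linalg.svd((R1 / s1[:r1]) @ R2.conj().T)
    r2 = int(np.sum(s2 > tol * s2[0]))
    # (17)-(20): the compressed block is U @ diag(s) @ Vh
    return Q1 @ U2[:, :r2], s2[:r2], V2h[:r2, :] @ Q2.conj().T

# usage with the three blocks of the MDA sketch in Section III-A (hypothetical names):
# U, s, Vh = mda_svd_postcompress(Z_os_p, Z_op_sp, Z_op_s)
# y = U @ (s * (Vh @ x))   # far-field contribution to the matrix-vector product
```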

IV. RESULTS

In this section, the MDA-SVD performance and results are compared to those of the MLFMA and the ACA. Since in approximate matrix compression methods there is a tradeoff between the approximation error and the computational efficiency, in all cases we show the relative error in the induced current with respect to the MoM iterative solution

$\varepsilon = \dfrac{\big\| [J] - [J]_{\mathrm{MoM}} \big\|}{\big\| [J]_{\mathrm{MoM}} \big\|} \qquad (21)$

where $\|\cdot\|$ denotes the root-mean-square norm.
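For reference, (21) amounts to the following one-line computation (a trivial helper, not from the paper's code), where J and J_ref are the compressed-solver and reference MoM current coefficient vectors:

```python
import numpy as np

def relative_rms_error(J, J_ref):
    """Relative error (21) between two current-coefficient vectors, using the
    root-mean-square norm (the 1/N factors cancel in the ratio)."""
    J, J_ref = np.asarray(J), np.asarray(J_ref)
    return float(np.linalg.norm(J - J_ref) / np.linalg.norm(J_ref))
```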

The reference solution is computed using direct matrix-vector multiplication for both near- and far-field boxes. As a result, (21) gives the error introduced by the compressed matrix representation that approximates the impedance matrix in (1). In order to compute the matrix-vector product without far-field submatrix compression, symmetries have been used to reduce the size of the linear system matrix and, when necessary, near- or far-field submatrices that did not fit in the available computer memory were stored on hard disk.

All the computations have been done on a PC workstation with an Opteron CPU at 2.2 GHz and 20 GB of RAM. The operating system is Debian Linux. The MDA-SVD, MLFMA, and ACA algorithms have been programmed in the MATLAB language, while the routine that computes the mutual impedances between basis and testing functions is coded in C.

In all cases, the object geometry has been discretized using RWG basis functions [14]. Although this is not the optimum choice for some of the test cases presented here, the aim of this section is to study the computational requirements and accuracy of the MDA-SVD for a large number of unknowns, even when the best basis function set for each test case is not used.


In most of the results shown, the MDA-SVD and ACA parameters have been adjusted to obtain an error of the same order as the MLFMA with precision parameters adequate for fast computation of far-field quantities [18] (number of multipoles and fourth-degree interpolation of the plane waves). Using these parameters, the relative error in the far field is of the order of 1%.

A. MDA-SVD Compared With MLFMA and ACA

In this subsection, MDA-SVD is compared with our implementations of ACA and MLFMA. The latter is quite efficient, since it has been able to analyze the TARA reflector antenna [18] in 2 h 42 min.

1) Sphere: The first test case has been chosen as a worst-case scenario for MDA versus the other methods. In order to avoid subdomain decomposition boxes in which the MDA equivalent functions are located on the boundary of a plane, as in Fig. 4, the object geometry has no planar surfaces.

Table I shows the CPU time, storage memory and relative error in the induced current computation for the three methods. The sphere radius is 1.65λ and the number of unknowns is N = 49,000. The MoM reference solution with direct matrix-vector multiplication has been computed using the eightfold symmetry of the sphere, while the fast methods do not use symmetries, in order to solve the largest possible linear system.

The parameters that control the tradeoff between accuracy and computational efficiency have been adjusted to obtain a far-field relative error of the order of 1%: in MLFMA, through the FMM precision parameter [18]; in ACA, through the error threshold used to stop adding columns and rows to the compressed matrices. In MDA without SVD postcompression, the number of equivalent currents given by (7) results in an induced current error of the order of 1%. To reach a 5% error after SVD postcompression, we have set an SVD drop threshold relative to the largest singular value.

The MDA-SVD method is the fastest, while MLFMA is the method that needs the least memory storage. Although one would expect ACA to be significantly faster than MDA-SVD, because it needs fewer operations to reach the same matrix compression level (no generation of equivalent functions, no computation of mutual impedances between original and equivalent basis and testing functions, no SVD and QR operations, etc.), in practice ACA is slightly slower than MDA-SVD. The main reason is that, in order to compute the interaction between a pair of source and field boxes, ACA must call the impedance submatrix computation subroutine every time a new row or column is added to the compressed matrices [10]–[12]. On the contrary, MDA-SVD calls the impedance submatrix computation subroutine only three times, see (9). This makes the use of cache memory in the computation of the impedance matrix elements much more effective for MDA-SVD than for ACA.

2) Square Plate: The second test case is in principle the most favorable to MDA: a planar square plate allows MDA to use the smaller number of equivalent functions (Fig. 4) for all pairs of far-field boxes.

Fig. 7. CPU time versus number of unknowns for the computation of a matrix-vector product using MDA-SVD and MLFMA, the latter with precision parameter P = 2.

However, MLFMA and ACA also benefit from the smaller number of degrees of freedom associated with this structure, and the three methods obtained the solution in about half the computation time of the sphere case for about twice the number of unknowns, N = 96,840 (Table II). The side length of the plate is 15.9λ. Using the same precision parameters as in the previous case—the MLFMA precision parameter, the ACA threshold and the MDA-SVD drop threshold—the error in the induced current decreased from about 6%–8% to 1%–2% for the three methods.

It can be observed in Table II that the method that takes most advantage of the planar structure is indeed the MDA-SVD. Now the MDA-SVD needs significantly less computation time than ACA (MDA-SVD needed only 10% less than ACA for the sphere) and less storage memory than MLFMA (MDA-SVD needed 60% more than MLFMA for the sphere).

B. Complexity Scaling With the Number of Unknowns

It is difficult to predict the complexity scaling with the number of unknowns for the MDA-SVD algorithm. For that reason, we have conducted a series of simulations doubling several times the size of the object in order to compare the complexity scaling of MDA-SVD with that of MLFMA, which is known to be of order $O(N \log N)$.

Fig. 7 shows the CPU time versus the number of unknowns for the computation of a single matrix-vector product using MDA-SVD and MLFMA. It is remarkable that, while the MLFMA needs the same CPU time for a given $N$ regardless of the object shape—sphere or plate—the MDA-SVD is much faster for the plate than for the sphere. It is also interesting to notice that, although the MDA-SVD is faster than the MLFMA for the values of $N$ shown in Fig. 7, the complexity scaling with the number of unknowns is slightly larger for the MDA-SVD than for the MLFMA. A least-squares linear regression in logarithmic scale of both curves predicts a crossover point for the sphere case.


Fig. 8. Induced current in the surface of the X-band horn and radiation pattern. The simulation is compared with measurements and with the aperture theory approximation.

This applies only to the approximate computation of the matrix-vector products. The complete solution also includes the multilevel tree and matrix setup time, which is much faster for MLFMA than for MDA-SVD in the case of the sphere and only slightly faster in the case of the plate.

C. Antenna Analysis Benchmarks

The antenna analysis benchmarks that we usually run in order to test the performance of our home-developed algorithms are:
• an X-band horn, as an example of perfectly conducting surfaces in 3-D;
• the 16 × 16 microstrip array defined in [20], as an example of planar multilayer structures.
Additionally, we show the performance of MDA-SVD for a challenging problem: a 12,204-patch reflectarray [21] with almost half a million unknowns.

1) X-Band Horn: An X-band pyramidal horn has been discretized into N = 68,904 RWG basis functions. The horn is fed by a rectangular monomode waveguide with an elementary dipole excitation at its center.

Table III shows the CPU time, the memory required to store the compressed matrices and the relative error in the induced current computation. MDA is compared with our implementations of ACA and MLFMA [18]. The accuracy of the MLFMA is controlled by its FMM precision parameter [18], and that of the ACA by the error threshold used to stop adding columns and rows to the compressed matrices. The simulated induced current and radiation pattern of the horn are shown in Fig. 8. Although the mesh is rather coarse, the simulation results agree well with the measurements.

Since the amount of memory available in the computer (20 GB) is much larger than the amount required by MDA, no SVD postcompression has been applied. This results in a faster matrix decomposition time, but the compressed matrices are larger and, therefore, the matrix-vector multiplication is slower. Since in this problem the number of iterations is not very large (22 with symmetries and 34 without symmetries), the overall computation time is slightly shorter for the MDA alone than for the MDA with SVD postcompression.

In this problem only the waveguide walls are planar surfaces parallel to the boundary of the domain subdivision boxes and, therefore, the MDA cannot take advantage of far-field boxes with sources located in a plane, as in Fig. 4.


Fig. 9. Induced current in the patches and feeding lines of the 16 × 16 microstrip array.

The computation time of the three algorithms is very similar. However, the MLFMA needs much less memory, at the cost of a larger error in the computed induced current.

2) X-Band Horn With High Accuracy: In order to show the performance of MDA-SVD when a high-accuracy computation is required, we have increased the precision parameters of the three matrix compression algorithms until the relative error in the induced current was smaller than 0.1% or stagnated: the number of MDA equivalent sources given by (7) has been multiplied by 8/3, the ACA threshold has been lowered and the MLFMA precision parameter [18] increased. A 4-point integration rule in the computation of the source and field integrals has been used in order to reduce the effective rank of the impedance submatrices (see Fig. 1). As shown in Table IV, ACA and MLFMA failed to reach the required accuracy. The MDA-SVD algorithm was not only much more accurate, but also faster than ACA and MLFMA.

3) Microstrip Patch Array: The 16 × 16 microstrip array defined in [20] has been analyzed with MDA-SVD and ACA at 9.42 GHz. The microstrip patches and feeding lines have been discretized with N = 58,884 RWG basis functions [14]. The array layout and the computed induced current are shown in Fig. 9. Table V shows the CPU time and memory requirements together with the computation error. The Green's function for infinite multilayer media has been used. The precision parameters used are the error threshold in ACA and the SVD drop threshold in MDA-SVD. As expected for a planar structure, the MDA-SVD is faster and needs less storage memory than the ACA.

The MDA approximation without SVD postcompression gives a relative error of 0.1% in the induced current and 0.06% in the input impedance computations. Although the SVD postcompression step increases the error to more than 5%, it is necessary if one wishes to use the smallest possible amount of memory: the storage requirement is reduced from 1192 to 676 MB at the cost of a few seconds increase in the computation time.

4) Reflectarray: To finish the antenna simulation tests, a planar multilayer structure with several layers of metalization has been analyzed. The antenna is a three-layer printed reflectarray [21] that has been proposed as a benchmark case in the "Antennas Center of Excellence" SoftLab activity [22].


TABLE V. CPU TIME, STORAGE MEMORY AND RELATIVE ERROR FOR SOLVING THE 16 × 16 MICROSTRIP ARRAY PROBLEM WITH N = 58,884 UNKNOWNS. NO SYMMETRIES HAVE BEEN USED TO SPEED UP THE COMPUTATION. THE ILU OF THE PRECONDITIONER NEEDED 1 MIN 3 S

Fig. 11. Reflectarray directivity pattern (dBi). The MDA-SVD simulation is compared with the measurements kindly supplied by the authors of [21].

TABLE VI. CPU TIME AND STORAGE MEMORY REQUIRED FOR SOLVING THE 40 × 39 THREE-LAYER MICROSTRIP REFLECTARRAY WITH N = 485,592 UNKNOWNS. THE PROBLEM HAS NO SYMMETRIES. THE ILU OF THE PRECONDITIONER NEEDED 1 H 14 MIN

Fig. 10. Reflectarray with three metalization layers [21] and a detail of the first-layer triangle mesh. Parts of this figure have been extracted from Figs. 1 and 8 of [21] (IEEE copyright).

The reflectarray consists of three layers of rectangular patch arrays separated by honeycomb and backed by a ground plane. The total number of patches is 12,204, and they have been discretized into 386,352 triangles with 485,592 RWG unknowns. The reflectarray shape is elliptical, with axes 1036 × 980 mm, and it is analyzed at 12.1 GHz. Fig. 10 shows a picture of the antenna, the three metalization layers and a detail of the triangle mesh.

The antenna has been analyzed by MDA-SVD with the infinite multilayer media Green's function. In order to reduce the number of equivalent functions in the MDA, we have observed that, independently of the number of metalization layers in a box, if the separation between the top and bottom layers is sufficiently small we do not need to place equivalent functions on each layer, but only on the top and bottom ones [23]. In the case of the reflectarray analyzed here, this means two layers of equivalent functions instead of three. Table VI shows the computational requirements.

Since the problem is too large to obtain the MoM solution—using direct matrix-vector multiplications—in a reasonable time, the simulation results have been compared with measurements (Fig. 11). The mismatch is probably not due to the matrix compression error but to modelling inaccuracies, like a too coarse mesh: each patch was meshed into 4 × 4 rectangles, and each rectangle was subdivided into two triangles. The total amount of memory needed to store the MDA-SVD matrices was 12.6 GB. A finer 5 × 5 mesh would multiply the number of unknowns by a factor of roughly 1.5, which would result in storage requirements exceeding the 20 GB of available computer memory.

V. CONCLUSION

In this paper, the 3-D implementation [8] of the MLMDA algorithm [7] has been improved with a new location of the equivalent sources on the boundary of the octal-tree boxes and with an SVD postcompression of the MDA compressed matrices. The first improvement is crucial to obtain a matrix compression error that converges to zero as the compressed matrix size increases.

It has also been found that, when computing the interaction between far-field boxes that are not very large, the computational effort needed to set up 3-D equivalent sources can make MLMDA less efficient than MDA-SVD. In order to optimize the computation time, for each pair of far-field boxes at the same level we estimate the operation count for the direct matrix-vector product, MDA-SVD and MLMDA, and use the fastest algorithm, which is usually MDA-SVD for problem sizes smaller than one million unknowns.

Compared to the well-established MLFMA [4] and to the ACA [12], recently applied to wave propagation problems, the MDA-SVD algorithm needs a computation time for analyzing 3-D problems comparable to that of ACA and MLFMA, although the MLFMA requires the least memory storage. However, the MDA-SVD is the most efficient for analyzing structures having large planar surfaces, like multilayer printed antennas.


A high accuracy test was performed increasing the precision parameters of the three matrix compression algorithms until the relative error in the induced current was smaller than 0.1% or stagnated. ACA and MLFMA failed to reach the required accuracy while the MDA-SVD algorithm was not only much more accurate, but also faster than ACA and MLFMA (Table IV). Apart from the high accuracy issue, the main advantage of MDA-SVD and ACA approaches compared with MLFMA is that they can be integrated on top of an existing MoM code independently of the Green’s function used if the impedance matrix contains rank-deficient pseudoblocks, which is the most common case. This feature makes MDA-SVD and ACA the solvers of choice for integration into antenna modelling codes.

REFERENCES
[1] W. Chew, J. Jin, C. Lu, E. Michielssen, and J. Song, "Fast solution methods in electromagnetics," IEEE Trans. Antennas Propag., vol. 45, no. 3, pp. 533–543, Mar. 1997.
[2] N. Morita, N. Kumagai, and J. Mautz, Integral Equation Methods for Electromagnetics. Boston, MA: Artech House, 1990.
[3] R. Harrington, Field Computation by Moment Methods. New York: Macmillan, 1968.
[4] J. Song, C. Lu, and W. Chew, "Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects," IEEE Trans. Antennas Propag., vol. 45, no. 10, pp. 1488–1493, Oct. 1997.
[5] S. Velamparambil and W. Chew, "Analysis and performance of a distributed memory multilevel fast multipole algorithm," IEEE Trans. Antennas Propag., vol. 53, no. 8, pp. 2719–2727, Aug. 2005.
[6] Ö. Ergul and L. Gurel, "Fast and accurate solutions of extremely large integral-equation problems discretised with tens of millions of unknowns," Electron. Lett., vol. 43, no. 9, Apr. 2007.
[7] E. Michielssen and A. Boag, "A multilevel matrix decomposition algorithm for analyzing scattering from large structures," IEEE Trans. Antennas Propag., vol. 44, no. 8, pp. 1086–1093, Aug. 1996.
[8] J. Rius, J. Parrón, E. Úbeda, and J. Mosig, "Multilevel matrix decomposition algorithm for analysis of electrically large electromagnetic problems in 3-D," Microw. Opt. Technol. Lett., vol. 22, no. 3, pp. 177–182, Aug. 1999.
[9] J. Parrón, J. Rius, and J. Mosig, "Application of the multilevel decomposition algorithm to the frequency analysis of large microstrip antenna arrays," IEEE Trans. Magn., vol. 38, no. 2, pp. 721–724, Mar. 2002.
[10] M. Bebendorf, "Approximation of boundary element matrices," Numer. Math., vol. 86, no. 4, pp. 565–589, 2000.
[11] S. Kurz, O. Rain, and S. Rjasanow, "The adaptive cross-approximation technique for the 3-D boundary-element method," IEEE Trans. Magn., vol. 38, no. 2, pp. 421–424, Mar. 2002.
[12] K. Zhao, M. Vouvakis, and J.-F. Lee, "The adaptive cross approximation algorithm for accelerated method of moment computations of EMC problems," IEEE Trans. Electromagn. Compat., vol. 47, pp. 763–773, Nov. 2005.
[13] J. Shaeffer, "LU factorization and solve of low rank electrically large MoM problems for monostatic scattering using the adaptive cross approximation for problem sizes to 1,025,101 unknowns on a PC workstation," presented at the IEEE AP-S Int. Symp., Jun. 10–15, 2007.
[14] S. Rao, D. Wilton, and A. Glisson, "Electromagnetic scattering by surfaces of arbitrary shape," IEEE Trans. Antennas Propag., vol. 30, no. 3, pp. 409–418, May 1982.
[15] Y. Saad and M. Schultz, "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems," SIAM J. Sci. Statist. Comput., vol. 7, no. 3, pp. 856–869, 1986.
[16] O. Bucci and G. Franceschetti, "On the degrees of freedom of scattered fields," IEEE Trans. Antennas Propag., vol. 37, no. 7, pp. 918–926, Jul. 1989.
[17] Z. Gimbutas and V. Rokhlin, "A generalized fast multipole method for nonoscillatory kernels," SIAM J. Sci. Comput., vol. 24, no. 3, pp. 796–817, 2002.
[18] A. Heldring, J. Rius, L. Ligthart, and A. Cardama, "Accurate numerical modeling of the TARA reflector system," IEEE Trans. Antennas Propag., vol. 52, no. 7, pp. 1758–1766, Jul. 2004.


[19] Y. Saad, Iterative Methods for Sparse Linear Systems. Boston, MA: PWS, 1996.
[20] C. Wang, F. Ling, and J. Jin, "A fast full-wave analysis of scattering and radiation from large finite arrays of microstrip antennas," IEEE Trans. Antennas Propag., vol. 46, no. 10, pp. 409–418, 1998.
[21] J. A. Encinar, L. Datashvili, J. A. Zornoza, M. Arrebola, M. Sierra-Castaner, J. L. Besada, H. Baier, and H. Legay, "Dual-polarization dual-coverage reflectarray for space applications," IEEE Trans. Antennas Propag., vol. 54, no. 10, pp. 2827–2837, Oct. 2006.
[22] Antennas Network of Excellence SoftLab [Online]. Available: http://www.antennasvce.org/Community/SoftLAB
[23] J. Parrón, G. Junkin, and J. Rius, "Improving the performance of the multilevel matrix decomposition algorithm for 2.5-D structures. Application to metamaterials," in Proc. Antennas Propag. Soc. Int. Symp., Jul. 9–14, 2006, pp. 2941–2944.

Juan M. Rius received the “Ingeniero de Telecomunicación” degree in 1987 and the “Doctor Ingeniero” degree in 1991, both from the Universitat Politecnica de Catalunya (UPC), Barcelona, Spain. In 1985, he joined the Electromagnetic and Photonic Engineering Group, Department of Signal Theory and Telecommunications (TSC), UPC, where he currently holds a position of “Catedrático” (equivalent to Full Professor). From 1985 to 1988, he developed a new inverse scattering algorithm for microwave tomography in cylindrical geometry systems. Since 1989, he has been engaged in the research for new and efficient methods for numerical computation of electromagnetic scattering and radiation. He is the developer of the graphical electromagnetic computation (GRECO) approach for high-frequency RCS computation, the integral equation formulation of the measured equation of invariance (IE-MEI), and the multilevel matrix decomposition algorithm (MLMDA) in 3–D. Current interests are the numerical simulation of electrically large antennas and scatterers. He has held positions of Visiting Professor at EPFL, Lausanne, from May 1, 1996 to October 31, 1996; Visiting Fellow at City University of Hong Kong from January 3, 1997 to February 4, 1997; “CLUSTER chair” at EPFL from December 1, 1997 to January 31, 1998; and Visiting Professor at EPFL from April 1, 2001 to June 30, 2001. He has more than 46 papers published or accepted in refereed international journals (24 in IEEE TRANSACTIONS) and more than 100 in international conference proceedings.

Josep Parrón was born in Sabadell, Spain, in 1970. He received the Telecommunication Engineer degree and the Doctor Engineer degree from the Universitat Politécnica de Catalunya (UPC), Spain, in 1994 and 2001, respectively. From 2000 to 2002, he was with the Electromagnetic and Photonic Engineering group of the Signal Theory and Communication Department, UPC, as an Assistant Professor. Since 2002, he has been an Associate Professor in the Telecommunication and Systems Engineering Department, Universitat Autónoma de Barcelona (UAB), Spain. His research interests include efficient methods for numerical computation of electromagnetic scattering and antenna radiation. He is the author or coauthor of more than 50 technical journal articles and conference papers.

Alex Heldring was born in Amsterdam, The Netherlands, on December 12, 1966. He received the M.S. degree in applied physics and the Ph.D. degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands, in 1993 and 2002, respectively. He is presently an Assistant Professor with the Telecommunications Department, Universitat Politecnica de Catalunya, Barcelona, Spain. His research interests include integral equation methods for electromagnetic problems and wire antenna analysis.


José M. Tamayo was born in Barcelona, Spain, on October 23, 1982. He received the degree in mathematics and the degree in telecommunications engineering from the Universitat Politécnica de Catalunya (UPC), Barcelona, both in 2006. In 2004, he joined the Telecommunications Department, Universitat Politecnica de Catalunya (UPC), Barcelona, where he is pursuing the Ph.D. degree in signal and communication theory. His current research interests include accelerated numerical methods for solving electromagnetic problems and parallelization.

Eduard Ubeda was born in Barcelona, Spain, in 1971. He received the Telecommunication Engineer degree and the Doctor Ingeniero degree from the Politechnic University of Catalonia (UPC), Barcelona, in 1995 and 2001, respectively. In 1996, he was with the Joint Research Center of the European Commission, Ispra, Italy. From 1997 to 2000, he was a Research Assistant with the Electromagnetic and Photonic Engineering group, UPC. From 2001 to 2002, he was a Visiting Scholar with the Electromagnetic Communication Laboratory, Electrical Engineering Department, Pennsylvania State University. Since 2003, he has been with the Politechnic University of Catalonia (UPC). He is the author of 13 papers in international journals and 30 in international conference proceedings. His main research interests are the numerical computation of scattering and radiation using integral equations.