A Stable Unstructured Finite Volume Method for Parallel Large-Scale Viscoelastic Fluid Flow Calculations

Mehmet SAHIN †
Faculty of Aeronautics and Astronautics, Astronautical Engineering Department, Istanbul Technical University, Maslak, Istanbul, 34469, TURKEY

Abstract

A new stable unstructured finite volume method is presented for parallel large-scale simulation of viscoelastic fluid flows. The numerical method is based on the side-centered finite volume method where the velocity vector components are defined at the mid-point of each cell face, while the pressure term and the extra stress tensor are defined at element centroids. The present arrangement of the primitive variables leads to a stable numerical scheme and it does not require any ad-hoc modifications in order to enhance the pressure-velocity-stress coupling. The log-conformation representation proposed in [R. Fattal, R. Kupferman, Constitutive laws for the matrix-logarithm of the conformation tensor, J. Non-Newtonian Fluid Mech. 123 (2004) 281–285] has been implemented in order to improve the limiting Weissenberg numbers in the proposed finite volume method. The time stepping algorithm decouples the calculation of the polymeric stress, obtained by solving a hyperbolic constitutive equation, from the evolution of the velocity and pressure fields, obtained by solving a generalized Stokes problem. The resulting algebraic linear systems are solved using the FGMRES(m) Krylov iterative method with the restricted additive Schwarz preconditioner for the extra stress tensor and the geometric non-nested multilevel preconditioner for the Stokes system. The implementation of the preconditioned iterative solvers is based on the PETSc library in order to improve the efficiency of the parallel code.
The present numerical algorithm is validated for the Kovasznay flow, the flow of an Oldroyd-B fluid past a confined circular cylinder in a channel and the three-dimensional flow of an Oldroyd-B fluid around a rigid sphere falling in a cylindrical tube. Parallel large-scale calculations are presented for up to 523,094 quadrilateral elements in two dimensions and 1,190,376 hexahedral elements in three dimensions.

Keywords: Non-Newtonian fluids; Oldroyd-B Model; Unstructured Finite Volume Method; Large-Scale Computation; Non-Nested Multigrid Method; Flow Past a Cylinder in a Channel; Flow around a Sphere Falling in a Tube.

1 INTRODUCTION

Large-scale numerical simulations of viscoelastic fluid flows have gained particular interest due to their wide range of application areas. Although significant progress has been made in the solution of incompressible viscoelastic fluid flows, the development of more advanced numerical algorithms in terms of accuracy, stability, convergence and required computer power, for both steady-state simulations and fully implicit time integration of the incompressible viscoelastic fluid flow equations, is an active research topic. In the past two decades, considerable effort has been devoted to the development of robust and stable numerical algorithms. In order to enhance the numerical stability, Perera and Walters [43] introduced the idea of the elastic viscous split stress (EVSS) formulation in a finite difference method. Guenette and Fortin [23] proposed the so-called discrete elastic viscous split stress (DEVSS) method in a mixed FEM implementation. Sun et al. [52] proposed an adaptive viscoelastic stress splitting formulation (AVSS) and its applications: the streamline integration (AVSS/SI) and the streamline upwind Petrov-Galerkin (AVSS/SUPG) methods. Oliveira et al. [38] developed a collocated finite volume method on nonorthogonal grids; the velocity-stress-pressure decoupling was removed by using an interpolation similar to that of Rhie and Chow [45].

In the present paper, a new stable unstructured finite volume method is presented for parallel large-scale simulation of viscoelastic fluid flows. The numerical method is based on the
side-centered finite volume method where the velocity vector components are defined at the mid-point of each cell face, while the pressure term and the extra stress tensor are defined at element centroids. The present arrangement of the primitive variables leads to a stable numerical scheme and it does not require any ad-hoc modifications in order to enhance the pressure-velocity-stress coupling. This approach was initially used by Hwang [27] and Rida et al. [46] for the solution of the incompressible Navier-Stokes equations on unstructured triangular meshes. Hwang [27] pointed out several important computational merits of the aforementioned grid arrangement. Rida et al. [46] called this scheme the side-centered finite volume method and reported superior convergence properties compared to the semi-staggered approach. Therefore, the present side-centered finite volume method is implemented for large-scale viscoelastic fluid flow simulations rather than the semi-staggered finite volume algorithm given in [50]. Rannacher and Turek [44] used the same approach within the finite element framework by employing the stable non-conforming Q̃1/Q0 finite element pair, which is a quadrilateral counterpart of the well-known non-conforming triangular Stokes element of Crouzeix-Raviart [15]. The most appealing feature of this finite element pair is the availability of efficient multigrid solvers which are sufficiently robust even on nonuniform and highly anisotropic meshes. Although the fully staggered approach with a multigrid method also leads to a very robust numerical algorithm, neither obtaining the velocity components on unstructured staggered grids nor computing the inter-grid transfer operators in multigrid is straightforward. The use of all the velocity vector components significantly simplifies the numerical discretization of the governing equations on unstructured grids as well as the implementation of physical boundary conditions.
The present arrangement of the primitive variables can be applied to any non-overlapping convex polygon, which is very important for the treatment of more complex configurations. In the present work, special attention is given to satisfying the continuity equation exactly within each element, so that the summation of the continuity equations reduces exactly to the domain boundary, which is important for global mass conservation.

Although significant development has been made for robust and stable numerical algorithms for the solution of viscoelastic fluid flows, most numerical methods lose convergence at small or moderate Weissenberg numbers, limiting their applications due to the so-called High Weissenberg Number Problem (HWNP). Recently, a log representation of the conformation tensor was proposed by Fattal and Kupferman [22]. In this approach, the governing constitutive equation is written in terms of the logarithm of the conformation tensor. This representation ensures the positive definiteness of the conformation tensor and captures sharp elastic stress layers which are exponential in nature. Hulsen et al. [26] showed that the log-conformation formulation improves the stability of numerical methods by applying the DEVSS/DG method to simulate the flow of an Oldroyd-B fluid and a Giesekus fluid past a cylinder in a channel. Coronado et al. [17] presented a simple alternate form of the log-conformation formulation in which the conformation tensor is replaced by the matrix exponential. Kane et al. [30] compared four different implementations of the log-conformation formulation on the flow around a circular cylinder. The authors pointed out that the original log-conformation [22] is an excellent choice despite its complex implementation. Therefore, the original log-conformation formulation of [22] has been implemented in order to improve the limiting Weissenberg numbers in the proposed finite volume method.
The parallel computation of viscoelastic flows is essential for bringing modern computational power up to the task of carrying out the calculations required by complex constitutive equations. Dou and Phan-Thien [19] implemented a parallel unstructured finite volume algorithm for the flow of a PTT fluid around a cylinder on a distributed computing environment through the Parallel Virtual Machine (PVM). Caola et al. [11] presented a highly parallel time integration method for calculating viscoelastic flows with the DEVSS-G/DG finite element discretization. The method is based on an operator splitting time integration method that decouples the calculation of the polymeric stress, obtained by solving a hyperbolic constitutive equation, from the evolution of the velocity and pressure fields, obtained by solving a generalized Stokes problem. The Stokes-like system is solved by using the BiCGStab Krylov iterative method preconditioned with the block complement and additive level method (BCALM). Dimakopoulos [18] presented the parallelization of a fully implicit and stable finite element algorithm with relatively low memory requirements for the accurate simulation of time-dependent free-surface flows of multimode viscoelastic liquids. Kim et al. [33] obtained high-resolution solutions of viscoelastic flow problems based on the DEVSS-G/SUPG formulation. An adaptive incomplete LU (AILU) preconditioner with variable reordering was used for the coupled solution of the linearized system of equations. Castillo et al. [12] developed a fully coupled parallel finite element method for solving three-dimensional viscoelastic free surface flows. A Krylov subspace method with
an approximate inverse preconditioner (SPAI) was used for the solution of the linearized system of nonlinear equations. As in the work of Caola et al. [11], we use a time-splitting technique which decouples the solution of the extra stress from the evolution of the velocity and pressure fields, which are obtained by solving a generalized Stokes problem. Although this decoupling limits the allowable time step, both steps can be solved efficiently by using preconditioned Krylov subspace methods. Here, the Stokes-like system is solved using the FGMRES(m) Krylov iterative method [48] preconditioned with a non-nested multigrid method. The present preconditioner is essential for a parallel scalable viscoelastic flow solver, because it is well known that one-level methods (e.g., Jacobi, Gauss-Seidel, incomplete LU, etc.) lead to nonscalable solvers: the number of iterations increases as the number of sub-domains is increased.

Multigrid methods [24, 56] are known to be the most efficient numerical techniques for solving large-scale problems that arise in numerical simulations of physical phenomena, because their computational costs and memory requirements scale linearly with the number of degrees of freedom. The basic idea of the multigrid method is to carry out iterations on a fine grid and then progressively transfer the flow field variables and residuals to a series of coarser grids. On the coarser grids, the low frequency errors become high frequency ones and they can be easily annihilated by simple explicit methods. There are various possible strategies for implementing a multigrid algorithm on unstructured meshes [37]. One of the most successful multigrid techniques has been the use of non-nested coarse and fine levels. In this approach, coarse grid levels are created independently from the finer meshes, and flow variables, residuals and corrections are transferred back and forth between the various grid levels in a multigrid cycle.
To reduce the memory requirement of the multigrid method, we use a more aggressive coarsening method similar to that of Lin et al. [34, 35]. In this study, we investigate the applicability of the multigrid technique for the coupled iterative solution of the momentum and continuity equations. In the coupled approach, the momentum and continuity equations are solved simultaneously. This leads to a more robust solution technique than decoupled solution techniques of the SIMPLE and SIMPLER type, as well as preconditioned iterative solvers based on block factorization techniques [20, 31, 21, 3, 49]. An extensive review of the fully-coupled iterative solvers for the incompressible Navier-Stokes equations may be found in [51]. However, it is not possible to apply the standard multigrid methods with classical smoothing techniques (e.g., Jacobi, Gauss-Seidel, etc.) to the coupled iterative solution of the momentum and continuity equations because of the zero block in the saddle point problem [7, 47, 51]. In order to avoid the zero block in the saddle point problem, we use an upper triangular right preconditioner which results in a scaled discrete Laplacian instead of a zero block in the original system. Simple smoothers of the Jacobi or Gauss-Seidel type can then be used directly.

The implementation of the preconditioned Krylov subspace algorithm, matrix-matrix multiplication and the multilevel preconditioner was carried out using the PETSc (Portable, Extensible Toolkit for Scientific Computation) software package [5] developed at Argonne National Laboratory. The computational meshes are partitioned into a set of sub-domains using the METIS library [29]. The proposed numerical method is validated initially for the Kovasznay flow [32] and the spatial convergence of the method is established for a Newtonian fluid.
Then the method is applied to the classical benchmark problem of the flow of a viscoelastic fluid past a confined circular cylinder in a channel. The flow of a viscoelastic fluid past a confined circular cylinder has been investigated by many researchers [2, 1, 11, 19, 60, 26, 33, 39, 17, 30]. It is known that the problem is very difficult due to the very thin extra stress boundary layer on the cylinder surface and in the wake behind the cylinder. In the present work, we employ very high mesh resolution in these regions, allowing very accurate solutions at minimum cost, in order to study mesh convergence in the wake of the cylinder. The numerical results at We = 0.7 indicate that mesh convergence is achieved in the wake of the cylinder. However, the numerical results at higher Weissenberg numbers indicate that no steady-state solution is possible for an Oldroyd-B fluid beyond We = 0.8. Finally, the numerical method is applied to the three-dimensional flow of a viscoelastic fluid around a rigid sphere falling in a cylindrical tube [36, 10, 40, 59, 13, 57, 58]. The two-dimensional axisymmetric results available in the literature confirm the present three-dimensional algorithm.

This article is organized as follows: The governing equations and the proposed finite volume algorithm are given in Section 2. The present numerical algorithm is tested on parallel machines and verified for the Kovasznay flow, the flow of an Oldroyd-B fluid past a confined circular cylinder in a channel and the three-dimensional flow of an Oldroyd-B fluid around a rigid sphere falling in a cylindrical tube in Section 3. Finally, the conclusions and discussion are presented in Section 4.

2 MATHEMATICAL and NUMERICAL FORMULATION

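The governing equations for the incompressible Oldroyd-B fluid flow in the Cartesian coordinate system can be written in dimensionless form as follows: the continuity equation

$$ -\nabla \cdot \mathbf{u} = 0 \tag{1} $$

the momentum equations

$$ Re \left[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} \right] + \nabla p = \beta \nabla^{2} \mathbf{u} + \nabla \cdot \mathbf{T} \tag{2} $$

and the constitutive equation for the Oldroyd-B model

$$ We \left[ \frac{\partial \mathbf{T}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{T} - (\nabla \mathbf{u})^{\top} \cdot \mathbf{T} - \mathbf{T} \cdot \nabla \mathbf{u} \right] = (1 - \beta)\left( \nabla \mathbf{u} + \nabla \mathbf{u}^{\top} \right) - \mathbf{T} \tag{3} $$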

In these equations, u represents the velocity vector, p is the pressure and T is the extra stress tensor. The non-dimensional parameters are the Reynolds number Re, the Weissenberg number We and the viscosity ratio β. Integrating the differential equations (1) and (3) over an unstructured quadrilateral/hexahedral element Ωe with boundary ∂Ωe gives

$$ -\oint_{\partial\Omega_e} \mathbf{n} \cdot \mathbf{u} \, dS = 0 \tag{4} $$

$$ We \left[ \int_{\Omega_e} \left( \frac{\partial \mathbf{T}}{\partial t} - (\nabla \mathbf{u})^{\top} \cdot \mathbf{T} - \mathbf{T} \cdot \nabla \mathbf{u} \right) dV + \oint_{\partial\Omega_e} (\mathbf{n} \cdot \mathbf{u}) \mathbf{T} \, dS \right] = (1 - \beta) \oint_{\partial\Omega_e} (\mathbf{u}\mathbf{n} + \mathbf{n}\mathbf{u}) \, dS - \int_{\Omega_e} \mathbf{T} \, dV \tag{5} $$

and integrating the momentum equation (2) over an arbitrary dual control volume Ωd with boundary ∂Ωd yields

$$ Re \int_{\Omega_d} \frac{\partial \mathbf{u}}{\partial t} \, dV + Re \oint_{\partial\Omega_d} (\mathbf{n} \cdot \mathbf{u}) \mathbf{u} \, dS + \oint_{\partial\Omega_d} \mathbf{n} p \, dS = \beta \oint_{\partial\Omega_d} \mathbf{n} \cdot \nabla \mathbf{u} \, dS + \oint_{\partial\Omega_d} \mathbf{n} \cdot \mathbf{T} \, dS \tag{6} $$

Here n represents the outward unit normal vector, V is the control volume and S is the control volume surface area. Figure 1 illustrates two typical neighbouring quadrilateral elements with an arbitrary dual control volume for the momentum equation (6), constructed by connecting the element centroids to the common vertices shared by both quadrilateral elements. The velocity vector components are defined at the mid-point of each cell face, while the pressure and the extra stress tensor are defined at the element centroids. The three-dimensional hexahedral elements with a dual control volume are also shown in Figure 2. In the following subsection, we restrict ourselves to the numerical discretization of two-dimensional viscoelastic fluid flows; the extension to three dimensions is straightforward.

2.1 Numerical Discretization

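The momentum equations along the x- and y-directions are discretized over the dual finite volume shown in Figure 1 and the discretization area involves only the right and left elements that share the common edge where the components of the velocity vector are discretized. The discrete contribution from the right cell shown in Figure 1 is given below for each term of the momentum equation along the x-direction.

The time derivative

$$ Re \left[ \frac{9u_1^{n+1} + u_2^{n+1} + u_3^{n+1} + u_6^{n+1}}{12\Delta t} \right] A_{123} - Re \left[ \frac{9u_1^{n} + u_2^{n} + u_3^{n} + u_6^{n}}{12\Delta t} \right] A_{123} \tag{7} $$

The convective term

$$ Re \left[ u_{12}^{n} \Delta y_{12} - v_{12}^{n} \Delta x_{12} \right] u_{12}^{n+1} + Re \left[ u_{23}^{n} \Delta y_{23} - v_{23}^{n} \Delta x_{23} \right] u_{23}^{n+1} \tag{8} $$

The pressure term

$$ \left[ \frac{p_1^{n+1} + p_2^{n+1}}{2} \right] \Delta y_{12} + \left[ \frac{p_2^{n+1} + p_3^{n+1}}{2} \right] \Delta y_{23} \tag{9} $$

The viscous term

$$ -\beta \left( \frac{\partial u}{\partial x} \right)_{12}^{n+1} \Delta y_{12} - \beta \left( \frac{\partial u}{\partial x} \right)_{23}^{n+1} \Delta y_{23} + \beta \left( \frac{\partial u}{\partial y} \right)_{12}^{n+1} \Delta x_{12} + \beta \left( \frac{\partial u}{\partial y} \right)_{23}^{n+1} \Delta x_{23} \tag{10} $$

The extra stress term

$$ -(T_{xx})_{12}^{n+1} \Delta y_{12} - (T_{xx})_{23}^{n+1} \Delta y_{23} + (T_{xy})_{12}^{n+1} \Delta x_{12} + (T_{xy})_{23}^{n+1} \Delta x_{23} \tag{11} $$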

Here, A123 is the area between the points x1, x2 and x3, ∆t is the time step, ∆x12 = x2 − x1, ∆x23 = x3 − x2, ∆y12 = y2 − y1, ∆y23 = y3 − y2, the values u12, u23, v12 and v23 are the velocity vector components defined at the mid-point of each dual volume face and p1, p2 and p3 are the pressure values at the points x1, x2 and x3, respectively. The velocity vector components u12, u23, v12 and v23 are computed using least square interpolations [4, 6]. As an example,

$$ u_{12} = \beta \left[ u_1 + \nabla u_1 \cdot \mathbf{r}_1 \right] + (1 - \beta) \left[ u_2 + \nabla u_2 \cdot \mathbf{r}_2 \right] \tag{12} $$

where β is a weight factor determining the type of convection scheme used, ∇u1 and ∇u2 are the gradients of the velocity components at the locations where the u1 and u2 velocity components are defined, and r1 and r2 are the distance vectors from the mid-point of the dual control volume face to the locations where the gradients of the velocity components are computed. In the present work, we employ only β = 0.5, which corresponds to the central least square interpolation. For evaluating the gradient terms ∇u1 and ∇u2, a least square procedure is used in which the velocity data is assumed to behave linearly. Referring to Figure 1 as an example, the following system can be constructed for the term ∇u1

$$ \begin{bmatrix} \Delta x_{21} & \Delta y_{21} \\ \Delta x_{31} & \Delta y_{31} \\ \Delta x_{41} & \Delta y_{41} \\ \Delta x_{51} & \Delta y_{51} \end{bmatrix} \begin{bmatrix} \dfrac{\partial u}{\partial x} \\[2mm] \dfrac{\partial u}{\partial y} \end{bmatrix} = \begin{bmatrix} u_2 - u_1 \\ u_3 - u_1 \\ u_4 - u_1 \\ u_5 - u_1 \end{bmatrix} \tag{13} $$
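As an illustration of how the overdetermined system (13) may be solved in the least square sense, the following minimal sketch uses NumPy's SVD-based `numpy.linalg.lstsq` in place of the Intel Math Kernel Library routine used in the paper. The function name `least_squares_gradient`, the neighbour coordinates and the sample field are assumptions made only for this example.

```python
import numpy as np

def least_squares_gradient(x0, u0, x_nbrs, u_nbrs):
    """Estimate (du/dx, du/dy) at x0 from neighbouring values, as in system (13)."""
    # Each row holds the offsets (dx_k1, dy_k1) from the central point to a neighbour.
    A = np.asarray(x_nbrs, dtype=float) - np.asarray(x0, dtype=float)
    # The right-hand side holds the differences u_k - u_1.
    b = np.asarray(u_nbrs, dtype=float) - float(u0)
    # lstsq relies on an SVD-based LAPACK driver, so small singular values
    # (near rank deficiency) are cut off through rcond.
    grad, *_ = np.linalg.lstsq(A, b, rcond=None)
    return grad

# Hypothetical data: a linear field u = 2x - 3y + 1 sampled at four neighbours.
def field(x, y):
    return 2.0 * x - 3.0 * y + 1.0

x0 = (0.0, 0.0)
nbrs = [(1.0, 0.2), (0.1, 1.0), (-0.9, 0.1), (0.2, -1.1)]
grad = least_squares_gradient(x0, field(*x0), nbrs, [field(*p) for p in nbrs])
print(grad)
```

Because the sample field is linear, the fitted gradient reproduces the coefficients of the field up to round-off, which is consistent with the assumption of linear behaviour stated above.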

This overdetermined system of linear equations may be solved in a least square sense using the normal equation approach, in which both sides are multiplied by the transpose. The modified system is solved using the singular value decomposition provided by the Intel Math Kernel Library in order to avoid the numerical difficulties associated with solving linear systems with near rank deficiency. The pressure values at x1 and x3 as well as the velocity values at these nodes are computed in a similar manner. The velocity gradients at the dual control volume edge mid-points can then be computed from Green's theorem:

$$ \frac{\partial u}{\partial x} = \frac{1}{A} \oint_{\partial\Omega_c} u \, dy \tag{14} $$

$$ \frac{\partial u}{\partial y} = -\frac{1}{A} \oint_{\partial\Omega_c} u \, dx \tag{15} $$

where the covolume Ωc, with area A, represents one quarter of a quadrilateral element in which the dual control volume edge is aligned with one of the covolume diagonals, and the line integrals on the right-hand side of equations (14) and (15) are evaluated using the mid-point rule on each of the covolume faces. The procedure used here for the evaluation of viscous fluxes is very similar to the method described by Hwang [27]. Although the actual values of the gradients can be computed through the use of the least square procedure shown above, this is less accurate than the gradient values calculated with Green's theorem [4]. The contribution from the left cell is calculated in a similar manner. The discretization of the momentum equation along the y-direction follows the same ideas very closely and is therefore not repeated here. The continuity equation (4) is integrated within each quadrilateral element and evaluated using the mid-point rule on each of the element faces:

$$ -\sum_{j=1}^{4} \left[ u_j^{n+1} \Delta y_j - v_j^{n+1} \Delta x_j \right] = 0 \tag{16} $$

where ∆xj and ∆yj are the element edge lengths along the x- and y-directions, respectively, and uj and vj are the velocity vector components defined at the mid-point of each quadrilateral element face. The constitutive equation for the Oldroyd-B fluid is discretized as in [50] within each element, assuming that the extra stresses Ti and velocity gradients ∇ui are constant:

$$ We \left[ \frac{\mathbf{T}_i^{n+1} - \mathbf{T}_i^{n}}{\Delta t} A_e - (\nabla \mathbf{u}_i^{n})^{\top} \cdot \mathbf{T}_i^{n+1} A_e - \mathbf{T}_i^{n+1} \cdot \nabla \mathbf{u}_i^{n} A_e + \sum_{j=1}^{4} \left[ u_j^{n} \Delta y_j - v_j^{n} \Delta x_j \right] \mathbf{T}_j^{n+1} \right] = (1 - \beta) \left[ \nabla \mathbf{u}_i^{n+1} + (\nabla \mathbf{u}_i^{n+1})^{\top} \right] A_e - \mathbf{T}_i^{n+1} A_e \tag{17} $$

where Ae is the area of the quadrilateral element and Tj is the value of the extra stress tensor at the face centres of the quadrilateral elements. The extra stresses are extrapolated to the boundaries of the finite volume elements using the second-order upwind least square interpolation described above, in order to maintain stability for the hyperbolic constitutive equation. The time-dependent finite volume discretisation of the above equations leads to a linear system of equations of the form:

$$ \begin{bmatrix} A_{\tau\tau} & A_{\tau u} & 0 \\ A_{u\tau} & A_{uu} & A_{up} \\ 0 & A_{pu} & 0 \end{bmatrix} \begin{bmatrix} \tau \\ u \\ p \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ 0 \end{bmatrix} \tag{18} $$

The above linear system of algebraic equations has to be solved at each time step.
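The mid-point-rule evaluation of the discrete continuity equation (16) can be illustrated with a short sketch. The function `net_outflow`, the element coordinates and the two test velocity fields are hypothetical and serve only to show that the face sum vanishes for a divergence-free linear field on an arbitrary convex quadrilateral.

```python
def net_outflow(vertices, velocity):
    """Mid-point-rule value of sum_j (u_j*dy_j - v_j*dx_j) over the element faces.

    vertices: corner coordinates listed counter-clockwise.
    velocity: callable returning (u, v) at a point (x, y).
    """
    total = 0.0
    n = len(vertices)
    for j in range(n):
        xa, ya = vertices[j]
        xb, yb = vertices[(j + 1) % n]
        dx, dy = xb - xa, yb - ya
        # Velocity is sampled at the face mid-point, where it is stored
        # in the side-centered arrangement.
        u, v = velocity(0.5 * (xa + xb), 0.5 * (ya + yb))
        # For counter-clockwise ordering (dy, -dx) is the outward normal
        # scaled by the face length, so this is the face flux.
        total += u * dy - v * dx
    return total

# Hypothetical convex quadrilateral (area 2.75).
quad = [(0.0, 0.0), (2.0, 0.0), (2.5, 1.5), (0.0, 1.0)]

# Divergence-free linear field: the face fluxes cancel.
div_free = net_outflow(quad, lambda x, y: (x, -y))
# Field with divergence 2: the net outflow equals 2 * area.
expanding = net_outflow(quad, lambda x, y: (x, y))
print(div_free, expanding)
```

Since each face flux is shared with opposite sign by the neighbouring element, summing this quantity over all elements leaves only the boundary faces, which is the global mass conservation property referred to in the introduction.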

2.2 The Log Conformation Formulation

The conformation tensor is a quantity that describes the internal microstructure of polymer molecules at the continuum level. The relation between the conformation tensor and the extra stress tensor is given by

$$ \boldsymbol{\sigma} = \mathbf{I} + \frac{We}{1 - \beta} \mathbf{T} \tag{19} $$

The constitutive equation for the Oldroyd-B fluid in terms of the conformation tensor can be written as

$$ We \left[ \frac{\partial \boldsymbol{\sigma}}{\partial t} + (\mathbf{u} \cdot \nabla)\boldsymbol{\sigma} - (\nabla \mathbf{u})^{\top} \cdot \boldsymbol{\sigma} - \boldsymbol{\sigma} \cdot \nabla \mathbf{u} \right] = -(\boldsymbol{\sigma} - \mathbf{I}) \tag{20} $$

The conformation tensor is symmetric and positive definite. Unless special care is taken, the conformation tensor may lose this property at high Weissenberg numbers and the numerical solution will soon diverge. Recently, a log-conformation formulation was proposed by Fattal and Kupferman [22] where the constitutive equation is rewritten in terms of the logarithm of the conformation tensor through eigenvalue computations, Ψ = log σ = R log Λ Rᵀ. This representation ensures the positive definiteness of the conformation tensor and captures sharp elastic stress layers which are exponential in nature. It is shown in [22] that it is possible to decompose the gradient of a divergence-free velocity field into the anti-symmetric tensors Ω (pure rotation) and N, and a symmetric tensor B which commutes with the conformation tensor:

$$ \nabla \mathbf{u} = \boldsymbol{\Omega} + \mathbf{B} + \mathbf{N} \boldsymbol{\sigma}^{-1} \tag{21} $$

By inserting the decomposition (21) into the constitutive equation (20) and replacing the conformation tensor with the new variable Ψ, the new set of equations to be solved can be rewritten as follows:

$$ We \left[ \frac{\partial \boldsymbol{\Psi}}{\partial t} + (\mathbf{u} \cdot \nabla)\boldsymbol{\Psi} - (\boldsymbol{\Omega}\boldsymbol{\Psi} - \boldsymbol{\Psi}\boldsymbol{\Omega}) - 2\mathbf{B} \right] = -(\mathbf{I} - e^{-\boldsymbol{\Psi}}) \tag{22} $$

$$ Re \left[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} \right] + \nabla p = \beta \nabla^{2} \mathbf{u} + \frac{(1 - \beta)}{We} \nabla \cdot (e^{\boldsymbol{\Psi}} - \mathbf{I}) \tag{23} $$

$$ -\nabla \cdot \mathbf{u} = 0 \tag{24} $$

The discretization of the above equations follows very closely the ideas presented in Section 2.1 and is therefore not repeated here.
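The change of variable Ψ = log σ = R log Λ Rᵀ and its inverse can be sketched with a symmetric eigendecomposition. The snippet below is only an illustration of relations (19) and of the matrix logarithm and exponential; the function names and the sample values of σ, We and β are assumptions, and the decomposition (21) of the velocity gradient is not reproduced.

```python
import numpy as np

def log_conformation(sigma):
    """Psi = R log(Lambda) R^T for a symmetric positive definite tensor."""
    lam, R = np.linalg.eigh(sigma)
    if np.any(lam <= 0.0):
        raise ValueError("conformation tensor must be positive definite")
    # R * v scales column j of R by v[j], i.e. R @ diag(v).
    return (R * np.log(lam)) @ R.T

def conformation_from_log(psi):
    """sigma = exp(Psi); positive definite for any symmetric Psi."""
    mu, R = np.linalg.eigh(psi)
    return (R * np.exp(mu)) @ R.T

def extra_stress_from_log(psi, We, beta):
    """Invert relation (19): T = (1 - beta) / We * (sigma - I)."""
    sigma = conformation_from_log(psi)
    return (1.0 - beta) / We * (sigma - np.eye(psi.shape[0]))

# Hypothetical sample values, chosen only for illustration.
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
We, beta = 0.7, 0.59

psi = log_conformation(sigma)
sigma_back = conformation_from_log(psi)
T = extra_stress_from_log(psi, We, beta)
print(psi)
print(T)
```

Working with Ψ means that whatever symmetric tensor the discrete constitutive update produces, the recovered σ = e^Ψ has strictly positive eigenvalues, which is the positive definiteness property mentioned above.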

2.3 Iterative Solver

In practice, the iterative solution of equation (18) does not converge very quickly and it is rather difficult to construct robust preconditioners for the whole coupled system. Therefore, we decouple the system by using a time-splitting technique which separates the calculation of the extra stresses from the evaluation of the velocity and pressure fields, which are obtained by solving a generalised Stokes problem:

$$ A_{\tau\tau} \tau = b_1 - A_{\tau u} u \tag{25} $$

$$ \begin{bmatrix} A_{uu} & A_{up} \\ A_{pu} & 0 \end{bmatrix} \begin{bmatrix} u \\ p \end{bmatrix} = \begin{bmatrix} b_2 - A_{u\tau} \tau \\ 0 \end{bmatrix} \tag{26} $$

However, it is not possible to apply the standard multigrid methods with classical smoothing techniques (e.g., Jacobi, Gauss-Seidel) to the coupled iterative solution of the momentum and continuity equations because of the zero block in the saddle point problem [7, 47, 51]. This could be overcome by applying those smoothers to the squared system, which is symmetric and positive definite. The smoothing properties for the squared system were analyzed in [54] and the convergence rate has been proven to be O(1/m) for the multigrid method, where m is the number of smoothing steps. However, the method has not become popular due to its low efficiency. Although the Vanka smoother [53] is very effective in smoothing out errors, it is very difficult to implement in an efficient manner. In the present paper, we use an upper triangular right preconditioner which results in a scaled discrete Laplacian instead of a zero block in the original system. The modified system becomes

$$ \begin{bmatrix} A_{uu} & A_{up} \\ A_{pu} & 0 \end{bmatrix} \begin{bmatrix} I & -A_{up} \\ 0 & I \end{bmatrix} \begin{bmatrix} q \\ p \end{bmatrix} = \begin{bmatrix} A_{uu} & A_{up} - A_{uu} A_{up} \\ A_{pu} & -A_{pu} A_{up} \end{bmatrix} \begin{bmatrix} q \\ p \end{bmatrix} = \begin{bmatrix} b_2 - A_{u\tau} \tau \\ 0 \end{bmatrix} \tag{27} $$

where the velocity is recovered from u = q − Aup p, and the zero block is replaced with −Apu Aup, which is a scaled discrete Laplacian. Unfortunately, this leads to a significant increase in the number of non-zero elements due to the matrix-matrix multiplication. However, it is possible to replace the −Aup block matrix in the upper triangular right preconditioner with a computationally less expensive matrix, −Aûp.
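The effect of the upper triangular right preconditioner in equation (27) can be checked numerically on small dense blocks. The sketch below is an assumption-laden illustration: the block sizes, the random matrices standing in for the discrete operators, and the choice Apu = Aupᵀ are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_p = 6, 3

# Hypothetical small blocks standing in for the discrete operators.
A_uu = 4.0 * np.eye(n_u) + 0.1 * rng.standard_normal((n_u, n_u))
A_up = rng.standard_normal((n_u, n_p))
A_pu = A_up.T                      # assumption made only for this example
rhs = np.concatenate([rng.standard_normal(n_u), np.zeros(n_p)])

# Saddle point matrix of equation (26) and the right preconditioner of (27).
K = np.block([[A_uu, A_up], [A_pu, np.zeros((n_p, n_p))]])
P = np.block([[np.eye(n_u), -A_up], [np.zeros((n_p, n_u)), np.eye(n_p)]])
KP = K @ P

# The former zero block now holds -A_pu A_up.
block_22 = KP[n_u:, n_u:]

# Solve the modified system for (q, p) and recover u = q - A_up p.
qp = np.linalg.solve(KP, rhs)
q, p = qp[:n_u], qp[n_u:]
u = q - A_up @ p

# Reference solution of the original saddle point system.
up_direct = np.linalg.solve(K, rhs)
print(np.max(np.abs(np.concatenate([u, p]) - up_direct)))
```

With a non-zero diagonal in the pressure block, point smoothers such as Jacobi or Gauss-Seidel become applicable, at the price of the extra fill-in produced by the matrix-matrix product.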
The calculations indicate that the largest contribution to the pressure gradients in the momentum equations comes from the right and left elements that share the common edge/face where the components of the velocity vector are discretized. Therefore, we use only the contribution from these two elements for the −Aûp matrix, which leads to a maximum of three non-zero entries per row. Although this approximation does not change the convergence rate of the iterative solver significantly, it leads to a significant reduction in the computing time and memory requirement.

The multilevel preconditioner is based on a multiplicative non-nested multigrid method with one V-cycle. In this multigrid method, coarse grid levels are created independently from the finer meshes, and flow variables, residuals and corrections are transferred back and forth between the various grid levels in a multigrid cycle. The basic two-level multigrid method is described in Table I. However, for the application of the non-nested restriction and prolongation operators, one needs to know the coarse element containing the centroids of both the elements and the edges/faces on the fine level. For this purpose, the quadtree/octree data structure (see, for example, [8]) can be constructed on the coarse mesh. The quadtree consists of a sequence of recursively divided squares superimposed on a region. Searching the coarse mesh for a given point then involves determining in which quadrant/octant the point is contained and testing whether any of the associated coarse mesh elements contains the point. This algorithm is very effective when we are searching for a single point or several arbitrary points. However, this procedure does not use the information from previously computed neighboring points when multiple related points are involved. Thus, we introduce the idea of the level set renumbering algorithm on the fine mesh.
We start with the first element, and level-1 is defined as the set of elements connected to the vertices of the first element. The next level is found by considering all new neighbours of level-1. This procedure is repeated until all the elements are assigned to a level, and the elements are renumbered in ascending order based on their levels. Here, the use of the local level set renumbering algorithm ensures that there is at least one neighboring fine element with a previously computed coarse element number. Therefore, we already know the associated coarse elements without the use of a quadtree search algorithm, and the present search algorithm only involves testing whether any of several associated coarse mesh elements contains the point. It is thus possible to find the target coarse element numbers within several iterations. The coarse level elements are partitioned by computing the maximum number of fine level
elements with the same processor number within a coarse element, and the load balancing on the coarse level is generally well ensured. To reduce the memory requirement of the multigrid method, we use a more aggressive coarsening method similar to the work of Lin et al. [34, 35]. In order to reduce the complexity of the data structure, the velocity vector components are defined at vertices on the coarse grid levels. The restricted additive Schwarz preconditioner with the FGMRES(m) Krylov iterative method [48] is used as a smoother for the multilevel preconditioner, and either the successive over-relaxation (SOR) preconditioner or the incomplete LU preconditioner with no fill-in is employed within each partitioned sub-block. The implementation of the preconditioned Krylov subspace algorithm, matrix-matrix multiplication and the multilevel preconditioner was carried out using the PETSc [5] software package developed at Argonne National Laboratory. The METIS library [29] is used to decompose the flow domain into a set of sub-domains.
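The level set renumbering described above amounts to a breadth-first traversal of the elements through shared vertices. The following sketch is one possible reading of that description, not the implementation used in the paper: the function name, the convention that the first element forms level 0, the restart for disconnected meshes and the small strip mesh are all assumptions.

```python
from collections import defaultdict

def level_set_renumbering(elements):
    """Assign a level to every element and return (levels, new_order).

    elements: list of tuples of vertex indices, one tuple per element.
    Elements that share at least one vertex are treated as neighbours.
    """
    vertex_to_elems = defaultdict(list)
    for e, verts in enumerate(elements):
        for v in verts:
            vertex_to_elems[v].append(e)

    level = [-1] * len(elements)
    order = []
    # The outer loop restarts the sweep if the mesh is disconnected
    # (an addition made for robustness in this sketch).
    for seed in range(len(elements)):
        if level[seed] != -1:
            continue
        level[seed] = 0
        frontier = [seed]
        while frontier:
            order.extend(frontier)
            next_frontier = []
            for e in frontier:
                for v in elements[e]:
                    for nb in vertex_to_elems[v]:
                        if level[nb] == -1:
                            level[nb] = level[e] + 1
                            next_frontier.append(nb)
            frontier = next_frontier
    return level, order

# Hypothetical strip of four quadrilaterals stored in scrambled order.
strip = [(0, 1, 6, 5), (3, 4, 9, 8), (1, 2, 7, 6), (2, 3, 8, 7)]
levels, order = level_set_renumbering(strip)
print(levels, order)
```

In the resulting order, every element after the first has a vertex neighbour that appears earlier, so the coarse element found for that neighbour can serve as the starting guess for the point location instead of a quadtree search.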

3 NUMERICAL RESULTS

In this section, the proposed numerical algorithm is tested on parallel machines and verified for the Kovasznay flow, the flow of an Oldroyd-B fluid past a confined circular cylinder in a channel and the three-dimensional flow of an Oldroyd-B fluid around a rigid sphere falling in a cylindrical tube. The present two-dimensional calculations are performed on the SGI Altix 3000 (1300 MHz, Itanium 2) machine with 32 nodes available at the Faculty of Aeronautics and Astronautics of ITU and on the computing facilities at TUBITAK ULAKBIM, High Performance and Grid Computing Center. The three-dimensional calculations are carried out on the Anadolu (Intel Xeon 2.33 GHz) machine at the National Center for High Performance Computing of Turkey using 128 nodes. The present numerical results are obtained by using the time-splitting technique given in Section 2.3.

3.1 Kovasznay Flow

Kovasznay flow [32] is an analytical solution of the two-dimensional steady-state Navier-Stokes equations and it is used to establish the spatial order of convergence for a Newtonian fluid. The spatial domain in which Kovasznay's solution is defined is taken here as the unit square [−0.5, 0.5] × [−0.5, 0.5]. The analytical solution has the following form:

$$ u(x, y) = 1 - e^{\lambda x} \cos(2\pi y) \tag{28} $$

$$ v(x, y) = \frac{\lambda}{2\pi} e^{\lambda x} \sin(2\pi y) \tag{29} $$

$$ p(x, y) = \frac{1 - e^{2\lambda x}}{2} \tag{30} $$

$$ \lambda = \frac{Re}{2} - \sqrt{\frac{Re^{2}}{4} + 4\pi^{2}} \tag{31} $$

For the present validation case, the Reynolds number is taken to be 40. In order to establish the spatial convergence of the method, an h-refinement study is performed on both uniform Cartesian meshes and unstructured quadrilateral meshes. For uniform Cartesian meshes, five different meshes are employed: mesh U1 with 21 × 21 node points, mesh U2 with 41 × 41 node points, mesh U3 with 81 × 81 node points, mesh U4 with 161 × 161 node points and mesh U5 with 321 × 321 node points. For unstructured quadrilateral meshes, the following meshes are considered: mesh M1 with 314 node points and 273 elements, mesh M2 with 1,133 node points and 1,052 elements, mesh M3 with 4,393 node points and 4,232 elements, mesh M4 with 17,635 node points and 17,314 elements and mesh M5 with 66,591 node points and 65,950 elements. The successive meshes are generated using the mapping and paving algorithms provided within the CUBIT mesh generation environment [9]. The details of meshes U1 to U5 and M1 to M5 are provided in Table II. The error measure is taken to be:

$$ \text{Error} = \frac{\| \mathbf{u} - \mathbf{u}_{exact} \|_2}{\sqrt{N_u}} \tag{32} $$

where Nu is the number of edges. The mesh spacing ∆h on the unstructured quadrilateral mesh is defined as

$$ \Delta h = \frac{1}{\sqrt{N_e}} \tag{33} $$

where $N_e$ is the number of elements. The convergence of the error measure with mesh spacing is shown in Figure 3; the error decays at an algebraic rate as the mesh is refined. On a log-log scale the expected rate of convergence appears as a straight line. The central difference approximation to the convective term gives a straight line with an algebraic convergence rate of $O(\Delta h^2)$ on both structured Cartesian and unstructured grids. The present error measure is also compared with the results obtained from the MAC scheme [25]. Although the present numerical scheme does not show any significant improvement in the error measure, it is capable of treating more complex configurations than the MAC scheme.
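As an illustration of equations (28)-(33), the following minimal sketch evaluates the Kovasznay solution at Re = 40 and estimates an observed order of convergence from the slope between successive grids. It does not reproduce the finite volume discretization of this paper: the quantity being measured is only a central-difference approximation of the (analytically zero) divergence of the exact velocity field, and the function names `kovasznay`, `divergence_error` and `observed_orders` are hypothetical helpers introduced for this example.

```python
import math

def kovasznay(x, y, re=40.0):
    """Analytical Kovasznay solution, equations (28)-(31)."""
    lam = re / 2.0 - math.sqrt(re * re / 4.0 + 4.0 * math.pi ** 2)
    u = 1.0 - math.exp(lam * x) * math.cos(2.0 * math.pi * y)
    v = lam / (2.0 * math.pi) * math.exp(lam * x) * math.sin(2.0 * math.pi * y)
    p = 0.5 * (1.0 - math.exp(2.0 * lam * x))
    return u, v, p, lam

def divergence_error(n, re=40.0):
    """Discrete 2-norm scaled by sqrt(N), in the spirit of equation (32),
    of a central-difference divergence on the interior nodes of an
    n x n grid over [-0.5, 0.5]^2. The exact divergence is zero."""
    h = 1.0 / (n - 1)
    total, count = 0.0, 0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            x, y = -0.5 + i * h, -0.5 + j * h
            dudx = (kovasznay(x + h, y, re)[0] - kovasznay(x - h, y, re)[0]) / (2.0 * h)
            dvdy = (kovasznay(x, y + h, re)[1] - kovasznay(x, y - h, re)[1]) / (2.0 * h)
            total += (dudx + dvdy) ** 2
            count += 1
    return math.sqrt(total) / math.sqrt(count)

def observed_orders(errors, spacings):
    """Slope of log(error) against log(spacing) between successive grids."""
    return [math.log(errors[k] / errors[k + 1]) / math.log(spacings[k] / spacings[k + 1])
            for k in range(len(errors) - 1)]

nodes = [21, 41, 81]                      # node counts of meshes U1 to U3
spacings = [1.0 / (n - 1) for n in nodes]
errors = [divergence_error(n) for n in nodes]
orders = observed_orders(errors, spacings)
lam = kovasznay(0.0, 0.0)[3]
print(f"lambda = {lam:.5f}, observed orders = {orders}")
```

The same `observed_orders` helper could be applied to tabulated error values such as those behind Figure 3, with ∆h taken from equation (33) on the unstructured meshes.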

3.2 Performance of Stokes Solver

Efficient numerical solution of Stokes flow is a serious bottleneck in performing parallel large-scale viscoelastic numerical simulations. To illustrate the performance of the present two-level preconditioned iterative solver given in Section 2.3, an algorithmic scaling study is presented for the two- and three-dimensional lid-driven cavity problems on the SGI Altix 3000 (1300 MHz, Itanium 2) machine. The calculations are performed on uniform Cartesian meshes with 501 × 501 and 1001 × 1001 resolutions in two dimensions and 51 × 51 × 51 and 101 × 101 × 101 resolutions in three dimensions. The performance analysis considers the one- and two-level preconditioned iterative solvers and is mainly focused on the Stokes flow. Special attention is given to fitting the partitioned data into the local physical memory of the computational nodes. In these calculations, the relative residual tolerance is set to $10^{-8}$. Table III presents the results for the algorithmic scaling of the present one- and two-level preconditioned iterative solvers for the two-dimensional lid-driven Stokes flow in a square enclosure using the present numerical algorithm. In addition, the algorithmic scaling of the one-level preconditioned iterative solver is presented for the classical MAC scheme [25]. The symbol − denotes a calculation that does not converge within a reasonable time or for which the local physical memory of the computational nodes is not sufficient. The standard one-level iterative solver is based on the restricted additive Schwarz method with the flexible GMRES(200) algorithm [48]. A block-incomplete factorization coupled with the reverse Cuthill-McKee ordering [16] is used within each partitioned sub-domain. As may be seen from Table III, the calculations indicate a significant improvement in the computation time with an increase in the incomplete LU factorization level.
However, it is clear that the one-level iterative solver does not scale well for the present mesh sizes. The standard two-level iterative solver is based on the two-level non-nested geometric multigrid preconditioner with the flexible GMRES(200) algorithm [48]. The restricted additive Schwarz preconditioner is used as a smoother and either the successive over-relaxation (SOR) preconditioner or the block-incomplete factorization with no fill-in is employed within each sub-block. On the coarse level, the block-incomplete factorization with no fill-in is employed. In these two-level calculations, the coarsening ratio is kept constant between the grid levels and is set to $1 : 8^2$. As may be seen, the standard two-level solver with the SOR and ILU(0) preconditioners converges within several iterations and the number of iterations is independent of the problem size. Although only two-level preconditioned iterative solvers are used, the numerical results indicate relatively good scaling properties. There are two main issues for the algorithmic scaling of an iterative solver. The first is to keep the number of iterations constant as the number of sub-domains is increased. It is well known that one-level methods cause an increase in the number of iterations as the number of sub-domains is increased. Two-level methods can remedy the situation by keeping the coarsening ratio constant between the fine and coarse levels [34, 35]. Lin et al. [34] reported that a two-level preconditioner is optimally convergent for the given fine-to-coarse grid ratio of $8^2$ in two dimensions and $8^3$ in three dimensions. The second issue is that the computation time required for each iteration should scale linearly with the number of unknowns. This is more difficult to achieve for two-level methods due to the relatively large coarse mesh LU factorization. Therefore, one must pay attention to the coarse grid solve time.
Here, the coarse mesh solution time is significantly improved using the restricted additive Schwarz preconditioner with a block-incomplete factorization with no fill-in. However, for larger problems, it may be necessary to introduce additional levels in order to keep the coarse level relatively cheap. In addition to the present calculations, the standard one-level preconditioned iterative solver is applied to the classical MAC scheme [25] in order to compare its computation time with the present numerical algorithm. The calculations indicate that the MAC scheme is cheaper than the present numerical algorithm. It is well known that unstructured finite volume methods cannot compete in computation time with structured numerical methods, including the MAC scheme. The main reason is that unstructured finite volume solvers need to construct a dual volume and have to interpolate the fluxes to the dual volume faces, whereas the coefficients of the MAC scheme can be constructed directly with no mesh information. In addition, the number of entries per row is significantly larger for unstructured finite volume methods. Finally, the convective-diffusion submatrix is approximately two times larger since we employ all the velocity components.

Table IV presents similar results for the algorithmic scaling of the three-dimensional lid-driven cubic cavity Stokes flow. Although increasing the level of the block-incomplete factorization on each partitioned sub-domain significantly reduces the number of required iterations, this is not possible for large-scale three-dimensional calculations due to the prohibitively large physical memory requirement. The standard two-level preconditioned iterative solver with a constant coarsening ratio of $1 : 5^3$ is employed for the present three-dimensional calculations. The convergence properties of the three-dimensional results are very similar to those of the two-dimensional calculations. The total solution time of the standard two-level preconditioned iterative solver with the SOR smoother is approximately 154 seconds for the Stokes flow on a uniform 101 × 101 × 101 mesh. Approximately 72 seconds of this computation time is spent on the construction of the linear system, 10 seconds on the matrix-matrix multiplication in equation (27) and 8 seconds on the construction of the intergrid transfer operators, which leaves roughly 64 seconds for the iterative solve itself. Therefore, the solution time is comparable with the time required for constructing the linear system.
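To make the coarsening ratios $1 : 8^2$ and $1 : 5^3$ concrete, the short sketch below computes the approximate coarse-level element counts for the four uniform meshes used in the scaling study. It assumes that the quoted resolutions (for example 1001 × 1001) count node points, so that a direction with n nodes has n − 1 elements, and that the ratio refers to element counts; neither assumption is stated explicitly in the text, and `coarse_element_count` is a hypothetical helper.

```python
def coarse_element_count(fine_nodes_per_direction, ratio_per_direction, dim):
    """Approximate element count on the coarse level of a two-level method.

    Assumes a uniform Cartesian fine mesh with (n - 1) elements per
    direction and a coarsening ratio of 1 : r^dim in element count.
    """
    fine_elements = (fine_nodes_per_direction - 1) ** dim
    return fine_elements / ratio_per_direction ** dim

cases = [
    ("2D, 501 x 501", 501, 8, 2),
    ("2D, 1001 x 1001", 1001, 8, 2),
    ("3D, 51 x 51 x 51", 51, 5, 3),
    ("3D, 101 x 101 x 101", 101, 5, 3),
]
for label, nodes, ratio, dim in cases:
    fine = (nodes - 1) ** dim
    coarse = coarse_element_count(nodes, ratio, dim)
    print(f"{label}: {fine} fine elements, about {coarse:.0f} coarse elements")
```

Under these assumptions the 501 × 501 case does not divide evenly, which is unproblematic for a non-nested method because the coarse mesh does not have to be a subdivision of the fine one.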
Although preconditioned iterative solvers based on block factorization techniques [20, 31, 21, 3, 49] have been implemented as in [49] in addition to the one- and two-level preconditioned iterative methods, they do not perform as well as the present two-level preconditioned iterative solution technique. As an example, the three-dimensional lid-driven cavity flow at Re = 20 is solved on a uniform Cartesian 65 × 65 × 65 mesh using 6 Oseen iterations with a zero initial guess and rtol = $10^{-5}$. The implemented block-based least squares commutator (LSC) preconditioner [20] requires 972 seconds on the SGI Altix 3000 machine with 32 nodes. Here, the smaller subproblems are solved inexactly, as in [21], using the HYPRE BoomerAMG solver [28]. In comparison, the same test case in [21] required 391 seconds using the block-based pressure convection-diffusion (PCD) preconditioner [31] on the ASCI Red machine (333 MHz, Intel Pentium II Xeon) with 100 nodes. The difference in computation time is mainly due to the size of the convective-diffusion subproblem in our approach, which is approximately three times larger since we use all the components of the velocity vector. In addition, the LSC preconditioner requires the solution of the scaled discrete Laplacian subproblem twice for each outer iteration, while the PCD preconditioner requires it only once. On the other hand, the standard two-level preconditioned iterative method for the first approach requires 195 seconds on the SGI Altix 3000 with 32 nodes and indicates a substantial improvement in the computation time.
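The solver configurations discussed in this section can be expressed as PETSc run-time options. The sketch below builds two option lists, one for the one-level restricted additive Schwarz solver with FGMRES(200) and ILU(k) sub-domain solves with reverse Cuthill-McKee ordering, and one for a two-level multigrid preconditioner with an additive Schwarz smoother. The option names are those of current PETSc releases and may differ from the PETSc version used for the present calculations; the actual settings of the code are not listed in the text, so these lists are assumptions. The helpers `one_level_options` and `two_level_options` are hypothetical, and for a non-nested coarse mesh the intergrid transfer operators would still have to be supplied in code (for example through `PCMGSetInterpolation`).

```python
def one_level_options(ilu_levels=0, restart=200, rtol=1e-8):
    """Options for FGMRES(restart) with a restricted additive Schwarz
    preconditioner and ILU(ilu_levels) with RCM ordering on each block."""
    return [
        "-ksp_type", "fgmres",
        "-ksp_gmres_restart", str(restart),
        "-ksp_rtol", str(rtol),
        "-pc_type", "asm",
        "-pc_asm_type", "restrict",
        "-sub_ksp_type", "preonly",
        "-sub_pc_type", "ilu",
        "-sub_pc_factor_levels", str(ilu_levels),
        "-sub_pc_factor_mat_ordering_type", "rcm",
    ]

def two_level_options(smoother_sub_pc="sor", restart=200, rtol=1e-8):
    """Options for FGMRES(restart) with a two-level multigrid preconditioner:
    additive Schwarz smoother (SOR or ILU(0) on each block) and an additive
    Schwarz coarse solve with ILU(0) on each block."""
    return [
        "-ksp_type", "fgmres",
        "-ksp_gmres_restart", str(restart),
        "-ksp_rtol", str(rtol),
        "-pc_type", "mg",
        "-pc_mg_levels", "2",
        "-mg_levels_pc_type", "asm",
        "-mg_levels_pc_asm_type", "restrict",
        "-mg_levels_sub_pc_type", smoother_sub_pc,
        "-mg_coarse_pc_type", "asm",
        "-mg_coarse_pc_asm_type", "restrict",
        "-mg_coarse_sub_pc_type", "ilu",
        "-mg_coarse_sub_pc_factor_levels", "0",
    ]

print(" ".join(one_level_options(ilu_levels=1)))
print(" ".join(two_level_options("sor")))
```

Keeping the configuration in option form makes it straightforward to repeat a scaling study such as Tables III and IV with different fill levels or smoothers without recompiling.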

3.3 Oldroyd-B Fluid Past a Confined Circular Cylinder

The parallel algorithm described in Section 2 is used to compute the two-dimensional viscoelastic flow past a confined circular cylinder in a channel [2, 1, 11, 19, 60, 26, 33, 39, 17, 30]. For this flow problem, we consider a circular cylinder of radius R positioned symmetrically between two parallel plates separated by a distance 2H. The blockage ratio R/H is taken equal to 0.5 and the computational domain extends a distance 12R upstream and downstream of the cylinder. The dimensionless parameters are the Reynolds number Re = ρ⟨U⟩R/η, the Weissenberg number We = λ⟨U⟩/R and the viscosity ratio β = ηs/η. The physical parameters are the density ρ, the average velocity at the inlet ⟨U⟩, the relaxation time λ, the zero-shear-rate viscosity of the fluid η and the solvent viscosity ηs. The viscosity ratio β is chosen to be 0.59, which is the value used in the benchmarks for the Oldroyd-B fluid. In this work, fully developed velocity boundary conditions are imposed at the inlet and natural (traction-free) boundary conditions are imposed at the outlet boundary. No-slip velocity boundary conditions are imposed on all solid walls. The extra stresses are computed everywhere within the computational domain and their boundary conditions are introduced through their fluxes, using the analytical values at the inlet boundary. In the present work, three different meshes are employed: coarse mesh M1 with 35,815 node points and 35,313 elements, medium mesh M2 with 141,148 node points and 140,147 elements and fine mesh M3 with 565,122 node points and 563,121 elements. Each successive mesh is generated by doubling the number of grid points on the boundaries of the previous mesh and using the square root of its stretching factors. As may be seen in Figure 4, the mesh is highly stretched on the cylinder surface, on the walls and in the wake behind the cylinder in order to resolve very strong stress gradients. The details of the mesh characteristics are provided in Table V.

In order to validate our code, the flow of an Oldroyd-B fluid past a confined circular cylinder in a channel is solved at a Weissenberg number of 0.7. The convergence of Txx with mesh refinement is shown in Figure 5 on the cylinder surface and along the center line in the wake. As seen in Figure 5, mesh convergence is evident at We = 0.7 both on the cylinder surface and along the center line in the wake. In the literature, there are some mesh refinement studies for the extra stress along the center line in the wake. However, these numerical results are not mesh convergent at We = 0.7 even though they clearly indicate a mesh convergence trend. The computed stress component Txx on the cylinder surface and along the center line in the wake is compared in Figure 6 with the results of Yurun et al. [60], Hulsen et al. [26] and Afonso et al. [2]. The extreme values of Txx are provided with the other results available in the literature in Table VI. The results of Yurun et al. [60] were obtained using a Galerkin/least-squares hp finite element method. Hulsen et al. [26] used the DEVSS/DG formulation in a FEM context with the log-conformation formulation. Afonso et al. [2] employed a structured collocated FVM based on the log-conformation formulation with a time-marching pressure-correction algorithm on meshes highly refined in the rear stagnation region. The comparison in Figure 6 shows excellent agreement on the cylinder surface with the results of Yurun et al. [60] and Hulsen et al. [26]. The lower value of Afonso et al. [2] is due to the relatively large mesh size employed on the cylinder surface; with mesh refinement, these authors presented numerical results similar to ours.
However, there is a slight difference along the center line in the wake, where our results are shifted slightly downstream. This is mainly due to the extremely small mesh size required in the wake behind the cylinder. Nevertheless, our results in the wake show remarkable agreement with the one-dimensional DG calculation of Hulsen et al. (M4-1D in Figure 6 of [26]) along the center line. That one-dimensional DG calculation was obtained on a very fine mesh by starting from the rear stagnation point and using the u-velocity component from the FEM calculation.

The calculations at higher Weissenberg numbers led to more surprising results. The calculations at a Weissenberg number of 0.8 converged to steady-state solutions on all meshes and the convergence of Txx with mesh refinement is given in Figure 7 on the cylinder surface and along the center line in the wake. At this Weissenberg number, the stress profiles show no tendency toward mesh convergence. For the calculations at We = 0.9, the numerical solution becomes even worse since the extra stress along the center line in the wake initially grows exponentially with time and no steady-state solution can be found on meshes M2 and M3. The variation of the extra stress RMS value with iteration number is given in Figure 8 on meshes M1 to M3. As may be seen, the RMS value initially grows exponentially on meshes M2 and M3. The solution then becomes time-dependent, with a behavior that depends on the mesh resolution and time step. Although the extra stress may reach values as high as $1 \times 10^4$ during the initial growth, the solution field is still quite smooth and the velocity field is divergence-free. In addition, we observe an extremely low pressure at the stagnation point behind the cylinder, which behaves like a singular point.
It should be noted that the critical Weissenberg number at which the solution becomes time-dependent is lower than in previous results in the literature: it is 1.4 for the results of Hulsen et al. [26] and 1.0 for the results of Afonso et al. [2]. At this point, we are not sure whether the extra stress along the center line in the wake should exhibit unbounded exponential growth with time or lead to a time-dependent solution for the present two-dimensional calculations. However, the calculations on the fine mesh M3 show a smoother initial exponential growth, which suggests the first case. In the literature, there are also large discrepancies in the stresses in the wake region; some researchers have suggested that there may be numerical artifacts beyond We = 0.7. For example, Yurun et al. [60] suggested that the flow of the Oldroyd-B fluid may have about the same limiting Deborah number as the UCM fluid and that solutions at higher De may be numerical artifacts. Owens and Phillips [41] pointed out that there is a singular point of the extra stress somewhere between We = 0.7 and 0.8 and that mesh convergence cannot be obtained beyond this singular point. The present large-scale calculations also indicate that no steady-state solution is possible for an Oldroyd-B fluid beyond We = 0.8.

Although the convergence of the drag coefficient with mesh refinement is not considered to be a very good indicator of accuracy, the values of the steady-state drag coefficient are tabulated in Table VII and compared with several other results available in the literature. The total drag on the cylinder surface may be expressed as follows:

\[ F_x = -\oint p \, dy + \beta \oint \frac{\partial u}{\partial x} \, dy - \beta \oint \frac{\partial u}{\partial y} \, dx + \oint T_{xx} \, dy - \oint T_{xy} \, dx \tag{34} \]

The computed drag coefficient values are in relatively good agreement with the other results available in the literature.
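The contour integral for the total drag in equation (34) can be approximated numerically once the pressure, velocity gradients and extra stresses are available on the cylinder surface. The following is a minimal sketch using a midpoint rule on a circle traversed counter-clockwise; it is not the evaluation used in the present code. The function name `drag_force`, the callables passed to it, the orientation of the contour and the assumption that β multiplies both velocity-gradient terms are choices made for this illustration.

```python
import math

def drag_force(n, radius, beta, p, dudx, dudy, txx, txy):
    """Midpoint-rule approximation of F_x on a circle of the given radius
    centred at the origin and traversed counter-clockwise.

    p, dudx, dudy, txx and txy are callables f(x, y) returning the surface
    values of the pressure, the velocity gradients and the extra stresses.
    """
    fx = 0.0
    for k in range(n):
        t0 = 2.0 * math.pi * k / n
        t1 = 2.0 * math.pi * (k + 1) / n
        tm = 0.5 * (t0 + t1)
        x, y = radius * math.cos(tm), radius * math.sin(tm)
        dx = radius * (math.cos(t1) - math.cos(t0))   # segment increment in x
        dy = radius * (math.sin(t1) - math.sin(t0))   # segment increment in y
        fx += (-p(x, y) + beta * dudx(x, y) + txx(x, y)) * dy
        fx -= (beta * dudy(x, y) + txy(x, y)) * dx
    return fx

zero = lambda x, y: 0.0

# Sanity check with a synthetic field: for p = x and all other terms zero,
# the integral reduces to minus the enclosed area, -pi * R^2.
check = drag_force(2000, 1.0, 0.59, lambda x, y: x, zero, zero, zero, zero)
print(check, -math.pi)
```

A uniform pressure or a uniform stress produces no net force on a closed contour, which provides a second quick check of any implementation of this integral.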

3.4 Oldroyd-B Fluid around a Sphere Falling in a Cylindrical Tube

The two-dimensional axisymmetric viscoelastic flow of an Oldroyd-B fluid around a sphere falling in a cylindrical tube is one of the classical benchmark problems in non-Newtonian fluid mechanics and has been studied extensively by many researchers [36, 10, 40, 59, 13, 57, 58]. The parallel algorithm described in Section 2 is validated by solving this benchmark problem in three dimensions using all-hexahedral elements. For this problem, we consider a rigid sphere of radius Rs falling with a terminal velocity Us along the axis of a cylindrical tube of radius Rt. The ratio of sphere to tube radius is taken to be 0.5 and the computational domain spans from x = −12Rs to x = 12Rs with the sphere located at x = 0. The dimensionless parameters are the Reynolds number Re = ρUsRs/η, the Weissenberg number We = λUs/Rs and the viscosity ratio β = ηs/η. The physical parameters are the density ρ, the terminal velocity Us, the relaxation time λ, the zero-shear-rate viscosity of the fluid η and the solvent viscosity ηs. The viscosity ratio β is chosen to be 0.50, which is the value used in the classical benchmark problem for the Oldroyd-B fluid. The associated boundary conditions are the prescribed unit uniform velocity along the x-axis at the inflow and on the circular tube wall, no-slip boundary conditions on the sphere surface and natural (traction-free) boundary conditions at the outflow. The extra stresses are computed everywhere as in the two-dimensional case and their boundary conditions are imposed through their fluxes, using the analytical values at the inlet boundary. The unstructured computational mesh shown partially in Figure 9 is used in our calculations. The mesh consists of 1,214,542 nodes and 1,190,376 hexahedral elements, leading to a total of 19,117,980 degrees of freedom (DOF). As may be seen in Figure 9, the mesh is highly stretched on the sphere surface, on the walls and in the wake region behind the sphere in order to resolve very strong stress gradients.
The normal mesh spacing on the sphere surface is set to $2.6 \times 10^{-3}$ while the minimum tangential mesh spacing is approximately $8 \times 10^{-3}$. There are 11,580 quadrilateral elements on the sphere surface. The mesh is created using the mapping, paving and sweeping algorithms available within the CUBIT mesh generation environment [9]. The METIS library [29] is used to partition the mesh into 128 sub-domains. The computed u-velocity component isosurfaces with a streamtrace plot are given in Figure 10 at We = 0.6 with β = 0.5. The isosurfaces are plotted directly from the element face center values by constructing a new mesh from the element face centroids. As may be seen, the maximum value of the u-velocity component occurs between the sphere and the tube wall. In addition, the computed stress component Txx isosurfaces with contour plots on the y = 0 plane (red lines) and on the solid walls (black lines) are shown in Figure 11. For this case, the computed Txx values are relatively low compared to the two-dimensional cylinder calculations. The computed stress component Txx is given in Figure 12 on the sphere surface and along the centerline of the cylindrical tube, and it is compared with the numerical results of Owens and Phillips [40] on the sphere surface. The extreme values of Txx are also provided with the other results available in the literature in Table VIII. The results of Owens and Phillips [40] were obtained using a spectral element method; Lunsmann et al. [36] used an EVSS/FEM formulation. The present maximum value of the Txx extra stress component differs by only 2.70% from the value computed by Owens and Phillips [40] and by 1.18% from the value of Lunsmann et al. [36]. However, it should be noted that the present extreme value of Txx corresponds to the value at the hexahedral element centroids next to the sphere surface. The present three-dimensional calculation required approximately 10,789 seconds on the Anodolu machine with 128 nodes.
Although we would like to perform additional calculations as in the two-dimensional case, the decoupled time integration method given in Section 2 significantly limits the allowable time step on the present fine mesh, making the calculations very expensive. In the future, we will develop a fully-coupled iterative solver in order to further improve the computational efficiency of the present numerical algorithm.
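The quoted total of 19,117,980 degrees of freedom can be related to the side-centered arrangement of the unknowns. The sketch below assumes three velocity components per element face and seven unknowns per element (the pressure plus the six independent components of the symmetric extra stress tensor); this breakdown is an inference from the description of the method rather than a statement in the text. Under that assumption it recovers the implied number of faces and boundary faces of the hexahedral mesh.

```python
elements = 1_190_376      # hexahedral elements (from the text)
total_dof = 19_117_980    # total degrees of freedom (from the text)

# Assumed unknowns per element: 1 pressure + 6 symmetric stress components.
cell_dof = 7 * elements

# Assumed unknowns per face: 3 velocity components.
face_dof = total_dof - cell_dof
faces, remainder = divmod(face_dof, 3)

# Each hexahedron has 6 faces; interior faces are shared by two elements,
# so 6 * elements = 2 * faces - boundary_faces.
boundary_faces = 2 * faces - 6 * elements

print(f"implied faces: {faces} (remainder {remainder})")
print(f"implied boundary faces: {boundary_faces}")
```

The division leaves no remainder and the implied number of boundary faces is larger than the 11,580 quadrilateral elements reported on the sphere surface, so the assumed breakdown is at least consistent with the reported mesh data.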

4 CONCLUSIONS

A new stable unstructured finite volume method is presented for parallel large-scale solution of viscoelastic fluid flows with exact mass conservation within each element. The present arrangement of the primitive variables leads to a stable numerical scheme and it does not require any ad-hoc modifications in order to enhance the pressure-velocity-stress coupling. The time stepping algorithm used decouples the calculation of the polymeric stress by solution of a hyperbolic constitutive equation from the evolution of the velocity and pressure fields by solution of a generalized Stokes problem. The resulting algebraic linear systems are solved using the FGMRES(m) Krylov iterative method with the restricted additive Schwarz preconditioner for the extra stress tensor and the geometric non-nested multilevel preconditioner for the Stokes system. The present multilevel preconditioner for the Stokes system is essential for parallel scalable viscoelastic flow computations. This is because, as is well known, one-level methods lead to non-scalable solvers since the number of iterations increases as the number of sub-domains is increased. However, it is not possible to apply the standard multigrid methods with classical smoothing techniques (e.g., Jacobi, Gauss-Seidel) to the coupled iterative solution of the momentum and continuity equations because of the zero block in the saddle point problem. In order to avoid the zero block in the saddle point problem, we use an upper triangular right preconditioner which results in a scaled discrete Laplacian instead of a zero block in the original system. The log-conformation representation proposed in [22] has been implemented in order to improve the limiting Weissenberg numbers in the proposed finite volume method.
The implementation of the preconditioned Krylov subspace algorithm, the matrix-matrix multiplication and the multilevel preconditioner was carried out using the PETSc software package [5], developed at Argonne National Laboratory, for improving the efficiency of the parallel code. The present numerical algorithm is validated for the Kovasznay flow, the flow of an Oldroyd-B fluid past a confined circular cylinder in a channel and the three-dimensional flow of an Oldroyd-B fluid around a rigid sphere falling in a cylindrical tube. The numerical results for the flow of an Oldroyd-B fluid past a confined circular cylinder at We = 0.7 indicate that mesh convergence is achieved in the wake of the cylinder. However, the numerical results at higher Weissenberg numbers indicate that no steady-state solution is possible for an Oldroyd-B fluid beyond We = 0.8. Although we employ very high mesh resolution in the wake region in order to study mesh convergence there, which allows very accurate solutions at minimum cost, the decoupled time integration method significantly limits the allowable time step. Therefore, we will develop a fully-coupled iterative solver in order to further improve the computational efficiency of the present numerical algorithm.
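The effect of an upper triangular right preconditioner on the zero block of a saddle point system can be illustrated on a small one-dimensional analogue. The sketch below is not the operator of Section 2.3: the matrices `A`, `G` and `D`, the unscaled preconditioner `P` and all helper functions are hypothetical and chosen only to show the mechanism. Multiplying the saddle point matrix from the right by `P` replaces the zero block by the product `D G`, which here is the familiar tridiagonal discrete Laplacian with Neumann ends.

```python
def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def assemble(a11, a12, a21, a22):
    """Assemble the 2 x 2 block matrix [[a11, a12], [a21, a22]]."""
    top = [r1 + r2 for r1, r2 in zip(a11, a12)]
    bottom = [r1 + r2 for r1, r2 in zip(a21, a22)]
    return top + bottom

m = 4                                   # number of pressure cells
# Discrete gradient: (G p)_i = p_{i+1} - p_i on the m - 1 interior faces.
G = [[-1.0 if j == i else 1.0 if j == i + 1 else 0.0 for j in range(m)]
     for i in range(m - 1)]
D = transpose(G)                        # stands in for the discrete divergence
A = identity(m - 1)                     # placeholder momentum block

K = assemble(A, G, D, zeros(m, m))      # saddle point matrix with a zero block
P = assemble(identity(m - 1), G, zeros(m, m - 1), identity(m))  # upper triangular
KP = matmul(K, P)

lower_right = [row[m - 1:] for row in KP[m - 1:]]
for row in lower_right:
    print(row)
```

With a nonzero diagonal in the pressure block, classical smoothers such as SOR or incomplete factorizations can be applied to the coupled system, which is the motivation given in the text.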

5 ACKNOWLEDGMENTS

The author gratefully acknowledges the use of the Chimera machine at the Faculty of Aeronautics and Astronautics at ITU, the computing resources provided by the National Center for High Performance Computing of Turkey (UYBHM) under grant number 10752009 and the computing facilities at TUBITAK ULAKBIM, High Performance and Grid Computing Center.

References

[1] M. A. Alves, F. T. Pinho and P. J. Oliveira, The flow of viscoelastic fluids past a cylinder: finite-volume high-resolution methods. J. Non-Newtonian Fluid Mech. 97, (2001), 207–232.
[2] A. Afonso, P. J. Oliveira, F. T. Pinho and M. A. Alves, The log-conformation tensor approach in the finite-volume method framework. J. Non-Newtonian Fluid Mech. 157, (2009), 55–65.
[3] R. Amit, C. A. Hall and T. A. Porsching, An application of network theory to the solution of implicit Navier-Stokes difference equations. J. Comput. Phys. 40, (1981), 183–201.
[4] W. K. Anderson and D. L. Bonhaus, An implicit upwind algorithm for computing turbulent flows on unstructured grids. Comp. & Fluids 23, (1994), 1–21.

[5] S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith and H. Zhang, PETSc Users Manual. ANL-95/11, Mathematics and Computer Science Division, Argonne National Laboratory, (2004). http://www-unix.mcs.anl.gov/petsc/petsc-2/
[6] T. J. Barth, A 3-D upwind Euler solver for unstructured meshes. AIAA Paper 91-1548-CP, (1991).
[7] M. Benzi, G. H. Golub and J. Liesen, Numerical solution of saddle point problems. Acta Numer. 14, (2005), 1–137.
[8] M. L. Bittencourt, C. C. Douglas and R. A. Feijóo, Nonnested Multigrid Methods for Linear Problems. Num. Meth. for Partial Diff. Eqs. 17, (2001), 313–331.
[9] T. D. Blacker, S. Benzley, S. Jankovich, R. Kerr, J. Kraftcheck, R. Kerr, P. Knupp, R. Leland, D. Melander, R. Meyers, S. Mitchell, J. Shepard, T. Tautges and D. White, CUBIT Mesh Generation Environment Users Manual Volume 1. Sandia National Laboratories: Albuquerque, NM (1999).
[10] C. Bodart and M. J. Crochet, The time-dependent flow of a viscoelastic fluid around a sphere. J. Non-Newtonian Fluid Mech. 54, (1994), 303–329.
[11] A. E. Caola, Y. L. Joo, R. C. Armstrong and R. A. Brown, Highly parallel time integration of viscoelastic flows. J. Non-Newtonian Fluid Mech. 100, (2001), 191–216.
[12] Z. Castillo, X. Xie, D. C. Sorensen, M. Embree and M. Pasquali, Parallel solution of large-scale free surface viscoelastic flows via sparse approximate inverse preconditioning. J. Non-Newtonian Fluid Mech. 157, (2009), 44–54.
[13] C. Chauvière and R. G. Owens, How accurate is your solution?: Error indicators for viscoelastic flow calculations. J. Non-Newtonian Fluid Mech. 95, (2000), 1–33.
[14] C. Chauvière and R. G. Owens, A new spectral element method for the reliable computation of viscoelastic flow. Comput. Methods Appl. Mech. Eng. 190, (2001), 3999–4018.
[15] M. Crouzeix and P. A. Raviart, Conforming and nonconforming finite element methods for solving the stationary Stokes equations. RAIRO Anal. Numér. 7, (1973), 33–76.
[16] E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices. 24th ACM National Conference, (1969), 157–172.
[17] O. M. Coronado, D. Arora, M. Behr and M. Pasquali, A simple method for simulating general viscoelastic fluid flows with an alternate log-conformation formulation. J. Non-Newtonian Fluid Mech. 147, (2007), 189–199.
[18] Y. Dimakopoulos, An efficient parallel and fully implicit algorithm for the simulation of transient free-surface flows of multimode viscoelastic liquids. J. Non-Newtonian Fluid Mech. 165, (2010), 409–424.
[19] H.-S. Dou and N. Phan-Thien, Parallelisation of an unstructured finite volume code with PVM: Viscoelastic flow around a cylinder. J. Non-Newtonian Fluid Mech. 77, (1998), 21–51.
[20] H. C. Elman, Preconditioning for the steady-state Navier-Stokes equations with low viscosity. SIAM J. Sci. Comput. 20, (1999), 1299–1316.
[21] H. C. Elman, V. E. Howle, J. N. Shadid and R. S. Tuminaro, A parallel block multi-level preconditioner for the 3D incompressible Navier-Stokes equations. J. Comput. Phys. 187, (2003), 504–523.
[22] R. Fattal and R. Kupferman, Constitutive laws for the matrix-logarithm of the conformation tensor. J. Non-Newtonian Fluid Mech. 123, (2004), 281–285.
[23] R. Guénette and M. Fortin, A new mixed finite element method for computing viscoelastic flows. J. Non-Newtonian Fluid Mech. 60, (1995), 27–52.
[24] W. Hackbusch, Multigrid Methods and Applications. Springer-Verlag, Heidelberg, (1985).
[25] F. H. Harlow and J. E. Welch, Numerical calculation of time-dependent viscous incompressible flow of fluid with free surface. Phys. Fluids 8, (1965), 2182–2189.
[26] M. A. Hulsen, R. Fattal and R. Kupferman, Flow of viscoelastic fluids past a cylinder at high Weissenberg number: Stabilized simulations using matrix logarithms. J. Non-Newtonian Fluid Mech. 127, (2005), 27–39.
[27] Y. H. Hwang, Calculations of incompressible flow on a staggered triangle grid, Part I: Mathematical formulation. Numer. Heat Transfer B 27, (1995), 323–1995.
[28] R. Falgout, A. Baker, E. Chow, V. E. Henson, E. Hill, J. Jones, T. Kolev, B. Lee, J. Painter, C. Tong, P. Vassilevski and U. M. Yang, Users manual, HYPRE High Performance Preconditioners. UCRL-MA-137155 DR, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, (2002). http://www.llnl.gov/CASC/hypre/
[29] G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, (1998), 359–392.
[30] A. Kane, R. Guénette and A. Fortin, A comparison of four implementations of the log-conformation formulation for viscoelastic fluid flows. J. Non-Newtonian Fluid Mech. 164, (2009), 45–50.
[31] D. Kay, D. Loghin and A. J. Wathen, A preconditioner for the steady-state Navier-Stokes equations. SIAM J. Sci. Comput. 24, (2002), 237–256.
[32] L. I. G. Kovasznay, Laminar flow behind a two-dimensional grid. Proc. Cambridge Phil. Soc. 44, (1948), 58–62.
[33] J. M. Kim, C. Kim, K. H. Ahn and S. J. Lee, An efficient iterative solver and high-resolution computations of the Oldroyd-B fluid flow past a confined cylinder. J. Non-Newtonian Fluid Mech. 123, (2004), 161–173.
[34] P. T. Lin, M. Sala, J. N. Shadid and R. S. Tuminaro, Performance of a Geometric and an Algebraic Multilevel Preconditioner for Incompressible Flow with Transport. Proceedings of Computational Mechanics WCCM VI in conjunction with APCOM'04, Sept. 5–10, (2004), Beijing, China.
[35] P. T. Lin, M. Sala, J. N. Shadid and R. S. Tuminaro, Performance of fully-coupled algebraic multilevel domain decomposition preconditioners for incompressible flow and transport. Int. J. Numer. Meth. Engng 19, (2004), 1–10.
[36] W. J. Lunsmann, L. Genieser, R. C. Armstrong and R. A. Brown, Finite element analysis of steady viscoelastic flow around a sphere in a tube: calculations with constant viscosity models. J. Non-Newtonian Fluid Mech. 48, (1993), 63–99.
[37] D. J. Mavriplis, Multigrid solution strategies for adaptive meshing problems. NASA/CP-3316, (1995).
[38] P. J. Oliveira, F. T. Pinho and G. A. Pinto, Numerical simulation of non-linear elastic flows with a general collocated finite-volume method. J. Non-Newtonian Fluid Mech. 79, (1998), 1–43.
[39] R. G. Owens, C. Chauvière and T. N. Phillips, A locally-upwinded spectral technique (LUST) for viscoelastic flows. J. Non-Newtonian Fluid Mech. 108, (2002), 49–71.
[40] R. G. Owens and T. N. Phillips, Steady viscoelastic flow past a sphere using spectral elements. Int. J. Num. Meth. Engrg. 39, (1996), 1517–1534.
[41] R. G. Owens and T. N. Phillips, Computational Rheology. Imperial College Press, London, (2002).
[42] H. M. Park and J. Y. Lim, A new numerical algorithm for viscoelastic fluid flows: The grid-by-grid inversion method. J. Non-Newtonian Fluid Mech. 165, (2010), 238–246.
[43] M. G. N. Perera and K. Walters, Long-range memory effects in flows involving abrupt changes in geometry. J. Non-Newtonian Fluid Mech. 2, (1977), 49–81.
[44] R. Rannacher and S. Turek, A simple nonconforming element. Numer. Meth. PDEs 8, (1992), 97–111. [45] C. M. Rhie and W. L. Chow, Numerical study of the turbulent flow past an airfoil with trailing edge separation. AIAA J. 21 (1983), 1525–1532. [46] S. Rida, F. McKenty, F. L. Meng and M. Reggio, A staggered control volume scheme for unstructured triangular grids. Int. J. Numer. Meth. Fluids 25, (1997), 697–717. [47] M. Rozloznik, Saddle point problems, iterative solution and preconditioning: a short overview, Proceedings of the XVth Summer School Software and Algorithms of Numerical Mathematics, I. Marek (Ed.), University of West Bohemia, Pilsen, (2003), 97–108. [48] Y. Saad, A flexible inner-product preconditioned GMRES algorithm. SIAM J. Sci. Statist. Comput. 14, (1993), 461–469. [49] M. Sahin, A preconditioned semi-staggered dilation-free finite volume method for the incompressible Navier-Stokes equations on all-hexahedral elements. Int. J. Numer. Meth. Fluids 49, (2005), 959–974. [50] M. Sahin and H. J. Wilson, A semi-staggered dilation-free finite volume method for the numerical solution of viscoelastic fluid flows on all-hexahedral elements. J. Non-Newtonian Fluid Mech. 147, (2007), 79–91. [51] P. R. Schunk, M. A. Heroux, R. R. Rao, T. A. Baer, S. R. Subia and A. C. Sun, Iterative solvers and preconditioners for fully-coupled finite element formulations of incompressible fluid mechanics and related transport problems. SAND2001-3512J, Sandia National Laboratories Albuuquerque, New Mexico, (2001). [52] J. Sun, N. Phan-Thien and R. I. Tanner, An adaptive viscoelastic stress splitting scheme and its applications: AVSS/SI and AVSS/SUPG. J. Non-Newtonian Fluid Mech. 65, (1996), 75-91. [53] S. P. Vanka, Block-implicit multigrid solutions of Navier-Stokes equations in primitive variables. J. Comput. Phys. 65, (1986), 138–158. [54] R. Vefrth, A multilevel algorithm for mixed problems. SIAM J. Numer. Anal. 21, (1984), 264–271. [55] P. Wapperom and M. F. 
Webster, A second-order hybrid finite-element/volume method for viscoelastic flows. J. Non-Newtonian Fluid Mech. 79, (1998), 405–431. [56] P. Wesseling, An introduction to multigrid methods. John Wiley & Sons, New York, (1992). [57] F. Yurun, Solution behavior of the falling sphere problem in viscoelastic flows. Acta Mechanica Sinica 19, (2003), 394–408. [58] F. Yurun, Limiting behavior of the solutions of a falling sphere in a tube filled with viscoelastic fluids. J. Non-Newtonian Fluid Mech. 110, (2003), 77–102. [59] F. Yurun, M. J. Crochet, High-order finite element methods for steady viscoelastic flows. J. NonNewtonian Fluid Mech. 57, (1995), 283–311. [60] F. Yurun, R. I. Tanner and N. Phan-Thien, Galerkin/least-square finite element methods for steady viscoelastic flows. J. Non-Newtonian Fluid Mech. 84, (1999), 233–256.


Table I: The two-level multigrid method is given below. P and R are prolongation and restriction operators, respectively.

1. Perform ν1 steps of the iterative method for solving Ax = b.
2. Compute the residual r = b − Ax.
3. Solve for the error on the coarse level: RAP e = Ac e = Rr.
4. Compute the new x value: x = x + Pe.
5. Perform ν1 steps of the iterative method for solving Ax = b.
6. Check convergence. If ‖r‖2 > rtol, go to 1.
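The steps of Table I can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not part of the present method: the model problem is a one-dimensional Poisson matrix rather than the Stokes system, weighted Jacobi stands in for the SOR and ILU smoothers, the cycle is run as a stand-alone iteration rather than as a preconditioner for FGMRES(m), and all function names (`poisson_1d`, `prolongation_1d`, `smooth`, `two_level_solve`) are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Dense 1D Poisson matrix on n interior points (model problem, an assumption)."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def prolongation_1d(nc):
    """Linear interpolation P from nc coarse points to 2*nc + 1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        i = 2 * j + 1              # fine-grid index of coarse point j
        P[i - 1, j] = 0.5
        P[i, j] = 1.0
        P[i + 1, j] = 0.5
    return P

def smooth(A, x, b, steps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps (stand-in for the SOR/ILU smoothers)."""
    d = np.diag(A)
    for _ in range(steps):
        x = x + omega * (b - A @ x) / d
    return x

def two_level_solve(A, b, P, R, nu=2, rtol=1e-8, max_cycles=200):
    """Repeat the two-level cycle of Table I until the relative residual <= rtol."""
    Ac = R @ A @ P                          # Galerkin coarse operator RAP
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for cycle in range(1, max_cycles + 1):
        x = smooth(A, x, b, nu)             # step 1: pre-smoothing
        r = b - A @ x                       # step 2: residual
        e = np.linalg.solve(Ac, R @ r)      # step 3: coarse-level error
        x = x + P @ e                       # step 4: correction
        x = smooth(A, x, b, nu)             # step 5: post-smoothing
        if np.linalg.norm(b - A @ x) <= rtol * b_norm:   # step 6
            return x, cycle
    return x, max_cycles

A = poisson_1d(63)
P = prolongation_1d(31)
R = 0.5 * P.T                               # full-weighting restriction
b = np.ones(63)
x, cycles = two_level_solve(A, b, P, R)
print("cycles:", cycles)
```

Taking R proportional to the transpose of P keeps the coarse operator RAP symmetric for a symmetric A, which is a common choice for this kind of model problem; the non-nested geometric operators used for the Stokes system are not reproduced here.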

Table II: The description of quadrilateral meshes used for the Newtonian Kovasznay flow.

Structured Mesh
Mesh   Number of Nodes   Number of Elements   DOF
U1     441               400                  2080
U2     1681              1600                 8160
U3     6561              6400                 32320
U4     25921             25600                128640
U5     103041            102400               513280

Unstructured Mesh
Mesh   Number of Nodes   Number of Elements   DOF
M1     314               273                  1445
M2     1133              1052                 5420
M3     4393              4232                 21480
M4     17635             17314                82765
M5     66591             65950                331030
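The DOF columns follow from the variable arrangement: two velocity components per cell face and one pressure value per element, plus the extra stress components per element in the viscoelastic case. The short sketch below counts them, assuming that the number of faces is obtained from Euler's formula for a planar mesh (faces = nodes + elements − 1 + number of holes) and that three independent stress components are stored per element in two dimensions. The function names are hypothetical. Under these assumptions the count reproduces the DOF column for the structured meshes U1–U5 above and for the cylinder meshes M1–M3 of Table V.

```python
def face_count_2d(n_nodes, n_elements, n_holes=0):
    """Number of cell faces (edges) of a planar mesh from Euler's formula."""
    return n_nodes + n_elements - 1 + n_holes

def dof_2d(n_nodes, n_elements, n_holes=0, n_stress=0):
    """Two velocity components per face, pressure and stress per element."""
    n_faces = face_count_2d(n_nodes, n_elements, n_holes)
    return 2 * n_faces + (1 + n_stress) * n_elements

# Structured Kovasznay meshes of Table II: (nodes, elements), Newtonian flow.
structured = {"U1": (441, 400), "U2": (1681, 1600), "U3": (6561, 6400),
              "U4": (25921, 25600), "U5": (103041, 102400)}
newtonian_dof = {name: dof_2d(v, e) for name, (v, e) in structured.items()}

# Confined-cylinder meshes of Table V: one hole, three stress components.
cylinder = {"M1": (35815, 35313), "M2": (141148, 140147), "M3": (565122, 563121)}
viscoelastic_dof = {name: dof_2d(v, e, n_holes=1, n_stress=3)
                    for name, (v, e) in cylinder.items()}

print(newtonian_dof)
print(viscoelastic_dof)
```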

Table III: The change of iteration number and computation time for the two-dimensional lid-driven Stokes flow using the one- and two-level preconditioned iterative methods given in Section 2.2 on an SGI Altix 3000 (1300 MHz, Itanium 2) for the present method and the MAC scheme [25]. The relative residual is set to rtol = 10^-8.

                             501 × 501                     1001 × 1001
Method            Precond.   Proc.   Iter.   Time         Proc.   Iter.   Time
                             Num.    Num.    [sec]        Num.    Num.    [sec]
1-Level Present   ILU(0)     8       −       −            32      −       −
                  ILU(1)     8       2860    577          32      −       −
                  ILU(2)     8       1300    353          32      5905    1645
                  ILU(3)     8       757     248          32      3544    1078
                  ILU(4)     8       528     182          32      3349    1126
                  ILU(5)     8       396     158          32      2156    784
2-Level Present   MG-SOR     8       19      37           32      19      42
                  MG-ILU     8       12      30           32      15      36
1-Level MAC       ILU(0)     8       −       −            32      −       −
                  ILU(1)     8       3540    444          32      −       −
                  ILU(2)     8       1368    189          32      8075    1138
                  ILU(3)     8       581     84           32      3896    582
                  ILU(4)     8       462     70           32      2186    354
                  ILU(5)     8       371     63           32      1669    286

Table IV: The change of iteration number and computation time for the three-dimensional cubic lid-driven Stokes flow using the one- and two-level preconditioned iterative methods given in Section 2.2 on an SGI Altix 3000 (1300 MHz, Itanium 2) for the present method. The relative residual is set to rtol = 10^-8.

                             51 × 51 × 51                  101 × 101 × 101
Method            Precond.   Proc.   Iter.   Time         Proc.   Iter.   Time
                             Num.    Num.    [sec]        Num.    Num.    [sec]
1-Level Present   ILU(0)     4       361     244          32      1279    767
                  ILU(1)     4       −       −            32      −       −
2-Level Present   MG-SOR     4       14      133          32      14      154
                  MG-ILU     4       11      129          32      11      147

Table V: The description of quadrilateral meshes used for an Oldroyd-B fluid past a confined circular cylinder in a channel with R/H = 0.5. ∆rmin is the minimum normal mesh spacing and ∆Smin and ∆Smax are the minimum and maximum tangential mesh spacing on the cylinder surface.

Mesh   Number of Nodes   Number of Elements   DOF       ∆rmin/R   ∆Smin/R   ∆Smax/R
M1     35815             35313                283508    0.00494   0.0004    0.031415
M2     141148            140147               1123178   0.00241   0.0002    0.015707
M3     565122            563121               4508970   0.00119   0.0001    0.007853

Table VI: The comparison of maximum value of extra stress tensor component Txx at the cylinder wall and in the wake region at We = 0.7 for an Oldroyd-B fluid with β = 0.59.

Authors                      Maximum value of Txx       Maximum value of Txx
                             at the cylinder wall       in the wake region
Present (M3)                 106.80                     42.33
Yurun et al. [60]            106.77                     40.05
Chauvière and Owens [14]     106.4                      37.1
Kim et al. [33]              107.7                      38.8
Hulsen et al. (M7) [26]      107.73                     38.92
Afonso et al. [2]            100.98                     40.79

Table VII: The comparison of the dimensionless drag coefficient for an Oldroyd-B fluid past a confined circular cylinder in a channel.

We    M1        M2        M3        Hulsen    Yurun     Owens     Afonso    Kim       Caola
                                    et al.    et al.    et al.    et al.    et al.    et al.
                                    [26]      [60]      [39]      [2]       [33]      [11]
0.0   −         −         −         132.358   132.36    132.357   −         132.354   132.384
0.1   130.301   130.384   130.387   130.363   130.36    −         −         130.359   −
0.2   126.566   126.647   126.649   126.626   126.62    −         −         126.622   −
0.3   123.133   123.213   123.215   123.193   123.19    −         −         123.118   −
0.4   120.535   120.613   120.615   120.596   120.59    −         −         120.589   −
0.5   118.772   118.848   118.849   118.836   118.83    118.827   118.818   118.824   118.763
0.6   117.722   117.797   117.798   117.792   117.77    117.775   117.774   117.774   −
0.7   117.263   117.337   117.339   117.340   117.32    117.291   117.323   117.315   −
0.8   117.297   117.374   117.376   117.373   117.36    117.237   117.364   117.351   −

Table VIII: The comparison of maximum value of extra stress tensor component Txx on the sphere surface and in the wake region at We = 0.6 for an Oldroyd-B fluid with β = 0.50.

Authors                    Maximum value of Txx       Maximum value of Txx
                           on the sphere surface      in the wake region
Present                    34.73                      5.12
Lunsmann et al. [36]       35.14                      −
Owens and Phillips [40]    35.67                      −

Figure 1: Two-dimensional unstructured mesh with a dual control volume.

Figure 2: Three-dimensional unstructured mesh with a dual control volume.


[Log-log plot: Error versus ∆h for the MAC scheme and for the present method on the structured (U1–U5) and unstructured (M1–M5) meshes, with a 2:1 reference slope.]

Figure 3: The spatial convergence of the error ($\|u - u_{exact}\|_2/\sqrt{N_u}$) with mesh refinement ($\Delta h = 1/\sqrt{N_e}$) for the Kovasznay flow at Re = 40.
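The error measure in the caption of Figure 3 can be evaluated with a short script. The sketch below assumes the commonly quoted closed form of the Kovasznay solution [32], with λ = Re/2 − (Re²/4 + 4π²)^(1/2); the computational domain and the pressure normalization used for the reported calculations are not restated here, so those details are assumptions. The function names are hypothetical, and the errors passed to `observed_order` are synthetic values proportional to ∆h², chosen only to show how a slope of 2 would be read off.

```python
import numpy as np

def kovasznay(x, y, Re=40.0):
    """Closed-form Kovasznay velocity and pressure (standard form, assumed)."""
    lam = Re / 2.0 - np.sqrt(Re**2 / 4.0 + 4.0 * np.pi**2)
    u = 1.0 - np.exp(lam * x) * np.cos(2.0 * np.pi * y)
    v = lam / (2.0 * np.pi) * np.exp(lam * x) * np.sin(2.0 * np.pi * y)
    p = 0.5 * (1.0 - np.exp(2.0 * lam * x))
    return u, v, p, lam

def error_norm(u_num, u_exact):
    """Discrete error ||u - u_exact||_2 / sqrt(N_u) over N_u velocity unknowns."""
    diff = np.asarray(u_num, dtype=float) - np.asarray(u_exact, dtype=float)
    return np.linalg.norm(diff.ravel()) / np.sqrt(diff.size)

def observed_order(e_coarse, e_fine, h_coarse, h_fine):
    """Slope of the error curve between two meshes on a log-log plot."""
    return np.log(e_coarse / e_fine) / np.log(h_coarse / h_fine)

# Mesh sizes from the element counts of meshes U1 and U2: dh = 1 / sqrt(N_e).
h1, h2 = 1.0 / np.sqrt(400), 1.0 / np.sqrt(1600)

# Synthetic second-order errors (NOT results from the paper).
order = observed_order(3.0 * h1**2, 3.0 * h2**2, h1, h2)

# Divergence check of the exact field by central differences at one point.
x0, y0, d = 0.3, 0.2, 1e-5
du_dx = (kovasznay(x0 + d, y0)[0] - kovasznay(x0 - d, y0)[0]) / (2 * d)
dv_dy = (kovasznay(x0, y0 + d)[1] - kovasznay(x0, y0 - d)[1]) / (2 * d)
divergence = du_dx + dv_dy
print(order, divergence)
```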

Figure 4: The computational coarse mesh M1 for an Oldroyd-B fluid past a confined circular cylinder in a channel with R/H = 0.5. The total number of nodes is 35,815 and the total number of quadrilateral elements is 35,313.


[Plot: Txx versus x/R for Mesh M1, Mesh M2 and Mesh M3.]

Figure 5: The convergence of Txx with mesh refinement on the cylinder surface and in the cylinder wake at We = 0.7 with β = 0.59 for an Oldroyd-B fluid.

[Plot: Txx versus x/R for Mesh M3, Yurun et al., Hulsen et al. and Afonso et al.]

Figure 6: The comparison of Txx with the numerical results of Yurun et al. [60], Hulsen et al. [26] and Afonso et al. [2] on the cylinder surface and in the cylinder wake at We = 0.7 with β = 0.59 for an Oldroyd-B fluid.



[Plot: Txx versus x/R.]

Figure 7: The convergence of Txx with mesh refinement on the cylinder surface and in the cylinder wake at We = 0.8 with β = 0.59 for an Oldroyd-B fluid.

[Plot: RMS (logarithmic scale) versus iteration number.]

Figure 8: The RMS convergence for the extra stress tensor at We = 0.9 with β = 0.59 for an Oldroyd-B fluid past a confined cylinder (∆t = 0.01).


Figure 9: Partial view of the computational mesh for a sphere falling in a circular tube (Rs/Rt = 0.5). The mesh is highly refined in the wake region. The total number of nodes is 1,214,542 and the total number of hexahedral elements is 1,190,376.

Figure 10: The computed u-velocity component isosurfaces with streamtrace plot for an Oldroyd-B fluid around a falling sphere in a circular tube at We = 0.6 with β = 0.5. The contour levels are 0.0, 0.4, 0.8, 1.2 and 1.6.


Figure 11: The computed Txx extra stress tensor component isosurfaces with contour plots on the y = 0 plane (red lines) and on solid walls (black lines) for an Oldroyd-B fluid around a falling sphere in a circular tube at We = 0.6 with β = 0.5. The contour levels are 0.1, 1, 2, 4 and 8.

[Plot: Txx versus x/R for the present method and for Owens and Phillips.]

Figure 12: The comparison of Txx with the results of Owens and Phillips [40] on the sphere surface for an Oldroyd-B fluid around a falling sphere in a circular tube at We = 0.6 with β = 0.5.
