Parallel multigrid methods for Geodynamics

U. Rüde (FAU Erlangen, CERFACS)
joint work with B. Gmeiner, D. Thoennes, H. Stengel, H. Köstler (FAU); C. Waluga, M. Huber, L. John, D. Drzisga, B. Wohlmuth (TUM); M. Mohr, S. Bauer, J. Weismüller, P. Bunge (LMU)

Lehrstuhl für Simulation FAU Erlangen-Nürnberg

CERFACS Toulouse

Eleventh Workshop on Mathematical Modeling of Environmental and Life Sciences Problems
October 12-16, 2016
Organized by the Romanian Academy and the "Ovidius" University

Dedicated to the memory of Hans Petter Langtangen (1962 - 2016)


Mantle Convection

Why mantle convection?
driving force for plate tectonics
mountain building and earthquakes

Why exascale?
the mantle has a volume of ~10^12 km^3
inversion and UQ blow up the cost

Why TerraNeo?
ultra-scalable and fast
sustainable framework

Challenges
computer science: software design for future exascale systems
mathematics: HPC performance oriented metrics
geophysics: model complexity and uncertainty
bridging disciplines: integrated co-design


Multi-PetaFlops Supercomputers

Sunway TaihuLight: SW26010 processors, 10,649,600 cores, 260 cores (1.45 GHz) per node, 32 GiB RAM per node, 125 PFlops peak, power consumption 15.37 MW, TOP 500: #1

JUQUEEN: Blue Gene/Q architecture, 458,752 PowerPC A2 cores, 16 cores (1.6 GHz) per node, 16 GiB RAM per node, 5D torus interconnect, 5.8 PFlops peak, TOP 500: #13

SuperMUC (phase 1): Intel Xeon architecture, 147,456 cores, 16 cores (2.7 GHz) per node, 32 GiB RAM per node, pruned tree interconnect, 3.2 PFlops peak, TOP 500: #27

How big PDE problems can we solve?

400 TByte main memory = 4·10^14 Bytes = 5 vectors with 10^13 elements each (8 Bytes per element, double precision).
Even with a sparse matrix format, storing a matrix of dimension 10^13 is not possible on JUQUEEN: a matrix-free implementation is necessary.

Which algorithm? Multigrid:
asymptotically optimal complexity: Cost = C·N with a "moderate" constant C
does it parallelize well? what is the overhead?
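The memory argument can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope estimate in Python; the figure of roughly 15 nonzeros per row for a linear tetrahedral stencil is an illustrative assumption, not a number from the talk.

```python
# Back-of-envelope check of the memory figures above (double precision).
N = 1e13                 # unknowns
B = 8                    # bytes per double

vectors = 5 * N * B                    # five solution/work vectors
# assumption: ~15 nonzeros per row for a P1 tetrahedral discretization,
# each nonzero stored as value (8 B) + column index (4 B)
sparse_matrix = N * 15 * (8 + 4)

print(f"5 vectors:     {vectors / 1e12:7.0f} TByte")       # ~400 TByte: barely fits
print(f"sparse matrix: {sparse_matrix / 1e12:7.0f} TByte")  # ~1800 TByte: impossible
```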


Hierarchical Hybrid Grids (HHG)

B. Bergen, F. Hülsemann, U. Rüde, G. Wellein: "Is 1.7×10^10 unknowns the largest finite element system that can be solved today?", SuperComputing, 2005 (ISC Award 2006).
B. Gmeiner, H. Köstler, M. Stürmer, U. Rüde: Parallel multigrid on hierarchical hybrid grids: a performance study on current high performance computing clusters, Concurrency and Computation: Practice and Experience, 26 (2014), pp. 217-240.

Parallelize "plain vanilla" multigrid for tetrahedral finite elements:
partition the domain
parallelize all operations on all grids
use clever data structures
matrix-free implementation

Bey's tetrahedral refinement

Do not worry (so much) about coarse grids:
idle processors? short messages? sequential dependencies in the grid hierarchy?

Elliptic problems always require global communication. This cannot be accomplished by local relaxation, Krylov space acceleration, or domain decomposition without a coarse grid.
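To make the "matrix-free" point concrete, the following sketch applies the operator on the fly as a stencil inside a recursive V-cycle. It is a minimal 1D Poisson example for illustration only; HHG works on block-structured refinements of tetrahedral meshes in C++, and its smoothers, transfer operators, and data structures are far more elaborate.

```python
import numpy as np

def apply_A(u, h):
    """Matrix-free application of the 1D Laplace stencil (-1, 2, -1)/h^2."""
    r = np.zeros_like(u)
    r[1:-1] = (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def jacobi(u, f, h, omega=0.6, iters=2):
    """Damped Jacobi smoothing; boundary values stay fixed (Dirichlet)."""
    for _ in range(iters):
        r = f - apply_A(u, h)
        u[1:-1] += omega * (h**2 / 2.0) * r[1:-1]
    return u

def v_cycle(u, f, h):
    """One multigrid V-cycle; the operator is never assembled as a matrix."""
    if u.size <= 3:                        # coarsest grid: just smooth a lot
        return jacobi(u, f, h, iters=50)
    u = jacobi(u, f, h)                    # pre-smoothing
    r = f - apply_A(u, h)                  # fine-grid residual
    rc = r[::2].copy()                     # restriction (injection, for brevity)
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)
    e = np.empty_like(u)                   # prolongation: linear interpolation
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e
    return jacobi(u, f, h)                 # post-smoothing

# usage: n = 2^k + 1 points on [0, 1], homogeneous Dirichlet boundary
n, h = 129, 1.0 / 128
f, u = np.ones(n), np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
```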


HHG: A modern architecture for FE computations

Geometrical hierarchy: Volume, Face, Edge, Vertex

[Figure: Interior, Edge, and Vertex unknowns in their linearization & memory representation; copying to update ghost points within a process, MPI messages to update ghost points between processes.]
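As a rough illustration of the ghost-point update pattern, here is a generic 1D halo exchange with mpi4py. It is an assumption-laden stand-in: HHG exchanges data between volume, face, edge, and vertex primitives through its own C++ communication layer, not via this toy pattern.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 8                                   # interior points per subdomain
u = np.full(n + 2, float(rank))         # one ghost point on each side

left  = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# my boundary interior values fill my neighbours' ghost points,
# their boundary values fill mine (the analogue of the copy/MPI updates above)
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[n + 1:n + 2], source=right)
comm.Sendrecv(sendbuf=u[n:n + 1], dest=right, recvbuf=u[0:1], source=left)
```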

Application to Earth Mantle Convection Models

Ghattas, O., et al.: An Extreme-Scale Implicit Solver for Complex PDEs: Highly Heterogeneous Flow in Earth's Mantle. Gordon Bell Prize 2015.
Weismüller, J., Gmeiner, B., Ghelichkhan, S., Huber, M., John, L., Wohlmuth, B., UR, Bunge, H. P. (2015). Fast asthenosphere motion in high-resolution global mantle flow models. Geophysical Research Letters, 42(18), 7429-7435.


HHG Solver for the Stokes System

Motivated by the Earth mantle convection problem: Boussinesq model for mantle convection problems, derived from the balance of forces and the conservation of mass and energy:

$$-\nabla\cdot\bigl(2\eta\,\epsilon(u)\bigr) + \nabla p = \rho(T)\,g, \qquad \nabla\cdot u = 0,$$
$$\frac{\partial T}{\partial t} + u\cdot\nabla T - \nabla\cdot(\kappa\,\nabla T) = \gamma,$$

with u velocity, p dynamic pressure, T temperature, \eta viscosity of the material, \epsilon(u) = \tfrac{1}{2}(\nabla u + (\nabla u)^T) strain rate tensor, \rho density, \kappa thermal conductivity, \gamma heat sources, g gravity vector.

Gmeiner, Waluga, Stengel, Wohlmuth, UR: Performance and Scalability of Hierarchical Hybrid Multigrid Solvers for Stokes Systems, SISC, 2015.

Solution of the Stokes equations: scale up to ~10^13 nodes/DoFs, resolving the whole Earth mantle globally with 1 km resolution.


Coupled Flow-Transport Problem

In dimensionless form, we obtain:

$$-\Delta u + \nabla p = \mathrm{Ra}\,T\,\hat r, \qquad \operatorname{div} u = 0,$$
$$\partial_t T + u\cdot\nabla T = \mathrm{Pe}^{-1}\,\Delta T,$$

with u velocity, T temperature, p pressure, \hat r radial unit vector.

Dimensionless numbers:
Rayleigh number (Ra): determines the presence and strength of convection
Peclet number (Pe): ratio of convective to diffusive transport

The mantle convection problem is defined by the model equations in the interior of the spherical shell together with suitable initial/boundary conditions.

Example run: 6.5×10^9 DoF, 10 000 time steps, run time 7 days on a mid-size cluster (288 compute cores in 9 nodes of the LSS cluster at FAU).

Gmeiner, UR, Stengel, Waluga, Wohlmuth: Towards Textbook Efficiency for Parallel Multigrid, Journal of Numerical Mathematics: Theory, Methods and Applications, 2015.
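The coupled system above is typically advanced by operator splitting: each time step solves the quasi-static Stokes problem for the current temperature and then transports the temperature with the resulting velocity. The sketch below only illustrates that structure; solve_stokes and advect_diffuse are hypothetical placeholders, not TerraNeo/HHG functions.

```python
def time_loop(T, dt, n_steps, solve_stokes, advect_diffuse):
    """Schematic splitting scheme for the dimensionless Boussinesq system.

    solve_stokes(T)          -> (u, p): solve -Lap(u) + grad p = Ra*T*r_hat, div u = 0
    advect_diffuse(T, u, dt) -> T:      one step of dT/dt + u.grad T = Pe^-1 * Lap(T)
    Both callbacks stand in for the actual (multigrid) solvers.
    """
    for step in range(n_steps):
        u, p = solve_stokes(T)          # velocity follows the current buoyancy Ra*T
        T = advect_diffuse(T, u, dt)    # temperature is advected and diffused
    return u, p, T
```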


Pushing the limits …

Gmeiner B., Huber M., John L., UR, Wohlmuth B. (2015), A quantitative performance analysis for Stokes solvers at the extreme scale, arXiv:1511.02134, accepted in JOCS.

Multigrid with Uzawa smoother, optimized for minimal memory consumption:
10^13 unknowns correspond to 80 TByte for the solution vector
JUQUEEN has 450 TByte memory
matrix-free implementation essential

Weak scaling: the initial mesh consists of 240 tetrahedra for the case of 5 nodes and 80 threads; the number of degrees of freedom on the coarse grid grows from 9.0·10^3 to 4.1·10^7 over the weak scaling. We consider the Stokes system with the Laplace-operator formulation. The relative accuracies for the coarse grid solver (PMINRES and CG) are set to 10^-3 and 10^-4, respectively.

nodes | threads | DoFs | iter | time | time w.c.g. | time c.g. in %
5 | 80 | 2.7·10^9 | 10 | 685.88 | 678.77 | 1.04
40 | 640 | 2.1·10^10 | 10 | 703.69 | 686.24 | 2.48
320 | 5 120 | 1.2·10^11 | 10 | 741.86 | 709.88 | 4.31
2 560 | 40 960 | 1.7·10^12 | 9 | 720.24 | 671.63 | 6.75
20 480 | 327 680 | 1.1·10^13 | 9 | 776.09 | 681.91 | 12.14

Table 10: Weak scaling results with and without coarse grid (w.c.g.) for the spherical shell geometry.
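The Uzawa smoother mentioned above acts directly on the saddle-point structure of the discrete Stokes system. As a reference point, here is a plain inexact Uzawa iteration for a small dense system with blocks A and B; it is a generic textbook form with assumed damping parameters, not the smoother actually implemented in HHG.

```python
import numpy as np

def uzawa(A, B, f, g, omega=0.5, iters=100):
    """Inexact Uzawa iteration for  A u + B^T p = f,  B u = g.

    Velocity update uses diag(A) as a cheap preconditioner, the pressure is
    corrected with the divergence residual (sketch only, parameters assumed).
    """
    n, m = A.shape[0], B.shape[0]
    u, p = np.zeros(n), np.zeros(m)
    Dinv = 1.0 / np.diag(A)
    for _ in range(iters):
        u += Dinv * (f - A @ u - B.T @ p)   # smooth the momentum equation
        p += omega * (B @ u - g)            # correct pressure with div residual
    return u, p
```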

Uncertainty Quantification with Multilevel Monte Carlo Methods

Cliffe, K. A., Giles, M. B., Scheichl, R., Teckentrup, A. L. (2011). Multilevel Monte Carlo methods and applications to elliptic PDEs with random coefficients. Computing and Visualization in Science, 14(1), 3-15.

Uncertainties: Monte Carlo Sampling
simple (scalar) quantity of interest
sampling a high dimensional stochastic parameter space
multiple solves of the forward problem
each solve is expensive (3D PDE)
MC quickly becomes prohibitively expensive

Monte Carlo FE Method

\{V_\ell\}_{\ell \ge 0}: nested sequence of FE spaces with M_\ell degrees of freedom. In the Hierarchical Hybrid Grids (HHG) framework this sequence of meshes is obtained via uniform refinement of a coarsest grid T_0, so that h_\ell \simeq 2^{-\ell} h_0 and M_\ell \simeq 2^{3\ell} M_0 in three space dimensions. Denoting by u_\ell = u_\ell(x, \omega) \in V_\ell the FE approximation of the solution u on level \ell, we write M_\ell(u_\ell; \omega) = 0, \ell \ge 0, where the (non)linear operator M_\ell and the functional of interest Q_\ell(u_\ell, \omega) involve numerical approximations.

Computational goal: the expected value E[Q] of Q(u). The standard Monte Carlo estimate on level L \ge 0 is

$$\hat Q_L^{MC,N} = \frac{1}{N} \sum_{i=1}^{N} Q_L^i, \qquad (2.2)$$

where Q_L^i = Q_L(u_L^i, \omega^i), i = 1, \dots, N, are N independent samples. Even if the solver has only linear complexity, the cost grows with O(M_L N).

There are two types of error involved:
bias error (resolution of the PDE too coarse)
sampling error (too few samples)

Assuming that |Q_L^i - Q(u, \omega^i)| = O(M_L^{-\alpha}) for almost all \omega^i and a constant \alpha > 0, it follows that there exists a constant C_b, independent of M_L, such that

$$|E[Q_L - Q]| \le C_b\, M_L^{-\alpha} \le \varepsilon_b \quad\text{for}\quad M_L \ge (\varepsilon_b / C_b)^{1/\alpha} \qquad (2.3)$$

(cf. [36]).
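A compact sketch of the standard estimator and its two error sources; solve_and_evaluate is a hypothetical stand-in for one FE solve plus evaluation of the functional Q_L.

```python
import numpy as np

def mc_estimate(solve_and_evaluate, N, rng):
    """Standard Monte Carlo estimate of E[Q] on a fixed level L.

    solve_and_evaluate(rng) -> Q_L(u_L(omega)): one forward solve per sample.
    The mesh level controls the bias; N controls the sampling error ~ V[Q_L]/N.
    """
    samples = np.array([solve_and_evaluate(rng) for _ in range(N)])
    q_hat = samples.mean()                    # estimator (2.2)
    sampling_var = samples.var(ddof=1) / N    # estimate of N^-1 V[Q_L]
    return q_hat, sampling_var
```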

Monte Carlo Method

The total error is quantified via the mean square error (MSE),

$$e\bigl(\hat Q_L^{MC,N}\bigr)^2 := E\Bigl[\bigl(\hat Q_L^{MC,N} - E[Q]\bigr)^2\Bigr] = \bigl(E[\hat Q_L^{MC,N} - Q]\bigr)^2 + N^{-1}\,V[Q_L], \qquad (2.4)$$

where V[Q_L] denotes the variance of the random variable Q_L(u_L). The first (bias) term in (2.4) can be bounded in terms of (2.3); the second (sampling) term is smaller than a tolerance \varepsilon_s^2 if N \ge \varepsilon_s^{-2} V[Q_L]. Note that for L sufficiently large, V[Q_L] \approx V[Q]. To ensure that the total MSE is less than \varepsilon^2, we choose \varepsilon_s^2 = \theta\varepsilon^2 and \varepsilon_b^2 = (1-\theta)\varepsilon^2 for some fixed 0 < \theta < 1.

Thus, to reduce (2.4) we need a sufficiently fine FE mesh (bias error) and a sufficiently large number of samples (sampling error). The cost for one sample depends on the complexity of the FE solver and of the random field generator; typically it grows like C_\ell \le C_c M_\ell^{\gamma} for some \gamma \ge 1 and a constant C_c independent of \ell and M_\ell. The total cost of Monte Carlo (MC) sampling is then

$$\mathrm{Cost}\bigl(\hat Q_L^{MC,N}\bigr) = O\bigl(M_L^{\gamma} N\bigr) = O\bigl(\varepsilon^{-2-\gamma/\alpha}\bigr).$$

For the coefficient field and the output functional studied below, \alpha = 1/6. In that case, even if \gamma = 1, reducing the error by a factor 2 increases the cost by a factor 2^8 = 256. This very quickly leads to an intractable problem, even in a massively parallel environment.

Multilevel Monte Carlo (MLMC) Method

Main idea: MLMC [14, 6, 2] seeks to reduce the variance of the estimator, and thus the computational cost, by recursively using coarser FE models as control variates. Exploiting the linearity of the expectation, we avoid estimating E[Q] directly on the finest level L and do not compute all samples to the desired accuracy. Instead, using the identity E[Q_L] = E[Q_0] + \sum_{\ell=1}^{L} E[Y_\ell], we estimate the mean on the coarsest level (Level 0) and correct it successively by adding estimates of the expected values of Y_\ell(\omega) := Q_\ell(u_\ell, \omega) - Q_{\ell-1}(u_{\ell-1}, \omega), \ell \ge 1. Setting Y_0 := Q_0, the MLMC estimator is defined as

$$\hat Q_L^{ML} := \sum_{\ell=0}^{L} \hat Y_\ell^{MC,N_\ell}, \qquad (2.7)$$

with cost

$$\mathrm{Cost}\bigl(\hat Q_L^{ML}\bigr) = \sum_{\ell=0}^{L} N_\ell\, C_\ell, \qquad (2.8)$$

where C_\ell is the cost to compute one sample of Y_\ell on level \ell. For simplicity, we use independent samples across all levels, so that the L+1 standard MC estimators in (2.7) are independent. Then the MSE of \hat Q_L^{ML} expands to

$$e\bigl(\hat Q_L^{ML}\bigr)^2 = \bigl(E[\hat Q_L^{ML} - Q]\bigr)^2 + \sum_{\ell=0}^{L} N_\ell^{-1}\, V[Y_\ell]. \qquad (2.9)$$

Each sample Y_\ell^i, \ell \ge 1, requires the two FE solutions u_\ell(x, \omega^i) and u_{\ell-1}(x, \omega^i), i.e. two PDE solves with the same coefficient (the same \omega^i). This leads to a hugely reduced variance, since both FE approximations Q_\ell and Q_{\ell-1} converge to Q and thus V[Y_\ell] \to 0 as M_{\ell-1} \to \infty.

Algorithm 1 (Multilevel Monte Carlo):
1. For all levels \ell = 0, \dots, L and for i = 1, \dots, N_\ell:
   a. Compute u_\ell(\omega^i) and u_{\ell-1}(\omega^i) (if \ell > 0), as well as Y_\ell^i.
   b. Compute \hat Y_\ell^{MC,N_\ell} = N_\ell^{-1} \sum_{i=1}^{N_\ell} Y_\ell^i.
2. Compute \hat Q_L^{ML} using (2.7).

By choosing M_L \ge (\varepsilon_b / C_b)^{1/\alpha}, we can again ensure that the bias error is less than \varepsilon_b, but we still have the freedom to choose the numbers of samples N_\ell on each level so that the sampling error is less than \varepsilon_s^2. We use this freedom to minimize the cost (2.8) subject to the constraint \sum_{\ell=0}^{L} N_\ell^{-1} V[Y_\ell] = \varepsilon_s^2, a simple discrete, constrained optimization problem with respect to N_0, \dots, N_L (cf. [14, 6]). It leads to

$$N_\ell = \varepsilon_s^{-2}\, \sqrt{\frac{V[Y_\ell]}{C_\ell}}\; \sum_{l=0}^{L} \sqrt{V[Y_l]\, C_l}. \qquad (2.10)$$

Finally, under the assumptions

$$C_\ell \le C_c\, M_\ell^{\gamma} \quad\text{and}\quad V[Y_\ell] \le C_v\, M_\ell^{-\beta}, \qquad (2.11)$$

for some 0 < \beta \le 2\alpha, \gamma \ge 1 and two constants C_c, C_v independent of \ell and M_\ell, the \varepsilon-cost to achieve e(\hat Q_L^{ML})^2 \le \varepsilon^2 can be bounded. Typically \beta \approx 2\alpha for smooth functionals Q(\cdot); for CDFs we typically have \beta = \alpha. When \beta = 2\alpha, the cost is \mathrm{Cost}(\hat Q_L^{ML}) = O(\varepsilon^{-2}), independent of M_L, i.e. two orders of magnitude faster than the standard MC method. Moreover, MLMC is then optimal for this problem, in the sense that its cost is asymptotically of the same order as the cost of computing a single sample to the same tolerance \varepsilon.
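A compact sketch of the telescoping estimator (2.7) together with the optimal sample numbers (2.10). Here solve_q(level, omega) is a hypothetical callback returning Q_l for one realization, and the variance and cost inputs would in practice be estimated from pilot samples.

```python
import numpy as np

def optimal_samples(var_y, cost, eps_s):
    """Optimal per-level sample numbers N_l, eq. (2.10)."""
    var_y, cost = np.asarray(var_y, float), np.asarray(cost, float)
    return np.ceil(eps_s**-2 * np.sqrt(var_y / cost)
                   * np.sum(np.sqrt(var_y * cost))).astype(int)

def mlmc_estimate(solve_q, N, rng):
    """MLMC estimator (2.7): sum over levels of MC averages of Y_l."""
    q_hat = 0.0
    for level, n_l in enumerate(N):
        y = np.empty(n_l)
        for i in range(n_l):
            omega = int(rng.integers(2**31))   # identifies one realization
            fine = solve_q(level, omega)
            # coarse and fine solves share the same omega: small V[Y_l]
            coarse = solve_q(level - 1, omega) if level > 0 else 0.0
            y[i] = fine - coarse
        q_hat += y.mean()                      # add the estimate of E[Y_l]
    return q_hat

# illustration only: assumed per-level variances and costs
N = optimal_samples(var_y=[1e-2, 2e-3, 4e-4], cost=[1.0, 8.0, 64.0], eps_s=1e-2)
```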

Adaptive Multilevel Monte Carlo Method

To estimate the bias error, assume that M_\ell is sufficiently large, so that we are in the asymptotic regime, i.e. |E[Q_\ell - Q]| \approx C_b M_\ell^{-\alpha} in (2.3). Then (cf. [11])

$$|E[Q_\ell - Q]| \lesssim \frac{1}{8^{\alpha} - 1}\, \hat Y_\ell^{MC,N_\ell}, \qquad (2.13)$$

so the sample estimator \hat Y_\ell^{MC,N_\ell} can be used to estimate the bias. Using the sample variance s_\ell^2 := N_\ell^{-1} \sum_{i=1}^{N_\ell} \bigl(Y_\ell^i - \hat Y_\ell^{MC,N_\ell}\bigr)^2 to estimate V[Y_\ell], and up-to-date estimates of the cost C_\ell from the runs so far (e.g. by measuring run times), we can estimate the required number of samples dynamically:

$$N_\ell \approx \varepsilon_s^{-2}\, \sqrt{\frac{s_\ell^2}{C_\ell}}\; \sum_{l=0}^{L} \sqrt{s_l^2\, C_l}. \qquad (2.14)$$

Restricting to uniform mesh refinement, i.e. h_\ell = 2^{-\ell} h_0 and M_\ell = O(8^{\ell} M_0) in 3D, this yields the adaptive MLMC algorithm:
1. Set \varepsilon, \theta, L = 1 and N_0 = N_1 = N_Init.
2. For all levels \ell = 0, \dots, L:
   a. Compute new samples of Y_\ell until there are N_\ell.
   b. Compute \hat Y_\ell^{MC,N_\ell} and s_\ell^2, and estimate C_\ell.
3. Update the estimates for N_\ell using (2.14); if the bias estimate (2.13) exceeds \varepsilon_b, increase L \to L + 1 and set N_L = N_Init.
4. If all N_\ell and L are unchanged, go to 5. Else return to 2.
5. Set \hat Q_L^{ML} = \sum_{\ell=0}^{L} \hat Y_\ell^{MC,N_\ell}.
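The adaptive loop, sketched under the same assumptions as before: sample_y(level) is a hypothetical callback returning one correction sample and its measured cost, the bias test uses (2.13), and the sample-count update uses (2.14).

```python
import numpy as np

def adaptive_mlmc(sample_y, eps, theta=0.5, alpha=1/3, n_init=16, max_level=10):
    """Sketch of the adaptive MLMC driver (steps 1-5 above)."""
    eps_b, eps_s = np.sqrt(1 - theta) * eps, np.sqrt(theta) * eps
    L = 1
    samples = [[] for _ in range(L + 1)]     # Y_l samples per level
    costs = [[] for _ in range(L + 1)]       # measured cost per sample
    target = [n_init, n_init]
    while True:
        for l in range(L + 1):               # step 2: fill up to N_l samples
            while len(samples[l]) < target[l]:
                y, c = sample_y(l)
                samples[l].append(y); costs[l].append(c)
        var = np.array([np.var(s) for s in samples])
        cost = np.array([np.mean(c) for c in costs])
        means = np.array([np.mean(s) for s in samples])
        # step 3: update N_l via (2.14) and test the bias via (2.13)
        new_target = np.ceil(eps_s**-2 * np.sqrt(var / cost)
                             * np.sum(np.sqrt(var * cost))).astype(int)
        grow = abs(means[-1]) / (8**alpha - 1) > eps_b and L < max_level
        if grow:
            L += 1
            samples.append([]); costs.append([])
            new_target = np.append(new_target, n_init)
        changed = grow or any(new_target[l] > len(samples[l]) for l in range(L + 1))
        target = [max(int(new_target[l]), len(samples[l])) for l in range(L + 1)]
        if not changed:                      # step 4: nothing changed, go to step 5
            return float(np.sum(means))      # step 5: telescoping sum (2.7)
```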

Scheduling Parallel Multilevel Monte Carlo Methods

B. Gmeiner, D. Drzisga, UR, R. Scheichl, B. Wohlmuth: Scheduling massively parallel multigrid for multilevel Monte Carlo methods, arXiv preprint 1607.03252, 2016.

[Fig. 4.2: processor-time diagrams of bulk synchronous scheduling strategies with synchronization points between levels. Upper row: homogeneous strategies with one-layer (left) and two-layer (right) parallelism; lower row: heterogeneous strategies with two-layer (left) and three-layer (right) parallelism.]

Our focus is on scheduling strategies that are flexible with respect to the scalability window of the PDE solver, up to a huge number of processors P_max. The simplest one-layer homogeneous strategy (Fig. 4.2, upper left) guarantees perfect load balance, but does not lead to good overall efficiency, since on the coarsest level there may even be fewer grid points than processors; it is not considered further.

Model Problem: Subsurface Flow

As an example, we consider an elliptic PDE in weak form: find u(\cdot, \omega) \in V := H_0^1(D) such that

$$\int_D \nabla v(x) \cdot k(x, \omega)\, \nabla u(x, \omega)\, dx = \int_D f(x)\, v(x)\, dx \quad \text{for all } v \in V \text{ and } \omega \in \Omega. \qquad (3.1)$$

This problem is motivated by subsurface flow: the solution u and the coefficient k are random fields on D \times \Omega related to fluid pressure and rock permeability. For simplicity we consider D = (0,1)^3, homogeneous Dirichlet conditions, and a deterministic source term f. If k(\cdot, \omega) is continuous in x and k_min(\omega) := \min_{x \in D} k(x, \omega) > 0 almost surely, the Lax-Milgram Lemma guarantees a unique solution (cf. [4]).

Of particular interest is the lognormal random field k := \exp(Z(\cdot, \omega)), with Z a mean-free, stationary Gaussian random field with exponential covariance

$$E[Z(x, \omega)\, Z(y, \omega)] = \sigma^2 \exp\!\left(-\frac{|x - y|}{\lambda}\right); \qquad (3.3)$$

the two parameters of this model are the variance \sigma^2 and the correlation length \lambda. Individual samples k(\cdot, \omega) are only Hölder-continuous; it was shown in [36] that E[(Q(u) - Q(u_\ell))^q] = O(h_\ell^{tq}) = O(M_\ell^{-tq/3}), q = 1, 2, so the bound in (2.3) holds with \alpha = t/3 and the bound in (2.11) with \beta = 2\alpha.

Quantities of interest: Q(u) := u(x^*) for some x^* \in D, or alternatively Q(u) := \frac{1}{|\Gamma|} \int_\Gamma k\, \frac{\partial u}{\partial n}\, ds for some two-dimensional manifold \Gamma \subset D; Q(u) is approximated by Q(u_\ell). Discretization with tetrahedral finite elements and multigrid solvers from HHG.

Sampling the random field: the two most common approaches are the Karhunen-Loève (KL) expansion [13] and circulant embedding [8]. The KL expansion is convenient for analysis, but its cost can dominate for short correlation lengths in three dimensions; circulant embedding relies on the Fast Fourier Transform, which may pose limits in a massively parallel environment. An alternative is PDE-based sampling: as shown by Whittle [41], in three dimensions the solution of (\kappa^2 - \Delta)\, Z(x, \omega) = W(x, \omega), with W Gaussian white noise of unit variance, is a Gaussian field with exponential covariance.
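A tiny illustration of sampling the lognormal coefficient (3.3) by factorizing the covariance matrix on a coarse point set. This dense Cholesky approach is for illustration only and would never scale to the resolutions discussed here, which is precisely why KL, circulant embedding, or PDE-based sampling are used instead.

```python
import numpy as np

def sample_lognormal_k(points, sigma=1.0, lam=0.1, rng=np.random.default_rng(0)):
    """Draw one sample of k = exp(Z) with exponential covariance (3.3).

    points: (n, 3) array of locations in D = (0,1)^3.
    Dense Cholesky factorization: O(n^3), illustration only.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    C = sigma**2 * np.exp(-d / lam)            # covariance matrix of Z
    L = np.linalg.cholesky(C + 1e-12 * np.eye(len(points)))
    z = L @ rng.standard_normal(len(points))   # Gaussian field with covariance C
    return np.exp(z)                           # lognormal permeability field

k = sample_lognormal_k(np.random.default_rng(1).random((200, 3)))
```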
it 2 min methods as stochastic collocation, quickly dominate all the ulgram Q(u that E (Q(u Q(u ))  2 E (Q(u) Milgram Lemma that this has a := unique solution (cf `) ` 1 )] ` )⇤problem ` 1has r )) As qua Lemma this a⇤problem unique solution (cf.Q(u [4]). min k(x, !) > 0 almost surely (a.s.) in ! 2 ⌦, then it follows from the La fwe . Ifconsider k(·, !) is continuous (as a function of x) and k (!) or Q(u) := u(x ), for some x 2in D, or alternatively min x2D costinterest for short correlation lengths three dimensions. Circulant em ⇤ ⇤ ⇤ r=`,`), 1 for := in Section 7, we consider Q(u) u(x ),As for some x erest in Section 7, we consider Q(u) := u(x some x 2 D, or alt Milgram Lemma that this problem has a unique solution (cf. [4]). quantities R st surely (a.s.) in ! 2 ⌦, then it follows from the LaxR s, for some two-dimensional ⇢ D ⇤Transform, can be of interest. hand, relies onmanifold the which may pose limi 1 @uFast Fourier 1 other @u ⇤ Q(u) := k ds, for some two-dimensional manifold 2t interest in Section 7, we consider Q(u) := u(x ), for some x 2 D, ka@n ds,| for some two-dimensional ⇢ or Dalternative can be o for some 2D manifold manifold | @n su) unique solution (cf. [4]). As quantities of R d problem in:=(2.11) holds with = 2↵ = . | | inhas a massively parallel environment. An alternative way to sampl @u 6⇤ds, for some3two-dimensional Q(u) := | 1 | k @n manifold ⇢ D can be of interes ⇤ 6 consider Discretization Q(u) := u(x ), for some x 2 D, or alternatively 6 exploit the fact that in three FE, dimensions, mean-free Gaussian fields w with tetrahedral multigrid solver funcPDE-based sampling for lognormal random fields. A coefficient 6using HHG or some two-dimensional manifold ⇢ Dstochastic can be ofpartial interest. covariance are solutions the di↵erential equation particular interest is the lognormal random field := exp(Z(·, !)), where additionally, generation ofto samples of k(·, !) 6 a mean-free, stationary Gaussian with (2 field )Z(x, !)exponential =d W (x, !),covariance by solving stochastic PDErandom ✓ ◆ d whitewhite noise noise with unit variance nd side W : Gaussian is Gaussian with unit variance and = |x y| 7 E[Z(x, !)Z(y, !)] = 2 exp . (3.3) distribution. As shown by Whittle [41], a solution of this SPDE 2 Parallel MLMC Ulrich 1 Rüde th exponential covariance = —(8⇡) and = 2/. 21

Scheduling Multilevel Monte Carlo

We must exploit three levels of parallelism:
the levels may be executed in parallel
the samples within a level may be executed in parallel
the solver itself may be executed in parallel

Finding a good schedule is NP-complete: we must use heuristics. The figure below contrasts homogeneous and heterogeneous bulk synchronous schedules.

[Figure: processor-time diagrams contrasting a homogeneous bulk synchronous schedule (left) and a heterogeneous bulk synchronous schedule (right), with synchronization points between levels 0, 1, and 2.]
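A simple model of a level-by-level bulk synchronous schedule: given the total processor count and, per level, the sample count, the processors per sample, and the per-sample run time, the makespan is the sum over levels of the number of sequential "waves" of concurrent samples times the per-sample time. The model and the numbers below are illustrative assumptions, not the strategies analyzed in the paper.

```python
import math

def bulk_synchronous_makespan(P, samples, procs_per_sample, t_sample):
    """Estimated run time of a homogeneous level-by-level schedule.

    P: total number of processors
    samples[l], procs_per_sample[l], t_sample[l]: per-level N_l, P_l, run time
    """
    total = 0.0
    for N_l, P_l, t_l in zip(samples, procs_per_sample, t_sample):
        concurrent = max(P // P_l, 1)            # samples that fit side by side
        waves = math.ceil(N_l / concurrent)      # sequential waves on this level
        total += waves * t_l
    return total

# illustration: many cheap coarse samples, few large fine samples
print(bulk_synchronous_makespan(P=8192,
                                samples=[8192, 2304, 60],
                                procs_per_sample=[1, 32, 2048],
                                t_sample=[1.0, 4.0, 60.0]))
```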

Scheduling Multilevel Monte Carlo

Solver times themselves are stochastic.

[Fig. 5.1: Illustration of different homogeneous scheduling strategies. Left: sample synchronous homogeneous (SaSyHom); centre: level synchronous homogeneous (LeSyHom); right: dynamic level synchronous homogeneous (DyLeSyHom).]

It is essential to make flexible use of solver parallelism, but:
not all processor counts are possible (e.g. powers of 2)
memory limitations
strong scaling limitations (loss of efficiency)
technical limitations (MPI communicators)

Adaptive MLMC: Performance Results

Table 7.2: Weak scaling of an adaptive MLMC estimator.

Processes | Fine resolution | Runtime | Samples on finest level | Samples total | Correlation length | Idle time
4 096 | 1 024^3 | 5.0·10^3 s | 68 | 13 316 | 1.50E-02 | 3%
32 768 | 2 048^3 | 3.9·10^3 s | 44 | 10 892 | 7.50E-03 | 4%
262 144 | 4 096^3 | 5.2·10^3 s | 60 | 10 940 | 3.75E-03 | 5%

Table 7.3: Number of samples and over-samples for different levels for the largest run.

Level | No. partitions | Samples calculated | Samples scheduled | Over-samples estimated | Over-samples actual
0 | 2 048 | 7 506 | 8 192 | 3 726 | 686
1 | 256 | 2 111 | 2 304 | 429 | 193
2 | 32 | 382 | 384 | 15 | 2
3 | 4 | 57 | 60 | 3 | 3

Each sample on the finest level involves solving an FE problem with roughly 6.8·10^10 unknowns.

Conclusion

Towards exascale simulations of Earth mantle convection:
multigrid scales
scientific software lives on a long time scale: HHG development started more than 15 years ago
HHG: lean and mean implementation, excellent time to solution, reaching 10^13 DoF
multilevel Monte Carlo with multigrid


Nobel Prize 2016:

Congratulations!

In the dime stores and bus stations
People talk of situations
Read books, repeat quotations
Draw conclusions on the wall
Some speak of the future
My love she speaks softly
She knows there's no success like failure
And that failure's no success at all

Bob Dylan, "Love Minus Zero/No Limit", 1965
