A RANDOMIZED TENSOR QUADRATURE METHOD FOR HIGH DIMENSIONAL POLYNOMIAL APPROXIMATION

KAILIANG WU, YEONJONG SHIN, AND DONGBIN XIU∗

Abstract. We present a numerical method for polynomial approximation of multivariate functions. The method utilizes Gauss quadrature in tensor product form, which is known to be inefficient in high dimensions. Here we demonstrate that, by using a new randomized algorithm and taking advantage of the tensor structure of the grids, a highly efficient algorithm can be constructed. The new method does not require prior knowledge or storage of the entire data set at all the tensor grid points, whose total number is excessively large in high dimensions. Instead, the method utilizes one data point at a time and conducts the approximation iteratively. This feature allows the use of the method irrespective of the size of the data set. We establish the rate of convergence of this iterative algorithm and show that its operation count can be lower than that of standard methods such as least squares, when they are applicable. Numerical examples in up to hundreds of dimensions are presented to verify the theoretical analysis and demonstrate the effectiveness of the method.

Key words. Orthogonal polynomial, tensor Gauss quadrature, randomized Kaczmarz algorithm, approximation theory

∗Department of Mathematics, The Ohio State University, Columbus, OH, USA. [email protected].

1. Introduction. We are concerned with the problem of approximating an unknown function f(x) using (orthogonal) polynomials via samples of the function f(x_i), i = 1, .... Here x, x_i ∈ D ⊆ R^d, d ≥ 1. Our focus shall be on multiple dimensions (d > 1), particularly high dimensions (d ≫ 1). Polynomial approximation in one dimension (d = 1) has been studied extensively. The performance of the approximation, using interpolation or regression, depends critically on the choice of the sample points {x_i}. It is widely accepted that Gauss quadrature is among the most effective choices. In multiple dimensions, however, its effectiveness diminishes, as the only way to keep all the desirable features of the quadrature is to use the tensor product of the points. This induces exponential growth of the total number of points: if one uses m quadrature points in each dimension, the total number of full tensor points is M = m^d. This becomes too large for d ≫ 1 and makes the approximation problem intractable. It is generally acknowledged that tensor grids should be used only in low dimensions, e.g. d < 5. One way to alleviate the difficulty is to use the Smolyak rule ([11]) to construct sparse grids (cf. [1, 3]), which are subsets of the full tensor grids. Although effective to a certain degree, the growth of the number of points is still fast due to the underlying tensor structure, which limits the applicability of sparse grids in very high dimensions.

This paper presents an algorithm for polynomial approximation in high dimensions using the full tensor quadrature grids. A key feature of the algorithm, which distinguishes it from most existing approximation methods, is that it iteratively conducts function approximation using one randomly chosen data point at a time. By doing so, there is no need to store or operate on the entire data set, which can be exceedingly large. This allows us to take advantage of the desirable mathematical properties of the full tensor quadrature points in an efficient manner. Since the method is iterative and uses one data point at each step, it avoids the formation of a model matrix of size M × N, where M = m^d is the number of points and N is the cardinality of the polynomial space, which can also be large for d ≫ 1. The implementation of the algorithm does not involve any matrices and requires only vector operations. Also, the algorithm can reach a converged result before all the M = m^d

tensor points are used, which avoids the need to sample the function at the exceedingly large number (M) of points.

This paper is largely motivated by two recent pieces of work. One is the randomized function approximation method developed in [10], which proposed to conduct function approximation sequentially using one randomly chosen data point at a time. The work of [10] was in turn motivated by the randomized Kaczmarz method developed in [12], which is a randomized version of the standard Kaczmarz method ([7]) and has been studied rather extensively in recent years (cf. [9, 16, 4, 5, 8, 2, 14]). The (randomized) Kaczmarz methods solve overdetermined linear systems of equations Ax = b iteratively using one (random) row at a time. The work of [10] applied a similar principle to function approximation and proved that a converged function approximation can be achieved by randomly sampling the target function sequentially, one point at a time. In particular, a sampling probability measure that ensures (fast) convergence was developed in [10]. The other motivation of the current paper is the development of [15], which studied least squares polynomial approximation in multiple dimensions using tensor Gauss quadrature points. It was shown that a converged (in expectation) polynomial least squares solution can be obtained by randomly selecting a subset of the full tensor Gauss points. The subset should be sufficiently large but is much smaller than the full tensor grid set. This establishes that one does not need to use the entire set of M = m^d points for accurate polynomial least squares approximation. However, to solve the least squares problem in [15], one still needs to form and invert the model matrix of size M × N, which becomes exceedingly large in very high dimensions.

This paper combines and extends the work of [10] and [15]. More specifically, we employ the randomized sequential approximation method of [10] and apply it to the full tensor Gauss quadrature point set. In other words, instead of sampling in the continuous domain D as in [10], the current method samples in the discrete lattice defined by the tensor Gauss quadrature set. The prohibitively large size of the set, which is the major difficulty for most existing methods, does not pose a challenge, as the method utilizes one randomly chosen point at each iteration and does not require the definition and storage of the function data on the entire set. We derive an optimal discrete sampling probability for the tensor quadrature set and establish the convergence of the algorithm. The error analysis is expressed as an equality, rather than the inequalities common in standard error estimates. We also prove that the method converges if one uses an arbitrary discrete sampling measure (rather than the optimal one), in which case an upper bound and a lower bound of the error are derived. Owing to the tensor structure of the points, the method allows a highly efficient implementation, as one can store all the one-dimensional information prior to the computation. Both the theoretical analysis and the numerical examples indicate that highly accurate approximation results can be reached after K ∼ γN iterations, where N is the dimensionality of the polynomial space and γ = 5 ∼ 10. This implies that only a very small portion of the full tensor Gauss points is used, since γN ≪ m^d when d ≫ 1.

The operation count of the method is O(KN), which becomes O(N^2) and can be notably smaller than the O(N^3) operation count of the standard least squares method, when the latter is applicable. Also, since the new method always operates on row vectors of length N, it requires only O(N) storage of real numbers and avoids matrix operations, which would require O(N^2) storage. This feature makes the method highly efficient in very high dimensions d ≫ 1.

The rest of the paper is organized as follows. After presenting the basic setup

of the problem in Section 2, we present the new method in Section 3, in terms of the algorithm, its convergence theory and various implementation details. Numerous examples in dimensions 2, 10, 40, 100 and 500, for both bounded and unbounded domains, are presented in Section 4 to verify the theoretical analysis, before the concluding remarks in Section 5.

2. Problem Setup. Consider the problem of approximating an unknown function f : D → R using its samples, where D ⊆ R^d, d ≥ 1, is equipped with a measure ω. Let x = (x_1, ..., x_d) be the coordinate and f ∈ L^2_ω(D), the standard Hilbert space with the inner product
$$(g,h)_{L^2_\omega} := \int_D g(x)\,h(x)\,d\omega(x),$$
and the corresponding induced norm ‖·‖_{L^2_ω}. The measure ω is assumed to be absolutely continuous and of product form
$$d\omega = \varpi(x)\,dx = \varpi_1(x_1)\cdots\varpi_d(x_d)\,dx. \tag{2.1}$$
Let x_i ∈ D, i = 1, ..., be a sequence of samples and f(x_i) be the function values at the samples. Based on this information, we then seek an approximation f̃ ≈ f in a finite dimensional linear space. This paper focuses on polynomial approximation. Let Π_n^d ⊂ L^2_ω(D) be the linear subspace of polynomials of degree up to n ≥ 1. That is,
$$\Pi_n^d := \mathrm{span}\{ x^{\mathbf k} = x_1^{k_1}\cdots x_d^{k_d},\ |\mathbf k| \le n \}, \tag{2.2}$$
where k = (k_1, ..., k_d) is a multi-index with |k| = k_1 + ··· + k_d, and
$$N = \dim \Pi_n^d = \binom{n+d}{d} = \frac{(n+d)!}{n!\,d!}. \tag{2.3}$$

Let {ψ_j(x), j = 1, ..., N} be a basis of Π_n^d. The approximation can then be expressed as
$$\tilde f(x) = \sum_{|\mathbf k|=0}^{n} c_{\mathbf k}\,\psi_{\mathbf k}(x) = \sum_{k=1}^{N} c_k\,\psi_k(x), \tag{2.4}$$
where a linear ordering between the multi-index k and the single index k is used. This means that for any 1 ≤ k ≤ N, there exists a unique multi-index (k_1, ..., k_d) such that
$$k \longleftrightarrow \mathbf k = (k_1, \dots, k_d). \tag{2.5}$$

The multivariate basis polynomials ψ_k(x) are tensor products of the corresponding one-dimensional polynomials φ_{k_i}(x_i), i.e.,
$$\psi_{\mathbf k}(x) = \prod_{i=1}^{d} \phi_{k_i}(x_i), \qquad 1 \le k \le N. \tag{2.6}$$
Throughout this paper we shall frequently interchange the single index k and the multi-index k, with the one-to-one correspondence (2.5) assumed.
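As a small illustration of (2.2)-(2.6), the following sketch (our own code, not from the paper; the function names are ours) enumerates the multi-indices with |k| ≤ n and evaluates a tensor-product basis at a point. It assumes normalized Legendre polynomials on [−1, 1] with respect to the uniform probability weight ϖ_i(x) = 1/2, so that (2.9) holds.

```python
import itertools
import numpy as np

def multi_indices(d, n):
    """All multi-indices k = (k_1,...,k_d) with k_1 + ... + k_d <= n,
    giving one linear ordering k <-> k as in (2.5)."""
    return [k for k in itertools.product(range(n + 1), repeat=d) if sum(k) <= n]

def legendre_1d(x, n):
    """phi_0(x),...,phi_n(x): Legendre polynomials orthonormal w.r.t. the
    uniform probability measure dx/2 on [-1,1], i.e. phi_j = sqrt(2j+1) P_j."""
    vals = np.polynomial.legendre.legvander(np.atleast_1d(x), n)[0]
    return vals * np.sqrt(2 * np.arange(n + 1) + 1)

def psi(x, k):
    """Tensor-product basis psi_k(x) = prod_i phi_{k_i}(x_i), as in (2.6)."""
    n = max(k)
    return np.prod([legendre_1d(x[i], n)[k[i]] for i in range(len(k))])

d, n = 2, 3
idx = multi_indices(d, n)
assert len(idx) == 10          # N = C(n+d, d) = C(5, 2) = 10, cf. (2.3)
print(psi(np.array([0.3, -0.7]), idx[4]))
```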

Upon using the vector notation
$$\Psi(x) = [\psi_1(x), \dots, \psi_N(x)]^T, \qquad \mathbf c = [c_1, \dots, c_N]^T, \tag{2.7}$$
the approximation can be written as
$$\tilde f(x) = \langle \mathbf c, \Psi(x)\rangle, \tag{2.8}$$
where ⟨·,·⟩ is the standard vector inner product. The obvious goal is to compute the expansion coefficients c_j. The normalized orthogonal polynomials are utilized as the basis (a non-orthogonal basis can be orthogonalized by the Gram-Schmidt procedure),
$$(\psi_i, \psi_j)_{L^2_\omega} = \delta_{ij}, \qquad 1 \le i, j \le N. \tag{2.9}$$

The best L^2_ω approximation of f in Π_n^d is its orthogonal projection onto Π_n^d, i.e.,
$$P_\Pi f = \sum_{k=1}^{N} \hat c_k\,\psi_k(x) = \langle \hat{\mathbf c}, \Psi(x)\rangle, \tag{2.10}$$
where
$$\hat{\mathbf c} = [\hat c_1, \dots, \hat c_N]^T, \qquad \hat c_k = (f, \psi_k)_{L^2_\omega}, \qquad 1 \le k \le N. \tag{2.11}$$
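In low dimensions the reference coefficients ĉ in (2.11) can be computed directly by a dense quadrature rule. A minimal sketch, reusing multi_indices and psi from the previous sketch and assuming the same uniform weight on [−1, 1]^d (practical only for small d, e.g. d = 2):

```python
import itertools
import numpy as np

def projection_coefficients(f, d, n, m=60):
    """Approximate c_hat_k = (f, psi_k)_{L^2_omega} in (2.11) with a dense
    m-point-per-dimension Gauss-Legendre rule; feasible only for small d."""
    z, w = np.polynomial.legendre.leggauss(m)
    w = w / 2.0                          # rescale weights for the probability weight dx/2
    idx = multi_indices(d, n)            # helper from the sketch in Section 2
    c_hat = np.zeros(len(idx))
    # loop over the m**d tensor grid points (fine for d = 2 only)
    for pt, wt in zip(itertools.product(z, repeat=d),
                      itertools.product(w, repeat=d)):
        x = np.array(pt)
        c_hat += np.prod(wt) * f(x) * np.array([psi(x, k) for k in idx])
    return idx, c_hat
```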

Throughout this paper, we shall use P_Π f as the reference and compare our numerical approximations against it.

3. Randomized Tensor Quadrature Approximation. This section describes the new randomized sequential approximation method using tensor Gauss quadrature. We present both a main algorithm and a general algorithm. The main algorithm is a special case of the general algorithm. It is presented separately because, compared to the general algorithm, it offers an optimal rate of convergence and a unique error analysis, and is more suited for practical computations.

3.1. Tensor quadrature. We first define the tensor Gauss quadrature on which our method shall be applied. For each dimension i = 1, ..., d, let (z_i^{(ℓ)}, w_i^{(ℓ)}), ℓ = 1, ..., m, be a set of one-dimensional quadrature points and weights such that
$$\sum_{\ell=1}^{m} w_i^{(\ell)}\,f\big(z_i^{(\ell)}\big) \approx \int f(x_i)\,\varpi_i(x_i)\,dx_i$$
for any integrable function f. We then proceed to tensorize the univariate quadrature rules. Let
$$\Theta_{i,m} = \big\{ z_i^{(1)}, \dots, z_i^{(m)} \big\}, \qquad i = 1, \dots, d, \tag{3.1}$$
be the one-dimensional quadrature point sets. Their tensor product is taken to construct a d-dimensional point set
$$\Theta_M = \Theta_{1,m} \otimes \cdots \otimes \Theta_{d,m}. \tag{3.2}$$
Obviously, M = #Θ_M = m^d. As before, an ordering scheme can be employed to order the points via a single index, i.e., for each j = 1, ..., M,
$$z^{(j)} = \big( z_1^{(j_1)}, \dots, z_d^{(j_d)} \big), \qquad j \longleftrightarrow \mathbf j \in \mathbb N^d.$$

That is, each single index j corresponds to a unique multi-index j := (j_1, ..., j_d) with |j|_∞ ≤ m. Each point carries the scalar weight
$$w^{(j)} = w_1^{(j_1)} \times \cdots \times w_d^{(j_d)}, \qquad j = 1, \dots, M,$$
with the same j ↔ j correspondence. Here the one-dimensional quadrature rule is required to have polynomial exactness of degree 2n. That is, for each i = 1, ..., d,
$$\sum_{\ell=1}^{m} w_i^{(\ell)}\,f\big(z_i^{(\ell)}\big) = \int f(x_i)\,\varpi_i(x_i)\,dx_i, \qquad \forall f \in \Pi_{2n}^1. \tag{3.3}$$
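As an aside, for the Legendre case with the uniform probability weight ϖ_i(x) = 1/2 on [−1, 1], such a rule is available from standard routines. The sketch below (our own illustration, not part of the paper) builds the m = n+1 point rule and checks the 2n exactness required in (3.3) numerically.

```python
import numpy as np

n = 4
m = n + 1
z, w = np.polynomial.legendre.leggauss(m)    # m-point Gauss-Legendre rule for dx
w = w / 2.0                                  # rescale for the weight varpi(x) = 1/2

# Gauss-Legendre with m = n+1 points is exact for polynomials of degree 2n+1,
# so in particular for all of Pi^1_{2n}: check against the exact moments of x^p.
for p in range(2 * n + 1):
    quad = np.dot(w, z**p)
    exact = 0.0 if p % 2 else 1.0 / (p + 1)  # int_{-1}^{1} x^p dx / 2
    assert abs(quad - exact) < 1e-12
```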

Choices exist to satisfy this requirement. In this paper, we employ the Gauss points (e.g., Gauss-Legendre in the Legendre case), which are the zeros of the (n+1)-degree univariate orthogonal polynomial φ_{n+1}(x_i). It is well known that φ_{n+1}(x_i) has n+1 real and distinct zeros, i.e.,
$$\phi_{n+1}\big(z_i^{(\ell)}\big) = 0, \qquad \ell = 1, \dots, n+1, \tag{3.4}$$
with the corresponding weights
$$w_i^{(\ell)} = \frac{1}{n+1}\,\lambda_n\big(z_i^{(\ell)}\big), \qquad \ell = 1, \dots, n+1,$$
where λ_n is the (normalized) Christoffel function associated with dimension i:
$$\lambda_n(x_i) = \frac{n+1}{\sum_{j=0}^{n} \big(\phi_j(x_i)\big)^2}. \tag{3.5}$$

Details of Gauss quadratures are well documented in the literature; see, for example, [13]. Using these points and weights, the polynomial exactness of this m = n+1 point Gauss quadrature is 2n+1, which is sufficient for the 2n exactness required in (3.3). With this choice, the total number of tensor quadrature points is M = (n+1)^d, and the rule satisfies the 2n polynomial exactness
$$\sum_{j=1}^{M} w^{(j)}\,f\big(z^{(j)}\big) = \int_D f(x)\,\varpi(x)\,dx, \qquad \forall f \in \Gamma_{2n}^d, \tag{3.6}$$

where Γ^d_{2n} := span{x^k = x_1^{k_1} ··· x_d^{k_d}, |k|_∞ ≤ 2n} is the tensor product of the one-dimensional polynomial space Π^1_{2n}. Since Π^d_{2n} ⊆ Γ^d_{2n} for any d ≥ 1 and n ≥ 0, the 2n polynomial exactness (3.6) obviously holds for all f ∈ Π^d_{2n}. Other one-dimensional choices, such as the Gauss-Lobatto and Gauss-Radau rules, can certainly be used, and the number of points m in each dimension may differ.

3.2. Main algorithm. With the tensor quadrature points defined, we proceed to apply the randomized sequential approximation method developed in [10]. For any tensor quadrature point z^{(j)} ∈ Θ_M, define the discrete sampling probability mass
$$p_j^* = w^{(j)}\,\frac{\big\|\Psi\big(z^{(j)}\big)\big\|_2^2}{N}, \qquad j = 1, \dots, M, \tag{3.7}$$

which satisfies ∑_{j=1}^{M} p_j^* = 1 by the 2n polynomial exactness of the tensor quadrature. Using the vector notation (2.7) and setting the initial choice c^[0] = 0, one then computes, for k = 0, 1, ...,
$$\mathbf c^{[k+1]} = \mathbf c^{[k]} + \frac{f\big(z^{(j^{[k]})}\big) - \big\langle \mathbf c^{[k]},\, \Psi\big(z^{(j^{[k]})}\big)\big\rangle}{\big\|\Psi\big(z^{(j^{[k]})}\big)\big\|_2^2}\,\Psi\big(z^{(j^{[k]})}\big), \qquad z^{(j^{[k]})} \sim d\mu_{p^*}, \tag{3.8}$$
where dμ_{p^*} denotes the probability measure corresponding to (3.7), i.e.,
$$d\mu_{p^*} := \sum_{j=1}^{M} p_j^*\,\delta\big(x - z^{(j)}\big)\,dx, \tag{3.9}$$

and δ(x) is the Dirac delta function. The implementation of the algorithm is remarkably simple: one randomly draws a point from the tensor quadrature set Θ_M using the discrete probability (3.7) and then applies the update (3.8), which requires only vector operations. The iteration continues until convergence is reached; a small illustrative sketch is given at the end of this subsection.

We now present the theoretical convergence rate of the main algorithm (3.8). For convenience, we introduce the weighted discrete inner product
$$[f, g]_w := \sum_{j=1}^{M} w^{(j)}\,f\big(z^{(j)}\big)\,g\big(z^{(j)}\big)$$

for functions f and g; the corresponding induced discrete norm is denoted by ‖·‖_w.

Theorem 3.1. Assume c^[0] = 0. The k-th iterative solution of the algorithm (3.8) satisfies
$$\mathbb E\,\big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2 = \|f - P_\Pi f\|_w^2 + E + r^k\big(\|P_\Pi f\|_w^2 - \|f - P_\Pi f\|_w^2 - E\big), \tag{3.10}$$
where
$$E = 2\big([f - P_\Pi f,\, P_\Pi f]_w - \langle \hat{\mathbf c}, \mathbf e\rangle\big), \qquad r = 1 - 1/N,$$
and e = c̃ − ĉ with c̃_j := [f, ψ_j]_w. Moreover,
$$\lim_{k\to\infty} \mathbb E\,\big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2 = \|f - P_\Pi f\|_w^2 + E = \|f\|_w^2 - \|\tilde{\mathbf c}\|_2^2 + \|\mathbf e\|_2^2. \tag{3.11}$$
Furthermore, the resulting approximation f̃^[k] = ⟨c^[k], Ψ(x)⟩ satisfies
$$\mathbb E\,\big\|\tilde f^{[k]} - f\big\|_{L^2_\omega}^2 = \|f - P_\Pi f\|_{L^2_\omega}^2 + \|f - P_\Pi f\|_w^2 + E + r^k\big(\|P_\Pi f\|_w^2 - \|f - P_\Pi f\|_w^2 - E\big), \tag{3.12}$$
and
$$\lim_{k\to\infty} \mathbb E\,\big\|\tilde f^{[k]} - f\big\|_{L^2_\omega}^2 = \|f - P_\Pi f\|_{L^2_\omega}^2 + \|f - P_\Pi f\|_w^2 + E. \tag{3.13}$$

The proof is given in the next section. Theorem 3.1, stated as an equality, gives the expected numerical error of the proposed algorithm (3.8). The error converges with the rate 1 − 1/N. Upon convergence, as the iteration step k → ∞, the error depends only on the best approximation P_Π f of the target function. The expectation E in (3.10) shall be understood as the expectation over the random sequence $\{z^{(j^{[\ell]})}\}_{0 \le \ell \le k-1}$ generated by the algorithm.
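The update step (3.8) itself amounts to only a few lines of code. The following is a minimal illustrative sketch (our own, not the authors' implementation); `sample_point` stands for a sampler of dμ_{p*} such as Algorithm 1 of Section 3.5, and `evaluate_basis` for the tensor-product basis evaluation of (2.6), both assumed to be provided.

```python
import numpy as np

def randomized_tensor_quadrature(f, sample_point, evaluate_basis, N, K):
    """Run K steps of the randomized iteration (3.8).

    f              -- target function, f(z) -> float
    sample_point   -- draws one tensor quadrature point z ~ d(mu_{p*})
    evaluate_basis -- returns Psi(z) as a length-N numpy array
    """
    c = np.zeros(N)                       # c^[0] = 0
    for _ in range(K):
        z = sample_point()
        psi = evaluate_basis(z)           # Psi(z^(j_k)), length N
        residual = f(z) - np.dot(c, psi)  # f(z) - <c^[k], Psi(z)>
        c = c + residual / np.dot(psi, psi) * psi
    return c

# A common stopping choice suggested by the analysis (Section 3.5.2):
# K = gamma * N with gamma ~ 5-10.
```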

3.3. General algorithm. Instead of the discrete probability p_j^* in (3.7), the randomized iteration (3.8) of our main algorithm can be carried out with any discrete probability. This results in a more general algorithm. Let
$$d\mu_p := \sum_{j=1}^{M} p_j\,\delta\big(x - z^{(j)}\big)\,dx \tag{3.14}$$
be a general discrete probability measure, where p_j is any discrete probability mass satisfying ∑_{j=1}^{M} p_j = 1. Then the same iteration (3.8) can be adopted with the measure dμ_p, i.e.,
$$\mathbf c^{[k+1]} = \mathbf c^{[k]} + \frac{f\big(z^{(j^{[k]})}\big) - \big\langle \mathbf c^{[k]},\, \Psi\big(z^{(j^{[k]})}\big)\big\rangle}{\big\|\Psi\big(z^{(j^{[k]})}\big)\big\|_2^2}\,\Psi\big(z^{(j^{[k]})}\big), \qquad z^{(j^{[k]})} \sim d\mu_p. \tag{3.15}$$

We now analyze the convergence of this general algorithm. The result will facilitate the analysis of the main algorithm in Theorem 3.1. For any discrete probability p_j, let
$$\theta_j := N\,\frac{p_j}{\big\|\Psi\big(z^{(j)}\big)\big\|_2^2}, \qquad j = 1, \dots, M, \tag{3.16}$$
and define the corresponding weighted discrete inner product
$$[f, g]_\theta := \sum_{j=1}^{M} \theta_j\,f\big(z^{(j)}\big)\,g\big(z^{(j)}\big),$$
with induced discrete weighted norm ‖·‖_θ. We now define the following "covariance matrix"
$$\Sigma = (\sigma_{ij})_{1 \le i,j \le N}, \qquad \sigma_{ij} = [\psi_i, \psi_j]_\theta, \tag{3.17}$$

for the basis {ψ_j(x), j = 1, ..., N}. Obviously, Σ is symmetric and positive definite. It possesses the eigenvalue decomposition
$$\Sigma = Q^T \Lambda Q, \tag{3.18}$$
where Q is orthogonal and Λ = diag(λ_1, ..., λ_N) with
$$\lambda_{\max}(\Sigma) = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N = \lambda_{\min}(\Sigma) > 0. \tag{3.19}$$
Let
$$\mathbf e = [e_1, \dots, e_N]^T, \qquad e_j = [f - P_\Pi f,\, \psi_j]_\theta, \qquad 1 \le j \le N. \tag{3.20}$$

For the general measure dμ_p in (3.14), the convergence of the general algorithm (3.15), in terms of the expected distance between its coefficient vector c^[k] and the best approximation coefficient vector ĉ in (2.11), is given by the following theorem.

Theorem 3.2. Assume c^[0] = 0. The k-th iterative solution of the algorithm (3.15) satisfies
$$F_\ell + (r_\ell)^k\big(\|\hat{\mathbf c}\|_2^2 - F_\ell\big) + \epsilon^{[k]} \;\le\; \mathbb E\,\big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2 \;\le\; F_u + (r_u)^k\big(\|\hat{\mathbf c}\|_2^2 - F_u\big) + \epsilon^{[k]}, \tag{3.21}$$
where
$$r_u = 1 - \lambda_{\min}(\Sigma)/N, \quad F_u = \frac{\|f\|_\theta^2 - \|\hat{\mathbf c}\|_2^2}{\lambda_{\min}(\Sigma)}, \quad r_\ell = 1 - \lambda_{\max}(\Sigma)/N, \quad F_\ell = \frac{\|f\|_\theta^2 - \|\hat{\mathbf c}\|_2^2}{\lambda_{\max}(\Sigma)}, \tag{3.22}$$
and
$$\epsilon^{[k]} = -\frac{2}{N}\,\hat{\mathbf c}^T Q^T D^{[k]} Q\,\mathbf e, \tag{3.23}$$
with
$$D^{[k]} = \mathrm{diag}\big[d_1^{[k]}, \dots, d_N^{[k]}\big], \qquad d_j^{[k]} := \frac{1 - (1 - \lambda_j/N)^k}{\lambda_j/N}, \qquad 1 \le j \le N.$$
In the limit k → ∞,
$$F_\ell - 2\,\hat{\mathbf c}^T \Sigma^{-1}\mathbf e \;\le\; \lim_{k\to\infty} \mathbb E\,\big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2 \;\le\; F_u - 2\,\hat{\mathbf c}^T \Sigma^{-1}\mathbf e. \tag{3.24}$$

Proof. This result is a direct consequence of Theorem 3.1 in [10]; the proof follows exactly the proof of that theorem and is omitted here.

3.4. The main algorithm: optimality and analysis. Using the convergence rate of the general algorithm (3.15), we now return to our main algorithm (3.8). We first discuss the optimality of the main algorithm compared to the general algorithm, and then provide the proof of Theorem 3.1. We also establish almost sure convergence of the main algorithm under a special condition.

3.4.1. Optimality. From Theorem 3.2, it can be seen that the convergence rate of the general algorithm depends on the eigenvalues of Σ. To achieve a faster convergence rate, it is necessary to minimize r_u = 1 − λ_min(Σ)/N, the slower rate in (3.22), which amounts to maximizing λ_min(Σ). Consequently, we define the optimal sampling probability as
$$\big\{p_j^{\mathrm{opt}}\big\} = \operatorname*{argmax}_{\{p_j \in \mathbb R^+ :\; \sum_{j=1}^{M} p_j = 1\}} \lambda_{\min}(\Sigma), \tag{3.25}$$
where the matrix Σ, defined in (3.17), depends on the p_j. It can be shown that the discrete probability p_j^* in (3.7), used in our main algorithm (3.8), is a solution to this optimality criterion.

Lemma 3.3. The discrete probability p_j^* defined in (3.7) is a solution of the optimization problem (3.25).

Proof. We first observe the following fact:
$$N\,\lambda_{\min}(\Sigma) \le \sum_{i=1}^{N} \lambda_i(\Sigma) = \mathrm{tr}(\Sigma) = \sum_{i=1}^{N} [\psi_i, \psi_i]_\theta = \sum_{i=1}^{N}\sum_{j=1}^{M} \theta_j\,\psi_i^2\big(z^{(j)}\big) = \sum_{j=1}^{M} \theta_j \left( \sum_{i=1}^{N} \psi_i^2\big(z^{(j)}\big) \right) = N \sum_{j=1}^{M} p_j = N.$$
This implies that λ_min(Σ) ≤ 1. For p_j^* defined in (3.7), we have θ_j = w^{(j)},

j = 1, . . . , M,

by the definition of θ_j in (3.16). The entries of Σ, defined in (3.17), become
$$\sigma_{ij} = [\psi_i, \psi_j]_w = \sum_{k=1}^{M} w^{(k)}\,\psi_i\big(z^{(k)}\big)\,\psi_j\big(z^{(k)}\big) = \int_D \psi_i(x)\,\psi_j(x)\,d\omega(x) = \delta_{ij},$$

where in the last equality the 2n polynomial exactness (3.6) of the tensor quadrature is used. Thus Σ becomes the identity matrix of size N and its eigenvalues are all equal to 1, i.e., λ_i ≡ 1, i = 1, ..., N. This implies that λ_min(Σ) is maximized and the optimality criterion (3.25) is satisfied.

3.4.2. Proof of Theorem 3.1. Using Theorem 3.2 and Lemma 3.3, we now provide the proof of Theorem 3.1.

Proof. For the optimal probability measure dμ_{p*}, the analysis in Section 3.4.1 indicates that θ_j = w^{(j)} and Σ = I_N, where I_N is the identity matrix of size N. We have
$$Q = \Lambda = I_N, \qquad D^{[k]} = N\big(1 - r^k\big)\,I_N, \tag{3.26}$$
and
$$r_u = r_\ell = 1 - 1/N, \qquad F_u = F_\ell = \|f\|_w^2 - \|\hat{\mathbf c}\|_2^2. \tag{3.27}$$

From (3.20) and (2.10), we derive
$$e_j = [f - \langle \hat{\mathbf c}, \Psi(x)\rangle,\, \psi_j]_w = \tilde c_j - \sum_{i=1}^{M} w^{(i)}\,\psi_j\big(z^{(i)}\big)\,\big\langle \hat{\mathbf c},\, \Psi\big(z^{(i)}\big)\big\rangle = \tilde c_j - \Big\langle \hat{\mathbf c},\; \sum_{i=1}^{M} w^{(i)}\,\psi_j\big(z^{(i)}\big)\,\Psi\big(z^{(i)}\big) \Big\rangle$$
$$= \tilde c_j - \Big\langle \hat{\mathbf c},\; \int_D \psi_j(x)\,\Psi(x)\,d\omega(x) \Big\rangle = \tilde c_j - \sum_{\ell=1}^{N} \hat c_\ell\,\delta_{\ell j} = \tilde c_j - \hat c_j,$$

where the 2n polynomial exactness of the tensor quadrature has been used. This implies e = c̃ − ĉ. Substituting this and (3.26) into (3.23) gives ε^{[k]} = −2⟨ĉ, e⟩(1 − r^k). Combining this with (3.27), one finds that the lower and upper bounds in (3.21) are both equal to
$$\|f\|_w^2 - \|\hat{\mathbf c}\|_2^2 + r^k\big(2\|\hat{\mathbf c}\|_2^2 - \|f\|_w^2\big) - 2\langle \hat{\mathbf c}, \mathbf e\rangle\big(1 - r^k\big),$$
which can be reformulated into the right-hand side of (3.10) by the 2n polynomial exactness of the tensor quadrature. Finally, we arrive at (3.12) by utilizing (3.10) and
$$\big\|\tilde f^{[k]} - f\big\|_{L^2_\omega}^2 = \big\|\tilde f^{[k]} - P_\Pi f\big\|_{L^2_\omega}^2 + \|f - P_\Pi f\|_{L^2_\omega}^2 = \big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2 + \|f - P_\Pi f\|_{L^2_\omega}^2.$$
The proof is then completed.

3.4.3. Almost sure convergence. For the special case f ∈ Π_n^d, we can establish almost sure convergence of the main algorithm (3.8).

Theorem 3.4. Assume f ∈ Π_n^d. Then the k-th iterative solution of the algorithm (3.8) satisfies
$$\lim_{k\to\infty} \big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2 = 0 \qquad \text{almost surely.} \tag{3.28}$$
Furthermore, the resulting approximation f̃^[k] := ⟨c^[k], Ψ(x)⟩ satisfies
$$\lim_{k\to\infty} \big\|\tilde f^{[k]} - f\big\|_{L^2_\omega}^2 = 0 \qquad \text{almost surely.} \tag{3.29}$$
Proof. Under the assumption f ∈ Π_n^d, one has f = P_Π f and E = 0. From (3.10) we obtain
$$\mathbb E\,\big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2 = r^k\,\|P_\Pi f\|_w^2 = r^k\,\|\hat{\mathbf c}\|_2^2, \tag{3.30}$$
where r = 1 − 1/N ∈ (0, 1). To show (3.28), it suffices to prove that, for all ε > 0,
$$\lim_{k\to\infty} \mathbb P\left( \bigcup_{i=k}^{\infty} \Big\{ \big\|\mathbf c^{[i]} - \hat{\mathbf c}\big\|_2^2 > \varepsilon \Big\} \right) = 0. \tag{3.31}$$
This is true because, for any fixed ε, it holds that
$$\mathbb P\left( \bigcup_{i=k}^{\infty} \Big\{ \big\|\mathbf c^{[i]} - \hat{\mathbf c}\big\|_2^2 > \varepsilon \Big\} \right) \le \sum_{i=k}^{\infty} \mathbb P\Big( \big\|\mathbf c^{[i]} - \hat{\mathbf c}\big\|_2^2 > \varepsilon \Big) \le \frac{1}{\varepsilon} \sum_{i=k}^{\infty} \mathbb E\,\big\|\mathbf c^{[i]} - \hat{\mathbf c}\big\|_2^2 = \frac{\|\hat{\mathbf c}\|_2^2}{\varepsilon} \sum_{i=k}^{\infty} r^i = \frac{\|\hat{\mathbf c}\|_2^2}{\varepsilon}\,\frac{r^k}{1-r} \to 0, \quad \text{as } k \to \infty,$$

where Chebyshev's inequality has been used. We then immediately obtain (3.29) via
$$\big\|\tilde f^{[k]} - f\big\|_{L^2_\omega}^2 = \big\|\tilde f^{[k]} - P_\Pi f\big\|_{L^2_\omega}^2 = \big\|\mathbf c^{[k]} - \hat{\mathbf c}\big\|_2^2.$$
The proof is completed.

3.5. The main algorithm: computational aspects. In this section we discuss several essential issues regarding the implementation of the main algorithm (3.8). We remark that the main algorithm (3.8) is in fact a special case of the general algorithm (3.15), which may use any discrete probability to draw the samples. However, the optimal rate of convergence is achieved by the special probability (3.7) used in the main algorithm. The optimal convergence rate is essential in high dimensions, where the performance of the main algorithm is significantly superior to that of the general algorithm. Therefore, we focus exclusively on the main algorithm.

3.5.1. Sampling the discrete probability (3.7). The probability distribution (3.9), defined via p_j^* in (3.7), is critical for the optimality of the main algorithm (3.8). It is a multivariate distribution of non-standard form and cannot be easily sampled by existing software. Direct sampling requires the evaluation and storage of p_j^* for all j = 1, ..., M, which becomes impractical in high dimensions, as M can be prohibitively large. One alternative is the accept-reject method, a popular

approach for sampling non-standard probabilities. Although applicable here, it results in an increasingly large portion of rejections and becomes highly inefficient in higher dimensions. Here we present an effective sampling strategy for p_j^*. It utilizes conditional probabilities to conduct the sampling dimension-by-dimension.

Let x be a random variable with the distribution (3.7). Then p_j^* is the probability that x takes the tensor quadrature point z^{(j)} = (z_1^{(j_1)}, ..., z_d^{(j_d)}):
$$p_j^* = \mathbb P\big( x = z^{(j)} \big) = \mathbb P\big( x_1 = z_1^{(j_1)}, \dots, x_d = z_d^{(j_d)} \big) = \frac{1}{N} \sum_{|\mathbf k| \le n} \Big( w_1^{(j_1)}\,\phi_{k_1}^2\big(z_1^{(j_1)}\big) \Big) \cdots \Big( w_d^{(j_d)}\,\phi_{k_d}^2\big(z_d^{(j_d)}\big) \Big).$$
The marginal probability P(x_1 = z_1^{(j_1)}) is
$$\mathbb P\big( x_1 = z_1^{(j_1)} \big) = \sum_{j_2=1}^{m} \cdots \sum_{j_d=1}^{m} \mathbb P\big( x_1 = z_1^{(j_1)}, \dots, x_d = z_d^{(j_d)} \big) = \frac{1}{N} \sum_{|\mathbf k| \le n} \Big( w_1^{(j_1)}\,\phi_{k_1}^2\big(z_1^{(j_1)}\big) \Big) \prod_{\ell=2}^{d} \big\|\phi_{k_\ell}\big\|_{L^2_\omega}^2 = \frac{w_1^{(j_1)}}{N} \sum_{k_1=0}^{n} \binom{n - k_1 + d - 1}{d - 1}\,\phi_{k_1}^2\big(z_1^{(j_1)}\big), \tag{3.32}$$
where the 2n polynomial exactness (3.6) of the tensor quadrature is used. In fact, there is nothing special about the first component, and this result applies to any dimension i = 1, ..., d:
$$\mathbb P\big( x_i = z_i^{(j_i)} \big) = \frac{w_i^{(j_i)}}{N} \sum_{k_i=0}^{n} \binom{n - k_i + d - 1}{d - 1}\,\phi_{k_i}^2\big(z_i^{(j_i)}\big).$$
The marginal distribution of the first ℓ components is
$$\mathbb P\big( x_1 = z_1^{(j_1)}, \dots, x_\ell = z_\ell^{(j_\ell)} \big) = \sum_{j_{\ell+1}=1}^{m} \cdots \sum_{j_d=1}^{m} \mathbb P\big( x_1 = z_1^{(j_1)}, \dots, x_d = z_d^{(j_d)} \big) = \frac{1}{N} \sum_{|\mathbf k| \le n} \Big( w_1^{(j_1)}\,\phi_{k_1}^2\big(z_1^{(j_1)}\big) \Big) \cdots \Big( w_\ell^{(j_\ell)}\,\phi_{k_\ell}^2\big(z_\ell^{(j_\ell)}\big) \Big). \tag{3.33}$$
These probabilities can be evaluated explicitly. Since the quadrature set Θ_M in (3.2) is the tensor product of the one-dimensional sets Θ_{i,m} in (3.1), we can efficiently sample p_j^* component-by-component in a recursive manner. It is also important to realize that the points in each one-dimensional quadrature point set Θ_{i,m} are uniquely labelled by the integers {1, ..., m}. That is,
$$\Theta_{i,m} = \big\{ z_i^{(1)}, \dots, z_i^{(m)} \big\} \longleftrightarrow \{1, \dots, m\}, \qquad i = 1, \dots, d. \tag{3.34}$$
Then each point in the tensor quadrature set Θ_M corresponds to a unique multi-index, and sampling a random variable x from the set Θ_M is equivalent to sampling a random multi-index α from the tensorized integer set ⊗_{i=1}^{d} {1, ..., m}.

Algorithm 1 Draw a point z^{(j)} = (z_1^{(j_1)}, ..., z_d^{(j_d)}) from the tensor quadrature set Θ_M in (3.2) using the discrete distribution (3.7)
1: Evaluate the one-dimensional marginal probability masses P(x_1 = z_1^{(ℓ)}) using (3.32) for all ℓ = 1, ..., m.
2: From the probabilities {P(x_1 = z_1^{(ℓ)})}_{ℓ=1}^{m}, draw z_1^{(j_1)} ∈ Θ_{1,m}, or, equivalently, draw j_1 from the integer set {1, ..., m}.
3: Set p̂ = P(x_1 = z_1^{(j_1)}).
4: for k from 2 to d do
5:   Evaluate the following probabilities using (3.33): for ℓ = 1, 2, ..., m,
     P_k^{(ℓ)} := P(x_1 = z_1^{(j_1)}, ..., x_{k−1} = z_{k−1}^{(j_{k−1})}, x_k = z_k^{(ℓ)}).
6:   Evaluate the conditional probabilities P̂_k^{(ℓ)} = P_k^{(ℓ)} / p̂.
7:   From the conditional probabilities {P̂_k^{(ℓ)}}_{ℓ=1}^{m}, draw z_k^{(j_k)} ∈ Θ_{k,m}, or, equivalently, draw j_k from the integer set {1, ..., m}.
8:   Set p̂ = P_k^{(j_k)}.
9: end for
10: return z^{(j)} = (z_1^{(j_1)}, ..., z_d^{(j_d)}) and the multi-index j = (j_1, ..., j_d).
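Algorithm 1 draws the components sequentially through conditional probabilities. As a complement, the sketch below (our own simplification, not the paper's Algorithm 1) draws from the same distribution (3.7) by exploiting its mixture structure: p*_j is a uniform mixture over the N basis multi-indices k of the product masses w_i^{(j_i)} φ_{k_i}²(z_i^{(j_i)}), each of which sums to one over j_i by (3.3). One may therefore first pick a basis multi-index k uniformly and then sample each dimension independently.

```python
import numpy as np

rng = np.random.default_rng()

def sample_tensor_point(multi_index_set, phi_sq_table, w_1d):
    """Draw one grid multi-index j for a point z^(j) ~ the distribution (3.7).

    Not the paper's Algorithm 1: it uses the equivalent mixture form of p*_j,
    a uniform mixture over the N basis multi-indices k of the product masses
    w_i^(j_i) * phi_{k_i}(z_i^(j_i))^2  (each 1D mass sums to 1 by (3.3)).

    multi_index_set -- list of the N multi-indices k = (k_1,...,k_d), |k| <= n
    phi_sq_table    -- shape (n+1, m) array, phi_sq_table[k, j] = phi_k(z^(j))^2
    w_1d            -- length-m array of 1D quadrature weights (same rule per dim)
    """
    k = multi_index_set[rng.integers(len(multi_index_set))]  # basis index, uniform
    j = []
    for ki in k:
        mass = w_1d * phi_sq_table[ki]
        mass = mass / mass.sum()          # guard against floating-point round-off
        j.append(int(rng.choice(len(w_1d), p=mass)))
    return tuple(j)
```

Either way, only one-dimensional sampling is involved, which is what makes the step cheap in high dimensions.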

Our sampling method is summarized in Algorithm 1 above. The distinct feature of this algorithm is that it involves only one-dimensional sampling, in Step 2 for the first component and in Step 7 for the k-th component, k = 2, ..., d, and can be trivially realized.

3.5.2. Convergence criterion. The convergence rate (1 − 1/N) in Theorem 3.1 is given in the form of an equality. This provides a very sharp convergence criterion. Let K be the total number of iterations of the main algorithm (3.8) and let K = γN, where γ is a constant independent of N. Then
$$\left(1 - \frac{1}{N}\right)^{K} = \exp\left( \gamma N \ln\left(1 - \frac{1}{N}\right) \right) = \exp\left( -\gamma \sum_{i=1}^{\infty} \frac{1}{i}\,\frac{1}{N^{i-1}} \right) = \exp\left( -\gamma + O\!\left(\frac{1}{N}\right) \right) \approx e^{-\gamma}, \qquad \text{if } N \gg 1.$$
According to Theorem 3.1, this implies that the squared iteration error, the last term in (3.12), becomes roughly e^{−γ} times smaller. For example, when γ = 5, e^{−γ} ≈ 6.7 × 10^{−3}; when γ = 10, e^{−γ} ≈ 4.5 × 10^{−5}. In most problems γ = 10 is sufficient, and our extensive numerical tests verify that K ∼ 10N is a good criterion for accurate solutions. On the other hand, if one desires the iteration error to be at a certain small level ε ≪ 1, the iteration can be stopped after ∼ −log(ε)·N steps.

In high dimensions d ≫ 1, the total number of tensor quadrature points M = (n+1)^d grows exceptionally fast and is much larger than the cardinality of the polynomial space N = \binom{n+d}{d}; that is, M ≫ N. The linear stopping criterion K = γN implies that not all the tensor quadrature points are sampled. In fact, in high dimensions only a very small random portion (consisting of K points) of the full grid Θ_M is used in the iteration to reach accurate convergence. This remarkable feature implies that the exceptionally large set Θ_M is never needed during implementation.

3.5.3. Computational cost. Owing to the tensor structure of the underlying grids, the randomized tensor quadrature algorithm (3.8) allows a remarkably efficient implementation in high dimensions d ≫ 1. For simplicity of exposition, let us assume, only in this subsection, that we use the same type of polynomials with the same degree in each dimension. Prior to the computation, one needs to store only one-dimensional data regarding the polynomial basis. More specifically, at each one-dimensional quadrature point j = 1, ..., m, we store
$$w^{(j)}, \qquad \{\phi_i(z^{(j)})\}_{i=0}^{n}, \qquad \{\phi_i^2(z^{(j)})\}_{i=0}^{n}. \tag{3.35}$$
This is a very small amount of data to store, irrespective of the dimension d of the problem. The evaluation of multi-dimensional polynomials, whenever needed, can be carried out quickly using these stored one-dimensional data and the multi-index to single-index mapping (2.5). For example, for any multi-index j^{[k]} drawn from Algorithm 1, we have
$$\psi_i\big(z^{(j^{[k]})}\big) = \phi_{i_1}\big(z_1^{(j_1^{[k]})}\big) \cdots \phi_{i_d}\big(z_d^{(j_d^{[k]})}\big), \qquad 1 \le i \le N.$$
We now provide a rough estimate of the operation count of the main algorithm (3.8). At each iteration step, one first draws a random sample based on the probability (3.7) and then conducts the update (3.8). The sampling step, when using Algorithm 1 of Section 3.5.1, requires the component information w^{(j)} and {φ_i^2(z^{(j)})}_{i=0}^{n}, which can be readily retrieved. In our current implementation, the major cost stems from the evaluation of (3.33), which requires memory storage of O(d × N) terms and O(d × N) flops. In the update step (3.8), the evaluation of the polynomial basis functions requires O(d × N) flops, and the vector inner product and norm computations require O(N) flops. The required memory storage is O(N) real numbers, for vectors of length N; an efficient implementation additionally stores the multi-index to single-index mapping (2.5), which involves O(d × N) entries. In summary, our main algorithm (3.8) requires memory storage of O(d × N) real numbers, and each iteration step involves O(d × N) flops. Note that N = \binom{n+d}{d} ∼ d^n. Since any reasonable polynomial approximation requires n ≥ 2, we have N ≫ d in high dimensions. Assuming the algorithm (3.8) terminates after K = γN steps with γ ∼ 10, the overall computational complexity of the algorithm is O(KN) ∼ O(N^2) flops and O(N) storage of real numbers. On the other hand, most existing regression methods require explicit operations on the model matrix, i.e., the polynomial basis functions evaluated at the sample points. Let J ≥ N be the number of sample points. The memory storage is then O(J × N) ∼ O(N^2) real numbers, and the operation count depends on the method; for the least squares method it is O(J × N^2) ∼ O(N^3) flops. We observe that the dominating cost and storage of the current method (3.8) are both one order smaller than those of standard regression methods such as least squares.

4. Numerical Examples. In this section we present several numerical examples to demonstrate the effectiveness of the proposed randomized tensor quadrature method. We consider three different, representative, types of polynomials as the basis: Legendre polynomials in a bounded domain, Hermite polynomials in

an unbounded domain, and trigonometric polynomials in a periodic domain. The dimensions of our examples include the low dimension d = 2, the intermediate dimensions d = 10 and d = 40, and the high dimensions d = 100 and d = 500.

We consider the following six multivariate functions as target functions. Four of them are from [6] and have been widely used for multi-dimensional function integration and approximation tests:
$$f_1(x) = \exp\left( -\sum_{i=1}^{d} \sigma_i^2 \left( \frac{x_i+1}{2} - \chi_i \right)^{2} \right); \qquad \text{(GAUSSIAN)}$$
$$f_2(x) = \exp\left( -\sum_{i=1}^{d} \sigma_i \left| \frac{x_i+1}{2} - \chi_i \right| \right); \qquad \text{(CONTINUOUS)}$$
$$f_3(x) = \left( 1 + \sum_{i=1}^{d} \sigma_i\,\frac{x_i+1}{2} \right)^{-(d+1)}, \quad \text{where } \sigma_i = \frac{1}{i^2}; \qquad \text{(CORNER PEAK)} \tag{4.1}$$
$$f_4(x) = \prod_{i=1}^{d} \left( \sigma_i^{-2} + \left( \frac{x_i+1}{2} - \chi_i \right)^{2} \right)^{-1}; \qquad \text{(PRODUCT PEAK)}$$
where σ = [σ_1, ..., σ_d] are parameters controlling the difficulty of the functions, and χ = [χ_1, ..., χ_d] are shifting parameters. We also consider the following two functions,
$$f_5(x) = \cos\big( \|x - \chi\|_2 \big), \qquad f_6(x) = \frac{9}{5 - 4\cos\big( \sum_{i=1}^{d} x_i \big)}, \tag{4.2}$$
which are fully coupled in all dimensions. We remark that these functions do not possess low-dimensional structure.

Two types of convergence results are examined. One is the numerical error in the expansion coefficients c^[k] at the k-th iteration, compared against the best approximation coefficients ĉ in (2.11); we report the relative error ‖c^[k] − ĉ‖_2 / ‖ĉ‖_2. The best approximation coefficients ĉ are computed by high accuracy numerical quadrature. This is straightforward in low dimensions but becomes difficult in high dimensions, except for the functions f_1, f_2 and f_4, which take product form. The other error measure is the error in the function approximation f̃^[k] constructed at the k-th iteration. To examine this error we independently draw a dense set of 10^5 samples from the probability measure dω(x) and compare f̃^[k] with the target function f at these sampling points. The vector 2-norm is again used to report the difference, which gives an approximation to the "continuous" error ‖f̃^[k] − f‖_{L^2_ω}. We then use these two errors to verify the theoretical convergence results in Theorem 3.1. Moreover, for the functions of product form, f_1, f_2, and f_4, all the terms in the convergence estimate (3.12) can be computed numerically with high accuracy. This gives a direct comparison between the theoretical convergence result (3.12) and the numerical convergence.

In the low dimension d = 2, we present numerical results averaged over 100 independent simulations. This gives a better comparison against Theorem 3.1, whose results are stated in expectation. In higher dimensions, all numerical results are obtained from a single simulation. The convergence behaviors for all the target functions are very similar, so we rather arbitrarily choose subsets of the results to present here.
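For concreteness, here is a short sketch of the six target functions, mirroring the formulas (4.1)-(4.2) as written above (our own code; x, sigma and chi are length-d numpy arrays, with the parameter values taken from the figure captions):

```python
import numpy as np

def f1(x, sigma, chi):                       # GAUSSIAN
    return np.exp(-np.sum(sigma**2 * ((x + 1) / 2 - chi)**2))

def f2(x, sigma, chi):                       # CONTINUOUS
    return np.exp(-np.sum(sigma * np.abs((x + 1) / 2 - chi)))

def f3(x):                                   # CORNER PEAK, sigma_i = 1/i^2
    sigma = 1.0 / np.arange(1, x.size + 1)**2
    return (1.0 + np.sum(sigma * (x + 1) / 2))**(-(x.size + 1))

def f4(x, sigma, chi):                       # PRODUCT PEAK
    return np.prod(1.0 / (sigma**(-2.0) + ((x + 1) / 2 - chi)**2))

def f5(x, chi):
    return np.cos(np.linalg.norm(x - chi))

def f6(x):
    return 9.0 / (5.0 - 4.0 * np.cos(np.sum(x)))
```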

4.1. Low dimension d = 2. We first present comprehensive results in two dimensions (d = 2). At this low dimension the computation (3.8) is cheap, so we present numerical results averaged over 100 independent simulations to reduce statistical oscillations.

4.1.1. Legendre polynomial. We first consider normalized Legendre polynomial approximation of the target functions f_1, f_2, f_3 and f_4 in (4.1) over the domain D = [−1, 1]^2. The tensor quadrature point set is constructed from the standard one-dimensional Gauss-Legendre quadrature points. The convergence histories of the function approximation errors are displayed for different polynomial degrees n in Fig. 4.1, where the theoretical rates are also plotted. Good agreement between the numerical convergence rate and the theoretical rate (1 − 1/N)^{1/2} is observed. As expected, the convergence rate becomes slower for higher degree (n) polynomials, because the cardinality N of the polynomial space becomes larger. The saturation level of the errors becomes smaller at higher polynomial degree, due to the higher order accuracy.

Fig. 4.1. Function approximation errors versus number of iterations by Legendre polynomials for four test functions in (4.1) at d = 2. Top left: f_1 with σ = [1, 1] and χ = [1, 0.5]; Top right: f_2 with σ = [−2, 1] and χ = [0.25, −0.75]; Bottom left: f_3; Bottom right: f_4 with σ = [−3, 2] and χ = [0.5, 0.5].

4.1.2. Hermite polynomial. We then examine the use of Hermite polynomials in an unbounded domain. The normalized Hermite polynomials are used, and the tensor quadrature set is constructed from the standard one-dimensional Gauss-Hermite quadrature points. The results for f_1, f_2, f_4, and f_5 are shown in Fig. 4.2, along with the theoretical rates. Again, we observe excellent agreement between the numerical convergence and the theoretical convergence.

Fig. 4.2. Function approximation errors versus number of iterations by Hermite polynomials for four of the test functions in (4.1) and (4.2) at d = 2. Top left: f_1 with σ = [1, 1] and χ = [0.55, 0.5]; Top right: f_2 with σ = [0.2, 0.1] and χ = [0.5, 0.5]; Bottom left: f_4 with σ = [−1/3, 0.5] and χ = [0.5, 0.5]; Bottom right: f_5 with χ_i = 0.

4.1.3. Trigonometric polynomial. We now examine the use of trigonometric polynomials for function approximation, which is essentially multivariate Fourier expansion. The domain is D = [0, 2π]^d, and the linear subspace for approximation is
$$\hat\Pi_n^d := \mathrm{span}\big\{ e^{\mathrm{i}\,\alpha\cdot x},\ |\alpha| \le n/2 \big\}, \tag{4.3}$$
where α = (α_1, ..., α_d) is a multi-index with |α| = |α_1| + ··· + |α_d| and α_i ∈ Z, i = 1, ..., d. The cardinality of this space is
$$N = \dim \hat\Pi_n^d = \sum_{i=\max\{d-\frac{n}{2},\,0\}}^{d} \binom{d}{i}\binom{n/2}{d-i}\,2^{\,d-i}. \tag{4.4}$$
It is well known that the trapezoidal rule on uniform grids with equal weights is a quadrature rule for trigonometric polynomials. Consequently, we employ the (n+1) equally spaced points {2πℓ/(n+1), ℓ = 0, 1, ..., n} in each dimension to construct the tensor quadrature points. This ensures the exactness of the tensor quadrature for any function in Π̂_{2n}^d. In this case, the optimal sampling probability distribution (3.7) becomes the discrete uniform distribution. In Fig. 4.3, the numerical convergence of the errors for f_1, f_2, f_4 and f_6 is shown. Again, excellent agreement between the numerical convergence rate and the theoretical convergence rate can be observed. The errors saturate at smaller levels when larger n is used, as expected.

Fig. 4.3. Function approximation errors versus number of iterations by trigonometric polynomials for four of the test functions in (4.1) and (4.2) at d = 2. Top left: f_1 with σ_i = 1 and χ_i = (π+1)/2; Top right: f_2 with σ_i = 1 and χ_i = (π+1)/2; Bottom left: f_4 with σ_i = 2 and χ_i = (π+1)/2; Bottom right: f_6.

4.2. Intermediate dimensions d = 10 and d = 40. We now focus on dimensions d = 10 and d = 40. Hereafter all numerical results are reported from a single simulation.

4.2.1. Legendre polynomial. We first consider the approximation in dimension d = 10 using the normalized Legendre polynomials. The results for approximating f_1 in (4.1) over D = [−1, 1]^{10} are presented in Fig. 4.4 for different degrees. The exponential convergence of the errors with respect to the iteration count can be clearly observed. For the polynomial degrees n = 5, 6, 7, 8, 9, the cardinality of the polynomial space is N = 3,003, 8,008, 19,448, 43,758 and 92,378, respectively. We observe from Fig. 4.4 that the ∼ 10N iteration count is indeed a good (and quite conservative) criterion for a converged solution.

Furthermore, for this separable function, all the terms in the theoretical convergence formula (3.12) of Theorem 3.1 can be computed accurately. We thus obtain
$$n = 7: \quad \mathbb E\,\big\|\tilde f^{[k]} - f\big\|_{L^2_\omega}^2 \simeq 6.1279 \times 10^{-5} + 172.5288 \times \left( 1 - \frac{1}{19448} \right)^{k}, \qquad \text{and}$$
$$n = 8: \quad \mathbb E\,\big\|\tilde f^{[k]} - f\big\|_{L^2_\omega}^2 \simeq 2.8532 \times 10^{-6} + 172.5289 \times \left( 1 - \frac{1}{43758} \right)^{k}.$$

We plot these "theoretical curves" in Fig. 4.5, along with the numerical convergence over a single simulation. We observe that they agree with each other very well and the differences are indistinguishable.

Fig. 4.4. Coefficient errors (left) and function approximation errors (right) versus number of iterations by Legendre polynomials of different degree n at d = 10. The target function is the GAUSSIAN function f_1 in (4.1) with σ_i = 1 and χ_i = 0.375.

Fig. 4.5. Numerical and theoretical function approximation errors versus number of iterations by Legendre polynomials at d = 10: n = 7 (left) and n = 8 (right). The target function is the GAUSSIAN function f_1 in (4.1) with σ_i = 1 and χ_i = 0.375.

Fig. 4.6. Numerical and theoretical function approximation errors versus number of iterations by Legendre polynomials at d = 40. Top left: n = 1; Top right: n = 2; Bottom left: n = 3; Bottom right: n = 4. The target function is the GAUSSIAN function f_1 in (4.1) with σ_i = 1 and χ_i = 0.375.

Fig. 4.7. Function approximation errors obtained by the optimal sampling strategy and a non-optimal sampling strategy (uniform) with Legendre polynomials at d = 40: left (n = 3); right (n = 4). The target function is the GAUSSIAN function f_1 in (4.1) with σ_i = 1 and χ_i = 0.375.

Fig. 4.8. Function approximation errors obtained by the optimal sampling strategy (3.7) and a non-optimal sampling strategy, the uniform distribution in this case, with Legendre polynomials at d = 40. Left: n = 3; Right: n = 4. The target function is f_5 in (4.2) with χ_i = −0.1.

For higher dimensions, we present the results for d = 40 in Fig. 4.6. As the complexity of the problem grows exponentially in higher dimensions, we confine ourselves to polynomial degree n ≤ 4. Again, the "theoretical convergence curves" agree very well with the actual numerical convergence.

We now examine the convergence rate with different sampling probabilities. As Theorem 3.2 in Section 3.3 indicates, the randomized tensor quadrature method converges with any proper discrete sampling probability, as in the general algorithm (3.15). However, the optimal sampling probability (3.7) in our main algorithm (3.8) gives the optimal rate of convergence. This can be clearly seen in Figs. 4.7 and 4.8, where the numerical convergence obtained with the optimal sampling probability (3.7) and with the discrete uniform probability are shown; Fig. 4.7 is for the GAUSSIAN function f_1 and Fig. 4.8 for the function f_5. We clearly see that the general algorithm (3.15) using the non-optimal sampling measure does converge. However, its rate of convergence is significantly slower than that of the main algorithm (3.8) using the optimal measure, and the difference in performance grows in higher dimensions. Consequently, we focus exclusively on the optimal algorithm (3.8).

4.2.2. Hermite polynomial. We now consider approximations in unbounded domains using Hermite polynomials in d = 10 and d = 40. We present the results for the function f_5 in (4.2), shown in Fig. 4.9 and Fig. 4.10 for d = 10 and d = 40, respectively. Again, we observe the exponential rate of convergence and its good agreement with the theoretical rate.

Fig. 4.9. Function approximation errors versus number of iterations by Hermite polynomials of different degree n at d = 10. The target function is f_5 in (4.2) with χ_i = −0.1.

Fig. 4.10. Function approximation errors versus number of iterations by Hermite polynomials at d = 40: left (n = 3); right (n = 4). The target function is f_5 in (4.2) with χ_i = −0.1.

4.3. High dimensions d ≥ 100. We now present results in high dimensions d ≥ 100. In Fig. 4.11, the Legendre approximation results in [−1, 1]^{100} are shown for polynomial orders n = 2 and n = 3. The results shown here are for the GAUSSIAN function f_1 in (4.1). We also plot the theoretical convergence using Theorem 3.1. Excellent agreement between the theory and the numerical results is observed. The cardinality of the polynomial space is N = 5,151 for n = 2 and N = 176,851 for n = 3. We observe that convergence is reached at around K ∼ 50,000 steps for n = 2 and K ∼ 1.5 × 10^6 steps for n = 3. This verifies the K ∼ 10N convergence criterion.

Note that the total number of tensor quadrature points is M = 3^{100} ≈ 5.2 × 10^{47} for n = 2 and M = 4^{100} ≈ 1.6 × 10^{60} for n = 3. Our algorithm thus converges after using only a very small portion of the tensor quadrature.

In Fig. 4.12 we present the approximation in the unbounded domain R^{100} by Hermite polynomials of degree n = 2 and n = 3, for the function f_5 in (4.2). Again, we observe good agreement between the numerical convergence and the theoretical convergence rate. The K ∼ 10N convergence criterion also holds well.

Fig. 4.11. Numerical and theoretical function approximation errors versus number of iterations by Legendre polynomials at d = 100: n = 2 (left) and n = 3 (right). The target function is the GAUSSIAN function f_1 in (4.1) with σ_i = 1 and χ_i = 0.375.

Fig. 4.12. Function approximation errors versus number of iterations by Hermite polynomials at d = 100: n = 2 (left) and n = 3 (right). The target function is f_5 in (4.2) with χ_i = −0.1.

Finally, we present the approximation results in dimension d = 500, for the function f_5 in (4.2) and polynomials of degree n = 2. The cardinality of the polynomial space is N = 125,751. The left panel of Fig. 4.13 shows the results in a bounded domain with Legendre polynomials, and the right panel the results in an unbounded domain with Hermite polynomials. Again, we observe the expected exponential convergence and its agreement with the theoretical convergence.

Note that in this case the tensor quadrature grid Θ_M consists of M = 3^{500} ≈ 3.6 × 10^{238} points, a number too large to be handled by most computers. However, the current randomized tensor quadrature method converges after ∼ 1.2 × 10^6 steps, following the ∼ 10N convergence rule and using only a tiny portion, ∼ 1/10^{232}, of the tensor quadrature points.

Fig. 4.13. Function approximation errors versus number of iterations for n = 2 at d = 500. Left: Legendre polynomials in [−1, 1]^{500}; Right: Hermite polynomials in R^{500}. The target function is f_5 in (4.2) with χ_i = 0.

5. Conclusion. In this paper we proposed a highly efficient iterative method for high dimensional polynomial approximation. The method utilizes tensorized quadrature points, which have been considered the worst candidate for high dimensional approximation. By deriving a new randomized Kaczmarz iteration, we demonstrated that the method converges exponentially fast and achieves the optimal rate of convergence. We established the theoretical rate of convergence in the form of an equality and derived a reliable convergence criterion. The use of the tensor quadrature points allows a highly efficient implementation, via the tensor structure of the grids and the use of multi-indices. The method converges well before the entire set of points is used.

We further demonstrated that, in terms of both memory storage and operation counts, the new method can be more efficient than standard approximation methods. Our numerical examples included a large number of tests in dimensions as high as 500. The examples verified the theoretical convergence analysis and demonstrated the effectiveness of the method.

REFERENCES

[1] V. Barthelmann, E. Novak, and K. Ritter. High dimensional polynomial interpolation on sparse grids. Adv. Comput. Math., 12:273–288, 1999.
[2] C. Brezinski and M. Redivo-Zaglia. Convergence acceleration of Kaczmarz's method. J. Eng. Math., 93:3–19, 2015.
[3] H.J. Bungartz and M. Griebel. Sparse grids. Acta Numer., 13:147–269, 2004.
[4] X. Chen and A.M. Powell. Almost sure convergence of the Kaczmarz algorithm with random measurements. J. Fourier Anal. Appl., 18:1195–1214, 2012.
[5] Y.C. Eldar and D. Needell. Acceleration of randomized Kaczmarz method via the Johnson-Lindenstrauss lemma. Numer. Algorithms, 58(2):163–177, 2011.
[6] A. Genz. Testing multidimensional integration routines. In Proc. of the International Conference on Tools, Methods and Languages for Scientific and Engineering Computation, pages 81–94. Elsevier North-Holland, Inc., 1984.
[7] S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin International de l'Académie Polonaise des Sciences et des Lettres, 35:355–357, 1937.
[8] J. Liu and S.J. Wright. An accelerated randomized Kaczmarz algorithm. Math. Comp., 85(297):153–178, 2016.
[9] D. Needell. Randomized Kaczmarz solver for noisy linear systems. BIT Numer. Math., 50(2):395–403, 2010.
[10] Y. Shin and D. Xiu. Randomized Kaczmarz algorithm for multivariate function approximation. SIAM J. Sci. Comput., submitted, 2016.
[11] S.A. Smolyak. Quadrature and interpolation formulas for tensor products of certain classes of functions. Soviet Math. Dokl., 4:240–243, 1963.
[12] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[13] G. Szegő. Orthogonal Polynomials. American Mathematical Society, Providence, RI, 1939.
[14] T. Wallace and A. Sekmen. Acceleration of Kaczmarz using orthogonal subspace projections. In Biomedical Sciences and Engineering Conference (BSEC), Oak Ridge, Tennessee, USA, May 21–23, 2013, pages 1–4. IEEE, 2013.
[15] T. Zhou, A. Narayan, and D. Xiu. Weighted discrete least-squares polynomial approximation using randomized quadratures. J. Comput. Phys., 298:787–800, 2015.
[16] A. Zouzias and N.M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.