Monotonic Quantile Regression with Bernstein Polynomials for Stochastic Simulation

Matthias H.Y. Tan
Systems Engineering and Engineering Management Department, City University of Hong Kong

Abstract

Quantile regression is an important tool for determining the quality level of service, product, and operation systems via stochastic simulation. It is frequently known that the quantiles of the output distribution are monotonic functions of certain inputs to the simulation model. Because there is typically high variability in the estimation of tail quantiles, it can be valuable to incorporate this information in quantile modeling. However, the existing literature on monotone quantile regression with multiple inputs is sparse. In this paper, we propose a class of monotonic regression models, consisting of functional ANOVA (FANOVA) decomposition components modeled with Bernstein polynomial bases, for estimating quantiles as a function of multiple inputs. The polynomial degrees of the bases and the FANOVA components included in the model are selected by a greedy algorithm. Real examples demonstrate the advantages of incorporating the monotonicity assumption in quantile regression and the good performance of the proposed methodology for estimating quantiles.

Keywords: Computer experiments, Monotonic Metamodel, Quantile Estimation

1. Introduction

Prediction of the output of a computer model as a function of its inputs is an important problem in stochastic simulations of service, product, and operation systems. In these simulations, the outputs often have highly nonnormal distributions. When the output is nonnormal, estimation of the quantiles of the output, rather than its mean and variance, is more meaningful. Emulators are useful for reducing the cost of quantile estimation in stochastic simulation. However, compared to estimation of the mean, estimation of tail quantiles typically involves more uncertainty (a larger variance for the same sample size).
Thus, it is of interest to utilize prior knowledge about the conditional quantile function to reduce estimation variability. Frequently, it is known that the quantiles are monotonic functions of the inputs. For instance, in open Jackson network queueing systems (Gross et al., 2013), the quantiles of the waiting time distribution are increasing functions of the arrival rates into the system. As another example, quantiles of the solutions of stochastic differential equations (Xiu, 2010) are often monotonic in the equation parameters. A specific example is the transient heat transfer problem considered in Cengel (2002), in which the temperature of a solid body immersed in a fluid is the solution of a differential equation. Various physical properties of the system (e.g., thermal conductivity) can be viewed as uncertain and assigned a probability distribution. Physical knowledge suggests that for a solid with a lower temperature than its medium, the temperature in the solid should increase with time under all conditions. Thus, the quantiles of the temperature distribution should increase with time.

In these examples, a model that is consistent with the monotonicity information is advantageous for three reasons. First, quantile estimation accuracy is expected to improve. As pointed out by a referee, for strictly monotone functions, any consistent estimator of the function that, when differentiated, yields a consistent derivative estimator will be monotonic for large sample sizes because the derivative will be strictly positive/negative. Consequently, the imposition of monotonicity will not improve performance asymptotically. However, significant improvements can often be seen when the sample size is small. Second, confidence intervals will not be unnecessarily wide because nonmonotonic predictions are ruled out. Third, the model is more interpretable and acceptable to the practitioner than one that is nonmonotonic. Due to these advantages, statistical methods for monotonic quantile estimation are of practical value. This paper addresses the problem of monotonic quantile regression in stochastic simulation.
Linear regression models have long been used as mean models in stochastic simulation experiments (Law, 2014). Recently, Gaussian process (GP) emulators (Sacks et al., 1989; Currin et al., 1991) have been popularized for stochastic simulation by Jack P.C. Kleijnen and coauthors (Kleijnen, 2009). Ankenman et al. (2010) propose a modification of the GP emulator, called stochastic kriging. The GP and stochastic kriging emulators are for estimating the mean. However, in many cases, interest centers on quantile estimation. This is the case for decision-making under uncertainty and for nonparametric modeling of the output distribution. For example, in the design of a service system, a decision-maker may be interested in ensuring that the 0.9 quantile of the daily average waiting time of customers does not exceed a certain value. Note that maximization of quantiles of the utility is given a sound decision-theoretic foundation by Rostek (2010).

Quantile linear regression originated from the seminal work of Koenker and Bassett (1978), and there has been much further development on the subject (see Koenker (2005)). Reich et al. (2011) considered the problem of spatial modeling of quantiles. Recently, Plumlee and Tuo (2014) employ the GP model with a nugget to estimate quantiles in stochastic simulation. Research on shape-restricted regression provides methods for building monotonic models. Chang et al. (2007), McKay Curtis and Ghosh (2011), and Wang and Ghosh (2012) use Bernstein polynomial models for monotonic mean regression with a single input. There are many other methods (e.g., Mammen et al. (2001), Ramsay (1988)). Most monotone models can be used for both mean and quantile estimation. He and Ng (1999) discuss monotone quantile smoothing with a single input. Kim (2006) studies linear quantile models in which the coefficients are monotonic functions of an input. There is less work on monotonic regression with multiple inputs. Wang and Ghosh (2011) use multivariate Bernstein polynomials. Gluhovsky (2006) models main effects and two-factor interactions with I-splines. Few articles explicitly consider monotone quantile regression with multiple inputs. Chernozhukov et al. (2009) propose a general method, called rearrangement, to monotonize a nonmonotonic estimate of a function. Main-effects-only models (Koenker, 2005) can be used, but they are restrictive.
This paper employs monotonic regression models obtained by modeling FANOVA components with Bernstein polynomial bases, called FANOVA Bernstein models, for quantile estimation in stochastic simulation. Stochastic simulation problems typically involve multiple inputs, and the output is often known to be a monotonic function of some of them. Moreover, we can expect only a handful of inputs to have large effects on the output and only a small subset of the large number of possible interactions between inputs to be significant. Thus, it is often of interest to build a monotonic quantile regression model with multiple inputs that includes only important main effects and interactions between inputs. We propose an approach based on identifying important FANOVA components, modeling each component using Bernstein polynomials. Monotonicity is guaranteed by imposing certain linear restrictions on the model coefficients. The degrees of the Bernstein bases and the FANOVA components included in the model are chosen by minimizing the Bayesian information criterion (BIC) (Lian, 2012).

The rest of the paper is organized as follows. Section 2 reviews univariate and multivariate Bernstein polynomials. Section 3 introduces the FANOVA Bernstein model and shows how monotonicity can be imposed. Section 4 discusses the monotone quantile regression problem. Section 5 proposes an algorithm for model selection and discusses design and statistical inference issues. Two examples are given in Section 6. Section 7 concludes the paper.

2. Univariate and Multivariate Bernstein Polynomials

The univariate Bernstein polynomials/bases of degree $N$ on $[0,1]$ are given by

$b(x, N, k) = \binom{N}{k} x^k (1 - x)^{N - k}, \quad k = 0, 1, \dots, N. \quad (1)$

The univariate Bernstein model employed by Wang and Ghosh (2012) and others is

$f(x) = \sum_{k=0}^{N} \beta_k b(x, N, k). \quad (2)$

Because

$df(x)/dx = \sum_{k=0}^{N-1} N (\beta_{k+1} - \beta_k) b(x, N - 1, k), \quad (3)$

the restrictions $\beta_{k+1} - \beta_k \ge 0$, $k = 0, \dots, N - 1$, ensure that $f'(x) \ge 0$ for all $x \in [0,1]$.

For monotonic regression with $d > 1$ inputs on $[0,1]^d$, Wang and Ghosh (2011) employ the multivariate Bernstein polynomials/bases $b(\mathbf{x}, \mathbf{N}, \mathbf{k}) = \prod_{j=1}^{d} b(x_j, N_j, k_j)$, where $\mathbf{x} = (x_1, \dots, x_d)$, $\mathbf{N} = (N_1, \dots, N_d)$, and $\mathbf{k} = (k_1, \dots, k_d) \in \mathcal{K} = \{0, \dots, N_1\} \times \dots \times \{0, \dots, N_d\}$. The multivariate Bernstein model is

$f(\mathbf{x}) = \sum_{\mathbf{k} \in \mathcal{K}} \beta_{\mathbf{k}} b(\mathbf{x}, \mathbf{N}, \mathbf{k}). \quad (4)$

Because each factor $b(x_j, N_j, k_j)$ satisfies (3) in its own coordinate, it is easy to see that $\partial f(\mathbf{x})/\partial x_i \ge 0$ for all $\mathbf{x} \in [0,1]^d$ is guaranteed if

$\beta_{k_1, \dots, k_i + 1, \dots, k_d} - \beta_{k_1, \dots, k_i, \dots, k_d} \ge 0, \quad \forall k_j = 0, \dots, N_j, \ j \ne i, \ k_i = 0, \dots, N_i - 1. \quad (5)$

One important limitation of model (4) is that the number of bases increases very fast with the dimension $d$. Thus, a large experiment is needed to fit the model when $d$ is large.

The linear restrictions (5) on the coefficients are a sufficient but not necessary condition for monotonicity. Thus, coefficients that give a monotonic $f$ may not satisfy (5), and it is not clear that (5) allows sufficient flexibility to model a large class of functions. A justification for imposing (5) is the following theorem, stated on page 115 of Gal and Anastassiou (2010).

Theorem 1: Let $f : [0,1]^d \to \mathbb{R}$ be continuous. Then, the Bernstein approximations $B_{\mathbf{N}}(f)(\mathbf{x}) = \sum_{\mathbf{k} \in \mathcal{K}} f(k_1/N_1, \dots, k_d/N_d)\, b(\mathbf{x}, \mathbf{N}, \mathbf{k})$ converge uniformly to $f$ as $\min\{N_1, \dots, N_d\} \to \infty$.

Let $C_i(\mathbf{N})$ be the set of $\boldsymbol{\beta}$'s that satisfy (5). If $f$ is increasing in $(x_i : i \in \Omega)$, i.e., increasing in each $x_i$ with $i \in \Omega$, then setting $\beta_{\mathbf{k}} = f(k_1/N_1, \dots, k_d/N_d)$ yields a $\boldsymbol{\beta}$ that is in $\bigcap_{i \in \Omega} C_i(\mathbf{N})$. This observation and Theorem 1 imply that there exists a sequence of coefficient vectors $\boldsymbol{\beta}^{(1)} \in \bigcap_{i \in \Omega} C_i(\mathbf{N}^{(1)}), \boldsymbol{\beta}^{(2)} \in \bigcap_{i \in \Omega} C_i(\mathbf{N}^{(2)}), \dots$ such that the corresponding models (4) converge uniformly to $f$ for any sequence $\mathbf{N}^{(1)}, \mathbf{N}^{(2)}, \dots$ with $\min\{N_1^{(t)}, \dots, N_d^{(t)}\} \to \infty$. This justifies the use of constraints (5) to achieve monotonicity.
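As an illustrative sketch (in Python; not part of the original paper), the univariate construction (1)-(3) can be checked numerically: with nondecreasing coefficients, the Bernstein model is nondecreasing on $[0,1]$.

```python
import math

def bern(x, N, k):
    # Univariate Bernstein basis b(x, N, k) = C(N, k) x^k (1 - x)^(N - k), eq. (1)
    return math.comb(N, k) * x**k * (1 - x)**(N - k)

def bern_model(x, beta):
    # Univariate Bernstein model f(x) = sum_k beta_k b(x, N, k), eq. (2)
    N = len(beta) - 1
    return sum(b * bern(x, N, k) for k, b in enumerate(beta))

# Nondecreasing coefficients satisfy the restrictions below eq. (3),
# so the fitted function is nondecreasing on [0, 1].
beta = [0.0, 0.2, 0.5, 0.9, 1.0]          # beta_{k+1} - beta_k >= 0
grid = [i / 100 for i in range(101)]
vals = [bern_model(x, beta) for x in grid]
assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(vals, vals[1:]))
```

Note also that $f(0) = \beta_0$ and $f(1) = \beta_N$, since only one basis is nonzero at each endpoint.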

3. FANOVA Decomposition and Modeling

A FANOVA decomposition of $f : [0,1]^d \to \mathbb{R}$ is given by $f(\mathbf{x}) = \mu_0 + \sum_{i} \mu_i(x_i) + \sum_{i < j} \mu_{ij}(x_i, x_j) + \dots + \mu_{1 \cdots d}(x_1, \dots, x_d)$, where we call $\mu_0$ the intercept, $\mu_i$ a main effect, $\mu_{ij}$ a two-factor interaction, and similarly for the other components of the decomposition. Such decompositions can be employed to improve modeling of $f$ because we can assume that a small subset of main effects and two-factor interactions are important. This assumption is supported by the effect sparsity and hierarchy principles (Wu and Hamada, 2009). Efron and Stein (1981) point out that the FANOVA decomposition is unique if $\mathbf{x}$ has independent components and the conditions $E[\mu_{i_1 \cdots i_t}(x_{i_1}, \dots, x_{i_t})] = 0$ are imposed on every component, where the expectation is taken with respect to each argument in turn while holding the other inputs fixed. This decomposition is an important tool in global sensitivity analysis (Saltelli et al., 2000) and functional data analysis (Ramsay and Silverman, 2005). This paper adopts a different FANOVA decomposition, obtained by imposing a different set of conditions, that allows easier modeling with the multivariate Bernstein bases.

Theorem 2: Let $f : [0,1]^d \to \mathbb{R}$ be a function. Then, $f$ has a unique FANOVA decomposition

$f(\mathbf{x}) = \mu_0 + \sum_{i} \mu_i(x_i) + \sum_{i < j} \mu_{ij}(x_i, x_j) + \dots + \mu_{1 2 \cdots d}(x_1, \dots, x_d) \quad (6)$

if $\mu_{i_1 \cdots i_t}(x_{i_1}, \dots, x_{i_t}) = 0$ for all $(x_{i_1}, \dots, x_{i_t})$ in the set $\{(x_{i_1}, \dots, x_{i_t}) \in [0,1]^t : \prod_{l=1}^{t} x_{i_l} = 0\}$, $t = 1, \dots, d$.

It is a simple matter to see that $\mu_{i_1 \cdots i_t}$ can be approximated with a linear combination of multivariate Bernstein polynomials

$\mu_{i_1 \cdots i_t}(x_{i_1}, \dots, x_{i_t}) = \sum_{(k_{i_1}, \dots, k_{i_t})} \beta^{(i_1 \cdots i_t)}_{k_{i_1} \cdots k_{i_t}} \prod_{l=1}^{t} b(x_{i_l}, N_{i_l}, k_{i_l}), \quad (7)$

where the coefficients of the bases with $\prod_{l=1}^{t} k_{i_l} = 0$ are set to zero so that $\mu_{i_1 \cdots i_t}(x_{i_1}, \dots, x_{i_t}) = 0$ whenever $(x_{i_1}, \dots, x_{i_t}) \in \{(x_{i_1}, \dots, x_{i_t}) \in [0,1]^t : \prod_{l=1}^{t} x_{i_l} = 0\}$. Thus, it is very natural and convenient to model the FANOVA components in (6) with Bernstein bases.

Note that $\sum_{k=0}^{N} b(x, N, k) = 1$, so that an alternative set of bases with the same span, which we call the modified Bernstein bases, is

$\tilde{b}(x, 0, N) = 1, \quad \tilde{b}(x, k, N) = b(x, N, k), \ k = 1, \dots, N. \quad (8)$

If we use the tensor product of the modified Bernstein bases (8) instead of the tensor product of the Bernstein bases (as in (4)) for function approximation, we obtain a model

$f(\mathbf{x}) = \sum_{\mathbf{k} \in \mathcal{K}} \alpha_{\mathbf{k}} \tilde{b}(x_1, k_1, N_1) \cdots \tilde{b}(x_d, k_d, N_d) \quad (9)$

that decomposes into the FANOVA components given in (7). This is because $\tilde{b}(x_j, 0, N_j) = 1$.

Sufficient conditions that ensure monotonicity of $f(\mathbf{x})$ in (9) with respect to $x_i$ can be obtained as follows. Since $\tilde{b}(x_j, 0, N_j), \dots, \tilde{b}(x_j, N_j, N_j)$ and $b(x_j, N_j, 0), \dots, b(x_j, N_j, N_j)$ have the same span, model (9) can be rewritten in terms of the bases $b(x_1, N_1, k_1) \cdots b(x_d, N_d, k_d)$, $k_j = 0, \dots, N_j$, $j = 1, \dots, d$, which gives model (4). By equating (9) and (4), we obtain (see Appendix A)

$\beta_{\mathbf{k}} = \sum_{\mathbf{l} \in S(\mathbf{k})} \alpha_{\mathbf{l}}, \quad S(\mathbf{k}) = \{\mathbf{l} : l_j \in \{0, k_j\}, \ j = 1, \dots, d\}. \quad (10)$

From (10), we obtain the following theorem, which gives linear restrictions on the $\alpha_{\mathbf{k}}$'s that are sufficient to achieve monotonicity in an axial direction.

Theorem 3: Model (9) is increasing in $x_i$ if

$\sum_{\mathbf{l} \in S(k_1, \dots, k_i + 1, \dots, k_d)} \alpha_{\mathbf{l}} - \sum_{\mathbf{l} \in S(k_1, \dots, k_i, \dots, k_d)} \alpha_{\mathbf{l}} \ge 0 \quad \forall k_i = 0, \dots, N_i - 1, \ k_j = 0, \dots, N_j, \ j \ne i. \quad (11)$

Proof: It follows from (10) that (9) with constraints (11) is equivalent to (4) with constraints (5).

Since it is often the case that only main effects and two-factor interactions are significant, we shall assume that $f(\mathbf{x}) = \mu_0 + \sum_{i} \mu_i(x_i) + \sum_{i < j} \mu_{ij}(x_i, x_j)$ in the remainder of the paper. Proposition 1 below demonstrates that the main effect $\mu_i$ plays an important role in ensuring that $f$ is increasing in $x_i$: it must be increasing, and it must also offset all decreases in two-factor interactions involving input $i$, i.e., the positive increment in $\mu_i$ must be at least equal in magnitude to the sum of all negative increments in two-factor interactions involving input $i$.

Proposition 1: Suppose $f(\mathbf{x}) = \mu_0 + \sum_{i} \mu_i(x_i) + \sum_{i < j} \mu_{ij}(x_i, x_j)$ and $f(x_1, \dots, x_i + \Delta_i, \dots, x_d) - f(\mathbf{x}) \ge 0$ for all $\mathbf{x} \in [0,1]^d$ and $\Delta_i \in [0, 1 - x_i]$. Then,

$\mu_i(x_i + \Delta_i) - \mu_i(x_i) \ge \sup_{\mathbf{x}_{-i} \in [0,1]^{d-1}} \Big\{ -\sum_{j \ne i} \big[ \mu_{ij}(x_i + \Delta_i, x_j) - \mu_{ij}(x_i, x_j) \big] \Big\} \quad (12)$

for all $x_i \in [0,1]$ and $\Delta_i \in [0, 1 - x_i]$, where $\mathbf{x}_{-i} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d)$.
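The identity (10) linking models (9) and (4) can be verified numerically. The Python sketch below (with arbitrary illustrative coefficients, not taken from the paper) evaluates a two-input model in both the modified and standard Bernstein bases and checks that the two representations agree.

```python
import itertools, math, random

def bern(x, N, k):
    # Standard Bernstein basis, eq. (1)
    return math.comb(N, k) * x**k * (1 - x)**(N - k)

def bern_mod(x, k, N):
    # Modified Bernstein basis, eq. (8): the k = 0 basis is replaced by 1
    return 1.0 if k == 0 else bern(x, N, k)

random.seed(0)
N = (2, 3)                                     # degrees (N1, N2) for d = 2 inputs
K = list(itertools.product(range(N[0] + 1), range(N[1] + 1)))
alpha = {k: random.uniform(-1, 1) for k in K}  # arbitrary coefficients of model (9)

def f_modified(x):
    # Model (9) in the modified bases
    return sum(a * bern_mod(x[0], k[0], N[0]) * bern_mod(x[1], k[1], N[1])
               for k, a in alpha.items())

# Map alpha to beta via (10): beta_k = sum of alpha_l with l_j in {0, k_j}
beta = {k: sum(alpha[l] for l in itertools.product(*({0, kj} for kj in k)))
        for k in K}

def f_standard(x):
    # Model (4) in the standard bases
    return sum(b * bern(x[0], N[0], k[0]) * bern(x[1], N[1], k[1])
               for k, b in beta.items())

for x in [(0.0, 0.0), (0.3, 0.7), (1.0, 0.5)]:
    assert abs(f_modified(x) - f_standard(x)) < 1e-10
```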

Remark: By abuse of notation, we let $\mu_{ij}(x_i, x_j) = \mu_{ji}(x_j, x_i)$ for $i > j$.

The values of $N_i$ and $N_j$ determine whether the main effects $\mu_i$ and $\mu_j$ are included in the model ($\mu_i$ is included if and only if $N_i > 0$), and the indicator $z_{ij}$ determines whether $\mu_{ij}$ is actually included. This incorporates the strong heredity principle, which states that a two-factor interaction can be in the model only if both of its parent main effects are active (Wu and Hamada, 2009; Chipman, 1996). Given $\mathbf{N}$ and $\mathbf{z} = (z_{12}, \dots, z_{(d-1)d})$, the model $f(\mathbf{x}) \mid (\mathbf{N}, \mathbf{z})$ in (18) is a linear model whose coefficients are those corresponding to active main effects ($N_i > 0$) and those corresponding to active two-factor interactions ($z_{ij} = 1$, $N_i > 0$, and $N_j > 0$). To force $f(\mathbf{x}) \mid (\mathbf{N}, \mathbf{z})$ to be increasing in $x_i$, we use (15). Note that if $z_{ij} = 0$ for all $j \ne i$, then (15) can be replaced with

$\alpha^{(i)}_{k_i + 1} - \alpha^{(i)}_{k_i} \ge 0, \quad k_i = 0, 1, \dots, N_i - 1,$

with the convention $\alpha^{(i)}_{0} = 0$.
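To illustrate the bookkeeping implied by the strong heredity rule, the sketch below (hypothetical names, not the paper's code) enumerates the components of model (18) that are active for given degrees $\mathbf{N}$ and indicators $\mathbf{z}$.

```python
from itertools import combinations

def model_terms(N, z):
    # Active components of model (18) under strong heredity:
    # mu_i enters iff N_i > 0; mu_ij enters iff z_ij = 1, N_i > 0, and N_j > 0.
    d = len(N)
    terms = ["intercept"]
    terms += [f"mu_{i}" for i in range(d) if N[i] > 0]
    for i, j in combinations(range(d), 2):
        if z.get((i, j), 0) == 1 and N[i] > 0 and N[j] > 0:
            terms.append(f"mu_{i}{j}")
    return terms

N = (3, 0, 2)
z = {(0, 1): 1, (0, 2): 1}   # (0, 1) violates heredity because N[1] = 0
assert model_terms(N, z) == ["intercept", "mu_0", "mu_2", "mu_02"]
```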

For simplicity of notation, we order and rewrite the nonzero coefficients in (18) as $\theta_1, \dots, \theta_p$. Let $\boldsymbol{\theta} = (\theta_1, \dots, \theta_p)^T$ and $\mathbf{g}(\mathbf{x}) = (1, g_2(\mathbf{x}), \dots, g_p(\mathbf{x}))^T$, where $g_l(\mathbf{x})$ denotes the value of the basis function corresponding to $\theta_l$ at $\mathbf{x}$. Given an experiment design $\{\mathbf{x}_1, \dots, \mathbf{x}_n\} \subset [0,1]^d$ and corresponding output data $y_1, \dots, y_n$, $\boldsymbol{\theta}$ can be estimated by minimizing the check loss $\sum_{l=1}^{n} \rho_\tau(y_l - \mathbf{g}(\mathbf{x}_l)^T \boldsymbol{\theta})$, where $\rho_\tau(u) = u[\tau - I(u)]$, $I(u) = 1$ if $u \in (-\infty, 0)$, and $I(u) = 0$ if $u \in [0, \infty)$. This is the key idea in quantile regression (Koenker and Bassett, 1978). When the distribution of the error in (17) depends on $\mathbf{x}$, it is more efficient to estimate $\boldsymbol{\theta}$ by minimizing a weighted check loss described on page 160 of Koenker (2005). However, the weights are difficult to estimate.

To incorporate the monotonicity constraint $\partial f(\mathbf{x}) \mid (\mathbf{N}, \mathbf{z}) / \partial x_i \ge 0$ for all $\mathbf{x} \in [0,1]^d$, $i \in \Omega \subset \{1, \dots, d\}$, we simply need to enforce the linear constraints (15), which can be written as $A \boldsymbol{\theta} \ge \mathbf{0}$, where the matrix $A$ consists of elements that equal -1, 0, or 1. Thus, the monotonic quantile regression problem can be formulated as an optimization problem:

$\min_{\boldsymbol{\theta}} \sum_{l=1}^{n} \rho_\tau(y_l - \mathbf{g}(\mathbf{x}_l)^T \boldsymbol{\theta}) \quad \text{s.t.} \quad A \boldsymbol{\theta} \ge \mathbf{0}. \quad (19)$

The problem (19) can be rewritten as a linear program

$\min_{\boldsymbol{\theta}, \mathbf{u}, \mathbf{v}} \ \tau \mathbf{1}^T \mathbf{u} + (1 - \tau) \mathbf{1}^T \mathbf{v} \quad \text{s.t.} \quad G \boldsymbol{\theta} + \mathbf{u} - \mathbf{v} = \mathbf{y}, \ A \boldsymbol{\theta} \ge \mathbf{0}, \ \mathbf{u} \ge \mathbf{0}, \ \mathbf{v} \ge \mathbf{0}. \quad (20)$

In (20), $G$ is a matrix with rows $\mathbf{g}(\mathbf{x}_1)^T, \dots, \mathbf{g}(\mathbf{x}_n)^T$, $\mathbf{y} = (y_1, \dots, y_n)^T$, and $\mathbf{1}$ is an $n \times 1$ vector of 1's.
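A minimal sketch of the linear program (20) is given below, using SciPy's `linprog` solver (the choice of solver and the synthetic data are assumptions of this illustration; the paper does not prescribe them). The model is a univariate Bernstein quantile regression with the main-effect monotonicity constraints, so the fitted 0.9 quantile is nondecreasing by construction.

```python
import numpy as np
from math import comb
from scipy.optimize import linprog

def bern(x, N, k):
    # Bernstein basis b(x, N, k), eq. (1); works elementwise on numpy arrays
    return comb(N, k) * x**k * (1 - x)**(N - k)

rng = np.random.default_rng(1)
tau, N = 0.9, 4
x = rng.uniform(size=200)
y = 5 * x + rng.exponential(size=200)       # true quantiles are increasing in x

# Design matrix G for f(x) = th_0 + sum_{k>=1} th_k b(x, N, k)
G = np.column_stack([np.ones_like(x)] + [bern(x, N, k) for k in range(1, N + 1)])
n, p = G.shape

# Constraint matrix A (entries -1, 0, 1): th_1 >= 0 and th_{k+1} - th_k >= 0
A = np.zeros((N, p))
A[0, 1] = 1.0
for k in range(1, N):
    A[k, k] = -1.0
    A[k, k + 1] = 1.0

# LP (20) over variables (th, u, v): min tau*1'u + (1 - tau)*1'v
c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
A_eq = np.hstack([G, np.eye(n), -np.eye(n)])     # G th + u - v = y
A_ub = np.hstack([-A, np.zeros((N, 2 * n))])     # A th >= 0  <=>  -A th <= 0
bounds = [(None, None)] * p + [(0, None)] * (2 * n)
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(N), A_eq=A_eq, b_eq=y,
              bounds=bounds, method="highs")

theta = res.x[:p]
grid = np.linspace(0, 1, 101)
fit = np.column_stack([np.ones_like(grid)] +
                      [bern(grid, N, k) for k in range(1, N + 1)]) @ theta
```

The constraints guarantee a nondecreasing fit regardless of noise in the data, which is the point of formulation (19)-(20).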

We shall provide a justification for the approach given by (19) and (20). Suppose that there are $n_0$ distinct design points $\mathbf{x}_1, \dots, \mathbf{x}_{n_0}$, each replicated $m$ times. As $m \to \infty$, the objective function in (19) normalized by $n = n_0 m$ converges to $\sum_{l=1}^{n_0} E\{\rho_\tau[y - \mathcal{F}(\mathbf{x}_l)] \mid \mathbf{x}_l\}/n_0$, where $y$ denotes the output at $\mathbf{x}_l$ and $\mathcal{F}(\mathbf{x}) = \mathbf{g}(\mathbf{x})^T \boldsymbol{\theta}$. Thus, the integrated check loss $\int E\{\rho_\tau[y - \mathcal{F}(\mathbf{x})] \mid \mathbf{x}\}\, h(\mathbf{x})\, d\mathbf{x}$, where $h$ is a density on $[0,1]^d$, can be viewed as the population quantity being estimated by the objective function in (19). Let $\mathcal{F}(\mathbf{x}) = \mathbf{g}(\mathbf{x})^T \boldsymbol{\theta}$ be given by $f(\mathbf{x}) \mid (\mathbf{N}, \mathbf{z})$, which is a model that includes all main effects and two-factor interactions when every $z_{ij} = 1$. Theorem 4 states that minimizing $\int E\{\rho_\tau[y - \mathcal{F}(\mathbf{x})] - \rho_\tau[y - q(\mathbf{x})] \mid \mathbf{x}\}\, h(\mathbf{x})\, d\mathbf{x}$ subject to (15), where $q$ is the true conditional quantile function, yields a quantile regression model $\mathcal{F}$ that achieves the minimum possible integrated check loss (which is achieved by $q$) as $\min\{N_1, \dots, N_d\} \to \infty$.

Theorem 4: Let $q : [0,1]^d \to \mathbb{R}$ be a continuous function that is increasing in $x_i$, $i \in \Omega \subset \{1, \dots, d\}$, and let $h$ be a density on $[0,1]^d$. Let $y = q(\mathbf{x}) + \epsilon$, where $P(\epsilon \le 0 \mid \mathbf{x}) = \tau \in (0,1)$ and $h(\mathbf{x})\, p(\epsilon \mid \mathbf{x})$ is the probability density function of $(\mathbf{x}, \epsilon)$. Denote by $\mathcal{F}_{\mathbf{N}}(\mathbf{x})$ the function $\mathcal{F}(\mathbf{x}) = \mathbf{g}(\mathbf{x})^T \boldsymbol{\theta}^*$ given by $f(\mathbf{x}) \mid (\mathbf{N}, \mathbf{z})$ (see (18)) with all $z_{ij} = 1$ and with $\boldsymbol{\theta}^*$ set to $\mathrm{argmin}_{\boldsymbol{\theta} \in \Theta_{\mathbf{N}}} \int E\{\rho_\tau[y - \mathcal{F}(\mathbf{x})] - \rho_\tau[y - q(\mathbf{x})] \mid \mathbf{x}\}\, h(\mathbf{x})\, d\mathbf{x}$, where $\Theta_{\mathbf{N}}$ is a compact set of coefficient vectors satisfying (15) for all $i \in \Omega$. Then,

$\lim_{\min\{N_1, \dots, N_d\} \to \infty} \int E\{\rho_\tau[y - \mathcal{F}_{\mathbf{N}}(\mathbf{x})] - \rho_\tau[y - q(\mathbf{x})] \mid \mathbf{x}\}\, h(\mathbf{x})\, d\mathbf{x} = 0.$
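The premise behind (19) and Theorem 4, namely that the expected check loss is minimized at the $\tau$-quantile, can be illustrated with a small Monte Carlo sketch (illustrative only, not from the paper):

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - I(u < 0))
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
tau = 0.8
sample = rng.exponential(size=200_000)   # Exp(1): 0.8-quantile is ln(5) ~ 1.609

# Empirical expected check loss E[rho_tau(y - c)] over a grid of constants c;
# the minimizer should sit near the tau-quantile of the distribution.
grid = np.linspace(0.5, 3.0, 251)
risk = np.array([check_loss(sample - c, tau).mean() for c in grid])
c_star = grid[int(np.argmin(risk))]
```

With this seed and sample size, `c_star` lands close to both the sample 0.8-quantile and the theoretical value $\ln 5 \approx 1.609$.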

5. Model Selection, Statistical Inference, and Design

For practical purposes, we need a data-driven procedure to determine the values of $N_1, \dots, N_d$ and the $z_{ij}$'s. By increasing $N_i$, model (18) can approximate functions of increasingly complex shapes. On the other hand, by setting $N_i = 0$, model (18) will be independent of $x_i$, and by setting $z_{ij} = 0$ for all $j \ne i$, it will be additive in input $x_i$. Thus, by selecting appropriate $\mathbf{N}$ and $\mathbf{z}$, (18) can be used to model a wide variety of functions with a moderately large number of inputs. We use the BIC (Lian, 2012) for comparing models with different $\mathbf{N}$ and $\mathbf{z}$. The BIC is

$\mathrm{BIC} = \ln\Big\{\sum_{l=1}^{n} \rho_\tau(y_l - \mathbf{g}(\mathbf{x}_l)^T \hat{\boldsymbol{\theta}})\Big\} + |\boldsymbol{\theta}| \ln n / (2n), \quad (21)$

where $\hat{\boldsymbol{\theta}}$ is the optimal solution to (20) and $|\boldsymbol{\theta}|$ is the number of elements in $\boldsymbol{\theta}$. The BIC has been proven to be asymptotically consistent for model selection under certain conditions. We set $\mathrm{BIC} = \infty$ for all models with a model matrix $G$ that is not of full column rank. Without any monotonicity constraints, the model coefficients are then not identifiable and we can reduce the number of columns of $G$ (which is equal to $|\boldsymbol{\theta}|$) without changing $\sum_{l} \rho_\tau(y_l - \mathbf{g}(\mathbf{x}_l)^T \hat{\boldsymbol{\theta}})$. The situation is more complicated when there are monotonicity constraints. However, if there exists an optimal solution $\hat{\boldsymbol{\theta}}$ to (20) such that $A \hat{\boldsymbol{\theta}} > \mathbf{0}$, then there are multiple optimal solutions to (20). Thus, it makes sense to rule out models that give a $G$ that is not of full column rank even if there are monotonicity constraints. For quasi-random space-filling designs, model matrices are almost always of full column rank unless the number of columns exceeds the number of rows.

Because the number of models is huge, we use a heuristic optimization approach to search for the best model. We suppose that $N_i \le N_{\max}$, $i = 1, \dots, d$, where $N_{\max}$ is an upper bound specified by the modeler. The model search algorithm is given in Figure 1.

Model Search Algorithm
1. Initialize with $\mathbf{N} = \mathbf{N}_0$ and $\mathbf{z} = \mathbf{z}_0$.
2. Change a single component of $\mathbf{N}$ in such a way that the BIC is decreased the most. Repeat this step until the BIC value cannot be improved further.
3. For each $(i, j) \in P = \{(i, j) : 1 \le i < j \le d, N_i > 0, N_j > 0\}$, compare the BIC values resulting from the following changes to $\mathbf{N}$ and $\mathbf{z}$, where $\mathbf{N}$ and $\mathbf{z}$ are reset to their values at the start of this step before performing each change:
   a. Change the value of $z_{ij}$ to 0.
   b. Change the value of $z_{ij}$ to 1 and find $N_i \in \{1, \dots, N_{\max}\}$ to minimize the BIC.
   c. Change the value of $z_{ij}$ to 1 and find $N_j \in \{1, \dots, N_{\max}\}$ to minimize the BIC.
4. Adopt the best change to an $(i, j) \in P$ that minimizes the BIC. Repeat Steps 3 and 4 until the BIC cannot be improved further.

Figure 1: Model search algorithm

The algorithm in Figure 1 is a greedy algorithm. With $\mathbf{z}$ fixed at $\mathbf{z}_0$, it optimizes $\mathbf{N}$ by finding the change in a single component that improves the BIC value by the largest amount, where each component is restricted to take on values $0, \dots, N_{\max}$. This is repeated until the BIC cannot be improved, which yields $\mathbf{N}^*$. With $\mathbf{N}$ fixed at $\mathbf{N}^*$, each two-factor interaction involving only inputs with active main effects is removed from or added to the model. When the $(i, j)$ interaction is added, the algorithm also finds the best $N_i$ conditioned on $N_l = N_l^*$, $l \ne i$, and the best $N_j$ conditioned on $N_l = N_l^*$, $l \ne j$. The reason for this step is that if $N_i^*$ and $N_j^*$ are both large, then adding the $(i, j)$ interaction into the model will increase the number of terms by $N_i^* N_j^*$, which is huge. This may give a model with far too many terms (perhaps even more than the number of experiment runs) and a high BIC value. As such, the $(i, j)$ interaction may not be able to enter the model even though it is significant. By reducing $N_i$ or $N_j$, the $(i, j)$ interaction can be allowed to enter the model. The best option involving two-factor interactions is adopted each time Step 3 is performed. A best option can involve removing an interaction $(i^*, j^*)$, adding an interaction $(i^*, j^*)$ and changing $N_{i^*}$, or adding an interaction $(i^*, j^*)$ and changing $N_{j^*}$. This process is repeated until no improvements in the BIC can be made, which leads to termination of the algorithm.

Note that existing model selection methods cannot be utilized for the problem in this paper. The LASSO and other penalties (Wu and Liu, 2009) can be used to select model terms but not the polynomial degrees $N_i$

. Moreover, they do not impose the effect heredity principle, which

has been found to improve model selection when the number of model terms far exceeds the number of distinct experiment runs (Chipman, 1996), as in the problems considered in this paper.

To quantify model and parameter estimation uncertainty, we employ a bootstrap procedure. In stochastic simulations, it is common and sensible to employ replicated designs, where each distinct input setting is replicated $m$ times. Let the ordered observations at a distinct design point be denoted by $y_{(1)}, \dots, y_{(m)}$. Then, we estimate the quantile function with the piecewise linear function that connects the points $(0,\, y_{(1)} - 0.5(y_{(2)} - y_{(1)}))$, $(0.5/m,\, y_{(1)})$, $\dots$, $((m - 0.5)/m,\, y_{(m)})$, $(1,\, y_{(m)} + 0.5(y_{(m)} - y_{(m-1)}))$. The first point is obtained by continuing the linear trend between $(0.5/m,\, y_{(1)})$ and $(1.5/m,\, y_{(2)})$, and similarly for the last point.

For each bootstrap replication, we sample $m$ uniform random numbers on $[0,1]$ and compute the value of the piecewise linear quantile function at the $m$ numbers to obtain $m$ random samples of the response. This is called a smoothed bootstrap (Silverman and Young, 1987). The design is fixed and not resampled. Using each bootstrap sample of the output as data, we find the best model using the Model Search Algorithm in Figure 1. The collection of best models found with the procedure gives an indication of model uncertainty. In addition, a $100(1 - \alpha)\%$ confidence interval for a quantile is constructed from the $\alpha/2$ and $1 - \alpha/2$ sample quantiles of the estimates given by the best model in each bootstrap sample.

The choice of optimal designs for the FANOVA Bernstein model is a complex problem that we shall leave for further research. In the examples, we employ the cosine maximin Latin hypercube design (cosine MLHD) proposed by Dette and Pepelyshev (2010) and Tan (2014). Both Dette and Pepelyshev (2010) and Tan (2014) have demonstrated that the cosine MLHD improves the performance of GP models. Furthermore, Tan (2014) has shown that cosine MLHDs are good for fitting polynomial models. Since (18) is a polynomial model, cosine MLHDs should be a good choice for fitting the model. We suggest that the number of replicates $m$ for the design be chosen so that the expected number of observations above or below the quantile to be estimated is at least one, i.e., $m \min\{\tau, 1 - \tau\} \ge 1$. This will allow reasonably reliable estimation of the quantile at each distinct design setting using only the data for that setting.

When estimating multiple conditional quantile functions, the problem of quantile crossing can occur, i.e., an estimated $\tau_1$ quantile can be larger than an estimated $\tau_2$ quantile at a given input setting even though $\tau_1 < \tau_2$.
0

1.000

0.980

1.000

Mean

0.859

Mean

6.272

0.953

0.757

Std

0.228

Fraction >0

1.000

1.000

1.000

Multivariate Bernstein FANOVA Bernstein

Quantile GP FANOVA Bernstein

Rearranged Quantile GP - FANOVA Bernstein

MSE

MAE

= 0.1 FANOVA Bernstein

MSE

Mean

7.022

Mean

53.604

-1.179

-1.630

Std

2.175

Fraction >0

1.000

0.220

0.160

Mean

1.962

Mean

4.373

-0.141

-0.217

Std

0.305

Fraction >0

1.000

0.260

0.180

Multivariate Bernstein FANOVA Bernstein

Quantile GP FANOVA Bernstein

Rearranged Quantile GP - FANOVA Bernstein

MAE

= 0.5 FANOVA Bernstein

MSE

MAE

Mean

3.817

Mean

91.399

3.972

2.427

Std

1.872

Fraction >0

1.000

0.880

0.800

Mean

1.396

Mean

6.365

0.616

0.425

Std

0.305

Fraction >0

1.000

0.900

0.840

In each simulation, the design

is generated and the output from the simulator is

obtained. Then, quantile regression is performed using the multivariate Bernstein model (4) with the monotonicity constraints (5), the FANOVA Bernstein model, the quantile GP model, and the rearranged quantile GP model. The value of

for the FANOVA Bernstein model is fixed at 21

seven. For the multivariate Bernstein model, we set

= (1,1,1,1,1). Note that this model

requires 2 = 32 distinct design points so that its coefficients are identifiable. If each component of

is increased to two, we would need 3 = 243 distinct design points. This demonstrates a

disadvantage of the multivariate Bernstein model. Two quantile estimation performance measures are computed in each simulation run: the square error and absolute error averaged over a set of validation data points, which are the mean squared error (MSE) and mean absolute error (MAE) respectively. The validation data consist of 1000 replicates of a 500-run MLHD design. The square error at each of the 500 data points is

̅ −

, where ̅ is the sample quantile of

the 1000 replicates computed using Matlab’s prctile function, and

is the quantile estimate

obtained using one of the four alternative models above (similarly for the absolute error). In Appendix D, we plot the point and interval predictions given by the FANOVA Bernstein and rearranged quantile GP model for some simulated data and discuss model selection results. Table 2 presents the simulation mean and standard deviation of the MSE and MAE for the FANOVA Bernstein model and the mean difference in MSE/MAE between a competing model and the FANOVA Bernstein model. We see that the FANOVA Bernstein model is substantially better than the multivariate Bernstein model in all cases. Table 2 shows that in the simulations, the FANOVA Bernstein model performs better than the quantile GP and rearranged quantile GP models for

= 0.5 and

= 0.9 but performs worse when

= 0.1. Since the same

data is used to fit the four models in each simulation, the fraction of simulations in which the difference in MSE/MAE is positive can be used to test whether the FANOVA Bernstein model is significantly better. These fractions are given in Table 2. Since ( /50 ≥ 0.68) = 0.0077, where

is a binomial random variable with 50 trials and success probability 0.5, we say that the

FANOVA Bernstein model is significantly better if the observed fraction is larger than or equal to 0.68. We see that the FANOVA Bernstein model is significantly better (with respect to both 22

MSE and MAE) than the quantile and rearranged quantile GP models for estimating the 0.5 and 0.9 quantile but significantly worse for the 0.1 quantile. Table 3: Mean and standard deviation of coverage of 95% confidence/credible intervals given by FANOVA Bernstein, quantile GP, and rearranged quantile GP models. FANOVA Bernstein = 0.9 = 0.1 = 0.5

Quantile GP

Rearranged Quantile GP

Mean

0.91

0.82

0.85

Std

0.09

0.10

0.07

Mean

0.91

0.83

0.85

Std

0.06

0.10

0.09

Mean

0.87

0.80

0.84

Std

0.07

0.11

0.08

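The binomial threshold used in this comparison can be checked directly (a quick verification sketch, not from the paper):

```python
from math import comb

# B ~ Binomial(50, 0.5): tail probability P(B >= 34) = P(B/50 >= 0.68)
n = 50
tail = sum(comb(n, k) for k in range(34, n + 1)) * 0.5**n
# tail is approximately 0.0077, matching the significance threshold in the text
```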
We assess the coverage of the 95% confidence/credible intervals of the FANOVA Bernstein, quantile GP, and rearranged quantile GP models. The coverage of the intervals for a model is the fraction that contain the sample quantiles of the validation data. The 95% confidence intervals for the FANOVA Bernstein model are constructed with 100 bootstrap samples. Table 3 gives the mean and standard deviation of the coverages. We see that the FANOVA Bernstein model gives coverage closer to the nominal coverage than the quantile GP or rearranged quantile GP models.

7. Conclusions

This paper proposes a class of FANOVA Bernstein models for quantile estimation in stochastic simulation. The models are obtained by modeling the main effects and two-factor interactions in a FANOVA decomposition with Bernstein bases. The model coefficients are estimated with a linear program, as in quantile regression. Monotonicity with respect to the inputs is imposed by linear constraints on the coefficients. The two-factor interactions and the degree of the polynomial bases for each input are selected by an algorithm that uses the BIC as the model selection criterion. A smoothed bootstrap method is employed to perform statistical inference. Three examples that involve estimating the quantiles of real stochastic simulation models illustrate that quantile point estimates obtained with the proposed model are robust to outliers in the response and can be more accurate than those of the quantile GP or rearranged quantile GP models.

A few areas require further research. First, a faster procedure for constructing confidence intervals is desirable. The bootstrap procedure for obtaining confidence intervals is time-consuming due to the need to repeat the model selection step. If the model selection step is not repeated, the confidence intervals that are obtained can suffer from undercoverage. Second, a rigorous method is needed to choose the design points and the number of replicates.

Acknowledgements

We thank the associate editor and two referees for helpful comments. This research was supported by City University of Hong Kong Start-Up Grant 7200364 and Early Career Scheme (ECS) project 21201414 funded by the Research Grants Council of Hong Kong.

Supplementary Materials

Appendices.pdf: This file contains Appendices A-D. Appendix A contains proofs of mathematical results. Appendix B gives a numerical example in which quantiles of a queueing simulation model with four inputs are modeled. Appendix C gives simulation results for Example 1. Appendix D gives illustrations and results for Example 2.

References

1. Ankenman, B., Nelson, B. L., and Staum, J. (2010). “Stochastic kriging for simulation metamodeling,” Operations Research, 58(2), 371-382.
2. Barenblatt, G. I. (1996). Scaling, Self-Similarity, and Intermediate Asymptotics: Dimensional Analysis and Intermediate Asymptotics (Vol. 14). Cambridge: Cambridge University Press.
3. Cengel, Y. A. (2002). Heat Transfer: A Practical Approach. New York: McGraw-Hill.
4. Chang, I. S., Chien, L. C., Hsiung, C. A., Wen, C. C., and Wu, Y. J. (2007). “Shape restricted regression with random Bernstein polynomials,” IMS Lecture Notes-Monograph Series, 54, 187-202.
5. Chernozhukov, V., Fernandez-Val, I., and Galichon, A. (2009). “Improving point and interval estimators of monotone functions by rearrangement,” Biometrika, 96(3), 559-575.
6. Chipman, H. (1996). “Bayesian variable selection with related predictors,” The Canadian Journal of Statistics, 24(1), 17-36.
7. Currin, C., Mitchell, T., Morris, M., and Ylvisaker, D. (1991). “Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments,” Journal of the American Statistical Association, 86(416), 953-963.
8. Dette, H. and Pepelyshev, A. (2010). “Generalized Latin hypercube design for computer experiments,” Technometrics, 52(4), 421-429.
9. Efron, B. and Stein, C. (1981). “The jackknife estimate of variance,” The Annals of Statistics, 9(3), 586-596.
10. Gal, S. and Anastassiou, G. A. (2010). Shape-Preserving Approximation by Real and Complex Polynomials. New York: Springer.
11. Gluhovsky, I. (2006). “Smooth isotonic additive interaction models with application to computer system architecture design,” Technometrics, 48(2), 176-192.
12. Gross, D., Shortle, J. F., Thompson, J. M., and Harris, C. M. (2013). Fundamentals of Queueing Theory (4th Edition). New York: Wiley.
13. Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models (Vol. 43). Boca Raton: CRC Press.
14. He, X. and Ng, P. (1999). “COBS: qualitatively constrained smoothing via linear programming,” Computational Statistics, 14(3), 315-338.
15. Jaluria, Y. and Torrance, K. E. (2003). Computational Heat Transfer (2nd Edition). New York: CRC Press.

16. Kim, M. (2006). "Quantile regression with shape-constrained varying coefficients," Sankhya, 68(3), 369-391.
17. Kleijnen, J. P. (2009). "Kriging metamodeling in simulation: A review," European Journal of Operational Research, 192(3), 707-716.
18. Koenker, R. (2005). Quantile Regression (No. 38). Cambridge: Cambridge University Press.
19. Koenker, R. and Bassett Jr, G. (1978). "Regression quantiles," Econometrica, 46(1), 33-50.
20. Koenker, R. and Mizera, I. (2004). "Penalized triograms: total variation regularization for bivariate smoothing," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1), 145-163.
21. Koenker, R., Ng, P., and Portnoy, S. (1994). "Quantile smoothing splines," Biometrika, 81(4), 673-680.
22. Law, A. (2014). Simulation Modeling and Analysis (5th Edition). New York: McGraw-Hill.
23. Lian, H. (2012). "A note on the consistency of Schwarz's criterion in linear quantile regression with the SCAD penalty," Statistics & Probability Letters, 82(7), 1224-1228.
24. Mammen, E., Marron, J. S., Turlach, B. A., and Wand, M. P. (2001). "A general projection framework for constrained smoothing," Statistical Science, 16(3), 232-248.
25. McKay Curtis, S. and Ghosh, S. K. (2011). "A variable selection approach to monotonic regression with Bernstein polynomials," Journal of Applied Statistics, 38(5), 961-976.
26. Plumlee, M. and Tuo, R. (2014). "Building accurate emulators for stochastic simulations via quantile kriging," Technometrics, 56(4), 466-473.
27. Ramsay, J. O. (1988). "Monotone regression splines in action," Statistical Science, 3(4), 425-441.
28. Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis (2nd Edition). New York: Springer.
29. Reich, B. J., Fuentes, M., and Dunson, D. B. (2011). "Bayesian spatial quantile regression," Journal of the American Statistical Association, 106(493), 6-20.
30. Rostek, M. (2010). "Quantile maximization in decision theory," The Review of Economic Studies, 77(1), 339-371.
31. Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989). "Design and analysis of computer experiments," Statistical Science, 4(4), 409-423.
32. Saltelli, A., Chan, K., and Scott, M. (eds.) (2000). Sensitivity Analysis. New York: Wiley.
33. Silverman, B. W. and Young, G. A. (1987). "The bootstrap: To smooth or not to smooth?," Biometrika, 74(3), 469-479.
34. Tan, M. H. Y. (2014). "Stochastic polynomial interpolation for uncertainty quantification with computer experiments," Technometrics, just-accepted.
35. Wang, J. and Ghosh, S. K. (2011). "Shape restricted nonparametric regression based on multivariate Bernstein polynomials," NC State University Department of Statistics Technical Report #2640.
36. Wang, J. and Ghosh, S. K. (2012). "Shape restricted nonparametric regression with Bernstein polynomials," Computational Statistics & Data Analysis, 56(9), 2729-2741.
37. Wu, C. F. J. and Hamada, M. S. (2009). Experiments: Planning, Analysis, and Optimization (2nd Edition). New York: Wiley.
38. Wu, Y. and Liu, Y. (2009). "Variable selection in quantile regression," Statistica Sinica, 19(2), 801-817.
39. Xiu, D. (2010). Numerical Methods for Stochastic Computations: A Spectral Method Approach. New Jersey: Princeton University Press.


Online Supplement for the Paper Titled "Monotonic Quantile Regression with Bernstein Polynomials for Stochastic Simulation"

Matthias H.Y. Tan
Systems Engineering and Engineering Management Department, City University of Hong Kong

Appendix A: Proofs of Mathematical Results

Proof of Equation (10)

Denote $\mathcal{K} = (k_1,\ldots,k_d)$ and $\boldsymbol{N} = (N_1,\ldots,N_d)$. We split $\sum_{k_1=0}^{N_1}\cdots\sum_{k_d=0}^{N_d}$ into $2^d$ sums, where each sum is obtained by fixing $k_j = 0$ for $j \in \mathcal{A}$, one sum for each subset $\mathcal{A} \subset \{1,\ldots,d\}$:

$$\sum_{k_1=0}^{N_1}\cdots\sum_{k_d=0}^{N_d} \beta_{k_1\cdots k_d} \prod_{j=1}^{d} B(k_j, N_j, x_j) = \sum_{\substack{\mathcal{A}\cup\mathcal{B}=\{1,\ldots,d\}\\ \mathcal{A}\cap\mathcal{B}=\emptyset}} \; \sum_{k_j \geq 1,\ j\in\mathcal{B}} \beta_{\mathcal{K}(\mathcal{A})} \prod_{j\in\mathcal{B}} B(k_j, N_j, x_j) \prod_{j\in\mathcal{A}} B(0, N_j, x_j), \quad (A1)$$

where $\mathcal{K}(\mathcal{A})$ denotes the multi-index with $k_j = 0$ for $j \in \mathcal{A}$. If $\beta_{\mathcal{K}} > 0$ with $k_j \geq 1$ for all $j \in \mathcal{B}$, terms involving $\prod_{j\in\mathcal{B}} B(k_j, N_j, x_j)$ with $k_j \geq 1$ can only be found in the sum indexed by $\mathcal{B}$ in (A1). On the other hand, if $\beta_{\mathcal{K}} = 0$ for all $\mathcal{K}$ with $k_j \geq 1$, $j \in \mathcal{B}$, no such terms appear. This implies that the sum indexed by each nonempty $\mathcal{B}$ collects exactly the terms of the FANOVA component in the inputs $x_j$, $j \in \mathcal{B}$, which gives Equation (10). ∎

Theorem 2: Let $f: [0,1]^d \to \mathbb{R}$ be a function. Then, $f$ has a unique FANOVA decomposition

$$f(x_1,\ldots,x_d) = f_0 + \sum_{i=1}^{d} f_{(i)}(x_i) + \sum_{i<j} f_{(i,j)}(x_i,x_j) + \cdots + f_{(1,2,\ldots,d)}(x_1,\ldots,x_d), \quad (A2)$$

where each component satisfies $f_{(i_1,\ldots,i_r)}(x_{i_1},\ldots,x_{i_r}) = 0$ whenever $\prod_{l=1}^{r} x_{i_l} = 0$, $r = 1,\ldots,d$.
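To illustrate (A2), the components for $d = 2$ can be computed directly from anchored evaluations of the function; a minimal numerical sketch (the test function `g` and the grid are hypothetical, chosen only for illustration):

```python
import itertools

def anchored_fanova_2d(g):
    """Anchored FANOVA components of g: [0,1]^2 -> R, anchored at the origin."""
    f0 = g(0.0, 0.0)                                        # constant term
    f1 = lambda x1: g(x1, 0.0) - f0                         # main effect of x1
    f2 = lambda x2: g(0.0, x2) - f0                         # main effect of x2
    f12 = lambda x1, x2: g(x1, x2) - f1(x1) - f2(x2) - f0   # interaction
    return f0, f1, f2, f12

# Hypothetical test function on [0,1]^2
g = lambda x1, x2: 1.0 + 2.0 * x1 + x2 ** 2 + 3.0 * x1 * x2

f0, f1, f2, f12 = anchored_fanova_2d(g)

# Components vanish whenever any of their arguments is 0 ...
assert f1(0.0) == 0.0 and f2(0.0) == 0.0
assert f12(0.0, 0.7) == 0.0 and f12(0.7, 0.0) == 0.0

# ... and they sum back to g on a grid, as (A2) asserts.
for x1, x2 in itertools.product([0.0, 0.25, 0.5, 1.0], repeat=2):
    assert abs(f0 + f1(x1) + f2(x2) + f12(x1, x2) - g(x1, x2)) < 1e-12
```

The same recursion (subtracting all lower-order components from the function evaluated with the remaining inputs set to zero) extends to any $d$.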

Proof: We first show that such a decomposition exists. Define

$$f_0 = f(0,\ldots,0), \qquad f_{(i)}(x_i) = f(x_1 = 0,\ldots,x_{i-1} = 0,\ x_i,\ x_{i+1} = 0,\ldots,x_d = 0) - f_0,$$

$$f_{(i,j)}(x_i, x_j) = f(\text{all inputs set to } 0 \text{ except } x_i \text{ and } x_j) - f_{(i)}(x_i) - f_{(j)}(x_j) - f_0,$$

and similarly for all higher order interactions. It is easy to see that $f_{(i)}(0) = 0$, $f_{(i,j)}(x_i, 0) = f_{(i,j)}(0, x_j) = 0$, and similarly for all higher order interactions, so every component vanishes whenever any of its arguments equals zero. To show that the decomposition is unique, suppose that there is another decomposition

$$f(x_1,\ldots,x_d) = \tilde{f}_0 + \sum_{i=1}^{d} \tilde{f}_{(i)}(x_i) + \cdots + \tilde{f}_{(1,2,\ldots,d)}(x_1,\ldots,x_d)$$

whose components also vanish whenever any of their arguments equals zero. Setting $x_1 = \cdots = x_d = 0$ gives $\tilde{f}_0 = f(0,\ldots,0) = f_0$. Setting all inputs except $x_i$ to zero yields $\tilde{f}_{(i)}(x_i) = f(0,\ldots,x_i,\ldots,0) - \tilde{f}_0 = f_{(i)}(x_i)$. The proof that higher order interactions of both decompositions are equal can be obtained by induction. ∎

Theorem 4: Let $f: [0,1]^d \to \mathbb{R}$ be a continuous function that is increasing in $z_j$, $j \in \Omega \subset \{1,\ldots,d\}$, and let $p(z)$ be a density on $[0,1]^d$. Let $y = f(z) + \epsilon$, where $P(\epsilon \leq 0 \mid z) = \tau \in (0,1)$. Assume that $h_\epsilon(e \mid z)\, p(z)$ is the probability density function of $(z, \epsilon)$. Denote by $\mathcal{F}_{\boldsymbol{N}}(z)$ the function $\mathcal{F}(z)$ (see (18)) with its coefficient vector set to

$$\theta_{\boldsymbol{N}} = \operatorname*{argmin}_{\theta \in \Theta_{\boldsymbol{N}}} \int E\{\rho_\tau(y - \mathcal{F}(z)) - \rho_\tau(y - f(z)) \mid z\}\, p(z)\, dz,$$

where $\rho_\tau(u) = u\{\tau - 1(u < 0)\}$ is the check loss and $\Theta_{\boldsymbol{N}} = \{\theta : \text{(15) holds for all } j \in \Omega\}$ is a compact set that contains the coefficients of the model in (18). Then,

$$\lim_{\boldsymbol{N} \to \infty} \int E\{\rho_\tau(y - \mathcal{F}_{\boldsymbol{N}}(z)) - \rho_\tau(y - f(z)) \mid z\}\, p(z)\, dz = 0.$$

Proof: Note that, for a candidate function $g$, the difference $\rho_\tau(y - g(z)) - \rho_\tau(y - f(z))$ equals $\tau\{f(z) - g(z)\}$ on the event $\{y - g(z) > 0,\ y - f(z) > 0\}$ and equals $(1-\tau)\{g(z) - f(z)\}$ on the event $\{y - g(z) < 0,\ y - f(z) < 0\}$. Taking the conditional expectation with respect to $h_\epsilon(e \mid z)$ and using $P(\epsilon \leq 0 \mid z) = \tau$ shows that the resulting integrals (A3) and (A4) are bounded from below by 0; equivalently, it is well known that $q(z) = f(z)$ minimizes $E\{\rho_\tau(y - q(z)) \mid z\}$ because $f(z)$ is the conditional $\tau$-quantile of $y$.

Suppose that $g(z)$ is as defined in Corollary 2. Because the Bernstein bases $B(k, N, x)$ are nonnegative and sum to less than or equal to one,

$$|g_{(i)}(x_i)| = \Big|\sum_{k} g_{(i)}(k/N_i)\, B(k, N_i, x_i)\Big| \leq \sum_{k} |g_{(i)}(k/N_i)|\, B(k, N_i, x_i) \leq \max_{x \in [0,1]} |g_{(i)}(x)|,$$

and similarly $|g_{(i,j)}(x_i, x_j)| \leq \max_{x \in [0,1]^2} |g_{(i,j)}(x)|$. Thus,

$$|g(z)| \leq |g_0| + \sum_{i} \max_{x \in [0,1]} |g_{(i)}(x)| + \sum_{i<j} \max_{x \in [0,1]^2} |g_{(i,j)}(x)| =: C.$$

Consequently,

$$\rho_\tau(y - g(z)) - \rho_\tau(y - f(z)) \leq \tau(C + |f(z)|)\,1(y - f(z) > 0) + (1-\tau)(C + |f(z)|)\,1(y - f(z) < 0) \leq (1+\tau)(C + |f(z)|),$$

so the difference in expected losses is dominated by an integrable function of $z$, and the limit can be taken inside the integral; the result follows. ∎
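Two facts invoked in the proof can be checked numerically: the Bernstein bases are nonnegative and sum to (at most) one, and the expected check loss is minimized near the $\tau$-quantile. A small sketch (the exponential sample and the search grid are hypothetical, chosen only for illustration):

```python
import math
import random

def bernstein(k, N, x):
    """Bernstein basis B(k, N, x) = C(N, k) x^k (1 - x)^(N - k)."""
    return math.comb(N, k) * x**k * (1.0 - x)**(N - k)

# Partition of unity: the full basis sums to one, so any subset sums to <= 1.
for x in [0.0, 0.3, 0.8, 1.0]:
    total = sum(bernstein(k, 5, x) for k in range(6))
    assert abs(total - 1.0) < 1e-12
    assert sum(bernstein(k, 5, x) for k in range(1, 6)) <= 1.0 + 1e-12

# Check loss rho_tau(u) = u * (tau - 1(u < 0)); its sample average is
# minimized near the empirical tau-quantile.
def rho(u, tau):
    return u * (tau - (1.0 if u < 0 else 0.0))

random.seed(1)
y = [random.expovariate(1.0) for _ in range(5000)]   # skewed sample
tau = 0.9
grid = [i / 100 for i in range(1, 500)]
best_q = min(grid, key=lambda q: sum(rho(yi - q, tau) for yi in y))
true_q = sorted(y)[int(tau * len(y))]                # empirical 0.9 quantile
assert abs(best_q - true_q) < 0.1
```

The second check is exactly the property that makes check-loss minimization a consistent quantile estimator, which Theorem 4 extends to the constrained Bernstein model.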

Table B2: Quantile estimation performance of the FANOVA Bernstein model and comparison with the multivariate Bernstein, quantile GP, and rearranged quantile GP models. The table gives the simulation mean and standard deviation of the MSE and MAE for the FANOVA Bernstein model, the mean difference in MSE and MAE between a competing model and the FANOVA Bernstein model, and the fraction of simulations in which the difference is positive.

τ = 0.95: [block garbled in the source; only the Fraction > 0 entries 1.000, 0.760, and 0.690 survive]

τ = 0.05                          MSE                                    MAE
FANOVA Bernstein                  Mean 0.158, Std 0.089                  Mean 0.293, Std 0.069
Multivariate Bernstein - FANOVA   Mean diff 1.423, Fraction > 0: 1.000   Mean diff 0.555, Fraction > 0: 1.000
Quantile GP - FANOVA              Mean diff 0.068, Fraction > 0: 0.680   Mean diff 0.047, Fraction > 0: 0.670
Rearranged Quantile GP - FANOVA   Mean diff 0.043, Fraction > 0: 0.650   Mean diff 0.027, Fraction > 0: 0.610

τ = 0.5                           MSE                                    MAE
FANOVA Bernstein                  Mean 0.052, Std 0.024                  Mean 0.167, Std 0.035
Multivariate Bernstein - FANOVA   Mean diff 0.813, Fraction > 0: 1.000   Mean diff 0.496, Fraction > 0: 1.000
Quantile GP - FANOVA              Mean diff 0.069, Fraction > 0: 0.940   Mean diff 0.082, Fraction > 0: 0.920
Rearranged Quantile GP - FANOVA   Mean diff 0.059, Fraction > 0: 0.900   Mean diff 0.071, Fraction > 0: 0.870
Predictions of the quantile GP on the grid {0, 0.25, 0.5, 0.75, 1} obtained in the simulations indicate that it is almost never monotonic. Table B2 presents the simulation mean and standard deviation of the MSE and MAE for the FANOVA Bernstein model. The mean difference in MSE and MAE between the multivariate Bernstein/quantile GP/rearranged quantile GP and the FANOVA Bernstein model is also given in Table B2. In all cases the mean difference is positive, i.e., the FANOVA Bernstein model performs better in the simulations. We see that the FANOVA Bernstein model performs much better than the multivariate Bernstein model in all cases. Table B2 also suggests that the FANOVA Bernstein model is better than the quantile GP and rearranged quantile GP models. The fraction of simulation runs in which the difference in MSE/MAE between a competing model and the FANOVA Bernstein model is positive, which is presented in Table B2, can be used to test the hypothesis that the FANOVA Bernstein model gives more accurate estimates more often than the competing model. Since P(B/100 ≥ 0.62) = 0.0105, where B is a binomial random variable with 100 trials and success probability 0.5, we can say that the FANOVA Bernstein model is significantly better whenever the observed fraction is larger than or equal to 0.62. Thus, we see that the FANOVA Bernstein model is significantly better (with respect to both MSE and MAE) in all cases except when it is compared based on the MAE to the rearranged quantile GP in estimating the 0.05 quantile.

Table B3: Number of times each FANOVA component is selected out of 100 simulations

Component   τ = 0.95   τ = 0.05   τ = 0.5
(1)         100        100        100
(2)         19         23         24
(3)         21         23         25
(4)         100        100        100
(1,2)       0          3          1
(1,3)       0          1          1
(1,4)       100        100        100
(2,3)       1          2          1
(2,4)       1          4          0
(3,4)       3          4          0

Finally, we examine the model selection results for the FANOVA Bernstein model. Table B3 presents the frequencies with which each FANOVA component is selected. We see that in all simulations, the components (1), (4), and (1,4) are selected for all three values of τ. The significance of these components is confirmed by Figure B2 for τ = 0.05; figures similar to Figure B2 are obtained for τ = 0.5 and τ = 0.95. We also see that the components (2) and (3) are quite often selected to enter the model even though Figure B2 suggests that the corresponding inputs are inert for τ = 0.05. However, physical knowledge and some analysis of the validation data suggest that these inputs do have an effect on the output: reducing the mean disassembly/assembly time should reduce the time it takes to fix a machine that breaks down, and this should cause an increase in the average number of machines in the production system. The average values of the selected polynomial degrees N1, N2, N3, and N4 are presented in Table B4. We see that the average value of N1 is around two for all three values of τ, and the average value of N4 is around three for τ = 0.95 and τ = 0.5 and is near four for τ = 0.05.

Table B4: Average values of the selected polynomial degrees N1, N2, N3, and N4

           N1     N2     N3     N4
τ = 0.05   2.15   0.38   0.43   3.78
τ = 0.5    2.05   0.39   0.33   3.17
τ = 0.95   2.42   0.25   0.25   2.9

References

1. Daryanto, A., van Ommeren, J. C. W., and Zijm, W. H. M. (2003). "A closed-loop two-indenture repairable item system." In Proceedings of the Fourth Aegean International Conference on Analysis of Manufacturing Systems, 1-4.

Appendix C: Simulation Results for Example 1

We perform 200 simulation runs to compare the prediction accuracy and confidence/credible interval coverage of the FANOVA Bernstein, cubic Bernstein, quantile GP, and rearranged quantile GP models for estimating the 0.1, 0.5, and 0.9 quantiles of the output as a function of the input. The design has four distinct points {0.02, 0.34, 0.66, 0.98} and ten replicates at each point. We assess the prediction accuracy by computing the mean square error (MSE) and mean absolute error (MAE) on the test set {0, 0.01, …, 1}. Quantile estimates based on 50,000 samples are taken as the true values. The simulation mean and standard deviation of the MSE and MAE for the FANOVA Bernstein model are given in Table C1. The mean differences in MSE and MAE between the cubic Bernstein/quantile GP/rearranged quantile GP and the FANOVA Bernstein

Table C1: Quantile estimation performance of the FANOVA Bernstein model and comparison with the cubic Bernstein, quantile GP, and rearranged quantile GP models for Example 1. The table gives the simulation mean and standard deviation of the MSE and MAE for the FANOVA Bernstein model, the mean difference in MSE and MAE between a competing model and the FANOVA Bernstein model, and the fraction of simulations in which the difference is positive.

τ = 0.9                           MSE                                     MAE
FANOVA Bernstein                  Mean 0.34, Std 0.79                     Mean 0.35, Std 0.28
Cubic Bernstein - FANOVA          Mean diff 0.532, Fraction > 0: 0.775    Mean diff 0.238, Fraction > 0: 0.795
Quantile GP - FANOVA              Mean diff 0.110, Fraction > 0: 0.635    Mean diff 0.073, Fraction > 0: 0.625
Rearranged Quantile GP - FANOVA   Mean diff 0.097, Fraction > 0: 0.610    Mean diff 0.065, Fraction > 0: 0.615

τ = 0.1                           MSE                                     MAE
FANOVA Bernstein                  Mean 0.011, Std 0.015                   Mean 0.070, Std 0.040
Cubic Bernstein - FANOVA          Mean diff 0.0049, Fraction > 0: 0.640   Mean diff 0.0161, Fraction > 0: 0.645
Quantile GP - FANOVA              Mean diff 0.0060, Fraction > 0: 0.640   Mean diff 0.0172, Fraction > 0: 0.690
Rearranged Quantile GP - FANOVA   Mean diff 0.0048, Fraction > 0: 0.630   Mean diff 0.0151, Fraction > 0: 0.680

τ = 0.5                           MSE                                     MAE
FANOVA Bernstein                  Mean 0.049, Std 0.078                   Mean 0.138, Std 0.074
Cubic Bernstein - FANOVA          Mean diff 0.0070, Fraction > 0: 0.510   Mean diff 0.0099, Fraction > 0: 0.525
Quantile GP - FANOVA              Mean diff 0.0072, Fraction > 0: 0.525   Mean diff 0.0100, Fraction > 0: 0.560
Rearranged Quantile GP - FANOVA   Mean diff 0.0046, Fraction > 0: 0.510   Mean diff 0.0070, Fraction > 0: 0.545

model are also given in Table C1. In all cases the mean difference is positive, i.e., the FANOVA Bernstein model performs better in the simulations. The cubic Bernstein model performs poorly for estimating the 0.9 quantile, as its mean differences are large. Since the same data are used to fit the four models in each simulation, the fraction of simulations in which the difference in MSE/MAE is positive can be used to assess whether the FANOVA Bernstein model is significantly better. These fractions are given in Table C1. Since P(B/200 ≥ 0.585) = 0.0097, where B is a binomial random variable with 200 trials and success probability 0.5, we can say that the FANOVA Bernstein model is significantly better if the observed fraction is larger than or equal to 0.585. We see that the FANOVA Bernstein model is significantly better (with respect to both MSE and MAE) than all other models for the 0.1 and 0.9 quantiles but not the 0.5 quantile.

Table C2: Mean and standard deviation of coverage of 95% confidence/credible intervals given by the FANOVA Bernstein, quantile GP, and rearranged quantile GP models.

                         τ = 0.9               τ = 0.1               τ = 0.5
FANOVA Bernstein         Mean 0.86, Std 0.20   Mean 0.87, Std 0.21   Mean 0.96, Std 0.09
Quantile GP              Mean 0.47, Std 0.40   Mean 0.41, Std 0.40   Mean 0.47, Std 0.42
Rearranged Quantile GP   Mean 0.49, Std 0.39   Mean 0.42, Std 0.39   Mean 0.48, Std 0.42
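The binomial cut-off probabilities quoted in Appendices B and C, P(B/100 ≥ 0.62) = 0.0105 and P(B/200 ≥ 0.585) = 0.0097, can be reproduced with an exact tail sum; a quick check:

```python
from math import comb

def binom_tail(n, k):
    """P(B >= k) for B ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# Cut-offs used to declare the FANOVA Bernstein model significantly better:
p100 = binom_tail(100, 62)   # Appendix B: fraction >= 0.62 of 100 runs
p200 = binom_tail(200, 117)  # Appendix C: fraction >= 0.585 of 200 runs
assert abs(p100 - 0.0105) < 0.001
assert abs(p200 - 0.0097) < 0.001
```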

We also assess the confidence/credible interval coverage of the FANOVA Bernstein (200 bootstrap samples), quantile GP, and rearranged quantile GP models. The fraction of 95% confidence/credible intervals that contain the true values at {0, 0.01, …, 1} (the coverage) is recorded in each simulation. Table C2 presents the mean and standard deviation of the coverage. It is seen that the FANOVA Bernstein model gives coverage closer to the nominal level than the quantile GP or the rearranged quantile GP. The poor performance of the latter two models is likely due to the erroneous assumption of constant error variance and the inadequacy of the design (GP models tend to perform better when the design has many distinct runs). Plumlee and Tuo (2014) do not indicate how the quantile GP can be modified when the error variance is heterogeneous. Lastly, we point out that the rearrangement procedure yields only small improvements in estimation accuracy and credible interval coverage, as can be seen from Table C1 and Table C2.

Appendix D: Illustrations and Model Selection Results for Example 2

For illustration, we plot in Figure D1a estimates of the 0.1, 0.5, and 0.9 quantiles of temperature versus Z2 conditional on (Z1, Z3, Z4, Z5) = (0.5, 0.5, 0.5, 0.5) given by the FANOVA Bernstein and rearranged quantile GP models for one dataset. The design is a 40-run cosine MLHD replicated 10 times. The figure shows that the rearranged quantile GP can have sharp corners. In contrast, the FANOVA Bernstein model is smooth. Figure D1a (solid lines) and (22) suggest that the true conditional quantile functions are smooth. Thus, the FANOVA Bernstein model appears to be more suitable than the rearranged quantile GP for this problem. In Figure D1b, we plot point estimates and 95% confidence/credible intervals of the 0.1 and 0.9 quantiles given by the FANOVA Bernstein and rearranged quantile GP models for one dataset. Again, a 40-run cosine MLHD replicated 10 times is used. We see that the confidence/credible intervals are narrow.
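The rearrangement step (Chernozhukov et al., 2009) amounts to sorting the fitted quantile curve over the evaluation grid, which is why it can leave sharp corners; a minimal sketch with hypothetical fitted values:

```python
# Monotone rearrangement: sort a (possibly non-monotone) fitted quantile
# curve evaluated on an increasing grid. The values below are hypothetical.
grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
fitted = [1.0, 1.4, 1.2, 1.9, 1.8, 2.5]   # violates monotonicity twice

rearranged = sorted(fitted)               # the rearranged estimate

# The result is nondecreasing and keeps the same multiset of fitted values,
# so it is at least as accurate as the original under monotone truth.
assert all(a <= b for a, b in zip(rearranged, rearranged[1:]))
assert sorted(fitted) == rearranged
```

Because sorting only reorders pointwise values, the rearranged curve is piecewise constant between swapped points rather than smooth, in contrast to the globally smooth FANOVA Bernstein fit.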

(b) 0.1, 0.9 Quantiles of Temperature versus Z2 with (Z1,Z3,Z4,Z5)=(0.5,0.5,0.5,0.5)

260

250

250

240

240 Temperature

Temperature

260

230 220

230 220 210

210

200

200

190 190 0

0.2

0.4

0.6

0.8

1

0

0.2

0.4

Time (Z2)

0.6

0.8

1

Time (Z2)

Figure D1: (a) 0.1, 0.5, and 0.9 quantiles (estimated with 5000 simulations) of temperature versus time conditioned on (Z1, Z3, Z4, Z5) = (0.5, 0.5, 0.5, 0.5) (solid lines), and point estimates of the quantiles given by the FANOVA Bernstein (dashed lines) and rearranged quantile GP (dotted lines) models for one dataset; (b) 0.1 and 0.9 quantiles of temperature versus time (solid lines), and point estimates and 95% confidence/credible intervals of the quantiles given by the FANOVA Bernstein (dashed lines) and quantile GP (dotted lines) models for one dataset.

We examine the model selection results for the FANOVA Bernstein model. Note that a model expressed in terms of a handful of basis functions like the FANOVA Bernstein model can be very useful for understanding and interpreting the computer simulation model. Table D1 presents the frequencies with which each FANOVA component is selected. We see that in all simulations, the components (2), (3), (4), and (2,4) are selected. The significance of these components is confirmed by Figure D1 and Equation (22). We also see from Table D1 that the components (1) and (5) are sometimes selected, which suggests that inputs Z1 and Z5 have only small effects. The average values of the selected polynomial degrees N1, …, N5 are given in Table D2. The average value of N2 is the largest for each τ and ranges from 3.1 for τ = 0.1 to 5.6 for τ = 0.9, which indicates that Bernstein bases of high degree are needed to model the asymptotes seen in Figure D1 accurately. The average values of N3 and N4 are close to one, which is the right value to use according to the form of (22).

Table D1: Number of times each FANOVA component is selected out of 50 simulations

Component   τ = 0.9   τ = 0.1   τ = 0.5
(1)         4         17        12
(2)         50        50        50
(3)         50        50        50
(4)         50        50        50
(5)         8         15        14
(1,2)       0         4         2
(1,3)       0         3         1
(2,3)       0         15        3
(1,4)       0         3         3
(2,4)       50        50        50
(3,4)       7         13        10
(1,5)       0         1         0
(2,5)       0         1         0
(3,5)       2         2         5
(4,5)       0         2         2

Table D2: Average values of the selected polynomial degrees N1, …, N5

          N1     N2     N3     N4     N5
τ = 0.9   0.1    5.56   1.18   1.02   0.18
τ = 0.1   0.48   3.08   2      1.54   0.56
τ = 0.5   0.28   3.62   1.26   1.06   0.32