Optimal Design of Nonlinear Multifactor Experiments

16 downloads 0 Views 410KB Size Report
Mar 17, 2018 - which can produce the best known designs for these problems. ..... users have bad luck with the choice of tuning constants. Usually when ...
arXiv:1803.06536v1 [stat.CO] 17 Mar 2018

Optimal Design of Nonlinear Multifactor Experiments Yuanzhi Huang School of Mathematics and Statistics University College Dublin, Dublin, Ireland Insight Centre for Data Analytics University College Dublin, Dublin, Ireland ([email protected]) and Steven G. Gilmour Department of Mathematics King’s College London, Strand, London, United Kingdom and Kalliopi Mylona Department of Statistics Universidad Carlos III de Madrid, Madrid, Spain Department of Mathematics King’s College London, Strand, London, United Kingdom and Peter Goos Department of Mechatronics, Biostatistics and Sensors KU Leuven, Leuven, Belgium Department of Engineering Management University of Antwerp, Antwerp, Belgium March 20, 2018 Abstract Most chemical and biological experiments involve multiple treatment factors. Often, it is convenient to fit a nonlinear model in these factors. This nonlinear model can be mechanistic, empirical or a hybrid of the two, as long as it describes the unknown mechanism behind the response surface. Motivated by experiments in chemical engineering, we focus on D-optimal design for multifactor nonlinear response surfaces in general. In order to find and study optimal designs, we first implement conventional point (coordinate) exchange algorithms. Moreover, we develop a novel multiphase optimisation method to construct D-optimal designs with improved properties that are illustrated in our results. The benefits of this method are demonstrated by application to two experiments. For each case, highly efficient and feasible designs are found and tailored to the nonlinear regression models. Such designs can be easily used for comprehensive mechanism studies and are shown to be considerably more informative than standard designs.

Keywords: Continuous optimisation; D-optimality; Multiphase optimisation; Nonlinear model; Parameter estimation; Response surface. 1

1

Nonlinear Multifactor Models

In chemical and biological studies, experimenters often wish to explore mechanisms relating controllable input variables (treatment factors) to observed outputs, which give rise to data on response surfaces. To describe an unknown mechanism, it is convenient to fit a model in the factors that affect experimental responses. In this paper, we are interested in the estimation of the parameters in experiments, when a model is known or assumed at the design stage. A statistical model can be either mechanistic (theoretical) or empirical. Unlike common empirical models, mechanistic models tend to be nonlinear but more frugal in the use of parameters. For some discussion of mechanistic studies, see Box and Hunter (1965). The mechanistic model is often derived from scientific theories and formulas, though the real mechanism might be somewhat simplified. A fitted mechanistic model is still an approximation to the true response surface, but it can lead to more scientifically informative interpretations than some empirical models. Although less often used, these benefits apply even when the unknown mechanism involves multiple controlled factors. Derivation of a mechanistic model requires strong assumptions about the form, apart from the true parameter values, of the response surface model function. Sometimes, however, we cannot acquire this information prior to the experiment being planned, depending on the current phase of study of the mechanism. Under these circumstances, it is advisable to resort to empirical modelling techniques, while still taking into account the known parts of the mechanism when building the model. These kinds of models can be referred to as hybrid models, incorporating both mechanistic and empirical features. Particularly for mechanistic models, it is recommended to use model-oriented optimal design of experiments, since no standard designs exist for such models. It would require advanced optimisation methods to search for such designs (Chaloner and Larntz, 1989; Gotwalt et al., 2009; Gilmour and Trinca, 2012; Goos and Mylona, 2017; Overstall and Woods, 2017) and sell this idea to experimenters in practice. Little work has been done on designing multifactor experiments for hybrid models or similar nonlinear models in general. This is the focus of this paper. Optimal design for a few specific classes of nonlinear model have been studied in, particularly, some early work on two-factor inverse polynomial response surface models for agricultural experiments (Mead and Pike, 1975), several studies with generalised linear models, e.g. Woods et al. (2006); Stufken and Yang (2012), and recent pharmacokinetic studies with nonlinear mixed models (Bogacka et al., 2015). In the remainder of this section, we introduce two experiments which illustrate the work presented. In Section 2, commonly used existing methods for constructing optimal designs are described in some detail. We then build on these methods in Section 3 to describe a novel continuous optimisation algorithm for finding optimal designs. The application to the illustrative experiments is described in Section 4. The designs obtained can be considerably better than those obtained from standard methods, but there is still room to improve the computation and optimisation process. In Section 5, we combine the continuous optimisation method with traditional ones, along with a final adjustment to obtain a computationally efficient optimisation which can produce the best known designs for these problems. This version is applied to the illustrative applications in Section 6. Final comments and recommendations are made in Section 7.

1.1

Illustrative Applications

Our methods can be useful in any experiment assuming a nonlinear model, as is often done for describing a chemical reaction mechanism. Experimentation in this case can be on a small scale for a limited amount of data (e.g. with not much more than 20 runs in total), using a standard or

2

any other practical design. The model itself could be multifactor and complex, but the optimal design we find in the end may be even simpler and more meaningful than those standard designs. This is demonstrated in two examples which both use central composite designs (CCDs) and are typical of the types of experiment performed in chemical engineering studies. Our first experiment is described in Box and Draper (1987, chapters 11-12), where both an empirical full second-order linear model and a mechanistic nonlinear model were fitted to the data. A pilot plant was operated in a continuous stirred reactor, for which the chemists wished to approximate the response surface and interpret the mechanism it may imply. The presumed mechanism, which was derived after the experiment was conducted, describes a twostep consecutive decomposition reaction of a chemical solution, where the intermediate is the output stream observed from the reaction. The yield η is the amount of the desirable product that has been formed. Meanwhile, three factors are under control: the flow rate R (L/min) at which the chemical flows into the reactor, the catalyst concentration C and xT = 1000/(T + 273) where T is the temperature (◦ C). Yield data were obtained using a 24-run spherical CCD. The full second-order linear model involving 10 parameters is fitted using scaled variables x1 =

log(C) − log(2) T − 80 log(R) − log(3) , x2 = , x3 = . log(2) log(2) 10

The variable space (or design region) X is assumed to be cuboidal, i.e. (x1 , x2 , x3 ) ∈ [−1, 1]3 , which is equivalent to (R, C, T ) ∈ [1.5, 6] × [1, 4] × [70, 90] without the above transformation. In contrast, the characteristics of this chemical reaction can be better understood if we fit the mechanistic model ηi =

Ciθ1 θ0 Ri exp(θ2 Xi ) θ10

(Ri + Ci θ00 exp(θ20 Xi ))(Ri + Ciθ1 θ0 exp(θ2 Xi ))

+ εi ,

(1)

where Xi = 0.0028344 − 1/(Ti + 273) and εi is the error term for the ith run of the experiment. This model is derived from a number of differential equations and the Arrhenius equation for temperature effects. We will consider how to plan a new experiment and precisely estimate parameters θ0 , θ00 , θ1 , θ10 , θ2 , θ20 , given the mechanistic model now assumed. In the second application, the experimenters’ interest is in the enzymatic depolymerisation mechanism of a dextran substrate (Mountzouris et al., 1999). In a stirred-cell membrane reactor, endodextranase is used as the enzyme activator, while the products of the chemical reaction are different kinds of oligodextrans. The controlled factors are the substrate concentration S (2.5-7.5% in weight/volume), the enzyme concentration E (0.625-62.5 Units ml−1 × S) and the transmembrane pressure P (200-400 kPa). In this paper, the response we consider is the substrate conversion rate ξ (%). The data were obtained using an 18-run face-centred CCD as given in Mountzouris et al. (1999). No mechanistic model can be found to describe the response surface for ξ, so the alternative is to use an empirical approximation. In spite of the simplicity of deriving purely empirical polynomial models, it is worth considering a hybrid model which could have some mechanistic implications and so catalyse scientific reasoning. First, we can observe a negative effect from the experimental data: the higher the substrate concentration, the lower the conversion rate ξ. This suggests a smooth function E(ξ/(100 − ξ)) = γ10 S/(γ20 + S), where E(ξ/(100 − ξ)) is the expectation of the transformed response and γ10 , γ20 are constants. The nonnegative γ10 is then replaced with an empirical exponential function in terms of xE = log10 (E/6.25) ∈ [−1, 1] and xP = (P − 300)/100 ∈ [−1, 1]. Thus we obtain a hybrid nonlinear model exp(a0 + a1 xE,i + a2 xP,i + a3 xE2,i + a4 x2P,i )Si ξi = + εi , 100 − ξi a5 + Si 3

(2)

where εi is the error term and a0 , a1 , a2 , a3 , a4 , a5 are six parameters. For details of the model derivation from our conjectures, see APPENDIX A. We assume that model (2) should be studied for which efficient experimental design is needed for estimation.

2 2.1

Methods for Optimal Design Local D-Criterion for Nonlinear Models

We must select an exact experimental design X , i.e. for each of n runs, we need to choose a level for each of the v factors. For i = 1, 2, . . . , n, the ith response is Yi = f (Xi , θ) + εi , where Xi is the ith row of the n × v matrix X and θ denotes the set of p parameters. Under least squares estimation, the errors εi must be uncorrelated and come from identical distributions with zero mean and constant variance V(εi ) = σ 2 . If the function f (Xi , θ) is linear, the matrix form of the model is Y = F θ + ε. Here, Y is the column vector of the n responses, ε is the column vector of the n error terms, and F is the n × p model matrix based on the design X . We assume the primary experimental purpose to be precisely estimating the parameters of the model. This requires us to minimise the elements of the p × p variance-covariance matrix ˆ = (F T F )−1 σ 2 , where from now on we will assume without loss of generality that σ 2 = 1. V(θ) If the model is linear, F T F is the information matrix. To ensure that X contains the maximal amount of information about θ, it makes sense to use the so-called D-optimality criterion to search for a design X to maximise the D-criterion function φ = log F T F ∈ (−∞, +∞), the determinant of the information matrix in terms of X on a logarithmic scale. Let φA and φB be the respective D-criterion values of any two designs XA and XB . The relative efficiency of XA with respect to XB is exp(φA /p) 100% ∈ (0, +∞). eff = exp(φB /p) However, to derive the D-criterion function of any nonlinear model (regardless of its functional form), the widely used approach is to linearise f (Xi , θ) in θ (see e.g. Atkinson et al. (2007, chapter 17)). With a first-order expansion about a selected centre θ 0 ,  p  X ∂f (Xi , θ) f (Xi , θ) ≈ f (Xi , θ ) + (θj − θj0 ). ∂θ θj =θj0 j j=1 0

(3)

The column vector θ 0 is often chosen to be the point prior estimate of the p parameters θ. As a result, for i = 1, 2, . . . , n, the ith row of F is Fi =

∂f (Xi , θ) . ∂θ θ=θ 0

Given θ 0 , the information matrix F T F is identical to the Fisher information of X . Therefore, φ is called the “local” D-criterion function, since its value is associated with the substitution θ = θ 0 . The choice of the prior θ 0 depend on similar experiments reported in the literature, the experimenters’ expertise or even a “best guess” about the response surface. When the difference θ − θ 0 is a zero vector, there is no error in (3) and the design is truly optimal. As the difference increases, the linearisation tends to be less successful at the true values of θ. Then the design might be suboptimal, though it is still valid in the sense that it gives asymptotically unbiased estimators of all the parameters. For Example 2, there is a sensitivity study in Section 6, focusing on the robustness of locally optimal design to moderate variation in the values of θ.

4

Because of the dependence of Fisher information on the model, the parameter prior, and the number of experimental runs, the optimal designs for nonlinear models may be quite different from those for second-order or even higher-order linear models. Hence we need to evaluate experimental assumptions case by case, and use the exact optimal design if feasible. To search for an exact design X comprising n × v coordinates on factor settings, we develop an algorithm that can derive and quickly integrate the local D-criterion function in iterative search and optimisation. This should be able to work with any nonlinear parametric model, independent of model assumptions and, up to computational limitations, any experimental size. The essential inputs to our algorithm are model function f (Xi , θ), experimental size n, and v-dimensional variable space X .

2.2

Iterative Discrete Optimization

Fedorov (1972) built the theoretical framework for the earliest algorithm for locally D-optimal design (LDOD), which should discretise the continuous variable space X to obtain a set Ω of N candidate points. For instance, for a first-order linear model with three controlled variables, two candidate levels should be defined for each variable (i.e. the maximum and minimum). Thus there are 2 × 2 × 2 = 8 candidate points in total. A full second-order linear model demands at least one more candidate level for each factor such that there could be at least 3 × 3 × 3 = 27 candidate points. Defining candidates is much harder for nonlinear models, since there is often little information about possibly best candidates. A well-defined Ω leads to the traditional discrete optimisation through numerous iterations, each of which can exchange at most one point of X with one candidate. The more candidates we include, the more reliable the optimisation is but the computational time would increase in LDOD. Despite the dependence between iterations, X can usually be optimised as a whole in the end, though there is no guarantee of this. To speed up the computation, the modified Fedorov exchange algorithm (Cook and Nachtsheim, 1980) streamlines the iterative procedure of the Fedorov exchange algorithm, i.e. up to n points of X can be updated in each iteration and an exchange will be executed whenever a clear improvement in the criterion function is achievable. Table 1 illustrates an algorithm we adapt for nonlinear models which, like all other computation in this paper, is implemented in Matlab environment. Here, we define Υ as an indicator variable. Υ = 1 indicates that at least one improvement has been made on X in the current iteration, in which case we should evaluate all the points again in a new iteration. We also let |F T F |new denote the determinant of the updated information matrix after substitution Xi = Xnew,i in X . The iteration can take advantage of the updating function of |F T F | for the D-criterion (Fedorov, 1972). This is defined as the relative improvement function di = |F T F |new /|F T F |, for i = 1, 2, . . . , n. As such, maximising this function is equivalent to maximising the unconstrained D-criterion function defined in terms of X , where the points of X should be bounded within X . A smaller critical value in STEP 4 would lead to more iterations and updates of X . Suppose we reduce this value from 1.0001 to 1.00000001, an additional number of minor improvements could be made when di ∈ (1 + 10−8 , 1 + 10−4 ]. As a result, X attained in STEP 6 may converge further towards (and get stuck in) a local optimum. We will discuss in Section 3 how to determine the critical value. Under the point exchange approach (PEA), the iteration is broken into n dependent steps, one for each row of X . Therefore, to compare the different local maxima of the criterion function in STEP 6, we should do a number of tries, i.e. random starting designs, to secure an efficient solution Xd . The alternative to PEA is the coordinate exchange approach (CEA). For discrete optimisation, this would interact with a unidimensional subset Ωk at each step of the iteration, which consists of the candidate coordinate levels of the kth factor. Compared with the outline above,

5

Table 1: Modified Fedorov Exchange Algorithm for Nonlinear Models 1 Generate an initial X giving a nonsingular design by means of n independent random draws (with replacement) from Ω of N candidate points within X . After substituting θ = θ 0 , compute F and φ. 2 Set index i to 1 and Υ to 0 at the start of each iteration, which will work through all n rows of X , and the corresponding rows of F , sequentially. 3 The relative improvement function di is evaluated for each possible point exchange between the ith row of X and one of the candidate points in Ω. Execute the substitution Xi = Xnew,i in X , which is the exchange for maximal di value. 4 Update F and φ. If the above maximal di exceeds the critical value, set Υ to 1. 5 When i < n, set i to i + 1 and return to STEP 3. 6 When i = n and if Υ = 1, end the current iteration and return to STEP 2. Otherwise if Υ = 0, save the current X and the local maximal criterion value φ. 7 Redo STEP 1-6 for τ independent tries. 8 The LDOD Xd is declared to be the most efficient X attained in STEP 6, which corresponds to the largest value of φ. in STEP 3, instead of updating the current row Xi in one step of the iteration, at most one coordinate Xik shall be updated, as we can use the best candidate coordinate from Ωk , for k = 1, 2, . . . , v and i = 1, 2, . . . , n. As such, each iteration will take vn steps to optimise all the coordinates of X . When the experiment involves many runs and factors, this approach can be used to reduce the computational time compared with PEA. On the other hand, we would experience more trouble in the discrete optimisation, since the local search of optimal coordinates are more dependent on the previous iterations and the other coordinate levels of X . For solving very complicated and high-dimensional LDOD problems, some advanced heuristic search methods, such as the particle swarm optimisation (Wong et al., 2015; Phoa, 2016), are popular in the recent literature. These methods are able to learn from stochastic search and iteration, and tend to run faster than PEA or CEA. Nevertheless, heuristic solutions are suboptimal in nature (Blum and Roli, 2003; S¨orensen, 2015) and some would be inefficient when users have bad luck with the choice of tuning constants. Usually when searching for LDOD is not too computationally expensive, we should use PEA or CEA for more reliable solutions in low-dimensional cases.

3

A Novel Continuous Optimisation Method

In this section, we present a new, improved, computing method which overcomes the common weakness of all previous ones in Section 2: the candidate points for LDOD are limited to a fixed set and must be predefined, which could lead to lower efficiencies of the designs obtained (in particular for nonlinear multifactor experiments). To be more specific, candidate runs in the discrete optimisation are chosen in the variable space X . When we look for an efficient design Xd for a nonlinear experiment, it is often hard to decide on suitable discrete candidates. As a result, it is useful to explore and include more points outside the finite set Ω, which is composed of v sets of candidate coordinates. One might end up with a denser Ω, with smaller meshes 6

and narrower space between adjacent candidate points within X . This attempt would slow down the computation, however, and is not a fundamental solution to the problems of discrete optimisation. For most linear experiments with sufficient numbers of runs, the traditional discrete optimisation method works well because equally spaced levels are generally either optimal or very close to optimal. In contrast, when the model is nonlinear, we believe that a continuous optimisation of the D-criterion function over X in each iteration will be a more attractive choice. The optimisation could demand a high computational cost in order to figure out a number of specified factor levels for each point of X . Below is a succinct outline of the continuous optimisation algorithm with PEA: STEP 1-2 as in Table 1, but points of X are drawn from X instead of Ω. 3 To derive the function of di , substitute the ith row of X with the v controlled variables of the nonlinear model. Using the Nelder-Mead (or quasi-Newton) method, maximise di to attain the local numeric solution Xi = Xnew,i . STEP 4-8 as in Table 1. The alternative version with CEA can be constructed in a similar way, the difference being that we will optimise one coordinate at a time in STEP 3 and update X more often in each iteration. While the critical value in STEP 4 makes little impact in the discrete optimisation over a small number of candidates, it can be set to be smaller so as to allow for minor iterative improvements on X (i.e. a larger number of iterations on average). The rule of thumb is, the larger the number of candidate coordinates (infinite in the continuous optimisation case) and/or the larger the number of rows (and columns under CEA) of X , the smaller the critical value should be. In both of our examples, 1.0001 is a reasonable critical value for the continuous optimisation, which allows for 4-6 iterations on average. Similarly, one may trace the criterion value change (or relative improvement change, equivalently) between iterations and define an iteration-stopping criterion based on that (Gotwalt et al., 2009). Instead of using the critical value, another alternative is to fix the number of iterations to complete, which would limit the computational amount for each random try. As long as these values are carefully chosen, different iteration-stopping criteria will lead to very similar solutions of LDOD. We choose the Nelder-Mead method (Nelder and Mead, 1965; Press, 2007) to maximise the function of di ∈ (0, +∞) and thus the D-criterion function φ. This is subject to the boundary constraint Xi ∈ X . This simplex method excels at optimising complex functions. While we are able to look at all the candidates in discrete optimisation (due to limited size of Ω), this is no longer the case now. Because the continuous optimisation is candidate-free and based on iterative search over the variable space X , the solutions we find are guaranteed to be locally optimal only (but are close to be globally optimal under well-performed optimisation). When it is hard to approximate the unknown global optimum, e.g. in high-dimensional cases, we can also search from more than one vector of initial values. The quasi-Newton method is also useful in simple constrained optimisation, the solutions of which are often similar. Additionally, it can handle more flexible constraints on the variable space, at a cost of higher computational amount. In comparison, optimal design process in Gotwalt et al. (2009) is based on traditional coordinate exchange algorithm combined with a less accurate nonlinear optimisation method from Brent (1973, chapter 5), which is for unidimensional optimisations of coordinates in X (thus infeasible for PEA). While there is a bit lack of technical details about their development, it has been implemented in a commercial software package named JMP. In revisiting our first application in Section 6, we will look at the Brent’s method.

7

Table 2: 24-Run LDOD for the Linear Model R 1.5 1.5 1.5 1.5

C 1 1 1 2

T 70 80 90 70

R 1.5 1.5 1.5 1.5

C 2 4 4 4

T 90 70 70 90

R 1.5 3 3 3

C 4 1 1 2

T 90 70 90 70

R 3 3 6 6

C 2 4 1 1

T 80 80 70 70

R 6 6 6 6

C 1 1 2 2

T 80 90 80 90

R 6 6 6 6

C 4 4 4 4

T 70 70 90 90

If the total number of coordinates in design X is small (e.g. vn < 10), Chaloner and Larntz (1989) introduced a method for direct optimisation of X as a whole, but this quickly becomes unmanageable for large vn. The algorithm suggested here can be considered to be a middle ground between these extremes.

4

Numerical Results and Comparisons

We now demonstrate how nonlinear experiments can be optimally designed by applying PEA or CEA to the illustrative experiments in Section 1. LDOD found using different approaches will be compared with each other. All our demonstrations are in the same computing environment.

4.1

Example 1: Multifactor Mechanistic Model

For the experiment from Box and Draper (1987), we compare the results from the continuous optimisation algorithm to those of the traditional algorithms that use discrete optimisation. The benchmark/reference for LDOD we find is a face-centred CCD with two replicates of the axial points and four centre points, which is a modification of the design actually used, with each new point located within variable space X = [1.5, 6] × [1, 4] × [70, 90]. In APPENDIX√B, additionally, we evaluate two alternative standard designs: 1) a spherical CCD with radius 2, which is similar to the used design but restricted into the narrower variable space; 2) a BoxBehnken design. These three types of standard designs are all commonly used and require fewer runs than a full 3×3 factorial. Depending on the nonlinear response surface, under optimal design theory, one design (i.e. Box-Behnken design in this case) is more efficient than the other two. To evaluate the local D-criterion function for model (1), the prior θ 0 is taken to be the nonlinear least ˜ = {θ˜0 = 5.90, θ˜0 = 1.15, θ˜1 = 0.53, θ˜0 = −0.01, θ˜2 = 15475, θ˜0 = 7489}, squares estimate θ 0 1 2 from fitting (1) to the published experimental data. The criterion value of the reference design (i.e. face-centred CCD) is φ = −52.7712. Consider the full second-order polynomial model in terms of the scaled variables x1 , x2 , x3 . This empirical model fits the data well as shown in Box and Draper (1987, chapter 11), but its LDOD Xemp , shown in Table 2, understandably does not improve much on the reference CCD for parameter estimation in the mechanistic model (1) (the criterion value of Xemp is −51.0181). Our interest is in the LDOD for model (1), a mechanistic approximation to the response surface. To ensure an efficient design, we do τ = 100 random tries with the critical value fixed at 1.0001. We use the factor levels in the reference CCD as the candidate coordinates, which are Ω1 = {1.5, 3, 6}, Ω2 = {1, 2, 4}, Ω3 = {70, 80, 90}. The full candidate set Ω then consists of 33 = 27 points. We find LDOD Xd in Table 3, the criterion value of which is −49.7321 irrespective of the exchange approach. With PEA, X from best three tries are identical to the optimal solution Xd . This hints that we have attained the maximum of the local D-criterion function in terms of X . With CEA, the best three X are dissimilar and their criterion values are −49.7321, −49.7452, −49.7452. In practice, we could do more tries in this case to boost our confidence that the design we find is indeed most efficient. 8

Table 3: 24-Run LDOD with Discrete Optimisation R 1.5 1.5 1.5 1.5

C 1 1 1 1

T 70 80 90 90

R 1.5 1.5 1.5 1.5

C 1 4 4 4

T 90 70 70 70

R 1.5 1.5 1.5 1.5

C 4 4 4 4

T 80 90 90 90

R 1.5 3 3 6

C 4 1 1 1

T 90 70 70 80

R 6 6 6 6

C 1 1 1 4

T 80 90 90 70

R 6 6 6 6

C 4 4 4 4

T 70 70 80 80

Table 4: 24-Run LDOD with Continuous Optimisation R 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.7667 1.7667 1.7668 1.7668

C 1 1 1 1 4 4 4 4 4 4 4 4

(a) PEA T R 70 2.9812 90 2.9836 90 2.9848 90 5.8689 73.9282 5.8691 73.9316 5.8694 73.9444 6 73.9458 6 6 90 6 90 90 6 90 6

C 1 1 1 4 4 4 1 1 1 1 4 4

T 70 70 70 70 70 70 85.0823 85.1136 85.1208 85.1353 78.4433 78.4712

R 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.7304 1.7304 1.7305 1.7306

C 1 1 1 1 1 4 4 4 4 4 4 4

(b) CEA T R 70 3.4081 70 3.4102 90 5.8728 90 5.8747 90 5.8758 74.0128 5.8765 74.0131 6 74.0250 6 90 6 90 6 90 6 90 6

C 1 1 4 4 4 4 1 1 1 1 4 4

T 70 70 70 70 70 70 84.5574 84.7942 84.8235 84.9113 80.1277 80.1320

To facilitate iterative continuous optimisation, Matlab is instructed to do parallel computation. To compare this with the discrete optimisation over Ω, we also do 100 random tries and use the same critical value 1.0001. In each step of the iteration, the current point (or coordinate) also acts as the initial vector (or value) in optimising the criterion function. When PEA is used, the local D-criterion value of Xd (Table 4a) equals φ = −49.5528 (the mean value based on 100 tries is −49.7496). The mean number of iterations is 4.4, so it does not take many steps to obtain a local maximum of φ. When we use CEA, the Xd in Table 4b is found to be locally D-optimal. The maximal criterion value is −49.5573, which is no better than that of Table 4a. On average, the criterion value is −49.7843 and the number of iterations is 5. No clear difference between PEA and CEA is visible here. Exact replicate runs are rarer than near-replicates, as we see in Table 4, since the variable space X is continuous and implies an infinite number of feasible candidate runs. However, despite small differences in the later decimal places of the coordinates, it is clear what kind of coordinates we should use under the optimality criterion. Many of the runs are not far in distance from the candidates previously defined in the 3 × 3 × 3 set Ω. The result does not show much of an improvement from Table 3, which we found with the traditional discrete optimisation. With respect to Table 4b which consists of quasi-continuous coordinates (i.e. the real coordinate values are irrational numbers), the reference CCD is exp(−52.7712/6)/exp(−49.5528/6) ≈ 58.48% efficient and Xemp is 78.33% efficient. However, the relative efficiency is as high as 97.06% for the design in Table 3, which is made up of candidate points from Ω. It shows that, even though the model is complex and nonlinear, the discrete optimisation can be effective over those small but well-defined candidate sets.

9

Table 5: 18-Run Reference Face-Centred CCD in Mountzouris et al. (1999) S 5 5 5 5 5 5

4.2

E 6.25 6.25 62.5 6.25 6.25 0.625

P 300 200 300 300 300 300

ξ 73.6 81.6 76.0 69.4 73.6 50.5

S 2.5 7.5 5 7.5 2.5 5

E 62.5 6.25 6.25 0.625 0.625 6.25

P 400 300 400 200 200 300

ξ 95.2 77.3 69.0 43.3 62.8 74.0

S 7.5 2.5 2.5 7.5 7.5 2.5

E 62.5 6.25 0.625 0.625 62.5 62.5

P 400 300 400 400 200 200

ξ 82.7 90.0 55.2 87.0 96.0

Example 2: Hybrid Nonlinear Model

For the experiment in Mountzouris et al. (1999), we assume model (2). To set prior θ 0 , we fit (2) to the dataset of the 18-run face-centred CCD (Table 5, where no valid response was obtained ˜ = {˜ during the 16th run). The nonlinear least squares estimate is θ a0 , a ˜1 , a ˜2 , a ˜3 , a ˜4 , a ˜5 } ≈ {0.4340, 1.3140, −0.1059, −0.8224, 0.4105, −2.0633}, which acts as a reasonable prior for choosing a new design. The local D-criterion function value of the reference CCD is 31.7538. We assume the same variable space for the new experiment, which is a cuboid (S, E, P ) ∈ X = [2.5, 7.5] × [0.625, 62.5] × [200, 400]. Both enzyme concentration E and pressure P are scaled as in model (2). As an empirical approximation to the response surface, again, one can fit a full second-order linear model in terms of the scaled variables xS , xE , xP , where xS =

S−5 ∈ [−1, 1]. 2.5

It is easy to find the Xemp that is optimal over discrete candidates, the 10 × 10 information matrix of which is independent of the prior. If we use Xemp for the experiment but estimate the parameters θ of model (2), the local D-criterion value is 34.4783. Without doubt, model (2) leads to a better approximation to experimental data. The same critical value 1.0001 is used in our search for an 18-run X to maximise the local D-criterion function of (2). For discrete optimisation with PEA, the candidate set Ω of 33 points is based on the factor levels of CCD in Table 5. Out of 100 tries, the best three solutions X are similar, each criterion value of which is φ = 38.8433. Under CEA, the largest three values are 38.7313, 38.6514, 38.6514. PEA works better than CEA in this case. For the continuous optimisation, we let it start at multiple initial points in this case, in addition to the default, i.e. current point of X . We use a coarse set of 23 initial points {3.75, 6.25} × {1.9764, 19.764} × {250, 350} spreading over X . To maximise the criterion function in v = 3 factors, each time in iteration, there are in total (1 + 23 ) = 9 initial starts under PEA and four for unidimensional maximisation under CEA. We can see little difference between the two solutions in Table 6, as more than 10 of the 100 local D-criterion values (at different local maxima) have reached the maximum at 41.2246, in both cases. Based on the 100 tries, the mean criterion value is 41.1871 for PEA (requiring 4 iterations on average) and 41.1605 for CEA (4.8 iterations on average). The candidate set is less suitable here, so that the implementation of the continuous optimisation can make a bigger improvement in Xd . With respect to Table 6a, the out-of-date LDOD X which we previously find with the discrete optimisation is exp(38.8433/6)/exp(41.2246/6) ≈ 67.24% efficient. Likewise, the Xemp is 32.49% efficient and the reference CCD is 20.63% efficient, both of which are unsuitable for parameter estimation in the assumed model. This demonstrates forcefully that if a nonlinear multifactor model is to be used, a design for this model must be

10

Table 6: 18-Run LDOD with Continuous Optimisation S 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5

E 2.7970 2.7973 2.7974 19.275 19.280 20.647 20.648 23.248 23.249

(a) P 200 200 200 400 400 200 200 288.37 288.37

PEA S 2.5 2.5 2.5 2.5 2.5 2.5 3.0978 3.1501 3.1502

E 62.5 62.5 62.5 62.5 62.5 62.5 62.5 31.239 31.255

P 200 200 288.88 288.88 400 400 200 200 200

S 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5

E 2.7970 2.7970 2.2978 19.276 19.279 20.641 20.650 23.247 23.250

(b) P 200 200 200 400 400 200 200 288.37 288.37

CEA S 2.5 2.5 2.5 2.5 2.5 2.5 3.0978 3.1502 3.1502

E 62.5 62.5 62.5 62.5 62.5 62.5 62.5 31.243 31.246

P 200 200 288.88 288.88 400 400 200 200 200

sought, rather than relying on standard designs. It is also clear that the benefit can only be realised by using a continuous optimisation algorithm.

5

A Novel Multiphase Optimisation Algorithm

The continuous optimisation algorithm demands many tries to find a highly efficient design. On the one hand, it is a safe strategy to use randomly sampled initial design coordinates. On the other hand, random sampling lacks effectiveness and is wasteful in computing, because many initial designs X simply cannot lead to desirable D-optimality. After repeated continuous optimisation, while the most efficient solution Xd is treated as optimal, all the inferior solutions are useless and shall be abandoned. In order to improve our search effectiveness, we introduce a technique to refine the new continuous optimisation method. In most situations, there is no sufficient information about the appropriate candidate points that would eventually compose LDOD. However, we may assume that the more efficient are the initial designs, the easier can the continuous optimisation algorithm work through iterations. Then the solution Xd at optimality tends to be more stable in structure and more efficient under the D-criterion. The promising idea is to schedule a swift discrete optimisation, over a small but suitable candidate set (based on our best guess), to convert each random initial design into an intermediate solution Xd1 , which is relatively efficient. We may increase the critical value for this optimisation, e.g. from 1.0001 to 1.1, to obtain a larger number of distinct Xd1 . Next, some or all of the distinct Xd1 shall act as new starts for the continuous optimisation, the computation of which is intensive. With a few additional steps for making further improvements, this is the multiphase method we develop, which works even with a small number of Xd1 . We summarise the multiphase optimisation algorithm in Table 7, with PEA for instance. CEA can benefit the continuous optimisation if the number of factors v is large. Otherwise, it does not matter which approach is used for Phase 1 discrete optimisation, even though PEA is found to be more reliable in our examples. In Table 7, the new continuous optimisation method is used to complement the basic discrete optimisation over an imperfect candidate set. There is also Phase 3, which aims to convert the quasi-replicate runs of Xd into exact replicates for all support points. See Donev and Atkinson (1988) for another algorithm for adjusting an already efficient design. Practical limitations will influence how we can set the levels of factors under control. To take account of them, in Phase 3, the closest distance of a factor indicates the minimum space between two feasible coordinates next to each other, the levels of which must be distinguishable to experimenters. As we have set the v closest distances, the continuous variable space X is redefined to be a discrete set of numerous candidate levels. Suppose the closest distance

11

Table 7: Multiphase Optimisation Algorithm for Nonlinear Models 1.1 Phase 1: LDOD over discrete set Ω, as instructed in STEP 1-7 in Table 1. 1.2 Refer to the intermediate solutions attained in Phase 1 as Xd1 . 2.1 Phase 2: LDOD over X , for each Xd1 as the initial X for continuous optimisation. 2.2 Refer to the revised solutions as Xd2 , the most efficient of which is selected to be Xd . 3.1 Phase 3: LDOD for refinement. For k = 1, 2, . . . , v, determine the closest distance dk for the kth controlled variable. Group quasi-replicate runs (i.e. points that are close to each other in distance) in Xd , so the n runs of X = Xd are divided into n∗ homogeneous clusters. While the column order of the v factors is fixed, sort the rows of X with respect to the clusters and factor levels. Reset k to 1. 3.2 Set both i and i∗ to 1 as the starting values. 3.3 Pick out the i∗ th cluster of X , corresponding to ni∗ quasi-replicate points. The aim is to find a substitute for the kth factor level of the points (let the maximum of which be Xmax,k and the minimum be Xmin,k ). After substitution, there would be ni∗ exact replicates of the new coordinate. 3.4 Create a provisional subset of candidate coordinates Ωk , on the basis of the closest distance, the interval [Xmin,k , Xmax,k ], and X . Search over Ωk and let Xnew,k be the candidate that maximises φ. Use this substitute for each point of the i∗ th cluster. Set i to i + ni∗ . 3.5 Unless i = n + 1, set i∗ to i∗ + 1 before returning to STEP 3.3 . 3.6 Unless k = v, set k to k + 1 before returning to STEP 3.2. 3.7 Create a new candidate set Ω, consisting of the n∗ support points of current X . Perform a discrete optimisation over Ω, the final solution Xd∗ of which is the LDOD.

12

Table 8: 24-Run LDOD for Model (1): Interim Solution after Phase 2 of the Multiphase Method R 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.7348 1.7348

C 1 1 1 1 1 1 4 4 4 4 4 4

(a) PEA T R 70 1.7348 70 1.7348 3.3188 90 90 3.3190 90 5.8118 90 5.8120 74.0877 5.8121 74.1142 6 74.1295 6 74.1343 6 90 6 90 6

C 4 4 1 1 4 4 4 1 1 1 4 4

T 90 90 70 70 70 70 70 84.8573 84.8577 84.8591 79.3271 79.3282

R 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.7348 1.7348

C 1 1 1 1 1 1 4 4 4 4 4 4

(b) CEA T R 70 1.7348 70 1.7349 90 3.3188 90 3.3191 90 5.8119 90 5.8121 74.0840 5.8123 74.1156 6 74.1303 6 74.1329 6 90 6 90 6

C 4 4 1 1 4 4 4 1 1 1 4 4

T 90 90 70 70 70 70 70 84.8571 84.8583 84.8594 79.3267 79.3274

is 0.01 unit for the variable of flow rate in the Box and Draper example. In Table 4b, the 13-14th runs are quasi-replicates. The corresponding two flow rates are within the interval [3.4081, 3.4102] ⊂ [3.40, 3.42]. To uniform their factor levels as in STEP 3.4, the candidate subset should be Ω1 = {3.40, 3.41, 3.42}. As to the final solution Xd∗ , n∗ is equal to the number of support points and ni∗ indicates the number of exact replicates at each point, for i∗ = 1, 2, . . . , n∗ .

6 6.1

Examples Revisited Example 1

For the Box and Draper example, the new multiphase optimisation algorithm can use the same 3 × 3 × 3 candidate set Ω, which takes the factor levels in the reference CCD. This candidate set is suitable in initial discrete optimisation and thus will also facilitate Phases 2 and 3. With 30 random tries and a critical value of 1.1, 30 distinct intermediate solutions Xd1 are obtained under PEA. Under either PEA or CEA, the continuous optimisation then starts with the identical set of Xd1 and a decreased critical value 1.0001 in Phase 2. The interim solutions after Phase 2 are illustrated in Table 8. Under PEA, the mean number of iterations is 3.6 in Phase 2, while the largest three criterion values are −49.5143, −49.5182, −49.5182 (the mean of the 30 is −49.5677). In comparison to the continuous optimisation results in Section 4, where the mean criterion value of 100 tries is −49.7496 and mean number of iterations is 4.4, there is a clear improvement. Likewise, CEA takes 3.8 iterations on average and the mean of the 30 criterion values is −49.5687 (the largest three values of which are the same as −49.5143, −49.5182, −49.5282). With CEA, we also test the Brent’s method instead of the Nelder-Mead method. Notice that the maximum number of iterations we allow for is 30. Under this scenario, CEA takes 30 iterations in all but one case. The mean criterion value is −50.0487 and the largest three values are −49.8285, −49.8633, −49.8766. The Brent’s method is faster for a single iteration, but it fails to deliver a stable and efficient solution after Phase 2. In our demonstration on a Windows desktop computer with Intel Core i7 Processor, the total time cost is 8 seconds with PEA, 10 with CEA, and 15 with the Brent’s method. Here, the time cost difference is unimportant and can be ignored. We can choose either Xd in Table 8 for Phase 3. The closest distance is set to be 0.1 unit for R, 0.1 unit for C and 1 unit for T . After the reallocation of experimental runs as in STEP 3.7

13

Table 9: 24-Run LDOD for Model (1) R 1.5 1.5 1.5 1.5

C 1 1 1 1

T 70 70 90 90

R 1.5 1.5 1.5 1.5

C 1 1 4 4

T 90 90 74 74

R 1.5 1.5 1.7 1.7

C 4 4 4 4

T 74 74 90 90

R 1.7 1.7 3.3 3.3

C 4 4 1 1

T 90 90 70 70

R 5.8 5.8 5.8 6

C 4 4 4 1

T 70 70 70 85

R 6 6 6 6

C 1 1 1 4

T 85 85 85 79

(i.e. recalculating the optimal numbers of replicates for each support point of the design), the final solution Xd∗ is shown in Table 9 and it is unlikely that we have missed a better one. The new maximal criterion value is −49.5116 which has increased. More importantly, now we have fully achieved the objective to accurately identify the factor levels of every support point (at desirable precision) as well as the corresponding number of exact replicate runs. A standard experimental design (i.e. CCD or Box-Behnken design) might define 13-15 support points here, illustrated in APPENDIX B, while a full factorial design requires at least 33 = 27 support points. By using the multiphase optimisation algorithm, in addition to achieving higher efficiency and more reliable parameter estimation in modelling, we are able to simplify practical design structure and streamline actual experimentation. In this example, there are only n∗ = 8 support points in Xd∗ , also in contrast to the 11 support points in Table 3 which is found with the discrete optimisation. These are the eight points experimenters should study, and the remaining 24 − 8 = 16 runs are only replicates. Counterintuitively, it is actually easier to use this LDOD rather than any other design we discuss. There is definitely no need to look at every point of a full factorial design. Searching for such an optimal design is no longer challenging with the multiphase optimisation method. On a side note, even if we have to use the Brent’s method for CEA (i.e. unidimensional continuous optimisation), it is feasible to obtain Table 9. We can, for instance, increase the maximum number of iterations and reduce the critical value (currently at 1.0001) to allow for more optimisations towards a stable D-optimality. More effectively, we may do STEP 3.7 of Phase 3 for every Phase 2 design Xd2 , which has been discussed briefly in Gotwalt et al. (2009) and implemented in JMP. Also in APPENDIX B, we show a design found in JMP 13, with 100 random tries on the basis of the Brent’s method, almost identical to Table 9.

6.2

Example 2

For the Mountzouris et al. example, which we discuss briefly here, (1 + 23 ) = 9 initial points are used to improve the continuous optimisation solutions in each iteration, which are defined in the previous demonstration in Section 4. After the initial discrete optimisation (Phase 1), there are just four distinct designs Xd1 for Phase 2, each of which is made up of 18 runs selected from the 24-run candidate set Ω. At the end of Phase 2, with PEA, the local D-criterion values are 41.2246, 41.2052, 41.2052, 41.1403 whereas the mean number of iterations is 4. When we use CEA, the criterion values of Xd2 are 41.2246, 41.1319, 41.1301, 41.1301, attained after 4.5 iterations on average. We then set the closest distance to be 0.01 for S, 0.005 for E, and 0.1 for P . These are used in Phase 3 to obtain the Xd∗ in Table 10, the criterion value of which is 41.2246. This is similar to Table 6 but here we can figure out the exact number of support points, n∗ = 9. In most cases, there are two replicate runs for each support point. The model we assume in this example is nonlinear, but it also exhibits some linearity. Often, LDOD for such a model is insensitive to changes in the parameter prior. We may do a sensitivity study using the final solution here, where we perturb one parameter value at a time by one standard error from the model fitted to the original data. Perturbation would lead to different local D-criterion functions and hence different optimal designs for the new prior vectors. In spite

14

Table 10: 18-Run LDOD for Model (2) S 2.5 2.5 2.5 2.5 2.5 2.5

E 2.795 2.795 2.795 19.275 19.275 20.645

P 200 200 200 400 400 200

S 2.5 2.5 2.5 2.5 2.5 2.5

E 20.645 23.25 23.25 62.5 62.5 62.5

P 200 288.4 288.4 200 200 288.9

S 2.5 2.5 2.5 3.1 3.15 3.15

E 62.5 62.5 62.5 62.5 31.25 31.25

P 288.9 400 400 200 200 200

of this, the Xd∗ in Table 10 is found to be highly efficient actually under all the scenarios. Even in the worst case when a3 = −0.6215 is the new parameter value, Xd∗ is still 98.25% efficient. This implies, the LDOD we find is very robust to uncertainties and not overdependent on the true parameter values. This result boosts our confidence in using the local D-criterion.

7

Discussion and Recommendations

Characteristics and statistical interpretation of empirical polynomial response surfaces are commonly studied and are familiar to many experimenters. When there is sufficient resource (i.e. number of experimental runs) enabling precise parameter estimation, efficient (or exact Doptimal) design of linear experiments is never too difficult. Despite the usefulness of polynomial models as elucidated in Box and Draper (1987), experimenters may alternatively derive a mechanistic model or an empirical nonlinear model. It is not unusual that such a nonlinear model can better approximate the response surface and facilitate the scientific interpretation of results. For nonlinear multifactor experiments, the traditional design method via discrete optimisation is unsuitable. To investigate and solve this problem, we develop a novel continuous optimisation method, which is then implemented in a multiphase optimisation algorithm. When this algorithm is applied to both illustrative examples, we easily obtain a feasible LDOD with high efficiency and optimised points to support any new experiments. Discussed in Section 6, for instance, these LDOD could have advantageous properties (e.g. simplified structure and practical meanings) that are different from those of standard experimental designs. It would therefore be interesting to apply the methodology in this paper to study different classes of nonlinear experiments involving statistical data analysis. To justify and promote such designs, there are two challenges in practice: (i) to find a suitable form of the nonlinear model; and (ii) to choose a set of realistic values as the parameter prior. On the first, if we cannot assume a deterministic mechanistic model, often, it is still possible to use a hybrid nonlinear model reflecting some characteristics about the underlying mechanism. This is demonstrated in Example 2. As to the second challenge, we often base LDOD on the most realistic parameter values we are able to presume. Some reference data, either from the literature or earlier experiments, would be very helpful here. We can then extract information from data and use that for LDOD. When there are no historical data at all, there are still many feasible options. Experimenters may try to learn from relevant studies to build an initial response surface design. Depending on their best knowledge in envisaging future data and making appropriate hypotheses for testing, this could be an intuitive and practical process. While designs used elsewhere may be referred to, experimenters can also choose from modified versions of standard designs including CCD, Box-Behnken design or (fractional) factorial designs. Such designs may also allow experimenters to compare different treatments and identify the rough pattern of the response surface within a defined variable space. Alternatively, they may start with a linear model as the empirical

15

approximation to the nonlinear model they actually assume, and use LDOD for that linear model for a preliminary study. No matter which design is to be used at this exploratory or screening stage, it can be very small so as to make the pilot experiment cost-saving and speedy. As soon as a limited amount of data has been obtained, experimenters can then utilise these data for better design and LDOD may become a more attractive option for the later stages. Such sequential design methodology is often applied in clinical trials research and drug development in the pharmaceutical industry. Besides, in our future research, we would like to demonstrate how Bayesian optimal design can be obtained, which is usually less dependent on the choice or informativeness of the parameter prior. The continuous optimisation method is robust in updating the factor levels in X , as we demonstrate in both examples. It takes the whole variable space X into account, where the aim is to maximise the D-criterion function. This is not limited to a finite candidate set Ω, so we take no risk from improperly specified candidate points. This also saves us efforts to compare and validate the discrete optimisation over several candidate sets, which will smooth the experiment. Further, the multiphase optimisation algorithm is implemented to circumvent ineffective tries and iterations, in order to find a LDOD with less computational input. To some extent, its process and outcomes depend on a small set Ω, which shall be reasonably defined to obtain the truly optimal solution. PEA and CEA are both efficient in most situations, especially when the nonlinear model is simple. However, each element of the Fisher information is a function in terms of some of or all the coordinate levels. This reflects the associations between the updates of the individual coordinate levels in X . Hence, as long as the optimisations are reliable in individual steps of iterations, e.g. when the number of factors v is small or under the discrete optimisation over a small candidate set, PEA should be preferred. It requires fewer steps to maximise the criterion function of the nonlinear model and thus to optimise X as a whole. In contrast, though CEA requires more steps to optimise X , optimising individual coordinates in X is easier. Overall, we recommend the multiphase optimisation algorithm with PEA when the number of factors v is smaller than, e.g. five. Otherwise, when multidimensional optimisation of a point (which contains v coordinates representing the multiple factors of the model) is too difficult due to model complexities, we shall use the multiphase optimisation algorithm with CEA. The clearest message from this work is that, if a multifactor nonlinear model is to be used to gain scientific insight from experimental data analysis, it is important to tailor the experimental design to that model. We have provided the tools needed to do this effectively and we recommend their use to experimenters in chemical engineering, biochemistry, and related areas. The enormous potential to harvest data which are informative about plausible mechanisms, rather than just giving rough predictions more than compensates for the extra efforts taken to acquire designs.

8

Acknowledgements

The first author acknowledges a Vice-Chancellor’s Scholarship from the University of Southampton and financial support from the Faculty of Applied Economics, University of Antwerp. The third author has received funding from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, el Ministerio de Economa y Competitividad (COFUND201340258), el Ministerio de Educaci´ on, cultura y Deporte (CEI-15-17) and Banco Santander.

16

References Atkinson, A. C., Donev, A. N., and Tobias, R. D. (2007). Optimum Experimental Designs, with SAS. Oxford University Press, Oxford. Blum, C. and Roli, A. (2003). Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3):268–308. Bogacka, B., Latif, A. H. M. M., Gilmour, S., and Youdim, K. (2015). Optimum designs for non-linear mixed effects models in the presence of covariates. Under Review for Biometrics. Box, G. E. P. and Draper, N. R. (1987). Empirical Model-Building and Response Surfaces. Wiley, New York. Box, G. E. P. and Hunter, W. G. (1965). The experimental study of physical mechanisms. Technometrics, 7:23–42. Brent, R. P. (1973). Algorithms for Minimization without Derivatives. Prentice Hall, Englewood Cliffs. Chaloner, K. and Larntz, K. (1989). Optimal Bayesian design applied to logistic-regression experiments. Journal of Statistical Planning and Inference, 21:191–208. Cook, R. D. and Nachtsheim, C. J. (1980). A comparison of algorithms for constructing exact D-optimal designs. Technometrics, 22:315–324. Donev, A. N. and Atkinson, A. C. (1988). An adjustment algorithm for the construction of exact D-optimum experimental designs. Technometrics, 30:429–433. Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York. Gilmour, S. G. and Trinca, L. A. (2012). Bayesian L-optimal exact design of experiments for biological kinetic models. Journal of the Royal Statistical Society Series C-Applied Statistics, 61:237–251. Goos, P. and Mylona, K. (2017). Quadrature methods for Bayesian optimal design of experiments with nonnormal prior distributions. Journal of Computational and Graphical Statistics, online:1–16. Gotwalt, C. M., Jones, B. A., and Steinberg, D. M. (2009). Fast computation of designs robust to parameter uncertainty for nonlinear settings. Technometrics, 51:88–95. Mead, R. and Pike, D. J. (1975). A biometrics invited paper. a review of response surface methodology from a biometric viewpoint. Biometrics, 31:803–851. Mountzouris, K. C., Gilmour, S. G., Grandison, A. S., and Rastall, R. A. (1999). Modeling of oligodextran production in an ultrafiltration stirred-cell membrane reactor. Enzyme and Microbial Technology, 24:75–85. Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7:308–313. Overstall, A. M. and Woods, D. C. (2017). Bayesian design of experiments using approximate coordinate exchange. Technometrics, online:1–13.

17

Phoa, F. K. H. (2016). A swarm intelligence based (SIB) method for optimization in designs of experiments. Natural Computing, pages 1–9. Press, W. H. (2007). Numerical Recipes : The Art of Scientific Computing. Cambridge University Press, Cambridge, 3rd edition. S¨ orensen, K. (2015). Metaheuristics - the metaphor exposed. International Transactions in Operational Research, 22(1):3–18. Stufken, J. and Yang, M. (2012). On locally optimal designs for generalized linear models with group effects. Statistica Sinica, 22:1765–1786. Wong, W. K., Chen, R., Huang, C., and Wang, W. (2015). A modified particle swarm optimization technique for finding optimal designs for mixture models. PLoS ONE, 10:1–23. Woods, D. C., Lewis, S. M., Eccleston, J. A., and Russell, K. G. (2006). Designs for generalized linear models with several variables and model uncertainty. Technometrics, 48:284–292.

A

Derivation of the Multifactor Model for Example 2

For the Mountzouris et al. (1999) experiment, we are interested in a model of the substrate conversion rate ξ, in terms of initial substrate concentration S. The reaction mechanism is unknown but it involves a mixture of two reagents: the substrate and an activator (i.e. enzyme). It is therefore reasonable to presume the first step of mechanism to be reversible and bimolecular: the small molecules of dextran will attach to the active sites of endodextranase molecules. A bond like this could form an intermediate complex in unstable state, which is then converted to the final reaction product. Under an extreme scenario when even the maximal substrate concentration S = 7.5 is far too low (relative to the fixed concentration of endodextranase), the reaction rate will be high and stable over time. The conversion is then close to 100% at measurement time (after the reaction almost ceases), when the instantaneous substrate concentration is near zero and treated as independent of the initial concentration. The expected conversion rate is E(ξ) → 100(S − a)/S, where a is a nonnegative constant near zero. In contrast, if even the minimal substrate concentration level 2.5 is too high for the concentration of endodextranase, the reaction will be slow and its rate can decrease quickly over time. The concentration of the converted substrate is almost independent of the initial substrate concentration, so that E(ξ) → 100b/S, where b is a nonnegative constant. A tradeoff must be found between the two impractical scenarios, where we assume an ideal solution (or ideal mixture). When the substrate concentration increases, ξ shall decrease. It is also more realistic to consider the function of ξ to be concave rather than convex. As an approximation to the observed response surface in Mountzouris et al. (1999), we consider a nonlinear function in parameters γ1 ≥ 0 and γ2 ∈ (−2.5, 0): E(ξi ) =

γ1 Si , for i = 1, 2, . . . , n. γ2 + Si

To use a transformed dependent variable, we derive the statistical model ξi γ 0 Si + ε0i , for i = 1, 2, . . . , n, = 01 100 − ξi γ2 + Si 18

(4)

Table B1: 24-Run Reference Face-Centred CCD R 1.5 1.5 1.5 1.5 1.5 1.5

C 1 1 2 2 4 4

T 70 90 80 80 70 90

R 3 3 3 3 3 3

C 1 1 2 2 2 2

T 80 80 70 70 80 80

R 3 3 3 3 3 3

C 2 2 2 2 4 4

T 80 80 90 90 80 80

R 6 6 6 6 6 6

C 1 1 2 2 4 4

T 70 90 80 80 70 90

which incorporates an error ε0i and two constants γ10 ≥ 0 and γ20 ∈ (−2.5, 0). When the substrate concentration is fixed, we use a second-order linear function to explain the transformed conversion rate in the scaled variables xE and xP . With interaction xE xP excluded because of insignificance in this case, the finalised model is ξi = exp(a00 + a01 xE,i + a02 xP,i + a03 xE2,i + a04 xP2,i ) + ε00i , 100 − ξi

(5)

where ε00i is the error and a00 , a01 , a02 , a03 , a04 are unknown parameters. We combine (4) and (5) to write the nonlinear multifactor model as exp(a0 + a1 xE,i + a2 xP,i + a3 x2E,i + a4 x2P,i )Si ξi = + εi , for i = 1, 2, . . . , n, 100 − ξi a5 + Si where εi is the error, a0 , a1 , a2 , a3 , a4 , a5 are the parameters. This model fits the reference data well, so we can use it to approximate the unknown response surface.

B

Standard Designs for Example 1

Tables B1-B3 below are three popular standard experimental designs we may construct for Example 1, each of which consists of 24 independent runs. For each design, we define four exact replicate runs at the centre (3, 2, 80) of the variable space X . These designs are used in this paper as benchmarks in evaluating the relative efficiency of LDOD found using different algorithms in Section 4. Particularly, the face-centred CCD is the reference design we choose and study in Section 4, the local D-criterion value of which is −52.7712. We can notice that the spherical CCD (Table B2) is even less efficient in this case, with a criterion value of −54.6880. This is because of the nonoptimal factor levels it defines, which can also contribute to an inefficient LDOD with the discrete optimisation. The Box-Behnken design (Table B3), on the other hand, has a larger criterion value at −51.5174, but it is still less efficient than any optimal designs we find. Overall, while it is easy to construct these standard designs when exactly three factors are under control, they are not very useful in parameter estimation for the nonlinear model. In Fig. B1, we also show a locally D-optimal design obtained from a commercial software package named JMP, after a large number of random tries (100 in this case). As far as we know, the used algorithm is based on combining the traditional coordinate exchange procedure (for the discrete optimisation) and the Brent’s method for unidimensional optimisation, which is described briefly in Gotwalt et al. (2009). After we round off decimal digits of this design, it is the same as the design we find in Table 9, which is optimal over the continuous variable space X.

19

Figure B1: 24-Run LDOD in JMP

20

Table B2: 24-Run Spherical CCD R 1.5 1.5 1.8376 1.8376 1.8376 1.8376

C 2 2 1.2251 1.2251 3.2651 3.2651

T 80 80 72.9289 87.0711 72.9289 87.0711

R 3 3 3 3 3 3

C 1 1 2 2 2 2

T 80 80 70 70 80 80

R 3 3 3 3 3 3

C 2 2 2 2 4 4

T 80 80 90 90 80 80

R 4.8976 4.8976 4.8976 4.8976 6 6

C 1.2251 1.2251 3.2651 3.2651 2 2

Table B3: 24-Run Box-Behnken Design R 1.5 1.5 1.5 1.5 1.5 1.5

C 1 1 2 2 4 4

T 80 80 70 90 80 80

R 3 3 3 3 3 3

C 1 1 1 1 2 2

T 70 70 90 90 80 80

21

R 3 3 3 3 3 3

C 2 2 4 4 4 4

T 80 80 70 70 90 90

R 6 6 6 6 6 6

C 1 1 2 2 4 4

T 80 80 70 90 80 80

T 72.9289 87.0711 72.9289 87.0711 80 80