IMA Journal of Mathematics Applied in Medicine & Biology (1998) 15, 19-40

Estimating parameters in stochastic compartmental models using Markov chain methods

GAVIN J GIBSON
Biomathematics and Statistics Scotland, King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, UK

ERIC RENSHAW
Department of Statistics and Modelling Science, Livingstone Tower, University of Strathclyde, 26 Richmond Street, Glasgow G1 1XH, UK

[Received 7 March 1997 and in revised form 18 July 1997]

Markov chain Monte Carlo methodology is presented for estimating parameters in stochastic compartmental models from incomplete observations of the corresponding Markov process. The methods, which are based on the Metropolis-Hastings algorithm, are developed in the context of epidemic models. Their use is illustrated, using simulated realizations of the process, for the particular case where only susceptible, infective, and removed states are represented. By comparing estimated likelihoods with theoretical forms, in cases where these can be derived, or with the known model parameters, we show that the methods can be used to provide meaningful estimates of parameters and parameter uncertainty. Potential applications of the techniques are also discussed.

Keywords: stochastic compartment models; parameter estimation; Markov chain Monte Carlo methods; hidden Markov models.

© Oxford University Press 1998

1. Introduction

The field of stochastic modelling of biological and ecological systems (Durrett & Levin, 1994) is currently undergoing considerable development as advances in computing power make investigation of complex stochastic models by simulation methods more feasible. Mathematicians have contributed to the understanding of stochastic models, and a range of papers can be found in the literature of probability theory and statistical physics characterizing the theoretical properties of a large variety of stochastic models. For example, Cox & Durrett (1988) have provided a rigorous mathematical treatment of the large-scale 'shapes' and velocities of spatio-temporal models for epidemics and forest fires. Other questions of interest include critical phenomena (for example, phase transitions between persistence and extinction), and several researchers have investigated this topic for the contact process (Liggett, 1985; Brower et al., 1978; Buttell et al., 1993). Because a full analytic treatment of a dynamical stochastic system is rarely feasible, attention has also focused on how models might be approximated by systems which are amenable to analysis. Pairwise approximation (see, for example, Sato et al., 1994; Levin & Durrett, 1996), a technique borrowed from statistical physics which simplifies spatio-temporal stochastic models by neglecting correlations beyond nearest neighbours, is one example. Another approach is the normal approximation (Whittle, 1957; Isham, 1991) whereby assumptions of normality are made in order to render tractable analysis of nonlinear stochastic processes via the moment-generating function.

All these approaches share the feature of being aimed primarily at characterizing the dependence of model behaviour, or its statistical properties, on the model parameters. In this paper we are concerned with the inverse problem. That is, given an observation of a process, which is believed to be governed by a known stochastic model, how can we obtain an indication of the range of model parameters which could plausibly explain the observation? This problem is, of course, one of statistical inference and, depending on our philosophy, the 'range' may correspond to a posterior density or a confidence interval.

Although a vast literature on statistical inference exists, there are relatively few instances where continuous-time nonlinear stochastic models of the kind considered in this paper have been fitted to observations. This is partly due to the fact that, although likelihoods can be written down when observations are in some sense complete, observations of biological processes from practical experiments typically record only a subset of the information that defines the evolution of the system. Computing the likelihood of the observations necessitates 'integrating out' such uncertainty and this may present great difficulty.

Because of the difficulties inherent in working directly with stochastic dynamical systems, researchers may choose to fit descriptive models, such as time-series models (Box & Jenkins, 1970), that do not necessarily reflect their belief about underlying processes. Alternatively, they may opt to fit deterministic models to observations, for example, by least-squares minimization. Whilst this latter approach can be extremely useful there may be problems in assigning measures of uncertainty to estimated parameters.
Deterministic models accurately reflect the behaviour of stochastic versions in many situations, but in other cases the agreement, both qualitative and quantitative, can be poor.

Here we describe a methodology for fitting stochastic dynamical models directly to observations which allows parameter uncertainty to be treated within a statistical framework. Advances in stochastic integration methods, in particular Markov chain Monte Carlo (MCMC) methods (see, for example, Metropolis et al., 1953; Hastings, 1970; Smith & Roberts, 1993; Besag & Green, 1993), are certainly relevant to our needs. For example, Gibson (1997) has applied MCMC methods to fit spatio-temporal SI (susceptible-infective) models to observations of the spread of virus diseases in orchards where temporal sampling is infrequent. A feature of this situation is that the events that occur between observation times, these being infections of previously healthy individuals, are specified uniquely by the observations. An MCMC algorithm is then applied to integrate out the uncertainty regarding the order in which these events have occurred to enable parameter likelihoods to be computed.

In this paper we consider a more complex situation where the observations do not uniquely specify the events that occur. Specifically, we focus attention on systems whose behaviour is governed by nonlinear stochastic epidemic compartment models such as the SIR (susceptible-infected-removed) or the more general SEIR model, where E denotes a latent state. These models and their many variants (see, for example, Murray, 1989; Renshaw, 1991) have seen widespread application in studies of epidemics in humans, plants, and animals. Furthermore, related compartment models, such as the Lotka-Volterra and other predator-prey formulations, have been applied in ecology.
Thus the simple models considered here are often the building blocks for modelling studies of complex epidemiological or ecological systems (for example, Yakowitz et al., 1996) and are therefore an appropriate starting point for our own investigations. In many systems to which compartment models are applied, not all the transitions between compartments, such as that from the susceptible to the latent state in an SEIR model, may be observable. However, the corresponding transition rate may be an important determinant of the system dynamics. We believe the methodology described in this paper represents progress on the problem of estimating transition rates between unobservable states in stochastic compartment models.

2. Parameter estimation in stochastic epidemic models

Consider a stochastic SEIR compartment model for an epidemic in a population whose state at time $t$ is defined by the vector of non-negative integers $s(t) = (S(t), E(t), I(t))$ denoting the numbers of individuals in the susceptible, latent, and infective states. Without ambiguity these variables are also used to denote the sets of individuals in these classes at time $t$. The model is specified by a parameter vector $a = (a_1, a_2, a_3, a_4)$. Assume that each susceptible gives birth at rate $a_1$ and enters the latent state at rate $a_2 I(t)$. An individual in E becomes infective at rate $a_3$, while any infected individual dies, corresponding to removal, at rate $a_4$. Events of these four types are respectively denoted by B, L, I, and D. We adopt the convention that $s(t)$ is left-continuous and therefore represents the state of the system prior to the occurrence of any event at $t$. The notation $s(t+)$, and obvious extensions, will be used to denote the state of the system immediately after the occurrence of any events at time $t$. Formally, the model is specified by the equations

$$
\begin{aligned}
\Pr(S(t + dt) = S(t) + 1) &= a_1 S(t)\,dt & (&= \eta_{\mathrm{B}}(a, s(t))\,dt),\\
\Pr(E(t + dt) = E(t) + 1) &= a_2 S(t) I(t)\,dt & (&= \eta_{\mathrm{L}}(a, s(t))\,dt),\\
\Pr(I(t + dt) = I(t) + 1) &= a_3 E(t)\,dt & (&= \eta_{\mathrm{I}}(a, s(t))\,dt),\\
\Pr(I(t + dt) = I(t) - 1) &= a_4 I(t)\,dt & (&= \eta_{\mathrm{D}}(a, s(t))\,dt).
\end{aligned}
\tag{2.1}
$$
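To make the rates in (2.1) concrete, the following is a minimal Python sketch of a direct (Gillespie-type) simulation of the SEIR process, in which exponential waiting times are drawn from the total event rate. The function names, the dictionary layout, and the parameter values in the usage note are illustrative assumptions and are not taken from the paper.

```python
import random

# Effect of each event type on the state (S, E, I): birth, latency, infectivity, death.
JUMPS = {"B": (1, 0, 0), "L": (-1, 1, 0), "I": (0, -1, 1), "D": (0, 0, -1)}

def seir_rates(a, state):
    """Event rates eta_B, eta_L, eta_I, eta_D as written in equation (2.1)."""
    S, E, I = state
    a1, a2, a3, a4 = a
    return {"B": a1 * S, "L": a2 * S * I, "I": a3 * E, "D": a4 * I}

def simulate_seir(a, s0, T, rng):
    """Simulate one realization on [0, T]; returns a list of (time, event_type)."""
    t, state, events = 0.0, tuple(s0), []
    while True:
        rates = seir_rates(a, state)
        total = sum(rates.values())
        if total == 0.0:
            break  # no event can occur from this state
        t += rng.expovariate(total)  # waiting time to the next event
        if t > T:
            break
        kind = rng.choices(list(rates), weights=list(rates.values()))[0]
        state = tuple(x + d for x, d in zip(state, JUMPS[kind]))
        events.append((t, kind))
    return events

def observed_part(events):
    """Keep only the events of type B and D, i.e. those that change N(t)."""
    return [(t, k) for t, k in events if k in ("B", "D")]
```

For example, `simulate_seir((1.0, 0.05, 2.0, 1.0), (15, 0, 15), 3.0, random.Random(1))` returns one complete realization, and `observed_part` applied to it discards the hidden events of type L and I.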

Assume that $a$ is unknown and that the population is observed over a period $[0, T]$ during which only the occurrence and times of events of type B and D are recorded. Events of type L and I are referred to as hidden events. Thus the observations record the evolution of the quantity $N(t) = S(t) + E(t) + I(t)$ over the period $[0, T]$. We consider the problems of estimating the unknown parameter $a$ from knowledge of the initial state $s(0)$ and $N(t)$ over $t \in [0, T]$, and of quantifying the uncertainty in the estimate.

These problems are tackled using a likelihood approach. However, rather than dealing explicitly with the likelihood of the observations given the parameters, we consider only its relative values over the parameter space. This information is, nevertheless, sufficient for likelihood-based parameter estimation and, furthermore, it enables the problem to be recast in a form which can exploit the powerful computational methodology more commonly applied in Bayesian inference. This is done as follows.

For simplicity we restrict attention to a bounded parameter space $A$, in which all rates are strictly positive and, initially, we assume that $s_0 = s(0)$ is known. A realization $s$ of the process over $[0, T]$ is specified by the initial state $s(0)$, the sequence of events of type B, L, I, and D occurring in $[0, T]$, and their times of occurrence. Let $k(s)$ denote the number of events occurring, $\varepsilon_j \in \{\mathrm{B}, \mathrm{L}, \mathrm{I}, \mathrm{D}\} = \Pi$ ($1 \le j \le k(s)$) denote the type of the $j$th event, and $t_j \in (0, T)$ denote its time of occurrence. Let $\Omega$ be the set of all such $s$ for which $s(0) = s_0$. For $a$ selected uniformly from $A$, and $s \in \Omega$ simulated from the model with parameter $a$, the joint density of $(a, s) \in A \times \Omega$ is defined, up to a constant of proportionality, by

$$\beta(a, s)\,da\,ds = L(a, s)\,da\,ds, \tag{2.2}$$

where

$$L(a, s) = \prod_{j=1}^{k(s)} \eta_{\varepsilon_j}(a, s(t_j)) \, \exp\left\{-\int_0^T \sum_{\varepsilon \in \Pi} \eta_{\varepsilon}(a, s(t))\,dt\right\}. \tag{2.3}$$
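The density of a fully observed realization can be evaluated directly. The sketch below assumes the standard form for a continuous-time Markov jump process: the product of the rates of the events that occur, multiplied by the exponential of minus the time-integrated total rate. It works on the log scale; the function names are hypothetical and the code is an illustration rather than the authors' implementation.

```python
import math

JUMPS = {"B": (1, 0, 0), "L": (-1, 1, 0), "I": (0, -1, 1), "D": (0, 0, -1)}

def seir_rates(a, state):
    """Event rates of the SEIR model for state (S, E, I) and parameters a."""
    S, E, I = state
    a1, a2, a3, a4 = a
    return {"B": a1 * S, "L": a2 * S * I, "I": a3 * E, "D": a4 * I}

def log_likelihood(a, s0, events, T):
    """Log density of a complete realization given as time-ordered (time, type) pairs."""
    state, t_prev, log_l = tuple(s0), 0.0, 0.0
    for t, kind in events:
        rates = seir_rates(a, state)  # left-continuous: state just before the event
        log_l -= sum(rates.values()) * (t - t_prev)
        if rates[kind] <= 0.0:
            return -math.inf  # the event is impossible, so the density is zero
        log_l += math.log(rates[kind])
        state = tuple(x + d for x, d in zip(state, JUMPS[kind]))
        t_prev = t
    # No event occurs between the last event and T.
    log_l -= sum(seir_rates(a, state).values()) * (T - t_prev)
    return log_l
```

Returning minus infinity for an impossible event gives a simple feasibility check: a realization containing, say, an event of type I while E is empty has zero density.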

Now let $R \subset \Omega$ denote the set of realizations consistent with the observation $N(t)$ ($t \in [0, T]$). Any realizations $s^1, s^2 \in R$ are identical in terms of the times of events of type B and D, but they may differ in terms of the number and timings of events of type L or I. Our parameter-estimation approach is to consider a density $\pi(a, s)$ on $A \times R$ which is proportional to $\beta(a, s)$. Thus $\pi(a, s)$ represents the posterior joint density of $a$ and $s$ conditional on the observations $N(t)$ and $s_0$, assuming a uniform prior for $a$. For simplicity, we adopt the common practice of using the symbol $\pi$ to denote the joint density $\pi(a, s)$ and its associated marginal and conditional densities. Now, the marginal density

$$\pi(a) = \int\!\!\int \pi(a \mid s')\,\pi(a', s')\,ds'\,da' \tag{2.4}$$

can be interpreted as giving the relative values of a parameter likelihood $G(a)$ over $A$, thereby enabling us to make inference on $a$. As we will show, the space $A \times R$ has a complicated high-dimensional structure so that estimation of the integral (2.4) is problematic. Nevertheless, progress can be made using MCMC methods to exploit our knowledge of the density $\pi(a, s)$ up to a constant of proportionality. Specifically, we can construct a Markov chain on $A \times R$ whose equilibrium distribution is $\pi(a, s)$, and which is subsequently used to generate a sequence of samples $(a_j, s_j)$ from $\pi(a, s)$. This sequence can then be used to estimate the marginal density $\pi(a)$, for example by averaging the conditionals $\pi(a \mid s_j)$ over $j$ or by forming a histogram from the $a_j$.

3. Construction of reversible-jump MCMC sampler

In this section we describe the construction of the Markov chain. An upper bound for the number of hidden events in any $s \in R$ for which $L(a, s) \neq 0$ can be deduced from knowledge of $N(t)$. (Since any individual can undergo at most two hidden transitions the total number of hidden events cannot exceed $2 \times \{N(0) + \text{number of births in } [0, T]\}$.) Thus $A \times R$ can be decomposed as a finite union of subspaces $A \times R_{m,n}$, where each $R_{m,n}$ consists of those realizations on $[0, T]$ incorporating $m$ events of type L and $n$ events of type I. We can therefore express $R_{m,n}$ as

$$R_{m,n} = \{(t^{\mathrm{L}}_1, \ldots, t^{\mathrm{L}}_m) \mid t^{\mathrm{L}}_j \in [0, T],\ t^{\mathrm{L}}_j < t^{\mathrm{L}}_{j+1}\} \times \{(t^{\mathrm{I}}_1, \ldots, t^{\mathrm{I}}_n) \mid t^{\mathrm{I}}_j \in [0, T],\ t^{\mathrm{I}}_j < t^{\mathrm{I}}_{j+1}\}, \tag{3.1}$$

where the two components specify, respectively, the times of occurrence of events of type L and I. The Markov chain is therefore required to move between subspaces of differing dimension, a task at which reversible-jump methodology (Green, 1995) is specifically aimed. This approach can be applied, for example, to problems of Bayesian model determination where the subspaces are the parameter spaces of competing models. Recently Richardson & Green (1997) have proposed a reversible-jump framework for fitting mixture distributions with unknown numbers of components. The following presents a formulation of a reversible-jump sampler for the specific scenario of our paper. Our approach has some similarity with that used by Newton et al. (1992) to fit linear stochastic models in haematology, in that the Markov chain constructed involves moving, inserting, or deleting hidden events in the time-window of the observations.

The construction of a suitable Markov chain is not unique; here we describe a Metropolis-Hastings sampler (Hastings, 1970) which generates a new state $\omega_{j+1} = (a_{j+1}, s_{j+1})$ from the current state $\omega_j = (a_j, s_j)$ according to the following two-step process. First, we propose a candidate for the next state $\omega' = (a', s')$ and select $a'$, the candidate parameter, uniformly from the parameter space $A$. Second, a new realization $s'$ is proposed by modifying $s_j$ according to one of the following possibilities:

(a) with probability $p_1 > 0$ delete a randomly chosen hidden event from the realization $s_j$;
(b) with probability $p_2 > 0$ insert a new hidden event at time $t$, where $t$ is selected uniformly from $[0, T]$ and the new event is type I or L with equal probability;
(c) with probability $p_3 = 1 - p_1 - p_2 > 0$ move a randomly chosen hidden event to a new time $t$ chosen uniformly from $[0, T]$.

These options define a density $q(\omega', \omega_j)$ for the candidate state $\omega'$ conditional on the current state $\omega_j$.
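The state $\omega'$ is then accepted with probability equal to the ratio of measures

$$\min\left\{1, \frac{\pi(d\omega')\,q(\omega', d\omega_j)}{\pi(d\omega_j)\,q(\omega_j, d\omega')}\right\},$$

in which case we set $\omega_{j+1} = \omega'$; otherwise the move is rejected and $\omega_{j+1} = \omega_j$. From (2.2) the acceptance probability can be calculated explicitly, namely for

$$\text{(a)}\quad p = \min\left\{1, \frac{L(a', s')\,p_2\,(n + 1)}{2 L(a_j, s_j)\,p_1\,T}\right\},$$

$$\text{(b)}\quad p = \min\left\{1, \frac{2 L(a', s')\,p_1\,T}{L(a_j, s_j)\,p_2\,(n + 1)}\right\},$$

$$\text{(c)}\quad p = \min\left\{1, \frac{L(a', s')}{L(a_j, s_j)}\right\},$$

where, in (a) and (b), $n + 1$ denotes the total number of hidden events in the larger of the two realizations $s_j$ and $s'$.

Note that in none of the above cases can the denominator vanish since $L(a_j, s_j)$ must be nonzero in order for $(a_j, s_j)$ to be the current state of the Markov chain.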
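The acceptance step can be sketched in a few lines of Python. The sketch assumes the proposal mechanism (a)-(c): a deletion picks one of the current hidden events uniformly, an insertion draws a time uniformly on $[0, T]$ and one of two event types with equal probability, and a move is symmetric. The uniform proposal for $a'$ and the uniform prior cancel, leaving a likelihood ratio multiplied by a proposal ratio. The function name and argument layout are hypothetical, and likelihoods are passed on the log scale to avoid underflow.

```python
import math

def acceptance_probability(move, log_l_new, log_l_cur, n_cur, p1, p2, T):
    """Metropolis-Hastings acceptance probability for one proposed move.

    move      -- "delete", "insert" or "move"
    log_l_new -- log likelihood of the candidate (a', s')
    log_l_cur -- log likelihood of the current state (finite by construction)
    n_cur     -- number of hidden events in the current realization
    """
    if log_l_new == -math.inf:
        return 0.0  # candidate realization is infeasible
    if move == "delete":
        if n_cur < 1:
            raise ValueError("no hidden event to delete")
        # forward density p1 / n_cur; reverse (insertion) density p2 / (2T)
        log_q = math.log(p2 / (2.0 * T)) - math.log(p1 / n_cur)
    elif move == "insert":
        # forward density p2 / (2T); reverse (deletion) density p1 / (n_cur + 1)
        log_q = math.log(p1 / (n_cur + 1)) - math.log(p2 / (2.0 * T))
    elif move == "move":
        log_q = 0.0  # symmetric proposal
    else:
        raise ValueError("unknown move type")
    log_ratio = log_l_new - log_l_cur + log_q
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
```

A candidate would then be accepted when a uniform random number on $[0, 1)$ falls below the returned value.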

If $a(\omega', \omega)$ denotes the density for the next state $\omega'$ conditional on the current state $\omega$, then, by construction, the above chain has the property of detailed balance (see, for example, Smith & Roberts, 1993) with respect to $\pi$, namely $a(\omega', \omega)\pi(\omega) = a(\omega, \omega')\pi(\omega')$. The transition probabilities therefore satisfy

$$\int_X \pi(\omega)\,d\omega = \int_{A \times R} \Pr(\omega_{j+1} \in X \mid \omega_j = \omega)\,\pi(\omega)\,d\omega \tag{3.4}$$

for all measurable $X$. From standard theory (see, for example, Theorem 1 of Smith & Roberts, 1993) it follows that if, in addition to satisfying (3.4), the chain is aperiodic and $\pi$-irreducible (see below) then

$$\lim_{v \to \infty} \int_{A \times R} \left|p^{(v)}(\omega_1, \omega) - \pi(\omega)\right| d\omega = 0, \tag{3.5}$$

where $p^{(v)}(\omega_1, \omega)$ is the probability density of state $\omega$, reached after $v$ iterations of the chain from initial state $\omega_1$. Moreover, for any realization of the chain $\{\omega_i\}$,

$$\frac{1}{K} \sum_{i=1}^{K} f(\omega_i) \to \int_{A \times R} f(\omega)\,\pi(\omega)\,d\omega \quad \text{almost surely as } K \to \infty, \tag{3.6}$$

for all real-valued $\pi$-integrable functions $f$. A consequence of (3.5) is that, irrespective of the starting state $\omega_1$, the state $\omega$, reached after a sufficiently large number of iterations of the chain, can be considered to be drawn from $\pi$. Setting $f(\omega_i) = f(a_i, s_i) = \pi(a \mid s_i)$ in (3.6), it follows that, for all $a \in A$,

$$\frac{1}{K} \sum_{i=1}^{K} \pi(a \mid s_i) \to \int_{A \times R} \pi(a \mid s')\,\pi(\omega')\,d\omega' = \pi(a) \quad \text{almost surely as } K \to \infty, \tag{3.7}$$

where $\omega' = (a', s')$. Hence, conditional on aperiodicity and irreducibility, given a sequence of samples $\{(a_i, s_i)\}$ generated from the chain, we may estimate the marginal density $\pi(a)$ by averaging the conditional densities $\pi(a \mid s_i)$ in the manner of Gelfand & Smith (1990).

For the chain to be aperiodic, it must not cycle between the subsets in some nontrivial partition of $A \times R$. For our chain, aperiodicity is an immediate consequence of the fact that $q(\omega, \omega)$ is nonzero for all $\omega \in A \times R$. The chain is $\pi$-irreducible if, for all $\pi(\omega_1), \pi(\omega_2) > 0$, there exists a positive integer $v$ such that $p^{(v)}(\omega_1, \omega_2) > 0$. Intuitively, an irreducible chain connects all states which have nonzero weight under the density $\pi$, so that regardless of the initial state the entire space will be explored by the chain. In practical applications of MCMC, demonstrating the irreducibility of a chain can be a nontrivial problem, and this property should certainly not be taken for granted.

On the grounds that moving a hidden event from $t$ to $t'$ can be achieved by a deletion at $t$ followed by an insertion of an identical event at $t'$, one might reasonably propose a simpler chain involving operations of types (a) and (b) only. However, the simpler chain fails to be irreducible in general. To see this, suppose that $s(0) = (1, 0, 1)$ and that our observations consist of deaths at $t_1$ and $t_2$ ($t_1 < t_2$). Clearly, this implies that an event of type L has occurred at some time $t_{\mathrm{L}}$ ($0 < t_{\mathrm{L}} < t_1$), and an event of type I has occurred at time $t_{\mathrm{I}}$ ($t_{\mathrm{L}} < t_{\mathrm{I}} < t_2$). Although the nature and order of the hidden events is fixed, any combination $(t_{\mathrm{L}}, t_{\mathrm{I}})$ satisfying the above inequalities is possible. However, the Markov chain with only

deletions and insertions is incapable of exploring the space of possibilities since either operation results in an impossible realization. In this case the ability to alter the time of hidden events through operations of type (c) is necessary to explore the space of feasible realizations.

The irreducibility of the chain described above is a consequence of the following result which is proved in the Appendix. First let $R^+$ denote the set of all $s \in R$ for which $L(a, s) \neq 0$ for some (and therefore all) $a \in A$.

PROPOSITION 3.1 Let realizations $s^1, s^2 \in R^+$. Then for some $q \ge 2$ there exist realizations $y^1, y^2, \ldots, y^q \in R^+$, where $s^1 = y^1$ and $s^2 = y^q$, and, for all $2 \le i \le q$, $y^i$ is obtained from $y^{i-1}$ by one of the following procedures:

(i) the insertion of a hidden event at some time $t \in [0, T]$;
(ii) the deletion of a hidden event;
(iii) an alteration to the time of occurrence of a hidden event in $[0, T]$.

Showing that the Markov chain has the required equilibrium distribution is not the only issue to be considered. Although Proposition 3.1 shows theoretically that any feasible state can be reached with a nonzero probability from any initial state after sufficiently many iterations of the Markov chain, it does not preclude this probability being extremely small for a given pair of states, even after many iterations. For a chain to be of practical use in exploring the equilibrium distribution we also require that it 'mixes' efficiently, in the sense that the distribution $p^{(v)}(\omega_1, \omega)$ converges rapidly to the stationary distribution as $v \to \infty$. A chain which mixes slowly can pose problems in several ways. For example, the resultant correlation between successive states means that standard errors in expectations estimated from a sequence of samples may be underestimated. This problem can be partly overcome by batching (Ripley, 1987), which involves partitioning the sequence into blocks of a fixed size and estimating expectations on each block. As the block size increases, the correlation in the sequence of within-block estimates decreases, so that realistic standard errors can be estimated from it. In some situations a slowly mixing chain can fail to explore the target distribution adequately, although a sequence of outputs may give the misleading impression that convergence has occurred. It is straightforward to construct examples involving mixtures of bivariate normal distributions which illustrate this point.

Theoretically, the convergence properties of finite-state Markov chains can be understood in terms of the eigenvalues of the corresponding state-transition matrix, and considerable progress has been made on bounding the mixing properties of some chains arising in combinatorial problems (Sinclair, 1993). In most cases the convergence of chains is investigated using the statistical properties of sequences of samples generated from the chain. A wide range of techniques now exists. However, recent studies (Cowles & Carlin, 1996) have demonstrated that tests of convergence based on a simulated sequence of states may give misleading results, and this should be borne in mind whenever the MCMC methodology is used.

In this paper we investigate the validity of the MCMC approach in two ways. The first is to apply the algorithms in situations where the target density can be derived analytically, and to observe whether density estimates are in accord with the theoretical predictions (to within estimated uncertainty). The second method is to use the techniques to estimate model parameters from data sets simulated from the model with known parameter values.
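The batching idea mentioned above can be illustrated with a short sketch. It assumes the usual batch-means recipe: split the sequence into consecutive blocks of equal size, average within each block, and estimate the standard error of the overall mean from the spread of the block means. The function name is hypothetical.

```python
import math

def batch_means_se(samples, block_size):
    """Return (overall mean, standard error) estimated from consecutive blocks."""
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    means = [
        sum(samples[b * block_size:(b + 1) * block_size]) / block_size
        for b in range(n_blocks)
    ]
    grand = sum(means) / n_blocks
    # Sample variance of the block means; trailing samples beyond the
    # last complete block are ignored.
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return grand, math.sqrt(var / n_blocks)
```

The standard error is only realistic when the block means are close to uncorrelated, so in practice the block size would be increased until the lag-one correlation of the block means is negligible.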

A topic which has not yet been discussed is how a feasible realization can be obtained to initiate the MCMC algorithm for the SEIR model from knowledge of $s_0$ and the times of all events of type B and D in $[0, T]$. Suppose that $m$ deaths occur in $[0, T]$ at times $t_1 \le t_2 \le \cdots \le t_m$. Here we produce a feasible realization by inserting hidden events so that for each $j$ ($1 \le j \le m - 1$), $I(t_j) \ge 2$ (bearing in mind that the variables are left-continuous). If $I(0) \ge m$, then no hidden events are required for a feasible realization. If $I(0) < m$, then no hidden events are required in the interval $[0, t_{I(0)-1}]$, but for $I(0) \le j \le m - 1$ we must insert hidden events immediately prior to $t_j$, and after any event of type B and D occurring prior to $t_j$, to ensure that $I(t_j) = 2$. If $E(t_j) \ge 1$ then only an event of type I is required. On the other hand, if $E(t_j) = 0$ then an event of type L followed immediately by one of type I must be inserted prior to $t_j$. It can be shown that this procedure will yield a feasible realization with which to initiate the MCMC algorithm, if one exists. Furthermore, the realization produced has the minimum number of hidden events for any feasible realization.

4. Illustration of reversible-jump sampler for SIR model

The methods described in Section 3 have been implemented, with some modifications, in the C programming language. Here we illustrate their use by applying them to simulated data sets. Attention is restricted to an SIR model, since for this case analytic solutions to the likelihood calculations exist for some scenarios and they can be used to validate the results obtained using the MCMC methodology. We further simplify matters by assuming that the parameters specifying the birth rate, $a_1$, and the death rate, $a_4$, are both equal to unity. The problem therefore reduces to estimation of the remaining parameter, the infection rate (henceforth denoted as $a$), from knowledge of $s_0 = (S(0), I(0))$, and observation of $N(t)$ over the interval $[0, T]$.

A Metropolis-Hastings reversible-jump sampler for this problem could be constructed in the way described in Section 3 with the important distinction that only one type of hidden event, signifying the transition from S to I and denoted by I, is involved, and the state space $A \times R$ is accordingly simpler. The irreducibility of the chain can be verified by repeating the logic of the proof of Proposition 3.1 with appropriate simplifications. A feasible realization with which to initiate the MCMC sampler can be constructed by a straightforward adaptation of the procedure described at the end of Section 3.

The chain implemented here is a slight variant of the Metropolis-Hastings sampler described in Section 3. Specifically, the parameter $a$ and the realization $s$ are updated separately, and not within a single step of the Metropolis-Hastings algorithm. The modified algorithm is now described. For simplicity, we constrain $a$ to take only a finite range of regularly spaced values in the parameter space $A$. To construct our Markov chain, the new state [...] $T\}$. This simulation is straightforward to perform and allows $\Pr\{N_{\mathrm{I}}(s) = n\}$ to be estimated for larger values of $n$.

We now present results of estimates of $\pi(a)$ and $\pi(N_{\mathrm{I}}(s))$ obtained using the MCMC algorithm for the following three cases:

$I(0) = 15$, $N(t) = 30$ ($t \in [0, 3]$);
$I(0) = 1$, $N(t) = 30$ ($t \in [0, 3]$);
$I(0) = 5$, $N(t) = 30$ ($t \in [0, 3]$).

In all cases the discrete parameter space is $A = \{0.001, 0.002, \ldots, 0.05\}$, and 5020 samples $(a_j, s_j)$ are generated from the chain, applying the realization updating procedure $M = 500$ times to obtain each $s_j$ from $s_{j-1}$. The marginal density $\pi(a)$ is estimated by averaging $K = 5000$ densities $\pi(a \mid s_j)$ after discarding the first $H = 20$ densities to allow burn-in of the Markov chain. Marginal confidence intervals for the estimates of the $\pi(a_j)$ are obtained by batching (Ripley, 1987). For these examples, a sequence of 500 blocks of 10 densities produced a sequence of within-block estimates which were not significantly correlated. The estimates of $\pi(N_{\mathrm{I}}(s))$ are obtained by forming a histogram of the values of $N_{\mathrm{I}}(s)$, where the value of $N_{\mathrm{I}}(s)$ is recorded at each iteration of the realization-update steps (a)-(c), giving a total of $2.5 \times 10^6$ values with which to form the histogram. Standard errors in the estimates of $\pi(N_{\mathrm{I}}(s))$ are obtained by partitioning the $2.5 \times 10^6$ values into blocks of 5000 and forming a separate estimate of $\pi(N_{\mathrm{I}}(s))$ for each block. The standard error of $\pi(N_{\mathrm{I}}(s))$ is then estimated using the sequence.

The values of $H$, $K$, and $M$ used in this experiment, and those which follow, were chosen after some experimentation, and we do not claim that they optimize the algorithm's performance. The value of $M$ was selected to be large in relation to the maximal number of hidden events to ensure substantial mutation of the realization $s_j$ before calculation of the next density $\pi(a \mid s_{j+1})$. The burn-in period $H$ was selected by monitoring the sequence $\{(a_j, N_{\mathrm{I}}(s_j))\}$, for a range of initial conditions, to determine how quickly the distribution of these variables converged to their asymptotic form. Our investigations suggested that this convergence occurred within a very few iterations.

Figure 2 shows the estimated densities $\pi(a)$ and $\pi(N_{\mathrm{I}}(s))$ for the three cases described above. Figure 2 also shows the known values of these densities obtained from analytic calculations or, in the case of $\pi(N_{\mathrm{I}}(s))$ for larger values of $N_{\mathrm{I}}(s)$, obtained by simulation.

FIG. 2. Estimated and known values of (a, c, e) $\pi(a)$ ($a = 0.001, 0.002, \ldots, 0.05$) and (b, d, f) $\pi(N_{\mathrm{I}}(s))$ ($N_{\mathrm{I}}(s) = 1, 2, \ldots, 29$) when $N(t) = 30$ ($t \in [0, 3]$), with initial conditions given by: (a, b) $I(0) = 15$, (c, d) $I(0) = 1$, and (e, f) $I(0) = 5$. (---) The exact value of $\pi(a)$. The error bars on the MCMC estimates indicate ±1 standard error. (Legend: MCMC, analytic, simulated; horizontal axis of panels (b), (d), (f): number of hidden events.)
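The averaging of conditional densities $\pi(a \mid s_j)$ over a discrete parameter grid can be sketched as follows. The sketch assumes the SIR setting with unit birth and death rates and an infection rate of the form $a S(t) I(t)$, in which case the dependence of the complete-data likelihood on $a$ reduces to $a^{n} \exp\{-a \int_0^T S(t) I(t)\,dt\}$, with $n$ the number of hidden infection events. The function names and the per-realization summary pairs are hypothetical, and this is not the authors' C implementation.

```python
import math

def conditional_density(grid, n_hidden, integral_si):
    """pi(a | s) on a discrete grid, given n_hidden and the integral of S(t)I(t)."""
    log_w = [n_hidden * math.log(a) - a * integral_si for a in grid]
    top = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

def marginal_estimate(grid, summaries, burn_in=0):
    """Average the conditional densities over the sampled realizations.

    summaries -- list of (n_hidden, integral_si) pairs, one per sampled realization
    burn_in   -- number of initial samples to discard
    """
    kept = summaries[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    dens = [conditional_density(grid, n, z) for n, z in kept]
    return [sum(d[i] for d in dens) / len(dens) for i in range(len(grid))]
```

With `grid = [0.001 * k for k in range(1, 51)]`, each conditional density peaks near `n_hidden / integral_si`, and the averaged estimate is itself a probability vector over the grid.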

We can see that the results obtained by MCMC are in accord with the known densities to within the estimated standard errors for the estimates of both $\pi(a)$ and $\pi(N_{\mathrm{I}}(s))$. Although the accurate estimation of marginal properties of $\pi(a, s)$ does not in itself guarantee that the joint density $\pi(a, s)$ is fully explored by the Markov chain, the results do provide strong evidence that the algorithm is capable of providing meaningful estimates of parameter likelihoods.

EXAMPLE 4.2 Figure 3(a) shows the function $N(t)$ ($t \in [0, 3]$), for three simulated realizations of the model with the birth and death rates fixed at unity and $a = 0.05$. In all cases $S(0) = I(0) = 15$. Note that the trajectories are quite distinct from each other. Figure 4(a) shows the posterior density $\pi_i(a)$ ($i = 1, 2, 3$) conditional on each simulated set of observations as estimated using the MCMC algorithm for the simulated trajectories. The parameter range is $\{0.001, 0.002, \ldots, 0.1\}$, and the other algorithm parameters are given by $M = 1000$, $K = 5000$, and $H = 100$. The standard errors in the estimates, obtained by batching using blocks of 25 densities, were of the order of 1% of the estimated value of $\pi_i(a)$, and they are not displayed in Fig. 4. Figure 4 also shows an estimate of the posterior density conditional on all the observations jointly, which was obtained by normalizing the product $\pi_1(a)\pi_2(a)\pi_3(a)$. The estimated densities are clearly consistent with the parameters used in the simulation. Moreover, the densities are consistent with each other, in the sense that there is no significant evidence that the simulations were not obtained using the same value of $a$ for each. Therefore, the differences between the simulations for a given parameter value are correctly interpreted as being consistent with the inherent stochasticity in the model. This experiment is repeated for $a = 0.1$ and the parameter range $\{0.002, 0.004, \ldots, 0.2\}$, keeping the same values of $M$, $K$, and $H$. The simulated trajectories and the corresponding densities are plotted in Fig. 3(b) and 4(b), which show similar findings to the case where $a = 0.05$.

FIG. 3. The evolution of $N(t)$ ($t \in [0, 3]$), for simulated realizations of the SIR model with $I(0) = S(0) = 15$: (a) $a = 0.05$, and (b) $a = 0.1$.

FIG. 4. Estimates of $\pi(a)$ obtained by MCMC for simulated realizations: (a) simulations of Fig. 3(a), $a = 0.020, 0.021, \ldots, 0.08$; and (b) simulations of Fig. 3(b), $a = 0.040, 0.042, \ldots, 0.2$. (Legend: Simulation 1, Simulation 2, Simulation 3, Joint likelihood.)
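Forming the joint density by normalizing the product of the individual densities is a one-line operation on a common parameter grid. The sketch below is illustrative only; the function name is hypothetical, and for many data sets the product would be better accumulated on the log scale to avoid underflow.

```python
def joint_density(densities):
    """Normalize the pointwise product of several densities on a common grid."""
    product = [1.0] * len(densities[0])
    for d in densities:
        if len(d) != len(product):
            raise ValueError("all densities must share the same grid")
        product = [p * x for p, x in zip(product, d)]
    total = sum(product)
    if total == 0.0:
        raise ValueError("densities have no common support")
    return [p / total for p in product]
```

Because each density is proportional to a likelihood under a uniform prior, the normalized product corresponds to the posterior given all data sets jointly, provided the data sets are independent realizations sharing the same parameter.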

ESTIMATING PARAMETERS IN STOCHASTIC COMPARTMENTAL MODELS

33

5. The case of unknown initial conditions Thus far we have assumed that the initial state so = (5(0), 1(0)) is known. Suppose now that this information is lacking but that we can assume 1(0) > 0. Such an assumption could be made, for example, whenever the observations include at least one death. We will now show that the MCMC routines described in the previous sections can be modified for application to this scenario. The approach exploits the idea that uncertainty in the initial conditions can be interpreted as uncertainty in the hidden events that have occurred prior to the start of the observation interval. This avoids the need to consider the initial conditions explicitly as additional parameters in the Markov chain state space. We 'expand' the observations by supposing that the observations commence at some time Tc < 0 and that S(TC) = #(0) - 1 and I(TC) = 1. Suppose further that N(t) = N(0) for all t e [Tc, 0]. It is clear that any initial condition So = (s(0), 1(0)) consistent with the observations N(t) (t e [0, t]) can be reached from the state sc = (S(TC), I(TC)) = (N(0) - 1, 1) through the occurrence of the appropriate hidden events during the interval (Tc, 0). Let Rc denote the set of realizations of the process on (Tc, T) which are consistent with sc and N(t), for t e (Tc, T). Then there is a surjective mapping / : Rc -*• W, where W denotes the set of all realizations s defined on [0, T] which are consistent with the observations N(t), (/ e [0, T]). Applying Proposition 3.1 to the interval [Tc, T], it follows that any element in Rc can be 'reached' from any other by a finite number of transitions of type (i)-(iii) above, and these operations induce transitions between elements of W. Since / is surjective, it follows that any element of W can be reached from any other by a sequence of induced transitions. 
Thus, when the initial conditions are unknown, the above 'expansion' process allows the space of all possible combinations of initial conditions and realizations to be explored using transitions of the form (a) to (c). We note that this procedure can equally well be applied to the SEIR model discussed in Section 2. Here the universal antecedent for arbitrary initial conditions is given by $S(T_c) = N(0) - 1$ and $E(T_c) = 1$. This argument allows us to construct a Markov chain on $A \times R_c$ whose stationary distribution is the posterior density $\pi_c(a, s_c \mid N(t), t \in [0, T])$, this time assuming a uniform joint prior for $(a, s_c(0))$. This is achieved almost exactly as above, except that an amended likelihood $L_c$ must be used in the acceptance steps, defined by

… , a)    (5.1)

where $s$ denotes the restriction of $s_c$ to $[0, T]$. The likelihood has this form because the [...]

1. If the first event in both $s^1$ and $s^2$ is of type B or D, then we can redefine the observation interval so that it commences immediately after this event, but prior to the occurrence of any other event in $s^1$ or $s^2$. This procedure can be repeated until at least one of the realizations $s^1$ and $s^2$ commences with a hidden event. Hence, by relabelling $s^1$ and $s^2$ and redefining the observation interval as appropriate, we may assume that the first event occurring after $t = 0$ for the realization $s^2$ is a hidden event. Denote this event by $e^2_1$. Let $e^1_k$ be the first event of the same type in the realization $s^1$. Consider the new realization $s'$ obtained by altering the time of $e^1_k$ so that it occurs first in $s'$. Then $s' \in R^+$ so long as no event $e'_2, \ldots, e'_k$ is rendered impossible by the prior occurrence of $e'_1$. So consider $e'_j$ $(= e^1_{j-1})$ for any $2 \le j \le k$, and let $\tau$ denote its time of occurrence. There are two cases to consider.

Case 1: $e'_1 = L$. If $e'_j = D$, this is feasible since $I'(\tau) = I^1(\tau)$. If $e'_j = B$, then it is a feasible event since a birth occurs in the realization $s^2$ and $S'(\tau) \ge S^2(\tau)$. If $e'_j = I$, it is also possible since $E'(\tau) \ge E^1(\tau)$.

Case 2: $e'_1 = I$. If $e'_j = D$, this is feasible since $I'(\tau) \ge I^1(\tau)$. If $e'_j = B$, then it is also a feasible event since $S'(\tau) = S^1(\tau)$. If $e'_j = L$, it is feasible a fortiori since $I'(\tau) \ge I^1(\tau)$ and $S'(\tau) = S^1(\tau)$.

By moving events in $s'$ (without affecting their order) we obtain a realization $s''$ such that $e''_1 = e^2_1$, $t(e''_1) = t(e^2_1)$, and $s''(T) = s^2(T)$. Now consider the restriction of these realizations to the period $(t(e^2_1), T)$. Since $m - 1$ events occur in this interval, the claim (A.1) follows from the induction hypothesis.

Suppose now that $s^1(T) \ne s^2(T)$. Further suppose, without loss of generality, that $S^2(T) < N(T)$.
If $S^1(T) = N(T)$, then we can transform $s^1$ to $s'$ satisfying $S'(T) < N(T)$ by inserting a single event at an appropriate time in $s^1$. To see this, let $\tau = \sup\{t \in [0, T] \mid S^1(t) \ne N(t)\}$. It follows that a death D must occur at time $\tau$ in $s^1$ and that $S^1(t) = N(t)$ must be monotonically increasing on $(\tau, T)$. There are three possibilities to consider.

$N(\tau^+) = 1 < N(T)$. This implies $S(T) = N(T)$ for any consistent realization $s$, contradicting the assumption that $S^2(T) < N(T)$.

$N(\tau^+) = 1 = N(T)$. This implies that no events occur in $s^1$ after the death at $\tau$. Since both $S^1(\tau)$ and $I^1(\tau^-) > 0$, we can insert an event L immediately prior to $\tau$ to produce the realization $s'$ satisfying $S'(T) < N(T)$.

$N(\tau^+) > 1$. Again insert an event L immediately prior to the death at $\tau$ to produce a new realization $s'$. Any births occurring after $\tau$ remain feasible since $S'(t) \ge 1$ for all $t \in (\tau, T)$.
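The insertion used in the last two possibilities can be checked mechanically. The Python sketch below is a hedged illustration, not the authors' code: the event rules in `RULES` (births require and add a susceptible, deaths remove an infective, L moves S to E in the presence of an infective, I moves E to I) are assumptions inferred from the feasibility statements in this proof, and the helper names `run` and `insert_latent_before` are hypothetical.

```python
# Assumed event rules on the state (S, E, I): a feasibility test and a state change.
RULES = {
    "B": (lambda s, e, i: s >= 1, (1, 0, 0)),              # birth of a susceptible
    "D": (lambda s, e, i: i >= 1, (0, 0, -1)),             # death of an infective
    "L": (lambda s, e, i: s >= 1 and i >= 1, (-1, 1, 0)),  # hidden event: S -> E
    "I": (lambda s, e, i: e >= 1, (0, -1, 1)),             # hidden event: E -> I
}

def run(initial, events):
    """Apply (time, type) events in time order.

    Returns the final state, or None if some event is infeasible.
    """
    state = initial
    for _, kind in sorted(events):
        feasible, delta = RULES[kind]
        if not feasible(*state):
            return None
        state = tuple(x + d for x, d in zip(state, delta))
    return state

def insert_latent_before(events, tau, gap=1e-6):
    """Return a copy of events with a hidden L event placed just before time tau."""
    return sorted(events + [(tau - gap, "L")])

initial = (2, 0, 1)               # S(0), E(0), I(0)
s1 = [(1.0, "D"), (2.0, "B")]     # the last infective dies at tau = 1.0
s_prime = insert_latent_before(s1, tau=1.0)

final_s1 = run(initial, s1)
final_s_prime = run(initial, s_prime)
```

In this toy case the original realization ends with S(T) = N(T), while the modified one keeps the same observed births and deaths, stays feasible, and ends with S'(T) < N(T).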


It now remains to show that $s'$ can be transformed to $s^2$ via a sequence of permissible realizations by mutations of the form (i)-(iii) defined in Proposition 3.1. Since $S'(T), S^2(T) < N(T)$, it follows that by appending appropriate events of type L and I in $(T, T']$, where $T' > T$, we can 'extend' both $s^2$ and $s'$ to form realizations $s^2_e$ and $s'_e$ which satisfy $I^2_e(T') = I'_e(T') = N(T')$. By (A.1), proved above, we can transform $s'_e$ into $s^2_e$ via a sequence of permissible realizations using only mutations of type (iii). The proposition follows on noting that a mutation of type (iii) which moves an event from $(T, T']$ to $[0, T]$ corresponds to a type (i) operation on a realization defined on $[0, T]$, whilst moving an event from $[0, T]$ to $(T, T']$ corresponds to a deletion (type (ii)). This completes the proof of the proposition.
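The final step of the proof, in which a move on the extended interval is read as an operation on [0, T], can be illustrated with a short Python sketch. It only tracks event times and types and does not check feasibility; the function names `restrict`, `move` and `induced_operation` and the example events are hypothetical.

```python
def restrict(events, T):
    """Keep only the (time, type) events whose time lies in [0, T]."""
    return [(t, kind) for t, kind in events if 0 <= t <= T]

def move(events, index, new_time):
    """Type (iii) style mutation: change the time of one event, keeping its type."""
    moved = list(events)
    moved[index] = (new_time, moved[index][1])
    return sorted(moved)

def induced_operation(events, index, new_time, T):
    """Classify how a move on [0, T'] appears after restriction to [0, T]."""
    before = restrict(events, T)
    after = restrict(move(events, index, new_time), T)
    if len(after) > len(before):
        return "type (i)"    # the event enters [0, T]
    if len(after) < len(before):
        return "type (ii)"   # the event leaves [0, T], i.e. a deletion
    return "type (iii)"      # the event stays on the same side of T

T = 5.0
extended = [(1.0, "L"), (2.0, "I"), (5.5, "L")]  # the last event lies in (T, T']

into_window = induced_operation(extended, 2, 3.0, T)
out_of_window = induced_operation(extended, 0, 5.2, T)
within_window = induced_operation(extended, 0, 0.5, T)
```

The appended events in (T, T'] thus act as a reservoir: drawing from it or returning to it changes the number of events seen on [0, T] without requiring any operation other than a move on the extended interval.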
