
Approximating first-passage time distributions via sequential Bayesian computation

David Schnoerr1, Botond Cseke3, Ramon Grima2, and Guido Sanguinetti1,∗

arXiv:1706.00348v1 [physics.comp-ph] 1 Jun 2017

1 School of Informatics, University of Edinburgh, Edinburgh, UK
2 School of Biological Sciences, University of Edinburgh, Edinburgh, UK
3 Microsoft Research, Cambridge, UK
∗ [email protected]

We show that computing first-passage time distributions for stochastic processes is equivalent to a sequential Bayesian inference problem for an auxiliary observation process. The solution can be approximated efficiently by solving a closed set of coupled ordinary differential equations. The method is accurate and orders of magnitude faster than existing approaches, enabling hitherto computationally prohibitive tasks such as sensitivity analysis. We apply it to an epidemic model, an entrained oscillator and a trimerisation process, and show good agreement with exact stochastic simulations.

Many systems in nature consist of stochastically interacting agents or particles. Such systems are typically modelled as continuous-time Markov processes, with examples of stochastic modelling in systems biology [1, 2], ecology [3], epidemiology [4], the social sciences [5] and neuroscience [6]. For all but the simplest cases no analytic solutions to the corresponding master or Fokker-Planck equations are known, and exact stochastic simulations of such processes are computationally expensive. Significant effort has therefore been spent on the development of approximation methods for the efficient computation of system dynamics [7]. Most methods compute the single-time marginal probability distributions of the process by developing an approximation to the underlying stochastic description. Path properties, in contrast, are not captured by these equations. An important path quantity is the first-passage time (FPT), that is, the time it takes the process to first cross a certain threshold [8, 9]. FPT distributions play a crucial role both in the theory of stochastic processes and in their applications across various disciplines, as they make it possible to investigate quantitatively the uncertainty in the emergence of system properties within a finite time horizon. For example, the time it takes cells to respond to external signals by expressing certain genes may be modelled as a FPT problem. Different characteristics of this FPT distribution, for example its variance, may represent evolutionarily different strategies that organisms adopt to filter fluctuations in the environment [10–12]. Other intriguing phenomena, such as the variation in the period of a stochastic oscillator, can be cast as FPT problems. Fluctuations of such oscillators have recently been found to lead to widely varying differentiation times of genetically identical cells [13]. Examples from other disciplines include the extinction time of diseases in epidemic models, or the time for a species to reach a critical threshold in population dynamics. Barely any approximation methods exist for path properties such as FPTs, and the few existing approaches typically approximate mean FPTs only [14].




The reason for this methodological gap is conceptual. The FPT probability is the probability that a process reaches a certain state for the first time, that is, the probability of being in a certain state at a given time conditioned on the process not having been in this state before. In general, there are no tractable evolution equations for FPT distributions, except for one-variable, one-step processes [15, 16] or certain catalytic processes [17, 18]. The only way to access FPTs is therefore typically through computationally intensive stochastic simulations.

In this article, we approach the problem of computing FPTs from a novel perspective. We show that the FPT problem can be exactly formulated as a Bayesian inference problem. This is achieved by introducing an auxiliary observation model that determines whether the process has ever crossed the threshold up to a given time. This novel formulation allows us to derive an efficient approximation scheme that relies on the solution of a small set of ordinary differential equations. We will use this approximation to analyse the FPT distributions in several non-trivial examples.

The standard approach to assess the FPT of a process x_t to leave a certain region C is to make the boundary of C absorbing, and to compute the survival probability Z_{[0,t]}, that is, the probability that the process remains in C on the time interval [0, t] [16]. The FPT distribution is then given by the negative time-derivative of Z_{[0,t]}. The latter can be written as a path integral over the process with absorbing boundary [16]. Here, we note that an absorbing boundary is mathematically equivalent to an unconstrained process reweighted by an auxiliary observation process p(C_{[0,t]} | x_{[0,t]}) on the paths x_{[0,t]}, such that p(C_{[0,t]} | x_{[0,t]}) = 1 if x_τ ∈ C for all τ ∈ [0, t] and zero otherwise. We can then write the survival probability Z_{[0,t]} up to time t as a path integral over the density p(x_{[0,t]}) of the unconstrained process as

Z_{[0,t]} = ∫ Dx_{[0,t]} p(C_{[0,t]} | x_{[0,t]}) p(x_{[0,t]}).    (1)

Note that (1) is exact. However, it is not obvious how to compute the path integral in (1). To make progress, we approximate the continuous process with paths x_{[0,t]} by a discrete-time version (x_{t_0}, ..., x_{t_N}) at points t_0 = 0, ..., t_N = t with spacing ∆t = t/N. This means that the global observation process p(C_{[0,t]} | x_{[0,t]}) can be written as a product of local observation processes as

p(C_{[0,t]} | x_{t_0}, ..., x_{t_N}) = ∏_{i=0}^{N} p(C_{t_i} | x_{t_i}),    (2)

where p(C_{t_i} | x_{t_i}) = 1 if x_{t_i} ∈ C and zero otherwise. This gives the problem a Markovian structure and allows us to cast it into a sequential Bayesian inference problem, as follows. First, we approximate the binary observation factors in (2) by a smooth approximation of the form

p(C_{t_i} | x_{t_i}) ≈ exp(−∆t U(x_{t_i}, t_i)),    (3)

where U(x_{t_i}, t_i) is a smooth function that is large for x_{t_i} ∉ C and close to zero for x_{t_i} ∈ C, with a sharp transition at the boundary. We will give specific examples later on. For now, the only property U(x_{t_i}, t_i) has to satisfy is that its expectation with respect to a normal distribution can be computed analytically and admits a closed-form expression in the moments of this distribution. The smooth approximation in (3) proves computationally expedient in the algorithm below and allows us to take the continuum limit ∆t → 0, since it is the discretised version of the global soft constraint

p(C_{[0,t]} | x_{[0,t]}) = exp(−∫_0^t dτ U(x_τ, τ)).    (4)
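To illustrate the kind of closed-form Gaussian expectation required here, the sketch below evaluates ⟨((x − b)/a)^n⟩ and its derivative with respect to the mean for a one-dimensional normal distribution, using the standard central moments of the Gaussian; this is the polynomial form used later in (12). The function names and example values are illustrative and not taken from the authors' implementation. For a potential that depends on a single component of a multivariate state, only that component's marginal mean and variance enter.

```python
# Closed-form Gaussian expectation of the polynomial soft potential
# U(x) = ((x - b)/a)**n (n even), for x ~ N(mu, sigma**2).
# Illustrative sketch; names and interface are not from the paper.
from math import comb

def double_factorial(k: int) -> int:
    """(k)!!, with (-1)!! = 0!! = 1 by convention."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def central_moment(sigma: float, k: int) -> float:
    """E[(x - mu)^k] for x ~ N(mu, sigma^2)."""
    if k % 2 == 1:
        return 0.0
    return sigma**k * double_factorial(k - 1)

def expected_potential(mu: float, sigma: float, a: float, b: float, n: int) -> float:
    """<U(x)> = E[((x - b)/a)^n], via the binomial expansion around mu."""
    total = sum(comb(n, k) * (mu - b)**(n - k) * central_moment(sigma, k)
                for k in range(n + 1))
    return total / a**n

def d_expected_potential_dmu(mu: float, sigma: float, a: float, b: float, n: int) -> float:
    """d<U>/dmu, the kind of term appearing in the drift correction of Eq. (8)."""
    # For Gaussian x, d/dmu E[((x-b)/a)^n] = (n/a) E[((x-b)/a)^(n-1)].
    return n / a * expected_potential(mu, sigma, a, b, n - 1)

if __name__ == "__main__":
    # purely illustrative values, loosely echoing the trimer constraint of Appendix B
    mu, sigma, a, b, n = 150.0, 10.0, 200.0, 0.0, 30
    print(expected_potential(mu, sigma, a, b, n))
    print(d_expected_potential_dmu(mu, sigma, a, b, n))
```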

Due to the Markov property, the survival probability Z_{[0,t]} in (1) for the discrete-time system factorises as

Z_{[0,t]} ≈ p(C_{t_0}) ∏_{i=0}^{N−1} p(C_{t_{i+1}} | C_{≤t_i}),    (5)

where C_{≤t_i} = {C_{t_i}, C_{t_{i−1}}, ..., C_{t_0}}. The computation of the factors on the r.h.s. of (5) corresponds to a sequential Bayesian inference problem which can be solved iteratively as follows: (i) Suppose we know p(x_{t_i} | C_{≤t_i}) and that, using this as the initial distribution, we are able to solve the system (the master or Fokker-Planck equation) forward in time until time point t_{i+1} to obtain p(x_{t_{i+1}} | C_{≤t_i}). (ii) To obtain p(x_{t_{i+1}} | C_{≤t_{i+1}}), take the constraint in (3) for time point t_{i+1} into account by means of Bayes' theorem as

p(x_{t_{i+1}} | C_{≤t_{i+1}}) = p(C_{t_{i+1}} | x_{t_{i+1}}) p(x_{t_{i+1}} | C_{≤t_i}) / Z_{t_{i+1}},    (6)

where we defined Z_{t_{i+1}} = p(C_{t_{i+1}} | C_{≤t_i}). Note that the latter corresponds to one of the factors on the r.h.s. of (5). Performing steps (i) and (ii) iteratively from t_0 to t_N one can thus, in principle, compute the survival probability according to (5).

However, steps (i) and (ii) are generally intractable, and we propose an approximation method in the following. For step (i), we need to solve the system forward in time. We do this approximately by means of the normal moment closure [19], which assumes the single-time probability distribution to be a multivariate normal distribution N(x_t; µ_t, Σ_t), where µ_t and Σ_t are the mean and covariance of the process. This assumption leads to closed evolution equations for µ_t and Σ_t which can be solved numerically [7]. Now suppose that we have solved the system forward from time t to t + ∆t to obtain µ̂_{t+∆t} and Σ̂_{t+∆t} (step (i)). In order to perform the observation update in (6) we approximate the posterior distribution p(x_{t+∆t} | C_{≤t+∆t}) by a normal distribution with mean µ_{t+∆t} and covariance Σ_{t+∆t}, an approach known as assumed-density filtering in the statistics literature [20]. The corresponding normalisation Z_{t+∆t} in (6) can be written as

Z_{t+∆t} ≈ ∫ dx N(x; µ̂_{t+∆t}, Σ̂_{t+∆t}) exp(−∆t U(x, t + ∆t)).    (7)

Taking the logarithm of both sides, expanding in ∆t both the exponential within the integral and the logarithm, and taking derivatives w.r.t. µ and Σ, one obtains the corresponding update equations (see Appendix A for a derivation). The continuum limit ∆t → 0 gives the following closed set of differential equations:

∂µ_t/∂t = ∂µ_t/∂t |_MC − Σ_t (∂/∂µ_t) ⟨U(x_t, t)⟩_{N(x_t; µ_t, Σ_t)},    (8)

∂Σ_t/∂t = ∂Σ_t/∂t |_MC − 2 Σ_t [(∂/∂Σ_t) ⟨U(x_t, t)⟩_{N(x_t; µ_t, Σ_t)}] Σ_t,    (9)

∂/∂t log Z_{[0,t]} = −⟨U(x_t, t)⟩_{N(x_t; µ_t, Σ_t)}.    (10)

Here, the first terms on the r.h.s. of (8) and (9), labelled MC, are respectively the prior equations for the mean and covariance as obtained from the normal moment closure for the unconstrained process, while the second terms incorporate the auxiliary observation. Equation (10) gives the desired survival probability for the process. We term this method for computing FPT distributions Bayesian First-Passage Times (BFPT).

Equations (8)-(10) are the central result of this article. They constitute closed-form ordinary differential equations for the mean, covariance and log-survival probability of the process, for which efficient numerical integrators exist. Solving these equations forward in time on an interval [0, t] provides an approximation of the survival probability Z_{[0,τ]} for τ ∈ [0, t] (on the time grid of the numerical integrator), from which the FPT distribution p(τ; C) can be derived for all τ ∈ [0, t] by taking the negative derivative of Z_{[0,τ]}. We note that similar equations were obtained in a different context in [21] by means of a variational approximation, without the need for time-discretisation.

Note that in the derivation of (8)-(10) we utilised three approximations: we approximated the unconstrained process using normal moment closure, the binary observation model by a soft potential, and the observation updates by projections onto a normal distribution. Depending on the circumstances, the relative contribution of the three sources to the overall error may vary.
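As a deliberately simple illustration of how (8)-(10) are used in practice, the sketch below integrates them for a hypothetical one-dimensional Ornstein-Uhlenbeck process dx_t = −θ(x_t − m)dt + s dW_t, for which the unconstrained moment equations are exact, together with the polynomial potential of (12) confining the process to |x| < a (effectively a one-sided threshold, cf. Appendix B). The process, parameter values and helper names are invented for illustration; this is not the authors' code, and the Gaussian expectations are evaluated numerically by Gauss-Hermite quadrature rather than in closed form.

```python
# Sketch: BFPT equations (8)-(10) for a hypothetical 1D Ornstein-Uhlenbeck
# process dx = -theta*(x - m) dt + s dW, asking for the first time the
# process leaves C = {|x| < a} (in practice, first exceedance of x = a).
# Everything here is illustrative, not the paper's implementation.
import numpy as np
from scipy.integrate import solve_ivp

theta, m, s = 1.0, 0.0, 0.8          # OU parameters (illustrative)
a, b, n = 1.5, 0.0, 30               # soft-potential parameters, cf. Eq. (12)
mu0, var0 = 0.0, 0.01                # initial condition well inside C

# Gauss-Hermite quadrature for Gaussian expectations <f(x)>_{N(mu, var)}
gh_x, gh_w = np.polynomial.hermite.hermgauss(60)

def gauss_expect(f, mu, var):
    x = mu + np.sqrt(2.0 * max(var, 1e-12)) * gh_x
    return np.dot(gh_w, f(x)) / np.sqrt(np.pi)

U   = lambda x: ((x - b) / a) ** n                      # soft constraint, Eq. (12)
dU  = lambda x: (n / a) * ((x - b) / a) ** (n - 1)
d2U = lambda x: (n * (n - 1) / a**2) * ((x - b) / a) ** (n - 2)

def rhs(t, y):
    mu, var, logZ = y
    # prior (moment-closure) terms; exact for the OU process
    dmu_prior  = -theta * (mu - m)
    dvar_prior = -2.0 * theta * var + s**2
    # observation corrections of Eqs. (8)-(9); for a Gaussian,
    # d<U>/dmu = <U'> and d<U>/dSigma = 0.5*<U''> (Stein / Price identities)
    dmu  = dmu_prior  - var * gauss_expect(dU, mu, var)
    dvar = dvar_prior - 2.0 * var * (0.5 * gauss_expect(d2U, mu, var)) * var
    dlogZ = -gauss_expect(U, mu, var)                   # Eq. (10)
    return [dmu, dvar, dlogZ]

t_grid = np.linspace(0.0, 20.0, 2000)
sol = solve_ivp(rhs, (0.0, 20.0), [mu0, var0, 0.0], t_eval=t_grid, rtol=1e-6)

survival = np.exp(sol.y[2])                             # Z_[0,t]
fpt_density = -np.gradient(survival, t_grid)            # p(tau; C) = -dZ/dt
print("P(FPT < 20) ≈", 1.0 - survival[-1])
```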


FIG. 1. Results for the epidemic system defined in (11). (a) shows a simulated sample path of the process. (b) shows the mean, variance and mode of the FPT distribution for species I becoming extinct, as a function of the initial population x_0 of species S, obtained from the stochastic simulation algorithm (SSA, dots, 10^6 samples per point) and from BFPT (lines). The initial value for species I is set to y_0 = 2 x_0. (c) and (d) show the FPT distributions as obtained from the SSA (dots, 10^6 samples per parameter set) and BFPT (lines) for the parameter sets (x_0, y_0, k_1, k_2) chosen as (6, 1, 0.25, 1) (blue, (c)), (20, 10, 0.5, 1) (red, (c)), (20, 1, 0.5, 2) (blue, (d)), and (40, 10, 0.25, 1) (red, (d)). The power in the potential in (12) was chosen to be n = 30 for all panels.

We now examine the performance of BFPT on three numerical examples. First, we consider an epidemic system consisting of a susceptible population S, an infected population I and a recovered population R, with interactions

S + I --k_1--> 2I,    I --k_2--> R.    (11)

This system is frequently modelled as a continuous-time Markov jump process describing a disease spreading through a population; k_1 and k_2 in (11) are the corresponding rate constants. Let x_t = (x_t, y_t, z_t), where x_t, y_t and z_t denote the populations of S, I and R, respectively. We are interested in the probability distribution of the time for the disease to be permanently eradicated, that is, the time it takes the process to reach a state with y_t = 0. To approximate this extinction region by a soft constraint as in (3) we use a potential

U(x_t, t) = (y_t − b)^n / a^n,    a ∈ ℝ⁺, b ∈ ℝ, n ∈ ℤ and even.    (12)

We found a polynomial potential as in (12) to be numerically more stable than, for example, an exponential potential of the form exp(−n(x − b)), and we will use this form also for the following examples. For more details on the choice of U(x_t, t) see Appendix B.
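For reference, the stochastic-simulation baseline used for comparison throughout can be generated with Gillespie's direct method [22]; a minimal sketch for the extinction time of I in scheme (11) is given below. The rate constants and initial populations follow one of the parameter sets quoted in the caption of Fig. 1, but the code itself is a generic textbook implementation, not the authors' one.

```python
# Gillespie direct-method SSA for the SIR scheme (11):
#   S + I --k1--> 2I,   I --k2--> R.
# Estimates the distribution of the extinction time of I (first time I = 0).
# Generic illustration, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def extinction_time(S0, I0, k1, k2, t_max=1e6):
    S, I, t = S0, I0, 0.0
    while I > 0 and t < t_max:
        a1 = k1 * S * I          # infection propensity
        a2 = k2 * I              # recovery propensity
        a0 = a1 + a2
        t += rng.exponential(1.0 / a0)
        if rng.random() < a1 / a0:
            S -= 1; I += 1       # infection event
        else:
            I -= 1               # recovery event
    return t

# one of the parameter sets quoted in Fig. 1(c): (x0, y0, k1, k2) = (6, 1, 0.25, 1)
samples = np.array([extinction_time(6, 1, 0.25, 1.0) for _ in range(10_000)])
print("mean FPT ≈", samples.mean(), " variance ≈", samples.var())
```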


FIG. 2. Results for the entrained oscillator defined in (13), obtained from the stochastic simulation algorithm (SSA, dots, 2 × 10^3 samples) and BFPT. The bars at the bottom indicate the mean value and the mean ± standard deviation of one oscillatory period. The black dashed line indicates the period of the external input. The parameter values used are k_0 = k_1 = 2, k_2 = 0.1, k_3 = 0.0005, k_4 = 0.2, a = 0.1 and p = 44.5. The power in the potential in (12) was chosen to be n = 20.

Figure 1 (b) shows the mean, variance and mode of the FPT to extinction as obtained from our method and from the stochastic simulation algorithm [22]. We find that BFPT captures the mean value of the FPT over a wide range of initial values for S and I. The variance and mode are also captured well, with errors decreasing for larger initial values. Figure 1 (c) and (d) show the FPT distributions for four different parameter sets. The modality, mode and overall shape of the FPT distribution are well captured, even for highly skewed and bimodal distributions (cf. the blue curves in Fig. 1 (c) and (d), respectively). In some cases the method predicts less peaked distributions than the true ones (not shown here). The value of the approach is borne out by considering its computational efficiency: for the results shown in Fig. 1, BFPT is several orders of magnitude faster than stochastic simulations. For example, simulating 10^4 paths to obtain the results shown in Fig. 1 (d) takes about 10^2 seconds in our implementation of the direct stochastic simulation algorithm [22], whereas BFPT takes less than a second. Crucially, while stochastic simulations scale with the number of transitions per unit time and thus with the average population size, BFPT scales with the number of species. Therefore, for larger population sizes stochastic simulations become more and more expensive, while BFPT's cost can be expected to remain of the same order.

Next, we study a stochastic oscillator that becomes entrained by an external input. Such models are frequently used to describe circadian oscillators in plants [23]. We are interested in the fluctuations of the oscillatory period of the stochastic process. To this end, we use a prey-predator system of the Lotka-Volterra type, consisting of a prey species X and a predator species Y that interact via

∅ --k_0--> X,    ∅ --k_1--> Y,
X --(a sin(2πt/p)+1) k_2--> X + X,
Y + X --k_3--> Y + Y,    Y --k_4--> ∅.    (13)

Note that the rate of the third reaction, which resembles reproduction of the prey X, oscillates around a mean value k_2 with amplitude a and period p. This could for example model the changing availability of food resources with the seasons over the period of a year. Figure 2 shows the distribution of one oscillatory period. The oscillatory period is computed by estimating the FPT for the process, starting from its maximum minus half a standard deviation, to first reach this value again after one period (see Appendix C for details). Again BFPT is in excellent agreement with simulation results, but in this case the difference in computational cost is even more marked: due to the time-dependence of the third reaction in (13), this system cannot be simulated by standard simulation algorithms [22] and requires a computationally intensive modified simulator [24]. In this example, simulating 1000 periods to produce the results shown in Fig. 2 took about 10^3 hours in our implementation. The BFPT result in Figure 2, in contrast, took only 0.1 seconds, meaning that BFPT is about seven orders of magnitude faster than the exact simulation.

Finally, we apply BFPT to a polymerisation system of monomers X, dimers XX and trimers XXX with interactions

X + X --k_1--> XX,    XX + X --k_2--> XXX.    (14)

FIG. 3. Results for the polymerisation system in (14). (a) shows the FPT distribution for the parameters k_1 = k_2 = 10^−3 obtained from the stochastic simulation algorithm (SSA, dots, 10^4 samples) and BFPT. (b) and (c) show respectively heat plots of the mean and the coefficient of variation (defined as standard deviation divided by mean) of the FPT to produce 200 trimers starting with 10^3 monomers, as a function of k_1 and k_2 on a logarithmic scale. (d) shows a corresponding 3D plot of the normalisation of the FPT distribution, that is, the probability with which at least 200 trimers are produced. The white areas in the first two plots indicate that either the value is larger than the plotted range or that the target state is reached with such small probability that an estimation of moments is not sensible. The power in the potential in (12) was chosen to be n = 40.

Starting from a fixed number of 10^3 monomers, zero dimers and zero trimers, we are interested in the FPT for producing 200 trimers, and in particular in the dependence of this FPT distribution on the parameters of the system (the dimerisation and trimerisation rates k_1 and k_2, respectively). Such a parameter exploration is computationally too demanding to be performed by brute-force simulation without access to dedicated hardware, since the FPT distribution needs to be estimated for a large number of parameter sets. Figure 3 shows the results for this process. We observe excellent agreement between BFPT and simulations for a particular value of the parameters (Fig. 3 (a)). The heat plot of the mean as a function of k_1 and k_2 indicates that for a given trimerisation rate k_2 a minimal mean FPT is achieved for an intermediate value of the dimerisation rate k_1. We find a linear relationship k_2 ≈ 2.3 k_1 between the two rates for the location of these minima. The variance of the FPT behaves quantitatively similarly (not shown in the figure). The coefficient of variation (Fig. 3 (c)), however, becomes minimal for small values of k_1 for a given k_2. This reveals an unexpected trade-off between an optimal mean FPT and an optimal noise-to-mean ratio (coefficient of variation). Figure 3 (d) shows the probability that the target state is reached, that is, the probability that at least 200 trimers are produced. We find that there are two parameter regions, one with probability close to one and one with probability close to zero, and a small transition range between them with boundary k_2 ≈ 0.55 k_1.

In conclusion, we have shown that the problem of computing survival probabilities and FPT distributions for stochastic processes can be formulated as a sequential Bayesian inference problem. This novel formulation opens the way for a new class of efficient approximation methods from machine learning and computational statistics to address this classically intractable problem. Here, we derived one such approximation, which relies on solving a small set of ordinary differential equations and which was found to combine distributional accuracy with high computational efficiency in several examples. This efficiency can be leveraged to perform hitherto computationally prohibitive systematic analyses such as parameter sweeps, and opens the way to quantitatively understanding how physical parameters interact to determine emergent behaviours in stochastic systems. The approach can in principle also be used to compute more general path properties, such as general reachability constraints in computer science; it would be interesting to investigate extensions of this approach to address intractable problems for high-dimensional and spatio-temporal processes.

This work was supported by the Leverhulme Trust [RPG-2013-171] and the European Research Council [MLCS 306999].


[1] A. Eldar and M. B. Elowitz, Nature 467, 167 (2010).
[2] R. Grima, Physical Review Letters 102, 218103 (2009).
[3] A. J. McKane and T. J. Newman, Physical Review Letters 94, 218102 (2005).
[4] M. I. Dykman, I. B. Schwartz, and A. S. Landsman, Physical Review Letters 101, 078101 (2008).
[5] J. Fernández-Gracia, K. Suchecki, J. J. Ramasco, M. San Miguel, and V. M. Eguíluz, Physical Review Letters 112, 158701 (2014).
[6] T. Betz, D. Lim, and J. A. Käs, Physical Review Letters 96, 098103 (2006).
[7] D. Schnoerr, G. Sanguinetti, and R. Grima, Journal of Physics A: Mathematical and Theoretical 50, 093001 (2017).
[8] M. R. D'Orsogna and T. Chou, Physical Review Letters 95, 170603 (2005).
[9] S. Condamin, O. Bénichou, V. Tejedor, R. Voituriez, and J. Klafter, Nature 450, 77 (2007).
[10] E. Kussell and S. Leibler, Science 309, 2075 (2005).
[11] S. Tănase-Nicola, P. B. Warren, and P. R. Ten Wolde, Physical Review Letters 97, 068102 (2006).
[12] T. J. Kobayashi, Physical Review Letters 104, 228104 (2010).
[13] N. E. Phillips, C. S. Manning, T. Pettini, V. Biga, E. Marinopoulou, P. Stanley, J. Boyd, J. Bagnall, P. Paszek, D. G. Spiller, et al., eLife 5, e16118 (2016).
[14] M. F. Weber and E. Frey, Reports on Progress in Physics 80, 046601 (2017).
[15] C. W. Gardiner, Handbook of Stochastic Methods, Vol. 3 (Springer, Berlin, 1985).
[16] S. Redner, A Guide to First-Passage Processes (Cambridge University Press, 2001).
[17] R. Grima and A. Leier, The Journal of Physical Chemistry B 121, 13 (2016).
[18] K. R. Ghusinga, J. J. Dennehy, and A. Singh, Proceedings of the National Academy of Sciences 114, 693 (2016).
[19] L. A. Goodman, Biometrics 9, 212 (1953).
[20] P. S. Maybeck, Stochastic Models, Estimation, and Control, Vol. 3 (Academic Press, 1982).
[21] B. Cseke, D. Schnoerr, M. Opper, and G. Sanguinetti, Journal of Physics A: Mathematical and Theoretical 49, 494002 (2016).
[22] D. T. Gillespie, Journal of Computational Physics 22, 403 (1976).
[23] P. O. Westermark, D. K. Welsh, H. Okamura, and H. Herzel, PLoS Computational Biology 5, e1000580 (2009).
[24] D. F. Anderson, The Journal of Chemical Physics 127, 214107 (2007).

Appendix A: Derivation of main result

Here, we derive the main results of this work, given in Equations (8)-(10). To this end, consider step (i) of the iterative scheme presented in the main text and suppose we have solved the corresponding moment-closure equations from t to t + ∆t to obtain µ̂_{t+∆t} and Σ̂_{t+∆t}, which we can thus expand as

µ̂_{t+∆t} = µ_t + ∂µ_t/∂t |_MC ∆t + O(∆t^2),    (A1)
Σ̂_{t+∆t} = Σ_t + ∂Σ_t/∂t |_MC ∆t + O(∆t^2).    (A2)

These are the moments of the distribution at time t + ∆t prior to the update in (6):

p(x_{t+∆t} | C_{≤t}) = N(x; µ̂_{t+∆t}, Σ̂_{t+∆t}).    (A3)

Using the update equation (6) and the exponential form of the constraint given in (3), one finds that the normalisation can be written as stated in (7). Using this, one can derive the relations

(∂/∂µ̂) log Z_{t+∆t} = ⟨(∂/∂µ̂) log N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})⟩_{N(x; µ_{t+∆t}, Σ_{t+∆t})},    (A4)

and analogously for ∂/∂Σ̂. Taking the logarithm of (7) and expanding in ∆t we further find

log Z_{t+∆t} = −∆t ⟨U(x, t + ∆t)⟩_{N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})} + O(∆t^2).    (A5)

Using (A4) and (A5) this leads to the update equations

µ_{t+∆t} = µ̂_{t+∆t} − ∆t Σ̂_{t+∆t} (∂/∂µ̂) ⟨U(x, t + ∆t)⟩_{N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})} + O(∆t^2),    (A6)
Σ_{t+∆t} = Σ̂_{t+∆t} − 2∆t Σ̂_{t+∆t} [(∂/∂Σ̂) ⟨U(x, t + ∆t)⟩_{N(x; µ̂_{t+∆t}, Σ̂_{t+∆t})}] Σ̂_{t+∆t}.    (A7)

Using

log Z_{t+∆t} = log Z_{[0,t+∆t]} − log Z_{[0,t]}    (A8)

in (A5), one can derive the discrete-time version of (10). Similarly, plugging (A1) or (A2) into (A6) or (A7) leads, after some rearranging, to the discrete-time version of (8) and (9), respectively.

Appendix B: Choice of potential

We approximate the binary observation process by a soft constraint of exponential form (3) with the polynomial potential (12). Note that this (softly) confines the process to the interval x_t ∈ [b − a, b + a]. However, in the examples studied here we want to confine the process only from either above or below. In the polymerisation system defined in (14), for example, we want to confine the population of trimers to be smaller than 200. Choosing a = 200 and b = 0, we confine the trimer numbers to [−200, 200]. However, since the probability mass for negative values is vanishingly small (zero for the true, discrete process), this is essentially the same as confining it to (−∞, 200]. Similar arguments apply to the other examples studied.

This leaves us with the choice of the power n. A larger n corresponds to a steeper potential and hence to a better approximation of the binary observation process. However, a larger n also leads to a stronger non-linearity of the equations (8)-(10). This will typically limit n for numerical feasibility. Optimally, one would hope the results to converge in the power n for values for which the equations are still numerically feasible. For some of the examples studied we indeed found this to be the case; for others, we chose n as large as numerically feasible. A more systematic way of choosing n is left for future work.
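As a quick numerical illustration of this trade-off (using the trimer values a = 200, b = 0 from above; the script itself is only a sketch, not from the paper), the killing rate U of (12) is essentially zero inside C and grows steeply outside, the more sharply the larger n:

```python
# The killing rate U(x) = ((x - b)/a)**n of Eq. (12) for the trimer
# constraint of Appendix B (a = 200, b = 0): approximately zero inside
# C = [-200, 200] and steeply growing outside, more sharply for larger n.
import numpy as np

a, b = 200.0, 0.0
xs = np.array([0.0, 150.0, 199.0, 201.0, 210.0, 250.0])

for n in (10, 20, 40):
    U = ((xs - b) / a) ** n
    print(f"n = {n:2d}:", np.array2string(U, precision=3))
# The survival weight of a path is exp(-integral of U dt), cf. Eq. (4), so a
# large U outside C rapidly suppresses paths that leave the allowed region.
```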

Appendix C: Period of entrained oscillator

Figure 2 in the main text shows the distribution of one oscillatory period of the entrained oscillator defined in (13). Note that there are various ways in which the oscillatory period could be measured. Here, we use a definition that allows a consistent comparison of stochastic simulations and our BFPT method. Specifically, we first estimate the maximum of the mean of species X and estimate the standard deviation at this maximum. We then take this maximal value minus half the standard deviation as a deterministic initial condition for the process (with the external input shifted appropriately) and measure the time it takes the process to reach this initial value again after having dropped below the overall average mean value.

Note that not all trajectories of the process reach the fixed target value within each period. This means that the FPT distribution does not integrate to one over one period. To obtain a statistically stronger estimate of the fluctuations in the period, we take the maximal mean value minus half a standard deviation, rather than just the maximal mean value, since this significantly increases the mass of the FPT distribution within one oscillatory period.
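A sketch of the measurement protocol just described, applied to a single sampled trajectory of species X, is given below. The helper name, its arguments and the toy trajectory (a noisy sinusoid standing in for an SSA run started at the threshold value) are hypothetical and only meant to make the procedure concrete.

```python
# Measure one oscillatory period as a first-passage time, following the
# protocol of Appendix C: start at the threshold value (max of the mean minus
# half a standard deviation) and record the first return to that value after
# the trajectory has dropped below the overall mean level.  Hypothetical
# helper for illustration; not the authors' code.
import numpy as np

def period_fpt(times, x, threshold, mean_level):
    """times, x: sampled trajectory of species X started at x[0] ≈ threshold.
    Returns the first-passage time defining one period, or None if the
    trajectory never returns within the sampled window."""
    dropped = False
    for t, value in zip(times[1:], x[1:]):
        if not dropped and value < mean_level:
            dropped = True                   # trajectory has passed its low phase
        elif dropped and value >= threshold:
            return t - times[0]              # first return to the threshold
    return None                              # no return: excluded from the histogram

# toy usage with a noisy sinusoid standing in for an SSA trajectory
t = np.linspace(0.0, 60.0, 3000)
x = 100.0 + 30.0 * np.cos(2.0 * np.pi * t / 44.5) \
    + np.random.default_rng(1).normal(0.0, 2.0, t.size)
print(period_fpt(t, x, threshold=125.0, mean_level=100.0))
```

Trajectories for which the helper returns None are simply excluded, which is why the resulting FPT distribution need not integrate to one over a single period, as noted above.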