Wright-Fisher diffusion bridges, Coalescent processes in Wright-Fisher ...

arXiv:1703.00208v1 [math.PR] 1 Mar 2017

Wright-Fisher diffusion bridges Robert Griffiths∗

Paul A. Jenkins†

`‡ Dario Spano

March 2, 2017 ∗

Corresponding author. Version 1.7

Abstract This paper considers the coalescent genealogy in a Wright-Fisher diffusion process, modelling the frequency of an allele, when the frequency is fixed in a bridge at two times 0 and T . Particular attention is paid to beginning and ending at zero frequency. The coalescent genealogy of the population at time t is described by two coalescent processes from t to 0 and from t to T . If the frequencies are fixed at zero then the coalescent processes have single lineages at 0 and T . Genealogy of the neutral Wright-Fisher bridge is also modeled by branching Pólya urns, extending a representation in a Wright-Fisher diffusion. An algorithm for exact simulation from the density of a neutral Wright-Fisher bridge is derived. This follows from exact simulation algorithms for the Wright-Fisher density. The genealogy in a Wright-Fisher diffusion with selection is more complex than in a neutral model. It is a branching coalescing random graph. We consider theory analogous to the neutral model bridge theory as much as possible. This paper is dedicated to the memory of Paul Joyce. Keywords: Wright-Fisher diffusion bridges, Coalescent processes in WrightFisher diffusion bridges. 1.

Introduction

Schraiber et. al. (2013) investigated bridges in a Wright-Fisher process in a neutral model and a model with selection. There is particular interest in bridges where the allele frequency begins at 0 and ends at 0 in time T . ∗

Department of Statistics, University of Oxford; email: [email protected] Department of Statistics and Department of Computer Science, University of Warwick; email: [email protected] ‡ Department of Statistics, University of Warwick; email: D.Span` [email protected] †

1

Care is needed when beginning a bridge at zero frequency, conditioning on a non-zero sample path from 0 to T . The neutral model genealogy plays a role in obtaining theoretical results in Schraiber et. al. (2013). In the current paper the genealogy in bridges is brought out more fully, and the genealogy in a model with selection is also investigated in a different approach from the clever simulation scheme in Schraiber et. al. (2013). The neutral model and the model with selection are both reversible before fixation occurs which implies a bridge from time 0 to T looks the same probabilistically in reverse time from T to 0. The genealogy of bridges from 0 to 0 frequency is interesting in that if the frequency of the allele under consideration at time t is y, then coalescence of the genes of frequency y must happen in two directions to a single lineage from t to time 0 and from t to time T . This is illustrated in Figure 3 in a neutral model. Models with no mutation are considered, and models where mutation is away from the allele type under consideration. The model is a version of the infinitelymany-alleles model, where mutant types are lost by random drift and new mutations occur at rate θ/2. Genealogy of the neutral Wright-Fisher bridge is also modeled by branching P´ olya urns, extending a representation in a Wright-Fisher diffusion by Griffiths and Span` o (2010). An algorithm for exact simulation from the density of a neutral WrightFisher bridge is derived. This follows from exact simulation algorithms for the Wright-Fisher density in Jenkins and Spanò (2017). The genealogy in a Wright-Fisher diffusion with selection is more complex than in a neutral model. It is a branching coalescing random graph. We consider theory analogous to the neutral model bridge theory as much as possible, using the transition probabilities of Barbour et al. (2000) for the brancing coalescing genealogy. An illustration of the genealogy is in Figure 4. 2.

The Wright-Fisher diffusion process

Let A and a be two types of a gene in a population of individuals. A Wright-Fisher diffusion process modelling the relative frequency of A genes over time has a generator ! 1 ∂2 1 ∂ L = x(1 − x) 2 + x(1 − x)γ + (θ1 − θx) . (1) 2 ∂x 2 ∂x The population is subject to selection, whose sign and strength are described by γ; and mutations a → A and A → a which occur respectively at rates 2

θ1 /2, θ2 /2, with notation θ = θ1 + θ2 . Neutral Wright-Fisher diffusion Suppose γ = 0 so there is no selection in the model. There are two forms of a transition density expansion of fθ1 ,θ2 (x, y; t). The spectral expansion was derived by Kimura (1964) as ∞ n o X fθ1 ,θ2 (x, y; t) = Bθ1 ,θ2 (y) 1 + e−n(n+θ−1)t/2 Pbn(θ1 ,θ2 ) (x)Pbn(θ1 ,θ2 ) (y) , (2) n=1

for θ1 , θ2 ≥ 0, where Bθ1 ,θ2 (y) = B(θ1 , θ2 )−1 y θ1 −1 (1 − y)θ2 −1 , 0 < y < 1 (θ θ ) is the Beta (θ1 , θ2 ) density and {Pbn 1 2 (y)}∞ n=0 are orthonormal polynomials on the Beta density derived from scaling the usual Jacobi polynomials (a,b) {Pen (r)}∞ n=0 which are well defined for a, b ≥ −1 and are orthogonal on a (1 − r) (1 + r)b , −1 < r < 1. See, for example Ismail (2005). An expression for these polynomials is

Pen(a,b) (r) =

(a + 1)(n) 2 F1 − n, a + b + n + 1; a + 1; (1 − r)/2 , n!

where 2 F1 is a hypergeometric function. In terms of these polynomials the expansion (2) is ( ∞ X θ1 −1 θ2 −1 fθ1 ,θ2 (x, y; t) = y (1 − y) e−n(n+θ−1)t/2 cn (θ1 , θ2 ) n=0

) × Pen(θ2 −1,θ1 −1) (r)Pen(θ2 −1,θ1 −1) (s) ,

(3)

where r = 2x − 1, s = 2y − 1, and cn (θ1 , θ2 ) =

n!Γ(n + θ1 + θ2 − 1)(2n + θ1 + θ2 − 1) . Γ(n + θ1 )Γ(n + θ2 )

The expansion (3) holds for θ1 , θ2 ≥ 0 taking care with the starting summa(θ −1,θ1 −1) tion index n. If θ1 , θ2 > 0 then the starting index is n = 0, Pe0 2 ≡1 and the first term is c0 (θ1 , θ2 ) = B(θ1 , θ2 )−1 . If one of θ1 , θ2 is zero and the other non-zero then the summation begins from n = 1, and if θ1 = θ2 = 0 3

then the summation begins from n = 2. In the last case the expansion found by Kimura was the form f0,0 (x, y; t) = x(1 − x)

∞ X

e−i(i+1)t/2 (2i + 1)i(i + 1)Ri−1 (r)Ri−1 (s).

(4)

i=1

e(1,1) (r)} orThe polynomials {Ri (r)}∞ i=0 are scaled Jacobi polynomials {P thogonal on x(1 − x), 0 < x < 1 such that Ri (1) = 1. There is an identity between these polynomials and the Jacobi polynomials with index (−1, −1) shown later in the paper as (51). If θ1 , θ2 > 0 then {X(t)}t≥0 is reversible with respect to the stationary Beta (θ1 , θ2 ) distribution; or if at least one of θ1 , θ2 is zero {X(t)}t≥0 is reversible, before being fixed, with respect to the speed measure. The Yaglom limit density lim R 1

t→∞

0

f0,0 (x, y; t) f0,0 (x, y; t)dy

when either or both of θ1 , θ2 are zero is straightforward to obtain from (3) as being proportional to the first orthogonal polynomial times the speed measure density. If θ1 = 0, θ2 = θ the density is constant × y × y −1 (1 − y)θ−1 = θ(1 − y)θ−1 , 0 < y < 1. If θ1 = θ2 = 0 the density is constant × y(1 − y) × y −1 (1 − y)−1 = 1, 0 < y < 1. A second form of the transition density depends on the coalescent genealogy of the population. Let Lθ (t) be the number of non-mutant lineages at time t back in a coalescent tree, beginning with Lθ (0) leaves, which can be finite or infinity. Denote the transition functions of Lθ (t) when Lθ (0) = ∞ by P (Lθ (t) = l | Lθ (0) = ∞) = qlθ (t). An explicit expression (Griffiths, 1980; Tavaré, 1984) is qlθ (t) =

∞ X

ρθk (t)(−1)k−l

k=l

(2k + θ − 1)(l + θ)(k−1) , l!(k − l)!

where ρθk (t) = exp{− 12 k(k + θ − 1)t}. The falling factorial moments of Lθ (t) are known from Griffiths (1980); Tavaré (1984) as ∞ θ X l−1 θ E L (t)[k] = ρl (t)(2l + θ − 1) (θ + l)(k−1) . (5) k−1 l=k

4

Notation in this paper is that for an integer k ≥ 0, a[k] = a(a−1) · · · (a−k+1) and a(k) = a(a + 1) · · · (a + k − 1). Let {Tl }∞ l=1 be times between events P when there are l and l − 1 non-mutant lineages. The transition density of ∞ k=l Tk is l(l + θ − 1) θ ϕθl (t) = q (t), t > 0. (6) 2 The transition density of X(t) can be expressed as a mixture of Beta densities ∞ l X X l k θ fθ1 ,θ2 (x, y; t) = ql (t) x (1 − x)l−k Bk+θ1 ,l−k+θ2 (y) (7) k l=0

k=0

The expansion (7) is a special case of a very general model in Ethier and Griffiths (1993). It appeared first in implicitly in Griffiths (1980) and then written formally in Griffiths and Li (1983). The range of the summation index k depends on whether θ1 , θ2 are zero or positive. If θ1 , θ2 > 0 then 0 ≤ k ≤ l; if θ1 = 0, θ2 > 0 then 1 ≤ k ≤ l; if θ1 > 0, θ2 = 0 then 0 ≤ k ≤ l − 1; and if θ1 = 0, θ2 = 0 then 1 ≤ k ≤ l − 1. Griffiths and Span` o (2010) show the algebraic connection between the two forms of the transition density (2) and (7). The transition density is improper if either of θ1 or θ2 are zero because the population may be fixed for one of the allele types by time t when X(t) = 0 or X(t) = 1 with positive probability. Then R1 f 0 θ1 ,θ2 (x, y; t)dy is the probability that the population is not fixed by time t. If θ1 = 0, θ2 = 0 then Z

1

f0,0 (x, y; t)dy = 0

∞ X

ql0 (t) 1 − xl − (1 − x)l .

(8)

l=2

If θ1 = 0, θ2 = θ > 0 Z

1

f0,θ (x, y; t)dy =

∞ X

0

qlθ (t) 1 − xl

(9)

l=2

and if θ1 = θ, θ2 = 0 Z

1

fθ,0 (x, y; t)dy = 0

∞ X

qlθ (t) 1 − (1 − x)l .

(10)

l=2

A genealogical interpretation when θ1 = θ2 = 0 is that the coalescent tree backward in time of the individuals beginning at t has l edges at the origin. The joint distribution of the l family sizes at time t is Dirichlet 5

(1, . . . , 1). The types of the families at t are determined by the identity of the l edges at time 0. The probability that an edge is of type A is x, and a is 1 − x. The second summation in (7) takes account that there are k edges of type A and l − k of type a. The population is not fixed for one of the allele types if and only if l ≥ 1 and l − k ≥ 1. Grouping the family sizes into k and l − k in the Dirichlet distribution gives the distribution of the frequency of A alleles as Beta (k, l − k). This description is illustrated in Figure 1. If θ > 0 then families at time t can be labelled by those from non-mutant edges at time zero and those from each mutation occurring on the coalescent tree. The joint distribution of the family sizes if there are l non-mutant edges at time zero is V U , (1 − V ){w(j) }

(11)

where U has a l-dimensional Dirichlet(1, 1, . . . , 1) distribution; V has a Beta (l, θ) distribution; {w(j) } has a Poisson Dirichlet (θ) distribution and the three random variables are independent. This genealogical interpretation occurs in Griffiths (1980); Ethier and Griffiths (1993). If there are two types and θ1 , θ2 > 0 then choosing the type of the non-mutant edges as A with probability x and mutant types to be of type A with probability θ1 /θ gives the density (7). If θ1 = 0, θ2 > 0 the non-mutant edges type is chosen in the same way and all mutations are of type a instead. Figure 1: Old and new mutant families ×

×

t

t=0

6

Wright-Fisher diffusion with selection If γ 6= 0 selection is present in the model. When θ1 , θ2 > 0 the stationary density of {X(t)}t≥0 is a weighted Beta density c(θ)−1 eγx B(θ1 , θ2 )−1 xθ1 −1 (1 − x)θ2 −1 , 0 < x < 1, (12) where c(θ) = E eγX , with expectation in a Beta (θ1 , θ2 ) distribution. {X(t)}t≥0 is reversible with respect to the stationary distribution if θ1 , θ2 > 0 and reversible with respect to the speed measure if one or both of θ1 , θ2 are zero. The eigenvalues and eigenfunctions {λj , wj } satisfy Lwj = λj wj , or ! 1 1 x(1 − x)wj00 + γx(1 − x) + (θ1 − θx) wj0 + λj wj = 0. (13) 2 2 The eigenfunctions are orthogonal on the weighted Beta distribution or speed measure, but are not polynomials. If θ1 , θ2 > 0 then λ0 = 1 and w0 = 1. There is no explicit expression for the eigenvalues. Kimura (1955) found an expression for the eigenfunctions in terms of spheroidal wave functions when θ1 = θ2 = 0. If θ > 0 then the eigenfunctions cannot be expressed as spheroidal wave functions multiplied by a function not depending on the eigenfunction. Song and Steinr¨ ucken (2012) expand the eigenfunctions in terms of Jacobi polynomials and then use numerical methods to find a computational solution of the transition density. The Yaglom limit density when either or both of θ1 , θ2 are zero is still proportional to the speed measure times the 1st eigenfunction, which does not have a simple form. If θ1 = θ2 = 0, w can be expressed in terms of a spheroidal wave function. This was known by Kimura (1955). We consider the expression in more detail later in the paper. Another genealogical form of the transition density, derived in Barbour et al. (2000), is X bα (t, x)π[α + θ](y), (14) fθ1 ,θ2 ,γ (x, y, t) = α∈Z2+

where π[θ](y) = c(θ)−1 B(θ1 , θ2 )−1 eγy y θ1 −1 (1 − y)θ2 −1 . The expansion is analogous to (2) in the neutral model. If θ1 > 0, θ2 > 0 then π[θ] is the stationary density of {X(t)}t≥0 . bα (t, x) are transition functions in a two-dimensional birth and death process α(t) beginning with an entrance boundary at infinity where the relative frequencies of the types 7

are x, 1 − x. If θ1 > 0, θ2 > 0 summation is over α1 ≥ 0, α2 ≥ 0; if θ1 = 0, θ2 > 0 summation is over α1 ≥ 1, α2 ≥ 0; if θ1 > 0, θ2 = 0 summation is over α1 ≥ 0, α2 ≥ 1; and if θ1 = 0, θ2 = 0 summation is over α1 ≥ 1, α2 ≥ 1. The transition rates of {α(t)}t≥0 when γ > 0 are 1 c(θ + α − ei ) αi (|α| + θ − 1) , i = 1, 2 2 c(θ + α) |α| c(θ + α + e2 ) 1 γ(θ2 + α2 ) q(α, α + e2 ) = 2 |α| + θ c(θ + α) 1 q(α, α) = − (α2 γ + |α|(|α| − 1)) . 2 q(α, α − ei ) =

(15)

bα (t, x) satisfies the usual forward equations ∂ bα (t, x) = −q(α, α)bα (t, x) ∂t ! 2 X q(α + ei , α)bα+ei (t, x) + q(α − ei , α)bα−ei (t, x) , (16) + i=1

and the backwards equation ∂ bα (t, x) = Lbα (t, x), ∂t where L is the generator (1). The boundary condition is |α| α1 bα (0, x) = x (1 − x)α2 , α

(17)

(18)

the same as in a neutral model. If θ1 = 0, θ2 = 0 the rates simplify to 1 c(α − ei ) αi (|α| − 1) , i = 1, 2 2 c(α) c(α + e2 ) 1 q(α, α + e2 ) = γα2 2 c(α) 1 q(α, α) = − (α2 γ + |α|(|α| − 1)) . 2 q(α, α − ei ) =

(19)

The frequencies y1 = y, y2 = 1 − y can be decomposed into independent PD(θ1 +α1 ) and PD(θ2 +α2 ) processes {w1i }, {w2i } which are then weighted as V {w1i }, (1−V ){w2i } where V is independent of the two processes and has a density π[θ + α](v). This representation is then at the level of individual family sizes either beginning from the origin, or new mutations. 8

3.

Wright-Fisher diffusion bridges

Let {X x,z,[0,T ] (t)}0≤t≤T be a Wright-Fisher diffusion process {X(t)}t≥0 conditioned on X(0) = x and X(T ) = z. Notation is similar to that used in Schraiber et. al. (2013). The transition density of X(t)x,z,[0,T ] at v given X(s)x,z,[0,T ] = u is fx,z,[0,T ] (u, v; s, t) =

f (u, v; t − s)f (v, z; T − t) . f (u, z; T − s)

(20)

The bridge is a time-inhomogeneous diffusion with a generator acting on a test function g at time s, when θ1 = θ2 = 0 of 1 ∂ ∂g Lx,z,[0,T ];s g(u) = u(1 − u) γ+ log f (u, z; T − s) (u) 2 ∂u ∂u 1 ∂2g + u(1 − u) 2 (u). (21) 2 ∂u Sample paths of (0, 0) bridges when γ = 0 with T = 0.8 simulated from the stochastic differential equation equivalent to the generator are illustrated in Figure 2. Figure 2: Wright-Fisher diffusion (0,0) bridges with T = 0.8. 0.5

0.5

xt

xt

0.0

0.0 0.0

t

0.8

0.9

0.0

t

0.8

0.9

xt

xt

0.0 0.0

t

0.8

0.0 0.0

9

0.8

There is a very elegant theory of h-transforms and bridges. The probability distribution of bridges is invariant under mapping the transition density f (x, y; t) → f (x, y; t)h(y)/h(x). As in Schraiber et. al. (2013) consider the joint density of the h-transformed bridge process at times 0 < t1 < · · · < tn < T . The density is h(v2 ) h(y) 1) f (x, v1 ; t1 ) h(v h(x) f (v1 , v2 ; t2 − t1 ) h(v1 ) · · · f (vn , y; T − tn ) h(vn ) h(y) h(x) f (x, y; T )

=

f (x, v1 ; t1 )f (v1 , v2 ; t2 − t1 ) · · · f (vn , y; T − tn ) . f (x, y; T )

(22)

This shows the invariance of the bridge process under h-transforms. In a similar way the bridge density is invariant under a h transform depending on time h(x, t) = h(x)eλt , where λ is a constant. In a model where loss or fixation of an allele is certain let τ be the time when this occurs. Choose h(x) to be the first eigenfunction of the generator satisfying Lh(x) = −λ1 h(x), h(x) ≥ 0, where λ1 > 0 is the largest eigenvalue. Then Px (τ > T ) lim −λ1 T = constant. T →∞ e h(x) The h-transformed transition functions f (x, y; t)

h(y) e−λ1 t h(x)

(23)

can be thought of as asymptotic transition functions of a process conditioned not to be lost or fixed. Let T > t, then the transition functions of a process conditioned on τ > T are f (x, y; t)

Py (τ > T − t) Px (τ > T )

which converges to (23) as T → ∞. The generator of a process with infintesimal variance σ 2 (x) and mean µ(x) which is h-transformed is ∂ 1 2 ∂2 d σ (x) 2 + σ 2 (x) log h(x) + µ(x) 2 ∂x dx ∂x and the stationary distribution is h(y) R1 0

h(x)dx 10

.

(24)

A more formal description of conditioning in general diffusion processes is the first theorem in Pinsky (1985). Reversibility of {X(t)}t≥0 implies that the probabilistic behaviour of the bridge process should look the same from 0 to T as it does backwards in time from T to 0. Let π(x) be the density of the reversing measure so that π(x)f (x, y; t) = π(y)f (y, x; t). Now consider the h-transform with h(x) = π(x)−1 . The transform reverses the path with (22) evaluating to f (y, vn ; T − tn )f (vn , vn−1 ; tn−1 ) · · · f (v1 , x, t1 ) , f (y, x; T )

(25)

which is the path density back in time from T to 0 of a bridge from y to x. Theorem 1. (Schraiber et. al., 2013) The density of X x,z,[0,T ] (t) and the reverse form is f (x, y; t)f (y, z; T − t) fx,z,[0,T ] (y; t) = f (x, z; T ) f (z, y; T − t)f (y, x; t) = f (z, x; T ) = fz,x,[0,T ] (y; T − t). (26) Neutral Wright-Fisher bridges In a neutral model when γ = 0 and θ1 = θ2 = 0 a particular h-transform depending on t gives an interesting identity between transition functions with no mutation and those with mutation rates θ1 = θ2 = 2. Let f2,2 (x, y; t) = et f0,0 (x, y; t)y(1 − y)/x(1 − x).

(27)

Then to show the identity consider the following calculation with the backward generator. ∂ ∂2 y(1 − y) 1 f2,2 = f2,2 + et · x(1 − x) 2 f0,0 ∂t x(1 − x) 2 ∂x 2 1 ∂ = f2,2 + x(1 − x)f2,2 2 ∂x2 ∂2 1 ∂ 1 x(1 − x) 2 f2,2 + (2 − 4x) f2,2 . = 2 ∂x 2 ∂x (28) This shows that the generator in the h-transformed process is 1 ∂2 1 ∂ x(1 − x) 2 + (2 − 4x) , 2 ∂x 2 ∂x 11

which corresponds to a Wright-Fisher diffusion with θ1 = θ2 = 2. The bridge density (26) is invariant under the h-transform h(x) = x(1 − x) and also under h(x, t) = et x(1−x). This choice of h(x, t) is made because x(1−x) is the first eigenfunction and the first eigenvalue is 1. There is a h-transform identity similar to (27) when θ1 = 0, θ2 = θ > 0. Let f2,θ (x, y; t) = eθt/2 f0,θ (x, y; t)y/x. (29) Then a straightforward calculation is that ∂ ∂ 1 ∂2 1 f2,θ = x(1 − x) 2 f2,θ + 2 − (2 + θ)x f2,θ . ∂t 2 ∂x 2 ∂x

(30)

This shows that the generator in the h-transformed process is 1 ∂2 1 ∂ x(1 − x) 2 + (2 − (2 + θ)x) , 2 ∂x 2 ∂x which corresponds to a Wright-Fisher diffusion with θ1 = 2, θ2 = θ. The first eigenfunction in the process is x and first eigenvalue θ/2. Corollary 1. An alternative form for the density of X x,z,[0,T ] (t) in a neutral model when θ1 = θ2 = 0, is f2,2 (x, y; t)f2,2 (y, z; T − t) f2,2 (x, z; T ) f2,2 (z, y; T − t)f2,2 (y, x; t) = f2,2 (z, x; T ) = fz,x,[0,T ] (y; T − t).

fx,z,[0,T ] (y; t) =

(31)

An alternative form when θ1 = 0, θ2 = θ > 0 is f2,θ (x, y; t)f2,θ (y, z; T − t) f2,θ (x, z; T ) f2,θ (z, y; T − t)f2,θ (y, x; t) = f2,θ (z, x; T ) = fz,x,[0,T ] (y; T − t).

fx,z,[0,T ] (y; t) =

(32)

It is of interest to consider the coalescent genealogy at time points s < t < T and calculate the number of lineages b, at T , which are non-mutant back to t, but not as far back as s; the number of non-mutant lineages a back to time s; and the number of distinct non-mutant lineages c back to time s. To begin condition on the number of non-mutant lineages from s to t 12

Lθ (t − s) = l. Then, from the representation at time t, considering the roots of the lineages from T back to t, it must be that a fall on frequencies from V U and b fall on the frequencies (1 − V ){w(j) }. Therefore the conditional probability is l(a) θ(b) a+b a a+b b E V (1 − V ) = . (33) a a (l + θ)(a+b) The number of distinct non-mutant lineages c back to s is the number of distinct frequencies V U which are hit by the a roots. The required probability is therefore X l X a+b a a! b E U1a1 · · · Ucac . qa+b (T − t) E V (1 − V ) a1 ! · · · ac ! a c a;aj ≥1

a≥c

(34) Now

a1 ! · · · ac ! E U1a1 · · · Ucac = , l(a)

so the sum (34) is equal to X l(a) θ(b) a! a − 1 a+b · . qa+b (T − t) (l + θ)(a+b) l(a) c − 1 a

(35)

a≥c

The marginal distribution of c from (35) is, setting m = a + b, X X θ(m−a) l m! a−1 qm (T − t) . (m − a)! (l + θ)(m) c − 1 c a

(36)

m≥c

The inner sum of (36) is equal to m(1 + θ)(m−1) X m − 1 a − 1 1(a−1) θ(m−1−(a−1)) (l + θ)(m) a−1 c−1 (1 + θ)(m−1) a which is the coefficient of uc−1 in m(1 + θ)(m−1) X m − 1 (1 + u)a−1 E ξ a−1 (1 − ξ)m−1−(a−1) (l + θ)(m) a−1 a≥1

where ξ has a Beta (1, θ) distribution. Now (38) evaluates to m(1 + θ)(m−1) E (1 + uξ)m−1 (l + θ)(m) 13

(37)

(38)

showing that (38) is equal to m(1 + θ)(m−1) m − 1 (c − 1)! . (l + θ)(m) c − 1 (1 + θ)(c−1) Finally P (Lθ (T − s) = c | Lθ (t − s) = l) X Γ(θ + m) l m! θ = qm (T − t) . (m − c)! Γ(θ + c)(l + θ)(m) c

(39)

m≥c

From (39) an identity also follows that X X m! l Γ(θ + m) θ θ θ qm (T −t) qc (T −s) = ql (t−s) . (40) c (m − c)! Γ(θ + c)(l + θ)(m) m≥c

l

Bridges which start at zero need special treatment. We are particularly interested in {X 0,0,[0,T ] (t)}0≤t≤T diffusion process bridges. An approach is to let x, z → 0 in the general bridge, taking a genealogical approach. The transition density as x → 0 is ∞ X l 0 x(1 − x)l−1 B(1, l − 1)−1 (1 − y)l−2 + o(x2 ) f0,0 (x, y; t) = ql (t) 1 l=2 ∞ X

= x

l(l − 1)ql0 (t)(1 − y)l−2 + o(x2 ).

(41)

l=2

The term of order x in the first line of (41) is the joint density of X(t) and the probability that there is exactly one coalescent lineage from time t back to time zero that is of type A, and at least one other lineage of type a. Denote ∞ X g(y; t) = l(l − 1)ql0 (t)(1 − y)l−2 . (42) l=2

From a scale calculation the probability of hitting y > x from x is x/y. The transition density conditional on hitting y from x is x−1 f0,0 (x, y; t)y. Letting x → 0 shows that yg(y; t) is the density at t conditional on hitting y > 0. Note the identity, from (7), that f0,0 (y, 0; t) = y(1 − y)g(y; t). This is a consequence of reversibility with respect to the speed measure density y −1 (1 − y)−1 because z −1 (1 − z)−1 f0,0 (z, y; t) z→0 y −1 (1 − y)−1 = y(1 − y)g(y; t).

lim f0,0 (y, z; t) = lim

z→0

14

Now dividing the numerator and denominator of the second line in (26) by x, g(y; t)f (y, z; T − t) lim fx,z,[0,T ] (y; t) = . (43) x→0 g(z; T ) Set z = 0 in (43) to obtain that f0,0,[0,T ] (y; t) =

y(1 − y)g(y; t)g(y; T − t) , h(T )

where h(T ) := g(0; T ) =

∞ X

(44)

l(l − 1)ql0 (T ).

l=2

Substituting in (5) with θ = 0 and k = 2 h(T ) = g(0; T ) =

∞ X

l(l −

1)ql0 (T )

=

l=2

∞ X

1

e− 2 l(l−1)T (2l − 1)l(l − 1).

(45)

l=2

Collecting results for the next theorem. Theorem 2. (Schraiber et. al., 2013) The density of X 0,0,[0,T ] (t), 0 ≤ t ≤ T in a neutral Wright-Fisher diffusion bridge with θ1 = θ2 = 0 is y(1 − y)g(y; t)g(y; T − t) P∞ , −l(l−1)T /2 l=2 (2l − 1)l(l − 1)e where g(y; t) =

∞ X

(46)

l(l − 1)(1 − y)l−2 ql0 (t).

l=2

There is a clear reversibility in the density (46) from both ends of the bridge. A diagram of the genealogy of a (0,0) bridge is illustrated in Figure 3.

15

Figure 3: Genealogy in a neutral (0,0) bridge

z

x

y

1−z

1−x

1−y

t

0

T

There must be exactly 1 lineage from 0 to t and from T to t in reverse time with leaves in the frequency y, conditional on y being reached as x, z → 0. Let {Tl }∞ l=2 be the times between coalescence in a coalescent tree. These −1 ∞ P times are independent with rates { 2l }l=2 . Denote Wl = ∞ j=l Tl , l ≥ 2. 1 0 Then Wl has a density function 2 l(l − 1)ql (t). Thus (y/2)g(y; t) is the density function of a random variable WN , where N is independent of the times and has a geometric distribution y(1 − y)l−2 , l ≥ 2. An orthogonal function expansion from the Appendix in Schraiber et. al. (2013) is g(y; t) =

∞ X

(2i − 1)i(i − 1)Ri (s)e−i(i−1)t/2 ,

(47)

i=0

where Ri (s), with s = 1 − 2y, are Jacobi polynomials with index (1, 1) orthogonal on y(1 − y), scaled so that Ri (1) = 1. (Note that here s = 1 − 2x, different from 2x − 1 in (3).) The three term recurrence satisfied by the polynomials is (i + 2)Ri (s) = (2i + 1)sRi−1 (s) − (i − 1)Ri−2 (s),

(48)

with R0 (s) = 1 and R1 (s) = s. The density can be computed from (47) in a straightforward way. Corollary 2. The limit density of X 0,0,[0,T ] (T /2 + t) as T → ∞ is independent of t and is 6y(1 − y), 0 < y < 1. (49) 16

for x 6= 0, 1. This corollary is straightforward to prove by noting that q20 (t) = 3e−t plus smaller order terms as t → ∞. The limit distribution (49) is the same as the stationary distribution in f2,2 (x, y; t). The density (31) in Corollary 1 gives an alternative expression to (46) by setting x = z = 0, noting that terms on the right side are well defined in this case. Letting x → 0 in the identity (27), and using (7) ∞ X

qr4 (t)(r + 3)(r + 2)y(1 − y)r+1 = et y(1 − y)

r=0

∞ X

r(r − 1)qr0 (t)(1 − y)r−2

r=2

and equating coefficients of y(1 − y)l−1 shows the identity 4 (t) = et (l − 1)ql0 (t). (l + 1)ql−2

(50)

Another identity from the spectral expansion (3), and (27), equating coeffi1 cients of e− 2 n(n+4−1)t , is that for n ≥ 0, (−1,−1) x(1 − x)Pen(1,1) (r) = Pen+2 (r).

(51)

It is of interest to consider a bridge from 0 to 0 frequency when there is mutation away from a gene of type A. This models a new mutant in the infinitely-many-alleles model which arises at time 0, then is lost at time T with mutation away from the new mutant to other types at rate θ/2. The generator in a neutral model is 1 ∂2 1 ∂ L = x(1 − x) 2 − θx . 2 ∂x 2 ∂x The speed measure density is y −1 (1 − y)θ−1 and the process is reversible with respect to this measure before absorption. Theorem 1 holds for this model. Again care is needed for a 0,0 bridge. Denote the transition density as f θ (x, y; t). As x → 0 f0,θ (x, y; t) = x

∞ X

qlθ (t)l(l + θ − 1)(1 − y)l+θ−2 + o(x2 )

(52)

l=1

and f0,θ (y, 0; t) =

∞ X

qlθ (t)l(l + θ − 1)y(1 − y)l−2 .

l=1

17

(53)

Letting x → 0 in the identity (29) with θ1 = 0, θ2 = θ > 0 ∞ X

qr2+θ (t)(r + 1 + θ)(r + θ)y(1 − y)r+θ−1

r=0

= eθt/2 y

∞ X

l(l + θ − 1)qlθ (t)(1 − y)l+θ−2

l=2

and equating coefficients of y(1 − y)l+θ−2 shows the identity 2+θ (l + θ)ql−1 (t) = eθt/2 lqlθ (t).

(54)

Also from (3) and (29) xPen(θ−1,1) (r) =

n + 1 e(θ−1,−1) P (r). n + θ n+1

(55)

A similar calculation to that in Theorem 2 gives the bridge density at t. Theorem 3. The density of X 0,0,[0,T ] (t), 0 ≤ t ≤ T in a neutral WrightFisher diffusion bridge with mutation away from A at rate θ/2 is y(1 − y)−θ+1 gθ (y; t)gθ (y; T − t) , −l(l+θ−1)T /2 (2l + θ − 1)l(l + θ − 1) l=1 e

P∞ where

gθ (y; t) =

∞ X

l(l + θ − 1)(1 − y)l+θ−2 qlθ (t).

(56)

(57)

l=1

The distribution of the frequency in the bridge inTheorem 3 is the same from both ends as it is when there is no mutation. There is a similar diagram to that in Theorem 2, Figure 3, for the genealogy, except that only nonmutant lines from the frequency y are considered and there can only be single non-mutant lineages from each of the frequencies at 0 and T which have families in the y frequency. The density of the time to loss of an allele with initial frequency y is 1 g (y; t), see e.g. Griffiths (2006), Section 2.4 p18. Consider Figure 3 and θ 2 (56) in Theorem 3. Informally the density (56) times dy is proportional to the proportion of sample paths in (y, y + dy) at a given time where the allele of frequency y is lost in both directions at times t, and T − t, divided by the proportion of sample paths in (y, y + dy), which is y −1 (1 − y)θ−1 dy. 18

The scale function in the diffusion is S(x) = (1 − x)−θ so the probability of hitting y > x from x is (1 − x)−θ − 1 (1 − y)−θ − 1 and the transition function conditional on hitting y is f0,θ (x, y; t) ·

(1 − y)−θ − 1 . (1 − x)−θ − 1

Letting x → 0 θ−1 (1 − y)−θ (1 − y)−θ − 1 gθ (y; t), y > 0

(58)

is the density at t conditional on y being reached. The series expansions (42) and (57) converge for all t ≥ 0 and 0 < y < 1, though {qlθ (t)} is difficult to compute if t is small. Jenkins and Spanò (2017) have a computational method to deal with small t, which uses a result from Griffiths (1984) who shows that Lθ (t) has an asymptotic normal distribution as t → 0 with 2 2 µ = , σ2 = . (59) t 3t The asymptotic mean and variance do not depend on θ, however the exact mean and variance could be used instead, computed from the first and second factorial moments (5). There is an orthogonal function expansion of gθ (y; t) which is useful for computation. In view of (29) gθ (y; t) = =

lim f0,θ (x, y; t)/x x→0 e−θt/2 y −1 f2,θ (0, y; t) −1

= B(2, θ)

(1 − y)

θ−1 −θt/2

e

∞ X

e−n(n+θ+1)t/2 an Rn (s),

n=0

where s = 2x − 1 and an = (−1)n

(n + θ)(2n + θ + 1)Γ(n + θ) Γ(θ + 2)n!

e(θ−1,1) (s). The three term recurrence and, for ease of notation, Rn ≡ R satisfied by the polynomials is 2(n + θ)(2n + θ − 2)(n + 1)Rn = − (2n + θ − 1) (2n + θ)(2n + θ − 2)s + θ(θ − 2) Rn−1 − 2(n + θ − 2)(2n + θ)(n − 1)Rn−2 19

(60)

with R0 (s) = 1. Corollary 3. The limit density of X 0,0,[0,T ] (T /2 + t) in a neutral model with θ1 = 0, θ2 = θ > 0 as T → ∞ is independent of t and is (1 + θ)y(1 − y)1+θ , 0 < y < 1.

(61)

for x 6= 0, 1. This corollary is again straightforward to prove by noting that q2θ (t) = (1+θ)e−θt/2 plus smaller order terms as t → ∞. The stationary distribution in f2,2 (x, y; t) is the same as (49). The mean frequency in a bridge is found by multiplying (56) by y and integrating over (0, 1). Corollary 4. P∞ l(l+θ−1)m(m+θ−1) θ θ 0,0,[0,T ] l,m=1 (l+m+θ−2)(l+m+θ−3) ql (t)qm (T − t) . (62) E X (t) = P∞ −l(l+θ−1)T /2 (2l + θ − 1) (l − 1)(l + θ) + 1 l=1 e Wright-Fisher bridges and branching P´ olya urns Genealogy in the Wright-Fisher diffusion process can be modeled by branching P´ olya urns (Griffiths and Spanò, 2010). The joint density of X = X(0) and Y = X(t) when the marginal density of X is Beta (θ1 , θ2 ), from the conditional density of Y given X, (7), can be written as ∞ X l=0

qlθ (t)

l X l θ1 (k) θ2 (l−k) k=0

k

θ(l)

Bk+θ1 ,l−k+θ2 (x)Bk+θ1 ,l−k+θ2 (y). (63)

There is a P´ olya urn interpretation of (63), explained in Griffiths and Spanò (2010); Griffiths and Span` o (2013). Consider two Pólya urns with an initial configuration of θ1 red and θ2 blue balls. The urns are identical until there are l extra balls, then branch to two urns with probability qlθ (t). The composition of both urns at the moment of splitting has a Beta binomial distribution l θ1 (k) θ2 (l−k) , k = 0, 1, . . . , l. θ(l) k Draws continue independently in each urn after branching. The unconditional relative frequencies of red balls in the urns after an infinity of draws are Beta (θ1 , θ2 ) and the relative frequencies conditional on a configuration 20

of (k, l − k) are Beta (θ1 + k, θ2 + l − k). In both cases the relative frequency distributions are the de Finetti measures for the sequences of draws. The transition density fθ1 ,θ2 (x, y; t) is the density of the relative frequency Y in the second urn given the relative frequency x in the first urn. Instead of arguing to obtain a joint distribution of X and Y it is possible to argue directly to obtain the conditional distribution of Y given X = x. If X = x, then since the distribution of X is the de Finetti measure for whether the draws are red or blue, the probability of a configuration of (k, l − k) balls given the k l urns branch after l draws and X = x is k x (1 − x)l−k . the distribution of Y given the (k, l − k) configuration is then Beta (θ1 + l, θ2 + l − k). Therefore the conditional distribution of Y given X = x is (7). The conditional density fθ1 ,θ2 (x, y; t) converges as θ1 , θ2 → 0 to f0,0 (x, y; t) =

∞ X l=1

ql0 (t)

l−1 X l k x (1 − x)l−k Bk,l−k (y). k

(64)

k=1

Relating the limit to convergence in an urn model: if l > 1 the conditional distribution of the urn configuration at l given 0 < x < 1 as θ1 , θ2 → 0 is l k x (1 − x)l−k , 0 < k < l k as a consequence of the de Finetti representation. The joint probability that k = 0 (and similarly for k = l) at trial l and 0 < x < 1 tends to zero as θ1 , θ2 → 0, even though it formally appears to be (1 − x)l . In a neutral Wright-Fisher diffusion with mutation the density of the frequency at time t in a bridge from x to z at time t is (20), where fθ1 ,θ2 (x, y; t) is the density (2) or (7). An urn representation is found by considering the joint distribution of X, Y, Z when X and Z have marginal Beta (θ1 , θ2 ) distributions. The joint distribution of the three random variables is then Bθ1 ,θ2 (x)fθ1 ,θ2 (x, y; t)fθ1 ,θ2 (y, z; T − t) ∞ X X n θ1 (j+k) θ2 (n−j−k) θ θ = ql (t)qm (T − t) j+k θ(n) l,m=0

j≤l,k≤m

l j

m k

n j+k

× Bθ1 +j,θ2 +l−j (x)Bθ1 +k,θ2 +m−k (z)Bθ1 +k+j,θ2 +n−k−j (y),

(65)

where n = l + m. An urn representation has three urns U, V, W which share draws before branching after draw n in W . The probability of branching after n draws and that there are l, m draws respectively in U, V is ql (t)qm (T − t). The l, m draws are chosen at random from the n draws. 21

After branching the urn draws continue independently in U, V, W . For fixed l, m the probability of configurations (j, l − j) and (k, m − k) of (red,blue) balls in urns U, V and (j + k, n − j − k) in urn W is l m θ1 (j+k) θ2 (n−j−k) j k n . n j+k θ(n) j+k Conditional on l, m, j, k the limit relative frequencies of red balls in U, V, W have the joint density in the last line of (65). The unconditional joint density of the relative frequencies in urns U, V, W is the full expression (65). The configuration in the urns at branching, conditional on X = x, Z = z has a probability distribution pθ1 ,θ2 (l, m; j, k; x, z; t) = l m θ (T − t) n θ1 (j+k) θ2 (n−j−k) (j )( k ) B qlθ (t)qm (x)Bθ1 +k,θ2 +m−k (z) n j+k θ(n) (j+k ) θ1 +j,θ2 +l−j . Bθ1 ,θ2 (x)fθ1 ,θ2 (x, z; T ) (66) Then for l, m ≥ 1 as θ1 , θ2 → 0 the distribution (66) converges to p0,0 (l, m; j, k; x, z; t) = l m 0 (T − t) n (j+k−1)!(n−j−k−1)! (j )( k ) B ql0 (t)qm (x)Bk,m−k (z) n j+k (n−1)! (j+k) j,l−j . x−1 (1 − x)−1 f0,0 (x, z; T ) (67) There is a similar limit if θ1 → 0 and θ2 > 0. The relative frequency of red balls in urn W after an infinite number of draws when θ1 , θ2 > 0 has a Beta (θ1 + j + k, θ2 + n − j − k) distribution so the density of Y conditional on X = x, Z = z is ∞ X

pθ1 ,θ2 (l, m; j, k; x, z; t)Bθ1 +j+k,θ2 +n−j−k (y),

(68)

l,m=0

with a similar expression when θ1 = 0, θ2 = 0 where summation is over l, m ≥ 1, 1 ≤ j ≤ l − 1 and 1 ≤ k ≤ m − 1. The density (68) is the same as the neutral bridge densities in Section 3 with the different cases of mutation. The limit distribution when θ1 = 0, θ2 = 0 and x, z → 0 is now considered. The denominator of (67) converges to ∞ X

ql0 (T )l(l − 1)

l=1

22

which simplifies as (45). The conditional probability distribution of the urn configurations (67) converges to zero unless j = 1, k = 1 when the limit is 0 (T − t) n 1!(n−3)! lm (l − 1)(m − 1) ql0 (t)qm 2 (n−1)! (n) 2 p0,0 (l, m; j, k; 0, 0; t) = P∞ − 1 l(l−1)T 2 (2l − 1)(l − 1) l=2 e 0 (T − t) l(l−1)m(m−1) ql0 (t)qm (n−1)(n−2) . P∞ − 1 l(l−1)T 2 (2l − 1)(l − 1) l=2 e

=

(69)

The conditional distribution of the relative frequency of red balls in urn W is then ∞ X X p0,0 (l, m; j, k; 0, 0; t)B2,n−2 (y). (70) n=2 l,m≥1,l+m=n

Simulating the bridge It is possible to simulate exactly from the density of a Wright-Fisher bridge in a neutral model. Jenkins and Spanò (2017) gave a sophisticated algorithm for this in the special case that both θ1 , θ2 > 0 and x, z > 0. The identities in this paper provide a direct method for other cases considered here: 1. θ1 = 0, θ2 = θ > 0, and x = z = 0. 2. θ1 = θ2 = 0 and x = z = 0. 3. θ1 = 0, θ2 = θ > 0, and x, z > 0. 4. θ1 = θ2 = 0 and x, z > 0. 1. θ1 = 0, θ2 = θ > 0 and x = z = 0 The density of X 0,0,[0,T ] (t) is given in Theorem 3. It can be written as f 0,0,[0,T ] (t) =

∞ X ∞ X

pl1 ,l2

l1 =1 l2 =1

y(1 − y)l1 +l2 +θ−3 , B(2, l1 + l2 + θ − 2)

where l1 (l1 + θ − 1)l2 (l2 + θ − 1) q θ (t)qlθ2 (T − t), + l2 + θ − 1)(l1 + l2 + θ − 2) l1 ∞ X hθ (T ) = g θ (0; T ) = e−l(l+θ−1)T /2 (2l + θ − 1)l(l + θ − 1). pl1 ,l2 =

1

hθ (T ) (l1

l=1

23

(71)

We recognise (71) as the density of an infinite mixture of Beta random variables, so that (pl1 ,l2 )l1 ,l2 ∈N is a probability mass function on N2 (for convenience, set pl1 ,l2 = 0 if l1 = 0 or l2 = 0). The (l1 , l2 )th component corresponds to a Beta(2, l1 +l2 +θ−2) variate. Therefore in order to simulate from f 0,0,[0,T ] (t), it suffices to simulate from the discrete distribution (pl1 ,l2 ). This is complicated by the infinite series representations for qlθ1 (t), qlθ2 (T −t), and hθ (T ), but a solution is possible via the series method (Devroye, 1986, Ch. IV.5); we summarise the strategy as follows. Suppose (pl )l∈N is a probability mass function whose masses may not be computable with finite resource but for which we have available a pair of + sequences (p− l (k))k∈N , (pl (k))k∈N for each l with the following properties: + 1. p− l (k) ≤ pl ≤ pl (k) for all l, k ∈ N.

2. For each l, p− l (k) ↑ pl as k → ∞. 3. For each l, p+ l (k) ↓ pl as k → ∞. For PL U ∼ Uniform[0, 1], standard inversion sampling implies that inf{L ∈ N : l=0 pl > U } is distributed according to (pl ), but this is not computable, as we have noted. However, it is easily verified that if ( ) L L X X K(L) = inf k ∈ N : p− p+ l (k) > U or l (k) < U l=0

then

( inf

L∈N:

l=0 L X

) p− l (K(l))

>U

l=0

is also distributed as (pl ), and this can be computed from finitely many + terms in the double arrays (p− l (k)), (pl (k)). To employ this strategy for (pl1 ,l2 ) in (71) we must find monotonically converging upper and lower bounds on each of qlθ1 (t), qlθ2 (T − t), and hθ (T ). The first two expressions are covered by Jenkins and Spanò (2017, Proposition 1), who showed that (q θ )− l (k, t)

:=

l+2k+1 X

ρθj (t)(−1)j−l

j=l

≤

l+2k X j=l

(2j + θ − 1)(l + θ)j−1 ≤ qlθ (t) l!(j − l)!

ρθj (t)(−1)j−l

(2j + θ − 1)(l + θ)j−1 =: (q θ )+ l (k, t), l!(j − l)!

24

n o (t,θ) θ+2i+1 −(i+θ/2)t provided 2k + l ≥ Cl := inf i ≥ l : θ+i+l−1 e < 1 , a coni−l+1 θ+2i−1 stant beyond which convergence of these bounds becomes monotonic in k. [Jenkins and Span` o (2017) assumed that θ1 , θ2 > 0, but their result follows without change when either or both of θ1 = 0, θ2 = 0.] It remains to find similar converging bounds on hθ (T ). Writing θ

h (T ) =

∞ X

−l(l+θ−1)T /2

e

(2l + θ − 1)l(l + θ − 1) =:

l=1

∞ X

hθl (T ),

l=1

we find for l ≥ 2 that hθl+1 (T ) =

(2l + 1 + θ)(l + 1)(l + θ) −(l+θ/2)T θ e hl (T ) ≤ 5e−(l+θ/2)T hθl (T ) < hθl (T ), (2l − 1 + θ)(l − 1)(l + θ − 1)

with the final inequality holding provided l > D(θ,T ) := immediately that, for k > max(2, D(θ,T ) ), 0
max (Cl1 − l1 )/2, (Cl2 − l2 )/2, D(T,θ) , 2 for each l = 0, 1, . . . , L, then Sk− (L) and Sk+ (L) are respectively lower and upper bounds on the 25

PΣ(L) distribution function l=Σ(0) pl1 ,l2 which are monotonically converging to this limit as k → ∞ (that is, as kl → ∞ for every component of k = (k0 , k1 , . . . , kL )). Thus the following algorithm returns exact samples from the density of X 0,0,[0,T ] (t). Algorithm 1: Simulating from the density f 0,0,[0,T ] (t) of X 0,0,[0,T ] (t) when θ1 = 0, θ2 = θ > 0. 1 2 3 4

5 6 7

Set l ←− 0, k0 ←− 0, k ←− (k0 ). Simulate U ∼ Uniform[0, 1]. repeat Set (l1 , l2 ) ←− Σ(l), l m (t) (T −t) kl ←− max (Cl1 − l1 )/2, (Cl2 − l2 )/2, D(T,θ) , 2 . while Sk− (l) < U < Sk+ (l) do Set k ←− k + (1, 1, . . . , 1). end

8 9 10 11 12 13 14 15

if Sk− (l) > U then return Y ∼ Beta(2, l1 + l2 + θ − 2). else if Sk+ (l) < U then Set k ←− (k0 , k1 , . . . , kl , 0). Set l ←− l + 1. end until false

2. θ1 = θ2 = 0 and x = z = 0 This case is very similar to that of the previous subsection and so is omitted. (In fact, one can go through the previous subsection and replace θ with 0 throughout.) The density of the bridge when θ1 = θ2 = 0 simplifies slightly (see Theorem 2) to f 0,0,[0,T ] (t) =

∞ X ∞ X l1 =2 l2 =2

pl1 ,l2

=

pl1 ,l2

y(1 − y)l1 +l2 −3 , B(2, l1 + l2 − 2)

1 l1 (l1 − 1)l2 (l2 − 1) q 0 (t)ql02 (T − t). h(T ) (l1 + l2 − 1)(l1 + l2 − 2) l1

As a side remark, it is interesting to note how the coefficient l1 (l1 − 1)l2 (l2 − 1) (l1 + l2 − 1)(l1 + l2 − 2) 26

indicates exactly how l1 and l2 are ‘coupled’, by comparison with the case that the two genealogies in Figure 3 are independent (where this coefficient would be a constant). Notice that the coefficient is independent of t. Furthermore, it increases the relative probability of values of l1 and l2 which are larger and more similar to each other. 3. θ1 = 0, θ2 = θ > 0 and x, z > 0 By Corollary 1, the density of X x,z,[0,T ] (t) in this case is the same as for θ1 = 2, θ2 = θ, which is covered by Algorithm 4 of Jenkins and Spanò (2017). 4. θ1 = θ2 = 0 and x, z > 0 By Corollary 1, the density of X x,z,[0,T ] (t) in this case is the same as for θ1 = θ2 = 2, which is covered by Algorithm 4 of Jenkins and Spanò (2017). Wright-Fisher bridges with genic selection Theorem 1 holds when γ 6= 0, if x 6= 0, however care is needed when beginning from x = 0. For definiteness let γ > 0, then the speed measure density when θ1 = 0 and θ2 = θ ≥ 0 is e−γx x−1 (1 − x)θ−1 , 0 < x < 1. The density is reversible before fixation and X f0,0,γ (x, y; t) = bα (t, x)π[α](y) α

=

X

bα (t, y)π[α](x)

α

y −1 (1 − y)−1 e−γy . x−1 (1 − x)−1 e−γx

(72)

αi ≥ 1 in the summation, i = 1, 2. In a neutral model |α| α1 bα (t, y) = y (1 − y)α2 α1 An asymptotic form as x → 0 is f0,0,γ (x, y; t) = y −1 (1−y)−1 e−γy x

X α2 ≥1

27

b(1,α2 ) (t, y)α2 c(1, α2 )−1 +o(x2 ), (73)

since π[α](x) = o(xα1 −1 ) as x → 0 for α1 > 1 and π[(1, α2 )](x) = c(1, α2 )−1 Note that Z

1

c(1, α2 ) =

Γ(1 + α2 ) 1−1 x (1 − x)α2 −1 e−γx . Γ(1)Γ(α2 )

α2 z α2 −1 ezγ dz, γ ≥ 0.

0

The next theorem is an analogue of Theorem 2 which follows from (1) and (72). Theorem 5. The density of X 0,0,[0,T ] (t), 0 ≤ t ≤ T in a Wright-Fisher diffusion bridge with γ > 0 and θ1 = θ2 = 0 is y(1 − y)g(y; t, γ)g(y; T − t, γ) P∞ , 0 −1 α2 =1 b(1,α2 ) (T, 0)c(1, α2 ) α2 where g(y; t, γ) =

∞ X

(74)

b(1,α2 ) (t, y)c(1, α2 )−1 α2

α2 =1

and prime denotes differentiation with respect to y. There is an analogous theorem to Theorem 3 if θ1 = 0, θ2 = θ > 0. Theorem 6. The density of X 0,0,[0,T ] (t), 0 ≤ t ≤ T in a Wright-Fisher diffusion bridge when γ > 0 with mutation away from A at rate θ/2 is y(1 − y)θ+1 g θ (y; t, γ)g θ (y; T − t, γ) , P∞ 0 0,θ −1 (α + θ) b (T, 0)c(1, α + θ) 2 2 α2 =1 (1,α2 ) where g θ (y; t; γ) =

∞ X

(75)

−1 b0,θ (1,α2 ) (t, 0)c(1, α2 + θ) (α2 + θ).

α2 =1

Both Theorem 5 and Theorem 6 show that in a non-neutral (0, 0) WrightFisher bridge there is exactly one lineage from t to 0 and from t to T and that there is a reversibility from either end of the bridge. This is illustrated in Figure 4.

28

Figure 4: Genealogy in a non-neutral (0,0) bridge

y

1−y

t

0

T

The probability of reaching y > x is 1 − e−γx (1 − x)−θ 1 − e−γy (1 − y)−θ from a scale calculation where S(x) = e−γx (1 − x)−θ . The transition density at t conditional on hitting y is therefore f θ (x, y; t) ·

1 − e−γy (1 − y)−θ 1 − e−γx (1 − x)−θ

Therefore letting x → 0 g θ (y; t, γ) ·

1 − e−γy (1 − y)−θ −θ + γ

is the distribution of y at t, conditional on hitting y. Note that the multiplying factor is non-negative for 0 ≤ y ≤ 1. An analogy of the h-transform (27) in the non-neutral model is fw (x, y; t) = f (x, y; t)

w(y) λt e , w(x)

where (λ, w) are the first eigenvalue, eigenfunction pair satisfying (13) when either or both of θ1 , θ2 are zero. A calculation similar to that in the neutral 29

model shows that the generator associated with the transition functions fw when θ1 = 0, θ2 = θ ≥ 0 is ! 1 ∂ ∂2 1 ∂ γx(1 − x) + 2x(1 − x) x(1 − x) 2 + log w(x) − θx . (76) 2 ∂x 2 ∂x ∂x This is a generator of a Wright-Fisher diffusion with frequency dependent selection. The stationary distribution of the process with generator (76) is ψ(y) = Cw(y)2 π(y) = Cw(y)2 eγy y −1 (1 − y)θ−1 . An alternative form for the density of X x,z,[0,T ] is, for 0 ≤ t ≤ T fw (x, y; t)fw (y, z; T − t) . fw (x, z; T )

(77)

The limit density as T → ∞ of X x,z,[0,T ] (T /2 + t), from (77) is ψ(y)ψ(z)/ψ(z) = ψ(y). Computing ψ(y) (when θ = 0) rests on finding a numerical solution of the eigenfunction equation y(1 − y)w00 (y) + γy(1 − y)w0 (y) + 2λw(y) = 0

(78)

for a maximal λ. There is a unique solution with λ > 0, w(x) ≥ 0. Clearly w(0) = w(1) = 0 but the maximal λ is unknown. Parameterizing λ ≡ λ(γ), then λ(0) = 1, the first eigenvalue in a neutral model. Kimura (1955) recognizes a connection with Spheroidal wave functions, tabulates λ(γ), and expresses w(x) as a series in Gegenbauer polynomials. The spheroidal wave equations are d m2 2 dS 2 2 (1 − z ) + µ + c (1 − z ) − S = 0, (79) dz dz 1 − z2 see NIST Handbook of Mathematical Functions (2010). If c2 > 0 the equations are prolate or c2 < 0 oblate. Solutions to (79) are the Spheroidal wave 2 functions PSnm (z, c2 ) with eigenvalues µm n (c ), n = m, m + 1, . . .. Letting w(y) = e−(γ/2)y y 1/2 (1 − y)1/2 S ◦ (y), then d dy

dS ◦ m2 2 2 y(1 − y) + µ − c + 4c y(1 − y) − S ◦ = 0. dy 4y(1 − y) 30

(80)

This shows that eγ/2y y −1/2 (1 − y)−1/2 w(y) is proportional to PS11 (2y − 1), −γ 2 /16). Parameters are m = 1, c2 = −γ 2 /16, and µ − c2 = 2λ and z = 1 − 2y. Scaling w(y) to be a probability distribution w(y) = R 1 0

e−γ/2y y 1/2 (1 − y)1/2 PS11 (2y − 1, −γ 2 /16) e−(γ/2)x x1/2 (1 − x)1/2 PS11 (2x − 1, −γ 2 /16)dx

.

(81)

Spheroidal wave functions are implemented in Mathematica and it is straightforward to calculate and plot w(y) via WolframAlpha on the web interface. The basic command for solutions to (79) is SpheroidalPS[n, m, c, z]. In our application c2 ≤ 0 so c is a multiple of the complex number i. For example if γ = 3, then c = 0.75i, and n = m = 1, Plot[(−0.166599)−1 ∗ exp(−1.5 ∗ y) ∗ (y ∗ (1 − y))1/2 ∗ SpheroidalPS[1, 1, 0.75i, 2 ∗ y − 1], {y, 0, 1}] is the appropriate command. The constant −0.166599 is the scaling factor returned by Integrate[exp(−1.5 ∗ x)(x ∗ (1 − x))1/2 ∗ SpheroidalPS[1, 1, 0.75i, 2 ∗ x − 1], {x, 0, 1}]. It is not necessary to find the first eigenvalue for the above commands, however SpheroidalEigenvalue[n, m, c] will return it. In this example SpheroidalEigenvalue[1, 1, 0.75i] returns 2.44853, so λ = 1.505515. The full expansion of the transition function is f0,0,γ (x, y; t) = eγy y −1 (1 − y)−1

∞ X

2 1 γ

γ2

t

e−(λn ( 16 )+ 16 ) 2

n=1

γ2 ) 16 γ 1 1 γ2 × e− 2 y y 2 (1 − y) 2 PSn1 (2y − 1, − ). 16 γ

1

2n + 1 n(n + 1)

1

× e− 2 x x 2 (1 − x) 2 PSn1 (2x − 1, −

(82)

Mathematica also includes a function for the derivative of a Spheroidal wave function, SpheroidalPSPrime[n,m,c,z]. The drift coefficient in the htransformed generator (76) when θ = 0 can thus be easily evaluated or plotted. The coefficient is easily seen to be PS0 1 (2x − 1, −γ 2 /16) 1 (1 − 2x) + 2x(1 − x) 1 . 2 PS11 (2x − 1, −γ 2 /16) Carrying through the example above, if γ = 3, the command to plot the drift coefficient is Plot[0.5-x +2.0*x*(1-x)*SpheroidalPSPrime[1,1,0.75i,2*x1]/SpheroidalPS[1,1,0.75i,2*x-1],{x,0,1}]. In this particular example the the plot shows that the drift coefficient is very close to 1 − 2x, which would be exact if γ = 0 and w(x) = x(1 − x). 31

References Barbour, A.D., Ethier, S.N., Griffiths, R.C., (2000). A transition function expansion for a diffusion model with selection. Ann. Appl. Probab. 10, 123–162. Devroye, L. (1986). Non-uniform random variate generation. SpringerVerlag, New York. Ethier, S. and Griffiths, R. C. (1993). The transition function of a FlemingViot process. Ann. Probab. 21 1571–1590. Griffiths, R.C., (1980). Lines of descent in the diffusion approximation of neutral Wright-Fisher models. Theor. Popul. Biol. 17, 37–50. Griffiths, R. C. (1984). Asymptotic line-of-descent distributions. J. Math. Biol. 21, 67–75. Griffiths, R.C. and Tavaré, S. (1998). The age of a mutation in a general coalescent tree. Stochastic Models. 14, 273–295. Griffiths, R.C. and Tavaré, S. (2003). The genealogy of a neutral mutation. In: Green, P.J., Hjort, N.L. and Richardson, S. (Eds.), Highly Structured Stochastic Systems. Oxford University Press, Oxford, pp. 393-413. Griffiths, R.C., (2006). Coalescent lineage distributions. Adv. Appl. Probab. 38, 405–429. Griffiths, R.C., Li, W.-H., (1983). Simulating allele frequencies in a population and the genetic differentiation of populations under mutation pressure. Theor. Popul. Biol. 32, 19–33. Griffiths, R. C. and Span` o, D. (2010). Diffusion processes and coalescent trees. Chapter 15 358-375. In: Probability and Mathematical Genetics, Papers in Honour of Sir John Kingman. LMS Lecture Note Series 378. ed Bingham, N. H. and Goldie, C. M., Cambridge University Press. Griffiths, R. C. and Span` o (2013), D. Orthogonal Polynomial Kernels and Canonical Correlations for Dirichlet measures, Bernoulli 19, 548–598. Ismail, M.E.H. (2005) Classical and quantum orthogonal polynomials in one variable. Cambridge University press. Cambridge UK. Jenkins, P. A. and Span` o, D. (2017) Exact simulation of the Wright-Fisher diffusion. Ann. Appl. Prob., to appear. arXiv:1506.06998 32

Kimura, M. (1955) Stochastic processes and distribution of gene frequencies under natural selection. Cold Spring Harb. Symp. Quant. Biol. 20, 33–53. Kimura, M. (1957) Some problems of stochastic processes in genetics. Ann. Math. Stat. 28, 882–901. Kimura, M. (1964). Diffusion models in population genetics. J. Appl. Prob. 1, 177–232. Kingman, J.F.C., 1982. The coalescent. Stochastic Process. Appl. 13, 235– 248. Krone, S. M., Neuhauser, C., (1997). Ancestral processes with selection. Theor. Popul. Biol. 51, 210–237. Mano, S., (2009). Duality, ancestral and diffusion Processes in models with selection. Theor. Popul. Biol. 75, 164–175. Neuhauser, C., Krone, S. M., (1997). The genealogy of samples in models with selection. Genetics 145, 519–534. NIST Handbook of Mathematical Functions (2010). Editors: F. W. J. Olver, D. W. Lozier R. F. Boisvert C. W. Clark. Cambridge University Press, New York. Pinsky, R.G., (1985). On the convergence of diffusion processes conditioned to remain in a bounded region for large time to limiting positive recurrent diffusion processes. Ann. Prob. 13, 363–378. Schraiber, J. Griffiths, R. C., Evans, S. (2013). Analysis and rejection sampling of Wright-Fisher diffusion bridges. Theor. Popul. Biol. 89, 64–74. Song, Y.S., Steinr¨ ucken, M. (2012). A simple method for finding explicit analytic transition densities of diffusion processes with general diploid selection. Genetics, 190, 1117–1129. Steinr¨ ucken, M., Wang, Y.X., and Song Y.S. (2013). An explicit transition density for a multi-allelic Wright-Fisher diffusion with general diploid selection. Theor. Popul. Biol. 83, 1–14. Tavaré, S. (1984). Line-of-descent and genealogical processes, and their application in population genetics models. Theor. Popul. Biol. 26, 119–164. Zhao, L., Lascoux, M. and Waxman, D. (2016). An informational transition in conditioned Markov chains: applied to genetics and evolution. J. Theor. Biol. 402, 158–170. 33