Microbial Genomics

1 downloads 0 Views 1MB Size Report
Mycobacterium tuberculosis genomes as part of a real-time public health outbreak .... We thank the members of the Interior Health Authority TB Outbreak. 155.
Microbial Genomics Declaring a tuberculosis outbreak over with genomic epidemiology --Manuscript Draft-Manuscript Number:

MGEN-D-16-00015R1

Full Title:

Declaring a tuberculosis outbreak over with genomic epidemiology

Article Type:

Short Paper

Section/Category:

Microbial evolution and epidemiology: Communicable disease genomics

Corresponding Author:

Jennifer Gardy BC Centre for Disease Control CANADA

First Author:

Hollie-Ann Hatherell

Order of Authors:

Hollie-Ann Hatherell Xavier Didelot Sue L. Pollock Patrick Tang Anamaria Crisan James C. Johnston Caroline Colijn Jennifer Gardy

Abstract:

We report an updated method for inferring the time at which an infectious disease was transmitted between persons from a time-labelled pathogen genome phylogeny. We applied the method to 48 Mycobacterium tuberculosis genomes as part of a real-time public health outbreak investigation, demonstrating that although active tuberculosis (TB) cases were diagnosed through 2013, no transmission events took place beyond mid-2012. Subsequent cases were the result of progression from latent TB infection to active disease and not recent transmission. This evolutionary genomic approach was used to declare the outbreak over in January 2015.

Downloaded from www.microbiologyresearch.org by 191.101.10.120 Powered by Editorial Manager® and IP: ProduXion Manager® from Aries Systems Corporation On: Mon, 01 Aug 2016 08:53:22

Manuscript Including References (Word document)

Click here to download Manuscript Including References (Word document) MGen_TBOutbreak_revision.docx

Short Paper template

1

Declaring a tuberculosis outbreak

2

over with genomic epidemiology

3 4

ABSTRACT

5 6

We report an updated method for inferring the time at which an infectious disease was transmitted

7

between persons from a time-labelled pathogen genome phylogeny. We applied the method to 48

8

Mycobacterium tuberculosis genomes as part of a real-time public health outbreak investigation,

9

demonstrating that although active tuberculosis (TB) cases were diagnosed through 2013, no

10

transmission events took place beyond mid-2012. Subsequent cases were the result of progression

11

from latent TB infection to active disease and not recent transmission. This evolutionary genomic

12

approach was used to declare the outbreak over in January 2015.

13 14 15

DATA SUMMARY

16 17

1. Short read data for the 48 sequenced M. tuberculosis genomes has been deposited in the

18

European Nucleotide Archive; accession number: PRJEB12764 (url -

19

http://www.ebi.ac.uk/ena/data/view/PRJEB12764)

20

2. The commands used in reference alignment and variant calling are available as a text file

21

from Figshare; DOI: 10.6084/m9.figshare.3153280 (url -

22

https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epid

23

emiology/3153280).

24 25

3. The final dataset of variants is available as a fasta file from Figshare; DOI: 10.6084/m9.figshare.3153280 (url -

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

26

https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epid

27

emiology/3153280)

28 29

4. The TransPhylo code, including the update written specifically for this study, is available from GitHub (url - https://github.com/xavierdidelot/TransPhylo)

30 31

I/We confirm all supporting data, code and protocols have been provided within the article or

32

through supplementary data files. 

33 34

IMPACT STATEMENT

35 36

We have previously described a method for inferring person-to-person transmission events from

37

pathogen genome data; here, we describe an improvement to the method’s underlying epidemic

38

model that allows us to infer the likely time at which a person was infected. Significantly, we

39

describe how this updated approach was used in the real-time investigation of a large tuberculosis

40

(TB) outbreak and how our results were used to declare an end to the outbreak. This is the first

41

report of genomic data being used to declare a complex community outbreak over, suggesting a new

42

role for the use of genomic data in TB control.

43 44

INTRODUCTION

45 46

Genomics is revolutionizing public health practice (Kwong et al., 2015). Mutational and evolutionary

47

events within a pathogen population not only have consequences for the disease, but also present

48

opportunities for understanding transmission and developing targeted public health interventions.

49

Inferring person-to-person transmission from genomic data is one such example – genome

50

sequencing has now helped identify individual infection events in multiple outbreaks at levels from

51

hospital wards to communities to countries (Croucher & Didelot, 2015).

52 53

Transmission inference from genomic data uses mutations – fixed or minor variants (Worby et al.,

54

2014; Poon et al., 2016) – shared across outbreak isolates to identify putative infection events. We

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

55

previously developed TransPhylo (Didelot et al., 2014) – a Bayesian method for inferring

56

transmissions and their timing given mutational events captured in a time-labelled phylogeny – and

57

used it to reconstruct transmissions between the first 33 cases (2008-2011) of a large TB outbreak.

58

The outbreak in question began with the May 2008 diagnosis of a highly infectious client in a

59

homeless shelter in British Columbia, Canada and peaked in 2010. Intensive case-finding in the

60

community ultimately screened 2,310 individuals and a total of 52 TB cases were diagnosed through

61

December 2013 (Figure 1A) (Cheng et al., 2015).

62 63

In the absence of a formal definition, a TB outbreak is generally deemed over when transmission of

64

the outbreak strain has stopped for >2 years; however, latent TB infection (LTBI) complicates

65

declaring an outbreak’s end. Amongst individuals identified with LTBI, only 5-10% will progress to

66

active disease, with most developing disease within two years of infection (WHO, 2015); however,

67

delayed progression occurring more than two years after infection is not uncommon. As incident

68

case numbers begin to decline, TB controllers must differentiate LTBI cases acquired >2 years ago

69

and only now progressing to active disease from new cases that were recently acquired, suggesting

70

ongoing transmission.

71 72

One of TransPhylo’s outputs is Tinf – the estimated time at which an individual was infected. Tinf can

73

differentiate delayed progression from new infection; however, TransPhylo’s underlying SIR model –

74

and indeed all compartmental epidemic models – does not capture the true size of the susceptible

75

population or the true variation in the infectious periods, which may affect inferred Tinf values.

76 77

In September 2014, at the request of the Medical Health Officer leading the outbreak response, we

78

analysed Mycobacterium tuberculosis genomes from 48 of the 52 cases to determine whether a

79

decline in newly diagnosed cases truly signaled the outbreak’s end. We replaced TransPhylo’s SIR

80

model with a branching model to better infer the timing of transmission amongst the 48 cases and

81

asked whether cases diagnosed in 2013 were the result of recent transmission or delayed

82

progression of an infection acquired earlier in the outbreak.

83 84 85

METHODS Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

86 87

As part of an earlier investigation during the outbreak, we sequenced 33 M. tuberculosis genomes

88

from outbreak cases diagnosed between May 2008 and April 2011 (Didelot et al., 2014). In

89

September 2014, we sequenced genomes from a further 15 cases on the MiSeq platform, for a total

90

of 48 genomes (Accession: PRJEB12764, Data Citation 1). Four of the 52 outbreak cases did not have

91

an M. tuberculosis isolate available for sequencing as they were diagnosed out-of-province or on

92

clinical grounds during a post-mortem.

93 94

Reads were mapped against the M. tuberculosis CDC1551 reference

95

genome (NC_002755.2) using BWAmem (Li & Durbin, 2009) and variants called using samtools

96

mpileup (Li et al., 2009); the commands we used are available in a text file from FigShare (DOI:

97

10.6084/m9.figshare.2077390, Data Citation 2). From the resulting VCF files, we removed all variant

98

positions that were identical across all 48 genomes, leaving only variants that differentiate outbreak

99

isolates. We filtered a matrix of these positions to remove variant positions within 150bp of another

100

variant position, suggesting misalignment to a low-complexity region, as well as variant positions

101

without an mpileup QUAL score equal to 222 in at least one isolate. This left 28 positions across the

102

48 isolates, which were manually reviewed before further analysis; a FASTA file of these variants

103

labeled with isolate sampling date (in the format “days since X”) is available at FigShare (DOI:

104

10.6084/m9.figshare.2077405, Data Citation 3). Variants were concatenated and analysed with

105

BEAST (Drummond & Rambaut, 2007) and the resulting timed phylogeny was passed to TransPhylo

106

for transmission timing inference.

107 108

We used a modified version of TransPhylo (https://github.com/xavierdidelot/TransPhylo, Data

109

Citation 4) in which we replaced the existing SIR epidemic model with a branching model. The

110

mathematics describing the branching model are detailed in a Methods Supplement included here.

111 112

RESULTS

113

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

114

Examining Tinf for each of the 48 sequenced cases revealed that the last person-to-person

115

transmission occurred in late July or early August 2012 (Figure 1A, B). The eight cases diagnosed

116

afterwards were all instances of delayed progression, with most of the transmission events leading

117

to these cases occurring in 2010-2011. Indeed, the epidemic curves based on Tinf and on diagnosis

118

date (Figure 1A) show echo each other, with transmission concentrated largely before 2011 and

119

diagnoses extending two years beyond that, underscoring the importance of early active case-

120

finding and preventive therapy in a TB outbreak.

121 122

The average time from infection to diagnosis was 1.2 years (95%CI ± 0.31; Figure 1B), increasing

123

from 0.98 (95%CI ± 0.82) in 2009 to 1.97 (95%CI ± 0.69) in 2013 despite continued intensive

124

surveillance. This supports the hypothesis that later cases were largely due to delayed progression of

125

infection acquired earlier in the outbreak and highlights the need for extensive follow-up of infected

126

contacts and provision of LTBI preventive therapy.

127 128

We also compared the Tinf values estimated here to those estimated using the original TransPhylo

129

release (Figure 2). Both the original SIR model and the updated branching model reported here

130

support our observation of at least two years without a transmission event. While the two methods

131

are somewhat similar in their Tinf estimates for isolates transmitted later in the outbreak, there is

132

substantial variation in early Tinf values, with the original SIR model reporting transmissions as early

133

as 2000. The values reported by the branching model are more consistent with the epidemiology of

134

TB in the outbreak community, demonstrating the importance of incorporating an epidemic model

135

that better reflects the biology underlying the epidemic process.

136 137

CONCLUSION

138 139

We presented our findings to the Medical Health Officer and the Outbreak Management Team on

140

January 9, 2015. After considering our genomic evidence indicating that no transmission of the

141

outbreak strain had been detected since 2012 – thereby fulfilling the criteria for at least two years

142

without a transmission event – and corroborating evidence from the ongoing epidemiological

143

investigation, the outbreak was declared over on January 29, 2015 (Interior Health Authority, 2015).

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

144 145

This is the first demonstration that evolutionary genomic analysis can be used to declare a complex

146

community outbreak over, suggesting a new role for public health genomics in not just identifying

147

transmission events, but also in timing these events to better understand an outbreak’s dynamics

148

and guide the real-time public health response.

149 150 151

ACKNOWLEDGEMENTS

152 153

This work was supported by the Engineering and Physical Sciences Research Council (H.H, C.C – grant

154

EP/K026003/1), the Canada Research Chairs program (J.L.G), and the Michael Smith Foundation for

155

Health Research (J.C.J). We thank the members of the Interior Health Authority TB Outbreak

156

Management Team, the BCCDC Public Health Laboratory’s Mycobacteriology Laboratory, and BCCDC

157

TB Services for their assistance, particularly Lori Hiscoe, Rob Parker, Clare Kong, Mabel Rodrigues,

158

and Victoria Cook.

159 160

REFERENCES

161 162

Cheng, J.M., Hiscoe, L., Pollock, S.L., Hasselback, P., Gardy, J.L., Parker, R. (2015). A clonal

163

outbreak of tuberculosis in a homeless population in the interior of British

164

Columbia, Canada, 2008-2015. Epidemiol Infect. 143, 3220-3226.

165 166

Croucher, N.J., Didelot, X. (2015). The application of genomics to tracing bacterial pathogen

167

transmission. Curr Opin Microbiol. 23, 62–67.

168 169

Didelot, X., Gardy, J., Colijn, C. (2014). Bayesian inference of infectious disease

170

transmission from whole-genome sequence data. Mol Biol Evol. 31, 869-1879.

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

171 172

Interior Health Authority. (2015). Six year TB outbreak comes to an end.

173

https://www.interiorhealth.ca/AboutUs/MediaCentre/NewsReleases/Documents/Six%20year%20TB

174

%20outbreak%20comes%20to%20an%20end.pdf, last accessed January 14, 2016.

175 176

Kwong, J.C, McCallum, N., Sintchenko, V., Howden, B.P. (2015). Whole genome sequencing in

177

clinical and public health microbiology. Pathology. 47, 199-210.

178 179

Poon, L.L., Song, T., Rosenfeld, R., Lin, X., Rogers, M.B., Zhou, B., Sebra, R., Halpin, R.A.,

180

Guan, Y. & other authors. (2016). Quantifying influenza virus diversity and transmission in humans.

181

Nat Genet. doi: 10.1038/ng.3479.

182 183

Worby, C.J., Chang, H.H., Hanage, W.P., Lipsitch, M. (2014). The distribution of pairwise genetic

184

distances: a tool for investigating disease transmission. Genetics.

185

198, 1395-1404.

186 187

World Health Organization. (2015). Guidelines on the Management of Latent Tuberculosis Infection.

188

http://apps.who.int/iris/bitstream/10665/136471/1/9789241548908_eng.pdf?ua=1&ua=1, last

189

accessed January 14, 2016.

190 191 192

DATA BIBLIOGRAPHY

193 194 195 196 197

1. Gardy, J. European Nucleotide Archive http://www.ebi.ac.uk/ena/data/view/PRJEB12764 (2016). 2. Gardy, J. FigShare https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epidemi

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

198 199

ology/3153280 (2016). 3. Gardy, J. FigShare

200

https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epidemi

201

ology/3153280 (2016).

202

4. Didelot, X. GitHub https://github.com/xavierdidelot/TransPhylo) (2015).

203 204 205

FIGURES

206 207

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

208 209

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

210 211 212 213 214

Figure 1. Timing of infections inferred from genomic data. Panel A: The outbreak’s epidemic curve

215

based on time of diagnosis (black line) or Tinf – the time of infection estimated by TransPhylo (grey

216

bars). Panel B: Each case’s infected period is shown as a line originating at Tinf (grey dot) and

217

continuing until the case was diagnosed (white dot). The outbreak period (May 2008-January 2015)

218

is indicated with shading.

219 220

Figure 2. Tinf estimated by a branching model versus SIR model.

221

Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

Supplementary Material Files

Click here to download Supplementary Material Files MGen_SupplementaryMethods.pdf

Supplementary Methods

February 9, 2016

1

Branching Model

Our approach follows Didelot et al. (2014), using the idea of assigning each host to a connected subset of a fixed, timed, phylogenetic tree (a genealogy). This assignment can be considered as a “colouring” of the tree. Our method is an MCMC approach beginning from an initial valid colouring as in Didelot et al, updating the colouring, computing its likelihood, and using a Metropolis-Hastings accept/reject step. The colouring specifies who infected whom and when and the timed phylogenetic tree contains the times of sampling of the hosts. Accordingly, we now need to specify how the likelihood of a “colouring” is computed. The key difference between our approach and that in Didelot et al. is that where Didelot et al. used a susceptible-infectious-recovered (SIR) model for the probability of the transmission process, we use a branching model. In this model, infected individuals cause a Poisson-distributed number of secondary infections (with mean connected to the generation time and the individual’s infectious period). The time between becoming infected and infecting others is distributed according to a generation time distribution fg . The time between becoming infected and being sampled is distributed according to a prior distribution fs . As in Didelot et al, we assume that all infectious cases are known. We write the probability of the transmission tree T , conditional on the phylogenetic tree G, the in-host effective population size Ne g and the branching model parameters  as P(T, , Ne g|G) ∝ P(G|T, , Ne g)P(, Ne g, T ) (1) which we write as P(T, , Ne g|G) ∝ P(G|T, Ne g)P(T |)π(, Ne g),

(2)

where π represents the priors for  and Ne g. The likelihood of the genealogical component G, P(G|T, Ne g), is as given in Didelot et al and is the product of the likelihoods of the (independent) genealogies inside each host under a fixed-size coalescent model. The term P(T | represents the probability of the transmission tree under the parameters of the branching model. We write this using the probability of the number of secondary infections k of each case (Poisson), the probability that the case was sampled at the specified time after infection (using fs ), and the probabilities of the infection times of the secondary infections using fg . Since we assume that each case’s infectious period ends at their time of sampling, we condition on the infection times of observed cases being prior to that time. This gives P(T |) =

n Y

fo (ki )fs (tisamp − tiinf )

i=1

ki Y fg (tjinf − tiinf )

F (ti − tiinf ) j=1 g samp

(3)

where fs , fo and fg represent the probability density functions for the sampling, offspring and generation time distributions and Fg is the cumulative distribution function for the generation time. Equation (3) describes the probability for each individual i = 1, ..., n in the tree of having ki offspring given how long they were infectious and the probability of infecting offspring j at tjinf and being sampled at tisamp both conditional on when they were infected, tiinf . 1 Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22

The expected number of infections depends on an individual’s duration of infectiousness and the infectivity, which is related to the generation time (Equation (4)). We used a gammadistributed generation time (fg is gamma with parameters kg and θg ) so as to have a distribution more centered around the mean than an exponential, and a long tail that allows for individuals to reactivate older TB infections. We have the expected number of secondary infections of host i: tisamp −tiinf

Z Ei =

R0 fg (τ )dτ

(4)

0

R0 = γ Γ(kg )

kg ,

tisamp − tiinf

! (5)

θg

We assume that the secondary infections from each individual occur as a Poisson process, and use Equation (4) as the mean of the offspring distribution, giving us a time-inhomogeneous Poisson process with intensity function R0 fg (τ ) up to τ = tsamp − tinf and 0 thereafter. We assume the sampling time also follows a gamma distribution (fs is gamma with parameters ks and θs ) to allow for a variable latency period with individuals sampled (and their TB sequenced and included in the study) only after they have active TB disease. In total, the parameters for the are branching model are  = {R0 , kg , θg , ks , θs } and these are held fixed at values {1.5, 2, 1, 1, 2} (though in principle they could also be assigned prior distributions). The full description of the terms is: fg (tjsamp − tiinf ) Fg (tisamp − tiinf )

1 k

=

fo (ki ) = fs (tisamp

2



tiinf )

=

θg g

(tjsamp

− tiinf )kg −1 e

j tsamp −tiinf θg

−   tisamp −tiinf γ kg , θg

(6)

1 (Ei )ki e−Ei ki ! 1

(tisamp Γ(ks )θsks

(7) −

tiinf )ks −1 e−

tisamp −tiinf θs

(8)

References 1. Didelot X, Gardy J, Colijn C. 2014. Bayesian inference of infectious disease transmission from whole-genome sequence data. Mol Biol Evol. 31:1869-1879.

2 Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22