Mycobacterium tuberculosis genomes as part of a real-time public health outbreak .... We thank the members of the Interior Health Authority TB Outbreak. 155.
Microbial Genomics Declaring a tuberculosis outbreak over with genomic epidemiology --Manuscript Draft-Manuscript Number:
MGEN-D-16-00015R1
Full Title:
Declaring a tuberculosis outbreak over with genomic epidemiology
Article Type:
Short Paper
Section/Category:
Microbial evolution and epidemiology: Communicable disease genomics
Corresponding Author:
Jennifer Gardy BC Centre for Disease Control CANADA
First Author:
Hollie-Ann Hatherell
Order of Authors:
Hollie-Ann Hatherell Xavier Didelot Sue L. Pollock Patrick Tang Anamaria Crisan James C. Johnston Caroline Colijn Jennifer Gardy
Abstract:
We report an updated method for inferring the time at which an infectious disease was transmitted between persons from a time-labelled pathogen genome phylogeny. We applied the method to 48 Mycobacterium tuberculosis genomes as part of a real-time public health outbreak investigation, demonstrating that although active tuberculosis (TB) cases were diagnosed through 2013, no transmission events took place beyond mid-2012. Subsequent cases were the result of progression from latent TB infection to active disease and not recent transmission. This evolutionary genomic approach was used to declare the outbreak over in January 2015.
Downloaded from www.microbiologyresearch.org by 191.101.10.120 Powered by Editorial Manager® and IP: ProduXion Manager® from Aries Systems Corporation On: Mon, 01 Aug 2016 08:53:22
Manuscript Including References (Word document)
Click here to download Manuscript Including References (Word document) MGen_TBOutbreak_revision.docx
Short Paper template
1
Declaring a tuberculosis outbreak
2
over with genomic epidemiology
3 4
ABSTRACT
5 6
We report an updated method for inferring the time at which an infectious disease was transmitted
7
between persons from a time-labelled pathogen genome phylogeny. We applied the method to 48
8
Mycobacterium tuberculosis genomes as part of a real-time public health outbreak investigation,
9
demonstrating that although active tuberculosis (TB) cases were diagnosed through 2013, no
10
transmission events took place beyond mid-2012. Subsequent cases were the result of progression
11
from latent TB infection to active disease and not recent transmission. This evolutionary genomic
12
approach was used to declare the outbreak over in January 2015.
13 14 15
DATA SUMMARY
16 17
1. Short read data for the 48 sequenced M. tuberculosis genomes has been deposited in the
18
European Nucleotide Archive; accession number: PRJEB12764 (url -
19
http://www.ebi.ac.uk/ena/data/view/PRJEB12764)
20
2. The commands used in reference alignment and variant calling are available as a text file
21
from Figshare; DOI: 10.6084/m9.figshare.3153280 (url -
22
https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epid
23
emiology/3153280).
24 25
3. The final dataset of variants is available as a fasta file from Figshare; DOI: 10.6084/m9.figshare.3153280 (url -
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
26
https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epid
27
emiology/3153280)
28 29
4. The TransPhylo code, including the update written specifically for this study, is available from GitHub (url - https://github.com/xavierdidelot/TransPhylo)
30 31
I/We confirm all supporting data, code and protocols have been provided within the article or
32
through supplementary data files.
33 34
IMPACT STATEMENT
35 36
We have previously described a method for inferring person-to-person transmission events from
37
pathogen genome data; here, we describe an improvement to the method’s underlying epidemic
38
model that allows us to infer the likely time at which a person was infected. Significantly, we
39
describe how this updated approach was used in the real-time investigation of a large tuberculosis
40
(TB) outbreak and how our results were used to declare an end to the outbreak. This is the first
41
report of genomic data being used to declare a complex community outbreak over, suggesting a new
42
role for the use of genomic data in TB control.
43 44
INTRODUCTION
45 46
Genomics is revolutionizing public health practice (Kwong et al., 2015). Mutational and evolutionary
47
events within a pathogen population not only have consequences for the disease, but also present
48
opportunities for understanding transmission and developing targeted public health interventions.
49
Inferring person-to-person transmission from genomic data is one such example – genome
50
sequencing has now helped identify individual infection events in multiple outbreaks at levels from
51
hospital wards to communities to countries (Croucher & Didelot, 2015).
52 53
Transmission inference from genomic data uses mutations – fixed or minor variants (Worby et al.,
54
2014; Poon et al., 2016) – shared across outbreak isolates to identify putative infection events. We
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
55
previously developed TransPhylo (Didelot et al., 2014) – a Bayesian method for inferring
56
transmissions and their timing given mutational events captured in a time-labelled phylogeny – and
57
used it to reconstruct transmissions between the first 33 cases (2008-2011) of a large TB outbreak.
58
The outbreak in question began with the May 2008 diagnosis of a highly infectious client in a
59
homeless shelter in British Columbia, Canada and peaked in 2010. Intensive case-finding in the
60
community ultimately screened 2,310 individuals and a total of 52 TB cases were diagnosed through
61
December 2013 (Figure 1A) (Cheng et al., 2015).
62 63
In the absence of a formal definition, a TB outbreak is generally deemed over when transmission of
64
the outbreak strain has stopped for >2 years; however, latent TB infection (LTBI) complicates
65
declaring an outbreak’s end. Amongst individuals identified with LTBI, only 5-10% will progress to
66
active disease, with most developing disease within two years of infection (WHO, 2015); however,
67
delayed progression occurring more than two years after infection is not uncommon. As incident
68
case numbers begin to decline, TB controllers must differentiate LTBI cases acquired >2 years ago
69
and only now progressing to active disease from new cases that were recently acquired, suggesting
70
ongoing transmission.
71 72
One of TransPhylo’s outputs is Tinf – the estimated time at which an individual was infected. Tinf can
73
differentiate delayed progression from new infection; however, TransPhylo’s underlying SIR model –
74
and indeed all compartmental epidemic models – does not capture the true size of the susceptible
75
population or the true variation in the infectious periods, which may affect inferred Tinf values.
76 77
In September 2014, at the request of the Medical Health Officer leading the outbreak response, we
78
analysed Mycobacterium tuberculosis genomes from 48 of the 52 cases to determine whether a
79
decline in newly diagnosed cases truly signaled the outbreak’s end. We replaced TransPhylo’s SIR
80
model with a branching model to better infer the timing of transmission amongst the 48 cases and
81
asked whether cases diagnosed in 2013 were the result of recent transmission or delayed
82
progression of an infection acquired earlier in the outbreak.
83 84 85
METHODS Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
86 87
As part of an earlier investigation during the outbreak, we sequenced 33 M. tuberculosis genomes
88
from outbreak cases diagnosed between May 2008 and April 2011 (Didelot et al., 2014). In
89
September 2014, we sequenced genomes from a further 15 cases on the MiSeq platform, for a total
90
of 48 genomes (Accession: PRJEB12764, Data Citation 1). Four of the 52 outbreak cases did not have
91
an M. tuberculosis isolate available for sequencing as they were diagnosed out-of-province or on
92
clinical grounds during a post-mortem.
93 94
Reads were mapped against the M. tuberculosis CDC1551 reference
95
genome (NC_002755.2) using BWAmem (Li & Durbin, 2009) and variants called using samtools
96
mpileup (Li et al., 2009); the commands we used are available in a text file from FigShare (DOI:
97
10.6084/m9.figshare.2077390, Data Citation 2). From the resulting VCF files, we removed all variant
98
positions that were identical across all 48 genomes, leaving only variants that differentiate outbreak
99
isolates. We filtered a matrix of these positions to remove variant positions within 150bp of another
100
variant position, suggesting misalignment to a low-complexity region, as well as variant positions
101
without an mpileup QUAL score equal to 222 in at least one isolate. This left 28 positions across the
102
48 isolates, which were manually reviewed before further analysis; a FASTA file of these variants
103
labeled with isolate sampling date (in the format “days since X”) is available at FigShare (DOI:
104
10.6084/m9.figshare.2077405, Data Citation 3). Variants were concatenated and analysed with
105
BEAST (Drummond & Rambaut, 2007) and the resulting timed phylogeny was passed to TransPhylo
106
for transmission timing inference.
107 108
We used a modified version of TransPhylo (https://github.com/xavierdidelot/TransPhylo, Data
109
Citation 4) in which we replaced the existing SIR epidemic model with a branching model. The
110
mathematics describing the branching model are detailed in a Methods Supplement included here.
111 112
RESULTS
113
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
114
Examining Tinf for each of the 48 sequenced cases revealed that the last person-to-person
115
transmission occurred in late July or early August 2012 (Figure 1A, B). The eight cases diagnosed
116
afterwards were all instances of delayed progression, with most of the transmission events leading
117
to these cases occurring in 2010-2011. Indeed, the epidemic curves based on Tinf and on diagnosis
118
date (Figure 1A) show echo each other, with transmission concentrated largely before 2011 and
119
diagnoses extending two years beyond that, underscoring the importance of early active case-
120
finding and preventive therapy in a TB outbreak.
121 122
The average time from infection to diagnosis was 1.2 years (95%CI ± 0.31; Figure 1B), increasing
123
from 0.98 (95%CI ± 0.82) in 2009 to 1.97 (95%CI ± 0.69) in 2013 despite continued intensive
124
surveillance. This supports the hypothesis that later cases were largely due to delayed progression of
125
infection acquired earlier in the outbreak and highlights the need for extensive follow-up of infected
126
contacts and provision of LTBI preventive therapy.
127 128
We also compared the Tinf values estimated here to those estimated using the original TransPhylo
129
release (Figure 2). Both the original SIR model and the updated branching model reported here
130
support our observation of at least two years without a transmission event. While the two methods
131
are somewhat similar in their Tinf estimates for isolates transmitted later in the outbreak, there is
132
substantial variation in early Tinf values, with the original SIR model reporting transmissions as early
133
as 2000. The values reported by the branching model are more consistent with the epidemiology of
134
TB in the outbreak community, demonstrating the importance of incorporating an epidemic model
135
that better reflects the biology underlying the epidemic process.
136 137
CONCLUSION
138 139
We presented our findings to the Medical Health Officer and the Outbreak Management Team on
140
January 9, 2015. After considering our genomic evidence indicating that no transmission of the
141
outbreak strain had been detected since 2012 – thereby fulfilling the criteria for at least two years
142
without a transmission event – and corroborating evidence from the ongoing epidemiological
143
investigation, the outbreak was declared over on January 29, 2015 (Interior Health Authority, 2015).
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
144 145
This is the first demonstration that evolutionary genomic analysis can be used to declare a complex
146
community outbreak over, suggesting a new role for public health genomics in not just identifying
147
transmission events, but also in timing these events to better understand an outbreak’s dynamics
148
and guide the real-time public health response.
149 150 151
ACKNOWLEDGEMENTS
152 153
This work was supported by the Engineering and Physical Sciences Research Council (H.H, C.C – grant
154
EP/K026003/1), the Canada Research Chairs program (J.L.G), and the Michael Smith Foundation for
155
Health Research (J.C.J). We thank the members of the Interior Health Authority TB Outbreak
156
Management Team, the BCCDC Public Health Laboratory’s Mycobacteriology Laboratory, and BCCDC
157
TB Services for their assistance, particularly Lori Hiscoe, Rob Parker, Clare Kong, Mabel Rodrigues,
158
and Victoria Cook.
159 160
REFERENCES
161 162
Cheng, J.M., Hiscoe, L., Pollock, S.L., Hasselback, P., Gardy, J.L., Parker, R. (2015). A clonal
163
outbreak of tuberculosis in a homeless population in the interior of British
164
Columbia, Canada, 2008-2015. Epidemiol Infect. 143, 3220-3226.
165 166
Croucher, N.J., Didelot, X. (2015). The application of genomics to tracing bacterial pathogen
167
transmission. Curr Opin Microbiol. 23, 62–67.
168 169
Didelot, X., Gardy, J., Colijn, C. (2014). Bayesian inference of infectious disease
170
transmission from whole-genome sequence data. Mol Biol Evol. 31, 869-1879.
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
171 172
Interior Health Authority. (2015). Six year TB outbreak comes to an end.
173
https://www.interiorhealth.ca/AboutUs/MediaCentre/NewsReleases/Documents/Six%20year%20TB
174
%20outbreak%20comes%20to%20an%20end.pdf, last accessed January 14, 2016.
175 176
Kwong, J.C, McCallum, N., Sintchenko, V., Howden, B.P. (2015). Whole genome sequencing in
177
clinical and public health microbiology. Pathology. 47, 199-210.
178 179
Poon, L.L., Song, T., Rosenfeld, R., Lin, X., Rogers, M.B., Zhou, B., Sebra, R., Halpin, R.A.,
180
Guan, Y. & other authors. (2016). Quantifying influenza virus diversity and transmission in humans.
181
Nat Genet. doi: 10.1038/ng.3479.
182 183
Worby, C.J., Chang, H.H., Hanage, W.P., Lipsitch, M. (2014). The distribution of pairwise genetic
184
distances: a tool for investigating disease transmission. Genetics.
185
198, 1395-1404.
186 187
World Health Organization. (2015). Guidelines on the Management of Latent Tuberculosis Infection.
188
http://apps.who.int/iris/bitstream/10665/136471/1/9789241548908_eng.pdf?ua=1&ua=1, last
189
accessed January 14, 2016.
190 191 192
DATA BIBLIOGRAPHY
193 194 195 196 197
1. Gardy, J. European Nucleotide Archive http://www.ebi.ac.uk/ena/data/view/PRJEB12764 (2016). 2. Gardy, J. FigShare https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epidemi
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
198 199
ology/3153280 (2016). 3. Gardy, J. FigShare
200
https://figshare.com/articles/Declaring_a_tuberculosis_outbreak_over_with_genomic_epidemi
201
ology/3153280 (2016).
202
4. Didelot, X. GitHub https://github.com/xavierdidelot/TransPhylo) (2015).
203 204 205
FIGURES
206 207
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
208 209
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
210 211 212 213 214
Figure 1. Timing of infections inferred from genomic data. Panel A: The outbreak’s epidemic curve
215
based on time of diagnosis (black line) or Tinf – the time of infection estimated by TransPhylo (grey
216
bars). Panel B: Each case’s infected period is shown as a line originating at Tinf (grey dot) and
217
continuing until the case was diagnosed (white dot). The outbreak period (May 2008-January 2015)
218
is indicated with shading.
219 220
Figure 2. Tinf estimated by a branching model versus SIR model.
221
Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
Supplementary Material Files
Click here to download Supplementary Material Files MGen_SupplementaryMethods.pdf
Supplementary Methods
February 9, 2016
1
Branching Model
Our approach follows Didelot et al. (2014), using the idea of assigning each host to a connected subset of a fixed, timed, phylogenetic tree (a genealogy). This assignment can be considered as a “colouring” of the tree. Our method is an MCMC approach beginning from an initial valid colouring as in Didelot et al, updating the colouring, computing its likelihood, and using a Metropolis-Hastings accept/reject step. The colouring specifies who infected whom and when and the timed phylogenetic tree contains the times of sampling of the hosts. Accordingly, we now need to specify how the likelihood of a “colouring” is computed. The key difference between our approach and that in Didelot et al. is that where Didelot et al. used a susceptible-infectious-recovered (SIR) model for the probability of the transmission process, we use a branching model. In this model, infected individuals cause a Poisson-distributed number of secondary infections (with mean connected to the generation time and the individual’s infectious period). The time between becoming infected and infecting others is distributed according to a generation time distribution fg . The time between becoming infected and being sampled is distributed according to a prior distribution fs . As in Didelot et al, we assume that all infectious cases are known. We write the probability of the transmission tree T , conditional on the phylogenetic tree G, the in-host effective population size Ne g and the branching model parameters as P(T, , Ne g|G) ∝ P(G|T, , Ne g)P(, Ne g, T ) (1) which we write as P(T, , Ne g|G) ∝ P(G|T, Ne g)P(T |)π(, Ne g),
(2)
where π represents the priors for and Ne g. The likelihood of the genealogical component G, P(G|T, Ne g), is as given in Didelot et al and is the product of the likelihoods of the (independent) genealogies inside each host under a fixed-size coalescent model. The term P(T | represents the probability of the transmission tree under the parameters of the branching model. We write this using the probability of the number of secondary infections k of each case (Poisson), the probability that the case was sampled at the specified time after infection (using fs ), and the probabilities of the infection times of the secondary infections using fg . Since we assume that each case’s infectious period ends at their time of sampling, we condition on the infection times of observed cases being prior to that time. This gives P(T |) =
n Y
fo (ki )fs (tisamp − tiinf )
i=1
ki Y fg (tjinf − tiinf )
F (ti − tiinf ) j=1 g samp
(3)
where fs , fo and fg represent the probability density functions for the sampling, offspring and generation time distributions and Fg is the cumulative distribution function for the generation time. Equation (3) describes the probability for each individual i = 1, ..., n in the tree of having ki offspring given how long they were infectious and the probability of infecting offspring j at tjinf and being sampled at tisamp both conditional on when they were infected, tiinf . 1 Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22
The expected number of infections depends on an individual’s duration of infectiousness and the infectivity, which is related to the generation time (Equation (4)). We used a gammadistributed generation time (fg is gamma with parameters kg and θg ) so as to have a distribution more centered around the mean than an exponential, and a long tail that allows for individuals to reactivate older TB infections. We have the expected number of secondary infections of host i: tisamp −tiinf
Z Ei =
R0 fg (τ )dτ
(4)
0
R0 = γ Γ(kg )
kg ,
tisamp − tiinf
! (5)
θg
We assume that the secondary infections from each individual occur as a Poisson process, and use Equation (4) as the mean of the offspring distribution, giving us a time-inhomogeneous Poisson process with intensity function R0 fg (τ ) up to τ = tsamp − tinf and 0 thereafter. We assume the sampling time also follows a gamma distribution (fs is gamma with parameters ks and θs ) to allow for a variable latency period with individuals sampled (and their TB sequenced and included in the study) only after they have active TB disease. In total, the parameters for the are branching model are = {R0 , kg , θg , ks , θs } and these are held fixed at values {1.5, 2, 1, 1, 2} (though in principle they could also be assigned prior distributions). The full description of the terms is: fg (tjsamp − tiinf ) Fg (tisamp − tiinf )
1 k
=
fo (ki ) = fs (tisamp
2
−
tiinf )
=
θg g
(tjsamp
− tiinf )kg −1 e
j tsamp −tiinf θg
− tisamp −tiinf γ kg , θg
(6)
1 (Ei )ki e−Ei ki ! 1
(tisamp Γ(ks )θsks
(7) −
tiinf )ks −1 e−
tisamp −tiinf θs
(8)
References 1. Didelot X, Gardy J, Colijn C. 2014. Bayesian inference of infectious disease transmission from whole-genome sequence data. Mol Biol Evol. 31:1869-1879.
2 Downloaded from www.microbiologyresearch.org by IP: 191.101.10.120 On: Mon, 01 Aug 2016 08:53:22