Helen Bernard. Simon le ..... Deng, X., Desai, P.T., den Bakker, H.C., Mikoleit, M., Tolar, B., Trees, E., Hendriksen, R.S., Frye, J.G.,. 347 .... Lane, C.R., LeBaigue, S., Esan, O.B., Awofisyo, A.A., Adams, N.L., Fisher, I.S.T., Grant, K.A., Peters,. 392.
Microbial Genomics Phylogenetic structure of European Salmonella Enteritidis outbreak correlates with national and international egg distribution network --Manuscript Draft-Manuscript Number:
MGEN-D-16-00014R3
Full Title:
Phylogenetic structure of European Salmonella Enteritidis outbreak correlates with national and international egg distribution network
Article Type:
Short Paper
Section/Category:
Microbial evolution and epidemiology: Communicable disease genomics
Corresponding Author:
Tim Dallman Public Health England UNITED KINGDOM
First Author:
Tim Dallman
Order of Authors:
Tim Dallman Thomas Inns Thibaut Jombart Philip Ashton Nicolas Loman Carol Chatt Ute Messelhaeusser Wolfgang Rabsch Sandra Simon Sergejs Nikisins Helen Bernard Simon le Hello Nathalie Jourdan da-Silva Christian Kornschober Joel Mossong Peter Hawkey Elizabeth de Pinna Kathie Grant Paul Cleary
Abstract:
Outbreaks of Salmonella Enteritidis have long been associated with contaminated poultry and eggs. In the Summer of 2014 a large multi-national outbreak of Salmonella Enteritidis phage type 14b occurred with over 350 cases reported in the United Kingdom, Germany, Austria, France and Luxembourg. Egg supply network investigation and microbiological sampling identified the source to be a Bavarian egg producer. As part of the international investigation into the outbreak, over 400 isolates were sequenced including isolates from cases, implicated UK premises, and eggs from the suspected source producer. We were able to show a clear statistical correlation between the topology of the UK egg distribution network and the phylogenetic network of outbreak isolates. This correlation can most plausibly be explained by different parts of the egg distribution network being supplied by eggs solely from independent premises of the Bavarian egg producer (Company X). Microbiological sampling from the source premises, traceback information and information on the interventions carried out at the egg production premises all supported this conclusion. The level of Downloaded from www.microbiologyresearch.org by 104.249.141.12 Powered by Editorial Manager® and IP: ProduXion Manager® from Aries Systems Corporation On: Fri, 27 May 2016 13:52:51
insight into the outbreak epidemiology provided by WGS would not have been possible using traditional microbial typing methods.
Downloaded from www.microbiologyresearch.org by 104.249.141.12 Powered by Editorial Manager® and IP: ProduXion Manager® from Aries Systems Corporation On: Fri, 27 May 2016 13:52:51
Manuscript Including References (Word document)
Click here to download Manuscript Including References (Word document) MGen_dallman_short_revision.docx
Short Paper template
1 2 3
Phylogenetic structure of European Salmonella Enteritidis outbreak correlates with national and international egg distribution network
4 5 6
ABSTRACT
7 8
Outbreaks of Salmonella Enteritidis have long been associated with contaminated poultry and eggs.
9
In the summer of 2014 a large multi-national outbreak of Salmonella Enteritidis phage type 14b
10
occurred with over 350 cases reported in the United Kingdom, Germany, Austria, France and
11
Luxembourg. Egg supply network investigation and microbiological sampling identified the source to
12
be a Bavarian egg producer. As part of the international investigation into the outbreak, over 400
13
isolates were sequenced including isolates from cases, implicated UK premises, and eggs from the
14
suspected source producer. We were able to show a clear statistical correlation between the
15
topology of the UK egg distribution network and the phylogenetic network of outbreak isolates. This
16
correlation can most plausibly be explained by different parts of the egg distribution network being
17
supplied by eggs solely from independent premises of the Bavarian egg producer (Company X).
18
Microbiological sampling from the source premises, traceback information and information on the
19
interventions carried out at the egg production premises all supported this conclusion. The level of
20
insight into the outbreak epidemiology provided by Whole Genome Sequencing (WGS) would not
21
have been possible using traditional microbial typing methods.
22 23 24
DATA SUMMARY
25 26
FASTQ sequences were deposited in the NCBI Short Read Archive under the BioProject PRJNA248792
27
(http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA248792)
28
Supplementary material is available at the following git repository
Downloaded from www.microbiologyresearch.org by IP: 104.249.141.12 On: Fri, 27 May 2016 13:52:51
29 30
https://github.com/timdallman/sent_14b.git
31 32 33
I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.
34
35 36 37
IMPACT STATEMENT
38 39
In this article we show how the phylogenetic relationships between isolates in a foodborne outbreak
40
can be informative in revealing underlying epidemiological trends.
41 42
We were able to show a clear statistical correlation between the topology of the UK cases egg
43
distribution network and the phylogenetic network of outbreak isolates. This indicated that the
44
phylogeny clustered into distinct clades related to the source of eggs.
45 46
This study shows the benefit of whole genome sequencing of pathogens in revealing the true
47
epidemiology behind an outbreak allowing inference about source diversity and food-chain
48
contamination.
49 50 51
INTRODUCTION
52 53
Salmonella enterica serovar Enteritidis outbreaks in humans are often linked to contaminated
54
foodstuffs produced by the poultry industry (Harker et al., 2014; Lane et al., 2014). The incidence of
55
Salmonella Enteritidis in the United Kingdom and in Europe has decreased significantly following the
56
implementation of a vaccination program and other control measures in chicken flocks(Poirier et al.,
57
2008), but outbreaks associated with contaminated eggs continue to occur(Hugas and Beloeil, 2014).
58
Downloaded from www.microbiologyresearch.org by IP: 104.249.141.12 On: Fri, 27 May 2016 13:52:51
59
In 2014 a large multi-national outbreak of Salmonella Enteritidis phage type 14b was linked to
60
consumption of eggs (Inns et al., 2015). Over 350 cases were reported in the United Kingdom,
61
Germany, Austria, France and Luxembourg. Egg supply network investigation and microbiological
62
sampling identified the source to be a Bavarian egg producer.
63 64
Whole genome sequencing (WGS) is increasingly used for surveillance of food-borne
65
pathogens(Dallman et al., 2015b; Quick et al., 2015)(Jenkins et al., 2015) and prospective typing of
66
Salmonella Enteritidis isolates can identify possible outbreaks in real-time(den Bakker et al., 2014;
67
Deng et al., 2014). S. Enteritidis outbreaks can be investigated using WGS as part of the case
68
definition, as strains of a given outbreak are typically monophyletic (i.e. representing a single
69
evolutionary pathway), with limited diversity between outbreak isolates(Deng et al., 2014)(Wuyts et
70
al., 2015)(Taylor et al., 2015). WGS has however revealed significant polyclonal contamination
71
within chicken production farms (Allard et al., 2013).
72 73
Phylogenetic methods that explore the relationships between microbial genomes have been used to
74
study the emergence(Dallman et al., 2015a; Holt et al., 2012), geographical diffusion(Baker et al.,
75
2015; He et al., 2013) and transmission of infections(Bryant et al., 2013; Eyre et al., 2013).
76
Phylogenetic topologies can also be informative in terms of source attribution in outbreak
77
investigations(Harris et al., 2013). In this study we explore the relationships between the observed
78
phylogeny of 400 clinical, environmental and food isolates obtained in this outbreak and the
79
distribution network of the implicated foodstuff, as ascertained by national and international
80
traceback investigations.
81 82 83
METHODS
84 85
Strains
86 87
Since April 2014 all presumptive isolates of Salmonella in England and Wales received by the
88
Gastrointestinal Bacteria Reference Unit at Public Health England (PHE) have undergone WGS. As of
89
1st August 2015, 3,844 sequences of Salmonella enterica serovar Enteritidis had been analysed for
Downloaded from www.microbiologyresearch.org by IP: 104.249.141.12 On: Fri, 27 May 2016 13:52:51
90
routine surveillance purposes. Forty-four isolates from Germany, France, Austria and Luxembourg
91
were sequenced as part of this investigation with the inclusion criteria as follows; clinical isolates
92
belonging to phage type 14b, clinical isolates with matching multi-locus variable number tandem
93
repeat analysis (MLVA) profile 2-12-7-3-2, and implicated food or environmental samples. Of all the
94
isolates sequenced from the UK and mainland Europe, 401 strains of Salmonella Enteritidis matched
95
the outbreak SNP address 1.2.3.38.38.38 (Ashton et al., 2015), these included all the isolates
96
previously described by Inns et al. (2015). The strains are described in supplementary table 1.
97 98
Food chain investigations
99 100
Rapid Alert System for Food and Feed (RASFF) notifications were issued on 09 July 2014 (France), 31
101
July 2014 (Austria) and 1 August 2014 (France), which linked S. Enteritidis outbreaks in France and
102
Austria to chicken eggs from Company X in Germany. Company X had four separate premises, three
103
in Germany and one in Czech Republic; all are operationally independent. All four sites used young
104
chickens (pullets) from two locations: one in Germany and one in the Czech Republic. Food supply
105
network investigations involved obtaining information on the supply of eggs from Company X to UK
106
distributors and tracing onward supply to other UK companies. In addition, supply network
107
investigations were conducted in England to trace supplies of chicken and chicken eggs consumed by
108
cases to their source as described in Inns et al. (2015). In total, 198 of the 287 (69%) confirmed UK
109
cases could be plausibly linked to eggs supplied by one company, Company X, with no traceback
110
information available for the other cases.
111 112
Sequencing
113 114
Sequencing was performed by the PHE Genome Sequencing Unit using Nextera library preparation
115
on the Illumina HiSeq 2500 run in fast mode according to the manufacturers’ instructions, which
116
yielded 2 X 100 base pair paired end reads. At the Robert Koch Institute, University of Birmingham
117
and Institut Pasteur, libraries from genomic DNA were created using the Nextera library preparation
118
kit and subsequently run on the Illumina MiSeq sequencer using Illumina’s v3 Reagent Kit to produce
119
2 x 300 base pair paired end reads.
120
Downloaded from www.microbiologyresearch.org by IP: 104.249.141.12 On: Fri, 27 May 2016 13:52:51
121
High quality Illumina reads were mapped to the Salmonella enterica Enteritidis reference genome
122
(GenBank:AM933172) using BWA-MEM(Li and Durbin, 2010). Single nucleotide polymorphisms
123
(SNPs) were then identified using GATK2(McKenna et al., 2010) in unified genotyper mode. Core
124
genome positions that had a high quality SNP (>90% consensus, minimum depth 10x, GQ >= 30) in at
125
least one strain were extracted and RaxML v8.17 (Stamatakis, 2014) used to derive the maximum
126
likelihood phylogeny of the isolates under the GTRCAT model of evolution. Single linkage SNP
127
clustering was performed as previously described (Ashton et al., 2015). FASTQ reads from all
128
sequences in this study can be found at the PHE Pathogens BioProject at the National Center for
129
Biotechnology Information (Accession PRJNA248792).
130 131
Timed phylogenies
132 133
Timed phylogenies were constructed using BEAST-MCMC v1.80(Drummond et al., 2012) after first
134
removing regions of the genome predicted to have undergone recombination using Gubbins
135
v1.3(Croucher et al., 2015). Alternative clock models and population priors were computed and
136
assessed based on Bayes Factor (BF) tests using Tracer v1.6. The highest supported model was a
137
relaxed lognormal clock rate under a constant population size. All models were run with a chain
138
length of 1 billion. A maximum clade credibility tree was constructed using TreeAnnotator
139
v1.75(Drummond et al., 2012).
140 141
Network Comparison
142 143
Distances on the distribution network were measured as the number of intermediate nodes on the
144
shortest path between a pair of cases. For instance, the distance between two cases infected in the
145
same restaurant was one. The genetic distance between isolates of two different cases was
146
measured as the patristic distances between the corresponding tips of the phylogeny. A Monte-
147
Carlo Mantel test (Mantel, 1967) was used to investigate the degree of correlation between the
148
supply network and genetic distance matrices, using 167 cases that were both sequenced and
149
documented on the distribution network, with 9,999 random permutations of the data. Because the
150
relationship may be driven by cases infected by the same source, we also tested correlations
151
between distances based on pairs of cases with different sources of infection only.
Downloaded from www.microbiologyresearch.org by IP: 104.249.141.12 On: Fri, 27 May 2016 13:52:51
152 153
In addition, we accounted for the potential bias stemming from the existence of distinct genetic
154
clades in the sampled isolates using a partial Mantel test (Legendre and Legendre, 2012). In this
155
analysis, a linear regression is used to predict patristic distances as a function of a binary clade
156
membership distance (0: same clade; 1: different clades) and distances on the food distribution
157
network. Effects of each covariate, and their interaction, was tested using the classical ANOVA
158
framework (Legendre and Legendre, 2012).
159 160
All analyses were carried out in the R software(Team, 2014), using the packages igraph(Csardi and
161
Nepusz, 2006) for graph distances, adephylo (Jombart et al., 2010) for phylogenetic distances
162
computations and ade4(Dray et al., 2007, p. 4) for the Mantel test.
163 164 165
RESULTS
166 167
The phylogeny of 401 isolates implicated in the outbreak resolved into three clades all supported
168
with bootstraps > 95 (Figure1a). The isolates formed a single 5 SNP single linkage cluster with a
169
maximum distance between any two genomes of 23 SNPs. Within England, the outbreak consisted
170
of five point source, geographically distinct incidents and 101 sporadic cases. Isolates from the point
171
source outbreaks resolved into distinct sub-clades in the outbreak phylogeny with a maximum SNP
172
distance of 2 (mode of 0). Supply network information plausibly linked 198 cases in England to eggs
173
supplied by one company, Company X (Figure1b). Multiple trace-back pathways to the implicated
174
source of infection were identified (Inns et al., 2015).
175 176
According to the Monte-Carlo Mantel test, genetic distances between isolates significantly increased
177
with the distance (number of intermediate nodes) on the distribution network (r=0.62,p=0.0001).
178
This relationship remained significant when considering only pairs of cases from different direct
179
exposures (r=0.46; t-test: p=2.2x10-16) and was robust to non-linearities between distances
180
(Spearman rho=0.60, p=2.2x10-16). This relationship also remained significant when accounting for
181
the existence of distinct genetic clades (ANOVA: F=18,023; p