Microbial Genomics
Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation.
--Manuscript Draft--
Manuscript Number:
MGEN-D-16-00046R1
Full Title:
Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation.
Article Type:
Short Paper
Section/Category:
Genomic Methodologies: Genome variation detection
Corresponding Author:
Lori AS Snyder, Ph.D. Kingston University Kingston upon Thames, UNITED KINGDOM
First Author:
Marta A. Zelewska
Order of Authors:
Marta A. Zelewska; Madhuri Pulijala; Russell Spencer-Smith; Hiba-Tun-Noor A. Mahmood; Billie Norman; Colin P. Churchward; Alan Calder; Lori AS Snyder, Ph.D.
Abstract:
There are many types of repeated DNA sequences in the genomes of the Neisseria spp., from homopolymeric tracts to tandem repeats of hundreds of bases. Some of these have roles in the phase variable expression of genes. When a repeat mediates phase variation, reversible switching between tract lengths occurs, which in the Neisseria spp. most often causes the gene to switch between ON and OFF states through frame shifting of the open reading frame. Changes in repeat tract lengths may also influence the strength of transcription from a promoter. For phenotypes that can be readily observed, such as expression of the surface-expressed Opa proteins or pili, verification that repeats are mediating phase variation is relatively straightforward. For other genes, particularly those where the function has not been identified, gathering evidence of repeat tract changes can be more difficult. Here we present an analysis of the repetitive sequences that could mediate phase variation in the Neisseria gonorrhoeae strain NCCP11945 genome sequence and compare these results to other gonococcal genome sequences. Evidence is presented for an updated phase variable gene repertoire in this species, including a class of phase variation that causes amino acid changes at the C-terminus of the protein, not previously described in N. gonorrhoeae.
Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation On: Tue, 20 Sep 2016 09:20:44
Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation.
DATA SUMMARY
1. Sequence data for the Neisseria gonorrhoeae strains investigated are available in GenBank under the following accession numbers: FA1090 (NC_002946.2; url - http://www.ncbi.nlm.nih.gov/nuccore/NC_002946.2); NCCP11945 (NC_011035.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NC_011035.1; and CP001050.1; url - http://www.ncbi.nlm.nih.gov/nuccore/CP001050.1); MS11 (NC_022240.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NC_022240.1); FA19 (NZ_CP012026.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012026.1); FA6140 (NZ_CP012027.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012027.1); 35/02 (NZ_CP012028.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012028.1). These were accessed on 15th April 2016 for use in this study.
2. Sequence data were assessed via BLAST interrogation of the nr database restricted to the N. gonorrhoeae species (url - http://www.ncbi.nlm.nih.gov/).
3. Genome resequencing data for N. gonorrhoeae strain NCCP11945 have been deposited under BioProject PRJNA322254 (url - http://www.ncbi.nlm.nih.gov/bioproject/322254).
4. Genome resequencing data were assessed via Galaxy (url - usegalaxy.org).
We confirm all supporting data, code, and protocols have been provided within the article or through supplementary data files. ☒
IMPACT STATEMENT
Phase variation plays a vital role in the ability of Neisseria gonorrhoeae to adapt to the various niche environments it encounters. Through stochastic switching in the expression of key genes and regulatory systems, mediated by simple sequence repeats, the bacterial population is diverse and readily able to survive in the face of selective pressures. Not all simple sequence repeats within the genome mediate phase variation. Previous investigations have sought to define the phase variable repertoire of the Neisseria spp. and have identified a large number of candidates using a small number of genome sequences. With the availability of more genome sequence data and additional experimental data, we have refined the original repertoire to include those genes most likely to be phase variable in N. gonorrhoeae. As these genes are important for survival, their definition as phase variable is important for understanding pathogenesis and for potential future therapies. The advent of high throughput sequencing has the potential to reveal additional cases of within-strain variation in repeat tracts, supporting the phase variable candidacy of genes.
INTRODUCTION
In Neisseria gonorrhoeae, the causative agent of gonorrhoea, DNA repeats are intimately linked to the biology of the organism. N. gonorrhoeae, and the closely related bacterial species Neisseria meningitidis, undergo phase variable stochastic switching of gene expression for several surface structures, contributing to antigenic variation and immune evasion as well as niche adaptation in the course of infection (Bhat et al., 1991; Moxon et al., 2006; Carbonnelle et al., 2009; Srikhanta et al., 2009; Omer et al., 2011). Phase variation is mediated by simple sequence repeats associated with genes. In the Neisseria spp., the vast majority of phase variable genes contain homopolymeric tracts within their coding sequences (Snyder et al., 2001).
Comparative sequence analysis between a single N. gonorrhoeae and several N. meningitidis genome sequences identified over 100 potentially phase variable genes (Snyder et al., 2001), some of which have later been demonstrated to be phase variable experimentally (Jordan et al., 2005). Transcriptional and translational phase variation have been extensively studied in the Neisseria spp.; however, an additional class of simple sequence repeat mediated phase variation has been described in Helicobacter canadensis following whole genome analysis (Snyder et al., 2010). Simple sequence repeat mediated changes in the presence or absence of C-terminal cell wall attachment motifs have also been described in Streptococcus agalactiae (Janulczyk et al., 2010). In N. meningitidis, a gene fusion between pglB2 and the downstream phosphoglycosyltransferase gene appears to be mediated by a poly-A repeat tract (Viburiene et al., 2013).
With the availability of additional gonococcal genome sequences, the gonococcal phase variable repertoire has here been re-assessed. As a result, phase variation in which repeats at the 3' ends of genes mediate changes in the C-terminal sequence of the proteins is described as part of a refined phase variable gene repertoire.
MATERIALS AND METHODS
Identification of phase variable genes. Using the previous phase variable gene repertoires reported for N. gonorrhoeae and N. meningitidis (Snyder et al., 2001; Martin et al., 2003; Jordan et al., 2005), the homologues in N. gonorrhoeae strain NCCP11945 were sought (CP001050.1; Chung et al., 2008). In addition, the pattern search in xBASE (Chaudhuri et al., 2008) was used to identify other repeats, based on previous evidence of phase variation in the Neisseria spp.: ≥ (G)8; ≥ (C)8; ≥ (CAAACAC)3; ≥ (CAAATAC)3; ≥ (CCCAA)3; ≥ (GCCA)3; ≥ (A)9; ≥ (T)9; ≥ (AAGC)3; ≥ (TTCC)3; and ≥ (CTTCT)3. No other repeats have been demonstrated to cause phase variation in this species. Genome sequences for N. gonorrhoeae strains NCCP11945 (NC_011035.1), FA1090 (NC_002946.2), FA19 (NZ_CP012026.1), FA6140 (NZ_CP012027.1), 35/02 (NZ_CP012028.1), and MS11 (NC_022240.1) were downloaded on 15th April 2016 and compared using progressive Mauve v2.3.1 (Darling et al., 2004) to identify orthologues (Table S1).
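The repeat thresholds listed above can be illustrated with a short script. This is only a minimal sketch of a regular-expression search over a single strand; it is not the xBASE pattern search itself, and the function name find_repeat_tracts and the toy sequence are assumptions made for illustration.

```python
import re

# Minimum copy numbers taken from the repeat thresholds listed in the text.
REPEAT_THRESHOLDS = {
    "G": 8, "C": 8, "A": 9, "T": 9,
    "CAAACAC": 3, "CAAATAC": 3, "CCCAA": 3, "GCCA": 3,
    "AAGC": 3, "TTCC": 3, "CTTCT": 3,
}


def find_repeat_tracts(sequence, thresholds=REPEAT_THRESHOLDS):
    """Return sorted (start, unit, copies) tuples for tracts meeting a threshold.

    Only the strand supplied is scanned; start is a 0-based offset.
    """
    hits = []
    seq = sequence.upper()
    for unit, min_copies in thresholds.items():
        # Greedy match of at least min_copies consecutive copies of the unit.
        pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
        for match in pattern.finditer(seq):
            copies = len(match.group(0)) // len(unit)
            hits.append((match.start(), unit, copies))
    return sorted(hits)


# Hypothetical toy sequence containing a (G)9 tract and an (AAGC)4 tract.
toy = "ATGACC" + "G" * 9 + "TTACGT" + "AAGC" * 4 + "TGA"
for start, unit, copies in find_repeat_tracts(toy):
    print(start, unit, copies)
```

For a real genome, both strands would need to be considered, and each hit would then be placed relative to annotated promoters and coding sequences.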
Identification of repeat variation within N. gonorrhoeae strain NCCP11945. N. gonorrhoeae strain NCCP11945 was grown on GC agar (Oxoid) with Kellogg’s (Kellogg et al., 1963) and 5% Fe(NO3)3 supplements, either at 37°C in a candle tin for a period of 8 weeks with passages to fresh agar plates every 2 days, or at 37°C in 5% CO2 for a period of 20 weeks with passages to fresh agar plates every 2-3 days. At each passage, cells were scraped from the plate and resuspended in 1 ml of GC broth to a turbidity equivalent to a 0.5 McFarland standard before inoculation onto fresh plates using a sterile cotton swab. DNA was extracted from such resuspensions using the Puregene Yeast/Bacterial kit (Qiagen). 1 μg or 100 ng of the DNA was genome sequenced using the Ion Personal Genome Machine, Ion Express Fragment Library kit, Ion Express Template kit, and Ion Sequencing kit (Life Technologies) or using the Illumina-based methods of the MicrobesNG service (microbesng.uk). Sequence read data were interpreted using Galaxy on usegalaxy.org (Afgan et al., 2016). Briefly, the reference sequence (NC_011035.1), Ion Torrent data for eight week passages (KU1-4, KU145), Ion Torrent data for 20 week passages (KU1-95, KU1-96), and Illumina data for 20 week passages (2928-NS1_1 & 2929-NS1_2 and 2929-NS2_1 & 2929-NS2_2) were uploaded to Galaxy. The Ion Torrent BAM format files were converted to FASTQ format using BAMTools Convert (Barnett et al., 2011). FASTQ Groomer was used on all NGS data (Blankenberg et al., 2010). Bowtie2 was used to map the reads against the reference (Langmead et al., 2009; Langmead et al., 2012) before visualisation using the Integrative Genomics Viewer (Robinson et al., 2011; Thorvaldsdóttir et al., 2013).
RESULTS AND DISCUSSION
Phase variable genes
The phase variable gene repertoire of N. gonorrhoeae strain NCCP11945 was investigated and compared against gonococcal strains FA1090, FA19, FA6140, 35/02, and MS11 to assess the presence of similar repeat tracts across the species and variations in repeat tract lengths between strains.
Transcriptional phase variation is mediated by repeats within or associated with the promoter region (Fig. 1a). Changes in the repeat alter the level of transcription of the gene, as in fetA (frpB; NGK_2557), where differences in the length of the poly-C homopolymeric tract between the -10 and -35 promoter regions alter expression (Carson et al., 2000). There are three transcriptional phase variable genes in N. gonorrhoeae strain NCCP11945 (Table 1): fetA (NGK_2557), a lipoprotein (NGK_2186), and porA (NGK_0906/NGK_0907), although in gonococci porA does not have an intact coding region. Variation in the repeats between gonococcal strains is found for all three transcriptional phase variable genes (Table 1).
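The effect of tract length on promoter spacing can be sketched with a toy calculation. The hexamers used here are the Escherichia coli sigma-70 consensus elements, chosen purely as placeholders, and the promoter sequence is hypothetical; neither is taken from the gonococcal fetA promoter, and the function names are assumptions.

```python
def spacer_length(promoter, minus35="TTGACA", minus10="TATAAT"):
    """Count the bases between the end of the -35 element and the start of the -10 element."""
    start35 = promoter.find(minus35)
    if start35 == -1:
        raise ValueError("-35 element not found")
    end35 = start35 + len(minus35)
    start10 = promoter.find(minus10, end35)
    if start10 == -1:
        raise ValueError("-10 element not found")
    return start10 - end35


def toy_promoter(poly_c_length):
    """Build a hypothetical promoter with a poly-C tract inside the spacer."""
    return "GG" + "TTGACA" + "AT" + "C" * poly_c_length + "GATCG" + "TATAAT" + "GC"


# Each change of one C in the tract changes the spacer by one base.
for n in (9, 10, 11):
    print(n, spacer_length(toy_promoter(n)))
```

Because each base added to or lost from the tract moves the two elements relative to one another, a single-base change is enough to shift the promoter towards or away from its preferred geometry.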
Most common in the Neisseria spp. is translational phase variation where, as in pilC, the repeat is within the 5' portion of the coding region of the gene (Fig. 1b). Changes in the repeat tract generate frame-shift mutations in two of the three open reading frames, with the gene only being translated into protein when the repeat tract length puts the gene in frame. Whilst many phase variable genes in the Neisseria spp. contain homopolymeric tracts, some experience copy number changes in repetitive sequences, such as the CTTCT repeat in opa (Muralidharan et al., 1987; Bhat et al., 1991) or the AAGC repeat in autA (Peak et al., 1999; Arenas et al., 2015). In the N. gonorrhoeae strains examined here, the AAGC repeat in virG (NGK_0804) is only present in two or three copies (Table 2), rather than several copies as in NGK_0831a and autA (NGK_2082). Although virG has a low copy number for the repeat, variations between strains are observed and strains with many copies may yet be identified (there are currently none >(AAGC)3 in the NCBI nr/nt or wgs databases); it is therefore placed amongst the phase variable genes, even though variation may occur at low frequency or be a strain-specific effect. There are 36 translational phase variable genes in N. gonorrhoeae based on the strains examined (Table 2).
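The ON/OFF logic of translational phase variation reduces to arithmetic modulo three. The sketch below assumes that one in-frame copy number is already known for the gene; the function name expression_state and the copy numbers used are hypothetical and are not taken from Table 2.

```python
def expression_state(copies, on_copies, unit_length=1):
    """Classify a repeat tract as ON or OFF relative to a known in-frame copy number.

    copies      -- observed number of repeat units
    on_copies   -- a copy number known to leave the gene in frame (assumed)
    unit_length -- length of the repeat unit in bases (1 for homopolymers)
    """
    shift = (copies - on_copies) * unit_length
    # The downstream reading frame is preserved only when the net change
    # in tract length is a multiple of three bases.
    return "ON" if shift % 3 == 0 else "OFF"


# Homopolymeric tract with a hypothetical in-frame length of 10.
for n in range(8, 14):
    print(n, expression_state(n, on_copies=10))

# Pentanucleotide repeat (such as CTTCT) with a hypothetical in-frame copy number of 9.
print(12, expression_state(12, on_copies=9, unit_length=5))
```

Only one in three tract lengths leaves the gene in frame, which matches the statement that two of the three reading frames yield a frame-shifted product.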
In addition, a third class of repeat-mediated phase variable gene was identified (Snyder et al., 2010). In these C-terminal phase variable genes, a repeat tract towards the 3' end of the coding region is able to alter the sequence at the C-terminus of the encoded protein (Fig. 1c). In N. gonorrhoeae strain NCCP11945, four of these C-terminal phase variable genes were identified (Table 3). It is likely that in the case of the pilin sequence (NGK_2161), changes in the repeat are causing pilus protein changes, mediating antigenic variation through a phase variable mechanism. Comparisons also show repeat tract variation in a membrane protein (NGK_1211) and mafB cassette (NGK_1624), supporting C-terminal phase variation in the Neisseria spp. Variations in the products of mafB cassettes are believed to contribute to competition between species within the niche (Jamet et al., 2015).
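C-terminal phase variation can be illustrated by translating a toy coding sequence that carries a homopolymeric tract near its 3' end. The sequence, the tract lengths, and the helper names are all hypothetical; the only biological input is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def toy_cds(tract_length):
    """Hypothetical CDS with a poly-C tract close to the 3' end."""
    upstream = "ATGAAAGAACTGGTT"
    downstream = "GATGCTAAATGACTGGTAAGCTAG"
    return upstream + "C" * tract_length + downstream


# The N-terminal portion is shared; only the C-terminal tail changes.
for n in (6, 7, 8):
    print(n, translate(toy_cds(n)))
```

Unlike a 5' repeat, a tract at this position leaves most of the protein intact in every frame, so switching yields alternative C-terminal tails rather than a simple ON/OFF outcome.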
Although no variation was observed in these strains in ispH (NGK_0106), (G)8 repeats are known to vary in lgtC (NGK_1632), hpuA (NGK_2581), and hsdS (NGK_0571) (Table 2), therefore it is highly likely that the repeat in ispH also has the capacity to vary.
A number of previously reported candidates are not supported by evidence of phase variation, based on the absence of tract length changes between the strains (Table 4). For example, although tract variation was reported for cvaA (NGK_0168), mafA-3 (NGK_2270), and dca (NGK_1830) in N. meningitidis (Martin et al., 2003), there are no changes observed in the short (C)4 and (G)5 tracts in these genes in N. gonorrhoeae (Table 4). They are therefore unlikely to be phase variable in this species. Likewise, neither of the dinucleotide repeat-containing genes (NGK_1607 and NGK_2274) shows variation (Table 4); dinucleotides are not likely to be phase variable in the Neisseria spp. (Martin et al., 2003). All of these genes contain short repeats that do not vary or alternative nucleotide sequences in the strains investigated (Table 4).
This analysis identified 29 genes that are known to be phase variable (Tables 1 & 2), either in N. gonorrhoeae or N. meningitidis, including 12 paralogues of opa (11 in each strain; Muralidharan et al., 1987; Bhat et al., 1991) and 17 other known phase variable genes (Stern et al., 1987; Jonsson et al., 1991; van der Ende et al., 1995; Erwin et al., 1996; Chen et al., 1998; Peak et al., 1999; Carson et al., 2000; Banerjee et al., 2002; Mackinnon et al., 2002; Shafer et al., 2002; Power et al., 2003; Srikhanta et al., 2009; Adamczyk-Poplawska et al., 2011; Arenas et al., 2015). Thirteen additional genes have variations in the repeat tracts when the six N. gonorrhoeae genome sequences are compared: one transcriptional, nine translational, and three C-terminal repeats. Based on homology and presence of conserved domains, these genes are believed to encode two replication initiation factors, an adhesion protein, a pyrimidine 5'-nucleotidase, two lipoproteins, two membrane proteins, two secreted proteins, and three hypothetical proteins (Tables 1-3).
Combined with the previous data on repeat variation within and between gonococcal strains and demonstration of phase variation (Sparling et al., 1986; Yang et al., 1996; Lewis et al., 1999; Snyder et al., 2001; Power et al., 2003; Jordan et al., 2005; Srikhanta et al., 2009), a revised repertoire of 43 transcriptional (Table 1), translational (Table 2), and C-terminal (Table 3) phase variable genes is proposed for N. gonorrhoeae as a species. This is fewer than previous predictions (76 in Jordan et al., 2005) and thus far two-thirds (67%, 29 out of 43) have been experimentally demonstrated to be phase variable (Tables 1 & 2). The additional 14 genes, 13 of which show strain to strain repeat variation, require additional investigation.
Phase variable repeat copy number variation in vitro.
Previously, for H. canadensis, 454 and Illumina genome sequence read data were used to support candidacy of phase variable genes (Snyder et al., 2010). In the present study, Ion Torrent and Illumina genome sequence read data from N. gonorrhoeae strain NCCP11945 that had been passaged in the laboratory for 8 weeks or for 20 weeks were analysed for changes to phase variable repeats for the 14 genes for which there is no within-strain evidence of phase variation (Tables 1-3). Changes were observed in known phase variable genes pilC1, opa, and fetA, suggesting read data can support phase variability by demonstrating within-strain variation in tracts (Table 5). Of the 14 genes, only virG (NGK_0804) and the pyrimidine 5'-nucleotidase (NGK_2176) showed no changes in the repeats (Table 5). The virG (AAGC)3 copy number is likely too low to vary; however, there may be strains with a greater copy number in which it would. Likewise, the poly-C repeat in NGK_2176 has been replaced with CAAACCCC in strain NCCP11945 and therefore would not be expected to phase vary in this strain; however, phase variation is likely in strain MS11, for example.
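Within-strain variation of this kind can be summarised by tallying the tract length seen in each read that spans a repeat. The sketch below is not the Galaxy and Integrative Genomics Viewer workflow used in the study; it assumes reads are available as plain strings on the reference strand, and the flanking sequences, reads, and function name are hypothetical.

```python
import re
from collections import Counter


def tally_tract_lengths(reads, left_flank, right_flank, unit):
    """Count repeat copy numbers in reads that contain both flanking sequences.

    Reads lacking either flank (for example, reads ending inside the tract)
    are ignored, because their tract length cannot be determined.
    """
    pattern = re.compile(
        re.escape(left_flank)
        + "((?:" + re.escape(unit) + ")+)"
        + re.escape(right_flank)
    )
    counts = Counter()
    for read in reads:
        match = pattern.search(read.upper())
        if match:
            counts[len(match.group(1)) // len(unit)] += 1
    return counts


# Hypothetical flanks and reads around a poly-G tract.
LEFT, RIGHT = "ACGTTA", "TGCATC"
toy_reads = (
    ["TT" + LEFT + "G" * 10 + RIGHT + "AA"] * 3
    + ["TT" + LEFT + "G" * 9 + RIGHT + "AA"]
    + ["TT" + LEFT + "G" * 11 + RIGHT + "AA"]
    + ["TT" + LEFT + "G" * 10]  # truncated read without the right flank
)
for length, n_reads in sorted(tally_tract_lengths(toy_reads, LEFT, RIGHT, "G").items()):
    print(length, n_reads)
```

A mixture of lengths in such a tally is consistent with phase variation but, for homopolymers in particular, may also reflect sequencing error, so the counts are best read alongside coverage and platform.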
The Ion Torrent sequencing technology has been criticised for generating homopolymer-associated indels (Loman et al., 2012) and for miscalling tracts of >8 bases (Quail et al., 2012), the optimal length for phase variation. Homopolymeric tracts in Illumina data are believed to be less error prone (Schirmer et al., 2015). However, repeat sequence data from Illumina often agreed with Ion Torrent on the presence of variation (9 of 14 genes with variation in Ion Torrent, Table 5). When the Illumina data did not show repeat variation, this often corresponded to relatively low read coverage of the region compared to the Ion Torrent data (Table S2).
It is currently impossible to differentiate genuine biologically induced indels from sequencing errors (Narzisi & Schatz, 2015). Some of what is ascribed to error may instead be subtle changes that are constantly being generated within the bacterial population. From these data, the expected biological variation supporting phase variation appears to be present in N. gonorrhoeae strain NCCP11945 for 12 as yet unexplored genes.
CONCLUSION
In conclusion, N. gonorrhoeae possesses three different mechanisms for phase variation: transcriptional, translational, and C-terminal. Stochastic systems play important roles in the biology of the organism, given the variety and number of genes involved. The functions of previously unexplored phase variable genes, including one transcriptional phase variable gene, nine translational phase variable genes, and four C-terminal phase variable genes, require further investigation.
ACKNOWLEDGEMENTS
CPC was supported by a Sparks project grant 13KIN01. Illumina genome sequencing was provided by MicrobesNG (http://www.microbesng.uk), which is supported by the BBSRC (grant number BB/L024209/1).
REFERENCES
Adamczyk-Poplawska M., Lower M., Piekarowicz A. (2011). Deletion of one nucleotide within the homonucleotide tract present in the hsdS gene alters the DNA sequence specificity of type I restriction-modification system NgoAV. J Bacteriol 193, 6750-6759.
Afgan E., Baker D., van den Beek M., Blankenberg D., Bouvier D., Čech M., Chilton J., Clements D., Coraor N., Eberhard C., Grüning B., Guerler A., Hillman-Jackson J., Von Kuster G., Rasche E., Soranzo N., Turaga N., Taylor J., Nekrutenko A., Goecks J. (2016). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res [epub ahead of print]
Arenas J., Cano S., Nijland R., van Dongen V., Rutten L., van der Ende A., Tommassen J. (2015). The meningococcal autotransporter AutA is implicated in autoaggregation and biofilm formation. Environ Microbiol 17, 1321-1337.
Banerjee A., Wang R., Supernavage S. L., Ghosh S. K., Parker J., Ganesh N. F., Wang P. G., Gulati S., Rice P. A. (2002). Implications of phase variation of a gene (pgtA) encoding a pilin galactosyl transferase in gonococcal pathogenesis. J Exp Med 196, 147-162.
Barnett D. W., Garrison E. K., Quinlan A. R., Stromberg M. P., Marth G. T. (2011). BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27, 1691-1692.
Bhat K. S., Gibbs C. P., Barrera O., Morrison S. G., Jahnig F., Stern A., Kupsch E. M., Meyer T. F., Swanson J. (1991). The opacity proteins of Neisseria gonorrhoeae strain MS11 are encoded by a family of 11 complete genes. Mol Microbiol 5, 1889-1901.
Blankenberg D., Gordon A., Von Kuster G., Coraor N., Taylor J., Nekrutenko A., Galaxy Team. (2010). Manipulation of FASTQ data with Galaxy. Bioinformatics 26, 1783-1785.
Bolotin D. A., Mamedov I. Z., Britanova O. V., Zvyagin I. V., Shagin D., Ustyugova S. V., Turchaninova M. A., Lukyanov S., Lebedev Y. B., Chudakov D. M. (2012). Next generation sequencing for TCR repertoire profiling: Platform-specific features and correction algorithms. Eur J Immunol 42, 3073-3083.
Bragg L. M., Stone G., Butler M. K., Hugenholtz P., Tyson G. W. (2013). Shining a light on dark sequencing: Characterising errors in Ion Torrent PGM data. PLoS Comput Biol 9, e1003031.
Carbonnelle E., Hill D. J., Morand P., Griffiths N. J., Bourdoulous S., Murillo I., Nassif X., Virji M. (2009). Meningococcal interactions with the host. Vaccine 27 Suppl 2, B78-89.
Carson S. D., Stone B., Beucher M., Fu J., Sparling P. F. (2000). Phase variation of the gonococcal siderophore receptor FetA. Mol Microbiol 36, 585-593.
Chaudhuri R. R., Loman N. J., Snyder L. A., Bailey C. M., Stekel D. J., Pallen M. J. (2008). xBASE2: a comprehensive resource for comparative bacterial genomics. Nucleic Acids Res 36, D543-546.
Chen C. J., Elkins C., Sparling P. F. (1998). Phase variation of hemoglobin utilization in Neisseria gonorrhoeae. Infect Immun 66, 987-993.
Chung G. T., Yoo J. S., Oh H. B., Lee Y. S., Cha S. H., Kim S. J., Yoo C. K. (2008). Complete genome sequence of Neisseria gonorrhoeae NCCP11945. J Bacteriol 190, 6035-6036.
Darling A. C. E., Mau B., Blattner F. R., Perna N. T. (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome research 14, 1394-1403.
Erwin A. L., Haynes P. A., Rice P. A., Gotschlich E. C. (1996). Conservation of the lipooligosaccharide synthesis locus lgt among strains of Neisseria gonorrhoeae: requirement for lgtE in synthesis of the 2C7 epitope and of the beta chain of strain 15253. J Exp Med 184, 1233-1241.
Jamet A., Jousset A. B., Euphrasie D., Mukorako P., Boucharlat A., Ducousso A., Charbit A., Nassif X. (2015). A new family of secreted toxins in pathogenic Neisseria species. PLoS Pathog 11, e1004592.
Janulczyk R., Madignani V., Maione D., Tettelin H., Grandi G., Telford J. L. (2010). Simple sequence repeats and genome plasticity in Streptococcus agalactiae. J Bacteriol 192, 3990-4000.
Jonsson A. B., Nyberg G., Normark S. (1991). Phase variation of gonococcal pili by frameshift mutation in pilC, a novel gene for pilus assembly. EMBO J 10, 477-488.
Jordan P. W., Snyder L. A., Saunders N. J. (2005). Strain-specific differences in Neisseria gonorrhoeae associated with the phase variable gene repertoire. BMC Microbiol 5, 21.
Kellogg D. S. Jr., Peacock W. L. Jr., Deacon W. E., Brown L., Pirkle D. I. (1963). Neisseria gonorrhoeae. I. Virulence genetically linked to clonal variation. J Bacteriol 85, 1274-1279.
Langmead B., Trapnell C., Pop M., Salzberg S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25.
Langmead B., Salzberg S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357-359.
Lewis L. A., Gipson M., Hartman K., Ownbey T., Vaughn J., Dyer D. W. (1999). Phase variation of HpuAB and HmbR, two distinct haemoglobin receptors of Neisseria meningitidis DNM2. Mol Microbiol 32, 977-989.
Loman N. J., Misra R. V., Dallman T. J., Constantinidou C., Gharbia S. E., Wain J., Pallen M. J. (2012). Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol 30, 434-439.
Mackinnon F. G., Cox A. D., Plested J. S., Tang C. M., Makepeace K., Coull P. A., Wright J. C., Chalmers R., Hood D. W., Richards J. C., Moxon E. R. (2002). Identification of a gene (lpt-3) required for the addition of phosphoethanolamine to the lipopolysaccharide inner core of Neisseria meningitidis and its role in mediating susceptibility to bactericidal killing and opsonophagocytosis. Mol Microbiol 43, 931-943.
Martin P., van de Ven T., Mouchel N., Jeffries A. C., Hood D. W., Moxon E. R. (2003). Experimentally revised repertoire of putative contingency loci in Neisseria meningitidis strain MC58: evidence for a novel mechanism of phase variation. Mol Microbiol 50, 245-257.
Moxon R., Bayliss C., Hood D. (2006). Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu Rev Genet 40, 307-333.
Muralidharan K., Stern A., Meyer T. F. (1987). The control mechanism of opacity protein expression in the pathogenic Neisseriae. Antonie Van Leeuwenhoek 53, 435-440.
Narzisi G., Schatz M. C. (2015). The challenge of small-scale repeats for indel discovery. Front Bioeng Biotechnol 3, 8.
Omer H., Rose G., Jolley K. A., Frapy E., Zahar J. R., Maiden M. C., Bentley S. D., Tinsley C. R., Nassif X., Bille E. (2011). Genotypic and phenotypic modifications of Neisseria meningitidis after an accidental human passage. PLoS One 6, e17145.
Peak I. R., Jennings M. P., Hood D. W., Moxon E. R. (1999). Tetranucleotide repeats identify novel virulence determinant homologues in Neisseria meningitidis. Microb Pathog 26, 13-23.
Power P. M., Roddam L. F., Rutter K., Fitzpatrick S. Z., Srikhanta Y. N., Jennings M. P. (2003). Genetic characterization of pilin glycosylation and phase variation in Neisseria meningitidis. Mol Microbiol 49, 833-847.
Quail M. A., Smith M., Coupland P., Otto T. D., Harris S. R., Connor T. R., Bertoni A., Swerdlow H. P., Gu Y. (2012). A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 13, 341.
Robinson J. T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E. S., Getz G., Mesirov J. P. (2011). Integrative Genomics Viewer. Nature Biotechnology 29, 24-26.
Schirmer M., Ijaz U. Z., D'Amore R., Hall N., Sloan W. T., Quince C. (2015). Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res 43, e37.
Shafer W. M., Datta A., Kolli V. S., Rahman M. M., Balthazar J. T., Martin L. E., Veal W. L., Stephens D. S., Carlson R. (2002). Phase variable changes in genes lgtA and lgtC within the lgtABCDE operon of Neisseria gonorrhoeae can modulate gonococcal susceptibility to normal human serum. J Endotoxin Res 8, 47-58.
Snyder L. A. S., Butcher S. A., Saunders N. J. (2001). Comparative whole-genome analyses reveal over 100 putative phase-variable genes in the pathogenic Neisseria spp. Microbiology 147, 2321-2332.
Snyder L. A. S., Loman N. J., Linton J. D., Langdon R. R., Weinstock G. M., Wren B. W., Pallen M. J. (2010). Simple sequence repeats in Helicobacter canadensis and their role in phase variable expression and C-terminal sequence switching. BMC Genomics 11, 67.
Sparling P. F., Cannon J. G., So M. (1986). Phase and antigenic variation of pili and outer membrane protein II of Neisseria gonorrhoeae. J Infect Dis 153, 196-201.
Srikhanta Y. N., Dowideit S. J., Edwards J. L., Falsetta M. L., Wu H. J., Harrison O. B., Fox K. L., Seib K. L., Maguire T. L., Wang A. H., Maiden M. C., Grimmond S. M., Apicella M. A., Jennings M. P. (2009). Phasevarions mediate random switching of gene expression in pathogenic Neisseria. PLoS Pathog 5, e1000400.
Stern A., Meyer T. F. (1987). Common mechanism controlling phase and antigenic variation in pathogenic neisseriae. Mol Microbiol 1, 5-12.
Thorvaldsdóttir H., Robinson J. T., Mesirov J. P. (2013). Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14, 178-192.
van der Ende A., Hopman C. T., Zaat S., Essink B. B., Berkhout B., Dankert J. (1995). Variable expression of class 1 outer membrane protein in Neisseria meningitidis is caused by variation in the spacing between the -10 and -35 regions of the promoter. J Bacteriol 177, 2475-2480.
Viburiene R., Vik Å., Koomey M., Børud B. (2013). Allelic variation in a simple sequence repeat element of neisserial pglB2 and its consequences for protein expression and protein glycosylation. J Bacteriol 195, 3476-3485.
Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44
Yang Q. L., Gotschlich E. C. (1996). Variation of gonococcal lipooligosaccharide structure is due to alterations in poly-G tracts in lgt genes encoding glycosyl transferases. J Exp Med 183, 323-327.
DATA BIBLIOGRAPHY
1. Marta A. Zelewska, Madhuri Pulijala & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547950 (2016).
2. Hiba-Tun-Noor A. Mahmood & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547951 (2016).
3. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547952 (2016).
4. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547953 (2016).
5. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547954 (2016).
6. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547955 (2016).
7. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547956 (2016).
8. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547957 (2016).
FIGURES AND TABLES
Figure 1
Fig. 1. Illustrations of the types of phase variation in N. gonorrhoeae. (a) Transcriptional phase variation, in which changes in a repeat tract alter the facing and spacing of the -10 and -35 promoter elements and thereby the level of transcription of the gene. Phase variation of fetA is used as an example, where differences in the spacing of the -10 and -35 elements due to changes in the poly-C repeat tract have been shown to alter expression levels, represented by arrow width (Carson et al., 2000). (b) Translational phase variation, in which changes in a repeat tract towards the 5' end of the CDS alter the reading frame of the coding region and switch expression ON and OFF through frame-shifting. Phase variation of pilC is used as an example, where changes in the poly-G repeat tract have been shown to generate frame-shifts that switch protein expression ON and OFF (Jonsson et al., 1991). (c) C-terminal phase variation, in which changes in a repeat tract towards the 3' end of the CDS alter the reading frame of the coding region and switch the encoded C-terminal amino acids between the three reading frames. In the example NGK_1211, two of the reading frames offer different C-terminal ends to the protein, while the third generates a fusion with the downstream CDS, NGK_1212. Only some examples of C-terminal phase variation result in this type of fusion (Table 3).
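The frame-shift logic of translational phase variation can be illustrated with a minimal Python sketch. The toy CDS, the assumed ON tract length of 9 and the function names are hypothetical and chosen only for illustration; they do not represent pilC or any other gonococcal gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical downstream sequence: 7 codons are read in frame before the
# stop, but either shifted frame meets a stop codon almost immediately.
DOWNSTREAM = "CTAACTGACCGTAAAGCTGGTTAA"


def toy_cds(tract_len):
    """Start codon + poly-G tract of the given length + fixed downstream."""
    return "ATG" + "G" * tract_len + DOWNSTREAM


def codons_before_stop(cds):
    """Count codons read from position 0 until the first in-frame stop."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None  # no in-frame stop codon found


def is_on(tract_len, on_len=9):
    """ON when the tract differs from the ON length by a multiple of 3."""
    return (tract_len - on_len) % 3 == 0


for n in (8, 9, 10, 12):
    state = "ON" if is_on(n) else "OFF"
    print(n, state, codons_before_stop(toy_cds(n)))
```

In this toy example, tracts of 9 or 12 bases keep the downstream codons in frame, whereas tracts of 8 or 10 bases shift the frame and terminate translation after only a few codons.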
Table 1. Transcriptional phase variable genes in N. gonorrhoeae.

| Gene | FA1090 locus* | Repeat in FA1090* | NCCP11945 locus† | Repeat in NCCP11945† | Repeat in FA19‡ | Repeat in FA6140§ | Repeat in 35/02ǁ | Repeat in MS11¶ | N. gonorrhoeae candidacy# | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| porA | NGO_04715 | (T)11C(G)6T | NGK_0906 & NGK_0907** | (T)9C(G)6T | (T)8C(G)8T | (T)9C(G)6T | (T)9C(G)6T | (T)9C(G)6TT | Known | van der Ende et al., 1995 |
| lipoprotein | NGO2047 | (A)9 | NGK_2186†† | (A)8 | (A)8 | (A)9 | (A)9 | (A)8 | Yes | |
| fetA / frpB | NGO2093 | (C)13 | NGK_2557 | (C)14 | (C)10 | (C)11 | (C)11 | (C)11 | Known | Carson et al., 2000 |

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).
† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).
‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).
§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).
ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).
¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).
# Gene phase variation candidacy in N. gonorrhoeae. Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation.
** This CDS appears to be frame-shifted and annotated as two CDSs.
†† NGK_2186 and NGO2047 annotations are on opposite strands.
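The compact repeat notation used in these tables, for example (T)9C(G)6T, can be expanded into a plain nucleotide sequence so that tract lengths can be compared between strains. The following is a minimal Python sketch; the helper name `expand_repeat` is hypothetical and the parser only assumes the notation as it appears in the tables (a unit in parentheses followed by a copy number, or literal bases).

```python
import re

# Either "(UNIT)N", e.g. "(CTTCT)13", or a run of literal bases, e.g. "CTTCC".
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def expand_repeat(notation):
    """Expand table notation such as '(T)9C(G)6T' into a nucleotide string."""
    notation = notation.replace(" ", "")  # tolerate stray spaces from extraction
    parts = []
    pos = 0
    while pos < len(notation):
        match = TOKEN.match(notation, pos)
        if match is None:
            raise ValueError(f"Cannot parse {notation!r} at position {pos}")
        if match.group(1):
            # Repeated unit: unit sequence times copy number.
            parts.append(match.group(1) * int(match.group(2)))
        else:
            # Literal bases between or after repeat units.
            parts.append(match.group(3))
        pos = match.end()
    return "".join(parts)


# porA promoter tract in FA1090 and NCCP11945, as listed in Table 1.
fa1090 = expand_repeat("(T)11C(G)6T")
nccp11945 = expand_repeat("(T)9C(G)6T")
print(len(fa1090), len(nccp11945))
```

For porA, the expanded tracts differ by two bases between FA1090 and NCCP11945, which is the kind of length difference that changes the spacing between promoter elements.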
Table 2. Translational phase variable genes in N. gonorrhoeae.
Gene
FA1090 locus*
Repeat in FA1090*
NCCP11945 locus†
Repeat in NCCP11945†
Repeat in FA19‡
Repeat in FA6140§
Repeat in 35/02ǁ
Repeat in MS11¶
N. gonorrhoeae candidacy#
Reference
pilC 2
NGO0055
(G)13
NGK_0074
(G)9
(G)10
(G)13
(G)9CAGG
(G)12
Known
Jonsson et al ., 1991
opa
NGO0066a
opa
NGO0070
(CT T CT )13C NGK_0096 T T CG (CT T CT )9CT NGK_0102 T CG
(CT T CT )8CT T CC (CT T CT )7CT T CC
(CT T CT )4CT T CC (CT T CT )8CT T CC
CT T (CT T CT )1 0CT T CC (CT T CT )17CT T CC
(CT T CT )6(CT T CC)2 (CT T CT )7CT T CC
(CT T CT )8CT T Known CC (CT T CT )7CT T Known CC
pglH
NGO0086
(C)10
NP
NP
no repeat
no repeat
no repeat
NP
Known
Power et al ., 2003
pglG
NGO0087
A(C)7
NP
NP
A(C)9
A(C)6
A(C)6
NP
Known
Power et al ., 2003
Power et al ., 2003
Adamczyk-Poplawska et al., 2011
pglE
NGO0207
(CAAACAC)4 NGK_0339
(CAAACAC)10 (CAAACAC)6 CAAAT ACCA (CAAACAC)8 CAAAT ACCA (CAAACAC)24 (CAAACAC)15 AACACCAAA Known (CAAAT AC)3 AACACCAAA CAAAT AC (CAAAT AC)3 T ACCAAACA T AC C(CAAAT AC)2
hsdS
NGO_02155
(G)7
(G)7
(G)8
hypothetical
NGO0527
(C)6A(C)9GC NGK_1405
(C)6A(C)8GC
(C)7A(C)4T (C (C)7A(C)4T (C) (C)11A(C)8GC (C)6A(C)14GC Yes )3GC 3GC GC C
modB
NGO0545
(CCCAA)12
(CCCAA)11
(CCCAA)12
NGK_0571
NGK_1384
(G)7
(CCCAA)11
(C)6T T AT CT AACA(G)5
(C)11T T AT CT (C)10T T AT CT Yes AACA(G)8 AACA(G)7
(GCCA)24
(GCCA)19
(GCCA)24
(C)8T T AT CT NGK_1486 AACA(G)7 (CT T CT )16C NGK_0847 T T CC
(C)9T T AT CT AACA(G)6 (CT T CT )16CT T CC
(C)11T T AT CT AACA(G)8 (CT T CT )8CT T CC
(C)9T T AT CT A Yes ACA(G)7 (CT T CT )4CT T Known CC
modA
(GCCA)37
NGK_1272
(CCCAA)7
Known
(CCCAA)4
(C)8T T AT CT NGK_1957 AACA(G)7
NGO0641
(G)7
(C)11T T AT C (C)7T T AT CT T AACA(G)8 AACA(G)7 (GCCA)24GT (GCCA)18 CA (C)9T T AT CT (C)7T T AT CT AACA(G)7 AACA(G)7 (CT T CT )19C (CT T CT )13C T T CC T T CC
replication NGO_06135 initiation factor
replication NGO_06695 initiation factor
(G)9
Known
Known
Stern & Meyer, 1987 Stern & Meyer, 1987
Srikhanta et al ., 2009
Srikhanta et al ., 2009
opa
NGO0950a
hypothetical
NGO0964
(AAGC)4
NGK_0831a
(AAGC)8
(AAGC)15
(AAGC)7
(AAGC)9
(AAGC)7
Yes
virG
NGO0985
(AAGC)3
NGK_0804
(AAGC)3
(AAGC)2
(AAGC)3
(AAGC)3
NGO1040a
opa
NGO1073a
opa
NGO1277a
(CT T CT )12CT T CC (CT T CT )18CT T CC (CT T CT )7CT T CC
NGO1445
opa
NGO1463a
opa
NGO1513
opa
NGO1553a
(CT T CT )10C T T CC (CT T CT )11C T T CC CT T (CT T CT ) 11CT T CC (CAAG)12CA AA (CT T CT )11C T T CC CT T (CT T CT ) 10CT T CC (CT T CT )17C T T CC
(CT T CT )7CT T CC (CT T CT )12CT T CC (CT T CT )11CT T CC
adhesion
(CT T CT )20C T T CC (CT T CT )10C T T CC (CT T CT )7CT T CC (CAAG)12CA AA (CT T CT )7CT T CC (CT T CT )14C T T CC (CT T CT )9CT T CC
(AAGC)3 (CT T CT )13CT T CCCT T CT (C T T CC)2 (CT T CT )7CT T CC (CT T CT )8CT T CC
Yes
opa
(CT T CT )12CT (CT T CT )12CT T CC T CC (CT T CT )6CT T NP CG (CT T CT )8CT T (CT T CT )8CT T CC CC
(CT T CT )10CT Known T CC (CT T CT )7CT T Known CG (CT T CT )14CT Known T CC
autA
NGO1689
(AAGC)3
NGK_2082
(AAGC)3
(AAGC)14
(AAGC)3
(AAGC)3
(AAGC)3
Known
Peak et al ., 1999; Arenas et al ., 2015
pgtA
NGO1765
(G)11
NGK_2516
GGGAGCGGG (G)19
(G)19
GGGAGCGGG
GGGAGCGGG
Known
Banerjee et al ., 2002
(CT T CT )20C T T CC (CT T CT )2CT T CC (CT T CT )11C T T CC (CAAG)20CA AA (CT T CT )10C T T CC (CT T CT )12C T T CG (CT T CT )4CT T CC
NGK_0749 NGK_0693 NGK_1495 NGK_1705 NGK_1729 NGK_1799 NGK_1847
Stern & Meyer, 1987
Known
Stern & Meyer, 1987
Known
Stern & Meyer, 1987
Known
Stern & Meyer, 1987
(CAAG)6CAAA (CAAG)9CAAA (CAAG)6CAAA Yes
repetitive large NGO_09875 & NGK_2422 & surface (G)8 (G)7 (G)7 (G)7 (G)7 (G)7 Yes NGO_09870** NGK_2423** lipoprotein (CT T CT )13C (CT T CT )11C (CT T CT )13C (CT T CT )2CT T (CT T CT )13CT (CT T CT )30CT opa NGO1861a NGK_2410 Known T T CC T T CC T T CC CC T CC T CC
Stern & Meyer, 1987 Stern & Meyer, 1987 Stern & Meyer, 1987
Stern & Meyer, 1987
pilC 1
NGO1912
(G)11
NGK_2342
(G)13
(G)11
GGGC(G)11
(G)15
(G)11
Known
hypothetical
NGO1953
(C)8
NGK_2297
(C)8
(C)8
(C)8
(C)9
(C)8
Yes
pyrimidine 5'-nucleotidase
NGO2055 & NGO2054**
(C)6
NGK_2176
CAAACCCC
CAAACCCC
CAAACCCC
(C)9
(C)10
Yes
NGO2060a
(CT T CT )10C NP T T CG
NP
NP
(CT T CT )6CT T (CT T CT )7CT T Known CC CC
Stern & Meyer, 1987
(CT T CT )14C NP T T CC
opa
NGK_2170
Jonsson et al., 1991
opa
NP
NP
NP
NP
Known
Stern & Meyer, 1987
lgtG
NGO2072
(C)11
NGK_2534 & (C)12 NGK_2533**
(C)10
(C)10
(C)10
(C)10
Known
Mackinnon et al ., 2002
hpuA
NGO2110
(G)9
NGK_2581
(G)10
(G)9
(G)9
(G)8
(G)8
Known
Chen et al ., 1998
lgtA
NGO11610
(G)11
NGK_2630
(G)11
(G)14A
(G)17A
(G)20A
(G)10A
Known
Erwin et al ., 1996
lgtC
NGO2156
(G)14
NGK_2632
(G)13
(G)13
(G)10
(G)16
(G)8
Known
Shafer et al ., 2002
lgtD
NGO2158
A(G)14
NGK_2634
A(G)16
(G)13
A(G)12
A(G)18
A(G)13
Known
Shafer et al ., 2002
* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).
† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).
‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).
§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).
ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).
¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).
# Gene phase variation candidacy in N. gonorrhoeae. Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation.
** This CDS appears to be frame-shifted and annotated as two CDSs.
NP The CDS is not present in this strain.
Table 3. C-terminal phase variable genes in N. gonorrhoeae.
Gene
FA1090 locus*
Repeat in FA1090*
NCCP11945 locus†
Repeat in NCCP11945†
Amino acids after the repeat in each frame of NCCP11945‡
FA19§
FA6140ǁ
35/02¶
MS11#
N. gonorrhoeae candidacy**
ispH
NGO0072
(G)8
NGK_0106
(G)8
26
50
membrane protein
NGO0691
(G)6
NGK_1211
(G)7
23
8
61
(G)8
(G)8
(G)8
(G)8
(yes)
186^
(G)7
(G)6
(G)7
(G)7
mafB cassette
NGO1386
(C)7
NGK_1624
(C)9
46
Yes
2
37
deletion††
(C)10
(C)8
(C)10
pilin cassette
NGO_11140 CCGC
NGK_2161
(C)8
20
Yes
8
91¶
(C)5GCC
CCGCC
(C)5
CCT (C)5
Yes
* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).
† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).
‡ Each column gives the number of amino acids encoded 3' of the repeat before the closest termination codon in each of the three reading frames.
§ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).
ǁ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).
¶ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).
# From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).
** Gene phase variation candidacy in N. gonorrhoeae. Yes – there is evidence of repeat tract variation between strains supporting phase variation. (yes) – although there is no variation between strains investigated here, tracts of this length vary on other genes (Chen et al., 1998; Shafer et al., 2002; Adamczyk-Poplawska et al., 2011).
†† There is a 400 bp deletion in this strain encompassing the region that would contain this repeat.
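The per-frame counts described in the footnote on amino acids after the repeat can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the toy downstream sequence are hypothetical, frames are numbered by their offset from the first base after the repeat, and the code is not the procedure used to generate Table 3.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def residues_before_stop(seq, offset):
    """Codons read from `offset` until the closest in-frame stop codon."""
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - offset) // 3
    return None  # no stop codon in this frame within the given sequence


def tail_lengths(downstream):
    """Residue counts 3' of the repeat for the three reading frames."""
    return tuple(residues_before_stop(downstream, frame) for frame in range(3))


# Hypothetical sequence immediately 3' of a repeat tract.
toy_downstream = "CCCCCTAACTGACCCCTAG"
print(tail_lengths(toy_downstream))  # (3, 5, 1)
```

Each change of one base in the repeat tract moves translation to the next of these three frames, so the protein acquires a C-terminal tail of a different length and sequence.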
Table 4. Genes for which there is no evidence of phase variation in N. gonorrhoeae.

| Gene | FA1090 locus* | Repeat in FA1090* | NCCP11945 locus† | Repeat in NCCP11945† | Repeat in FA19‡ | Repeat in FA6140§ | Repeat in 35/02ǁ | Repeat in MS11¶ | N. gonorrhoeae candidacy# |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Prolyl endopeptidase | NGO0026 | GGGGCGG | NGK_0034 | GGGGCGG | GGGGCGG | GGGGCGG | GGGGCGG | GGGGCGG | No. Replacement tract |
| pilI / wbpC | NGO0065 | C(G)6 | NGK_0089** | C(G)6 | C(G)6 | C(G)6 | C(G)6 | C(G)6 | No. No variation. |
| phosphoesterase | NGO0081 | (C)7 | NGK_0119 | (C)7 | (C)7 | (C)7 | (C)7 | (C)7 | No. No variation. |
| hypothetical | NGO0121 | (A)6 | NGK_0167 | (A)6 | (A)6 | (A)6 | (A)6 | (A)6 | No. No variation. |
| cvaA | NGO0123 | (C)4 | NGK_0168 | (C)4 | (C)4 | (C)4 | (C)4 | (C)4 | No. No variation. |
| potD-2 | NGO0206 | AA(C)5 | NGK_0338 | AA(C)5 | AA(C)5 | AA(C)5 | AA(C)5 | AA(C)5 | No. No variation. |
| hypothetical | NGO0532 | AACCGGCAAACA | NGK_1400 | AACCGGCAAACA | AACCGGCAAACA | AACCGGCAAACA | AACCGGCAAACA | AACCGGCAAACA | No. Replacement tract |
| nifS | NGO0636 | CCACACCC | NGK_1278 | CCACACCC | CCACACCC | CCACACCC | CCACACCC | CCACACCC | No. Replacement tract |
| lldD | NGO0639 | (G)7 | NGK_1275 | (G)7 | (G)7 | (G)7 | (G)7 | (G)7 | No. No variation. |
| methylase NlaIV | NGO0676 | (A)9 | NGK_1230 | (A)9 | (A)9 | (A)9 | (A)9 | (A)9 | No. No variation. |
| dnaX | NGO0743 | (C)7 | NGK_1135 | (C)7 | (C)7 | (C)7 | (C)7 | (C)7 | No. No variation. |
| mobA | NGO0754 | GGAAGG | NGK_1123 | GGAAGG | GGAAGG | GGAAGG | GGAAGG | GGAAGG | No. Replacement tract |
| ppx | NGO1041 | (C)7 | NGK_0745 | (C)7 | (C)7 | (C)7 | (C)7 | (C)7 | No. No variation. |
| fixP / ccoP | NGO1371 | (AT)5 | NGK_1607 | (AT)5 | (AT)5 | (AT)5 | (AT)5 | (AT)5 | No. No variation. |
| hypothetical | NGO1384 | G(A)7 | NGK_1622 | (A)8 | (A)8 | (A)8 | (A)8 | G(A)7 | No. No variation in length. |
| pntA | NGO1470 | CCCTGCTGG | NGK_1735 | CCCTGCTGG | CCCTGCTGG | CCCTGCTGG | CCCTGCTGG | CCCTGCTGG | No. Replacement tract |
| amiC | NGO1501 | TTCGCCC | NGK_1783 | TTCGCCC | TTCGCCC | TTCGCCC | TTCGCCC | TTCGCCC | No. Replacement tract |
| dca | NGO1540 | TGTGGGGG | NGK_1830 | TGTGGGGG | TGTGGGGG | TGTGGGGG | TGTGGGGG | TGTGGGGG | No. Replacement tract |
| anmK | NGO1583 | (C)7 | NGK_1884 | (C)7 | (C)7 | (C)7 | (C)7 | (C)7 | No. No variation. |
| dinG | NGO1708 | (C)4TCC | NGK_2106 | (C)4TCC | (C)4TCC | (C)4TCC | (C)4TCC | (C)4TCC | No. Replacement tract |
| rplK | NGO1855 | (C)7 | NGK_2416 | (C)7 | (C)7 | (C)7 | (C)7 | (C)7 | No. No variation. |
| hypothetical | NGO1970 | (TA)5 | NGK_2274 | (TA)5 | (TA)5 | (TA)5 | (TA)5 | (TA)5 | No. No variation. |
| mafA-3 | NGO1972 | (G)5 | NGK_2270 | (G)5 | (G)5 | (G)5 | (G)5 | (G)5 | No. No variation. |
| map | NGO1983 | (C)6 | NGK_2258 | (C)6 | (C)6 | (C)6 | (C)6 | (C)6 | No. No variation. |
| plsX | NGO2171 | (TTCC)3 | NGK_2652 | (TTCC)3 | (TTCC)3 | (TTCC)3 | (TTCC)3 | (TTCC)3 | No. No variation. |
| lbpA | NGO0260a | NR | NGK_0401 | GGGGGCGG | GGGGGCGG | NR | NR | TGAAACGG | No. Replacement tract |

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).
† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).
‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).
§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).
ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).
¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).
# Gene phase variation candidacy in N. gonorrhoeae. No. Replacement tract – due to the replacement of the repeat tract with other nucleotides, this is not phase variable. No. No variation – due to no observed variation in the repeat tract, this is not phase variable. No. No variation in length – due to the equal length tract in all strains, this is not phase variable.
** This CDS contains a point mutation, which generates a premature termination codon.
NR The region of the CDS containing the repeat tract does not have homology to the aligned region in these strains.
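The length-based part of the candidacy footnote can be expressed as a short Python sketch. It is an illustration only: the function name is hypothetical, the input is assumed to be the already expanded tract sequence from each strain, and the "Replacement tract" and "Known" categories are not modelled because they rest on information beyond a comparison of tract lengths.

```python
def classify_tracts(tracts):
    """Classify one gene from its repeat-region sequences, one per strain."""
    if len(set(tracts)) == 1:
        # Identical sequence in every strain.
        return "No. No variation."
    if len({len(t) for t in tracts}) == 1:
        # Sequences differ, but every tract has the same length.
        return "No. No variation in length."
    # Tract length differs between strains, supporting phase variation.
    return "Yes"


# NGO1384 / NGK_1622 from Table 4: G(A)7 in two strains, (A)8 in four.
print(classify_tracts(["GAAAAAAA", "AAAAAAAA", "AAAAAAAA",
                       "AAAAAAAA", "AAAAAAAA", "GAAAAAAA"]))

# Lipoprotein NGO2047 / NGK_2186 from Table 1: (A)9 or (A)8 depending on strain.
print(classify_tracts(["A" * 9, "A" * 8, "A" * 8, "A" * 9, "A" * 9, "A" * 8]))
```

A difference of equal length, as in NGO1384, leaves the reading frame unchanged, which is why it is not counted as evidence of phase variation.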
Table 5. Genes for which there is sequencing-based evidence of phase variation in N. gonorrhoeae strain NCCP11945.
Gene
FA1090 locus*
Repeat in FA1090*
NCCP11945 locus†
Repeat in NCCP11945†
Repeat in FA19‡
Repeat in FA6140§
Repeat in 35/02ǁ
Repeat in MS11¶
N. gonorrhoeae candidacy#
Ion Torrent**
Illumina††
ispH
NGO0072
(G)8
NGK_0106
(G)8
(G)8
(G)8
(G)8
(G)8
(yes)
Repeat varies
virG
NGO0985
(AAGC)3
NGK_0804
(AAGC)3
(AAGC)2
(AAGC)3
(AAGC)3
(AAGC)3
Yes
Repeat does not vary
Repeat does not vary Repeat does not vary
hypothetical
NGO0964
(AAGC)4
NGK_0831a
(AAGC)8
(AAGC)15
(AAGC)7
(AAGC)9
(AAGC)7
Yes
Repeat varies
Repeat varies
membrane protein
NGO0691
(G)6
NGK_1211
(G)7
(G)7
(G)6
(G)7
(G)7
Yes
Repeat varies
hypothetical
NGO0527
(C)6A(C)9GC NGK_1405
(C)6A(C)8GC
(C)7A(C)4T ( (C)7A(C)4T ( (C)11A(C)8G (C)6A(C)14GC Yes C)3GC C)3GC CGC C
Repeat varies
replication initiation factor
NGO_06695
(C)8T T AT CT NGK_1486 AACA(G)7
(C)9T T AT C T AACA(G)7
(C)7T T AT C T AACA(G)7
(C)9T T AT C T AACA(G)6
(C)11T T AT C (C)9T T AT CT Yes T AACA(G)8 AACA(G)7
Repeat varies
Repeat varies
mafB cassette NGO1386
(C)7
(C)9
deletion‡‡
(C)10
(C)8
Yes
Repeat varies
Repeat varies
adhesion
NGO1445
(CAAG)20CA NGK_1705 AA
(CAAG)12CA (CAAG)12CA (CAAG)6CAA (CAAG)9CAA (CAAG)6CAA Yes AA AA A A A
Repeat varies
Repeat varies
replication initiation factor
NGO_06135
(C)8T T AT CT NGK_1957 AACA(G)7
(C)11T T AT C (C)7T T AT C T AACA(G)8 T AACA(G)7
(C)6T T AT C T AACA(G)5
(C)11T T AT C (C)10T T AT C Yes T AACA(G)8 T AACA(G)7
Repeat varies
Repeat varies
pilin cassette
NGO_11140
NGK_1624
(C)10
Repeat does not vary Repeat does not vary
CCGC
NGK_2161
(C)8
(C)5GCC
CCGCC
(C)5
CCT (C)5
Yes
Repeat varies
Repeat varies
pyrimidine 5'- NGO2055 & nucleotidase NGO2054ǁǁ
(C)6
NGK_2176
CAAACCCC
CAAACCCC
CAAACCCC
(C)9
(C)10
Yes
No repeat
No repeat
lipoprotein
NGO2047
(A)9
NGK_2186§§ (A)8
(A)8
(A)9
(A)9
(A)8
Yes
Repeat varies
Repeat varies
hypothetical
NGO1953
(C)8
NGK_2297
(C)8
(C)8
(C)8
(C)9
(C)8
Yes
Repeat varies
Repeat does not vary
repetitive large surface lipoprotein
NGO_09875 & (G)8 NGO_09870ǁǁ
NGK_2422 & (G)7 NGK_2423ǁǁ
(G)7
(G)7
(G)7
(G)7
Yes
Repeat varies
Repeat does not vary
fetA / frpB
NGO2093
(C)13
NGK_2557
(C)14
(C)10
(C)11
(C)11
(C)11
Known
Repeat varies
Repeat varies
pilC 1
NGO1912
(G)11
NGK_2342
(G)13
(G)11
GGGC(G)11
(G)15
(G)11
Known
Repeat varies
Repeat varies
NGO0950a
(CT T CT )16C NGK_0847 T T CC
(CT T CT )19C (CT T CT )13C (CT T CT )16C (CT T CT )8CT (CT T CT )4CT Known T T CC T T CC T T CC T CC T CC
No reads Repeat varies through repeat
opa
* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).
† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).
‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).
§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).
ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).
¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).
# Gene phase variation candidacy in N. gonorrhoeae. Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation. (yes) – although there is no variation between strains investigated here, tracts of this length vary on other genes (Chen et al., 1998; Shafer et al., 2002; Adamczyk-Poplawska et al., 2011).
** Based on Ion Torrent sequencing data from cultures grown with passage for 8 weeks and from cultures grown with passage for 20 weeks (accession numbers SRR3547950, SRR3547951, SRR3547952, SRR3547953).
†† Based on Illumina sequencing data from cultures grown with passage for 20 weeks (accession numbers SRR3547954, SRR3547955, SRR3547956, SRR3547957).
‡‡ There is a 400 bp deletion in this strain encompassing the region that would contain this repeat.
§§ NGK_2186 and NGO2047 annotations are on opposite strands.
ǁǁ This CDS appears to be frame-shifted and annotated as two CDSs.
Supplementary Table 1. Complete list of candidate phase variable genes including locus identifiers for all six N. gonorrhoeae complete genomes investigated.
Gene
FA1090 locus*
Repeat in FA1090*
NCCP11945 locus†
Repeat in NCCP11945†
FA19 locus‡
Repeat in FA19‡
FA6140 locus§
porA
NGO_04715
(T)11C(G)6T
NGK_0906 & NGK_0907**
(T)9C(G)6T
VT05_RS03585
(T)8C(G)8T
WX60_RS01320
lipoprotein
NGO2047
(A)9
NGK_2186††
(A)8
VT05_RS09915
(A)8
WX60_RS08685
fetA / frpB
NGO2093
(C)13
NGK_2557
(C)14
VT05_RS10220
(C)10
WX60_RS08950
pilC 2
NGO0055
(G)13
NGK_0074
(G)9
VT05_RS10940
(G)10
WX60_RS09660
opa
NGO0066a
(CTTCT)13CTT NGK_0096 CG
(CTTCT)8CTTC VT05_RS11020 C
(CTTCT)4CT WX60_RS09735 TCC
opa
NGO0070
(CTTCT)9CTTC NGK_0102 G
(CTTCT)7CTTC VT05_RS11040 C
(CTTCT)8CT WX60_RS09755 TCC
pglH
NGO0086
(C)10
NP
NP
VT05_RS11130
no repeat
WX60_RS09845
pglG
NGO0087
A(C)7
NP
NP
VT05_RS11135
A(C)9
WX60_RS09850
pglE
NGO0207
(CAAACAC)4
NGK_0339
(CAAACAC)8(C VT05_RS11775 AAATAC)3
(CAAACAC)6 CAAATACC WX60_RS10475 AAACACCA AATAC
hsdS
NGO_02155
(G)7
NGK_0571
(G)7
VT05_RS00760
(G)8
hypothetical
NGO0527
(C)6A(C)9GC
NGK_1405
(C)6A(C)8GC
VT05_RS01375
(C)7A(C)4T(C WX60_RS03235 )3GC
modB
NGO0545
(CCCAA)12
NGK_1384
(CCCAA)11
VT05_RS01460
(CCCAA)12
replication initiation factor
NGO_06135
(C)8TTATCTAA NGK_1957 CA(G)7
(C)11TTATCTA VT05_RS04875 ACA(G)8
(C)7TTATCT WX60_RS03635 AACA(G)7
modA
NGO0641
(GCCA)37
(GCCA)18
VT05_RS01960
(GCCA)24GT WX60_RS02640 CA
replication initiation factor
NGO_06695
(C)8TTATCTAA NGK_1486 CA(G)7
(C)9TTATCTAA VT05_RS05440 CA(G)7
(C)7TTATCT WX60_RS04195 AACA(G)7
opa
NGO0950a
(CTTCT)16CTT NGK_0847 CC
(CTTCT)19CTT VT05_RS03845 CC
(CTTCT)13CT WX60_RS01060 TCC
hypothetical
NGO0964
(AAGC)4
NGK_0831a
(AAGC)8
VT05_RS03910
(AAGC)15
WX60_RS00995
virG
NGO0985
(AAGC)3
NGK_0804
(AAGC)3
VT05_RS04020
(AAGC)2
WX60_RS00880
opa
NGO1040a
(CTTCT)20CTT NGK_0749 CC
(CTTCT)20CTT VT05_RS04320 CC
(CTTCT)10CT WX60_RS00610 TCC
opa
NGO1073a
(CTTCT)2CTTC NGK_0693 C
(CTTCT)10CTT VT05_RS04555 CC
(CTTCT)11CT WX60_RS00375 TCC
opa
NGO1277a
(CTTCT)11CTT NGK_1495 CC
(CTTCT)7CTTC VT05_RS05475 C
CTT(CTTCT) WX60_RS04230 11CTTCC
adhesion
NGO1445
(CAAG)20CAAA NGK_1705
(CAAG)12CAAA VT05_RS06410
(CAAG)12CA WX60_RS05175 AA
opa
NGO1463a
(CTTCT)10CTT NGK_1729 CC
(CTTCT)7CTTC VT05_RS06485 C
(CTTCT)11CT WX60_RS05265 TCC
opa
NGO1513
(CTTCT)12CTT NGK_1799 CG
(CTTCT)14CTT VT05_RS06740 CC
CTT(CTTCT) WX60_RS05520 10CTTCC
opa
NGO1553a
(CTTCT)4CTTC NGK_1847 C
(CTTCT)9CTTC VT05_RS06965 C
(CTTCT)17CT WX60_RS05745 TCC
NGK_1272
WX60_RS11515
WX60_RS03145
autA
NGO1689
(AAGC)3
NGK_2082
(AAGC)3
VT05_RS07825
(AAGC)14
WX60_RS06615
pgtA
NGO1765
(G)11
NGK_2516
GGGAGCGGG
VT05_RS08235
(G)19
WX60_RS07040
NGK_2422 & NGK_2423**
(G)7
VT05_RS08730
(G)7
WX60_RS07535
NGO_09875 repetitive large & surface (G)8 NGO_09870* lipoprotein * opa
NGO1861a
(CTTCT)13CTT NGK_2410 CC
(CTTCT)11CTT VT05_RS08825 CC
(CTTCT)13CT WX60_RS07630 TCC
pilC 1
NGO1912
(G)11
NGK_2342
(G)13
VT05_RS09130
(G)11
WX60_RS07935
hypothetical
NGO1953
(C)8
NGK_2297
(C)8
VT05_RS09330
(C)8
WX60_RS08135
pyrimidine 5'- NGO2055 & nucleotidase NGK2054
(C)6
NGK_2176
CAAACCCC
VT05_RS09960
CAAACCCC
WX60_RS08730
opa
NGO2060a
(CTTCT)10CTT NP CG
NP
NP
NP
NP
opa
NP
NP
NGK_2170
(CTTCT)14CTT NP CC
NP
NP
lgtG
NGO2072
(C)11
NGK_2534 & NGK_2533**
(C)12
VT05_RS10115
(C)10
WX60_RS08845
hpuA
NGO2110
(G)9
NGK_2581
(G)10
VT05_RS10310
(G)9
WX60_RS09040
lgtA
NGO11610
(G)11
NGK_2630
(G)11
VT05_RS10530
(G)14A
WX60_RS09250
lgtC
NGO2156
(G)14
NGK_2632
(G)13
VT05_RS10540
(G)13
WX60_RS09260
lgtD
NGO2158
A(G)14
NGK_2634
A(G)16
VT05_RS10550
(G)13
WX60_RS09270
ispH
NGO0072
(G)8
NGK_0106
(G)8
VT05_RS11050
(G)8
WX60_RS09765
membrane protein
NGO0691
(G)6
NGK_1211
(G)7
VT05_RS02225
(G)7
WX60_RS02380
mafB cassette NGO1386
(C)7
NGK_1624
(C)9
VT05_RS06060
deletion‡‡
WX60_RS04830
pilin cassette
CCGC
NGK_2161
(C)8
VT05_RS10015
(C)5GCC
WX60_RS08810
NGO_11140
* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).
† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).
‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).
§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).
ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).
¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).
# Gene phase variation candidacy in N. gonorrhoeae. Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation. (yes) – although there is no variation between strains investigated here, tracts of this length vary on other genes (Chen et al., 1998; Shafer et al., 2002; Adamczyk-Poplawska et al., 2011).
** This CDS appears to be frame-shifted and annotated as two CDSs.
†† NGK_2186 and NGO2047 annotations are on opposite strands.
‡‡ There is a 400 bp deletion in this strain encompassing the region that would contain this repeat.
NP The CDS is not present in this strain.
Repeat in FA6140§
35/02 locusǁ
Repeat in 35/02ǁ
MS11 locus¶
Repeat in MS11¶
N. gonorrhoeae candidacy#
Reference
(T)9C(G)6T
WX61_RS10730
(T)9C(G)6T
NGFG_RS05045
(T)9C(G)6TT
Known
van der Ende et al ., 1995
(A)9
WX61_RS04400
(A)9
NGFG_RS11385
(A)8
Yes
(C)11
WX61_RS06255
(C)11
NGFG_RS11700
(C)11
Known
Carson et al ., 2000
(G)13
WX61_RS06975
(G)9CAGG
NGFG_RS00285
(G)12
Known
Jonsson et al ., 1991
CTT(CTTCT) WX61_RS07050 10CTTCC
(CTTCT)6(CT NGFG_RS00365 TCC)2
(CTTCT)8CTTC Known C
Stern & Meyer, 1987
(CTTCT)17CT WX61_RS07070 TCC
(CTTCT)7CT NGFG_RS00385 TCC
(CTTCT)7CTTC Known C
Stern & Meyer, 1987
no repeat
WX61_RS07170
no repeat
NP
NP
Known
Power et al ., 2003
A(C)6
WX61_RS07175
A(C)6
NP
NP
Known
Power et al ., 2003
(CAAACAC)1 0CAAATACC AAACACCA WX61_RS08150 AATACCAA ACAC(CAAA TAC)2
(CAAACAC)2 NGFG_RS01110 4CAAATAC
(CAAACAC)15( Known CAAATAC)3
Power et al ., 2003
(G)9
(G)7
(G)7
Adamczyk-Poplawska et al., 2011
WX61_RS09195
NGFG_RS02160
Known
(C)7A(C)4T(C WX61_RS00895 )3GC
(C)11A(C)8G NGFG_RS02800 CGC
(C)6A(C)14GCC Yes
(CCCAA)4
(CCCAA)11
(CCCAA)7
WX61_RS00800
NGFG_RS02880
Known
(C)6TTATCT WX61_RS03490 AACA(G)5
(C)11TTATC NGFG_RS06490 TAACA(G)8
(C)10TTATCTA Yes ACA(G)7
(GCCA)24
(GCCA)19
(GCCA)24
Known
Yes
WX61_RS00300
NGFG_RS03385
(C)9TTATCT WX61_RS01270 AACA(G)6
(C)11TTATC NGFG_RS07045 TAACA(G)8
(C)9TTATCTA ACA(G)7
(CTTCT)16CT WX61_RS10470 TCC
(CTTCT)8CT NGFG_RS05305 TCC
(CTTCT)4CTTC Known C
(AAGC)7
WX61_RS10405
(AAGC)9
NGFG_RS05375
(AAGC)7
Yes
(AAGC)3
WX61_RS10290
(AAGC)3
NGFG_RS05480
(AAGC)3
Yes
Srikhanta et al ., 2009
Srikhanta et al ., 2009
Stern & Meyer, 1987
(CTTCT)7CT WX61_RS10010 TCC
(CTTCT)12CT NGFG_RS05770 TCC
(CTTCT)13CTT CCCTTCT(CTT Known CC)2
Stern & Meyer, 1987
(CTTCT)12CT WX61_RS09790 TCC
(CTTCT)18CT NGFG_RS06015 TCC
(CTTCT)7CTTC Known C
Stern & Meyer, 1987
(CTTCT)11CT WX61_RS01310 TCC
(CTTCT)7CT NGFG_RS07085 TCC
(CTTCT)8CTTC Known C
Stern & Meyer, 1987
(CAAG)6CAA WX61_RS02245 A
(CAAG)9CAA NGFG_RS08015 A
(CAAG)6CAAA Yes
(CTTCT)12CT WX61_RS02330 TCC
(CTTCT)12CT NGFG_RS08090 TCC
(CTTCT)10CTT Known CC
Stern & Meyer, 1987
NP
WX61_RS02595
(CTTCT)6CT NGFG_RS08385 TCG
(CTTCT)7CTTC Known G
Stern & Meyer, 1987
(CTTCT)8CT WX61_RS02820 TCC
(CTTCT)8CT NGFG_RS08610 TCC
(CTTCT)14CTT Known CC
Stern & Meyer, 1987
(AAGC)3
Known
Peak et al ., 1999; Arenas et al ., 2015
GGGAGCGG NGFG_RS09750 G
GGGAGCGGG
Known
Banerjee et al ., 2002
(G)7
(G)7
Yes
(AAGC)3
WX61_RS03920
(AAGC)3
(G)19
WX61_RS06075
(G)7
WX61_RS05585
NGFG_RS09330
NGFG_RS10240
(CTTCT)2CT WX61_RS05490 TCC
(CTTCT)13CT NGFG_RS10330 TCC
(CTTCT)30CTT Known CC
Stern & Meyer, 1987
GGGC(G)11
WX61_RS05130
(G)15
NGFG_RS10640
(G)11
Known
Jonsson et al., 1991
(C)8
WX61_RS04935
(C)9
NGFG_RS10845
(C)8
Yes
CAAACCCC
WX61_RS04355
(C)9
NGFG_RS11430
(C)10
Yes
NP
WX61_RS04325
(CTTCT)6CT NP TCC
NP
Known
Stern & Meyer, 1987
NP
NP
NP
NP
NP
Known
Stern & Meyer, 1987
(C)10
WX61_RS06150
(C)10
NGFG_RS11595
(C)10
Known
Mackinnon et al ., 2002
(G)9
WX61_RS06345
(G)8
NGFG_RS11790
(G)8
Known
Chen et al ., 1998
(G)17A
WX61_RS06565
(G)20A
NGFG_RS12010
(G)10A
Known
Erwin et al ., 1996
(G)10
WX61_RS06575
(G)16
NGFG_RS12020
(G)8
Known
Shafer et al ., 2002
A(G)12
WX61_RS06585
A(G)18
NGFG_RS12025
A(G)13
Known
Shafer et al ., 2002
(G)8
WX61_RS07090
(G)8
NGFG_RS00400
(G)8
(yes)
(G)6
WX61_RS00040
(G)7
NGFG_RS03645
(G)7
Yes
(C)10
WX61_RS01905
(C)8
NGFG_RS07670
(C)10
Yes
CCGCC
WX61_RS04290
(C)5
NGFG_RS11490
CCT(C)5
Yes
Supplementary Table 2. Next-generation sequencing read data for candidate phase variable genes in N. gonorrhoeae.
Gene
FA1090 locus*
Repeat in FA1090*
NCCP11945 locus†
Repeat in NCCP11945†
Repeat in FA19‡
Repeat in FA6140§
Repeat in 35/02ǁ
ispH
NGO0072
(G)8
NGK_0106
(G)8
(G)8
(G)8
(G)8
virG
NGO0985
(AAGC)3
NGK_0804
(AAGC)3
(AAGC)2
(AAGC)3
(AAGC)3
hypothetical
NGO0964
(AAGC)4
NGK_0831a
(AAGC)8
(AAGC)15
(AAGC)7
(AAGC)9
membrane protein
NGO0691
(G)6
NGK_1211
(G)7
(G)7
(G)6
(G)7
hypothetical
NGO0527
(C)6A(C)9GC
NGK_1405
(C)6A(C)8GC
(C)7A(C)4T(C) (C)7A(C)4T(C) (C)11A(C)8GCG 3GC 3GC C
replication initiation factor
NGO_06695
(C)8TTATCTA NGK_1486 ACA(G)7
(C)9TTATCTA (C)7TTATCTA (C)9TTATCTA (C)11TTATCTA ACA(G)7 ACA(G)7 ACA(G)6 ACA(G)8
mafB cassette
NGO1386
(C)7
(C)9
adhesion
NGO1445
(CAAG)20CAA NGK_1705 A
(CAAG)12CAA (CAAG)12CAA (CAAG)6CAA (CAAG)9CAAA A A A
replication initiation factor
NGO_06135
(C)8TTATCTA NGK_1957 ACA(G)7
(C)11TTATCTA (C)7TTATCTA (C)6TTATCTA (C)11TTATCTA ACA(G)8 ACA(G)7 ACA(G)5 ACA(G)8
pilin cassette
NGO_11140
CCGC
NGK_2161
(C)8
(C)5GCC
CCGCC
(C)5
pyrimidine 5'nucleotidase
NGO2055 & NGK2054
(C)6
NGK_2176
CAAACCCC
CAAACCCC
CAAACCCC
(C)9
lipoprotein
NGO2047
(A)9
NGK_2186***
(A)8
(A)8
(A)9
(A)9
hypothetical
NGO1953
(C)8
NGK_2297
(C)8
(C)8
(C)8
(C)9
NGK_1624
deletion##
(C)10
(C)8
repetitive large surface lipoprotein
NGO_09875 & NGO_09870†††
(G)8
NGK_2422 & NGK_2423†††
(G)7
(G)7
(G)7
(G)7
fetA / frpB
NGO2093
(C)13
NGK_2557
(C)14
(C)10
(C)11
(C)11
pilC 1
NGO1912
(G)11
NGK_2342
(G)13
(G)11
GGGC(G)11
(G)15
opa
NGO0950a
(CTTCT)16CT NGK_0847 TCC
(CTTCT)19CTT (CTTCT)13CT (CTTCT)16CT (CTTCT)8CTTC CC TCC TCC C
* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2). † From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1). ‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1). § From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1). ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1). ¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1). # Gene phase variation candidacy in N. gonorrhoeae . Known – phase variation has been reported in the literature. ** From Ion Torrent sequencing data of 8 week cultures (accession number SRR3547950). For each repeat identifie †† From Ion Torrent sequencing data of 8 week cultures (accession number SRR3547951). For each repeat identifie ‡‡ From Ion Torrent sequencing data of 20 week cultures (accession number SRR3547952). For each repeat identif §§ From Ion Torrent sequencing data of 20 week cultures (accession number SRR3547953). For each repeat identif ǁǁ From Illumina sequencing data of 20 week cultures (accession numbers SRR3547954 & SRR3547955). These are ¶¶ From Illumina sequencing data of 20 week cultures (accession numbers SRR3547956 & SRR3547957). These are ## There is a 400 bp deletion in this strain encompassing the region that would contain this repeat. *** NGK_2186 and NGO2047 annotations are on opposite strands. ††† This CDS appears to be frame-shifted and annotated as two CDSs.
genes in N. gonorrhoeae.
Repeat in MS11¶
N. gonorrhoeae candidacy#
KU1-4 (8 wks)**
(G)8
(yes)
(G)8(4F,4R), (G)7(6F,8R), (G)10(1R)
(AAGC)3
Yes
(AAGC)3(8F,9R), AAGCAAAGCAAGC(1R)
(AAGC)7
Yes
(AAGC)8(1F,1R)
(AAGC)8(8F), (AAGC)2C(AAGC)6(1F), (AAGC)7(1F)
(G)7
Yes
(G)7(5F,5R), (G)6(2F,3R), (G)8(1R)
(G)7(8F,8R), (G)6(2F,1R), (G)8(1F,3R)
(C)6A(C)14GCC
Yes
(C)6A(C)7(1F), (C)5A(C)8(2F), (C)5A(C)7(3F), (C)6A(C)8(1F), (C)5A(C)5(1R)
(C)6A(C)8(3F), CT(C)4A(C)9(1F), (C)6A(C)9(3F), (C)5A(C)7(1F), (C)5A(C)8(2F), (C)6(1F)
(C)9TTATCTAACA(G)7
Yes
(C)9&(G)7(2F,5R), (C)8&(G)7(3R), (C)8&(G)5(1R), (C)8(1R), (G)7(3F,4R), (C)9&(G)6(2F), (C)9&(G)8(1R), (C)10&(G)7(1R), (C)7&(G)6(1R), (C)7&(G)7(1R)
(C)9&(G)9(1F), (G)7(1F), (C)9&(G)7(5F,4R), (C)10&(G)7(1F, 3R), (C)10&(G)8(2F), (C)8&(G)7(3F), (C)10(1F), (C)9&(G)6(3R), (C)10&(G)6(1R), (G)6(1R), (C)11&(G)7(2R), (C)9(1R), (C)9&(G)8(1F)
(C)10
(C)8(9F), (C)9(6F,3R)
(C)10(5F), (C)7(2R), (C)8(1F), (C)9(2F, 1R)
(CAAG)12CAAA(4R), (CAAG)9CAAA(1F), (CAAG)12CAA(2F)
(CAAG)12ACTAAA(1R), (CAAG)8CAAAG(CAAG)2CAAAGCAA(1F), (CAAG)11CAAAGCAA(2F), (CAAG)12CAA(3F,1R), (CAAG)12ACAAA(2R), (CAAG)3CAAAG(CAAG)4CAAAG(CCAG)2CAAAGCAAA(1F)
(C)10TTATCTAACA(G)7
Yes
(C)10TTATCTAACA(G)7(1F), (C)12TTATCTAACA(G)7(1F), (C)9TTATCTAACA(G)5(1F), (C)11TTATCTAACA(G)8(1F), (C)10TTATCTAACA(G)6(1F), (C)10(1F), (G)8(3R), (G)6(1F)
(C)13TTATCTAACA(G)8(1R), (C)12TTATCTAACA(G)9(4F,4R), (C)9TTATCTAACA(G)8(1F), (C)11TTATCTAACA(G)8(3F,1R), (C)12TTATCTAACA(G)8(7R), (C)11TTATCTAACA(G)9(3F), (C)10TTATCTAACA(G)8(1F,2R), (C)11TTATCTAACA(G)7(1R), (C)12TTATCTAACA(G)10(1F), (G)9(2F,1R), (G)8(2F,1R)
CCT(C)5
Yes
(C)8(3F, 4R), (C)7(3F, 4R), (C)6(1F)
(C)8(11R), (C)7(3R), (C)9(1R)
(C)10
Yes
CAAACCCC(17F,10R), CAAACCC(2F), CAAACCCC(7F,8R), CAACCC(1R), CAAACCC(1F,2R), CAACCCC(1R) CAAAACCCC(1R)
(A)8
Yes
(A)8(11F,3R), (A)7(6F,3R), (A)9(2R)
(A)7(11F,2R), (A)8(7F, 7R), (A)9(3F,11R), (A)10(7R), (A)6(1F)
(C)8
Yes
(C)7(1F,3R), (C)8(9F,10R)
(C)8(9F,7R), (C)7(2F,2R), (C)9(4F,6R)
Yes
(CAAG)6CAAA Yes
KU1-45 (8 wks)††
(G)9(7F,3R), (G)8(10F,13R), (G)10(1R), (G)11(1F), (G)7(2F)
(AAGC)3(5F,8R), AAGCAAGACAAGC(1F)
(G)7
Yes
(G)7(8F, 5R), (G)6(3F, 3R)
(G)7(5F,2R), (G)6(2R)
(C)11
Known
(C)11(3F,2R), (C)10(2F), (C)12(1F)
(C)12 (2F), (C)10 (3F), (C)11(2F), (C)14(1F), (C)15(1F), (C)8(1F)
(G)11
Known
(G)12(3F), (G)11(5F,1R), (G)9(1F,5R), (G)14(4F), (G)13(6F), (G)12(4F), (G)10(1R) (G)11(1F)
(CTTCT)4CTTCC
Known
No reads through repeat
No reads through repeat
# Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation.
ǁǁ, ¶¶ These are combined results from paired-end forward and reverse reads: 2928-NS1_1 and 2928-NS1_2 for 2928-NS1 (SRR3547954 & SRR3547955), and 2929-NS2_1 and 2929-NS2_2 for 2929-NS2 (SRR3547956 & SRR3547957).
KU1-95 (20 wks)‡‡
KU1-96 (20 wks)§§
2928-NS1ǁǁ
(G)10(2R), (G)8(6F,5R), (G)9(2F,5R), (G)7(5F)
(G)7(1F,1R), (G)6CG(1R)
(G)8(6F,12R)
(AAGC)3(3F,9R)
(AAGC)3(2F,1R)
(AAGC)3(8F,7R)
(AAGC)8(7F,4R)
(AAGC)8(2F,1R)
(AAGC)8(1F)
(G)7(5F,7R), (G)8(1F,3R), (G)6(5F)
(G)7(4F,1R), (G)6(1F)
(G)7(4F,3R)
(C)5A(C)7(2F), (C)6A(C)8(5F), (C)6A(C)9(1F), (C)3(G)3(C)5A(C)7(1F), (C)5A(C)7(1F)
(C)6A(C)8(1F), (C)6A(C)7(1F)
(C)6A(C)8(6F)
(C)11&(G)7(2R), (C)10(1F), (C)8&(G)7(1F), (C)9&(G)6(1R), (C)10&(G)4TCC(1R), (G)7(1F,2R), (C)7(1R), (C)9&(G)7(3F,1R), (C)9(2F,1R)
(G)7(1R), (C)8&(G)6(2F)
(C)9&(G)7(14F,13R), (C)9(3F,2R), (G)7(3F,2R)
(C)9(5F,3R), (C)8(4F,2R), (C)10(4F), (C)7(1F,1R)
(C)8(1F,1R), (C)7(1F), (C)5A(C)4(1R)
(C)9(12F,12R)
(CAAG)12CAAA(7F,13R), (CAAG)13CAAA(1F), CAAGAAG(CAAG)11(1R)
(CAAG)12(1F)
(CAAG)12(6F,12R), (CAAG)13(1F), (CAAG)11(1R)
(C)11&(G)9(2F,1R), (C)12&(G)9(1F), (C)10&(G)7(1R), (C)10&(G)9(1F), (C)9&(G)7(1R), (C)11&(G)10(1F), (C)9&(G)6(1R), (C)11&(G)7(1R), (G)9(1F,1R) (C)12&(G)8(1F,2R), (G)8(2F,1R), (C)11&(G)8(2F,1R), (C)12&(G)7(1R), (C)12(G)4T(G)3(1R), (C)10&(G)8(2F), (C)11(1F)
(C)10&(G)8(1F), (C)11&(G)8(10F,4R), (G)8(1F,1R), (C)11(2F,5R), (C)6A(C)4(1F)
(C)8(4F,12R), (C)7(4F), (C)9(1F,1R), (C)6(1R)
(C)8(1F)
(C)8(7F,11R), (C)7(1R)
CAAACCC(3F), CAAACCCC(10F,11R), CAAACCCCC(1F)
CAAACCCC(2R)
CAAACCCC(16F,7R)
(A)8(10F,10R), (A)7(1F), (A)9(2F,5R)
no data
(A)8(24F,8R), (A)9(1R)
(C)8(9F,10R), (C)9(2F), (C)7(1F,2R)
(C)8(9F,9R)
(C)7(1F), (C)6(1R)
(G)7(7F,5R), (G)6(4F,7R), A(G)6(1F), (G)5TG(1F), (G)8(2F), (G)5(1R)
(G)8(1F)
(G)7(18F,14R)
(C)11(5F), (C)12(1F)
no data
(C)11(2R), (C)10(1F), (C)12(3F,3R)
(G)11(4F), (G)12CGG(1F), (G)6(1R)
(G)10(1F)
(G)13(1F), (G)14(1F), (G)12(2F,2R), (G)16(1R), (G)15(1R), (G)11(1F)
No reads through repeat
No reads through repeat
(CTTCT)19CTTCC(3F,1R), (CTTCT)18CTTCC(1F)
# (yes) – although there is no variation between strains investigated here, tracts of this [...]
** to ¶¶ For each repeat identified, the values in parentheses after the repeat indicate the number of reads that include this repeat and the directionality of the read, either forward (F) or reverse (R).
2929-NS2¶¶
Repeat location in NCCP11945†
(G)8(6F,11R)
79,691
(AAGC)3(1F,2R)
656,995
(AAGC)7(1F), (AAGC)8(1R)
676,106
(G)7(3F,4R)
990,022
(C)6A(C)8(3F)
1,172,254
(C)9&(G)7(11F,2R), (C)9(2F,1R), (C)10&(G)7(1F,1R), (G)7(2F)
1,228,002
(C)9(11F,8R), (C)10(1F,1R)
1,355,998
(CAAG)12(6F,2R)
1,419,715
(C)11&(G)8(8F), (C)10(1R), (C)9&(G)8(1F), (C)10&(G)8(2F), (G)8(2F), (C)11(1R)
1,615,314
(C)8(3F,8R)
1,792,194
CAAACCCC(13F,7R), CACACCCC(1F)
1,803,603
(A)8(15F,8R)
1,811,461
(C)8(4F,2R)
1,907,607
(G)7(8F,8R)
2,024,561
(C)6A(C)5(1R), (C)11(1F)
2,149,455
(G)12(1F), (G)16(2R), (G)13(2F)
1,942,124
(CTTCT)19CTTCC(3F), CATCT(CTTCT)18CTTCC(1R)
689,033
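The table writes repeats in a compact shorthand such as `(G)8`, `(C)6A(C)8` or `(CTTCT)19CTTCC`, in which a parenthesised unit is followed by its copy number and any other bases are written literally. As an illustrative sketch only, the Python below expands that shorthand into the nucleotide sequence so that tract lengths can be compared. The function name `expand_repeat` is a hypothetical helper, and entries joining two tracts with `&`, such as `(C)9&(G)7`, are assumed to need splitting before expansion, so they are rejected here.

```python
import re

# Either a repeated unit such as "(CTTCT)19" or literal bases such as "CTTCC".
_TOKEN = re.compile(r"\((?P<unit>[ACGT]+)\)(?P<count>\d+)|(?P<literal>[ACGT]+)")


def expand_repeat(shorthand: str) -> str:
    """Expand repeat shorthand into the literal nucleotide sequence."""
    pieces = []
    position = 0
    for match in _TOKEN.finditer(shorthand):
        if match.start() != position:
            break  # unrecognised characters between tokens, e.g. "&"
        position = match.end()
        if match.group("unit"):
            pieces.append(match.group("unit") * int(match.group("count")))
        else:
            pieces.append(match.group("literal"))
    if position != len(shorthand) or not pieces:
        raise ValueError(f"unrecognised repeat shorthand: {shorthand!r}")
    return "".join(pieces)


for shorthand in ["(G)8", "(C)6A(C)8", "(CTTCT)19CTTCC"]:
    sequence = expand_repeat(shorthand)
    print(shorthand, len(sequence), sequence)
```

Raising an error on anything outside the two token forms is a deliberate choice in this sketch: it prevents a garbled table cell from being expanded into a misleading sequence.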