Microbial Genomics

4 downloads 0 Views 2MB Size Report
Alan Calder. Lori AS Snyder, Ph.D. Abstract: There are many types of repeated DNA sequences in the genomes of the Neisseria spp., from homopolymeric tracts ...
Microbial Genomics Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation. --Manuscript Draft-Manuscript Number:

MGEN-D-16-00046R1

Full Title:

Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation.

Article Type:

Short Paper

Section/Category:

Genomic Methodologies: Genome variation detection

Corresponding Author:

Lori AS Snyder, Ph.D. Kingston University Kingston upon Thames, UNITED KINGDOM

First Author:

Marta A. Zelewska

Order of Authors:

Marta A. Zelewska Madhuri Pulijala Russell Spencer-Smith Hiba-Tun-Noor A. Mahmood Billie Norman Colin P. Churchward Alan Calder Lori AS Snyder, Ph.D.

Abstract:

There are many types of repeated DNA sequences in the genomes of the Neisseria spp., from homopolymeric tracts to tandem repeats of hundreds of bases. Some of these have roles in the phase variable expression of genes. When a repeat mediates phase variation, reversible switching between tract lengths occurs, which in the Neisseria spp. most often causes the gene to switch between ON and OFF states through frame shifting of the open reading frame. Changes in repeat tract lengths may also influence the strength of transcription from a promoter. For phenotypes that can be readily observed, such as expression of the surface expressed Opa proteins or pili, verification that repeats are mediating phase variation is relatively straight forward. For other genes, particularly those where the function has not been identified, gathering evidence of repeat tract changes can be more difficult. Here we present analysis of the repetitive sequences that could mediate phase variation in the Neisseria gonorrhoeae strain NCCP11945 genome sequence and compared these results to other gonococcal genome sequences. Evidence is presented for an updated phase variable gene repertoire in this species, including a class of phase variation that causes amino acid changes at the C-terminus of the protein, not previously described in N. gonorrhoeae.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation On: Tue, 20 Sep 2016 09:20:44

Manuscript Including References (Word document)

Click here to download Manuscript Including References (Word document)

Short Paper template

1 2 3

Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation.

4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

ABSTRACT There are many types of repeated DNA sequences in the genomes of the Neisseria spp., from homopolymeric tracts to tandem repeats of hundreds of bases. Some of these have roles in the phase variable expression of genes. When a repeat mediates phase variation, reversible switching between tract lengths occurs, which in the Neisseria spp. most often causes the gene to switch between ON and OFF states through frame shifting of the open reading frame. Changes in repeat tract lengths may also influence the strength of transcription from a promoter. For phenotypes that can be readily observed, such as expression of the surface expressed Opa proteins or pili, verification that repeats are mediating phase variation is relatively straight forward. For other genes, particularly those where the function has not been identified, gathering evidence of repeat tract changes can be more difficult. Here we present analysis of the repetitive sequences that could mediate phase variation in the Neisseria gonorrhoeae strain NCCP11945 genome sequence and compared these results to other gonococcal genome sequences. Evidence is presented for an updated phase variable gene repertoire in this species, including a class of phase variation that causes amino acid changes at the C-terminus of the protein, not previously described in N. gonorrhoeae.

22 23

DATA SUMMARY

24 25 26 27 28 29 30 31 32 33

1. Sequence data for Neisseria gonorrhoeae strains investigated are available in GenBank under the following accession numbers: FA1090 (NC_002946.2; url http://www.ncbi.nlm.nih.gov/nuccore/NC_002946.2); NCCP11945 (NC_011035.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NC_011035.1; & CP001050.1; url http://www.ncbi.nlm.nih.gov/nuccore/CP001050.1); MS11 (NC_022240.1; url http://www.ncbi.nlm.nih.gov/nuccore/NC_022240.1); FA19 (NZ_CP012026.1; url http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012026.1); FA6140 (NZ_CP012027.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012027.1); 35/02 (NZ_CP012028.1; url - http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP012028.1). These were accessed on the 15th of April 2016 for use in this study.

34 35

2. Sequence data were assessed via BLAST interrogation of the nr database restricted to the N. gonorrhoeae species (url - http://www.ncbi.nlm.nih.gov/).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

36 37 38

3. Genome resequencing data for N. gonorrhoeae strain NCCP11945 has been deposited under BioProject PRJNA322254 (url http://www.ncbi.nlm.nih.gov/bioproject/322254).

39

4. Genome resequencing data was assessed via Galaxy (url - usegalaxy.org).

40 41

We confirm all supporting data, code, and protocols have been provided within the article or through supplementary data files. ☒

42 43

IMPACT STATEMENT

44 45 46 47 48 49 50 51 52 53 54 55 56

Phase variation plays a vital role in the ability of Neisseria gonorrhoeae to adapt to the various niche environments encountered. Through stochastic switching in the expression of key genes and regulatory systems, mediated by simple sequence repeats, the population of bacteria are diverse and readily able to survive in the face of selective pressures. Not all simple sequence repeats within the genome mediate phase variation. Previous investigations have sought to define the phase variable repertoire of the Neisseria spp. and have identified a large number of candidates using a small number of genome sequences. With the availability of more genome sequence data and additional experimental data, we have refined the original repertoire to include those most likely to be phase variable in N. gonorrhoeae. As these genes are important for survival, their definition as phase variable is important for understanding pathogenesis and for potential future therapies. The advent of high throughput sequencing has the potential to reveal additional cases of within strain variations in repeat tracts, supporting phase variable candidacy of genes.

57 58

INTRODUCTION

59 60 61 62 63 64 65 66

In Neisseria gonorrhoeae, the causative agent of gonorrhoea, DNA repeats are intimately linked to the biology of the organism. N. gonorrhoeae, and the closely related bacterial species Neisseria meningitidis, undergo phase variable stochastic switching of gene expression for several surface structures, contributing to antigenic variation and immune evasion as well as niche adaptation in the course of infection (Bhat et al., 1991; Moxon et al., 2006; Carbonnelle et al., 2009; Srikhanta et al., 2009; Omer et al., 2011). Phase variation is mediated by simple sequence repeats associated with genes. In the Neisseria spp. the vast majority contain homopolymeric tracts within the coding sequences (Snyder et al., 2001).

67 68 69 70 71 72 73 74

Comparative sequence analysis between a single N. gonorrhoeae and several N. meningitidis genome sequences identified over 100 potentially phase variable genes (Snyder et al., 2001), some of which have later been demonstrated to be phase variable experimentally (Jordan et al., 2005). Transcriptional and translational phase variation have been extensively studied in the Neisseria spp., however an additional class of simple sequence repeat mediated phase variation has been described in Helicobacter canadensis following whole genome analysis (Snyder et al., 2010). Simple sequence repeat mediated changes in the presence or absence of C-terminal cell wall attachment motifs has also been described in

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

75 76 77

Streptococcus agalactiae (Janulczyk et al., 2010). In N. meningitidis, a gene fusion between pglB2 and the downstream phosphoglycosyltransferase gene appears to be mediated by a poly-A repeat tract (Viburiene et al., 2013).

78 79 80 81

With the availability of additional gonococcal genome sequences, the gonococcal phase variable repertoire has here been re-assessed. As a result, phase variation in which repeats at the 3' ends of genes mediate changes in the C-terminal sequence of the proteins is described as part of a refined phase variable gene repertoire.

82 83

MATERIALS AND METHODS

84 85 86 87 88 89 90 91 92 93 94 95

Identification of phase variable genes. Using the previous phase variable gene repertoires reported for N. gonorrhoeae and N. meningitidis (Snyder et al., 2001; Martin et al., 2003; Jordan et al., 2005), the homologues in N. gonorrhoeae strain NCCP11945 were sought (CP00150.1; Chung et al., 2008). In addition, pattern search in xBASE (Chaudhuri et al., 2008) was used to identify other repeats, based on previous evidence of phase variation in the Neisseria spp.: ≥ (G)8; ≥ (C)8; ≥ (CAAACAC)3; ≥ (CAAATAC)3; ≥ (CCCAA)3; ≥ (GCCA)3; ≥ (A)9; ≥ (T)9; ≥ (AAGC)3; ≥ (TTCC)3; and ≥ (CTTCT)3. No other repeats have been demonstrated to cause phase variation in this species. Genome sequences for N. gonorrhoeae strains NCCP11945 (NC_011035.1), FA1090 (NC_002946.2), FA19 (NZ_CP012026.1), FA6140 (NZ_CP012027.1), 35/02 (NZ_CP012028.1), and MS11 (NC_022240.1) were downloaded 15 th April 2016 and compared using progressive Mauve v2.3.1 (Darling et al., 2004) to identify orthologues (Table S1).

96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115

Identification of repeat variation within N. gonorrhoeae strain NCCP11945. N. gonorrhoeae strain NCCP11945 was grown on GC agar (Oxoid) with Kellogg’s (Kellogg et al., 1963) and 5% Fe(NO3)3 supplements at 37°C in a candle tin for a period of 8 weeks with passages to fresh agar plates every 2 days or at 37°C 5% CO2 for a period of 20 weeks with passages to fresh agar plates every 2-3 days. At each passage, cells were scraped from the plate and resuspended in 1 ml of GC broth to a turbidity equivalent to a 0.5 McFarland standard before inoculation onto fresh plates using a sterile cotton swab. DNA was extracted from such resuspensions using the Puregene Yeast/Bacterial kit (Qiagen). 1 μg or 100 ng of the DNA was genome sequenced using the Ion Personal Genome Machine, Ion Express Fragment Library kit, Ion Express Template kit, and Ion Sequencing kit (Life Technologies) or using the Illumina-based methods of the MicrobesNG service (microbesng.uk). Sequence read data was interpreted using Galaxy on usegalaxy.org (Afgan et al., 2016). Briefly, the reference sequence (NC_011035.1), Ion Torrent data for eight week passages (KU1-4, KU145), Ion Torrent data for 20 week passages (KU1-95, KU1-96), and Illumina data for 20 week passages (2928-NS1_1 & 2929-NS1_2 and 2929-NS2_1 & 2929-NS2_2) were uploaded to Galaxy. The Ion Torrent bam format files were converted to fastq format using BAMTools Convert (Barnett et al., 2011). FASTQ Groomer was used on all NGS data (Blankenberg et al., 2010). Bowtie2 was used to map the reads against the reference (Langmead et al., 2009; Langmead et al., 2012) before visualisation using the Integrated Genomics Viewer (Robinson et al., 2011; Thorvaldsdóttir et al., 2013).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

116 117

RESULTS AND DISCUSSION

118

Phase variable genes

119 120 121 122

The phase variable gene repertoire of N. gonorrhoeae strain NCCP11945 was investigated and compared against gonococcal strains FA1090, FA19, FA6140, 35/02, and MS11 to assess the presence of similar repeat tracts across the species and variations in repeat tract lengths between strains.

123 124 125 126 127 128 129 130

Transcriptional phase variation is mediated by repeats within or associated with the promoter region (Fig. 1a). Changes in the repeat alters the level of transcription of the gene, as in fetA (frpB; NGK_2557) where differences in the length of the poly-C homopolymeric tract between the -10 and -35 promoter regions alters expression (Carson et al., 2000). There are three transcriptional phase variable genes in N. gonorrhoeae strain NCCP11945 (Table 1), fetA (NGK_2557), a lipoprotein (NGK_2186), and porA (NGK_0906/NGK_0907), yet in gonococci porA does not have an intact coding region. Variation in the repeats between gonococcal strains is found for all three transcriptional phase variable genes (Table 1).

131 132 133 134 135 136 137 138 139 140 141 142 143 144 145

Most common in the Neisseria spp. is translational phase variation where, as in pilC, the repeat is within the 5' portion of the coding region of the gene (Fig. 1b). Changes in the repeat tract generate frame-shift mutations in two of the three open reading frames, with the gene only being translated into protein when the repeat tract length puts the gene inframe. Whilst many phase variable genes in the Neisseria spp. contain homopolymeric tracts, some experience copy number changes in repetitive sequences, such as the CTTCT repeat in opa (Muralidharan et al., 1987; Bhat et al., 1991) or the AAGC repeat in autA (Peak et al., 1999; Arenas et al., 2015). In the N. gonorrhoeae strains examined here, the AAGC repeat in virG (NGK_0804) is only present in two or three copies (Table 2), rather than several copies as in NGK_0831a and autA (NGK_2082). Although virG has low copy number for the repeat, variations between strains are observed and strains with many copies may yet be identified (there are currently none >(AAGC)3 in the NCBI nr/nt or wgs databases), therefore it is placed amongst the phase variable genes even though this may be at low frequency or be a strain specific effect. There are 36 translational phase variable genes in N. gonorrhoeae based on the species examined (Table 2).

146 147 148 149 150 151 152 153 154 155

In addition, a third class of repeated mediated phase variable gene was identified (Snyder et al., 2010). In these C-terminal phase variable genes, a repeat tract towards the 3' of the coding region is able to alter the sequence at the C-terminus of the encoded protein (Fig. 1c). In N. gonorrhoeae strain NCCP11945, four of these C-terminal phase variable genes were identified (Table 3). It is likely that in the case of the pilin sequence (NGK_2161), changes in the repeat are causing pilus protein changes, mediating antigenic variation through a phase variable mechanism. Comparisons also show repeat tract variation in a membrane protein (NGK_1211) and mafB cassette (NGK_1624), supporting C-terminal phase variation in the Neisseria spp. Variations in the products of mafB cassettes are believed to contribute to competition between species within the niche (Jamet et al., 2015).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

156 157 158

Although no variation was observed in these strains in ispH (NGK_0106), (G)8 repeats are known to vary in lgtC (NGK_1632), hpuA (NGK_2581), and hsdS (NGK_0571) (Table 2), therefore it is highly likely that the repeat in ispH also has the capacity to vary.

159 160 161 162 163 164 165 166 167 168

A number of previously reported candidates are not supported by evidence of phase variation, based on the absence of tract length changes between the strains (Table 4). For example, although tract variation was reported for cvaA (NGK_0168), mafA-3 (NGK_2270), and dca (NGK_1830) in N. meningitidis (Martin et al., 2003), there are no changes observed in the short (C)4 and (G)5 tracts in these genes in N. gonorrhoeae (Table 4). They are therefore unlikely to be phase variable in this species. Likewise, neither of the dinucleotide repeat containing genes (NGK_1607 and NGK_2274) show variations (Table 4); dinucleotides are not likely to be phase variable in the Neisseria spp. (Martin et al., 2003). All of these genes contain short repeats that do not vary or alternative nucleotide sequences in the strains investigated (Table 4).

169 170 171 172 173 174 175 176 177 178 179 180

This analysis identified 29 genes that are known to be phase variable (Tables 1 & 2), either in N. gonorrhoeae or N. meningitidis including 12 paralogues of opa (11 in each strain; Muralidharan et al., 1987; Bhat et al., 1991) and 17 other known phase variable genes (Stern et al., 1987; Jonsson et al., 1991; van der Ende et al., 1995; Erwin et al., 1996; Chen et al., 1998; Peak et al., 1999; Carson et al., 2000; Banerjee et al., 2002; Mackinnon et al., 2002; Shafer et al., 2002; Power et al., 2003; Srikhanta et al., 2009; Adamczyk-Poplawska et al., 2011; Arenas et al., 2015). Thirteen additional genes have variations in the repeat tracts when the six N. gonorrhoeae genome sequences are compared, one transcriptional, nine translational, and three C-terminal repeats. Based on homology and presence of conserved domains, these genes are believed to encode two replication initiation factors, an adhesion protein, a pyrimidine 5'-nucleotidase, two lipoproteins, two membrane proteins, two secreted proteins, and three hypothetical proteins (Tables 1-3).

181 182 183 184 185 186 187 188 189

Combined with the previous data on repeat variation within and between gonococcal strains and demonstration of phase variation (Sparling et al., 1986; Yang et al., 1996; Lewis et al., 1999; Snyder et al., 2001; Power et al., 2003; Jordan et al., 2005; Srikhanta et al., 2009), a revised repertoire of 43 transcriptional (Table 1), translational (Table 2), and Cterminal (Table 3) phase variable genes is proposed for N. gonorrhoeae as a species. This is fewer than previous predictions (76 in Jordan et al., 2005) and thus far two-thirds (67%, 29 out of 43) have been experimentally demonstrated to be phase variable (Tables 1 & 2). The additional 14 genes, 13 of which show strain to strain repeat variation, require additional investigation.

190

Phase variable repeat copy number variation in vitro.

191 192 193 194 195 196

Previously, for H. canadensis, 454 and Illumina genome sequence read data was used to support candidacy of phase variable genes (Snyder et al., 2010). In the present study, Ion Torrent and Illumina genome sequence read data from N. gonorrhoeae strain NCCP11945 that had been passaged in the laboratory for 8 weeks or for 20 weeks was analysed for changes to phase variable repeats for the 14 genes for which there is no within strain evidence of phase variation (Tables 1-3). Changes were observed in known phase variable

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

197 198 199 200 201 202 203 204

genes pilC1, opa, and fetA, suggesting read data can support phase variability by demonstrating within strain variation in tracts (Table 5). Of the 14 genes, only virG (NGK_0804) and the pyrimidine 5'-nucleotidase (NGK_2176) showed no changes in the repeats (Table 5). Likely, the virG (AAGC)3 copy number is too low to vary, however there may be strains with greater copy number in which it would. Likewise, the poly-C repeat in NGK_2176 has been replaced with CAAACCCC in strain NCCP11945 and therefore would not be expected to phase vary in this strain, however phase variation is likely in strain MS11, for example.

205 206 207 208 209 210 211 212

The Ion Torrent sequencing technology has been criticised for generating homopolymer associated indels (Loman et al., 2012) and that the tracts can be incorrect at >8 bases (Quail et al., 2012), the optimal length for phase variation. Homopolymeric tracts in Illumina data are believed to be less error prone (Schirmer et al., 2015). However, repeat sequence data from Illumina often agreed with Ion Torrent on the presence of variation (9 of 14 genes with variation in Ion Torrent, Table 5). When the Illumina data did not show repeat variation, this often corresponded to relatively low read coverage of the region compared to the Ion Torrent data (Table S2).

213 214 215 216 217

It is currently impossible to differentiate genuine biologically induced indels from sequencing errors (Narzisi & Schatz, 2015). We may find that what we ascribe to errors can also be subtle changes that are constantly being generated within the bacterial population. From this data, the expected biological variation supporting phase variation appears to be present in N. gonorrhoeae strain NCCP11945 for 12 as yet unexplored genes.

218 219

CONCLUSION

220 221 222 223 224 225

In conclusion, N. gonorrhoeae possesses three different mechanisms for phase variation: transcriptional; translational; and C-terminal. Stochastic systems obviously play important roles in the biology of the organism given the variety and number of genes involved. The functions of previously unexplored phase variable genes, including one transcriptional phase variable gene, nine translational phase variable genes, and four C-terminal phase variable genes require further investigation.

226 227

ACKNOWLEDGEMENTS

228 229 230

CPC was supported by a Sparks project grant 13KIN01. Illumina genome sequencing was provided by MicrobesNG (http://www.microbesng.uk), which is supported by the BBSRC (grant number BB/L024209/1).

231 232

REFERENCES

233 234 235

Adamczyk-Poplawska M., Lower M., Piekarowicz A. (2011). Deletion of one nucleotide within the homonucleotide tract present in the hsdS gene alters the DNA sequence specificity of type I restriction-modification system NgoAV. J Bacteriol 193, 6750-6759.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

236 237 238 239 240

Afgan E., Baker D., van den Beek M., Blankenberg D., Bouvier D., Čech M., Chilton J., Clements D., Coraor N., Eberhard C., Grüning B., Guerler A., Hillman-Jackson J., Von Kuster G., Rasche E., Soranzo N., Turaga N., Taylor J., Nekrutenko A., Goecks J. (2016). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res [epub ahead of print]

241 242 243

Arenas J., Cano S., Nijland R., van Dongen V., Rutten L., van der Ende A., Tommassen J. (2015). The meningococcal autotransporter AutA is implicated in autoaggregation and biofilm formation. Environ Microbiol 17, 1321-1337.

244 245 246

Banerjee A., Wang R., Supernavage S. L., Ghosh S. K., Parker J., Ganesh N. F., Wang P. G., Gulati S., Rice P. A. (2002). Implications of phase variation of a gene (pgtA) encoding a pilin galactosyl transferase in gonococcal pathogenesis. J Exp Med 196, 147-162.

247 248

Barnett D. W., Garrison E. K., Quinlan A. R., Stromberg M. P., Marth G. T. (2011). BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27, 1691–1692.

249 250 251

Bhat K. S., Gibbs C. P., Barrera O., Morrison S. G., Jahnig F., Stern A., Kupsch E. M., Meyer T. F., Swanson J. (1991). The opacity proteins of Neisseria gonorrhoeae strain MS11 are encoded by a family of 11 complete genes. Mol Microbiol 5, 1889-1901.

252 253

Blankenberg D., Gordon A., Von Kuster G., Coraor N., Taylor J., Nekrutenko A., Galaxy Team. (2010). Manipulation of FASTQ data with Galaxy. Bioinformatics 26, 1783-1785.

254 255 256 257

Bolotin D. A., Mamedov I. Z., Britanova O. V., Zvyagin I. V., Shagin D., Ustyugova S. V., Turchaninova M. A., Lukyanov S., Lebedev Y. B., Chudakov D. M. (2012). Next generation sequencing for TCR repertoire profiling: Platform-specific features and correction algorithms. Eur J Immunol 42, 3073-3083.

258 259

Bragg L. M., Stone G., Butler M. K., Hugenholtz P., Tyson G. W. (2013). Shining a light on dark sequencing: Characterising errors in Ion Torrent PGM data. PLoS Comput Biol 9, e1003031.

260 261

Carbonnelle E., Hill D. J., Morand P., Griffiths N. J., Bourdoulous S., Murillo I., Nassif X., Virji M. (2009). Meningococcal interactions with the host. Vaccine 27 Suppl 2, B78-89.

262 263

Carson S. D., Stone B., Beucher M., Fu J., Sparling P. F. (2000). Phase variation of the gonococcal siderophore receptor FetA. Mol Microbiol 36, 585-593.

264 265

Chaudhuri R. R., Loman N. J., Snyder L. A., Bailey C. M., Stekel D. J., Pallen M. J. (2008). xBASE2: a comprehensive resource for comparative bacterial genomics. Nucleic Acids Res 36, D543-546.

266 267

Chen C. J., Elkins C., Sparling P. F. (1998). Phase variation of hemoglobin utilization in Neisseria gonorrhoeae. Infect Immun 66, 987-993.

268 269

Chung G. T., Yoo J. S., Oh H. B., Lee Y. S., Cha S. H., Kim S. J., Yoo C. K. (2008). Complete genome sequence of Neisseria gonorrhoeae NCCP11945. J Bacteriol 190, 6035-6036.

270 271

Darling A. C. E., Mau B., Blattner F. R., Perna N. T. (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome research 14, 1394-1403.

272 273 274

Erwin A. L., Haynes P. A., Rice P. A., Gotschlich E. C. (1996). Conservation of the lipooligosaccharide synthesis locus lgt among strains of Neisseria gonorrhoeae: requirement for lgtE in synthesis of the 2C7 epitope and of the beta chain of strain 15253. J Exp Med 184, 1233-1241.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

275 276 277

Jamet A., Jousset A. B., Euphrasie D., Mukorako P., Boucharlat A., Ducousso A., Charbit A., Nassif X. (2015). A new family of secreted toxins in pathogenic Neisseria species. PLoS Pathog 11, e1004592.

278 279

Janulczyk R., Madignani V., Maione D., Tettelin H., Grandi G., Telford J. L. (2010). Simple sequence repeats and genome plasticity in Streptococcus agalactiae. J Bacteriol 192, 3990-4000.

280 281

Jonsson A. B., Nyberg G., Normark S. (1991). Phase variation of gonococcal pili by frameshift mutation in pilC, a novel gene for pilus assembly. EMBO J 10, 477-488.

282 283

Jordan P. W., Snyder L. A., Saunders N. J. (2005). Strain-specific differences in Neisseria gonorrhoeae associated with the phase variable gene repertoire. BMC Microbiol 5, 21.

284 285

Kellogg D. S. Jr., Peacock W. L. Jr., Deacon W.E., Brown L., Pirkle D.I. (1963). Neisseria gonorrhoeae. I. Virulence genetically linked to clonal variation. J Bacteriol 85, 1274-1279.

286 287

Langmead B., Trapnell C., Pop M., Salzberg S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25.

288 289

Langmead B., Salzberg S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359.

290 291 292

Lewis L. A., Gipson M., Hartman K., Ownbey T., Vaughn J., Dyer D. W. (1999). Phase variation of HpuAB and HmbR, two distinct haemoglobin receptors of Neisseria meningitidis DNM2. Mol Microbiol 32, 977-989.

293 294 295

Loman N. J., Misra R. V., Dallman T. J., Constantinidou C., Gharbia S. E., Wain J., Pallen M. J, (2012). Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol 30, 434-439.

296 297 298 299 300

Mackinnon F. G., Cox A. D., Plested J. S., Tang C. M., Makepeace K., Coull P. A., Wright J. C., Chalmers R., Hood D. W., Richards J. C., Moxon E. R. (2002). Identification of a gene (lpt-3) required for the addition of phosphoethanolamine to the lipopolysaccharide inner core of Neisseria meningitidis and its role in mediating susceptibility to bactericidal killing and opsonophagocytosis. Mol Microbiol 43, 931-943.

301 302 303

Martin P., van de Ven T., Mouchel N., Jeffries A. C., Hood D. W., Moxon E. R. (2003). Experimentally revised repertoire of putative contingency loci in Neisseria meningitidis strain MC58: evidence for a novel mechanism of phase variation. Mol Microbiol 50, 245-257.

304 305

Moxon R., Bayliss C., Hood D, (2006). Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu Rev Genet 40, 307-333.

306 307

Muralidharan K., Stern A., Meyer T.F. (1987). The control mechanism of opacity protein expression in the pathogenic Neisseriae. Antonie Van Leeuwenhoek 53, 435-440.

308 309

Narzisi G., Schatz M. C. (2015). The challenge of small-scale repeats for indel discovery. Front Bioeng Biotechnol 3, 8.

310 311 312

Omer H., Rose G., Jolley K. A., Frapy E., Zahar J. R., Maiden M. C., Bentley S. D., Tinsley C. R., Nassif X., Bille E. (2011). Genotypic and phenotypic modifications of Neisseria meningitidis after an accidental human passage. PLoS One 6, e17145.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

313 314

Peak I. R., Jennings M. P., Hood D. W., Moxon E. R. (1999). Tetranucleotide repeats identify novel virulence determinant homologues in Neisseria meningitidis. Microb Pathog 26, 13-23.

315 316 317

Power P. M., Roddam L. F., Rutter K., Fitzpatrick S. Z., Srikhanta Y. N., Jennings M. P. (2003). Genetic characterization of pilin glycosylation and phase variation in Neisseria meningitidis. Mol Microbiol 49, 833-847.

318 319 320

Quail M. A., Smith M., Coupland P., Otto T. D., Harris S. R., Connor T. R., Bertoni A., Swerdlow H. P., Gu Y. (2012). A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 13, 341.

321 322

Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E. S., Getz G., Mesirov J.P. (2011). Integrative Genomics Viewer. Nature Biotechnology 29, 24–26.

323 324 325

Schirmer M., Ijaz U. Z., D'Amore R., Hall N., Sloan W. T., Quince C. (2015). Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res 43, e37.

326 327 328 329

Shafer W. M., Datta A., Kolli V. S., Rahman M. M., Balthazar J. T., Martin L. E., Veal W. L., Stephens D. S., Carlson R. (2002). Phase variable changes in genes lgtA and lgtC within the lgtABCDE operon of Neisseria gonorrhoeae can modulate gonococcal susceptibility to normal human serum. J Endotoxin Res 8, 47-58.

330 331 332

Snyder L. A. S., Butcher S. A., Saunders N. J. (2001). Comparative whole-genome analyses reveal over 100 putative phase-variable genes in the pathogenic Neisseria spp. Microbiology 147, 23212332.

333 334 335

Snyder L. A. S., Loman N. J., Linton J. D., Langdon, R. R., Weinstock G. M., Wren B. W., Pallen, M. J. (2010). Simple sequence repeats in Helicobacter canadensis and their role in phase variable expression and C-terminal sequence switching. BMC Genomics 11, 67.

336 337

Sparling P. F., Cannon J. G., So M, (1986). Phase and antigenic variation of pili and outer membrane protein II of Neisseria gonorrhoeae. J Infect Dis 153, 196-201.

338 339 340 341

Srikhanta Y. N., Dowideit S. J., Edwards J. L., Falsetta M. L., Wu H. J., Harrison O. B., Fox K. L., Seib K. L., Maguire T. L., Wang A. H., Maiden M. C., Grimmond S. M., Apicella M. A., Jennings M. P. (2009). Phasevarions mediate random switching of gene expression in pathogenic Neisseria. PLoS Pathog 5, e1000400.

342 343

Stern A., Meyer T. F. (1987). Common mechanism controlling phase and antigenic variation in pathogenic neisseriae. Mol Microbiol 1, 5-12.

344 345

Thorvaldsdóttir H., Robinson J. T., Mesirov J. P. (2013). Integrative Genomics Viewer (IGV): highperformance genomics data visualization and exploration. Briefings in Bioinformatics 14, 178-192.

346 347 348

van der Ende A., Hopman C. T., Zaat S., Essink B. B., Berkhout B., Dankert J. (1995). Variable expression of class 1 outer membrane protein in Neisseria meningitidis is caused by variation in the spacing between the -10 and -35 regions of the promoter. J Bacteriol 177, 2475-2480.

349 350 351

Viburiene R., Vik Å., Koomey M., Børud B. (2013). Allelic variation in a simple sequence repeat element of neisserial pglB2 and its consequences for protein expression and protein glycosylation. J Bacteriol 195, 3476-3485.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

352 353

Yang Q. L., Gotschlich E. C. (1996). Variation of gonococcal lipooligosaccharide structure is due to alterations in poly-G tracts in lgt genes encoding glycosyl transferases. J Exp Med 183, 323-327.

354 355

DATA BIBLIOGRAPHY

356 357

1. Marta A. Zelewska, Madhuri Pulijala & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547950 (2016).

358 359

2. Hiba-Tun-Noor A. Mahmood & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547951 (2016).

360 361

3. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547952 (2016).

362 363

4. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547953 (2016).

364 365

5. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547954 (2016).

366 367

6. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547955 (2016).

368 369

7. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547956 (2016).

370 371

8. Colin P. Churchward, Alan Calder & Lori A. S. Snyder. NCBI Sequence Read Archive. Accession number SRR3547957 (2016).

372 373

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

374

FIGURES AND TABLES

375

Figure 1

376 377 378 379 380 381 382 383

Fig. 1. Illustrations of the types of phase variation in N. gonorrhoeae. (a): Transcriptional phase variation, in which changes in a repeat tract alters the facing and spacing of the -10 and -35 promoter elements and alters the level of transcription of the gene. Phase variation of fetA is used as an example, where it has been shown that differences in spacing of the -10 and -35 elements due to changes in the poly-C repeat tract alters expression levels, represented by arrow width (Carson et al., 2000). (b): Translational phase variation, in which changes in a repeat tract towards the 5' end of the CDS alters the reading frame of a coding

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

384 385 386 387 388 389 390 391 392

region and switches expression ON and OFF due to frame-shift. Phase variation of pilC is used as an example, where it has been shown that changes in the poly-G repeat tract generate frame-shifts which switch protein expression ON and OFF (Jonsson et al., 1991). (c): C-terminal phase variation, in which changes in a repeat tract towards the 3' end of the CDS alters the reading frame of a coding region and switches the encoded C-terminal amino acids between the three reading frames. In the example NGK_1211, two of the reading frames offer different C-terminal ends to the protein, while the third generates a fusion with the downstream CDS, NGK_1212. Only some examples of C-terminal phase variation result in this type of fusion (Table 3).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

393

Table 1. Transcriptional phase variable genes in N. gonorrhoeae. Repeat in FA1090*

N. gonorrhoeae NCCP11945 Repeat in Repeat in Repeat in Repeat in Repeat in Reference locus† NCCP11945† FA19‡ FA6140§ 35/02ǁ MS11¶ candidacy# NGK_0906 & (T )9C(G)6T (T )8C(G)8T (T )9C(G)6T (T )9C(G)6T (T )9C(G)6T T Known van der Ende et al ., 1995 NGK_0907**

Gene

FA1090 locus*

porA

NGO_04715 (T )11C(G)6T

lipoprotein

NGO2047

(A)9

NGK_2186†† (A)8

(A)8

(A)9

(A)9

(A)8

Yes

394

fetA / frpB

NGO2093

(C)13

NGK_2557

(C)10

(C)11

(C)11

(C)11

Known

395

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).

396

† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).

397

‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).

398

§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).

399

ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).

400

¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).

401 402 403

# Gene phase variation candidacy in N. gonorrhoeae. Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation.

404

** This CDS appears to be frame-shifted and annotated as two CDSs.

405

†† NGK_2186 and NGO2047 annotations are on opposite strands.

(C)14

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

Carson et al ., 2000

406

Table 2. Translational phase variable genes in N. gonorrhoeae. Ge ne

FA1090 locus*

Re pe at in FA1090*

NCCP11945 locus†

Re pe at in Re pe at in NCCP11945† FA19‡

Re pe at in FA6140§

Re pe at in 35/02ǁ

Re pe at in MS11¶

N. gonorrhoeae candidacy#

Re fe re nce

pilC 2

NGO0055

(G)13

NGK_0074

(G)9

(G)10

(G)13

(G)9CAGG

(G)12

Known

Jonsson et al ., 1991

opa

NGO0066a

opa

NGO0070

(CT T CT )13C NGK_0096 T T CG (CT T CT )9CT NGK_0102 T CG

(CT T CT )8CT T CC (CT T CT )7CT T CC

(CT T CT )4CT T CC (CT T CT )8CT T CC

CT T (CT T CT )1 0CT T CC (CT T CT )17CT T CC

(CT T CT )6(CT T CC)2 (CT T CT )7CT T CC

(CT T CT )8CT T Known CC (CT T CT )7CT T Known CC

pglH

NGO0086

(C)10

NP

NP

no repeat

no repeat

no repeat

NP

Known

Power et al ., 2003

pglG

NGO0087

A(C)7

NP

NP

A(C)9

A(C)6

A(C)6

NP

Known

Power et al ., 2003

Power et al ., 2003

Adamczyk-Poplawska et al., 2011

pglE

NGO0207

(CAAACAC)4 NGK_0339

(CAAACAC)10 (CAAACAC)6 CAAAT ACCA (CAAACAC)8 CAAAT ACCA (CAAACAC)24 (CAAACAC)15 AACACCAAA Known (CAAAT AC)3 AACACCAAA CAAAT AC (CAAAT AC)3 T ACCAAACA T AC C(CAAAT AC)2

hsdS

NGO_02155

(G)7

(G)7

(G)8

hypothetical

NGO0527

(C)6A(C)9GC NGK_1405

(C)6A(C)8GC

(C)7A(C)4T (C (C)7A(C)4T (C) (C)11A(C)8GC (C)6A(C)14GC Yes )3GC 3GC GC C

modB

NGO0545

(CCCAA)12

(CCCAA)11

(CCCAA)12

NGK_0571

NGK_1384

(G)7

(CCCAA)11

(C)6T T AT CT AACA(G)5

(C)11T T AT CT (C)10T T AT CT Yes AACA(G)8 AACA(G)7

(GCCA)24

(GCCA)19

(GCCA)24

(C)8T T AT CT NGK_1486 AACA(G)7 (CT T CT )16C NGK_0847 T T CC

(C)9T T AT CT AACA(G)6 (CT T CT )16CT T CC

(C)11T T AT CT AACA(G)8 (CT T CT )8CT T CC

(C)9T T AT CT A Yes ACA(G)7 (CT T CT )4CT T Known CC

modA

(GCCA)37

NGK_1272

(CCCAA)7

Known

(CCCAA)4

(C)8T T AT CT NGK_1957 AACA(G)7

NGO0641

(G)7

(C)11T T AT C (C)7T T AT CT T AACA(G)8 AACA(G)7 (GCCA)24GT (GCCA)18 CA (C)9T T AT CT (C)7T T AT CT AACA(G)7 AACA(G)7 (CT T CT )19C (CT T CT )13C T T CC T T CC

replication NGO_06135 initiation factor

replication NGO_06695 initiation factor

(G)9

Known

Known

Stern & Meyer, 1987 Stern & Meyer, 1987

Srikhanta et al ., 2009

Srikhanta et al ., 2009

opa

NGO0950a

hypothetical

NGO0964

(AAGC)4

NGK_0831a

(AAGC)8

(AAGC)15

(AAGC)7

(AAGC)9

(AAGC)7

Yes

virG

NGO0985

(AAGC)3

NGK_0804

(AAGC)3

(AAGC)2

(AAGC)3

(AAGC)3

NGO1040a

opa

NGO1073a

opa

NGO1277a

(CT T CT )12CT T CC (CT T CT )18CT T CC (CT T CT )7CT T CC

NGO1445

opa

NGO1463a

opa

NGO1513

opa

NGO1553a

(CT T CT )10C T T CC (CT T CT )11C T T CC CT T (CT T CT ) 11CT T CC (CAAG)12CA AA (CT T CT )11C T T CC CT T (CT T CT ) 10CT T CC (CT T CT )17C T T CC

(CT T CT )7CT T CC (CT T CT )12CT T CC (CT T CT )11CT T CC

adhesion

(CT T CT )20C T T CC (CT T CT )10C T T CC (CT T CT )7CT T CC (CAAG)12CA AA (CT T CT )7CT T CC (CT T CT )14C T T CC (CT T CT )9CT T CC

(AAGC)3 (CT T CT )13CT T CCCT T CT (C T T CC)2 (CT T CT )7CT T CC (CT T CT )8CT T CC

Yes

opa

(CT T CT )12CT (CT T CT )12CT T CC T CC (CT T CT )6CT T NP CG (CT T CT )8CT T (CT T CT )8CT T CC CC

(CT T CT )10CT Known T CC (CT T CT )7CT T Known CG (CT T CT )14CT Known T CC

autA

NGO1689

(AAGC)3

NGK_2082

(AAGC)3

(AAGC)14

(AAGC)3

(AAGC)3

(AAGC)3

Known

Peak et al ., 1999; Arenas et al ., 2015

pgtA

NGO1765

(G)11

NGK_2516

GGGAGCGGG (G)19

(G)19

GGGAGCGGG

GGGAGCGGG

Known

Banerjee et al ., 2002

(CT T CT )20C T T CC (CT T CT )2CT T CC (CT T CT )11C T T CC (CAAG)20CA AA (CT T CT )10C T T CC (CT T CT )12C T T CG (CT T CT )4CT T CC

NGK_0749 NGK_0693 NGK_1495 NGK_1705 NGK_1729 NGK_1799 NGK_1847

Stern & Meyer, 1987

Known

Stern & Meyer, 1987

Known

Stern & Meyer, 1987

Known

Stern & Meyer, 1987

(CAAG)6CAAA (CAAG)9CAAA (CAAG)6CAAA Yes

repetitive large NGO_09875 & NGK_2422 & surface (G)8 (G)7 (G)7 (G)7 (G)7 (G)7 Yes NGO_09870** NGK_2423** lipoprotein (CT T CT )13C (CT T CT )11C (CT T CT )13C (CT T CT )2CT T (CT T CT )13CT (CT T CT )30CT opa NGO1861a NGK_2410 Known T T CC T T CC T T CC CC T CC T CC

Stern & Meyer, 1987 Stern & Meyer, 1987 Stern & Meyer, 1987

Stern & Meyer, 1987

pilC 1

NGO1912

(G)11

NGK_2342

(G)13

(G)11

GGGC(G)11

(G)15

(G)11

Known

hypothetical

NGO1953

(C)8

NGK_2297

(C)8

(C)8

(C)8

(C)9

(C)8

Yes

pyrimidine 5'nucleotidase

NGO2055 & NGO2054**

(C)6

NGK_2176

CAAACCCC

CAAACCCC

CAAACCCC

(C)9

(C)10

Yes

NGO2060a

(CT T CT )10C NP T T CG

NP

NP

(CT T CT )6CT T (CT T CT )7CT T Known CC CC

Stern & Meyer, 1987

(CT T CT )14C NP T T CC

opa

NGK_2170

Jonsson et al., 1991

opa

NP

NP

NP

NP

Known

Stern & Meyer, 1987

lgtG

NGO2072

(C)11

NGK_2534 & (C)12 NGK_2533**

(C)10

(C)10

(C)10

(C)10

Known

Mackinnon et al ., 2002

hpuA

NGO2110

(G)9

NGK_2581

(G)10

(G)9

(G)9

(G)8

(G)8

Known

Chen et al ., 1998

lgtA

NGO11610

(G)11

NGK_2630

(G)11

(G)14A

(G)17A

(G)20A

(G)10A

Known

Erwin et al ., 1996

lgtC

NGO2156

(G)14

NGK_2632

(G)13

(G)13

(G)10

(G)16

(G)8

Known

Shafer et al ., 2002

407

lgtD

NGO2158

A(G)14

NGK_2634

A(G)16

(G)13

A(G)12

A(G)18

A(G)13

Known

Shafer et al ., 2002

408

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).

409

† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).

410

‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).

411

§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).

412

ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).

413

¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

414 415 416

# Gene phase variation candidacy in N. gonorrhoeae. Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation.

417

** This CDS appears to be frame-shifted and annotated as two CDSs.

418

NP The CDS is not present in this strain.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

419

Table 3. C-terminal phase variable genes in N. gonorrhoeae. Gene

FA1090 locus*

Repeat in FA1090*

NCCP1194 Repeat in 5 locus† NCCP11945†

Amino acids after the repeat in each frame of NCCP11945‡

FA19§

FA6140ǁ

35/02¶

MS11#

N. gonorrhoeae candidacy**

ispH

NGO0072

(G)8

NGK_0106

(G)8

26

50

membrane protein

NGO0691

(G)6

NGK_1211

(G)7

23

8

61

(G)8

(G)8

(G)8

(G)8

(yes)

186^

(G)7

(G)6

(G)7

(G)7

mafB cassette

NGO1386

(C)7

NGK_1624

(C)9

46

Yes

2

37

deletion††

(C)10

(C)8

(C)10

420

pilin cassette

NGO_11140 CCGC

NGK_2161

(C)8

20

Yes

8

91¶

(C)5GCC

CCGCC

(C)5

CCT (C)5

Yes

421

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).

422

† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).

423 424

‡ In each column are the number of amino acids encoded 3' of the repeat before the closest termination codon in each of the three reading frames.

425

§ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).

426

ǁ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).

427

¶ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).

428

# From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).

429 430 431 432

** Gene phase variation candidacy in N. gonorrhoeae. Yes – there is evidence of repeat tract variation between strains supporting phase variation. (yes) – although there is no variation between strains investigated here, tracts of this length vary on other genes (Chen et al., 1998; Shafer et al., 2002; Adamczyk-Poplawska et al., 2011).

433 434

†† There is a 400 bp deletion in this strain encompassing the region that would contain this repeat.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

435

Table 4. Genes for which there is no evidence of phase variation in N. gonorrhoeae. Gene

FA1090 locus*

Repeat in FA1090*

NCCP11945 locus†

Repeat in NCCP11945†

Repeat in FA19‡

Repeat in FA6140§

Repeat in 35/02ǁ

Repeat in MS11¶

N. gonorrhoeae candidacy#

Prolyl endopeptidase

NGO0026

GGGGCGG

NGK_0034

GGGGCGG

GGGGCGG

GGGGCGG

GGGGCGG

GGGGCGG

No. Replacement tract

pilI / wbpC

NGO0065

C(G)6

NGK_0089**

C(G)6

C(G)6

C(G)6

C(G)6

C(G)6

No. No variation.

phosphoesterase

NGO0081

(C)7

NGK_0119

(C)7

(C)7

(C)7

(C)7

(C)7

No. No variation.

hypothetical

NGO0121

(A)6

NGK_0167

(A)6

(A)6

(A)6

(A)6

(A)6

No. No variation.

cvaA

NGO0123

(C)4

NGK_0168

(C)4

(C)4

(C)4

(C)4

(C)4

No. No variation.

potD -2

NGO0206

AA(C)5

NGK_0338

AA(C)5

AA(C)5

AA(C)5

AA(C)5

AA(C)5

No. No variation.

hypothetical

NGO0532

AACCGGCAAACA NGK_1400

AACCGGCAAACA AACCGGCAAACA

AACCGGCAAACA AACCGGCAAACA

AACCGGCAAACA No. Replacement tract

nifS

NGO0636

CCACACCC

NGK_1278

CCACACCC

CCACACCC

CCACACCC

CCACACCC

CCACACCC

No. Replacement tract

lldD

NGO0639

(G)7

NGK_1275

(G)7

(G)7

(G)7

(G)7

(G)7

No. No variation.

methylase NlaIV

NGO0676

(A)9

NGK_1230

(A)9

(A)9

(A)9

(A)9

(A)9

No. No variation.

dnaX

NGO0743

(C)7

NGK_1135

(C)7

(C)7

(C)7

(C)7

(C)7

No. No variation.

mobA

NGO0754

GGAAGG

NGK_1123

GGAAGG

GGAAGG

GGAAGG

GGAAGG

GGAAGG

No. Replacement tract

ppx

NGO1041

(C)7

NGK_0745

(C)7

(C)7

(C)7

(C)7

(C)7

No. No variation.

fixP / ccoP

NGO1371

(AT )5

NGK_1607

(AT )5

(AT )5

(AT )5

(AT )5

(AT )5

No. No variation.

hypothetical

NGO1384

G(A)7

NGK_1622

(A)8

(A)8

(A)8

(A)8

G(A)7

No. No variation in length.

pntA

NGO1470

CCCT GCT GG

NGK_1735

CCCT GCT GG

CCCT GCT GG

CCCT GCT GG

CCCT GCT GG

CCCT GCT GG

No. Replacement tract

amiC

NGO1501

T T CGCCC

NGK_1783

T T CGCCC

T T CGCCC

T T CGCCC

T T CGCCC

T T CGCCC

No. Replacement tract

dca

NGO1540

T GT GGGGG

NGK_1830

T GT GGGGG

T GT GGGGG

T GT GGGGG

T GT GGGGG

T GT GGGGG

No. Replacement tract

anmK

NGO1583

(C)7

NGK_1884

(C)7

(C)7

(C)7

(C)7

(C)7

No. No variation.

dinG

NGO1708

(C)4T CC

NGK_2106

(C)4T CC

(C)4T CC

(C)4T CC

(C)4T CC

(C)4T CC

No. Replacement tract

rplK

NGO1855

(C)7

NGK_2416

(C)7

(C)7

(C)7

(C)7

(C)7

No. No variation.

hypothetical

NGO1970

(T A)5

NGK_2274

(T A)5

(T A)5

(T A)5

(T A)5

(T A)5

No. No variation.

mafA -3

NGO1972

(G)5

NGK_2270

(G)5

(G)5

(G)5

(G)5

(G)5

No. No variation.

map

NGO1983

(C)6

NGK_2258

(C)6

(C)6

(C)6

(C)6

(C)6

No. No variation.

plsX

NGO2171

(T T CC)3

NGK_2652

(T T CC)3

(T T CC)3

(T T CC)3

(T T CC)3

(T T CC)3

No. No variation.

436

lbpA

NGO0260a

NR

NGK_0401

GGGGGCGG

GGGGGCGG

NR

NR

T GAAACGG

No. Replacement tract

437

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).

438

† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).

439

‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).

440

§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).

441

ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).

442

¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).

443 444 445 446

# Gene phase variation candidacy in N. gonorrhoeae. No. Replacement tract – due to the replacement of the repeat tract with other nucleotides, this is not phase variable. No. No variation – due to no observed variation in the repeat tract, this is not phase variable. No. No variation in length – due to the equal length tract in all strains, this is not phase variable.

447

** This CDS contains a point mutation, which generates a premature termination codon.

448 449

NR The region of the CDS containing the repeat tract does not have homology to the aligned region in these strains.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

450 451

Table 5. Genes for which there is sequencing-based evidence of phase variation in N. gonorrhoeae strain NCCP11945. Gene

FA1090 locus*

Repeat in FA1090*

NCCP11945 Repeat in Repeat in locus† NCCP11945† FA19‡

Repeat in FA6140§

Repeat in 35/02ǁ

Repeat in MS11¶

N. gonorrhoeae candidacy#

Ion Torrent** Illumina††

ispH

NGO0072

(G)8

NGK_0106

(G)8

(G)8

(G)8

(G)8

(G)8

(yes)

Repeat varies

virG

NGO0985

(AAGC)3

NGK_0804

(AAGC)3

(AAGC)2

(AAGC)3

(AAGC)3

(AAGC)3

Yes

Repeat does not vary

Repeat does not vary Repeat does not vary

hypothetical

NGO0964

(AAGC)4

NGK_0831a

(AAGC)8

(AAGC)15

(AAGC)7

(AAGC)9

(AAGC)7

Yes

Repeat varies

Repeat varies

membrane protein

NGO0691

(G)6

NGK_1211

(G)7

(G)7

(G)6

(G)7

(G)7

Yes

Repeat varies

hypothetical

NGO0527

(C)6A(C)9GC NGK_1405

(C)6A(C)8GC

(C)7A(C)4T ( (C)7A(C)4T ( (C)11A(C)8G (C)6A(C)14GC Yes C)3GC C)3GC CGC C

Repeat varies

replication initiation factor

NGO_06695

(C)8T T AT CT NGK_1486 AACA(G)7

(C)9T T AT C T AACA(G)7

(C)7T T AT C T AACA(G)7

(C)9T T AT C T AACA(G)6

(C)11T T AT C (C)9T T AT CT Yes T AACA(G)8 AACA(G)7

Repeat varies

Repeat varies

mafB cassette NGO1386

(C)7

(C)9

deletion‡‡

(C)10

(C)8

Yes

Repeat varies

Repeat varies

adhesion

NGO1445

(CAAG)20CA NGK_1705 AA

(CAAG)12CA (CAAG)12CA (CAAG)6CAA (CAAG)9CAA (CAAG)6CAA Yes AA AA A A A

Repeat varies

Repeat varies

replication initiation factor

NGO_06135

(C)8T T AT CT NGK_1957 AACA(G)7

(C)11T T AT C (C)7T T AT C T AACA(G)8 T AACA(G)7

(C)6T T AT C T AACA(G)5

(C)11T T AT C (C)10T T AT C Yes T AACA(G)8 T AACA(G)7

Repeat varies

Repeat varies

pilin cassette

NGO_11140

NGK_1624

(C)10

Repeat does not vary Repeat does not vary

CCGC

NGK_2161

(C)8

(C)5GCC

CCGCC

(C)5

CCT (C)5

Yes

Repeat varies

Repeat varies

pyrimidine 5'- NGO2055 & nucleotidase NGO2054ǁǁ

(C)6

NGK_2176

CAAACCCC

CAAACCCC

CAAACCCC

(C)9

(C)10

Yes

No repeat

No repeat

lipoprotein

NGO2047

(A)9

NGK_2186§§ (A)8

(A)8

(A)9

(A)9

(A)8

Yes

Repeat varies

Repeat varies

hypothetical

NGO1953

(C)8

NGK_2297

(C)8

(C)8

(C)8

(C)9

(C)8

Yes

Repeat varies

Repeat does not vary

repetitive large surface lipoprotein

NGO_09875 & (G)8 NGO_09870ǁǁ

NGK_2422 & (G)7 NGK_2423ǁǁ

(G)7

(G)7

(G)7

(G)7

Yes

Repeat varies

Repeat does not vary

fetA / frpB

NGO2093

(C)13

NGK_2557

(C)14

(C)10

(C)11

(C)11

(C)11

Known

Repeat varies

Repeat varies

pilC 1

NGO1912

(G)11

NGK_2342

(G)13

(G)11

GGGC(G)11

(G)15

(G)11

Known

Repeat varies

Repeat varies

NGO0950a

(CT T CT )16C NGK_0847 T T CC

(CT T CT )19C (CT T CT )13C (CT T CT )16C (CT T CT )8CT (CT T CT )4CT Known T T CC T T CC T T CC T CC T CC

No reads Repeat varies through repeat

452

opa

453

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2).

454

† From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1).

455

‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1).

456

§ From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1).

457

ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1).

458

¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1).

459 460 461 462 463

# Gene phase variation candidacy in N. gonorrhoeae. Known – phase variation has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation. (yes) – although there is no variation between strains investigated here, tracts of this length vary on other genes (Chen et al., 1998; Shafer et al., 2002; Adamczyk-Poplawska et al., 2011).

464 465 466

** Based on Ion Torrent sequencing data from cultures grown with passage for 8 weeks and from cultures grown with passage for 20 weeks (accession numbers SRR3547950, SRR3547951, SRR3547952, SRR3547953).

467 468

†† Based on Illumina sequencing data from cultures grown with passage for 20 weeks (accession numbers SRR3547954, SRR3547955, SRR3547956, SRR3547957).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

469 470

‡‡ There is a 400 bp deletion in this strain encompassing the region that would contain this repeat.

471

§§ NGK_2186 and NGO2047 annotations are on opposite strands.

472

ǁǁ This CDS appears to be frame-shifted and annotated as two CDSs.

473

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

Figure

Click here to download Figure Figure 1 v2.tif

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

Supplementary Material Files

Click here to download Supplementary Material Files Supplementary Tables.xlsx

Supplementary Table 1. Complete list of candidate phase variable genes including locus identifiers for all six N. gonorrhoeae co Gene

FA1090 locus*

Repeat in FA1090*

NCCP11945 locus†

Repeat in NCCP11945†

FA19 locus‡

Repeat in FA19‡

FA6140 locus§

porA

NGO_04715

(T)11C(G)6T

NGK_0906 & NGK_0907**

(T)9C(G)6T

VT05_RS03585

(T)8C(G)8T

WX60_RS01320

lipoprotein

NGO2047

(A)9

NGK_2186††

(A)8

VT05_RS09915

(A)8

WX60_RS08685

fetA / frpB

NGO2093

(C)13

NGK_2557

(C)14

VT05_RS10220

(C)10

WX60_RS08950

pilC 2

NGO0055

(G)13

NGK_0074

(G)9

VT05_RS10940

(G)10

WX60_RS09660

opa

NGO0066a

(CTTCT)13CTT NGK_0096 CG

(CTTCT)8CTTC VT05_RS11020 C

(CTTCT)4CT WX60_RS09735 TCC

opa

NGO0070

(CTTCT)9CTTC NGK_0102 G

(CTTCT)7CTTC VT05_RS11040 C

(CTTCT)8CT WX60_RS09755 TCC

pglH

NGO0086

(C)10

NP

NP

VT05_RS11130

no repeat

WX60_RS09845

pglG

NGO0087

A(C)7

NP

NP

VT05_RS11135

A(C)9

WX60_RS09850

pglE

NGO0207

(CAAACAC)4

NGK_0339

(CAAACAC)8(C VT05_RS11775 AAATAC)3

(CAAACAC)6 CAAATACC WX60_RS10475 AAACACCA AATAC

hsdS

NGO_02155

(G)7

NGK_0571

(G)7

VT05_RS00760

(G)8

hypothetical

NGO0527

(C)6A(C)9GC

NGK_1405

(C)6A(C)8GC

VT05_RS01375

(C)7A(C)4T(C WX60_RS03235 )3GC

modB

NGO0545

(CCCAA)12

NGK_1384

(CCCAA)11

VT05_RS01460

(CCCAA)12

replication initiation factor

NGO_06135

(C)8TTATCTAA NGK_1957 CA(G)7

(C)11TTATCTA VT05_RS04875 ACA(G)8

(C)7TTATCT WX60_RS03635 AACA(G)7

modA

NGO0641

(GCCA)37

(GCCA)18

VT05_RS01960

(GCCA)24GT WX60_RS02640 CA

replication initiation factor

NGO_06695

(C)8TTATCTAA NGK_1486 CA(G)7

(C)9TTATCTAA VT05_RS05440 CA(G)7

(C)7TTATCT WX60_RS04195 AACA(G)7

opa

NGO0950a

(CTTCT)16CTT NGK_0847 CC

(CTTCT)19CTT VT05_RS03845 CC

(CTTCT)13CT WX60_RS01060 TCC

hypothetical

NGO0964

(AAGC)4

NGK_0831a

(AAGC)8

VT05_RS03910

(AAGC)15

WX60_RS00995

virG

NGO0985

(AAGC)3

NGK_0804

(AAGC)3

VT05_RS04020

(AAGC)2

WX60_RS00880

opa

NGO1040a

(CTTCT)20CTT NGK_0749 CC

(CTTCT)20CTT VT05_RS04320 CC

(CTTCT)10CT WX60_RS00610 TCC

opa

NGO1073a

(CTTCT)2CTTC NGK_0693 C

(CTTCT)10CTT VT05_RS04555 CC

(CTTCT)11CT WX60_RS00375 TCC

opa

NGO1277a

(CTTCT)11CTT NGK_1495 CC

(CTTCT)7CTTC VT05_RS05475 C

CTT(CTTCT) WX60_RS04230 11CTTCC

adhesion

NGO1445

(CAAG)20CAAA NGK_1705

(CAAG)12CAAA VT05_RS06410

(CAAG)12CA WX60_RS05175 AA

opa

NGO1463a

(CTTCT)10CTT NGK_1729 CC

(CTTCT)7CTTC VT05_RS06485 C

(CTTCT)11CT WX60_RS05265 TCC

opa

NGO1513

(CTTCT)12CTT NGK_1799 CG

(CTTCT)14CTT VT05_RS06740 CC

CTT(CTTCT) WX60_RS05520 10CTTCC

opa

NGO1553a

(CTTCT)4CTTC NGK_1847 C

(CTTCT)9CTTC VT05_RS06965 C

(CTTCT)17CT WX60_RS05745 TCC

NGK_1272

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

WX60_RS11515

WX60_RS03145

autA

NGO1689

(AAGC)3

NGK_2082

(AAGC)3

VT05_RS07825

(AAGC)14

WX60_RS06615

pgtA

NGO1765

(G)11

NGK_2516

GGGAGCGGG

VT05_RS08235

(G)19

WX60_RS07040

NGK_2422 & NGK_2423**

(G)7

VT05_RS08730

(G)7

WX60_RS07535

NGO_09875 repetitive large & surface (G)8 NGO_09870* lipoprotein * opa

NGO1861a

(CTTCT)13CTT NGK_2410 CC

(CTTCT)11CTT VT05_RS08825 CC

(CTTCT)13CT WX60_RS07630 TCC

pilC 1

NGO1912

(G)11

NGK_2342

(G)13

VT05_RS09130

(G)11

WX60_RS07935

hypothetical

NGO1953

(C)8

NGK_2297

(C)8

VT05_RS09330

(C)8

WX60_RS08135

pyrimidine 5'- NGO2055 & nucleotidase NGK2054

(C)6

NGK_2176

CAAACCCC

VT05_RS09960

CAAACCCC

WX60_RS08730

opa

NGO2060a

(CTTCT)10CTT NP CG

NP

NP

NP

NP

opa

NP

NP

NGK_2170

(CTTCT)14CTT NP CC

NP

NP

lgtG

NGO2072

(C)11

NGK_2534 & NGK_2533**

(C)12

VT05_RS10115

(C)10

WX60_RS08845

hpuA

NGO2110

(G)9

NGK_2581

(G)10

VT05_RS10310

(G)9

WX60_RS09040

lgtA

NGO11610

(G)11

NGK_2630

(G)11

VT05_RS10530

(G)14A

WX60_RS09250

lgtC

NGO2156

(G)14

NGK_2632

(G)13

VT05_RS10540

(G)13

WX60_RS09260

lgtD

NGO2158

A(G)14

NGK_2634

A(G)16

VT05_RS10550

(G)13

WX60_RS09270

ispH

NGO0072

(G)8

NGK_0106

(G)8

VT05_RS11050

(G)8

WX60_RS09765

membrane protein

NGO0691

(G)6

NGK_1211

(G)7

VT05_RS02225

(G)7

WX60_RS02380

mafB cassette NGO1386

(C)7

NGK_1624

(C)9

VT05_RS06060

deletion‡‡

WX60_RS04830

pilin cassette

CCGC

NGK_2161

(C)8

VT05_RS10015

(C)5GCC

WX60_RS08810

NGO_11140

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2). † From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1). ‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1). § From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1). ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1). ¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1). # Gene phase variation candidacy in N. gonorrhoeae . Known – phase variation has been reported in the literature. Yes ** This CDS appears to be frame-shifted and annotated as two CDSs. †† NGK_2186 and NGO2047 annotations are on opposite strands. ‡‡ There is a 400 bp deletion in this strain encompassing the region that would contain this repeat. NP The CDS is not present in this strain.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

cus identifiers for all six N. gonorrhoeae complete genomes investigated. Repeat in FA6140§

35/02 locusǁ

Repeat in 35/02ǁ

MS11 locus¶

Repeat in MS11¶

N. gonorrhoeae candidacy#

Reference

(T)9C(G)6T

WX61_RS10730

(T)9C(G)6T

NGFG_RS05045

(T)9C(G)6TT

Known

van der Ende et al ., 1995

(A)9

WX61_RS04400

(A)9

NGFG_RS11385

(A)8

Yes

(C)11

WX61_RS06255

(C)11

NGFG_RS11700

(C)11

Known

Carson et al ., 2000

(G)13

WX61_RS06975

(G)9CAGG

NGFG_RS00285

(G)12

Known

Jonsson et al ., 1991

CTT(CTTCT) WX61_RS07050 10CTTCC

(CTTCT)6(CT NGFG_RS00365 TCC)2

(CTTCT)8CTTC Known C

Stern & Meyer, 1987

(CTTCT)17CT WX61_RS07070 TCC

(CTTCT)7CT NGFG_RS00385 TCC

(CTTCT)7CTTC Known C

Stern & Meyer, 1987

no repeat

WX61_RS07170

no repeat

NP

NP

Known

Power et al ., 2003

A(C)6

WX61_RS07175

A(C)6

NP

NP

Known

Power et al ., 2003

(CAAACAC)1 0CAAATACC AAACACCA WX61_RS08150 AATACCAA ACAC(CAAA TAC)2

(CAAACAC)2 NGFG_RS01110 4CAAATAC

(CAAACAC)15( Known CAAATAC)3

Power et al ., 2003

(G)9

(G)7

(G)7

Adamczyk-Poplawska et al., 2011

WX61_RS09195

NGFG_RS02160

Known

(C)7A(C)4T(C WX61_RS00895 )3GC

(C)11A(C)8G NGFG_RS02800 CGC

(C)6A(C)14GCC Yes

(CCCAA)4

(CCCAA)11

(CCCAA)7

WX61_RS00800

NGFG_RS02880

Known

(C)6TTATCT WX61_RS03490 AACA(G)5

(C)11TTATC NGFG_RS06490 TAACA(G)8

(C)10TTATCTA Yes ACA(G)7

(GCCA)24

(GCCA)19

(GCCA)24

Known

Yes

WX61_RS00300

NGFG_RS03385

(C)9TTATCT WX61_RS01270 AACA(G)6

(C)11TTATC NGFG_RS07045 TAACA(G)8

(C)9TTATCTA ACA(G)7

(CTTCT)16CT WX61_RS10470 TCC

(CTTCT)8CT NGFG_RS05305 TCC

(CTTCT)4CTTC Known C

(AAGC)7

WX61_RS10405

(AAGC)9

NGFG_RS05375

(AAGC)7

Yes

(AAGC)3

WX61_RS10290

(AAGC)3

NGFG_RS05480

(AAGC)3

Yes

Srikhanta et al ., 2009

Srikhanta et al ., 2009

Stern & Meyer, 1987

(CTTCT)7CT WX61_RS10010 TCC

(CTTCT)12CT NGFG_RS05770 TCC

(CTTCT)13CTT CCCTTCT(CTT Known CC)2

Stern & Meyer, 1987

(CTTCT)12CT WX61_RS09790 TCC

(CTTCT)18CT NGFG_RS06015 TCC

(CTTCT)7CTTC Known C

Stern & Meyer, 1987

(CTTCT)11CT WX61_RS01310 TCC

(CTTCT)7CT NGFG_RS07085 TCC

(CTTCT)8CTTC Known C

Stern & Meyer, 1987

(CAAG)6CAA WX61_RS02245 A

(CAAG)9CAA NGFG_RS08015 A

(CAAG)6CAAA Yes

(CTTCT)12CT WX61_RS02330 TCC

(CTTCT)12CT NGFG_RS08090 TCC

(CTTCT)10CTT Known CC

Stern & Meyer, 1987

NP

WX61_RS02595

(CTTCT)6CT NGFG_RS08385 TCG

(CTTCT)7CTTC Known G

Stern & Meyer, 1987

(CTTCT)8CT WX61_RS02820 TCC

(CTTCT)8CT NGFG_RS08610 TCC

(CTTCT)14CTT Known CC

Stern & Meyer, 1987

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

(AAGC)3

Known

Peak et al ., 1999; Arenas et al ., 2015

GGGAGCGG NGFG_RS09750 G

GGGAGCGGG

Known

Banerjee et al ., 2002

(G)7

(G)7

Yes

(AAGC)3

WX61_RS03920

(AAGC)3

(G)19

WX61_RS06075

(G)7

WX61_RS05585

NGFG_RS09330

NGFG_RS10240

(CTTCT)2CT WX61_RS05490 TCC

(CTTCT)13CT NGFG_RS10330 TCC

(CTTCT)30CTT Known CC

Stern & Meyer, 1987

GGGC(G)11

WX61_RS05130

(G)15

NGFG_RS10640

(G)11

Known

Jonsson et al., 1991

(C)8

WX61_RS04935

(C)9

NGFG_RS10845

(C)8

Yes

CAAACCCC

WX61_RS04355

(C)9

NGFG_RS11430

(C)10

Yes

NP

WX61_RS04325

(CTTCT)6CT NP TCC

NP

Known

Stern & Meyer, 1987

NP

NP

NP

NP

NP

Known

Stern & Meyer, 1987

(C)10

WX61_RS06150

(C)10

NGFG_RS11595

(C)10

Known

Mackinnon et al ., 2002

(G)9

WX61_RS06345

(G)8

NGFG_RS11790

(G)8

Known

Chen et al ., 1998

(G)17A

WX61_RS06565

(G)20A

NGFG_RS12010

(G)10A

Known

Erwin et al ., 1996

(G)10

WX61_RS06575

(G)16

NGFG_RS12020

(G)8

Known

Shafer et al ., 2002

A(G)12

WX61_RS06585

A(G)18

NGFG_RS12025

A(G)13

Known

Shafer et al ., 2002

(G)8

WX61_RS07090

(G)8

NGFG_RS00400

(G)8

(yes)

(G)6

WX61_RS00040

(G)7

NGFG_RS03645

(G)7

Yes

(C)10

WX61_RS01905

(C)8

NGFG_RS07670

(C)10

Yes

CCGCC

WX61_RS04290

(C)5

NGFG_RS11490

CCT(C)5

Yes

has been reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation. (yes) – a

d contain this repeat.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

rains supporting phase variation. (yes) – although there is no variation between strains investigated here, tracts of this length vary on oth

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

ated here, tracts of this length vary on other genes (Chen et al., 1998; Shafer et al., 2002; Adamczyk-Poplawska et al., 2011).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

czyk-Poplawska et al., 2011).

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

Supplementary Table 2. Next-generation sequencing read data for candidate phase variable genes in N. gonorrhoeae . Gene

FA1090 locus*

Repeat in FA1090*

NCCP11945 locus†

Repeat in NCCP11945†

Repeat in FA19‡

Repeat in FA6140§

Repeat in 35/02ǁ

ispH

NGO0072

(G)8

NGK_0106

(G)8

(G)8

(G)8

(G)8

virG

NGO0985

(AAGC)3

NGK_0804

(AAGC)3

(AAGC)2

(AAGC)3

(AAGC)3

hypothetical

NGO0964

(AAGC)4

NGK_0831a

(AAGC)8

(AAGC)15

(AAGC)7

(AAGC)9

membrane protein

NGO0691

(G)6

NGK_1211

(G)7

(G)7

(G)6

(G)7

hypothetical

NGO0527

(C)6A(C)9GC

NGK_1405

(C)6A(C)8GC

(C)7A(C)4T(C) (C)7A(C)4T(C) (C)11A(C)8GCG 3GC 3GC C

replication initiation factor

NGO_06695

(C)8TTATCTA NGK_1486 ACA(G)7

(C)9TTATCTA (C)7TTATCTA (C)9TTATCTA (C)11TTATCTA ACA(G)7 ACA(G)7 ACA(G)6 ACA(G)8

mafB cassette

NGO1386

(C)7

(C)9

adhesion

NGO1445

(CAAG)20CAA NGK_1705 A

(CAAG)12CAA (CAAG)12CAA (CAAG)6CAA (CAAG)9CAAA A A A

replication initiation factor

NGO_06135

(C)8TTATCTA NGK_1957 ACA(G)7

(C)11TTATCTA (C)7TTATCTA (C)6TTATCTA (C)11TTATCTA ACA(G)8 ACA(G)7 ACA(G)5 ACA(G)8

pilin cassette

NGO_11140

CCGC

NGK_2161

(C)8

(C)5GCC

CCGCC

(C)5

pyrimidine 5'nucleotidase

NGO2055 & NGK2054

(C)6

NGK_2176

CAAACCCC

CAAACCCC

CAAACCCC

(C)9

lipoprotein

NGO2047

(A)9

NGK_2186***

(A)8

(A)8

(A)9

(A)9

hypothetical

NGO1953

(C)8

NGK_2297

(C)8

(C)8

(C)8

(C)9

NGK_1624

deletion##

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

(C)10

(C)8

repetitive large surface lipoprotein

NGO_09875 & NGO_09870†††

(G)8

NGK_2422 & NGK_2423†††

(G)7

(G)7

(G)7

(G)7

fetA / frpB

NGO2093

(C)13

NGK_2557

(C)14

(C)10

(C)11

(C)11

pilC 1

NGO1912

(G)11

NGK_2342

(G)13

(G)11

GGGC(G)11

(G)15

opa

NGO0950a

(CTTCT)16CT NGK_0847 TCC

(CTTCT)19CTT (CTTCT)13CT (CTTCT)16CT (CTTCT)8CTTC CC TCC TCC C

* From the N. gonorrhoeae strain FA1090 genome sequence (NC_002946.2). † From the N. gonorrhoeae strain NCCP11945 genome sequence (CP00150.1). ‡ From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1). § From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1). ǁ From the N. gonorrhoeae strain 35/02 genome sequence (NZ_CP012028.1). ¶ From the N. gonorrhoeae strain MS11 genome sequence (NC_022240.1). # Gene phase variation candidacy in N. gonorrhoeae . Known – phase variation has been reported in the literature. ** From Ion Torrent sequencing data of 8 week cultures (accession number SRR3547950). For each repeat identifie †† From Ion Torrent sequencing data of 8 week cultures (accession number SRR3547951). For each repeat identifie ‡‡ From Ion Torrent sequencing data of 20 week cultures (accession number SRR3547952). For each repeat identif §§ From Ion Torrent sequencing data of 20 week cultures (accession number SRR3547953). For each repeat identif ǁǁ From Illumina sequencing data of 20 week cultures (accession numbers SRR3547954 & SRR3547955). These are ¶¶ From Illumina sequencing data of 20 week cultures (accession numbers SRR3547956 & SRR3547957). These are ## There is a 400 bp deletion in this strain encompassing the region that would contain this repeat. *** NGK_2186 and NGO2047 annotations are on opposite strands. ††† This CDS appears to be frame-shifted and annotated as two CDSs.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

genes in N. gonorrhoeae . Repeat in MS11¶

N. gonorrhoeae candidacy#

KU1-4 (8 wks)**

(G)8

(yes)

(G)8(4F,4R), (G)7(6F,8R), (G)10(1R)

(AAGC)3

Yes

(AAGC)3(8F,9R), AAGCAAAGCAAGC(1R)

(AAGC)7

Yes

(AAGC)8(1F,1R)

(AAGC)8(8F), (AAGC)2C(AAGC)6(1F), (AAGC)7(1F)

(G)7

Yes

(G)7(5F,5R), (G)6(2F,3R), (G)8(1R)

(G)7(8F,8R), (G)6(2F,1R), (G)8(1F,3R)

(C)6A(C)14GC Yes C

(C)6A(C)7(1F), (C)5A(C)8(2F), (C)5A(C)7(3F), (C)6A(C)8(1F), (C)5A(C)5(1R)

(C)6A(C)8(3F), CT(C)4A(C)9(1F), (C)6A(C)9(3F), (C)5A(C)7(1F), (C)5A(C)8(2F), (C)6(1F)

(C)9TTATCTA Yes ACA(G)7

(C)9&(G)7(2F,5R), (C)8&(G)7(3R), (C)8&(G)5(1R), (C)8(1R), (G)7(3F,4R), (C)9&(G)6(2F), (C)9&(G)8(1R), (C)10&(G)7(1R), (C)7&(G)6(1R), (C)7&(G)7(1R)

(C)9&(G)9(1F), (G)7(1F), (C)9&(G)7(5F,4R), (C)10&(G)7(1F, 3R), (C)10&(G)8(2F), (C)8&(G)7(3F), (C)10(1F), (C)9&(G)6(3R), (C)10&(G)6(1R), (G)6(1R), (C)11&(G)7(2R), (C)9(1R), (C)9&(G)8(1F)

(C)10

(C)8(9F), (C)9(6F,3R)

(C)10(5F), (C)7(2R), (C)8(1F), (C)9(2F, 1R)

(CAAG)12CAAA(4R), (CAAG)9CAAA(1F), (CAAG)12CAA(2F)

(CAAG)12ACTAAA(1R), (CAAG)8CAAAG(CAAG)2CAAAGC AA(1F), (CAAG)11CAAAGCAA(2F), (CAAG)12CAA(3F,1R), (CAAG)12ACAAA(2R), (CAAG)3CAAAG(CAAG)4CAAAG(C CAG)2CAAAGCAAA(1F)

(C)10TTATCTA Yes ACA(G)7

(C)10TTATCTAACA(G)7(1F), (C)12TTATCTAACA(G)7(1F), (C)9TTATCTAACA(G)5(1F), (C)11TTATCTAACA(G)8(1F), (C)10TTATCTAACA(G)6(1F), (C)10(1F), (G)8(3R), (G)6(1F)

(C)13TTATCTAACA(G)8(1R), (C)12TTATCTAACA(G)9(4F,4R), (C)9TTATCTAACA(G)8(1F), (C)11TTATCTAACA(G)8(3F,1R), (C)12TTATCTAACA(G)8(7R), (C)11TTATCTAACA(G)9(3F), (C)10TTATCTAACA(G)8(1F,2R), (C)11TTATCTAACA(G)7(1R), (C)12TTATCTAACA(G)10(1F), (G)9(2F,1R), (G)8(2F,1R)

CCT(C)5

Yes

(C)8(3F, 4R), (C)7(3F, 4R), (C)6(1F)

(C)8(11R), (C)7(3R), (C)9(1R)

(C)10

Yes

CAAACCCC(17F,10R), CAAACCC(2F), CAAACCCC(7F,8R), CAACCC(1R), CAAACCC(1F,2R), CAACCCC(1R) CAAAACCCC(1R)

(A)8

Yes

(A)8(11F,3R), (A)7(6F,3R), (A)9(2R)

(A)7(11F,2R), (A)8(7F, 7R), (A)9(3F,11R), (A)10(7R), (A)6(1F)

(C)8

Yes

(C)7(1F,3R), (C)8(9F,10R)

(C)8(9F,7R), (C)7(2F,2R), (C)9(4F,6R)

Yes

(CAAG)6CAAA Yes

KU1-45 (8 wks)†† (G)9(7F,3R), (G)8(10F,13R), (G)10(1R), (G)11(1F), (G)7(2F) (AAGC)3(5F,8R), AAGCAAGACAAGC(1F)

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

(G)7

Yes

(G)7(8F, 5R), (G)6(3F, 3R)

(G)7(5F,2R), (G)6(2R)

(C)11

Known

(C)11(3F,2R), (C)10(2F), (C)12(1F)

(C)12 (2F), (C)10 (3F), (C)11(2F), (C)14(1F), (C)15(1F), (C)8(1F)

(G)11

Known

(G)12(3F), (G)11(5F,1R), (G)9(1F,5R), (G)14(4F), (G)13(6F), (G)12(4F), (G)10(1R) (G)11(1F)

(CTTCT)4CTTC Known C

No reads through repeat

No reads through repeat

en reported in the literature. Yes – there is evidence of repeat tract variation between strains supporting phase variation. (yes 950). For each repeat identified, the values in parenthesis after the repeat indicate the number of reads that include this repea 951). For each repeat identified, the values in parenthesis after the repeat indicate the number of reads that include this repea 7952). For each repeat identified, the values in parenthesis after the repeat indicate the number of reads that include this repe 7953). For each repeat identified, the values in parenthesis after the repeat indicate the number of reads that include this repe 4 & SRR3547955). These are combined results from paired-end forward (2928-NS1_1) and reverse (2928-NS1_2) reads. For ea 56 & SRR3547957). These are combined results from paired-end forward (2929-NS2_1) and reverse (2929-NS2_2) reads. For e in this repeat.

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

KU1-95 (20 wks)‡‡

KU1-96 (20 wks)§§

2928-NS1ǁǁ

(G)10(2R), (G)8(6F,5R), (G)9(2F,5R), (G)7(5F)

(G)7(1F,1R), (G)6CG(1R)

(G)8(6F,12R)

(AAGC)3(3F,9R)

(AAGC)3(2F,1R)

(AAGC)3(8F,7R)

(AAGC)8(7F,4R)

(AAGC)8(2F,1R)

(AAGC)8(1F)

(G)7(5F,7R), (G)8(1F,3R), (G)6(5F)

(G)7(4F,1R), (G)6(1F)

(G)7(4F,3R)

(C)5A(C)7(2F), (C)6A(C)8(5F), (C)6A(C)9(1F), (C)3(G)3(C)5A(C)7(1F), (C)5A(C)7(1F)

(C)6A(C)8(1F), (C)6A(C)7(1F)

(C)6A(C)8(6F)

(C)11&(G)7(2R), (C)10(1F), (C)8&(G)7(1F), (C)9&(G)6(1R), (C)10&(G)4TCC(1R), (G)7(1F,2R), (C)7(1R), (C)9&(G)7(3F,1R), (C)9(2F,1R)

(G)7(1R), (C)8&(G)6(2F)

(C)9&(G)7(14F,13R), (C)9(3F,2R), (G)7(3F,2R)

(C)9(5F,3R), (C)8(4F,2R), (C)10(4F), (C)7(1F,1R)

(C)8(1F,1R), (C)7(1F), (C)5A(C)4(1R) (C)9(12F,12R)

(CAAG)12CAAA(7F,13R), (CAAG)13CAAA(1F), CAAGAAG(CAAG)11(1R)

(CAAG)12(1F)

(CAAG)12(6F,12R), (CAAG)13(1F), (CAAG)11(1R)

(C)11&(G)9(2F,1R), (C)12&(G)9(1F), (C)10&(G)7(1R), (C)10&(G)9(1F), (C)9&(G)7(1R), (C)11&(G)10(1F), (C)9&(G)6(1R), (C)11&(G)7(1R), (G)9(1F,1R) (C)12&(G)8(1F,2R), (G)8(2F,1R), (C)11&(G)8(2F,1R), (C)12&(G)7(1R), (C)12(G)4T(G)3(1R), (C)10&(G)8(2F), (C)11(1F)

(C)10&(G)8(1F), (C)11&(G)8(10F,4R), (G)8(1F,1R), (C)11(2F,5R), (C)6A(C)4(1F)

(C)8(4F,12R), (C)7(4F), (C)9(1F,1R), (C)6(1R)

(C)8(1F)

(C)8(7F,11R), (C)7(1R)

CAAACCC(3F), CAAACCCC(10F,11R), CAAACCCCC(1F)

CAAACCCC(2R)

CAAACCCC(16F,7R)

(A)8(10F,10R), (A)7(1F), (A)9(2F,5R) no data

(A)8(24F,8R), (A)9(1R)

(C)8(9F,10R), (C)9(2F), (C)7(1F,2R)

(C)8(9F,9R)

(C)7(1F), (C)6(1R)

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

(G)7(7F,5R), (G)6(4F,7R), A(G)6(1F), (G)5TG(1F), (G)8(2F), (G)5(1R)

(G)8(1F)

(G)7(18F,14R)

(C)11(5F), (C)12(1F)

no data

(C)11(2R), (C)10(1F), (C)12(3F,3R)

(G)11(4F), (G)12CGG(1F), (G)6(1R)

(G)10(1F)

(G)13(1F), (G)14(1F), (G)12(2F,2R), (G)16(1R), (G)15(1R), (G)11(1F)

No reads through repeat

No reads through repeat

(CTTCT)19CTTCC(3F,1R), (CTTCT)18CTTCC(1F)

tween strains supporting phase variation. (yes) – although there is no variation between strains investigated here, tracts of th te the number of reads that include this repeat and the directionality of the read, either forward (F) or reverse (R). te the number of reads that include this repeat and the directionality of the read, either forward (F) or reverse (R). cate the number of reads that include this repeat and the directionality of the read, either forward (F) or reverse (R). cate the number of reads that include this repeat and the directionality of the read, either forward (F) or reverse (R). NS1_1) and reverse (2928-NS1_2) reads. For each repeat identified, the values in parenthesis after the repeat indicate the num NS2_1) and reverse (2929-NS2_2) reads. For each repeat identified, the values in parenthesis after the repeat indicate the num

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

2929-NS2¶¶

Repeat location in NCCP11945†

(G)8(6F,11R)

79,691

(AAGC)3(1F,2R)

656,995

(AAGC)7(1F), (AAGC)8(1R) 676,106 (G)7(3F,4R)

990,022

(C)6A(C)8(3F) 1,172,254

(C)9&(G)7(11F,2R), (C)9(2F,1R), (C)10&(G)7(1F,1R), (G)7(2F)

1,228,002 (C)9(11F,8R), (C)10(1F,1R)

1,355,998

(CAAG)12(6F,2R)

1,419,715

(C)11&(G)8(8F), (C)10(1R), (C)9&(G)8(1F), (C)10&(G)8(2F), (G)8(2F), (C)11(1R)

1,615,314 (C)8(3F,8R)

1,792,194

CAAACCCC(13F,7R), CACACCCC(1F) 1,803,603 (A)8(15F,8R) 1,811,461 (C)8(4F,2R)

1,907,607

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44

(G)7(8F,8R) 2,024,561 (C)6A(C)5(1R), (C)11(1F) 2,149,455 (G)12(1F), (G)16(2R), (G)13(2F) 1,942,124 (CTTCT)19CTTCC(3F), CATCT(CTTCT)18CTTCC(1R)

689,033

etween strains investigated here, tracts of this either forward (F) or reverse (R). either forward (F) or reverse (R). d, either forward (F) or reverse (R). d, either forward (F) or reverse (R). arenthesis after the repeat indicate the number of reads that include this repeat and the directionality of the read, either forw parenthesis after the repeat indicate the number of reads that include this repeat and the directionality of the read, either for

Downloaded from www.microbiologyresearch.org by IP: 181.215.131.9 On: Tue, 20 Sep 2016 09:20:44